    Synopsis

    As a job moves from submission to execution to completion, it passes through a variety of states; at any given moment, every job is in exactly one of several possible states. Commands issued from the command line or through a Qube! GUI instruct the Supervisor to generate an event that changes the state of the job; this change is called a transition. The description of all possible states and the transitions between them is called a state machine.
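    The state machine idea can be sketched as a lookup table keyed by (current state, event). This is a minimal Python illustration, not Qube!'s implementation: the event names (`unblock`, `start`, `succeed`, `fail`, and so on) are hypothetical, and the transitions are only an assumed subset implied by the descriptions on this page. Because the job's state is a single variable, the job is always in exactly one state, and an event with no entry for that state is rejected.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    BLOCKED = "blocked"
    RUNNING = "running"
    KILLED = "killed"
    COMPLETE = "complete"
    FAILED = "failed"


# Hypothetical transition table: (current state, event) -> next state.
# Event names are illustrative, not Qube! identifiers.
TRANSITIONS = {
    (State.BLOCKED, "unblock"): State.PENDING,
    (State.PENDING, "start"): State.RUNNING,
    (State.RUNNING, "interrupt"): State.PENDING,
    (State.RUNNING, "kill"): State.KILLED,
    (State.RUNNING, "succeed"): State.COMPLETE,
    (State.RUNNING, "fail"): State.FAILED,
    (State.KILLED, "retry"): State.PENDING,
}


def transition(state: State, event: str) -> State:
    """Return the single next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value!r}") from None


# Walk one job through a sequence of events; it is always in exactly one state.
state = State.BLOCKED
for event in ["unblock", "start", "interrupt", "start", "succeed"]:
    state = transition(state, event)
print(state.value)  # complete
```

    Keeping every legal transition in one table makes it easy to audit the model against documentation like the tables on this page.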

    Initial Job States

    The key to understanding how to use Qube! effectively to manage jobs is to see how different commands change the state of a job. Qube! jobs can be submitted in one of two initial states: pending and blocked. Pending is the default and signals to the Supervisor that the job may be started at any time. The starting state of a job can be specified by the user or developer through the job structure in the API, or through the command line.

    State      Meaning
    pending    Default state for submitted jobs. Signals to the Supervisor that the job may be started at any time. Jobs which have been suspended will also be marked as pending.
    blocked    Alternate state for submitted jobs. Tells the system to hold the job until it is unblocked by something, usually another job that this one depends on.
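    Choosing an initial state through a job description might look like the following sketch. `JobSpec` and its fields are hypothetical stand-ins, not the job structure of the Qube! API; the only rules taken from this page are that a job may start as `pending` or `blocked`, and that `pending` is the default.

```python
from dataclasses import dataclass, field
from typing import List

# The two states a job can be submitted in; pending is the default.
INITIAL_STATES = {"pending", "blocked"}


@dataclass
class JobSpec:
    """Hypothetical job description (not the Qube! API job structure)."""
    name: str
    command: str
    state: str = "pending"
    depends_on: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject states a job cannot start in, such as "running".
        if self.state not in INITIAL_STATES:
            raise ValueError(f"cannot submit a job in state {self.state!r}")


# An independent job uses the default; a dependent one is submitted blocked.
render = JobSpec(name="render", command="render.sh")
comp = JobSpec(name="comp", command="comp.sh", state="blocked", depends_on=["render"])
print(render.state, comp.state)  # pending blocked
```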

    Intermediate Job States

    Normally, submitting a job places it in the pending state. The Supervisor takes over from there and, without any other intervention, places the job in the running state when it executes. A running job can be killed or suspended by the user who submitted it, or by other users with the appropriate permissions. A job that is killed can never run again unless it is retried. A job can also be interrupted, which requests that the Supervisor force the job off its host, immediately killing it. The job is then placed back in the queue in the pending state, to be executed on another qualified host.
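    The pending, running, and interrupt cycle described above can be sketched as a toy model. This is not the Supervisor's scheduling logic: `dispatch`, `interrupt`, and `kill` are hypothetical helpers, and skipping the host a job was interrupted on is an assumed way of honoring "another qualified host".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    """Hypothetical job record for illustration only."""
    state: str = "pending"
    host: Optional[str] = None
    # Assumption: hosts the job was interrupted on are skipped on the rerun.
    interrupted_on: Set[str] = field(default_factory=set)


def dispatch(job: Job, qualified_hosts: List[str]) -> None:
    """Start a pending job on the first qualified host it has not been forced off."""
    if job.state != "pending":
        raise ValueError("only pending jobs can be dispatched")
    candidates = [h for h in qualified_hosts if h not in job.interrupted_on]
    if candidates:
        job.host = candidates[0]
        job.state = "running"
    # Otherwise the job simply stays pending.


def interrupt(job: Job) -> None:
    """Force a running job off its host and put it back in the queue as pending."""
    if job.state != "running":
        raise ValueError("only running jobs can be interrupted")
    job.interrupted_on.add(job.host)
    job.host = None
    job.state = "pending"


def kill(job: Job) -> None:
    """A killed job is not requeued; it can only run again if it is retried."""
    if job.state != "running":
        raise ValueError("only running jobs can be killed in this sketch")
    job.host = None
    job.state = "killed"


job = Job()
dispatch(job, ["hostA", "hostB"])  # running on hostA
interrupt(job)                     # pending again, hostA excluded
dispatch(job, ["hostA", "hostB"])  # running on hostB
print(job.state, job.host)         # running hostB
```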

    Final Job States

    A job that completes successfully is marked as complete, and a job that completes unsuccessfully is marked as failed.

    Example

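    The following command submits a job that runs ls in the blocked state instead of the default pending state:

    % qbsub --state blocked ls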
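    A script that submits jobs could assemble the same command line programmatically. In this sketch, `qbsub_argv` is a hypothetical helper, and `--state` is the only `qbsub` option used because it is the only one shown in the example; the flag is omitted for `pending` since that is the default initial state.

```python
import shlex
from typing import List

# The two states a job can be submitted in.
INITIAL_STATES = ("pending", "blocked")


def qbsub_argv(command: List[str], state: str = "pending") -> List[str]:
    """Hypothetical helper that builds a qbsub command line for one job."""
    if state not in INITIAL_STATES:
        raise ValueError(f"cannot submit a job in state {state!r}")
    argv = ["qbsub"]
    if state != "pending":  # pending is the default, so no flag is needed
        argv += ["--state", state]
    return argv + command


argv = qbsub_argv(["ls"], state="blocked")
print(shlex.join(argv))  # qbsub --state blocked ls
```

    On a machine with Qube! installed, the list could be passed to `subprocess.run(argv, check=True)`; building a list rather than one string avoids shell-quoting problems.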


    State      Meaning
    running    Job that is doing work, with no failures.
    failing    Job that has not finished, but has at least one frame or instance that has failed.
    retrying   Job that has a retry count greater than zero and has been retried (automatically) at least once.
    killed     Job that has been killed by a user. Killed jobs must be manually retried or resubmitted.
    complete   Job is no longer running, and all frames have succeeded.
    failed     Job is no longer running, and at least one frame or instance has failed.
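    The status table above can be read as a decision procedure over a few facts about a job. The sketch uses a hypothetical `JobSnapshot` record whose fields are illustrative; the order of the checks (killed first, then finished, then retrying before failing) is an assumption, since a single job can satisfy more than one row.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical summary of a job; field names are illustrative."""
    finished: bool       # the job is no longer running
    killed: bool         # the job was killed by a user
    failed_frames: int   # frames or instances that have failed so far
    retry_count: int     # automatic retries the job is allowed
    times_retried: int   # automatic retries performed so far


def display_status(job: JobSnapshot) -> str:
    """Map a snapshot to one row of the status table (assumed check order)."""
    if job.killed:
        return "killed"
    if job.finished:
        return "failed" if job.failed_frames > 0 else "complete"
    if job.retry_count > 0 and job.times_retried > 0:
        return "retrying"
    if job.failed_frames > 0:
        return "failing"
    return "running"


print(display_status(JobSnapshot(finished=False, killed=False,
                                 failed_frames=1, retry_count=0,
                                 times_retried=0)))  # failing
```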

    Actions

    States can be changed due to various actions taken by users or the Supervisor.

    Action     Meaning
    block      Place the job or instance in the blocked state, holding it until it is unblocked. Typically done by users, but auto-wrangling will also block instances and jobs.
    interrupt  Kill the current frame and put the job into the pending state, where it can be picked up and rerun.
    kill       End the current frame and don't restart the job. A user must retry or resubmit this job.
    resubmit   Bring up the submission UI and optionally modify the job's parameters before sending it back to the Supervisor.
    retry      Put the job back onto the queue as-is, without modifying any of the submission parameters.
    suspend    Like "interrupt", except that it allows the current frame to finish first.
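    The actions table can be sketched as a function that returns a job's updated record. The `Job` record, `apply_action`, and `on_frame_finished` are hypothetical; the effects follow the table, with suspend modeled as a flag that turns into `pending` once the current frame finishes, matching the note that suspended jobs are marked as pending. Which states each action may be applied to is not modeled here.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; not the Qube! API."""
    state: str
    params: Dict[str, Any] = field(default_factory=dict)
    stop_after_current_frame: bool = False


def apply_action(job: Job, action: str, **overrides: Any) -> Job:
    """Return the job record that results from one action in the table."""
    if action == "block":
        return replace(job, state="blocked")
    if action == "interrupt":
        # The current frame is killed now and the job goes back to pending.
        return replace(job, state="pending")
    if action == "suspend":
        # Like interrupt, but the current frame is allowed to finish first.
        return replace(job, stop_after_current_frame=True)
    if action == "kill":
        # The job is not restarted until a user retries or resubmits it.
        return replace(job, state="killed")
    if action == "retry":
        # Back onto the queue as-is: the parameters are left untouched.
        return replace(job, state="pending")
    if action == "resubmit":
        # Parameters may be modified before the job goes back to the Supervisor.
        return Job(state="pending", params={**job.params, **overrides})
    raise ValueError(f"unknown action {action!r}")


def on_frame_finished(job: Job) -> Job:
    """A suspended job is marked pending once its current frame has finished."""
    if job.stop_after_current_frame:
        return replace(job, state="pending", stop_after_current_frame=False)
    return job


running = Job(state="running", params={"frames": "1-10"})
suspended = apply_action(running, "suspend")
print(suspended.state, on_frame_finished(suspended).state)  # running pending
```

    Returning a new record instead of mutating the job keeps each action easy to test in isolation, and makes the difference between retry (same parameters) and resubmit (possibly modified parameters) explicit.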