    Because so many variables affect when and whether a particular job is dispatched to a worker, it helps to describe some of the decisions involved in this process in a bit of detail.

    Selection occurs at three different times:

    1. When a job slot becomes available
    2. When a job is submitted, retried, unblocked, or shoved (the Supervisor is explicitly instructed to re-evaluate the job)
    3. When a global resource becomes available

    When a slot on a worker becomes available:

    Selection criteria:
      • Pending state: jobs with pending instances are selected
      • Job requirements: these pending instances are filtered by requirements; can the Worker meet them?
      • Job reservations: can the Worker provide the resources specified in the job's reservations?

    ...

    The default sorting criteria are:
      • Job cluster: jobs whose cluster matches the worker's are moved to the top of the list
      • Job priority: priority is used as the tie-breaker when cluster is the same
      • Job ID: since the job ID is based on submission time, the tie-breaker when cluster and priority are the same is essentially "first come, first served".
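
    The selection and sorting criteria above can be sketched as a filter followed by a multi-key sort. This is only an illustration, not the Supervisor's actual implementation: the `Job` and `Worker` classes, their field names, the exact-string cluster match, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int                  # assumed to increase with submission time
    cluster: str
    priority: int                # assumption: larger number = higher priority
    pending_instances: int = 0
    requirements: set = field(default_factory=set)
    reservations: dict = field(default_factory=dict)


@dataclass
class Worker:
    cluster: str
    properties: set = field(default_factory=set)
    free_resources: dict = field(default_factory=dict)


def is_eligible(job, worker):
    """Apply the three selection criteria to one job/Worker pair."""
    has_pending = job.pending_instances > 0
    meets_requirements = job.requirements <= worker.properties
    honours_reservations = all(
        worker.free_resources.get(name, 0) >= amount
        for name, amount in job.reservations.items()
    )
    return has_pending and meets_requirements and honours_reservations


def candidates_for_slot(jobs, worker):
    """Return eligible jobs in the default dispatch order for a freed slot."""
    eligible = [job for job in jobs if is_eligible(job, worker)]
    return sorted(
        eligible,
        # False sorts before True, so matching clusters come first;
        # then higher priority; then lower (earlier) job ID.
        key=lambda job: (job.cluster != worker.cluster, -job.priority, job.job_id),
    )
```

    Encoding the three sorting criteria as one tuple key keeps the tie-breaking order explicit: cluster match first, then priority, then job ID.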

    When a job is submitted, retried, unblocked, or shoved:

    The job's cluster and priority are compared to all running job instances to see if there are any running jobs that this new job can pre-empt. Some jobs have a flag set that indicates that they can never be pre-empted; these are filtered out of the list.

    If there are pre-emptable jobs, the workers that these jobs are running on are checked to see if they can satisfy the job's requirements and reservations.

    If the requirements and reservations can be met by a particular Worker, then the job instance that is running on that Worker is marked for pre-emption, and depending on the Supervisor's pre-emption policy (passive or aggressive), the job instance is pre-empted as follows:

    • passive pre-emption: when the job instance finishes the frame it's currently working on
    • aggressive pre-emption: the job instance is killed immediately, and both the instance and the frame get put back into the queue in a pending state
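
    As a rough sketch of the steps above, the following models how running instances might be narrowed down to pre-emption candidates and how the two policies differ in timing. All names here (`RunningInstance`, `Policy`, the state strings, the `worker_can_host` predicate) are hypothetical. The rule that only a strictly higher priority may pre-empt is an assumption, and the cluster comparison is left out because the text does not say how it is weighed.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    PASSIVE = "passive"
    AGGRESSIVE = "aggressive"


@dataclass
class RunningInstance:
    job_id: int
    worker: str
    priority: int
    preemptable: bool = True        # False models the "never pre-empt" flag
    state: str = "running"
    marked_for_preemption: bool = False
    frame_requeued: bool = False


def preemption_candidates(new_priority, running, worker_can_host):
    """Instances the new job could displace.

    worker_can_host(worker_name) stands in for the requirements and
    reservations check against the Worker the instance is running on.
    """
    return [
        inst for inst in running
        if inst.preemptable
        and inst.priority < new_priority   # assumption: strictly higher wins
        and worker_can_host(inst.worker)
    ]


def preempt(instance, policy):
    """Mark an instance and apply the immediate effect of the policy."""
    instance.marked_for_preemption = True
    if policy is Policy.AGGRESSIVE:
        # Killed right away; instance and frame both go back to pending.
        instance.state = "pending"
        instance.frame_requeued = True
    # PASSIVE: nothing else happens until the current frame finishes.


def on_frame_finished(instance):
    """Called when an instance completes its current frame."""
    if instance.marked_for_preemption and instance.state == "running":
        instance.state = "preempted"   # hypothetical label for the released slot
```

    The practical difference is that passive pre-emption never discards work in progress, while aggressive pre-emption frees the slot sooner at the cost of redoing the interrupted frame.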

    A couple of major things happen before pre-emption:

    1. The list of hosts is filtered down, just as the list of jobs is filtered above, using criteria such as the job's requirements and the queuing algorithm's host-job pair match/reject routines.
    2. Then the Supervisor tries to find open/idle slots for the job under consideration in the filtered-down list of hosts, and dispatches instances if slots are found.
    3. If there are still pending instances remaining for the job, pre-emption occurs.
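
    The ordering of these three steps can be sketched as follows. The function and parameter names are hypothetical: `host_accepts_job` stands in for the requirements check and the queuing algorithm's match/reject routines, and `start_preemption` stands in for whatever the Supervisor does once idle slots run out.

```python
def place_job(pending_instances, hosts, host_accepts_job, start_preemption):
    """Dispatch to idle slots first; pre-empt only for what is left over.

    hosts is a list of dicts with 'name' and 'idle_slots' keys (an
    illustrative layout, not a real data structure from the product).
    Returns the host names that received an instance and the number of
    instances still pending.
    """
    # Step 1: filter the host list down to hosts that can take this job.
    filtered = [host for host in hosts if host_accepts_job(host)]

    # Step 2: fill open/idle slots on the filtered hosts.
    dispatched = []
    for host in filtered:
        while pending_instances > 0 and host["idle_slots"] > 0:
            host["idle_slots"] -= 1
            pending_instances -= 1
            dispatched.append(host["name"])

    # Step 3: only now, if instances are still pending, consider pre-emption.
    if pending_instances > 0:
        start_preemption(pending_instances)

    return dispatched, pending_instances
```

    Because pre-emption is attempted only after idle slots are exhausted, a job that fits entirely into free capacity never displaces running work.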