Job and Host Assignment

Many variables affect when, and whether, a particular job is dispatched to a worker, so it helps to describe the decisions involved in this process in some detail.

Host selection occurs at three different times:

...

When a job slot becomes available on a worker
When a job is submitted, retried, unblocked, or shoved (Supervisor is explicitly instructed to re-evaluate the job)
When a global resource becomes available

When a slot on a worker becomes available

...

Selection criteria:

- Pending state: jobs with pending instances are selected
- Job requirements: these pending instances are filtered by requirements; can the Worker meet them?
- Job reservations: can the Worker honor the job's resource reservations?

...

- Job cluster: jobs whose cluster matches the worker's are moved to the top of the list
- Job priority: priority is used as the tie-breaker when cluster is the same
- Job ID: since the jobID is based on submission time, the tie-breaker when cluster and priority are the same is essentially "first come, first served".
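The filter and tie-breaker criteria above can be sketched in a few lines of Python. This is an illustration only, not the Supervisor's actual code: the `Job` and `Worker` classes and all of their fields are hypothetical, cluster matching is reduced to an exact string comparison, and a lower priority number is assumed to mean higher priority (flip the sort key if your site uses the opposite convention).

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int                 # assumed to increase with submission time
    cluster: str
    priority: int               # assumption: lower number = higher priority
    pending_instances: int
    requirements: set = field(default_factory=set)    # hypothetical: host properties needed
    reservations: dict = field(default_factory=dict)  # hypothetical: resource name -> amount


@dataclass
class Worker:
    name: str
    cluster: str
    properties: set
    free_resources: dict


def candidates_for_slot(worker, jobs):
    """Return the jobs eligible for a free slot on `worker`, best candidate first."""
    eligible = [
        job for job in jobs
        if job.pending_instances > 0                      # pending state
        and job.requirements <= worker.properties         # requirements met
        and all(worker.free_resources.get(name, 0) >= amount
                for name, amount in job.reservations.items())  # reservations honored
    ]
    # Sort key: cluster match first (False sorts before True),
    # then priority, then job ID ("first come, first served").
    return sorted(
        eligible,
        key=lambda job: (job.cluster != worker.cluster, job.priority, job.job_id),
    )


worker = Worker("render01", "/film", {"linux", "maya"}, {"cpus": 2})
jobs = [
    Job(101, "/tv", 1, 3, {"linux"}, {"cpus": 1}),
    Job(102, "/film", 5, 2, {"linux"}, {"cpus": 1}),
    Job(103, "/film", 5, 0, {"linux"}, {}),        # no pending instances
    Job(104, "/film", 2, 1, {"windows"}, {}),      # requirement not met
    Job(105, "/film", 5, 1, {"maya"}, {"cpus": 4}),  # reservation too large
    Job(100, "/film", 5, 1, set(), {}),
]
print([job.job_id for job in candidates_for_slot(worker, jobs)])  # [100, 102, 101]
```

Note that job 101 has the best priority but sorts last, because a cluster match outranks priority in this ordering.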

When a job is submitted, retried, unblocked, or shoved

...

The job's cluster and priority are compared against all running job instances to see whether there are any running jobs that this new job can preempt. Jobs flagged as never preemptable are filtered out of the list.
If there are preemptable jobs, the workers they are running on are checked to see whether they can satisfy the new job's requirements and reservations.
If a particular Worker can meet the requirements and reservations, the job instance running on that Worker is marked for preemption. Depending on the Supervisor's preemption policy (passive or aggressive), the job instance is preempted as follows:
- passive preemption: when the job instance finishes the frame it is currently working on
- aggressive preemption: the job instance is killed immediately, and both the instance and the frame are put back into the queue in a pending state
Two major steps happen before preemption is attempted:
- The list of hosts is filtered down, just as the list of jobs is filtered above, using criteria such as the job's requirements and the queuing algorithm's host-job pair match/reject routines.
- The Supervisor then tries to find open/idle slots for the job under consideration in the filtered list of hosts, and dispatches instances if slots are found.
If the job still has pending instances after that, preemption occurs.
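The order of operations described above (filter hosts, fill idle slots, then preempt) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Instance` and `Host` classes, the state names, and the `host_ok` predicate (a stand-in for the requirements, reservations, and match/reject checks) are hypothetical. The cluster-and-priority comparison is reduced to priority alone, with a lower number assumed to mean higher priority.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    job_id: int
    priority: int              # assumption: lower number = higher priority
    preemptable: bool = True   # False models the "never preempt" flag
    state: str = "running"


@dataclass
class Host:
    name: str
    free_slots: int
    running: list = field(default_factory=list)


def place_job(job_id, priority, unplaced, hosts, host_ok, policy="passive"):
    """Dispatch to idle slots first, then mark running instances for preemption."""
    usable = [host for host in hosts if host_ok(host)]   # step 1: filter hosts
    dispatched, marked = [], []

    for host in usable:                                   # step 2: fill idle slots
        while unplaced > 0 and host.free_slots > 0:
            host.free_slots -= 1
            host.running.append(Instance(job_id, priority))
            dispatched.append(host.name)
            unplaced -= 1

    for host in usable:                                   # step 3: preempt if needed
        for inst in host.running:
            if unplaced == 0:
                break
            if inst.job_id == job_id or not inst.preemptable or inst.state != "running":
                continue
            if inst.priority <= priority:
                continue          # only lower-priority instances are preempted
            # passive: leave after the current frame; aggressive: back to pending now
            inst.state = "preempt_after_frame" if policy == "passive" else "pending"
            marked.append((host.name, inst.job_id, inst.state))
            unplaced -= 1

    return dispatched, marked, unplaced


hosts = [
    Host("a", 1, [Instance(1, 9)]),
    Host("b", 0, [Instance(2, 9, preemptable=False), Instance(3, 7)]),
    Host("c", 2),   # rejected by host_ok below
]
print(place_job(50, 5, 3, hosts, host_ok=lambda h: h.name != "c"))
# (['a'], [('a', 1, 'preempt_after_frame'), ('b', 3, 'preempt_after_frame')], 0)
```

In the example, one instance lands in the idle slot on host `a`, and the remaining two are satisfied by marking lower-priority instances on `a` and `b`; the non-preemptable instance on `b` is skipped.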

When a global resource becomes available

When a global resource becomes available, an instance from a job pending for that specific resource must be considered for dispatch. In a nutshell, here’s the sequence of events:

Get a list of all “ready” jobs (“pending”, “running” with pending instances, or active jobs with the “expand” flag set)
Filter down the list to only jobs that have reserved the specific global resource
Loop through these jobs to find a suitable job to start
Start the job
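The four steps above can be sketched as follows. This is a hedged illustration, not the Supervisor's implementation: the `Job` class, its fields, and the resource name `global.license` are hypothetical, "active" is assumed to mean the job is running, and because the text does not define "suitable", that test is supplied by the caller as the `is_suitable` predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    status: str                  # e.g. "pending", "running", "complete"
    pending_instances: int = 0
    expand: bool = False         # the "expand" flag
    reservations: set = field(default_factory=set)  # names of reserved resources


def is_ready(job):
    """A job is "ready" if it is pending, running with pending instances,
    or active (assumed here: running) with the expand flag set."""
    if job.status == "pending":
        return True
    if job.status == "running" and job.pending_instances > 0:
        return True
    return job.status == "running" and job.expand


def pick_job_for_resource(resource, jobs, is_suitable):
    """Return the first ready job that reserved `resource` and is suitable, else None."""
    ready = [job for job in jobs if is_ready(job)]                 # step 1
    waiting = [job for job in ready if resource in job.reservations]  # step 2
    for job in waiting:                                            # step 3
        if is_suitable(job):
            return job                                             # step 4: caller starts it
    return None


jobs = [
    Job(1, "complete", 0, False, {"global.license"}),
    Job(2, "running", 0, False, {"global.license"}),   # running, nothing pending
    Job(3, "running", 2, False, {"global.other"}),     # waiting on another resource
    Job(4, "running", 0, True, {"global.license"}),    # expand flag set
    Job(5, "pending", 1, False, {"global.license"}),
]
chosen = pick_job_for_resource("global.license", jobs, is_suitable=lambda job: True)
print(chosen.job_id)  # 4
```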
