Since many variables can affect when and whether a particular job is dispatched to a worker, it helps to describe some of the decisions involved in this process in a bit of detail.
Host selection occurs at 3 different times:
- When a job slot becomes available on a worker
- When a job is submitted, retried, unblocked, or shoved (Supervisor is explicitly instructed to re-evaluate the job)
- When a global resource becomes available
When a slot on a worker becomes available
- Pending state: jobs with pending instances are selected
- Job requirements: these pending instances are filtered by requirements; can the Worker meet them?
- Job reservations: can the Worker honor the job's resource reservations?
- Job cluster: jobs whose cluster matches the worker's are moved to the top of the list
- Job priority: priority is used as the tie-breaker when cluster is the same
- Job ID: since the jobID is based on submission time, the tie-breaker when cluster and priority are the same is essentially "first come, first served".
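The selection order above can be sketched as a filter followed by a sort. This is only an illustration, not the Supervisor's actual code: the `Job` and `Worker` classes, their field names, the exact-match cluster test, and the "lower number means higher priority" convention are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int            # assumed to increase with submission time
    cluster: str
    priority: int          # assumption: lower number = higher priority
    pending_instances: int
    requirements: set = field(default_factory=set)     # properties the worker must have
    reservations: dict = field(default_factory=dict)   # resource name -> amount

@dataclass
class Worker:
    cluster: str
    properties: set
    free_resources: dict   # resource name -> amount currently free

def pick_job_for_slot(worker, jobs):
    """Return the job that should receive the worker's free slot, or None."""
    candidates = [
        job for job in jobs
        if job.pending_instances > 0                        # pending state
        and job.requirements <= worker.properties           # job requirements
        and all(worker.free_resources.get(name, 0) >= amount
                for name, amount in job.reservations.items())  # job reservations
    ]
    # False sorts before True, so jobs in the worker's cluster come first;
    # priority and then job ID break the remaining ties.
    candidates.sort(key=lambda job: (job.cluster != worker.cluster,
                                     job.priority, job.job_id))
    return candidates[0] if candidates else None
```

Because the sort key is a tuple, job ID only matters when cluster match and priority are equal, which gives the "first come, first served" behavior described above.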
When a job is submitted, retried, unblocked, or shoved
The job's cluster and priority are compared to all running job instances to see if there are any running jobs that this new job can preempt. Some jobs have a flag set that indicates that they can never be preempted; these are filtered out of the list.
If there are preemptable jobs, the workers that these jobs are running on are checked to see if they can satisfy the new job's requirements and reservations.
If the requirements and reservations can be met by a particular Worker, then the job instance that is running on that Worker is marked for preemption, and depending on the supervisor's preemption policy (passive or aggressive), the job instance is preempted as follows:
- passive preemption: when the job instance finishes the frame it's currently working on
- aggressive preemption: the job instance is killed immediately, and both the instance and the frame get put back into the queue in a pending state
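A rough sketch of the marking step and the two policies follows. The exact cluster/priority comparison is not spelled out here, so it is passed in as a hypothetical `outranks` callback; `worker_fits`, the `Instance` fields, the state names, and the `needed` limit are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Policy(Enum):
    PASSIVE = "passive"
    AGGRESSIVE = "aggressive"

@dataclass
class Instance:
    job_id: int
    cluster: str
    priority: int
    worker: str
    preemptable: bool = True      # False models the "never preempt" flag
    state: str = "running"

def mark_for_preemption(new_job, running, outranks, worker_fits, policy, needed=1):
    """Mark up to `needed` running instances that `new_job` may displace."""
    marked = []
    for inst in running:
        if len(marked) == needed:
            break
        if not inst.preemptable:                   # flagged as never preemptable
            continue
        if not outranks(new_job, inst):            # cluster/priority comparison
            continue
        if not worker_fits(inst.worker, new_job):  # requirements and reservations
            continue
        if policy is Policy.AGGRESSIVE:
            inst.state = "pending"                 # killed now; instance and frame requeued
        else:
            inst.state = "finishing_frame"         # leaves once the current frame is done
        marked.append(inst)
    return marked
```

Keeping the comparison rule as a callback avoids hard-coding a policy that the text does not define; a site could plug in whatever rule its queuing algorithm uses.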
There are a couple of major things that happen before preemption:
- The list of hosts is filtered down, just as the list of jobs is filtered above, using criteria such as the job's requirements and the queuing algorithm's host-job pair match/reject routines.
- Then the Supervisor tries to find open/idle slots for the job in consideration, from the filtered-down list of hosts, and dispatches instances if slots are found.
- If there are still pending instances remaining for the job, preemption occurs.
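The ordering of these steps (filter hosts, fill idle slots, preempt only for the remainder) might look like the sketch below. The function name, the dictionary keys, and the `host_matches` and `preempt` callbacks are hypothetical stand-ins, not part of any real API.

```python
def dispatch_then_preempt(pending, hosts, host_matches, preempt):
    """Fill idle slots on matching hosts first; preempt only for what is left.

    hosts: list of dicts with "name" and "idle_slots" keys.
    host_matches: stand-in for the requirement and match/reject checks.
    preempt: callback invoked with the number of still-pending instances.
    """
    eligible = [h for h in hosts if host_matches(h)]   # step 1: filter hosts
    dispatched = []
    for host in eligible:                              # step 2: use idle slots
        take = min(pending, host["idle_slots"])
        dispatched.extend([host["name"]] * take)
        host["idle_slots"] -= take
        pending -= take
        if pending == 0:
            break
    if pending > 0:                                    # step 3: preemption
        preempt(pending)
    return dispatched, pending
```

For example, a job with 3 pending instances and only 2 idle slots on eligible hosts would dispatch 2 instances and call `preempt(1)` for the last one.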
When a global resource becomes available
When a global resource becomes available, an instance from a job pending for that specific resource must be considered for dispatch. In a nutshell, here’s the sequence of events:
- Get a list of all “ready” jobs (“pending”, “running” with pending instances, or active jobs with the “expand” flag set)
- Filter down the list to only jobs that have reserved the specific global resource
- Loop through these jobs to find a suitable job to start
- Start the job
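As a minimal sketch of this sequence, assuming jobs are represented as plain dictionaries: the keys (`status`, `pending_instances`, `active`, `expand`, `reservations`) and the `is_suitable` callback are hypothetical names chosen for the example, and what makes a job "suitable" is left to the caller.

```python
def is_ready(job):
    """A job is "ready" if it is pending, running with pending instances,
    or active with the expand flag set."""
    if job["status"] == "pending":
        return True
    if job["status"] == "running" and job["pending_instances"] > 0:
        return True
    return bool(job.get("active", False) and job.get("expand", False))

def on_global_resource_available(resource, jobs, is_suitable):
    """Return the first ready job reserving `resource` that passes is_suitable."""
    ready = [job for job in jobs if is_ready(job)]                    # all "ready" jobs
    reserving = [job for job in ready if resource in job["reservations"]]
    for job in reserving:                                             # find a job to start
        if is_suitable(job):
            return job        # the caller would start an instance of this job
    return None
```

Returning `None` when nothing qualifies leaves the resource free until the next event triggers another evaluation.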