Job commands
What Are Job Commands?
Job commands implement a per-job control queue between the WMS and the pilot (JobAgent).
They allow the WMS to send control instructions to running jobs via the heartbeat mechanism.
Currently used for:
- Remote job termination (
Killcommand)
The mechanism is generic and could support additional commands in the future (e.g. debug actions, core dump requests, etc.).
Behaviour Summary
Command Creation
When a job transitions to a terminal state (KILLED or DELETED),
set_job_statuses() enqueues a Kill command in the JobDB.
Command Delivery
Commands are delivered during:
Flow:
- Heartbeat updates job state.
get_job_commands()retrieves pending commands.- Commands are marked as sent.
- Commands are returned to the pilot.
This guarantees one-shot delivery semantics:
- A command is delivered exactly once.
- It is not re-delivered on subsequent heartbeats.
Sequence Diagram (Kill Command Lifecycle)
sequenceDiagram
autonumber
participant User
participant Router
participant JobDB
participant WatchDog
User->>Router: PATCH /api/jobs/status (Killed)
Router->>JobDB: update status → KILLED
Router->>JobDB: enqueue "Kill"
Router-->>User: 200 OK
WatchDog->>Router: PATCH /api/jobs/heartbeat
Router->>JobDB: fetch + mark sent
Router-->>WatchDog: [Kill]
WatchDog->>Router: PATCH /api/jobs/heartbeat
Router-->>WatchDog: []
Activity View
flowchart TD
A[Status change → KILLED] --> B[Enqueue Kill command]
B --> C[Stored in JobDB]
D[Heartbeat] --> E[Fetch commands]
E --> F[Mark as sent]
F --> G[Return to pilot]
G --> D