Execution

PYSTILT supports three execution backends: local, slurm, and kubernetes. All three share the same output project model, meaning the same config, the same output layout, and the same CLI commands. The backend controls only how work is dispatched to workers.

Backends

Dispatch models

Push dispatch (local, slurm): The coordinator enumerates pending simulation IDs and sends work directly to workers. For local, work runs inline in the current process or in a local process pool; for slurm, the coordinator writes chunk files for a Slurm array.
Pull dispatch (kubernetes): Workers independently claim pending simulations from a shared output index backend. The coordinator registers work and returns; pods drain the queue autonomously.
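
The difference between the two dispatch models can be illustrated with a minimal sketch. Nothing here is PYSTILT's actual API: chunk_ids, InMemoryIndex, and pull_worker are hypothetical names, and the in-memory queue merely stands in for a shared output index.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


def chunk_ids(sim_ids: Iterable[str], chunk_size: int) -> List[List[str]]:
    """Push dispatch: the coordinator splits pending IDs into fixed chunks."""
    ids = list(sim_ids)
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


class InMemoryIndex:
    """Toy stand-in (an assumption) for a shared output index."""

    def __init__(self, sim_ids: Iterable[str]) -> None:
        self._pending: Deque[str] = deque(sim_ids)

    def claim(self) -> Optional[str]:
        """Hand out one pending ID, or None once the queue is drained."""
        return self._pending.popleft() if self._pending else None


def pull_worker(index: InMemoryIndex) -> List[str]:
    """Pull dispatch: a worker claims IDs itself until none are left."""
    done: List[str] = []
    while (sim_id := index.claim()) is not None:
        done.append(sim_id)
    return done


pending = [f"sim-{n}" for n in range(5)]

# Push: the coordinator decides up front which worker gets which IDs.
chunks = chunk_ids(pending, chunk_size=2)

# Pull: the coordinator only registers work; the worker drains the queue.
claimed = pull_worker(InMemoryIndex(pending))
```

In the push case the assignment is fixed before any worker starts, which is what makes immutable chunks possible. In the pull case the assignment emerges at run time, so workers can be added or removed without re-planning.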

Choosing a backend

local: Default. Best for notebooks, workstation runs, and small receptor sets. Runs inline with n_workers: 1 or uses a local process pool. No infrastructure required.
slurm: Best for large receptor sets on HPC clusters with shared filesystems. Writes immutable chunk files and submits a Slurm array job whose tasks each call stilt push-worker. Project and output roots must be local or shared-filesystem paths.
kubernetes: For cloud-native or container-scale deployments backed by a PostgreSQL index and object-store outputs. Requires more infrastructure than the other two backends.

Note

The Kubernetes backend is not yet fully implemented. See Kubernetes for the current status.

CLI primitives

These commands surface the executor model regardless of backend:

stilt run: Register pending simulations and launch workers using the configured executor. For local, blocks until done. For slurm, submits the array and returns (fire-and-forget); use --wait to block.
stilt register: Publish project inputs and register simulations without launching any workers. Useful for separating the planning step from execution.
stilt push-worker: Execute one immutable chunk of simulation IDs without queue polling or heartbeats. Used by the tasks of a Slurm array job.
stilt pull-worker: Claim and execute pending simulations from the output index. Used by Kubernetes pods and long-lived local workers.
stilt serve: Behaves like pull-worker --follow, polling indefinitely for new claimable work. Use for always-on queue consumers.
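
The contrast between a one-shot pull worker and a following one can be sketched as a single loop with a follow switch. This is an illustration only, not the PYSTILT implementation: pull_loop, make_claim, poll_interval, and max_idle_polls are hypothetical, and it assumes that a worker without --follow exits once nothing is claimable. The max_idle_polls bound exists only so the demonstration terminates.

```python
import time
from collections import deque
from typing import Callable, Iterable, List, Optional


def pull_loop(
    claim: Callable[[], Optional[str]],
    execute: Callable[[str], None],
    follow: bool = False,
    poll_interval: float = 0.0,
    max_idle_polls: Optional[int] = None,
) -> int:
    """Claim and execute work; return the number of simulations executed.

    follow=False: return as soon as no work is claimable (one-shot drain).
    follow=True: keep polling; max_idle_polls bounds the loop for demos only.
    """
    executed = 0
    idle = 0
    while True:
        sim_id = claim()
        if sim_id is not None:
            execute(sim_id)
            executed += 1
            idle = 0
            continue
        if not follow:
            return executed
        idle += 1
        if max_idle_polls is not None and idle >= max_idle_polls:
            return executed
        time.sleep(poll_interval)


def make_claim(initial: Iterable[str], late: Iterable[str]) -> Callable[[], Optional[str]]:
    """Build a toy claim function where late work appears after an empty poll."""
    queue, late_queue = deque(initial), deque(late)

    def claim() -> Optional[str]:
        if queue:
            return queue.popleft()
        if late_queue:
            queue.append(late_queue.popleft())  # registered while the worker idles
        return None

    return claim


one_shot_ran: List[str] = []
follow_ran: List[str] = []

one_shot = pull_loop(make_claim(["sim-0"], ["sim-1"]), one_shot_ran.append)
followed = pull_loop(
    make_claim(["sim-0"], ["sim-1"]),
    follow_ran.append,
    follow=True,
    max_idle_polls=2,
)
```

The one-shot worker exits after the first empty poll and misses the late simulation, while the following worker picks it up on its next poll.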

Simulation state and delivery guarantees

These semantics apply across all backends.

Area	Current behavior
Delivery guarantee	At-least-once processing. A simulation can be retried after interruption or failure.
Trajectory status	`pending → running → complete` or `failed`.
Footprint status	`complete`, `complete-empty`, or `failed` per footprint name.
Empty footprint	Treated as terminal success (`complete-empty`), not failure. No NetCDF file is written or expected for empty footprints.
Reruns	`skip_existing=True` avoids rework for outputs that are already complete. `skip_existing=False` forces a full rerun regardless of prior state.
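
The semantics in the table above can be sketched as a small state machine. This is a simplified model under stated assumptions, not PYSTILT's code: run_once, flaky_simulate, and the in-memory status dictionary are hypothetical, and only the status names and the skip_existing flag come from the table.

```python
from enum import Enum
from typing import Callable, Dict, List


class TrajectoryStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETE = "complete"
    FAILED = "failed"


# Footprint statuses that count as terminal success, per the table above.
FOOTPRINT_SUCCESS = {"complete", "complete-empty"}


def run_once(
    sim_id: str,
    status: Dict[str, TrajectoryStatus],
    simulate: Callable[[str], None],
    skip_existing: bool = True,
) -> bool:
    """Process one simulation; return True if work was actually executed."""
    if skip_existing and status.get(sim_id) is TrajectoryStatus.COMPLETE:
        return False  # already complete: avoid rework
    status[sim_id] = TrajectoryStatus.RUNNING
    try:
        simulate(sim_id)
    except Exception:
        status[sim_id] = TrajectoryStatus.FAILED  # eligible for a later retry
        return True
    status[sim_id] = TrajectoryStatus.COMPLETE
    return True


attempts: List[str] = []


def flaky_simulate(sim_id: str) -> None:
    """Hypothetical workload that fails on its first attempt only."""
    attempts.append(sim_id)
    if len(attempts) == 1:
        raise RuntimeError("interrupted")


status = {"sim-0": TrajectoryStatus.PENDING}
run_once("sim-0", status, flaky_simulate)  # first attempt fails
status_after_failure = status["sim-0"]
run_once("sim-0", status, flaky_simulate)  # retry succeeds
skipped = not run_once("sim-0", status, flaky_simulate)  # complete, so skipped
forced = run_once("sim-0", status, flaky_simulate, skip_existing=False)
```

Because a simulation may be processed more than once, the work itself should be safe to repeat. In this sketch that holds because each run simply overwrites the recorded status.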