Slurm#

The slurm backend is the HPC path for large receptor sets on shared filesystems. It uses push dispatch: the coordinator writes immutable chunk files and submits a Slurm array job whose tasks each call stilt push-worker.

How it works#

Running stilt run with backend: slurm:

  1. Registers pending simulations in the output index.

  2. Writes immutable chunk files under <output>/chunks/<batch_id>/.

  3. Renders a submission script under <project>/slurm/.

  4. Submits a Slurm array job — one task per chunk — via sbatch.

Workers run stilt push-worker independently; no inter-task communication is required after submission.

Configuration#

Minimal config:

execution:
  backend: slurm
  n_workers: 200
  partition: mypartition
  account: myaccount
  time: "00:20:00"

Full example with common knobs:

execution:
  backend: slurm
  n_workers: 200
  partition: mypartition
  account: myaccount
  time: "00:20:00"
  mem: 2G
  cpus-per-task: 2
  array_parallelism: 50

Key options#

n_workers

Number of chunk shards to create, and therefore the maximum array-task count. Each worker processes its chunk sequentially; tune this alongside array_parallelism to control cluster load.

cpus-per-task

Passed through both to Slurm (#SBATCH --cpus-per-task) and to stilt push-worker --cpus so that each task uses a matching local process pool for within-chunk parallelism.

array_parallelism

Limits simultaneously active array tasks via the %N Slurm syntax (e.g. --array=0-199%50). Useful for staying within fair-share limits.

Any additional keys in the execution block are forwarded to sbatch as --key=value flags, with underscores converted to dashes.

Submitting from the CLI#

Fire-and-forget (common for production runs):

stilt run /path/to/project

The CLI prints the submitted job ID and returns after sbatch accepts the array.

Block until the array finishes (useful for debugging or scripted workflows):

stilt run /path/to/project --wait

Monitoring and reruns#

Use the Slurm scheduler and the output index together:

squeue -u "$USER"
stilt status /path/to/project

Rerunning the same stilt run is safe — completed simulations are skipped by default. Use --no-skip only when you want to force a full rerun.

Current constraint#

The Slurm backend currently requires both the project root and the output output root to be local or shared filesystem paths. Cloud URIs (s3://, gs://, etc.) are not supported for this backend.