Concatenating ARL Files#

Use arlmet.concat() to join several ARL files into one, and arlmet.concat_by_time() to do it in batch across a directory. The common reason is HYSPLIT’s input limit: a simulation accepts at most 12 meteorological files when a single grid is used (Compilation Limits). Combining many short files (e.g. 6-hourly) into fewer long ones (e.g. daily) keeps a long run under that cap.

ARL files are flat streams of fixed-size records, so joining them is a byte-level append — the same result as cat a.arl b.arl > out.arl — with no repacking. Every record, including DIF* records and checksums, is preserved exactly.

Joining a list of files#

Pass the input paths and an output path. By default the inputs are ordered by their earliest valid time, so the output is chronological regardless of the order you list them in.

import arlmet

arlmet.concat(
    ["20240101_00_hrrr", "20240101_06_hrrr", "20240101_12_hrrr"],
    "20240101_hrrr",
)

concat() returns the new file opened in read mode, so you can chain straight into analysis. Use it as a context manager (or call .close()) when you keep the return value; ignore it if you only need the file on disk.

with arlmet.concat(paths, "20240101_hrrr") as combined:
    print(combined.times)

Pass sort=False to join the files in exactly the order given, like cat.

What is validated#

Before writing, the inputs are scanned and joined only if they form one coherent record stream. concat() raises ValueError when:

the inputs disagree on grid or vertical axis (different grids produce different record lengths, which would corrupt the stream)
the same valid time appears in more than one input (arl-met cannot read a file with duplicate times, and HYSPLIT behaviour on repeats is undefined)
a source file is empty, or the output path is also one of the inputs

Batch concatenation by time#

arlmet.concat_by_time() groups every ARL file in a directory into time-binned chunks and concatenates each group. Each file is assigned to a bin by its first valid time, read from the file’s index record — not parsed from the filename — so it is robust to any naming scheme.

import arlmet

arlmet.concat_by_time(
    "hrrr/",                       # directory to scan
    "daily/",                      # output directory (created if missing)
    freq="1D",                     # one output file per day
    pattern="*_hrrr",              # which files to read
    template="{time:%Y%m%d}_hrrr",  # how to name each output
)

freq is a fixed-frequency pandas offset alias giving the size of each output chunk: "1D" is one file per day, "6h" one per six hours, and so on. Each file is binned by its first valid time floored to this frequency, so freq should be at least as long as any single input file’s time span.

template is a str.format string for the output filenames, given the bin start time as time (a pandas.Timestamp). concat_by_time() returns the list of written paths, in time order.

Limit the range with time_range to skip files whose first valid time falls outside an inclusive (start, end) window:

arlmet.concat_by_time(
    "hrrr/",
    "daily/",
    freq="1D",
    pattern="*_hrrr",
    template="{time:%Y%m%d}_hrrr",
    time_range=("2024-01-01", "2024-01-31 23:00"),
)

Limitations#

Concatenated files must share one grid and one vertical axis.
Valid times must not repeat across the inputs of a single output file.
concat_by_time bins each file by its first valid time, so freq should be no shorter than a single input file’s span (e.g. use freq="1D" for 6-hourly inputs, not freq="1h").
pattern should match only ARL files; a matched file that cannot be read as ARL raises ValueError.