Concatenating ARL Files
=======================

Use :func:`arlmet.concat` to join several ARL files into one, and
:func:`arlmet.concat_by_time` to do it in batch across a directory.

The common reason is HYSPLIT's input limit: a simulation accepts at most 12
meteorological files when a single grid is used (see HYSPLIT's "Compilation
Limits"). Combining many short files (e.g. 6-hourly) into fewer long ones
(e.g. daily) keeps a long run under that cap.

ARL files are flat streams of fixed-size records, so joining them is a
byte-level append (the same result as ``cat a.arl b.arl > out.arl``) with no
repacking. Every record, including ``DIF*`` records and checksums, is
preserved exactly.

Joining a list of files
-----------------------

Pass the input paths and an output path. By default the inputs are ordered by
their earliest valid time, so the output is chronological regardless of the
order you list them in.

.. code-block:: python

   import arlmet

   arlmet.concat(
       ["20240101_00_hrrr", "20240101_06_hrrr", "20240101_12_hrrr"],
       "20240101_hrrr",
   )

``concat()`` returns the new file opened in read mode, so you can chain
straight into analysis. Use it as a context manager (or call ``.close()``)
when you keep the return value; ignore it if you only need the file on disk.

.. code-block:: python

   with arlmet.concat(paths, "20240101_hrrr") as combined:
       print(combined.times)

Pass ``sort=False`` to join the files in exactly the order given, like ``cat``.

What is validated
-----------------

Before writing, the inputs are scanned and joined only if they form one
coherent record stream.
``concat()`` raises ``ValueError`` when:

- the inputs disagree on grid or vertical axis (different grids produce
  different record lengths, which would corrupt the stream)
- the same valid time appears in more than one input (arl-met cannot read a
  file with duplicate times, and HYSPLIT behaviour on repeats is undefined)
- a source file is empty, or the output path is also one of the inputs

Batch concatenation by time
---------------------------

:func:`arlmet.concat_by_time` groups every ARL file in a directory into
time-binned chunks and concatenates each group. Each file is assigned to a bin
by its **first valid time, read from the file's index record**, not parsed
from the filename, so it is robust to any naming scheme.

.. code-block:: python

   import arlmet

   arlmet.concat_by_time(
       "hrrr/",                        # directory to scan
       "daily/",                       # output directory (created if missing)
       freq="1D",                      # one output file per day
       pattern="*_hrrr",               # which files to read
       template="{time:%Y%m%d}_hrrr",  # how to name each output
   )

``freq`` is a fixed-frequency pandas offset alias giving the size of each
output chunk: ``"1D"`` is one file per day, ``"6h"`` one per six hours, and so
on. Each file is binned by its first valid time floored to this frequency, so
``freq`` should be at least as long as any single input file's time span.

``template`` is a ``str.format`` string for the output filenames, given the
bin start time as ``time`` (a :class:`pandas.Timestamp`).

``concat_by_time()`` returns the list of written paths, in time order.

Limit the range with ``time_range`` to skip files whose first valid time falls
outside an inclusive ``(start, end)`` window:

.. code-block:: python

   arlmet.concat_by_time(
       "hrrr/",
       "daily/",
       freq="1D",
       pattern="*_hrrr",
       template="{time:%Y%m%d}_hrrr",
       time_range=("2024-01-01", "2024-01-31 23:00"),
   )

Limitations
-----------

- Concatenated files must share one grid and one vertical axis.
- Valid times must not repeat across the inputs of a single output file.
- ``concat_by_time`` bins each file by its first valid time, so ``freq``
  should be no shorter than a single input file's span (e.g. use
  ``freq="1D"`` for 6-hourly inputs, not ``freq="1h"``).
- ``pattern`` should match only ARL files; a matched file that cannot be read
  as ARL raises ``ValueError``.
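
The binning rule behind the ``freq`` limitation can be illustrated with a
minimal sketch that uses only the standard library. It is not arlmet's
implementation: ``floor_time`` and ``bin_first_times`` are hypothetical
helpers, the first valid times are made-up sample values, and flooring is
assumed to count whole multiples of ``freq`` from the Unix epoch, which is
how fixed-frequency flooring is generally understood to behave.

```python
from datetime import datetime, timedelta

# Hypothetical helpers for illustration only; not part of arlmet.
EPOCH = datetime(1970, 1, 1)


def floor_time(t: datetime, freq: timedelta) -> datetime:
    """Floor t to a whole multiple of freq counted from the Unix epoch."""
    return EPOCH + (t - EPOCH) // freq * freq


def bin_first_times(first_times, freq, template="{time:%Y%m%d_%H}_hrrr"):
    """Group first valid times by floored bin start.

    Returns a dict mapping each output name to the times that fall in it.
    """
    bins = {}
    for t in sorted(first_times):
        name = template.format(time=floor_time(t, freq))
        bins.setdefault(name, []).append(t)
    return bins


# Four 6-hourly files from one day (sample first valid times).
first_times = [datetime(2024, 1, 1, h) for h in (0, 6, 12, 18)]

daily = bin_first_times(first_times, timedelta(days=1))
hourly = bin_first_times(first_times, timedelta(hours=1))

print(len(daily))   # 1: all four files share the 2024-01-01 bin
print(len(hourly))  # 4: every file lands in its own bin
```

With a one-day bin the four inputs collapse into a single group, which is
the intended outcome. With a one-hour bin each 6-hourly input forms its own
group, so nothing is joined and each group covers more time than its bin
width suggests.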