The ``Hyrax`` Configuration System
==================================

``hyrax`` makes extensive use of config variables to manage the runtime environment of training and inference runs. A ``hyrax_default_config.toml`` file (full contents listed :ref:`here<complete_default_config>`) ships with ``hyrax`` and contains every variable that ``hyrax`` could need to operate. To create a custom configuration, write a ``.toml`` file that sets only the variables you want to change; if you’re running with a custom dataset or model, you can also add your own variables.
For a practical walkthrough that starts from a minimal override file, see :doc:`configuration`.

Config variables are inherited from a hierarchy of sources, similar to class inheritance in Python. First, ``hyrax`` reads the variables set in the default configuration. Next, it loads the default config of any custom ``hyrax`` packages in use (see :doc:`/external_library_package` for how to set up package-level defaults). It determines which packages to include by checking which custom classes are loaded initially and looking for their default configs. If a package doesn’t have a default config, ``hyrax`` emits a warning. Finally, it applies whatever variables have been declared in the user-defined config TOML (see the :doc:`config basics notebook </notebooks/config_basics>` for how to load those through a notebook or script). Variables set at each step can overwrite those from previous steps, which leads to the following priority, from highest to lowest:

- Variables from the user-defined config TOML are used first.
- Default configs from custom ``hyrax`` packages are used for variables that the user has not defined.
- The base default config is used for variables that the user has not defined and that don't exist in any package default.

.. figure:: _static/hyrax_config_system.png
   :width: 100%
   :alt: The inheritance hierarchy of the hyrax configuration system.
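
The priority order above can be sketched as a layered merge of nested tables. This is a minimal illustration, not the loader that ``hyrax`` actually uses: the ``merge_layers`` helper and the table and variable names (``train``, ``epochs``, ``my_package``) are hypothetical, chosen only to show how later layers override earlier ones.

.. code-block:: python

   from copy import deepcopy


   def merge_layers(*layers: dict) -> dict:
       """Merge config layers in order; later layers take priority."""
       merged: dict = {}
       for layer in layers:
           _merge_into(merged, layer)
       return merged


   def _merge_into(base: dict, override: dict) -> None:
       """Recursively copy values from ``override`` into ``base`` in place."""
       for key, value in override.items():
           if isinstance(value, dict) and isinstance(base.get(key), dict):
               _merge_into(base[key], value)  # merge nested tables key by key
           else:
               base[key] = deepcopy(value)  # scalars and new tables replace outright


   # Hypothetical layers, shown as the dicts a TOML parser would produce.
   base_defaults = {"train": {"epochs": 10, "batch_size": 32}}
   package_defaults = {"my_package": {"latent_dim": 64}, "train": {"batch_size": 128}}
   user_config = {"train": {"epochs": 50}}

   # Lowest priority first, highest priority last.
   config = merge_layers(base_defaults, package_defaults, user_config)
   print(config)

In the merged result, ``epochs`` comes from the user file, ``batch_size`` from the package default, and ``latent_dim`` from the package default that introduced it.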

``hyrax`` passes all of the configuration variables to the relevant model and dataset classes, which lets them configure the runtime through one system. This allows for extensibility and cross-compatibility within the broader “hyrax ecosystem”. From the point of view of the code, these configuration variables should be static. This makes it easier for researchers to develop code separately from the runtime environment.

A core design principle of ``hyrax`` is "code by config", meaning that all runtime parameters should be set through configuration files rather than hard-coded values. This approach enhances flexibility, reproducibility, and ease of experimentation, as users can modify configurations without altering the underlying codebase. This also facilitates sharing and collaboration, as configurations can be easily shared and adapted for different use cases while keeping fundamental models and datasets consistent.
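
As a minimal sketch of this principle, the class below takes its hyperparameters from a config table instead of hard-coding them. The ``ExampleModel`` class and the ``example_model`` table with its ``hidden_size`` and ``learning_rate`` keys are hypothetical stand-ins for whatever your own model defines; they are not part of ``hyrax``.

.. code-block:: python

   class ExampleModel:
       """Hypothetical model that is configured entirely from a config dict."""

       def __init__(self, config: dict):
           # Read every tunable value from the config instead of hard-coding it.
           model_cfg = config["example_model"]
           self.hidden_size = model_cfg["hidden_size"]
           self.learning_rate = model_cfg["learning_rate"]


   # Two experiments differ only in their configuration, not in the code.
   baseline = ExampleModel({"example_model": {"hidden_size": 128, "learning_rate": 1e-3}})
   wider = ExampleModel({"example_model": {"hidden_size": 256, "learning_rate": 1e-3}})

Because the class never embeds a value of its own, sharing the config file is enough to reproduce either experiment.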

Typed configuration schemas
---------------------------

Hyrax uses Pydantic internally to validate the ``[data_request]`` configuration table,
which describes datasets for training, validation, and inference (see the
:doc:`data requests notebook </notebooks/data_requests>` for a hands-on walkthrough).
This validation helps catch configuration errors early by ensuring required fields
like ``primary_id_field`` are present and properly structured. The exact field-level
expectations for datasets are documented in :doc:`dataset_class_reference`.

The validation happens automatically when you load a TOML configuration or use
``set_config()``. If there are validation errors, Hyrax will log a warning but continue
to use the configuration as-is for backward compatibility.
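
The warn-and-continue behaviour described above can be sketched as follows. To stay runnable with only the standard library, this sketch swaps Pydantic for a hand-written check, so it is not the validation code that Hyrax runs. The ``validate_data_request`` and ``load_with_warnings`` helpers and the layout of the example tables (``train``, ``infer``, ``dataset_class``) are assumptions made for illustration; only ``primary_id_field`` is taken from the text above.

.. code-block:: python

   import logging

   logger = logging.getLogger("config_sketch")

   REQUIRED_FIELDS = ("primary_id_field",)  # the only required field named on this page


   def validate_data_request(config: dict) -> list[str]:
       """Return a list of problems found in the ``data_request`` table."""
       problems = []
       for name, entry in config.get("data_request", {}).items():
           if not isinstance(entry, dict):
               problems.append(f"data_request.{name} is not a table")
               continue
           for field in REQUIRED_FIELDS:
               if field not in entry:
                   problems.append(f"data_request.{name} is missing {field!r}")
       return problems


   def load_with_warnings(config: dict) -> dict:
       """Log each validation problem, then return the config unchanged."""
       for problem in validate_data_request(config):
           logger.warning("Config validation: %s", problem)
       return config  # kept as-is, mirroring the backward-compatible behaviour


   # Hypothetical config: the second entry omits the required field.
   config = {
       "data_request": {
           "train": {"dataset_class": "MyDataset", "primary_id_field": "object_id"},
           "infer": {"dataset_class": "MyDataset"},
       }
   }
   problems = validate_data_request(config)
   loaded = load_with_warnings(config)

The key design point is that validation only reports: the caller receives the same configuration object whether or not problems were found.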

Backward compatibility for the legacy ``[model_inputs]`` table name is maintained at
the configuration loading layer.
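
One way such a compatibility shim could look is a rename applied right after the TOML is parsed. This is an illustrative sketch only: the ``normalize_legacy_tables`` helper is hypothetical, and the choice to prefer ``[data_request]`` when both tables are present is an assumption, not documented Hyrax behaviour.

.. code-block:: python

   def normalize_legacy_tables(config: dict) -> dict:
       """Return a copy of ``config`` with ``model_inputs`` renamed to ``data_request``."""
       normalized = dict(config)  # shallow copy so the caller's dict is untouched
       if "model_inputs" in normalized:
           legacy = normalized.pop("model_inputs")
           # Assumption: an explicit [data_request] table wins over the legacy one.
           normalized.setdefault("data_request", legacy)
       return normalized


   old_style = {"model_inputs": {"train": {"primary_id_field": "object_id"}}}
   new_style = normalize_legacy_tables(old_style)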

After training completes, ``hyrax`` writes all of the variables used at runtime (merged from all of the source configs) to a ``runtime_config.toml`` file in the runtime directory, so that the user can see in one place which values were actually used.
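
To inspect the saved file programmatically, it can be read back with Python's standard ``tomllib`` module (Python 3.11 or later). In this sketch the ``load_runtime_config`` helper and the ``results/my_run`` directory are placeholders; substitute the runtime directory of your own run.

.. code-block:: python

   import tomllib
   from pathlib import Path


   def load_runtime_config(run_dir: Path) -> dict:
       """Parse the ``runtime_config.toml`` found in ``run_dir``."""
       with open(run_dir / "runtime_config.toml", "rb") as f:  # tomllib needs binary mode
           return tomllib.load(f)


   run_dir = Path("results/my_run")  # placeholder path, not a hyrax default
   if (run_dir / "runtime_config.toml").exists():
       runtime_config = load_runtime_config(run_dir)
       print(sorted(runtime_config))  # top-level table names used in the run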
