The Hyrax Configuration System

hyrax makes extensive use of config variables to manage the runtime environment of training and inference runs. A hyrax_default_config.toml file (full contents listed here) is included with hyrax and contains every variable that hyrax could need to operate. To create a custom configuration file, create a .toml file and set only the variables you want to change; if you’re running with a custom dataset or model, you can also add your own variables. For a practical walkthrough that starts from a minimal override file, see Configuration.

Config variables are inherited from a hierarchy of sources, similar to Python classes. First, hyrax reads the variables set in the default configuration. Next, it loads the default config of any custom hyrax packages that the user is utilizing (see External package setup for how to set up package-level defaults). It determines which packages to include by checking which custom classes are loaded initially and looking for their default configs. If a package doesn’t have a default config, hyrax will issue a warning. Finally, it applies the variables declared in the user-defined config toml (see the config basics notebook for how to load those through a notebook or script). Config variables at each step can overwrite config variables from previous steps, which leads to the following priority, from highest to lowest:

- Variables from a user-defined config toml are used.
- Default configs from custom hyrax packages are used for those variables which the user has not defined.
- The base default config is used for those variables which the user has not defined and which don’t exist in any package default.

The inheritance hierarchy of the hyrax configuration system.
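
The priority order above can be illustrated with a minimal Python sketch. This is not hyrax's actual loading code: the function names merge_config and resolve_config, the table names, and the values are all hypothetical, and merging nested tables recursively is an assumption about how overrides combine.

```python
# Illustrative sketch only; not hyrax's implementation.
# All names and values below are hypothetical.

def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`.

    Nested tables (dicts) are merged recursively, so an override only
    replaces the keys it actually sets.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: dict) -> dict:
    """Merge layers in order; later layers take priority over earlier ones."""
    resolved: dict = {}
    for layer in layers:
        resolved = merge_config(resolved, layer)
    return resolved


# Lowest priority: the base default config.
base_defaults = {"general": {"log_level": "info"}, "train": {"epochs": 10}}
# Middle priority: defaults shipped with a custom package.
package_defaults = {"train": {"epochs": 20}, "my_package": {"threshold": 0.5}}
# Highest priority: the user-defined config toml.
user_config = {"train": {"epochs": 3}}

resolved = resolve_config(base_defaults, package_defaults, user_config)
print(resolved)
```

In this sketch the user's value for epochs wins, the package-only threshold is kept, and log_level falls through from the base defaults because no later layer sets it.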

hyrax passes all the configuration variables to the relevant model and dataset classes, allowing them to configure the runtime through one system. This allows for extensibility and cross-compatibility within the broader “hyrax ecosystem”. From the point of view of the code, these configuration variables should be treated as static. This makes it easier for researchers to develop code separately from the runtime environment.

A core design principle of hyrax is “code by config”, meaning that all runtime parameters should be set through configuration files rather than hard-coded values. This approach enhances flexibility, reproducibility, and ease of experimentation, as users can modify configurations without altering the underlying codebase. This also facilitates sharing and collaboration, as configurations can be easily shared and adapted for different use cases while keeping fundamental models and datasets consistent.

Typed configuration schemas

Hyrax uses Pydantic internally to validate the [data_request] configuration table, which describes datasets for training, validation, and inference (see the data requests notebook for a hands-on walkthrough). This validation helps catch configuration errors early by ensuring that required fields such as primary_id_field are present and properly structured. The exact field-level expectations for datasets are documented in the Dataset class reference.

The validation happens automatically when you load a TOML configuration or use set_config(). If there are validation errors, Hyrax will log a warning but continue to use the configuration as-is for backward compatibility.
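
The "warn but continue" behavior can be sketched as follows. To stay dependency-free, this sketch uses a hand-written check in place of Pydantic, so it is not how Hyrax validates internally. The helper names find_problems and load_data_request are hypothetical, and the layout of the entries inside [data_request] is an assumption; only the primary_id_field requirement comes from the text above.

```python
# Illustrative sketch only; Hyrax itself validates with Pydantic.
import logging

logger = logging.getLogger("config_validation_sketch")

# The only required field named in the text; the real schema may need more.
REQUIRED_FIELDS = ("primary_id_field",)


def find_problems(data_request: dict) -> list[str]:
    """Collect human-readable problems instead of raising on the first one."""
    problems = []
    for name, entry in data_request.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: expected a table, got {type(entry).__name__}")
            continue
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"{name}: missing required field '{field}'")
    return problems


def load_data_request(config: dict) -> dict:
    """Log a warning for each problem, then return the table unchanged."""
    data_request = config.get("data_request", {})
    for problem in find_problems(data_request):
        logger.warning("data_request validation: %s", problem)
    return data_request


# Hypothetical entry that is missing primary_id_field.
example_config = {"data_request": {"train": {"dataset_class": "MyDataset"}}}
table = load_data_request(example_config)
```

The design choice mirrored here is that validation reports problems without blocking the run, so older configurations keep working while the warning points at what to fix.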

Backward compatibility for the legacy [model_inputs] table name is maintained at the configuration loading layer.
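
One way such a compatibility layer could work is to rename the legacy table when the configuration is loaded. The sketch below is an assumption, not Hyrax's code: normalize_legacy_tables is a hypothetical helper, and leaving the configuration untouched when both tables are present is a choice made for this example only.

```python
# Illustrative sketch only; how Hyrax handles the legacy name is not shown here.

def normalize_legacy_tables(config: dict) -> dict:
    """Return a copy of `config` with [model_inputs] exposed as [data_request]."""
    normalized = dict(config)  # shallow copy so the caller's dict is not modified
    if "model_inputs" in normalized and "data_request" not in normalized:
        normalized["data_request"] = normalized.pop("model_inputs")
    return normalized


legacy = {"model_inputs": {"train": {"primary_id_field": "object_id"}}}
print(normalize_legacy_tables(legacy))
```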

After training is completed, hyrax writes all of the variables used at runtime (combined from all the various source configs) to a runtime_config.toml file in the runtime directory, so that the user can see in one place what values were actually used.