hyrax.config_schemas#

Typed configuration schemas for Hyrax.

This package will house Pydantic models that describe and validate Hyrax configuration files. For now it exposes the base schema stub to allow incremental adoption in downstream modules and tests.

Classes#

BaseConfigModel

Base class for future Hyrax configuration schemas.

DataRequestConfig

Per-dataset configuration used within data_request.

DataRequestDefinition

Typed representation of the full data_request table.

Package Contents#

class BaseConfigModel(/, **data: Any)[source]#

Bases: pydantic.BaseModel

Base class for future Hyrax configuration schemas.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config#

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.

class DataRequestConfig(/, **data: Any)[source]#

Bases: hyrax.config_schemas.base.BaseConfigModel

Per-dataset configuration used within data_request.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

dataset_class: str = None#
data_location: str = None#
fields: list[str] | None = None#
primary_id_field: str | None = None#
split_fraction: float | None = None#
join_field: str | None = None#
dataset_config: dict | None = None#
classmethod resolve_data_location(v: str) → str[source]#

Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
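
The resolution described above can be sketched with the standard library alone. This is a minimal illustration, not Hyrax's implementation: the function name `resolve_location` is hypothetical, and the use of `pathlib` is an assumption about how such a validator might work.

```python
from pathlib import Path


def resolve_location(v: str) -> str:
    """Hypothetical helper: expand '~' and make a relative path absolute."""
    # expanduser() replaces a leading '~' with the user's home directory;
    # resolve() anchors relative paths at the current working directory
    # and normalizes '..' segments.
    return str(Path(v).expanduser().resolve())


print(resolve_location("~/data"))      # path under the user's home directory
print(resolve_location("data/train"))  # path under the current working directory
```

Resolving the path once at validation time means later code can compare or group configs by `data_location` without worrying about two spellings of the same directory.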

require_primary_id_for_split_fraction() → DataRequestConfig[source]#

Ensure that split_fraction is only set when primary_id_field is also provided.

join_field_excludes_primary() → DataRequestConfig[source]#

Ensure that join_field and primary_id_field are mutually exclusive.
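
The two field-level rules (split_fraction requires primary_id_field; join_field and primary_id_field are mutually exclusive) can be sketched on a plain dictionary. This is an illustrative stand-in under stated assumptions, not the Pydantic validators themselves: the function name `check_dataset_config` and the error messages are hypothetical.

```python
def check_dataset_config(cfg: dict) -> dict:
    """Hypothetical check of one dataset config; returns cfg if it is valid."""
    primary = cfg.get("primary_id_field")

    # Rule 1: split_fraction may only be set when primary_id_field is set.
    if cfg.get("split_fraction") is not None and primary is None:
        raise ValueError("split_fraction requires primary_id_field")

    # Rule 2: join_field and primary_id_field cannot both be set.
    if cfg.get("join_field") is not None and primary is not None:
        raise ValueError("join_field and primary_id_field are mutually exclusive")

    return cfg


# Valid: a primary dataset with a split fraction.
check_dataset_config({"primary_id_field": "object_id", "split_fraction": 0.8})

# Valid: a secondary dataset that joins on a field and sets no primary id.
check_dataset_config({"join_field": "object_id"})
```

Taken together, the two rules imply that a config using join_field cannot also carry a split_fraction, since the latter needs a primary_id_field that the former forbids.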

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#

Return the configuration as a plain dictionary.

class DataRequestDefinition[source]#

Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]

Typed representation of the full data_request table.

Accepts any number of arbitrarily-named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict of friendly-named DataRequestConfig instances. A friendly name must always be provided explicitly — the schema will raise a validation error if a dataset source is specified without one.

Example (Python):

{
    "train": {
        "my_dataset": {
            "dataset_class": "HyraxRandomDataset",
            "data_location": "/path/to/data",
            "primary_id_field": "object_id",
        }
    }
}

Example (TOML):

[data_request.train.my_dataset]
dataset_class = "HyraxRandomDataset"
data_location = "/path/to/data"
primary_id_field = "object_id"

classmethod normalize_all_groups(value: Any) → dict[str, DatasetGroupValue][source]#

Parse every top-level key into the expected group format.

require_at_least_one_dataset() → DataRequestDefinition[source]#

Ensure at least one dataset group is provided.

validate_primary_id_fields() → DataRequestDefinition[source]#

Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.

This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
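
The "exactly one primary_id_field per group" rule can be sketched on nested dictionaries as follows. This is a simplified illustration, not the validator's code: the function name `check_primary_ids` and the error message are hypothetical, and each config is represented as a plain dict.

```python
def check_primary_ids(definition: dict) -> None:
    """Hypothetical check: each group has exactly one config with a primary id."""
    for group, datasets in definition.items():
        # Collect the friendly names whose config sets primary_id_field.
        holders = [
            name
            for name, cfg in datasets.items()
            if cfg.get("primary_id_field") is not None
        ]
        if len(holders) != 1:
            raise ValueError(
                f"group {group!r}: expected exactly one primary_id_field, "
                f"found {len(holders)}"
            )


# Valid: one dataset defines the primary id, the other joins onto it.
check_primary_ids(
    {
        "train": {
            "images": {"primary_id_field": "object_id"},
            "labels": {"join_field": "object_id"},
        }
    }
)
```

A group with zero or two such configs would raise, because the primary identifier for the group would be missing or ambiguous.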

validate_cross_group(groups: set[str]) → None[source]#

Run cross-group split_fraction checks restricted to the specified groups.

This method is intended to be called by verb classes at instantiation time, scoped to only the dataset groups the verb actually uses (via REQUIRED_DATA_GROUPS and OPTIONAL_DATA_GROUPS). By restricting validation to active groups, configs that contain groups irrelevant to the current verb do not cause false validation failures.

Parameters:

groups (set[str]) – Set of active group names to validate. Only configs belonging to these groups are considered.

Raises:

ValueError – If split_fraction values for a given data_location sum to more than 1.0, or if split_fraction consistency is violated (some configs for a location set it while others do not).
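
A rough sketch of the scoped check described above, using plain dictionaries in place of the schema classes. The function name `check_split_fractions`, the error messages, and the small floating-point tolerance are assumptions made for illustration; the actual method may differ in all three.

```python
from collections import defaultdict


def check_split_fractions(definition: dict, groups: set) -> None:
    """Hypothetical cross-group check, limited to the active groups."""
    # Gather split_fraction values per data_location, ignoring inactive groups.
    by_location = defaultdict(list)
    for group in groups:
        for cfg in definition.get(group, {}).values():
            by_location[cfg["data_location"]].append(cfg.get("split_fraction"))

    for location, fractions in by_location.items():
        set_values = [f for f in fractions if f is not None]
        # Consistency: either every config for a location sets it, or none do.
        if set_values and len(set_values) != len(fractions):
            raise ValueError(f"{location}: split_fraction set on only some configs")
        # Assumed tolerance so that e.g. 0.7 + 0.3 is not rejected by rounding.
        if sum(set_values) > 1.0 + 1e-9:
            raise ValueError(f"{location}: split_fraction values sum to more than 1.0")


definition = {
    "train": {"d": {"data_location": "/data", "split_fraction": 0.8}},
    "validate": {"d": {"data_location": "/data", "split_fraction": 0.3}},
}

# Passes: only "train" is active, so the 0.3 in "validate" is not counted.
check_split_fractions(definition, {"train"})
```

Calling the same function with `{"train", "validate"}` would raise, since 0.8 + 0.3 exceeds 1.0 for the shared location. This mirrors the point above: a verb that never touches a group is not affected by that group's settings.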

__contains__(key: str) → bool[source]#

Return True if the group name is present in the definition.

__getitem__(key: str) → DatasetGroupValue[source]#

Return the dataset group value for the given group name.

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#

Export as a nested dictionary compatible with existing configs.

Each group value is a dict of {friendly_name: flat_config_dict}. No implicit "data" wrapper is added — the friendly names supplied by the user are preserved verbatim.