hyrax.config_schemas#

Typed configuration schemas for Hyrax.

This package will house Pydantic models that describe and validate Hyrax configuration files. For now it exposes the base schema stub to allow incremental adoption in downstream modules and tests.

Submodules#

Classes#

`BaseConfigModel`	Base class for future Hyrax configuration schemas.
`DataRequestConfig`	Per-dataset configuration used within `data_request`.
`DataRequestDefinition`	Typed representation of the full `data_request` table.

Package Contents#

class BaseConfigModel(/, **data: Any)[source]#

Bases: pydantic.BaseModel

Base class for future Hyrax configuration schemas.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

model_config#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class DataRequestConfig(/, **data: Any)[source]#

Bases: hyrax.config_schemas.base.BaseConfigModel

Per-dataset configuration used within data_request.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

dataset_class: str = None#

data_location: str = None#

fields: list[str] | None = None#

primary_id_field: str | None = None#

join_field: str | None = None#

dataset_config: dict | None = None#

augment: bool | list[str] | None = None#

classmethod resolve_data_location(v: str) → str[source]#: Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
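The path resolution described above can be approximated with the standard library. This is a minimal sketch, assuming the validator only expands `~` and resolves the result against the current working directory; the standalone function below is a hypothetical stand-in, not Hyrax's actual validator.

```python
from pathlib import Path


def resolve_data_location(v: str) -> str:
    """Hypothetical stand-in: expand ``~`` and make the path absolute."""
    # expanduser() replaces a leading "~" with the user's home directory;
    # resolve() anchors relative paths at the current working directory
    # and normalizes ".." segments.
    return str(Path(v).expanduser().resolve())


resolved = resolve_data_location("data/train")
print(resolved)  # an absolute path
```

Resolving at validation time means later code sees the same absolute path regardless of which directory it runs from.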

join_field_excludes_primary() → DataRequestConfig[source]#: Ensure that join_field and primary_id_field are mutually exclusive.
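The mutual-exclusion rule can be illustrated on plain dictionaries. This is a sketch under the assumption that "mutually exclusive" means both fields may not be non-None on the same dataset; the function name and error message are hypothetical, and the real check runs as a Pydantic validator on the model.

```python
def check_join_excludes_primary(config: dict) -> dict:
    """Hypothetical check: join_field and primary_id_field cannot both be set."""
    has_join = config.get("join_field") is not None
    has_primary = config.get("primary_id_field") is not None
    if has_join and has_primary:
        raise ValueError("join_field and primary_id_field are mutually exclusive")
    return config


# Accepted: only one of the two fields is set.
check_join_excludes_primary({"primary_id_field": "object_id"})

# Rejected: both fields are set on the same dataset.
try:
    check_join_excludes_primary(
        {"primary_id_field": "object_id", "join_field": "object_id"}
    )
except ValueError as err:
    print(err)
```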

validate_augment_list() → DataRequestConfig[source]#: Validate the list form of augment against fields and primary_id_field.

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#: Return the configuration as a plain dictionary.

class DataRequestDefinition[source]#

Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]

Typed representation of the full data_request table.

Accepts any number of arbitrarily named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict that maps a friendly name to a DataRequestConfig instance. A friendly name must always be provided explicitly: the schema raises a validation error if a dataset source is specified without one.

Example (Python):

{
    "train": {
        "my_dataset": {
            "dataset_class": "HyraxRandomDataset",
            "data_location": "/path/to/data",
            "primary_id_field": "object_id",
        }
    }
}

Example (TOML):

[data_request.train.my_dataset]
dataset_class = "HyraxRandomDataset"
data_location = "/path/to/data"
primary_id_field = "object_id"

classmethod normalize_all_groups(value: Any) → dict[str, DatasetGroupValue][source]#: Parse every top-level key into the expected group format.

reject_augment_on_infer() → DataRequestDefinition[source]#: Reject configurations that enable augmentation on the ‘infer’ data group.

require_at_least_one_dataset() → DataRequestDefinition[source]#: Ensure at least one dataset group is provided.

validate_primary_id_fields() → DataRequestDefinition[source]#

Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.

This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
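The "exactly one primary identifier per group" rule can be sketched on plain nested dictionaries. This is an illustrative stand-in, not the validator's actual code: the function name, the error message, and the friendly names `images` and `catalog` are hypothetical.

```python
def check_primary_id_fields(data_request: dict) -> None:
    """Hypothetical check: each group needs exactly one primary_id_field."""
    for group_name, datasets in data_request.items():
        # Collect the friendly names whose config sets primary_id_field.
        with_primary = [
            name
            for name, cfg in datasets.items()
            if cfg.get("primary_id_field") is not None
        ]
        if len(with_primary) != 1:
            raise ValueError(
                f"group {group_name!r} must have exactly one dataset with "
                f"primary_id_field, found {len(with_primary)}"
            )


request = {
    "train": {
        "images": {
            "dataset_class": "HyraxRandomDataset",
            "primary_id_field": "object_id",
        },
        "catalog": {
            "dataset_class": "HyraxRandomDataset",
            "join_field": "object_id",
        },
    }
}
check_primary_id_fields(request)  # passes: only "images" sets primary_id_field
```

A group with no primary identifier, or with two, fails the same check because the count is compared against exactly one.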

validate_cross_group(groups: set[str]) → None[source]#: No-op: cross-group split validation is now handled by splitting_utils.validate_split_config.

__contains__(key: str) → bool[source]#: Return True if the group name is present in the definition.

__getitem__(key: str) → DatasetGroupValue[source]#: Return the dataset group value for the given group name.

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#

Export as a nested dictionary compatible with existing configs.

Each group value is a dict of {friendly_name: flat_config_dict}. No implicit "data" wrapper is added; the friendly names supplied by the user are preserved verbatim.