hyrax.config_schemas.data_request#

Pydantic models describing the structure of the data_request configuration.

These schemas validate and enforce the structure of dataset requests used throughout the Hyrax framework.

Attributes#

Classes#

DataRequestConfig

Per-dataset configuration used within data_request.

DataRequestDefinition

Typed representation of the full data_request table.

Functions#

_normalize_dataset_group(→ DatasetGroupValue)

Normalize a single dataset group value into a dict[str, DataRequestConfig].

_iter_all_configs(→ list[tuple[str, DataRequestConfig]])

Yield (group_name, config) pairs across all groups.

Module Contents#

class DataRequestConfig(/, **data: Any)[source]#

Bases: hyrax.config_schemas.base.BaseConfigModel

Per-dataset configuration used within data_request.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

dataset_class: str = None[source]#
data_location: str = None[source]#
fields: list[str] | None = None[source]#
primary_id_field: str | None = None[source]#
join_field: str | None = None[source]#
dataset_config: dict | None = None[source]#
augment: bool | list[str] | None = None[source]#

classmethod resolve_data_location(v: str) → str[source]#

Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
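A minimal sketch of this kind of path resolution, using only the standard library. The helper name `resolve_path` is hypothetical, and the actual Hyrax validator may differ in details such as symlink handling.

```python
from pathlib import Path


def resolve_path(v: str) -> str:
    """Expand a leading ~ and return an absolute path (hypothetical helper)."""
    # expanduser() replaces "~" with the user's home directory;
    # resolve() makes the path absolute and collapses ".." segments.
    return str(Path(v).expanduser().resolve())


resolved_home = resolve_path("~/data")
resolved_relative = resolve_path("data/../data/catalog")
```

Relative paths are resolved against the current working directory, so the same config value can point to different locations depending on where the process is started.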

join_field_excludes_primary() → DataRequestConfig[source]#

Ensure that join_field and primary_id_field are mutually exclusive.
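The rule can be illustrated with a plain dataclass. This is a sketch under assumptions: `DatasetEntry` is a hypothetical stand-in for DataRequestConfig that carries only the two fields involved, and it does not reproduce the pydantic validator itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    """Hypothetical stand-in for DataRequestConfig with only two fields."""

    primary_id_field: Optional[str] = None
    join_field: Optional[str] = None

    def __post_init__(self) -> None:
        # Setting both fields on the same dataset is rejected.
        if self.primary_id_field is not None and self.join_field is not None:
            raise ValueError("join_field and primary_id_field are mutually exclusive")


primary = DatasetEntry(primary_id_field="object_id")  # accepted
joined = DatasetEntry(join_field="object_id")  # accepted

try:
    DatasetEntry(primary_id_field="object_id", join_field="object_id")
    both_rejected = False
except ValueError:
    both_rejected = True
```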

validate_augment_list() → DataRequestConfig[source]#

Validate the list form of augment against fields and primary_id_field.

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#

Return the configuration as a plain dictionary.

DatasetGroupValue[source]#

_normalize_dataset_group(value: Any) → DatasetGroupValue[source]#

Normalize a single dataset group value into a dict[str, DataRequestConfig].

Every dataset source within a group must be identified by a user-supplied friendly name. The friendly name is the key in the returned dict and is used by DataProvider to reference the dataset at runtime.

Accepted inputs#

  • A dict whose values are DataRequestConfig instances or plain dicts that can be validated as one. The keys become the friendly names.

Rejected inputs (raise ValueError)#

  • A flat dict that contains dataset_class at the top level (no friendly name wrapper).

  • A bare DataRequestConfig instance (no friendly name wrapper).
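The accept/reject rules above can be sketched with plain dicts. The function name `normalize_group` is hypothetical, and this version only handles dict configs. The real function also accepts DataRequestConfig instances as values and validates each config.

```python
from typing import Any


def normalize_group(value: Any) -> dict[str, dict]:
    """Hypothetical sketch: return {friendly_name: config_dict} or raise ValueError."""
    # Anything that is not a dict (including a bare config object) has no
    # friendly-name wrapper and is rejected.
    if not isinstance(value, dict):
        raise ValueError("a dataset group must be a dict keyed by friendly name")
    # A flat config has dataset_class at the top level, so no friendly name was given.
    if "dataset_class" in value:
        raise ValueError("dataset config is missing a friendly-name wrapper")
    for name, config in value.items():
        if not isinstance(config, dict):
            raise ValueError(f"config for {name!r} must be a dict")
    return dict(value)


group = normalize_group(
    {"my_dataset": {"dataset_class": "HyraxRandomDataset", "data_location": "/path/to/data"}}
)

try:
    normalize_group({"dataset_class": "HyraxRandomDataset", "data_location": "/path/to/data"})
    flat_rejected = False
except ValueError:
    flat_rejected = True
```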

_iter_all_configs(groups: dict[str, DatasetGroupValue]) → list[tuple[str, DataRequestConfig]][source]#

Yield (group_name, config) pairs across all groups.
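A sketch of the flattening this describes, assuming each group is a dict of friendly-named configs. The name `iter_all_configs` and the use of plain dicts in place of DataRequestConfig are illustrative only.

```python
def iter_all_configs(groups: dict[str, dict[str, dict]]) -> list[tuple[str, dict]]:
    """Hypothetical sketch: flatten groups into (group_name, config) pairs."""
    return [
        (group_name, config)
        for group_name, datasets in groups.items()
        for config in datasets.values()
    ]


pairs = iter_all_configs(
    {
        "train": {
            "catalog": {"dataset_class": "HyraxRandomDataset"},
            "images": {"dataset_class": "HyraxRandomDataset"},
        },
        "infer": {"catalog": {"dataset_class": "HyraxRandomDataset"}},
    }
)
group_names = [name for name, _ in pairs]
```

The friendly names are dropped in this flattening; only the group name travels with each config.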

class DataRequestDefinition[source]#

Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]

Typed representation of the full data_request table.

Accepts any number of arbitrarily-named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict of friendly-named DataRequestConfig instances. A friendly name must always be provided explicitly; the schema raises a validation error if a dataset source is specified without one.

Example (Python):

{
    "train": {
        "my_dataset": {
            "dataset_class": "HyraxRandomDataset",
            "data_location": "/path/to/data",
            "primary_id_field": "object_id",
        }
    }
}

Example (TOML):

[data_request.train.my_dataset]
dataset_class = "HyraxRandomDataset"
data_location = "/path/to/data"
primary_id_field = "object_id"

classmethod normalize_all_groups(value: Any) → dict[str, DatasetGroupValue][source]#

Parse every top-level key into the expected group format.

reject_augment_on_infer() → DataRequestDefinition[source]#

Reject configurations that enable augmentation on the ‘infer’ data group.

require_at_least_one_dataset() → DataRequestDefinition[source]#

Ensure at least one dataset group is provided.

validate_primary_id_fields() → DataRequestDefinition[source]#

Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.

This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
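A sketch of the per-group check, using plain dicts in place of DataRequestConfig. The function name `check_primary_id_fields` and the error message are hypothetical.

```python
def check_primary_id_fields(groups: dict[str, dict[str, dict]]) -> None:
    """Hypothetical sketch: require exactly one primary_id_field per group."""
    for group_name, datasets in groups.items():
        # Collect the friendly names of datasets that set primary_id_field.
        with_primary = [
            name
            for name, config in datasets.items()
            if config.get("primary_id_field") is not None
        ]
        if len(with_primary) != 1:
            raise ValueError(
                f"group {group_name!r} must have exactly one dataset with "
                f"primary_id_field, found {len(with_primary)}"
            )


# One dataset names the primary identifier; the other joins on it.
groups_ok = {
    "train": {
        "catalog": {"primary_id_field": "object_id"},
        "images": {"join_field": "object_id"},
    }
}
check_primary_id_fields(groups_ok)  # passes silently

groups_bad = {
    "train": {
        "catalog": {"primary_id_field": "object_id"},
        "images": {"primary_id_field": "object_id"},
    }
}
try:
    check_primary_id_fields(groups_bad)
    two_primaries_rejected = False
except ValueError:
    two_primaries_rejected = True
```

A group in which no dataset sets primary_id_field fails the same check, since the count is zero rather than one.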

validate_cross_group(groups: set[str]) → None[source]#

No-op: cross-group split validation is now handled by splitting_utils.validate_split_config.

__contains__(key: str) → bool[source]#

Return True if the group name is present in the definition.

__getitem__(key: str) → DatasetGroupValue[source]#

Return the dataset group value for the given group name.

as_dict(*, exclude_unset: bool = False) → dict[str, Any][source]#

Export as a nested dictionary compatible with existing configs.

Each group value is a dict of {friendly_name: flat_config_dict}. No implicit "data" wrapper is added; the friendly names supplied by the user are preserved verbatim.