hyrax.config_schemas.data_request#
Pydantic models describing the structure of the data_request configuration.
These schemas validate and enforce the structure of dataset requests used throughout the Hyrax framework.
Attributes#
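Classes#
DataRequestConfig | Per-dataset configuration used within data_request.
DataRequestDefinition | Typed representation of the full data_request table.
Functions#
_normalize_dataset_group(value) | Normalize a single dataset group value into a dict[str, DataRequestConfig].
_iter_all_configs(groups) | Yield (group_name, config) pairs across all groups.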
Module Contents#
- class DataRequestConfig(/, **data: Any)[source]#
Bases: hyrax.config_schemas.base.BaseConfigModel
Per-dataset configuration used within data_request.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- classmethod resolve_data_location(v: str) -> str[source]#
Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
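The resolution described above can be sketched with the standard library. This is a minimal illustration, not the body of the Hyrax validator; the helper name `resolve_path` is hypothetical, and the real method may handle edge cases differently.

```python
from pathlib import Path


def resolve_path(raw: str) -> str:
    """Expand a leading "~" and return an absolute path as a string.

    Hypothetical helper for illustration; not part of Hyrax.
    """
    # expanduser() replaces "~" with the home directory; resolve() makes the
    # path absolute relative to the current working directory.
    return str(Path(raw).expanduser().resolve())


print(resolve_path("~/data"))       # an absolute path under the home directory
print(resolve_path("./data/run1"))  # an absolute path under the working directory
```

Resolving the path once, at validation time, means later code does not depend on the working directory in effect when the config was written.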
- join_field_excludes_primary() -> DataRequestConfig[source]#
Ensure that join_field and primary_id_field are mutually exclusive.
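A plain-Python sketch of this kind of mutual-exclusion check follows. It assumes that "set" means "not None" and uses a hypothetical free function rather than a Pydantic validator, so it illustrates the rule only, not the Hyrax implementation.

```python
from typing import Optional


def check_join_field_excludes_primary(
    join_field: Optional[str], primary_id_field: Optional[str]
) -> None:
    """Raise ValueError when both fields are set on the same dataset config.

    Hypothetical helper; assumes an unset field is represented by None.
    """
    if join_field is not None and primary_id_field is not None:
        raise ValueError(
            "join_field and primary_id_field cannot both be set on one dataset"
        )


# Setting only one of the two fields passes the check.
check_join_field_excludes_primary("object_id", None)
check_join_field_excludes_primary(None, "object_id")
```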
- validate_augment_list() -> DataRequestConfig[source]#
Validate the list form of augment against fields and primary_id_field.
- _normalize_dataset_group(value: Any) -> DatasetGroupValue[source]#
Normalize a single dataset group value into a dict[str, DataRequestConfig].
Every dataset source within a group must be identified by a user-supplied friendly name. The friendly name is the key in the returned dict and is used by DataProvider to reference the dataset at runtime.
Accepted inputs#
- A dict whose values are DataRequestConfig instances or plain dicts that can be validated as one. The keys become the friendly names.
Rejected inputs (raise ValueError)#
- A flat dict that contains dataset_class at the top level (no friendly name wrapper).
- A bare DataRequestConfig instance (no friendly name wrapper).
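The accept/reject rules above can be sketched without Hyrax installed. In this sketch `FakeConfig` is a hypothetical stand-in for DataRequestConfig, `normalize_group` is a hypothetical name, and rejecting non-dict inputs is an added assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeConfig:
    """Hypothetical stand-in for DataRequestConfig (not the Hyrax class)."""

    dataset_class: str
    data_location: Optional[str] = None
    primary_id_field: Optional[str] = None


def normalize_group(value: Any) -> dict[str, FakeConfig]:
    """Return {friendly_name: FakeConfig}, rejecting unwrapped inputs."""
    if isinstance(value, FakeConfig):
        raise ValueError("bare config: wrap it in a dict keyed by a friendly name")
    if not isinstance(value, dict):
        # Assumption of this sketch: anything that is not a dict is rejected.
        raise ValueError(f"expected a dict, got {type(value).__name__}")
    if "dataset_class" in value:
        raise ValueError("flat dict: 'dataset_class' found where a friendly name belongs")
    # Keys become friendly names; plain dicts are converted to config objects.
    return {
        name: cfg if isinstance(cfg, FakeConfig) else FakeConfig(**cfg)
        for name, cfg in value.items()
    }


group = normalize_group({"my_dataset": {"dataset_class": "HyraxRandomDataset"}})
print(list(group))  # ['my_dataset']
```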
- _iter_all_configs(groups: dict[str, DatasetGroupValue]) -> list[tuple[str, DataRequestConfig]][source]#
Yield (group_name, config) pairs across all groups.
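A traversal of this shape can be sketched over plain nested dicts. The function below is a hypothetical generator written for illustration; the documented signature declares a list return type, so the real function may differ.

```python
from typing import Any, Iterator


def iter_all_configs(
    groups: dict[str, dict[str, Any]],
) -> Iterator[tuple[str, Any]]:
    """Yield (group_name, config) for every dataset in every group."""
    for group_name, datasets in groups.items():
        for config in datasets.values():
            yield group_name, config


groups = {
    "train": {"a": {"dataset_class": "HyraxRandomDataset"}},
    "infer": {
        "b": {"dataset_class": "HyraxRandomDataset"},
        "c": {"dataset_class": "HyraxRandomDataset"},
    },
}
print([name for name, _ in iter_all_configs(groups)])  # ['train', 'infer', 'infer']
```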
- class DataRequestDefinition[source]#
Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]
Typed representation of the full data_request table.
Accepts any number of arbitrarily-named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict of friendly-named DataRequestConfig instances. A friendly name must always be provided explicitly; the schema will raise a validation error if a dataset source is specified without one.
Example (Python):
    {
        "train": {
            "my_dataset": {
                "dataset_class": "HyraxRandomDataset",
                "data_location": "/path/to/data",
                "primary_id_field": "object_id",
            }
        }
    }
Example (TOML):
    [data_request.train.my_dataset]
    dataset_class = "HyraxRandomDataset"
    data_location = "/path/to/data"
    primary_id_field = "object_id"
- classmethod normalize_all_groups(value: Any) -> dict[str, DatasetGroupValue][source]#
Parse every top-level key into the expected group format.
- reject_augment_on_infer() -> DataRequestDefinition[source]#
Augmentation cannot be enabled on the ‘infer’ data group.
- require_at_least_one_dataset() -> DataRequestDefinition[source]#
Ensure at least one dataset group is provided.
- validate_primary_id_fields() -> DataRequestDefinition[source]#
Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.
This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
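The "exactly one per group" rule can be sketched over plain dicts as follows. The function name and error message are hypothetical, and the sketch treats a missing key the same as None; the Hyrax validator operates on DataRequestConfig instances and may word its errors differently.

```python
def check_primary_id_fields(groups: dict[str, dict[str, dict]]) -> None:
    """Raise ValueError unless each group has exactly one primary_id_field."""
    for group_name, datasets in groups.items():
        # Collect the friendly names of datasets that declare a primary id.
        owners = [
            name
            for name, cfg in datasets.items()
            if cfg.get("primary_id_field") is not None
        ]
        if len(owners) != 1:
            raise ValueError(
                f"group {group_name!r} must have exactly one dataset with "
                f"primary_id_field, found {len(owners)}: {owners}"
            )


# Two datasets in one group, only one of which names the primary identifier.
check_primary_id_fields(
    {
        "train": {
            "images": {"dataset_class": "HyraxRandomDataset", "primary_id_field": "object_id"},
            "extras": {"dataset_class": "HyraxRandomDataset"},
        }
    }
)
```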
- validate_cross_group(groups: set[str]) -> None[source]#
No-op: cross-group split validation is now handled by splitting_utils.validate_split_config.
- __getitem__(key: str) -> DatasetGroupValue[source]#
Return the dataset group value for the given group name.