hyrax.config_schemas
Typed configuration schemas for Hyrax.
This package will house Pydantic models that describe and validate Hyrax configuration files. For now it exposes the base schema stub to allow incremental adoption in downstream modules and tests.
Submodules
Classes
BaseConfigModel | Base class for future Hyrax configuration schemas.
DataRequestConfig | Per-dataset configuration used within data_request.
DataRequestDefinition | Typed representation of the full data_request table.
Package Contents
- class BaseConfigModel(/, **data: Any)
Bases: pydantic.BaseModel
Base class for future Hyrax configuration schemas.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class DataRequestConfig(/, **data: Any)
Bases: hyrax.config_schemas.base.BaseConfigModel
Per-dataset configuration used within data_request.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- dataset_class: str = None
- data_location: str = None
- fields: list[str] | None = None
- primary_id_field: str | None = None
- join_field: str | None = None
- dataset_config: dict | None = None
- augment: bool | list[str] | None = None
- classmethod resolve_data_location(v: str) → str
Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
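The normalization described above can be sketched with the standard library alone. This is a minimal illustration, not Hyrax's implementation: the helper name `resolve_path` is hypothetical, and the real validator may differ in detail (for example, in how it treats symlinks).

```python
import os
from pathlib import Path


def resolve_path(v: str) -> str:
    """Expand a leading "~" and return an absolute path (hypothetical helper)."""
    # expanduser() replaces "~" with the current user's home directory.
    # resolve() makes the path absolute, collapses ".." segments and
    # follows symlinks; the path does not need to exist.
    return str(Path(v).expanduser().resolve())


# A home-relative path and a path relative to the working directory.
home_based = resolve_path("~/data")
relative = resolve_path("data/train")
```

Both results are absolute, so later code that changes the working directory still sees the same location.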
- join_field_excludes_primary() → DataRequestConfig
Ensure that join_field and primary_id_field are mutually exclusive.
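As a rough sketch of what such a mutual-exclusion check looks like, the plain function below rejects a configuration that sets both fields. The function name and error message are hypothetical; in a Pydantic model, a `ValueError` raised inside a validator is reported to the caller as a validation error.

```python
from typing import Optional


def check_join_excludes_primary(
    primary_id_field: Optional[str], join_field: Optional[str]
) -> None:
    """Raise ValueError if both fields are set (hypothetical stand-in)."""
    if primary_id_field is not None and join_field is not None:
        raise ValueError(
            "join_field and primary_id_field are mutually exclusive"
        )


# Setting only one of the two fields passes silently.
check_join_excludes_primary("object_id", None)
check_join_excludes_primary(None, "object_id")

# Setting both is rejected.
try:
    check_join_excludes_primary("object_id", "object_id")
    both_set_error = None
except ValueError as exc:
    both_set_error = str(exc)
```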
- validate_augment_list() → DataRequestConfig
Validate the list form of augment against fields and primary_id_field.
- class DataRequestDefinition
Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]
Typed representation of the full data_request table.
Accepts any number of arbitrarily-named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict of friendly-named DataRequestConfig instances. A friendly name must always be provided explicitly; the schema will raise a validation error if a dataset source is specified without one.
Example (Python):
    {
        "train": {
            "my_dataset": {
                "dataset_class": "HyraxRandomDataset",
                "data_location": "/path/to/data",
                "primary_id_field": "object_id",
            }
        }
    }
Example (TOML):
    [data_request.train.my_dataset]
    dataset_class = "HyraxRandomDataset"
    data_location = "/path/to/data"
    primary_id_field = "object_id"
- classmethod normalize_all_groups(value: Any) → dict[str, DatasetGroupValue]
Parse every top-level key into the expected group format.
- reject_augment_on_infer() → DataRequestDefinition
Augmentation cannot be enabled on the ‘infer’ data group.
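A minimal sketch of such a rule over plain dictionaries follows. It assumes that "enabled" means a truthy `augment` value (`True` or a non-empty list); that reading, the function name and the error text are assumptions, not taken from the Hyrax source.

```python
from typing import Any

Groups = dict[str, dict[str, dict[str, Any]]]


def check_no_augment_on_infer(groups: Groups) -> None:
    """Raise ValueError if any dataset in the 'infer' group enables augment."""
    for name, cfg in groups.get("infer", {}).items():
        # Assumption: True or a non-empty list counts as "enabled".
        if cfg.get("augment"):
            raise ValueError(
                f"augment is not allowed on infer dataset {name!r}"
            )


# Augmentation on 'train' is fine; 'infer' leaves it unset.
allowed = {
    "train": {"my_dataset": {"augment": True}},
    "infer": {"my_dataset": {"augment": None}},
}
check_no_augment_on_infer(allowed)

# Augmentation on 'infer' is rejected.
rejected = {"infer": {"my_dataset": {"augment": ["image"]}}}
try:
    check_no_augment_on_infer(rejected)
    infer_error = None
except ValueError as exc:
    infer_error = str(exc)
```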
- require_at_least_one_dataset() → DataRequestDefinition
Ensure at least one dataset group is provided.
- validate_primary_id_fields() → DataRequestDefinition
Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.
This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
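The "exactly one per group" rule can be illustrated over plain dictionaries. This is a sketch under stated assumptions: the function name, error message and the friendly names `images` and `catalog` are hypothetical, and the real validator operates on DataRequestConfig instances rather than dicts.

```python
from typing import Any

Groups = dict[str, dict[str, dict[str, Any]]]


def check_primary_id_fields(groups: Groups) -> None:
    """Raise ValueError unless each group has exactly one primary_id_field."""
    for group_name, datasets in groups.items():
        # Collect the friendly names whose config sets primary_id_field.
        with_primary = [
            name
            for name, cfg in datasets.items()
            if cfg.get("primary_id_field") is not None
        ]
        if len(with_primary) != 1:
            raise ValueError(
                f"group {group_name!r} must have exactly one dataset with "
                f"primary_id_field, found {len(with_primary)}"
            )


# One dataset supplies the primary identifier; the other joins on a field.
valid = {
    "train": {
        "images": {"primary_id_field": "object_id"},
        "catalog": {"join_field": "object_id"},
    }
}
check_primary_id_fields(valid)

# Neither dataset names a primary identifier, so the group is rejected.
invalid = {"train": {"images": {}, "catalog": {}}}
try:
    check_primary_id_fields(invalid)
    primary_error = None
except ValueError as exc:
    primary_error = str(exc)
```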
- validate_cross_group(groups: set[str]) → None
No-op: cross-group split validation is now handled by splitting_utils.validate_split_config.
- __getitem__(key: str) → DatasetGroupValue
Return the dataset group value for the given group name.