hyrax.config_schemas
Typed configuration schemas for Hyrax.
This package will house Pydantic models that describe and validate Hyrax configuration files. For now it exposes the base schema stub to allow incremental adoption in downstream modules and tests.
Submodules
Classes
BaseConfigModel | Base class for future Hyrax configuration schemas.
DataRequestConfig | Per-dataset configuration used within data_request.
DataRequestDefinition | Typed representation of the full data_request table.
Package Contents
- class BaseConfigModel(/, **data: Any)
Bases: pydantic.BaseModel
Base class for future Hyrax configuration schemas.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class DataRequestConfig(/, **data: Any)
Bases: hyrax.config_schemas.base.BaseConfigModel
Per-dataset configuration used within data_request.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- dataset_class: str = None
- data_location: str = None
- fields: list[str] | None = None
- primary_id_field: str | None = None
- split_fraction: float | None = None
- join_field: str | None = None
- dataset_config: dict | None = None
- classmethod resolve_data_location(v: str) -> str
Fully resolve the data_location path, expanding user home directories and converting relative paths to absolute paths.
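As a rough illustration of what "fully resolve" can mean here, the sketch below expands a leading ~ and turns a relative path into an absolute one using only the standard library. The helper name resolve_location is hypothetical, and whether Hyrax relies on os.path or pathlib internally is an assumption this page does not confirm.

```python
import os


def resolve_location(raw: str) -> str:
    """Hypothetical helper: expand "~" and return an absolute path string."""
    # expanduser replaces a leading "~" or "~user" with the home directory
    # (and leaves the string unchanged if no home directory can be found).
    expanded = os.path.expanduser(raw)
    # abspath prefixes the current working directory to relative paths and
    # collapses "." and ".." components; the path does not need to exist.
    return os.path.abspath(expanded)


relative_result = resolve_location("catalogs/../catalogs/run1")
home_result = resolve_location("~/catalogs/run1")
```

Resolving at validation time means later code can compare data_location values as plain strings without worrying about the working directory changing.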
- require_primary_id_for_split_fraction() -> DataRequestConfig
Ensure that split_fraction is only set when primary_id_field is also provided.
- join_field_excludes_primary() -> DataRequestConfig
Ensure that join_field and primary_id_field are mutually exclusive.
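The two cross-field rules above can be sketched without Pydantic as checks in a dataclass. RequestSketch and is_valid are hypothetical stand-ins, not the Hyrax classes; in Pydantic, checks like these are commonly written as model validators that run after field parsing, which would match the return types shown above, but that is an assumption about the implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in for DataRequestConfig with only three fields."""

    primary_id_field: Optional[str] = None
    split_fraction: Optional[float] = None
    join_field: Optional[str] = None

    def __post_init__(self) -> None:
        # Rule 1: split_fraction may only be set together with primary_id_field.
        if self.split_fraction is not None and self.primary_id_field is None:
            raise ValueError("split_fraction requires primary_id_field")
        # Rule 2: join_field and primary_id_field are mutually exclusive.
        if self.join_field is not None and self.primary_id_field is not None:
            raise ValueError("join_field and primary_id_field cannot both be set")


def is_valid(**kwargs) -> bool:
    """Return True if the keyword arguments satisfy both rules."""
    try:
        RequestSketch(**kwargs)
    except ValueError:
        return False
    return True
```

Taken together, the two rules imply that a config using join_field can never carry a split_fraction, since that would require the primary_id_field it is forbidden from having.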
- class DataRequestDefinition
Bases: pydantic.RootModel[dict[str, DatasetGroupValue]]
Typed representation of the full data_request table.
Accepts any number of arbitrarily-named dataset groups (e.g. train, validate, infer, test, finetune, …). Each group value is a dict of friendly-named DataRequestConfig instances. A friendly name must always be provided explicitly; the schema will raise a validation error if a dataset source is specified without one.
Example (Python):
    {
        "train": {
            "my_dataset": {
                "dataset_class": "HyraxRandomDataset",
                "data_location": "/path/to/data",
                "primary_id_field": "object_id",
            }
        }
    }
Example (TOML):
    [data_request.train.my_dataset]
    dataset_class = "HyraxRandomDataset"
    data_location = "/path/to/data"
    primary_id_field = "object_id"
- classmethod normalize_all_groups(value: Any) -> dict[str, DatasetGroupValue]
Parse every top-level key into the expected group format.
- require_at_least_one_dataset() -> DataRequestDefinition
Ensure at least one dataset group is provided.
- validate_primary_id_fields() -> DataRequestDefinition
Validate that exactly one DataRequestConfig in each dataset group has a non-None primary_id_field.
This ensures that when multiple datasets are requested (e.g., a group contains a dict of multiple DataRequestConfig instances), exactly one of them specifies which field to use as the primary identifier.
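The "exactly one primary_id_field per group" rule can be sketched over plain dictionaries as follows. The function name groups_violating_primary_id_rule is hypothetical, and the sketch works on raw dicts rather than DataRequestConfig instances, so it illustrates the rule as described rather than the Hyrax implementation.

```python
def groups_violating_primary_id_rule(request: dict) -> list:
    """Return names of groups that do not have exactly one primary_id_field."""
    violating = []
    for group_name, datasets in request.items():
        # Count the datasets in this group that set a non-None primary_id_field.
        count = sum(
            1 for cfg in datasets.values()
            if cfg.get("primary_id_field") is not None
        )
        if count != 1:
            violating.append(group_name)
    return violating


request = {
    # One dataset names the primary id and the other joins against it: valid.
    "train": {
        "images": {"primary_id_field": "object_id"},
        "labels": {"join_field": "object_id"},
    },
    # No dataset names a primary id: invalid.
    "validate": {
        "images": {"join_field": "object_id"},
    },
    # Two datasets both claim the primary id: invalid.
    "infer": {
        "a": {"primary_id_field": "object_id"},
        "b": {"primary_id_field": "source_id"},
    },
}

violations = groups_violating_primary_id_rule(request)
```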
- validate_cross_group(groups: set[str]) -> None
Run cross-group split_fraction checks restricted to the specified groups.
This method is intended to be called by verb classes at instantiation time, scoped to only the dataset groups the verb actually uses (via REQUIRED_DATA_GROUPS and OPTIONAL_DATA_GROUPS). By restricting validation to active groups, configs that contain groups irrelevant to the current verb do not cause false validation failures.
- Parameters:
groups (set[str]) – Set of active group names to validate. Only configs belonging to these groups are considered.
- Raises:
ValueError – If split_fraction values for a given data_location sum to more than 1.0, or if split_fraction consistency is violated (some configs for a location set it while others do not).
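A minimal sketch of the two checks described under Raises, restricted to the active groups, might look like the following. The names check_split_fractions and fails are hypothetical, the input is a plain nested dict rather than validated models, and the small floating-point tolerance is an assumption added for the sketch.

```python
from collections import defaultdict


def check_split_fractions(request: dict, groups: set) -> None:
    """Raise ValueError if active groups violate the split_fraction rules."""
    fractions_by_location = defaultdict(list)
    for group_name, datasets in request.items():
        if group_name not in groups:
            continue  # groups the verb does not use are ignored entirely
        for cfg in datasets.values():
            fractions_by_location[cfg["data_location"]].append(
                cfg.get("split_fraction")
            )

    for location, fractions in fractions_by_location.items():
        provided = [f for f in fractions if f is not None]
        # Consistency: either every config for a location sets it, or none do.
        if provided and len(provided) != len(fractions):
            raise ValueError(f"{location}: split_fraction set on only some configs")
        # Budget: fractions drawn from one location cannot exceed the whole.
        if sum(provided) > 1.0 + 1e-9:  # tolerance is an assumption
            raise ValueError(f"{location}: split_fraction values sum to more than 1.0")


def fails(request: dict, groups: set) -> bool:
    """Return True if the check raises for the given active groups."""
    try:
        check_split_fractions(request, groups)
    except ValueError:
        return True
    return False


request = {
    "train": {"a": {"data_location": "/data/x", "split_fraction": 0.8}},
    "validate": {"b": {"data_location": "/data/x", "split_fraction": 0.3}},
    "test": {"c": {"data_location": "/data/x"}},
    "infer": {"d": {"data_location": "/data/y"}},
}
```

With this request, a verb that uses only train and infer passes, while one that uses train and validate fails on the sum and one that uses train and test fails on consistency. This is the scoping behavior the method description refers to.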
- __getitem__(key: str) -> DatasetGroupValue
Return the dataset group value for the given group name.