hyrax.config_schemas
====================

.. py:module:: hyrax.config_schemas

.. autoapi-nested-parse::

   Typed configuration schemas for Hyrax.

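   This package houses Pydantic models that describe and validate Hyrax
   configuration files. It currently exposes the base schema,
   ``BaseConfigModel``, together with the ``data_request`` schemas
   (``DataRequestConfig`` and ``DataRequestDefinition``), so that downstream
   modules and tests can adopt typed configuration incrementally.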


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hyrax/config_schemas/base/index
   /autoapi/hyrax/config_schemas/data_request/index


Classes
-------

.. autoapisummary::

   hyrax.config_schemas.BaseConfigModel
   hyrax.config_schemas.DataRequestConfig
   hyrax.config_schemas.DataRequestDefinition


Package Contents
----------------

.. py:class:: BaseConfigModel(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Base class for Hyrax configuration schemas.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: model_config

      Configuration for the model; should be a dictionary conforming to :py:class:`pydantic.config.ConfigDict`.


.. py:class:: DataRequestConfig(/, **data: Any)

   Bases: :py:obj:`hyrax.config_schemas.base.BaseConfigModel`


   Per-dataset configuration used within ``data_request``.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: dataset_class
      :type:  str
      :value: None


   .. py:attribute:: data_location
      :type:  str
      :value: None


   .. py:attribute:: fields
      :type:  list[str] | None
      :value: None


   .. py:attribute:: primary_id_field
      :type:  str | None
      :value: None


   .. py:attribute:: join_field
      :type:  str | None
      :value: None


   .. py:attribute:: dataset_config
      :type:  dict | None
      :value: None


   .. py:attribute:: augment
      :type:  bool | list[str] | None
      :value: None


   .. py:method:: resolve_data_location(v: str) -> str
      :classmethod:


      Fully resolve the ``data_location`` path by expanding user home
      directories (``~``) and converting relative paths to absolute paths.


   .. py:method:: join_field_excludes_primary() -> DataRequestConfig

      Ensure that ``join_field`` and ``primary_id_field`` are mutually exclusive.


   .. py:method:: validate_augment_list() -> DataRequestConfig

      Validate the list form of ``augment`` against ``fields`` and ``primary_id_field``.


   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Return the configuration as a plain dictionary.
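
The two field-level validators of ``DataRequestConfig`` described above, ``resolve_data_location`` and ``join_field_excludes_primary``, can be approximated in plain Python. This is a minimal, stdlib-only sketch, not Hyrax's implementation: the helper names ``resolve_location`` and ``check_join_excludes_primary`` are hypothetical, path resolution is assumed to be ``os.path.expanduser`` followed by ``os.path.abspath``, and "mutually exclusive" is assumed to mean that setting both fields is an error.

```python
import os
from typing import Optional


def resolve_location(v: str) -> str:
    """Hypothetical stand-in for resolve_data_location (assumed behavior)."""
    # expanduser turns "~/data" into "<home>/data";
    # abspath anchors relative paths at the current working directory.
    return os.path.abspath(os.path.expanduser(v))


def check_join_excludes_primary(
    primary_id_field: Optional[str], join_field: Optional[str]
) -> None:
    """Hypothetical stand-in for join_field_excludes_primary.

    ASSUMPTION: only setting both fields is rejected; setting neither is allowed.
    """
    if primary_id_field is not None and join_field is not None:
        raise ValueError(
            "join_field and primary_id_field are mutually exclusive; set only one."
        )


if __name__ == "__main__":
    print(resolve_location("~/data"))  # an absolute path under the home directory
    check_join_excludes_primary("object_id", None)  # passes silently
    try:
        check_join_excludes_primary("object_id", "object_id")
    except ValueError as err:
        print(err)
```

Inside a Pydantic validator, a ``ValueError`` like the one above is reported to the caller as a ``ValidationError``.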


.. py:class:: DataRequestDefinition

   Bases: :py:obj:`pydantic.RootModel`\ [\ :py:obj:`dict`\ [\ :py:obj:`str`\ , :py:obj:`DatasetGroupValue`\ ]\ ]


   Typed representation of the full ``data_request`` table.

   Accepts any number of arbitrarily-named dataset groups (e.g. ``train``,
   ``validate``, ``infer``, ``test``, ``finetune``, …).  Each group value is
   a ``dict`` of *friendly-named* ``DataRequestConfig`` instances. A friendly
   name must always be provided explicitly: the schema raises a validation
   error if a dataset source is specified without one.

   Example (Python)::

       {
           "train": {
               "my_dataset": {
                   "dataset_class": "HyraxRandomDataset",
                   "data_location": "/path/to/data",
                   "primary_id_field": "object_id",
               }
           }
       }

   Example (TOML)::

       [data_request.train.my_dataset]
       dataset_class = "HyraxRandomDataset"
       data_location = "/path/to/data"
       primary_id_field = "object_id"


   .. py:method:: normalize_all_groups(value: Any) -> dict[str, DatasetGroupValue]
      :classmethod:


      Parse every top-level key into the expected group format.


   .. py:method:: reject_augment_on_infer() -> DataRequestDefinition

      Reject configurations that enable augmentation on the ``infer`` data group.


   .. py:method:: require_at_least_one_dataset() -> DataRequestDefinition

      Ensure at least one dataset group is provided.


   .. py:method:: validate_primary_id_fields() -> DataRequestDefinition

      Validate that exactly one ``DataRequestConfig`` in each dataset group
      has a non-``None`` ``primary_id_field``.

      This ensures that when a group requests multiple datasets (a dict of
      several ``DataRequestConfig`` instances), exactly one of them specifies
      which field to use as the primary identifier.


   .. py:method:: validate_cross_group(groups: set[str]) -> None

      No-op: cross-group split validation is now handled by ``splitting_utils.validate_split_config``.


   .. py:method:: __contains__(key: str) -> bool

      Return True if the group name is present in the definition.


   .. py:method:: __getitem__(key: str) -> DatasetGroupValue

      Return the dataset group value for the given group name.


   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Export as a nested dictionary compatible with existing configs.

      Each group value is a dict of ``{friendly_name: flat_config_dict}``.
      No implicit ``"data"`` wrapper is added; the friendly names supplied
      by the user are preserved verbatim.
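
The model-level checks of ``DataRequestDefinition`` (``require_at_least_one_dataset``, ``validate_primary_id_fields`` and ``reject_augment_on_infer``) can likewise be sketched on the plain nested dictionary shown in its Python example. This is an illustration under stated assumptions, not the schema's code: ``check_data_request`` is a hypothetical helper, check order is arbitrary, and "augmentation enabled" is assumed to mean any truthy ``augment`` value (``True`` or a non-empty list).

```python
from typing import Any, Dict

# Plain-dict stand-in for ``data_request``:
# {group_name: {friendly_name: {field: value, ...}}}
DataRequest = Dict[str, Dict[str, Dict[str, Any]]]


def check_data_request(groups: DataRequest) -> None:
    """Hypothetical stand-in for DataRequestDefinition's model-level checks."""
    # Mirrors require_at_least_one_dataset: an empty table is rejected.
    if not groups:
        raise ValueError("data_request must define at least one dataset group.")

    for group_name, datasets in groups.items():
        # Mirrors validate_primary_id_fields: exactly one dataset per group
        # names the primary identifier.
        with_primary = [
            name
            for name, cfg in datasets.items()
            if cfg.get("primary_id_field") is not None
        ]
        if len(with_primary) != 1:
            raise ValueError(
                f"group {group_name!r}: expected exactly one dataset with a "
                f"primary_id_field, found {len(with_primary)}."
            )

        # Mirrors reject_augment_on_infer, ASSUMING any truthy ``augment``
        # value (True or a non-empty list) counts as "enabled".
        if group_name == "infer":
            for name, cfg in datasets.items():
                if cfg.get("augment"):
                    raise ValueError(
                        f"augment cannot be enabled on infer dataset {name!r}."
                    )


if __name__ == "__main__":
    example = {
        "train": {
            "my_dataset": {
                "dataset_class": "HyraxRandomDataset",
                "data_location": "/path/to/data",
                "primary_id_field": "object_id",
            }
        }
    }
    check_data_request(example)  # passes silently
```

Adding a second dataset with its own ``primary_id_field`` to the ``train`` group, or setting ``augment = true`` on a dataset in an ``infer`` group, makes the sketch raise ``ValueError``.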