hyrax.config_schemas.data_request
=================================

.. py:module:: hyrax.config_schemas.data_request

.. autoapi-nested-parse::

   Pydantic models describing the structure of the ``data_request`` configuration.

   These schemas validate and enforce the structure of dataset requests used throughout
   the Hyrax framework.



Attributes
----------

.. autoapisummary::

   hyrax.config_schemas.data_request.DatasetGroupValue


Classes
-------

.. autoapisummary::

   hyrax.config_schemas.data_request.DataRequestConfig
   hyrax.config_schemas.data_request.DataRequestDefinition


Functions
---------

.. autoapisummary::

   hyrax.config_schemas.data_request._normalize_dataset_group
   hyrax.config_schemas.data_request._iter_all_configs


Module Contents
---------------

.. py:class:: DataRequestConfig(/, **data: Any)

   Bases: :py:obj:`hyrax.config_schemas.base.BaseConfigModel`


   Per-dataset configuration used within ``data_request``.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: dataset_class
      :type:  str
      :value: None



   .. py:attribute:: data_location
      :type:  str
      :value: None



   .. py:attribute:: fields
      :type:  list[str] | None
      :value: None



   .. py:attribute:: primary_id_field
      :type:  str | None
      :value: None



   .. py:attribute:: join_field
      :type:  str | None
      :value: None



   .. py:attribute:: dataset_config
      :type:  dict | None
      :value: None



   .. py:attribute:: augment
      :type:  bool | list[str] | None
      :value: None



   .. py:method:: resolve_data_location(v: str) -> str
      :classmethod:


      Fully resolve the ``data_location`` path, expanding ``~`` to the user's
      home directory and converting relative paths to absolute paths.



   .. py:method:: join_field_excludes_primary() -> DataRequestConfig

      Ensure that ``join_field`` and ``primary_id_field`` are mutually exclusive.



   .. py:method:: validate_augment_list() -> DataRequestConfig

      Validate the list form of ``augment`` against ``fields`` and ``primary_id_field``.



   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Return the configuration as a plain dictionary.
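
The two per-dataset checks described for ``resolve_data_location`` and ``join_field_excludes_primary`` can be approximated with the standard library. This is a minimal sketch, not Hyrax's implementation: ``resolve_location`` and ``check_dataset_entry`` are hypothetical helpers that operate on a plain ``dict`` instead of a ``DataRequestConfig``, and relative paths are assumed to be resolved against the current working directory. The ``augment`` list rules are omitted because their exact semantics are not stated here.

```python
from pathlib import Path
from typing import Any


def resolve_location(v: str) -> str:
    # Expand "~" to the user's home directory, then make the path absolute.
    # Assumption: relative paths are resolved against the current working directory.
    return str(Path(v).expanduser().resolve())


def check_dataset_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper mirroring two DataRequestConfig checks on a plain dict."""
    checked = dict(entry)
    if checked.get("data_location") is not None:
        checked["data_location"] = resolve_location(checked["data_location"])
    # join_field and primary_id_field may not both be set on the same dataset.
    if checked.get("join_field") is not None and checked.get("primary_id_field") is not None:
        raise ValueError("join_field and primary_id_field are mutually exclusive")
    return checked


entry = check_dataset_entry(
    {
        "dataset_class": "HyraxRandomDataset",
        "data_location": "~/data",
        "primary_id_field": "object_id",
    }
)
print(entry["data_location"])  # an absolute path with "~" expanded
```

Raising ``ValueError`` mirrors how Pydantic validators usually signal failure; Pydantic wraps such errors in a ``ValidationError``.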



.. py:data:: DatasetGroupValue

.. py:function:: _normalize_dataset_group(value: Any) -> DatasetGroupValue

   Normalize a single dataset group value into a ``dict[str, DataRequestConfig]``.

   Every dataset source within a group must be identified by a user-supplied
   *friendly name*.  The friendly name is the key in the returned dict and is
   used by ``DataProvider`` to reference the dataset at runtime.

   .. rubric:: Accepted inputs

   - A ``dict`` whose values are ``DataRequestConfig`` instances or plain dicts
     that can be validated as one.  The keys become the friendly names.

   .. rubric:: Rejected inputs (raise ``ValueError``)

   - A flat dict that contains ``dataset_class`` at the top level (no friendly
     name wrapper).
   - A bare ``DataRequestConfig`` instance (no friendly name wrapper).
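
The accepted and rejected shapes listed for ``_normalize_dataset_group`` can be illustrated with plain dicts. In this minimal sketch ``normalize_group`` is a hypothetical stand-in: it assumes every value is already a mapping and performs no ``DataRequestConfig`` validation, and it treats any non-dict input (including a bare config object) as rejected.

```python
from typing import Any


def normalize_group(value: Any) -> dict[str, dict[str, Any]]:
    """Hypothetical stand-in for _normalize_dataset_group (plain dicts only)."""
    if not isinstance(value, dict):
        # Covers the bare-config case: anything that is not a mapping of
        # friendly names is rejected.
        raise ValueError("a dataset group must be a dict keyed by friendly name")
    if "dataset_class" in value:
        # A flat config at the top level has no friendly-name wrapper.
        raise ValueError("wrap the dataset config in a friendly name")
    normalized: dict[str, dict[str, Any]] = {}
    for friendly_name, config in value.items():
        if not isinstance(config, dict):
            raise ValueError(f"dataset {friendly_name!r} must be a mapping")
        # The key is kept verbatim as the friendly name.
        normalized[friendly_name] = dict(config)
    return normalized


print(normalize_group({"my_dataset": {"dataset_class": "HyraxRandomDataset"}}))
# {'my_dataset': {'dataset_class': 'HyraxRandomDataset'}}
```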


.. py:function:: _iter_all_configs(groups: dict[str, DatasetGroupValue]) -> list[tuple[str, DataRequestConfig]]

   Return ``(group_name, config)`` pairs for every dataset configuration across all groups.
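
Given the annotated list return type, the flattening done by ``_iter_all_configs`` can be sketched as a nested loop. ``iter_all_configs`` below is a hypothetical stand-in working on plain dicts; the friendly name is not part of each returned pair, and ordering follows dict insertion order.

```python
from typing import Any


def iter_all_configs(groups: dict[str, dict[str, Any]]) -> list[tuple[str, Any]]:
    """Hypothetical stand-in for _iter_all_configs on plain dicts."""
    pairs: list[tuple[str, Any]] = []
    for group_name, group in groups.items():
        # Friendly names (the inner keys) are dropped; only the group name is kept.
        for config in group.values():
            pairs.append((group_name, config))
    return pairs


groups = {
    "train": {"a": {"dataset_class": "X"}, "b": {"dataset_class": "Y"}},
    "infer": {"c": {"dataset_class": "Z"}},
}
print(iter_all_configs(groups))
# [('train', {'dataset_class': 'X'}), ('train', {'dataset_class': 'Y'}), ('infer', {'dataset_class': 'Z'})]
```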


.. py:class:: DataRequestDefinition

   Bases: :py:obj:`pydantic.RootModel`\ [\ :py:obj:`dict`\ [\ :py:obj:`str`\ , :py:obj:`DatasetGroupValue`\ ]\ ]


   Typed representation of the full ``data_request`` table.

   Accepts any number of arbitrarily-named dataset groups (e.g. ``train``,
   ``validate``, ``infer``, ``test``, ``finetune``, …).  Each group value is
   a ``dict`` of *friendly-named* ``DataRequestConfig`` instances.  A friendly
   name must always be provided explicitly; the schema raises a validation
   error if a dataset source is specified without one.

   Example (Python)::

       {
           "train": {
               "my_dataset": {
                   "dataset_class": "HyraxRandomDataset",
                   "data_location": "/path/to/data",
                   "primary_id_field": "object_id",
               }
           }
       }

   Example (TOML)::

       [data_request.train.my_dataset]
       dataset_class = "HyraxRandomDataset"
       data_location = "/path/to/data"
       primary_id_field = "object_id"


   .. py:method:: normalize_all_groups(value: Any) -> dict[str, DatasetGroupValue]
      :classmethod:


      Parse every top-level key into the expected group format.



   .. py:method:: reject_augment_on_infer() -> DataRequestDefinition

      Reject configurations that enable augmentation on the ``infer`` data group.



   .. py:method:: require_at_least_one_dataset() -> DataRequestDefinition

      Ensure at least one dataset group is provided.



   .. py:method:: validate_primary_id_fields() -> DataRequestDefinition

      Validate that exactly one ``DataRequestConfig`` in each dataset group
      has a ``primary_id_field`` that is not ``None``.

      This ensures that when a group requests several datasets (a dict of
      multiple ``DataRequestConfig`` instances), exactly one of them specifies
      which field to use as the primary identifier.



   .. py:method:: validate_cross_group(groups: set[str]) -> None

      No-op: cross-group split validation is now handled by ``splitting_utils.validate_split_config``.



   .. py:method:: __contains__(key: str) -> bool

      Return ``True`` if the group name is present in the definition.



   .. py:method:: __getitem__(key: str) -> DatasetGroupValue

      Return the dataset group value for the given group name.



   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Export as a nested dictionary compatible with existing configs.

      Each group value is a dict of ``{friendly_name: flat_config_dict}``.
      No implicit ``"data"`` wrapper is added; the friendly names supplied
      by the user are preserved verbatim.



