hyrax.config_schemas.data_request
=================================

.. py:module:: hyrax.config_schemas.data_request

.. autoapi-nested-parse::

   Pydantic models describing the structure of the ``data_request`` configuration.

   These schemas validate and enforce the structure of dataset requests used throughout
   the Hyrax framework.



Attributes
----------

.. autoapisummary::

   hyrax.config_schemas.data_request.DatasetGroupValue


Classes
-------

.. autoapisummary::

   hyrax.config_schemas.data_request.DataRequestConfig
   hyrax.config_schemas.data_request.DataRequestDefinition


Functions
---------

.. autoapisummary::

   hyrax.config_schemas.data_request._normalize_dataset_group
   hyrax.config_schemas.data_request._iter_all_configs


Module Contents
---------------

.. py:class:: DataRequestConfig(/, **data: Any)

   Bases: :py:obj:`hyrax.config_schemas.base.BaseConfigModel`


   Per-dataset configuration used within ``data_request``.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: dataset_class
      :type:  str
      :value: None



   .. py:attribute:: data_location
      :type:  str
      :value: None



   .. py:attribute:: fields
      :type:  list[str] | None
      :value: None



   .. py:attribute:: primary_id_field
      :type:  str | None
      :value: None



   .. py:attribute:: split_fraction
      :type:  float | None
      :value: None



   .. py:attribute:: join_field
      :type:  str | None
      :value: None



   .. py:attribute:: dataset_config
      :type:  dict | None
      :value: None



   .. py:method:: resolve_data_location(v: str) -> str
      :classmethod:


      Fully resolve the ``data_location`` path, expanding user home directories
      and converting relative paths to absolute paths.
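
      The resolution described above can be sketched with the standard library
      alone. This is a minimal illustration, not the validator's actual code:
      ``resolve_location`` is a hypothetical helper, and using ``os.path`` (rather
      than, say, ``pathlib``) is an assumption.

      .. code-block:: python

         import os

         def resolve_location(path: str) -> str:
             """Hypothetical stand-in: expand ``~`` and make the path absolute."""
             # expanduser handles "~" and "~user"; abspath anchors relative
             # paths at the current working directory and normalizes them.
             return os.path.abspath(os.path.expanduser(path))

         resolved = resolve_location("~/data/catalog")
         relative = resolve_location("data")

      Both results are absolute paths; ``relative`` is anchored at the current
      working directory.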



   .. py:method:: require_primary_id_for_split_fraction() -> DataRequestConfig

      Ensure that ``split_fraction`` is only set when ``primary_id_field`` is also provided.



   .. py:method:: join_field_excludes_primary() -> DataRequestConfig

      Ensure that ``join_field`` and ``primary_id_field`` are mutually exclusive.
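
      The two per-dataset rules (``split_fraction`` requires ``primary_id_field``;
      ``join_field`` and ``primary_id_field`` cannot both be set) can be sketched on
      plain dictionaries. ``check_field_rules`` and its messages are hypothetical;
      the real validators operate on the model instance and raise a validation
      error instead of returning a list.

      .. code-block:: python

         from typing import Any

         def check_field_rules(cfg: dict[str, Any]) -> list[str]:
             """Return the rule violations found in one per-dataset config."""
             problems = []
             primary = cfg.get("primary_id_field")
             # Rule 1: split_fraction may only be set alongside primary_id_field.
             if cfg.get("split_fraction") is not None and primary is None:
                 problems.append("split_fraction requires primary_id_field")
             # Rule 2: join_field and primary_id_field are mutually exclusive.
             if cfg.get("join_field") is not None and primary is not None:
                 problems.append("join_field and primary_id_field are mutually exclusive")
             return problems

         ok = check_field_rules({"primary_id_field": "object_id", "split_fraction": 0.8})
         bad = check_field_rules({"join_field": "object_id", "split_fraction": 0.8})

      Here ``ok`` is empty, while ``bad`` reports the missing ``primary_id_field``.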



   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Return the configuration as a plain dictionary.



.. py:data:: DatasetGroupValue

.. py:function:: _normalize_dataset_group(value: Any) -> DatasetGroupValue

   Normalize a single dataset group value into a ``dict[str, DataRequestConfig]``.

   Every dataset source within a group must be identified by a user-supplied
   *friendly name*.  The friendly name is the key in the returned dict and is
   used by ``DataProvider`` to reference the dataset at runtime.

   .. rubric:: Accepted inputs

   - A ``dict`` whose values are ``DataRequestConfig`` instances or plain dicts
     that can be validated as one.  The keys become the friendly names.

   .. rubric:: Rejected inputs (raise ``ValueError``)

   - A flat dict that contains ``dataset_class`` at the top level (no friendly
     name wrapper).
   - A bare ``DataRequestConfig`` instance (no friendly name wrapper).
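
   The accept/reject rules above can be sketched as follows. This is an
   illustration under stated assumptions, not the module's implementation:
   ``normalize_group`` is a hypothetical name, plain dicts stand in for
   ``DataRequestConfig``, and the error messages are invented.

   .. code-block:: python

      from typing import Any

      def normalize_group(value: Any) -> dict[str, dict[str, Any]]:
          """Stand-in normalizer: returns ``{friendly_name: config_dict}``."""
          # Anything that is not a dict (for example a bare config object)
          # has no friendly-name wrapper and is rejected.
          if not isinstance(value, dict):
              raise ValueError("a dataset group must be a dict keyed by friendly name")
          # A flat config dict placed directly at the group level is rejected too.
          if "dataset_class" in value:
              raise ValueError("dataset source given without a friendly name")
          return {name: dict(cfg) for name, cfg in value.items()}

      group = normalize_group(
          {"my_dataset": {"dataset_class": "HyraxRandomDataset", "data_location": "/path/to/data"}}
      )

      try:
          normalize_group({"dataset_class": "HyraxRandomDataset", "data_location": "/path/to/data"})
          rejected = False
      except ValueError:
          rejected = True

   The first call keeps ``my_dataset`` as the friendly name; the second is
   rejected because ``dataset_class`` appears at the top level.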


.. py:function:: _iter_all_configs(groups: dict[str, DatasetGroupValue]) -> list[tuple[str, DataRequestConfig]]

   Return a list of ``(group_name, config)`` pairs across all groups.
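
   A minimal sketch of this flattening, assuming each group is a dict of
   friendly-named configs (plain dicts here) and that the friendly names are
   dropped from the result. ``iter_all_configs`` is a hypothetical stand-in for
   the private helper.

   .. code-block:: python

      from typing import Any

      def iter_all_configs(
          groups: dict[str, dict[str, dict[str, Any]]],
      ) -> list[tuple[str, dict[str, Any]]]:
          """Pair every config with the name of the group that contains it."""
          return [
              (group_name, cfg)
              for group_name, members in groups.items()
              for cfg in members.values()
          ]

      groups = {
          "train": {
              "a": {"dataset_class": "HyraxRandomDataset"},
              "b": {"dataset_class": "HyraxRandomDataset"},
          },
          "infer": {"c": {"dataset_class": "HyraxRandomDataset"}},
      }
      pairs = iter_all_configs(groups)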


.. py:class:: DataRequestDefinition

   Bases: :py:obj:`pydantic.RootModel`\ [\ :py:obj:`dict`\ [\ :py:obj:`str`\ , :py:obj:`DatasetGroupValue`\ ]\ ]


   Typed representation of the full ``data_request`` table.

   Accepts any number of arbitrarily-named dataset groups (e.g. ``train``,
   ``validate``, ``infer``, ``test``, ``finetune``, …).  Each group value is
   a ``dict`` of *friendly-named* ``DataRequestConfig`` instances.  A friendly
   name must always be provided explicitly — the schema will raise a validation
   error if a dataset source is specified without one.

   Example (Python)::

       {
           "train": {
               "my_dataset": {
                   "dataset_class": "HyraxRandomDataset",
                   "data_location": "/path/to/data",
                   "primary_id_field": "object_id",
               }
           }
       }

   Example (TOML)::

       [data_request.train.my_dataset]
       dataset_class = "HyraxRandomDataset"
       data_location = "/path/to/data"
       primary_id_field = "object_id"
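
   The two examples describe the same structure. As a quick check, the TOML
   form can be parsed with the standard-library ``tomllib`` module (Python 3.11
   or later) and compared with the Python form. This sketch does not call Hyrax
   itself; how Hyrax loads its configuration file is not shown here.

   .. code-block:: python

      import tomllib  # standard library from Python 3.11 onward

      toml_text = """
      [data_request.train.my_dataset]
      dataset_class = "HyraxRandomDataset"
      data_location = "/path/to/data"
      primary_id_field = "object_id"
      """

      # The dotted table header expands into nested dicts:
      # data_request -> group name -> friendly name -> per-dataset fields.
      data_request = tomllib.loads(toml_text)["data_request"]

      python_form = {
          "train": {
              "my_dataset": {
                  "dataset_class": "HyraxRandomDataset",
                  "data_location": "/path/to/data",
                  "primary_id_field": "object_id",
              }
          }
      }
      same_structure = data_request == python_form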


   .. py:method:: normalize_all_groups(value: Any) -> dict[str, DatasetGroupValue]
      :classmethod:


      Parse every top-level key into the expected group format.



   .. py:method:: require_at_least_one_dataset() -> DataRequestDefinition

      Ensure at least one dataset group is provided.



   .. py:method:: validate_primary_id_fields() -> DataRequestDefinition

      Validate that exactly one ``DataRequestConfig`` in each dataset group
      has a non-None ``primary_id_field``.

      This ensures that when multiple datasets are requested (e.g., a group
      contains a dict of multiple ``DataRequestConfig`` instances), exactly
      one of them specifies which field to use as the primary identifier.
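
      The "exactly one per group" rule can be sketched on plain dictionaries.
      ``check_primary_ids`` and its error message are hypothetical; the actual
      validator works on ``DataRequestConfig`` instances.

      .. code-block:: python

         from typing import Any

         def check_primary_ids(groups: dict[str, dict[str, dict[str, Any]]]) -> None:
             """Raise ValueError unless each group has exactly one primary_id_field."""
             for group_name, members in groups.items():
                 with_primary = [
                     name
                     for name, cfg in members.items()
                     if cfg.get("primary_id_field") is not None
                 ]
                 if len(with_primary) != 1:
                     raise ValueError(
                         f"group {group_name!r} has {len(with_primary)} datasets "
                         "with primary_id_field; expected exactly 1"
                     )

         valid = {
             "train": {
                 "images": {"primary_id_field": "object_id"},
                 "catalog": {"join_field": "object_id"},
             }
         }
         check_primary_ids(valid)  # passes: only "images" names a primary id

         invalid = {
             "train": {
                 "images": {"primary_id_field": "object_id"},
                 "catalog": {"primary_id_field": "object_id"},
             }
         }
         try:
             check_primary_ids(invalid)
             failed = False
         except ValueError:
             failed = True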



   .. py:method:: validate_cross_group(groups: set[str]) -> None

      Run cross-group ``split_fraction`` checks restricted to the specified groups.

      This method is intended to be called by verb classes at instantiation time,
      scoped to only the dataset groups the verb actually uses (via
      ``REQUIRED_DATA_GROUPS`` and ``OPTIONAL_DATA_GROUPS``).  By restricting
      validation to active groups, configs that contain groups irrelevant to the
      current verb do not cause false validation failures.

      :param groups: Set of active group names to validate.  Only configs belonging to
                     these groups are considered.
      :type groups: set[str]

      :raises ValueError: If ``split_fraction`` values for a given ``data_location`` sum to more
          than 1.0, or if ``split_fraction`` consistency is violated (some configs
          for a location set it while others do not).
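
      The scoped check can be sketched as below, using plain dictionaries in
      place of ``DataRequestConfig``. ``check_split_fractions``, the small
      floating-point tolerance, and the error messages are assumptions made for
      illustration; only the two conditions (sum above 1.0, inconsistent use per
      ``data_location``) come from the description above.

      .. code-block:: python

         from collections import defaultdict
         from typing import Any, Optional

         def check_split_fractions(
             groups: dict[str, dict[str, dict[str, Any]]], active: set[str]
         ) -> None:
             """Check split_fraction per data_location, looking only at active groups."""
             by_location: dict[str, list[Optional[float]]] = defaultdict(list)
             for group_name, members in groups.items():
                 if group_name not in active:
                     continue  # groups the verb does not use are ignored
                 for cfg in members.values():
                     by_location[cfg["data_location"]].append(cfg.get("split_fraction"))

             for location, fractions in by_location.items():
                 set_values = [f for f in fractions if f is not None]
                 # Consistency: either every config for a location sets it, or none does.
                 if set_values and len(set_values) != len(fractions):
                     raise ValueError(f"inconsistent split_fraction for {location}")
                 # The tolerance guards against float rounding (an assumption).
                 if sum(set_values) > 1.0 + 1e-9:
                     raise ValueError(f"split_fraction for {location} sums to more than 1.0")

         groups = {
             "train": {"d": {"data_location": "/path/to/data", "split_fraction": 0.8}},
             "validate": {"d": {"data_location": "/path/to/data", "split_fraction": 0.2}},
             "infer": {"d": {"data_location": "/path/to/data", "split_fraction": 0.5}},
         }

         check_split_fractions(groups, {"train", "validate"})  # passes

         try:
             check_split_fractions(groups, {"train", "validate", "infer"})
             over_limit = False
         except ValueError:
             over_limit = True

      Restricting the check to ``train`` and ``validate`` passes, while including
      ``infer`` pushes the total for the shared location above 1.0.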



   .. py:method:: __contains__(key: str) -> bool

      Return True if the group name is present in the definition.



   .. py:method:: __getitem__(key: str) -> DatasetGroupValue

      Return the dataset group value for the given group name.



   .. py:method:: as_dict(*, exclude_unset: bool = False) -> dict[str, Any]

      Export as a nested dictionary compatible with existing configs.

      Each group value is a dict of ``{friendly_name: flat_config_dict}``.
      No implicit ``"data"`` wrapper is added — the friendly names supplied
      by the user are preserved verbatim.



