hyrax.datasets.dataset_registry
===============================

.. py:module:: hyrax.datasets.dataset_registry


Attributes
----------

.. autoapisummary::

   hyrax.datasets.dataset_registry.logger
   hyrax.datasets.dataset_registry.DATASET_REGISTRY


Classes
-------

.. autoapisummary::

   hyrax.datasets.dataset_registry.HyraxDataset
   hyrax.datasets.dataset_registry.HyraxImageDataset


Functions
---------

.. autoapisummary::

   hyrax.datasets.dataset_registry.fetch_dataset_class


Module Contents
---------------

.. py:data:: logger

.. py:data:: DATASET_REGISTRY
   :type:  dict[str, type[HyraxDataset]]

.. py:class:: HyraxDataset(config: dict, metadata_table=None, object_id_column_name=None)

   How to make a hyrax dataset:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset

       class MyDataset(HyraxDataset):
           def __init__(self, config: dict):
               super().__init__(config)

           def __len__(self):
               # Your len function goes here
               pass

   Optional interfaces:

   ``metadata`` -> Subclasses may pass an astropy table of metadata to ``__init__`` in the
   superclass. This table of metadata will be available through the ``metadata_fields`` and
   ``metadata`` functions.  If desired, a subclass may override these functions directly
   rather than using the astropy Table interface.
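
   As a rough sketch of the override route, the stand-in class below serves both functions from
   a NumPy structured array instead of an astropy Table. The class name, the catalog columns, and
   the choice to raise ``RuntimeError`` for unknown fields are assumptions made for illustration;
   a real dataset would also subclass ``HyraxDataset``.

   .. code-block:: python

       import numpy as np


       class CatalogBackedMetadata:
           """Hypothetical stand-in that serves metadata from a structured array."""

           def __init__(self, catalog: np.ndarray):
               # One row per data item, in the order the dataset enumerates its data.
               self._catalog = catalog

           def metadata_fields(self) -> list[str]:
               return list(self._catalog.dtype.names)

           def metadata(self, idxs, fields: list[str]) -> np.ndarray:
               unknown = [f for f in fields if f not in self._catalog.dtype.names]
               if unknown:
                   # Assumption of this sketch: reject any unknown field.
                   raise RuntimeError(f"Unknown metadata fields: {unknown}")
               # Select the rows first, then keep only the requested columns.
               rows = self._catalog[np.asarray(idxs, dtype=np.intp)]
               return rows[fields]


       catalog = np.array(
           [(0, 10.0), (1, 20.0), (2, 30.0)],
           dtype=[("object_id", "i8"), ("ra", "f8")],
       )
       source = CatalogBackedMetadata(catalog)
       subset = source.metadata([2, 0], ["ra"])

   Here ``subset`` is a structured array holding only the ``ra`` column, with rows in the order
   requested by ``idxs``.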

   Further documentation is in the :doc:`/pre_executed/external_dataset_class` example notebook.


   .. py:method:: __init__

   Overall initialization for all Datasets which saves the config

   Subclasses of HyraxDataset should call this at the end of their ``__init__``, like so:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset

       class MyDataset(HyraxDataset):
           def __init__(self, config: dict):
               # <your code>
               super().__init__(config)

   If per-tensor metadata is available, dataset authors are encouraged to create an astropy
   Table of that data, in the same order as their data, and pass it as ``metadata_table``
   as shown below:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset
       from astropy.table import Table

       class MyDataset(HyraxDataset):
           def __init__(self, config: dict):
               # <your code>
               metadata_table = Table(<Your catalog data goes here>)
               super().__init__(config, metadata_table)

   :param config: The runtime configuration for hyrax
   :type config: dict
   :param metadata_table: An astropy Table that (1) contains the metadata columns desired for
                          visualization and (2) lists its rows in the order your data will be enumerated.
   :type metadata_table: Optional[Table], optional
   :param object_id_column_name: The name of the column containing object IDs. If None, uses the default
                                 from config or creates one from the ids() method.
   :type object_id_column_name: Optional[str], optional


   .. py:attribute:: _config


   .. py:attribute:: _metadata_table
      :value: None



   .. py:property:: config


   .. py:method:: __init_subclass__()
      :classmethod:



   .. py:method:: metadata_fields() -> list[str]

      Returns a list of metadata fields supported by this object

      :returns: The column names of the metadata table passed. An empty list if no metadata was provided
                during construction of the HyraxDataset (or derived class).
      :rtype: list[str]



   .. py:method:: augment_cache_key(idx: int, rng_seed: numpy.int64) -> numpy.int64 | None

      Return a cache key for augmented data, or None to skip caching.

      Base (non-augmented) data is always cached by index. This method is
      only called when augmentation is active, to decide whether the
      augmented result should also be cached. The default returns ``None``
      (augmented data is regenerated each access), which is the standard
      expectation in ML training.

      Override this when augmented results are deterministic and expensive
      to recompute.

      :param idx: The dataset-local index.
      :type idx: int
      :param rng_seed: The rng_seed passed to ``augment_<field>`` methods.
      :type rng_seed: np.int64

      :returns: Cache key, or ``None`` to skip caching augmented data.
      :rtype: np.int64 | None
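
      A minimal sketch of such an override, assuming the augmented result depends only on
      ``idx`` and ``rng_seed``. The mixin name and the bit-packing scheme are hypothetical
      choices for this example, not part of hyrax. The key packs both values into one
      64-bit integer and falls back to ``None`` when that cannot be done without collisions.

      .. code-block:: python

          from __future__ import annotations

          import numpy as np


          class PackedCacheKeyMixin:
              """Hypothetical mixin; combine with a HyraxDataset subclass."""

              def augment_cache_key(self, idx: int, rng_seed: np.int64) -> np.int64 | None:
                  idx, seed = int(idx), int(rng_seed)
                  # Only cache when (idx, seed) fits in 31 + 32 bits, so keys stay unique.
                  if not (0 <= idx < 2**31 and 0 <= seed < 2**32):
                      return None
                  return np.int64((idx << 32) | seed)


          key = PackedCacheKeyMixin().augment_cache_key(3, np.int64(7))
          skipped = PackedCacheKeyMixin().augment_cache_key(3, np.int64(2**40))

      Returning ``None`` for out-of-range inputs keeps the default behavior (regenerate on each
      access) rather than risking two different augmentations sharing a key.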



   .. py:method:: on_epoch_start(verb: str)

      Called at the beginning of each epoch (or once for single-pass verbs).

      Override in subclasses to respond to epoch-level lifecycle events.

      :param verb: Name of the verb that is running, e.g. ``"train"``, ``"infer"``,
                   ``"test"``, or ``"engine"``.
      :type verb: str
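
      As an illustration, the stand-in class below counts epochs per verb and clears some
      per-epoch scratch state when a training epoch begins. The class and its attributes are
      hypothetical; a real dataset would subclass ``HyraxDataset`` and override the method
      in the same way.

      .. code-block:: python

          from collections import Counter


          class EpochAwareDataset:
              """Hypothetical stand-in; a real dataset would subclass HyraxDataset."""

              def __init__(self):
                  self.epochs_started = Counter()
                  self.epoch_scratch = {}

              def on_epoch_start(self, verb: str):
                  # Track how many epochs each verb has started.
                  self.epochs_started[verb] += 1
                  # Drop scratch state so each training epoch starts clean.
                  if verb == "train":
                      self.epoch_scratch.clear()


          dataset = EpochAwareDataset()
          dataset.epoch_scratch["stale"] = 1
          dataset.on_epoch_start("train")
          dataset.on_epoch_start("train")
          dataset.on_epoch_start("infer")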



   .. py:method:: metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) -> numpy.typing.ArrayLike

      Returns a table representing the metadata given an array of indexes and a list of fields.

      :param idxs: The indexes of the relevant tensor objects
      :type idxs: npt.ArrayLike
      :param fields: The names of the fields you would like returned. All values must be among those returned by
                     metadata_fields()
      :type fields: list[str]

      :returns: A numpy record array of your metadata, with only the columns specified.
                Roughly equivalent to: ``metadata_table[idxs][fields].as_array()`` where ``metadata_table`` is the
                astropy table that the HyraxDataset (or derived class) was constructed with.
      :rtype: npt.ArrayLike

      :raises RuntimeError: When none of the provided fields are among those returned by ``metadata_fields()``.



.. py:function:: fetch_dataset_class(class_name: str) -> type[HyraxDataset]

   Fetch the dataset class from the registry.

   :param class_name: The name of the dataset class to fetch. Either the class name of a built-in
                      dataset, or the fully qualified name of a user-defined dataset,
                      e.g. ``"my_module.my_submodule.MyDatasetClass"`` or ``"HyraxRandomDataset"``.
   :type class_name: str

   :returns: The dataset class.
   :rtype: type[HyraxDataset]

   :raises ValueError: If a built-in dataset was requested, but not found in the registry.
   :raises ValueError: If no dataset was specified in the runtime configuration.
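
   The lookup described above can be pictured with a small, self-contained sketch. This is
   not hyrax's implementation: ``REGISTRY``, ``BaseDataset``, ``DemoDataset``, and
   ``fetch_class`` are hypothetical names, and resolving dotted names with ``importlib`` is an
   assumption about how a fully qualified name might be handled.

   .. code-block:: python

       import importlib

       REGISTRY: dict[str, type] = {}


       class BaseDataset:
           """Hypothetical base class that records every subclass by name."""

           def __init_subclass__(cls, **kwargs):
               super().__init_subclass__(**kwargs)
               REGISTRY[cls.__name__] = cls


       class DemoDataset(BaseDataset):
           pass


       def fetch_class(class_name: str) -> type:
           if not class_name:
               raise ValueError("No dataset class name was given.")
           if "." in class_name:
               # Fully qualified name: import the module, then look up the class.
               module_name, _, attr = class_name.rpartition(".")
               return getattr(importlib.import_module(module_name), attr)
           if class_name not in REGISTRY:
               raise ValueError(f"Dataset {class_name!r} is not in the registry.")
           return REGISTRY[class_name]

   In this sketch, ``fetch_class("DemoDataset")`` returns the registered class, while a dotted
   name such as ``"collections.OrderedDict"`` is imported on demand.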


.. py:class:: HyraxImageDataset

   A mixin for image datasets that provides utility functions allowing derived classes to set
   and apply transformations based on the config.

   The various ``set_*_transform`` functions push individual transformations onto a single stack.

   The stack can be applied with ``apply_transform``.
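
   The stacking behavior can be sketched with a simplified stand-in that works on NumPy
   arrays. ``TransformStack`` is hypothetical, and its setters take their arguments directly,
   whereas the mixin's methods are described as being driven by the config; the center-crop
   logic is likewise an assumption made for illustration.

   .. code-block:: python

       import numpy as np


       class TransformStack:
           """Hypothetical stand-in for a stack of transformations."""

           def __init__(self):
               self._transforms = []

           def _update_transform(self, new_transform):
               # Later transforms run after earlier ones.
               self._transforms.append(new_transform)

           def set_function_transform(self, func):
               self._update_transform(func)

           def set_crop_transform(self, cutout_shape):
               height, width = cutout_shape

               def center_crop(image):
                   top = (image.shape[-2] - height) // 2
                   left = (image.shape[-1] - width) // 2
                   return image[..., top : top + height, left : left + width]

               self._update_transform(center_crop)

           def apply_transform(self, data):
               for transform in self._transforms:
                   data = transform(data)
               return data


       stack = TransformStack()
       stack.set_crop_transform((2, 2))
       stack.set_function_transform(np.tanh)
       image = np.arange(16, dtype=float).reshape(1, 4, 4)
       result = stack.apply_transform(image)

   The crop runs first and ``np.tanh`` second, so ``result`` has shape ``(1, 2, 2)``.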


   .. py:method:: set_function_transform()


   .. py:method:: set_crop_transform(cutout_shape=None)


   .. py:method:: apply_transform(data_torch)


   .. py:method:: _update_transform(new_transform)


   .. py:method:: _get_np_function(transform_str: str) -> collections.abc.Callable[Ellipsis, Any]

      Returns the numpy mathematical function that the supplied string maps to,
      or raises an error if the supplied string cannot be mapped to a function.

      :param transform_str: The string to be mapped to a numpy function
      :type transform_str: str
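
      One plausible way to perform such a mapping is an attribute lookup on the ``numpy``
      module, sketched below. The helper name ``get_np_function`` and the choice of
      ``RuntimeError`` are assumptions for this example; the method's actual lookup rules and
      error type are not specified here.

      .. code-block:: python

          from collections.abc import Callable
          from typing import Any

          import numpy as np


          def get_np_function(transform_str: str) -> Callable[..., Any]:
              # Look the name up on the numpy module; reject missing or non-callable names.
              func = getattr(np, transform_str, None)
              if not callable(func):
                  raise RuntimeError(f"{transform_str!r} is not a callable NumPy function")
              return func


          tanh = get_np_function("tanh")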



