hyrax.data_sets.data_set_registry
=================================

.. py:module:: hyrax.data_sets.data_set_registry


Attributes
----------

.. autoapisummary::

   hyrax.data_sets.data_set_registry.logger
   hyrax.data_sets.data_set_registry.DATASET_REGISTRY


Classes
-------

.. autoapisummary::

   hyrax.data_sets.data_set_registry.HyraxDataset
   hyrax.data_sets.data_set_registry.HyraxImageDataset


Functions
---------

.. autoapisummary::

   hyrax.data_sets.data_set_registry.fetch_dataset_class
   hyrax.data_sets.data_set_registry.iterable_dataset_collate


Module Contents
---------------

.. py:data:: logger

.. py:data:: DATASET_REGISTRY
   :type:  dict[str, type[HyraxDataset]]

.. py:class:: HyraxDataset(config: dict, metadata_table=None, object_id_column_name=None)

   How to make a hyrax dataset:

   .. code-block:: python

       from hyrax.data_sets import HyraxDataset
       from torch.utils.data import Dataset

       class MyDataset(HyraxDataset, Dataset):
           def __init__(self, config: dict):
               super().__init__(config)

           def __getitem__(self, idx):
               # Your getitem goes here
               pass

           def __len__(self):
               # Your len function goes here
               pass

   Optional interfaces:

   ``ids()`` -> Subclasses may override this method directly with their own implementation
   returning a generator of strings.

   ``metadata`` -> Subclasses may pass an astropy Table of metadata to the superclass ``__init__``.
   This metadata will be available through the ``metadata_fields()`` and ``metadata()`` methods.
   If desired, a subclass may override these methods directly rather than using the astropy
   Table interface.
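
   The sketch below shows these interfaces overridden directly, without an astropy Table. It is a
   minimal, dependency-free illustration: ``ToyDataset`` and its list of filenames are hypothetical,
   and a real dataset would also inherit from ``HyraxDataset`` and ``Dataset`` and call
   ``super().__init__(config)``. Returning a NumPy record array from ``metadata()`` is chosen to
   match the documented return type of the default implementation.

   .. code-block:: python

       from collections.abc import Generator

       import numpy as np


       class ToyDataset:
           """Hypothetical stand-in; a real dataset also inherits HyraxDataset and Dataset."""

           def __init__(self, filenames: list[str]):
               # One data item per file, in enumeration order.
               self._filenames = filenames

           def ids(self) -> Generator[str, None, None]:
               # Yield one string ID per item, in the same order as the data.
               for name in self._filenames:
                   yield name.rsplit(".", 1)[0]

           def metadata_fields(self) -> list[str]:
               # Only the filename is available as metadata in this sketch.
               return ["filename"]

           def metadata(self, idxs, fields: list[str]) -> np.ndarray:
               # Build one record per requested index, then keep only the requested columns.
               rows = [(self._filenames[i],) for i in idxs]
               table = np.array(rows, dtype=[("filename", "U64")])
               return table[fields]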

   Further documentation is in the :doc:`/pre_executed/custom_dataset` example notebook.


   .. py:method:: __init__(config: dict, metadata_table=None, object_id_column_name=None)

      Overall initialization for all datasets, which saves the runtime config.

      Subclasses of ``HyraxDataset`` should call this at the end of their ``__init__``, for example:

      .. code-block:: python

          from hyrax.data_sets import HyraxDataset
          from torch.utils.data import Dataset

          class MyDataset(HyraxDataset, Dataset):
              def __init__(self, config: dict):
                  # Your initialization code goes here
                  super().__init__(config)

      If per-tensor metadata is available, it is recommended that dataset authors create an
      astropy Table of that data, with rows in the same order as their data, and pass it as
      ``metadata_table`` as shown below:

      .. code-block:: python

          from hyrax.data_sets import HyraxDataset
          from torch.utils.data import Dataset
          from astropy.table import Table

          class MyDataset(HyraxDataset, Dataset):
              def __init__(self, config: dict):
                  # Your initialization code goes here, including loading catalog_data
                  # (a placeholder): one row per item, in the same order as your data.
                  metadata_table = Table(catalog_data)
                  super().__init__(config, metadata_table)

      :param config: The runtime configuration for hyrax
      :type config: dict
      :param metadata_table: An Astropy Table containing the metadata columns desired for
                             visualization, with rows in the order your data will be enumerated.
      :type metadata_table: Optional[Table], optional
      :param object_id_column_name: The name of the column containing object IDs. If None, uses the
                                    default from config or creates one from the ``ids()`` method.
      :type object_id_column_name: Optional[str], optional


   .. py:attribute:: _config


   .. py:attribute:: _metadata_table
      :value: None



   .. py:attribute:: tensorboardx_logger
      :value: None



   .. py:method:: is_iterable()
      :classmethod:


      Returns ``True`` if the underlying dataset is iterable-style, where ``__iter__`` is the
      preferred access method, as opposed to map-style, where ``__getitem__``/``__len__`` are
      the preferred access methods.

      :returns: True if underlying dataset is iterable
      :rtype: bool



   .. py:method:: is_map()
      :classmethod:


      Returns ``True`` if the underlying dataset is map-style, where ``__getitem__``/``__len__``
      are the preferred access methods, as opposed to iterable-style, where ``__iter__`` is the
      preferred access method.

      :returns: True if underlying dataset is map-style
      :rtype: bool
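
      The two access styles can be illustrated with plain-Python stand-ins. This is a sketch only:
      ``MapStyle``, ``IterStyle`` and ``access_style`` are hypothetical, and classifying a class by
      which special methods it defines is an assumption made for illustration, not necessarily how
      ``is_iterable()`` and ``is_map()`` decide.

      .. code-block:: python

          class MapStyle:
              """Map-style: random access by index, with a known length."""

              def __init__(self, items):
                  self._items = list(items)

              def __getitem__(self, idx):
                  return self._items[idx]

              def __len__(self):
                  return len(self._items)


          class IterStyle:
              """Iterable-style: items can only be streamed in order."""

              def __init__(self, items):
                  self._items = list(items)

              def __iter__(self):
                  return iter(self._items)


          def access_style(cls) -> str:
              # Hypothetical heuristic: check for __iter__ first, then __getitem__/__len__.
              if hasattr(cls, "__iter__"):
                  return "iterable"
              if hasattr(cls, "__getitem__") and hasattr(cls, "__len__"):
                  return "map"
              raise TypeError(f"{cls.__name__} supports neither access style")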



   .. py:property:: config


   .. py:method:: __init_subclass__()
      :classmethod:



   .. py:method:: ids() -> collections.abc.Generator[str]

      Default ``ids()`` method inherited by classes derived from ``HyraxDataset``.

      :returns: A generator yielding all the string IDs of the dataset.
      :rtype: Generator[str]



   .. py:method:: sample_data() -> dict

      Get a sample from the dataset. This is a convenience function that returns
      the first sample from the dataset, regardless of whether it is iterable
      or map-style. Often this will be used to instantiate a model that adjusts
      its form based on the shape of the data.
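
      A minimal sketch of the "first sample" logic, assuming the caller already knows the access
      style (for example from ``is_iterable()``). ``first_sample`` is a hypothetical helper, not
      hyrax's implementation.

      .. code-block:: python

          from typing import Any


          def first_sample(dataset: Any, iterable_style: bool) -> Any:
              """Return the first sample of a map-style or iterable-style dataset."""
              if iterable_style:
                  # Iterable-style: open a fresh iterator and take its first item.
                  return next(iter(dataset))
              # Map-style: index 0 is the first sample.
              return dataset[0]

      For a one-shot iterator such as a generator this consumes the first item; an object that
      returns a fresh iterator from ``__iter__`` is unaffected.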



   .. py:method:: metadata_fields() -> list[str]

      Returns a list of metadata fields supported by this object.

      :returns: The column names of the metadata table passed at construction, or an empty list if no
                metadata was provided during construction of the HyraxDataset (or derived class).
      :rtype: list[str]



   .. py:method:: metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) -> numpy.typing.ArrayLike

      Returns the metadata for the given array of indexes, restricted to the given list of fields.

      :param idxs: The indexes of the relevant tensor objects
      :type idxs: npt.ArrayLike
      :param fields: The names of the fields you would like returned. All values must be among those
                     returned by ``metadata_fields()``.
      :type fields: list[str]

      :returns: A numpy record array of your metadata, with only the columns specified.
                Roughly equivalent to ``metadata_table[idxs][fields].as_array()``, where
                ``metadata_table`` is the astropy Table that the HyraxDataset (or derived class)
                was constructed with.
      :rtype: npt.ArrayLike

      :raises RuntimeError: When none of the provided fields are among those returned by
                            ``metadata_fields()``.
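
      The "rows first, then columns" selection can be sketched with a plain NumPy record array
      standing in for the astropy Table (``Table.as_array()`` likewise produces a structured NumPy
      array). The column names and values below are made up for illustration.

      .. code-block:: python

          import numpy as np

          # Hypothetical per-item metadata: one row per data item, in enumeration order.
          table = np.array(
              [("obj_a", 10.1, 0.3), ("obj_b", 10.2, 0.5), ("obj_c", 10.3, 0.7)],
              dtype=[("object_id", "U8"), ("ra", "f8"), ("redshift", "f8")],
          )

          idxs = np.array([2, 0])
          fields = ["object_id", "redshift"]

          # Select the requested rows (in the order given), then keep only the requested columns.
          subset = table[idxs][fields]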



.. py:function:: fetch_dataset_class(class_name: str) -> type[HyraxDataset]

   Fetch the dataset class from the registry.

   :param class_name: The name of the dataset class to fetch. Either the class name of a built-in
                      dataset, or the fully qualified name of a user-defined dataset,
                      e.g. ``"my_module.my_submodule.MyDatasetClass"`` or ``"HyraxRandomDataset"``.
   :type class_name: str

   :returns: The dataset class.
   :rtype: type[HyraxDataset]

   :raises ValueError: If a built-in dataset was requested but not found in the registry.
   :raises ValueError: If no dataset was specified in the runtime configuration.
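
   A minimal sketch of how a name-to-class registry and lookup of this kind can work. It assumes,
   without confirmation from this page, that subclasses are registered by class name in
   ``__init_subclass__`` and that a name containing a dot is treated as an importable path.
   ``BaseDataset``, ``REGISTRY``, ``RandomDataset`` and ``fetch_class`` are hypothetical stand-ins
   for ``HyraxDataset``, ``DATASET_REGISTRY``, a built-in dataset and ``fetch_dataset_class``.

   .. code-block:: python

       import importlib

       # Hypothetical stand-in for DATASET_REGISTRY: class name -> class.
       REGISTRY: dict[str, type] = {}


       class BaseDataset:
           def __init_subclass__(cls, **kwargs):
               super().__init_subclass__(**kwargs)
               # Register every subclass under its class name when it is defined.
               REGISTRY[cls.__name__] = cls


       class RandomDataset(BaseDataset):
           pass


       def fetch_class(class_name: str) -> type:
           if not class_name:
               raise ValueError("No dataset class name was given.")
           if "." in class_name:
               # Fully qualified name: import the module, then look up the class in it.
               module_name, _, attr = class_name.rpartition(".")
               return getattr(importlib.import_module(module_name), attr)
           if class_name not in REGISTRY:
               raise ValueError(f"Built-in dataset {class_name!r} not found in the registry.")
           return REGISTRY[class_name]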


.. py:class:: HyraxImageDataset

   This is a mixin for image datasets, primarily concerned with providing utility functions that
   allow derived classes to set and apply transformations based on the config.

   The various ``set_*_transform`` methods push individual transformations onto a single stack,
   which can be applied with ``apply_transform``.
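
   A minimal sketch of the set-then-apply pattern using plain NumPy, assuming transformations are
   applied in the order they were added. ``TransformStack`` and ``center_crop`` are hypothetical;
   the crop and the NumPy function loosely mirror ``set_crop_transform`` and
   ``set_function_transform``, whose real config keys and tensor types are not shown here.

   .. code-block:: python

       from collections.abc import Callable

       import numpy as np


       class TransformStack:
           """Hypothetical stand-in for the stacking behaviour described above."""

           def __init__(self):
               self._transforms: list[Callable[[np.ndarray], np.ndarray]] = []

           def add(self, fn: Callable[[np.ndarray], np.ndarray]) -> None:
               # Push one transformation onto the stack.
               self._transforms.append(fn)

           def apply(self, data: np.ndarray) -> np.ndarray:
               # Apply every stacked transformation, oldest first.
               for fn in self._transforms:
                   data = fn(data)
               return data


       def center_crop(shape: tuple[int, int]) -> Callable[[np.ndarray], np.ndarray]:
           # Crop the last two axes to `shape`, keeping the center of the image.
           h, w = shape

           def crop(img: np.ndarray) -> np.ndarray:
               top = (img.shape[-2] - h) // 2
               left = (img.shape[-1] - w) // 2
               return img[..., top:top + h, left:left + w]

           return crop


       stack = TransformStack()
       stack.add(center_crop((2, 2)))
       stack.add(np.arcsinh)
       out = stack.apply(np.arange(16, dtype=float).reshape(4, 4))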


   .. py:method:: set_function_transform()


   .. py:method:: set_crop_transform(cutout_shape=None)


   .. py:method:: apply_transform(data_torch)


   .. py:method:: _update_transform(new_transform)


   .. py:method:: _get_np_function(transform_str: str) -> collections.abc.Callable[..., Any]

      Returns the numpy mathematical function that the supplied string maps to, or raises an
      error if the supplied string cannot be mapped to a function.

      :param transform_str: The string to be mapped to a numpy function
      :type transform_str: str
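
      A minimal sketch of mapping a string to a NumPy function, assuming a plain attribute lookup
      on the ``numpy`` module. ``get_np_function`` is hypothetical, ``ValueError`` is an assumed
      choice of exception, and the sketch only checks that the name is callable, not that it is
      a mathematical function.

      .. code-block:: python

          from collections.abc import Callable
          from typing import Any

          import numpy as np


          def get_np_function(transform_str: str) -> Callable[..., Any]:
              # Look the name up on the numpy module, e.g. "arcsinh" -> np.arcsinh.
              func = getattr(np, transform_str, None)
              if not callable(func):
                  raise ValueError(f"{transform_str!r} does not name a numpy function")
              return func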



.. py:function:: iterable_dataset_collate(batch: list[dict]) -> dict

   Collate function for iterable datasets, since they do not work with the ``DataProvider``
   default collate function.

   Enable with ``h.config["data_loader"]["collate_fn"] = "hyrax.data_sets.iterable_dataset_collate"``.

   :param batch: The batch of data dictionaries returned from the iterable dataset
   :type batch: list[dict]

   :returns: Dict where each non-dict value is a ``np.array`` of items, ready for further hyrax processing.
   :rtype: dict

   :raises RuntimeError: If internal dictionary logic fails. This usually means an error in the structure of the input
       dictionary.
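
   A minimal sketch of this kind of collation, written from the description above rather than from
   the hyrax source: values are grouped by key across the batch, nested dictionaries are collated
   recursively, and every other value becomes a ``np.array``. ``collate_dicts`` is a hypothetical
   name, and raising ``RuntimeError`` when a batch item lacks a key is an assumed check.

   .. code-block:: python

       import numpy as np


       def collate_dicts(batch: list[dict]) -> dict:
           if not batch:
               return {}
           out = {}
           for key in batch[0]:
               values = []
               for item in batch:
                   if key not in item:
                       raise RuntimeError(f"Key {key!r} missing from a batch item")
                   values.append(item[key])
               if isinstance(values[0], dict):
                   # Nested dictionaries are collated recursively, key by key.
                   out[key] = collate_dicts(values)
               else:
                   # Every other value becomes an array with the batch along axis 0.
                   out[key] = np.array(values)
           return out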


