hyrax.datasets.hyrax_csv_dataset
================================

.. py:module:: hyrax.datasets.hyrax_csv_dataset


Classes
-------

.. autoapisummary::

   hyrax.datasets.hyrax_csv_dataset.HyraxCSVDataset


Module Contents
---------------

.. py:class:: HyraxCSVDataset(config: dict, data_location: pathlib.Path = None)

   Bases: :py:obj:`hyrax.datasets.dataset_registry.HyraxDataset`


   A Hyrax Dataset for CSV files.

   This class reads a CSV file using pandas with memory mapping enabled.
   It dynamically creates getter methods for each column in the CSV file,
   allowing users to request data from specific columns.

   .. note::

      Column names found in the CSV file are used to create the getter methods.
      If a column name contains characters that are invalid for method names,
      those characters are replaced with underscores.

   .. rubric:: Examples

   Example data_request configuration::

       {
           "train": {
               "data": {
                   "dataset_class": "HyraxCSVDataset",
                   "data_location": "</path/to/data.csv>",
                   "fields": ["<column1>", "<column2>", ...],
                   "primary_id_field": "<column name that contains a unique ID>",
               },
           },
           "validate": { "<similar to above>" },
           "infer": { "<similar to above>" },
       }

   .. py:method:: __init__

   Overall initialization for all Datasets which saves the config

   Subclasses of HyraxDataset ought call this at the end of their __init__ like:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset

       class MyDataset(HyraxDataset):
           def __init__(config):
               <your code>
               super().__init__(config)

   If per tensor metadata is available, it is recommended that dataset authors create an
   astropy Table of that data, in the same order as their data and pass that `metadata_table`
   as shown below:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset
       from astropy.table import Table

       class MyDataset(HyraxDataset):
           def __init__(config):
               <your code>
               metadata_table = Table(<Your catalog data goes here>)
               super().__init__(config, metadata_table)

   :param config: The runtime configuration for hyrax
   :type config: dict, Optional
   :param metadata_table: An Astropy Table with
                          1. the metadata columns desired for visualization AND
                          2. in the order your data will be enumerated.
   :type metadata_table: Optional[Table], optional
   :param object_id_column_name: The name of the column containing object IDs. If None, uses the default
                                 from config or creates one from the ids() method.
   :type object_id_column_name: Optional[str], optional


   .. py:attribute:: data_location
      :value: None


   .. py:attribute:: column_names


   .. py:attribute:: mem_mapped_csv
      :value: None


   .. py:method:: __getitem__(idx)

      Currently required by Hyrax machinery, but likely to be phased out.


   .. py:method:: __len__() -> int

      Return the number of records in the CSV.


   .. py:method:: sample_data()

      Return the first record, in dictionary form, as the sample.