hyrax.data_sets.hyrax_csv_dataset
Classes
HyraxCSVDataset: A Hyrax Dataset for CSV files.
Module Contents
- class HyraxCSVDataset(config: dict, data_location: pathlib.Path = None)
Bases: hyrax.data_sets.data_set_registry.HyraxDataset
A Hyrax Dataset for CSV files.
This class reads a CSV file using pandas with memory mapping enabled. It dynamically creates getter methods for each column in the CSV file, allowing users to request data from specific columns.
Note
Column names found in the CSV file are used to create the getter methods. If a column name contains characters that are invalid for method names, those characters are replaced with underscores.
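The getter behaviour described above can be illustrated with a minimal sketch. This is not the Hyrax implementation: `TinyCSVDataset`, `sanitize_column_name`, and the `get_` prefix are hypothetical names chosen for illustration, and the standard-library `csv` module stands in for pandas so that the example has no dependencies.

```python
import csv
import io
import re


def sanitize_column_name(name: str) -> str:
    """Replace characters that are invalid in a method name with underscores."""
    return re.sub(r"\W", "_", name)


class TinyCSVDataset:
    """Hypothetical stand-in for a CSV-backed dataset; not the Hyrax class."""

    def __init__(self, csv_text: str):
        reader = csv.DictReader(io.StringIO(csv_text))
        self.rows = list(reader)
        for column in reader.fieldnames or []:
            # Bind the column through a default argument so that each getter
            # keeps its own column instead of the loop's final value.
            def getter(idx, _column=column):
                return self.rows[idx][_column]

            # Assumed naming scheme: get_<sanitized column name>.
            setattr(self, f"get_{sanitize_column_name(column)}", getter)


dataset = TinyCSVDataset("object id,flux-g,ra\n17,0.25,150.1\n42,0.75,150.2\n")
print(dataset.get_object_id(0))  # value from the "object id" column, row 0
print(dataset.get_flux_g(1))     # value from the "flux-g" column, row 1
```

Here the columns "object id" and "flux-g" become `get_object_id` and `get_flux_g`, since the space and the hyphen are not valid in a method name. Values come back as strings because the `csv` module does no type inference, unlike pandas.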
Examples
Example model_inputs configuration:
    {
        "train": {
            "data": {
                "dataset_class": "HyraxCSVDataset",
                "data_location": "</path/to/data.csv>",
                "fields": ["<column1>", "<column2>", ...],
                "primary_id_field": "<column name that contains a unique ID>",
            },
        },
        "validate": { "<similar to above>" },
        "infer": { "<similar to above>" },
    }
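As an illustration, the template above might be filled in as follows. This is only a sketch: the helper `make_split`, the file paths, and the column names are all hypothetical, and how the resulting dictionary is handed to Hyrax is not shown here.

```python
import json


def make_split(data_location: str, fields: list, primary_id_field: str) -> dict:
    """Build one split entry in the shape shown in the template (hypothetical helper)."""
    return {
        "data": {
            "dataset_class": "HyraxCSVDataset",
            "data_location": data_location,
            "fields": fields,
            "primary_id_field": primary_id_field,
        },
    }


# Hypothetical paths and column names; substitute your own.
columns = ["flux_g", "flux_r"]
model_inputs = {
    "train": make_split("/path/to/train.csv", columns, "object_id"),
    "validate": make_split("/path/to/validate.csv", columns, "object_id"),
    "infer": make_split("/path/to/infer.csv", columns, "object_id"),
}

print(json.dumps(model_inputs, indent=2))
```

Building each split through a single helper keeps the three entries structurally identical, so only the data location differs between them.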
Overall initialization for all datasets, which saves the config.
Subclasses of HyraxDataset ought to call this at the end of their __init__, like:
    from hyrax.data_sets import HyraxDataset
    from torch.utils.data import Dataset

    class MyDataset(HyraxDataset, Dataset):
        def __init__(self, config):
            <your code>
            super().__init__(config)
If per-tensor metadata is available, it is recommended that dataset authors create an astropy Table of that metadata, in the same order as their data, and pass it as metadata_table as shown below:
    from hyrax.data_sets import HyraxDataset
    from torch.utils.data import Dataset
    from astropy.table import Table

    class MyDataset(HyraxDataset, Dataset):
        def __init__(self, config):
            <your code>
            metadata_table = Table(<Your catalog data goes here>)
            super().__init__(config, metadata_table)
- Parameters:
config (dict, Optional) – The runtime configuration for hyrax
metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the order your data will be enumerated.
object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.