hyrax.data_sets.hyrax_csv_dataset

Classes

HyraxCSVDataset

A Hyrax Dataset for CSV files.

Module Contents

class HyraxCSVDataset(config: dict, data_location: pathlib.Path = None)

Bases: hyrax.data_sets.data_set_registry.HyraxDataset

A Hyrax Dataset for CSV files.

This class reads a CSV file using pandas with memory mapping enabled. It dynamically creates getter methods for each column in the CSV file, allowing users to request data from specific columns.
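The memory-mapped read described above can be sketched with pandas directly. This is a minimal illustration, not the Hyrax implementation: the helper `load_csv_memory_mapped`, the `demo` function, and the example file contents are hypothetical. Only `pandas.read_csv` and its `memory_map` parameter come from pandas itself.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_memory_mapped(csv_path: Path) -> pd.DataFrame:
    """Read a CSV with pandas, mapping the file into memory while parsing.

    Hypothetical helper for illustration; Hyrax's own loading code may differ.
    """
    return pd.read_csv(csv_path, memory_map=True)


def demo() -> pd.DataFrame:
    """Write a tiny two-row CSV to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = Path(tmp_dir) / "example.csv"
        csv_path.write_text("object_id,flux\n1,0.5\n2,0.7\n")
        return load_csv_memory_mapped(csv_path)
```

Calling `demo()` returns a DataFrame whose columns are `object_id` and `flux`. With `memory_map=True`, pandas maps the file onto memory and parses it from there, which can reduce I/O overhead on large files.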

Note

Column names found in the CSV file are used to create the getter methods. If a column name contains characters that are invalid for method names, those characters are replaced with underscores.
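The behavior in the note above can be approximated as follows. This is a sketch under stated assumptions, not the class's actual code: the `get_` prefix, the `sanitize_column_name` helper, and the `ColumnGetters` container are all hypothetical, and the exact set of characters Hyrax treats as invalid is not specified here.

```python
import re


def sanitize_column_name(column_name: str) -> str:
    """Replace every non-word character (anything but letters, digits, "_") with "_".

    Assumed rule for illustration only.
    """
    return re.sub(r"\W", "_", column_name)


class ColumnGetters:
    """Toy container that exposes one getter method per column."""

    def __init__(self, columns: dict):
        self._columns = columns
        for name in columns:
            # Hypothetical naming scheme: "get_" plus the sanitized column name.
            method_name = f"get_{sanitize_column_name(name)}"
            # Bind `name` as a default argument so each getter keeps its own column.
            setattr(
                self,
                method_name,
                lambda idx, _name=name: self._columns[_name][idx],
            )
```

For example, `ColumnGetters({"object-id": [1, 2]})` gains a `get_object_id(idx)` method, because the hyphen is not valid in a Python method name and is replaced with an underscore.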

Examples

Example model_inputs configuration:

{
    "train": {
        "data": {
            "dataset_class": "HyraxCSVDataset",
            "data_location": "</path/to/data.csv>",
            "fields": ["<column1>", "<column2>", ...],
            "primary_id_field": "<column name that contains a unique ID>",
        },
    },
    "validate": { "<similar to above>" },
    "infer": { "<similar to above>" },
}

__init__()

Overall initialization for all datasets, which saves the config.

Subclasses of HyraxDataset should call this at the end of their __init__, like so:

from hyrax.data_sets import HyraxDataset
from torch.utils.data import Dataset

class MyDataset(HyraxDataset, Dataset):
    def __init__(self, config):
        # <your code>
        super().__init__(config)

If per-tensor metadata is available, dataset authors are encouraged to create an astropy Table of that metadata, in the same order as their data, and pass it as metadata_table, as shown below:

from hyrax.data_sets import HyraxDataset
from torch.utils.data import Dataset
from astropy.table import Table

class MyDataset(HyraxDataset, Dataset):
    def __init__(self, config):
        # <your code>
        metadata_table = Table(<Your catalog data goes here>)
        super().__init__(config, metadata_table)

Parameters:

config (dict, optional) – The runtime configuration for Hyrax.
metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the same order in which your data will be enumerated.
object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.

data_location = None

column_names

mem_mapped_csv = None

__getitem__(idx): Currently required by Hyrax machinery, but likely to be phased out.

__len__() → int: Return the number of records in the CSV.

sample_data(): Return the first record, in dictionary form, as the sample.

classmethod is_map() → bool: Boilerplate method to indicate this is a map-style dataset.
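To make the map-style contract concrete, here is a self-contained toy that mirrors the four methods listed above using only the standard library. `ToyCSVDataset` is hypothetical and is not the Hyrax class: it parses CSV text with `csv.DictReader` rather than pandas, and the return type of `__getitem__` is an assumption.

```python
import csv
import io


class ToyCSVDataset:
    """Hypothetical map-style dataset over CSV text; not the Hyrax class."""

    def __init__(self, csv_text: str):
        # Parse every row into a dict keyed by column name (values stay strings).
        self._rows = list(csv.DictReader(io.StringIO(csv_text)))

    def __len__(self) -> int:
        # Number of records, excluding the header row.
        return len(self._rows)

    def __getitem__(self, idx: int) -> dict:
        # Map-style access: any record can be fetched directly by integer index.
        return self._rows[idx]

    def sample_data(self) -> dict:
        # The first record, in dictionary form, serves as the sample.
        return self[0]

    @classmethod
    def is_map(cls) -> bool:
        # Signals that the dataset supports len() and random access by index.
        return True
```

A map-style dataset, in the PyTorch sense, supports `len(dataset)` and `dataset[idx]`, which is what lets a sampler visit records in arbitrary order.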