hyrax.data_sets.random.hyrax_random_dataset

Attributes

INVALID_VALUES

Mapping of string representation of invalid values to numpy representations.

Classes

HyraxRandomDatasetBase

This is the base class for the random datasets provided by Hyrax.

HyraxRandomDataset

This dataset is a stand-in for a map-style dataset.

HyraxRandomIterableDataset

This dataset is a stand-in for an iterable-style, or streaming, dataset.

Module Contents

INVALID_VALUES[source]

Mapping of string representation of invalid values to numpy representations.
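
As a rough illustration of what such a mapping could look like, the sketch below pairs the string names accepted by invalid_value_type with NumPy values. The dictionary contents, the name EXAMPLE_INVALID_VALUES, and the resolve_invalid_value helper are assumptions made for illustration; they are not the module's actual definition.

```python
import numpy as np

# Hypothetical mapping; the real INVALID_VALUES may contain different entries.
EXAMPLE_INVALID_VALUES = {
    "nan": np.nan,
    "inf": np.inf,
    "-inf": -np.inf,
    "none": None,
}


def resolve_invalid_value(value_type):
    """Return the value for a named invalid type, or the input cast to float."""
    if isinstance(value_type, str) and value_type in EXAMPLE_INVALID_VALUES:
        return EXAMPLE_INVALID_VALUES[value_type]
    # Anything else is treated as a plain float sentinel such as -999.0.
    return float(value_type)
```

Under these assumptions, a named type resolves through the dictionary, while a numeric sentinel is passed through as a float.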

class HyraxRandomDatasetBase(config, data_location)[source]

This is the base class for the random datasets provided by Hyrax.

Warning

Direct use of HyraxRandomDatasetBase is not advised. When working with Hyrax, prefer to use HyraxRandomDataset or HyraxRandomIterableDataset.

__init__(config, data_location)[source]

Initialize the dataset using the parameters defined in the configuration.

The data_location parameter is included for API consistency with other dataset classes, but it is not used by this implementation. All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:

  • size: The number of random data samples to produce.

  • shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) = 3 layers of 2D data, each layer is 29x29 elements).

  • seed: The random seed to use for reproducibility.

  • provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.

  • metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.

  • number_invalid_values: The number of invalid values to insert into the data.

  • invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.
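
The interplay of size, shape, seed, and number_invalid_values can be sketched with plain NumPy. This is a minimal sketch, not the Hyrax implementation: the function name make_random_data is hypothetical, and it assumes that the invalid values are scattered across the whole array rather than per sample.

```python
import numpy as np


def make_random_data(size, shape, seed, number_invalid_values=0, invalid_value=np.nan):
    """Create `size` random samples of `shape`, then overwrite some entries."""
    # A seeded generator makes the output reproducible for a given seed.
    rng = np.random.default_rng(seed)
    data = rng.random((size, *shape), dtype=np.float32)

    if number_invalid_values > 0:
        # Assumption: positions are drawn without replacement over the full
        # array, so exactly `number_invalid_values` elements are replaced.
        flat_positions = rng.choice(data.size, size=number_invalid_values, replace=False)
        data.flat[flat_positions] = invalid_value
    return data


# 4 samples, each with 3 layers of 29x29 elements, 5 of which become NaN.
data = make_random_data(size=4, shape=(3, 29, 29), seed=42, number_invalid_values=5)
```

Calling the function twice with the same seed yields identical arrays, which is the property the seed key is meant to provide.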

data: numpy.ndarray[source]

The random data samples produced by the dataset.

id_list: list[source]

A list of sequential numeric IDs for each data sample.

provided_labels: list[source]

A list of labels randomly selected from the provided list of possible labels.

data_location[source]

get_image(idx: int) → numpy.ndarray[source]

Get the image at the given index as a NumPy array.

get_label(idx: int) → str[source]

Get the label at the given index.

get_object_id(idx: int) → str[source]

Get the index of the item.

class HyraxRandomDataset(config, data_location)[source]

Bases: HyraxRandomDatasetBase, hyrax.data_sets.data_set_registry.HyraxDataset, torch.utils.data.Dataset

This dataset is a stand-in for a map-style dataset. It produces random numpy arrays along with sequential numeric IDs and, optionally, labels randomly selected from the provided list of possible labels.

__init__(config, data_location)[source]

Initialize the dataset using the parameters defined in the configuration.

The data_location parameter is included for API consistency with other dataset classes, but it is not used by this implementation. All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:

  • size: The number of random data samples to produce.

  • shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) = 3 layers of 2D data, each layer is 29x29 elements).

  • seed: The random seed to use for reproducibility.

  • provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.

  • metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.

  • number_invalid_values: The number of invalid values to insert into the data.

  • invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.
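
For orientation, the keys above can be pictured as a nested dictionary. Only the key names and the ["data_set"]["HyraxRandomDataset"] path come from this page; every value below is a made-up example, and how Hyrax actually loads its configuration is not shown here.

```python
# Hypothetical values; only the key names are taken from the list above.
config = {
    "data_set": {
        "HyraxRandomDataset": {
            "size": 100,
            "shape": (3, 29, 29),
            "seed": 42,
            "provided_labels": ["cat", "dog"],
            "metadata_fields": ["ra", "dec"],
            "number_invalid_values": 0,
            "invalid_value_type": "nan",
        }
    }
}

# The dataset settings live under this table.
random_dataset_settings = config["data_set"]["HyraxRandomDataset"]
```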

__getitem__(idx: int) → dict[source]

Get a data sample by index. The returned dictionary will contain the following keys:

  • index: The index of the data sample.

  • object_id: The ID of the data sample.

  • image: The data sample as a numpy array.

  • label: The label of the data sample (if provided).

Parameters:

idx (int) – The index of the data sample to retrieve.

Returns:

A dictionary containing the data sample and its metadata.

Return type:

dict
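
The shape of the returned dictionary can be illustrated with a small self-contained class. TinyRandomMapDataset is a hypothetical stand-in written for this example, not the Hyrax class; it assumes that object_id is the string form of the sequential ID and that the label key is omitted when no labels were configured.

```python
import numpy as np


class TinyRandomMapDataset:
    """Hypothetical map-style stand-in that mimics the documented keys."""

    def __init__(self, size, shape, seed, provided_labels=None):
        rng = np.random.default_rng(seed)
        self.data = rng.random((size, *shape))
        self.id_list = list(range(size))
        # Draw one label per sample only when a label list was supplied.
        self.provided_labels = (
            list(rng.choice(provided_labels, size=size)) if provided_labels else None
        )

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = {
            "index": idx,
            "object_id": str(self.id_list[idx]),
            "image": self.data[idx],
        }
        if self.provided_labels is not None:
            sample["label"] = str(self.provided_labels[idx])
        return sample


dataset = TinyRandomMapDataset(size=5, shape=(2, 3), seed=0, provided_labels=["a", "b"])
item = dataset[2]
```

Because the class defines both __len__ and __getitem__, it follows the map-style protocol: any sample can be fetched directly by its index.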

__len__()[source]

Get the total number of samples in this dataset. This should return the same value as the size parameter in the configuration.

ids()[source]

This function yields IDs for the dataset. It can be used as an iterable in a loop, or converted to a list by wrapping the function call in list(...).
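
The two usage patterns described above look like this with any generator. The ids function below is a hypothetical free function standing in for the method, and yielding the IDs as strings is an assumption.

```python
def ids(id_list):
    """Yield each ID as a string, one at a time."""
    for object_id in id_list:
        yield str(object_id)


# Pattern 1: consume the generator lazily in a loop.
for object_id in ids(range(3)):
    print(object_id)

# Pattern 2: materialize every ID at once.
all_ids = list(ids(range(3)))
```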

class HyraxRandomIterableDataset(config, data_location)[source]

Bases: HyraxRandomDatasetBase, hyrax.data_sets.data_set_registry.HyraxDataset, torch.utils.data.IterableDataset

This dataset is a stand-in for an iterable-style, or streaming, dataset. It produces random numpy arrays and, optionally, labels randomly selected from the provided list of possible labels.

Note

While ids will be generated automatically for this dataset, calling the ids method of this dataset will return the index instead of the id.

__init__(config, data_location)[source]

Initialize the dataset using the parameters defined in the configuration.

The data_location parameter is included for API consistency with other dataset classes, but it is not used by this implementation. All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:

  • size: The number of random data samples to produce.

  • shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) = 3 layers of 2D data, each layer is 29x29 elements).

  • seed: The random seed to use for reproducibility.

  • provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.

  • metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.

  • number_invalid_values: The number of invalid values to insert into the data.

  • invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.

__iter__()[source]

Yield the next data sample. The returned dictionary will have the following form:

  • data: A dictionary containing the following keys:

      – index: The index of the data sample.

      – object_id: The value will be the same as index for this dataset.

      – image: The data sample as a numpy array.

      – label: The label of the data sample (if provided).

Returns:

A dictionary containing a data sample and its metadata.

Return type:

dict
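
The streaming behaviour can be sketched with a plain generator. The function iterate_random_samples is hypothetical and does not reproduce the Hyrax class; it only mirrors the documented layout, in which each yielded dictionary nests the sample under a data key and object_id equals index.

```python
import numpy as np


def iterate_random_samples(size, shape, seed, provided_labels=None):
    """Yield `size` samples one at a time instead of indexing into storage."""
    rng = np.random.default_rng(seed)
    for index in range(size):
        sample = {
            "index": index,
            "object_id": index,  # same value as index, as documented
            "image": rng.random(shape),
        }
        if provided_labels:
            sample["label"] = str(rng.choice(provided_labels))
        yield {"data": sample}


stream = iterate_random_samples(size=3, shape=(2, 2), seed=1, provided_labels=["a", "b"])
first = next(stream)
```

Unlike a map-style dataset, a stream of this kind offers no random access: samples are only available in the order in which they are yielded.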