hyrax.data_sets.random.hyrax_random_dataset
Attributes
INVALID_VALUES: Mapping of string representation of invalid values to numpy representations.
Classes
HyraxRandomDatasetBase: This is the base class for the random datasets provided by Hyrax.
HyraxRandomDataset: This dataset is a stand-in for a map-style dataset.
HyraxRandomIterableDataset: This dataset is a stand-in for an iterable-style, or streaming, dataset.
Module Contents
- INVALID_VALUES[source]
Mapping of string representation of invalid values to numpy representations.
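As a rough illustration of what such a mapping could look like, the sketch below pairs the string names documented for invalid_value_type with numpy values and falls back to float parsing for anything else. The names EXAMPLE_INVALID_VALUES and resolve_invalid_value are hypothetical, and the actual contents of INVALID_VALUES may differ.

```python
import numpy as np

# Hypothetical mapping for illustration; the real INVALID_VALUES may differ.
EXAMPLE_INVALID_VALUES = {
    "nan": np.nan,
    "inf": np.inf,
    "-inf": -np.inf,
    "none": None,
}


def resolve_invalid_value(name):
    """Return the value for a named invalid type, or parse the input as a float."""
    if isinstance(name, str) and name.lower() in EXAMPLE_INVALID_VALUES:
        return EXAMPLE_INVALID_VALUES[name.lower()]
    # Anything that is not a known name is treated as a plain float value.
    return float(name)
```

Under this sketch, a configuration value such as "-inf" resolves to negative infinity, while a value such as "-999.0" resolves to the float -999.0.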
- class HyraxRandomDatasetBase(config, data_location)[source]
This is the base class for the random datasets provided by Hyrax.
Warning
Direct use of HyraxRandomDatasetBase is not advised. When working with Hyrax, prefer HyraxRandomDataset or HyraxRandomIterableDataset.
Initialize the dataset using the parameters defined in the configuration.
- Parameters:
data_location – Parameter included for API consistency with other dataset classes, though not used by this implementation.
All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:
- size: The number of random data samples to produce.
- shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) means 3 layers of 2D data, each layer 29x29 elements).
- seed: The random seed to use for reproducibility.
- provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.
- metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.
- number_invalid_values: The number of invalid values to insert into the data.
- invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.
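To make the configuration keys above concrete, here is a minimal sketch that assumes the configuration behaves like a nested dictionary. Only the key names come from the documentation; every value shown (sizes, labels, field names) is a made-up example, and how Hyrax actually loads its configuration is not covered here.

```python
# Hypothetical values; only the key names are taken from the documentation.
config = {
    "data_set": {
        "HyraxRandomDataset": {
            "size": 100,                        # number of random samples
            "shape": (3, 29, 29),               # 3 layers of 29x29 elements
            "seed": 42,                         # for reproducibility
            "provided_labels": ["cat", "dog"],  # labels to pick from at random
            "metadata_fields": ["ra", "dec"],   # numeric metadata columns
            "number_invalid_values": 0,         # no invalid values inserted
            "invalid_value_type": "nan",
        }
    }
}

# Look up the table that the random datasets are documented to read from.
random_settings = config["data_set"]["HyraxRandomDataset"]
n_layers, height, width = random_settings["shape"]
```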
- class HyraxRandomDataset(config, data_location)[source]
Bases: HyraxRandomDatasetBase, hyrax.data_sets.data_set_registry.HyraxDataset, torch.utils.data.Dataset
This dataset is a stand-in for a map-style dataset. It will produce random numpy arrays along with sequential numeric ids and, optionally, labels randomly selected from the provided list of possible labels.
Initialize the dataset using the parameters defined in the configuration.
- Parameters:
data_location – Parameter included for API consistency with other dataset classes, though not used by this implementation.
All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:
- size: The number of random data samples to produce.
- shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) means 3 layers of 2D data, each layer 29x29 elements).
- seed: The random seed to use for reproducibility.
- provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.
- metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.
- number_invalid_values: The number of invalid values to insert into the data.
- invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.
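The effect of number_invalid_values and invalid_value_type can be pictured with the numpy sketch below. It is an assumption-laden illustration, not Hyrax's code: it overwrites a fixed number of randomly chosen elements in one array with NaN, whereas the documentation does not say whether Hyrax counts invalid values per sample or across the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# One random array standing in for generated data (the shape is arbitrary).
sample = rng.random((3, 4, 4))
number_invalid_values = 5

# Pick distinct flat positions, then overwrite them in place with NaN.
flat_positions = rng.choice(sample.size, size=number_invalid_values, replace=False)
np.put(sample, flat_positions, np.nan)

invalid_count = int(np.isnan(sample).sum())
```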
- __getitem__(idx: int) dict[source]
Get a data sample by index. The returned dictionary will contain the following keys:
- index: The index of the data sample.
- object_id: The ID of the data sample.
- image: The data sample as a numpy array.
- label: The label of the data sample (if provided).
- Parameters:
idx (int) – The index of the data sample to retrieve.
- Returns:
A dictionary containing the data sample and its metadata.
- Return type:
dict
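The map-style behavior described above can be sketched with a small standalone class. RandomMapStyleSketch is a hypothetical name and is not the Hyrax implementation; it only mirrors the documented dictionary keys and assumes sequential integer ids.

```python
import numpy as np


class RandomMapStyleSketch:
    """Hypothetical stand-in that mimics the documented __getitem__ output."""

    def __init__(self, size, shape, seed, provided_labels=None):
        rng = np.random.default_rng(seed)
        # Generate all samples up front so any index can be served at random.
        self.data = rng.random((size, *shape), dtype=np.float32)
        self.ids = list(range(size))  # sequential numeric ids (assumed)
        self.labels = (
            rng.choice(provided_labels, size=size).tolist() if provided_labels else None
        )

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = {"index": idx, "object_id": self.ids[idx], "image": self.data[idx]}
        # The label key is only present when labels were provided.
        if self.labels is not None:
            item["label"] = self.labels[idx]
        return item
```

Because the generator is seeded, two instances built with the same arguments return identical arrays for the same index, which is the property the seed key is documented to provide.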
- class HyraxRandomIterableDataset(config, data_location)[source]
Bases: HyraxRandomDatasetBase, hyrax.data_sets.data_set_registry.HyraxDataset, torch.utils.data.IterableDataset
This dataset is a stand-in for an iterable-style, or streaming, dataset. It will produce random numpy arrays and, optionally, labels randomly selected from the provided list of possible labels.
Note
While ids will be generated automatically for this dataset, calling the ids method of this dataset will return the index instead of the id.
Initialize the dataset using the parameters defined in the configuration.
- Parameters:
data_location – Parameter included for API consistency with other dataset classes, though not used by this implementation.
All parameters are controlled by the following keys under the ["data_set"]["HyraxRandomDataset"] table in the configuration:
- size: The number of random data samples to produce.
- shape: The shape of each random data sample as a tuple (e.g. (3, 29, 29) means 3 layers of 2D data, each layer 29x29 elements).
- seed: The random seed to use for reproducibility.
- provided_labels: A list of possible labels to randomly select from. If this is provided, the dataset will randomly select a label for each data sample.
- metadata_fields: A list of metadata field names. Used to create a metadata table with columns corresponding to each field name. All data is numeric.
- number_invalid_values: The number of invalid values to insert into the data.
- invalid_value_type: The type of invalid value to insert into the data. Valid values are “nan”, “inf”, “-inf”, “none”, or a float value.
- __iter__()[source]
Yield the next data sample. The returned dictionary will have the following form:
- data: A dictionary containing the following keys:
  - index: The index of the data sample.
  - object_id: The value will be the same as index for this dataset.
  - image: The data sample as a numpy array.
  - label: The label of the data sample (if provided).
- Returns:
A dictionary containing a data sample and its metadata.
- Return type:
dict
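The streaming behavior of __iter__ can be approximated with a plain generator. This is a minimal sketch under stated assumptions: random_sample_stream is a hypothetical function rather than part of Hyrax, and it follows the documented form in which each yielded dictionary holds a data dictionary whose object_id equals its index.

```python
import numpy as np


def random_sample_stream(size, shape, seed, provided_labels=None):
    """Hypothetical generator that yields samples one at a time."""
    rng = np.random.default_rng(seed)
    for index in range(size):
        data = {
            "index": index,
            "object_id": index,  # same as index, as documented for this dataset
            "image": rng.random(shape),
        }
        # The label key is only present when labels were provided.
        if provided_labels:
            data["label"] = str(rng.choice(provided_labels))
        yield {"data": data}


samples = list(random_sample_stream(size=3, shape=(2, 2), seed=0, provided_labels=["x", "y"]))
```

Unlike the map-style case, nothing here supports random access: samples are produced lazily and only in order, which is the defining trait of an iterable-style dataset.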