hyrax.data_sets.data_set_registry

Attributes

Classes

HyraxDataset

Base class for Hyrax datasets; see the class documentation below for how to make one.

HyraxImageDataset

This is a mixin for image datasets, primarily concerned with providing utility functions that allow derived classes to set and apply transformations based on configs.

Functions

fetch_dataset_class(→ type[HyraxDataset])

Fetch the dataset class from the registry.

iterable_dataset_collate(→ dict)

Collate function used for iterable datasets, since they do not work with the DataProvider's default collate function.

Module Contents

logger
DATASET_REGISTRY: dict[str, type[HyraxDataset]]
class HyraxDataset(config: dict, metadata_table=None, object_id_column_name=None)

How to make a hyrax dataset:

from hyrax.data_sets import HyraxDataset
from torch.utils.data import Dataset

class MyDataset(HyraxDataset, Dataset):
    def __init__(self, config: dict):
        super().__init__(config)

    def __getitem__(self, idx):
        # Your getitem goes here: return the item at position idx
        pass

    def __len__(self):
        # Your len function goes here: return the number of items
        pass

Optional interfaces:

ids() -> Subclasses may override this method directly with their own ids() implementation returning a generator of strings.

metadata -> Subclasses may pass an astropy Table of metadata to the superclass __init__. This table is then available through the metadata_fields() and metadata() methods. If desired, a subclass may override these methods directly instead of using the astropy Table interface.

Further documentation is in the "Getting started with Hyrax Custom Dataset Classes" example notebook.
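As a rough illustration of the optional ids() interface, the sketch below shows a map-style dataset that yields string IDs from its own catalog, in the same order as its data. It deliberately does not import Hyrax or torch so it runs on its own; the class name `CatalogBackedDataset` and its `_object_ids` attribute are hypothetical. In a real dataset you would inherit from HyraxDataset and Dataset as shown above.

```python
from collections.abc import Generator


class CatalogBackedDataset:
    """Hypothetical stand-in: in real code, inherit from HyraxDataset and Dataset."""

    def __init__(self, object_ids, images):
        # Keep IDs and data in the same order so index i refers to the same object
        self._object_ids = list(object_ids)
        self._images = list(images)

    def __getitem__(self, idx):
        return {"image": self._images[idx], "object_id": self._object_ids[idx]}

    def __len__(self):
        return len(self._images)

    def ids(self) -> Generator[str, None, None]:
        # Override of the optional ids() interface: one string ID per item, in data order
        for object_id in self._object_ids:
            yield str(object_id)


ds = CatalogBackedDataset(object_ids=[1001, 1002, 1003], images=["img_a", "img_b", "img_c"])
print(list(ds.ids()))  # ['1001', '1002', '1003']
```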

__init__()

Common initialization for all datasets; saves the runtime config.

Subclasses of HyraxDataset should call this at the end of their own __init__, like so:

from hyrax.data_sets import HyraxDataset
from torch.utils.data import Dataset

class MyDataset(HyraxDataset, Dataset):
    def __init__(self, config):
        # <your code>
        super().__init__(config)

If per-tensor metadata is available, dataset authors are encouraged to create an astropy Table of that data, in the same order as their data, and pass it as metadata_table as shown below:

from hyrax.data_sets import HyraxDataset
from torch.utils.data import Dataset
from astropy.table import Table

class MyDataset(HyraxDataset, Dataset):
    def __init__(self, config):
        # <your code>
        # Replace my_catalog_data with <your catalog data>, one row per item, in data order
        metadata_table = Table(my_catalog_data)
        super().__init__(config, metadata_table)

Parameters:
  • config (dict) – The runtime configuration for Hyrax.

  • metadata_table (Optional[Table], optional) – An astropy Table that (1) contains the metadata columns desired for visualization and (2) lists its rows in the same order your data will be enumerated.

  • object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.
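Because metadata_table must be in the same order your data is enumerated, a cheap sanity check is to compare the table's ID column against the IDs your dataset yields. The helper below, `check_metadata_order`, is hypothetical and not part of Hyrax; it works on plain sequences so it runs without astropy, and the "object_id" column name in the comment is an assumption.

```python
def check_metadata_order(dataset_ids, metadata_ids) -> None:
    """Hypothetical helper: raise ValueError if metadata rows are not aligned with the data."""
    # Compare as strings, since ids() is documented to yield string IDs
    dataset_ids = [str(i) for i in dataset_ids]
    metadata_ids = [str(i) for i in metadata_ids]
    if len(dataset_ids) != len(metadata_ids):
        raise ValueError(
            f"metadata has {len(metadata_ids)} rows but the dataset has {len(dataset_ids)} items"
        )
    for position, (data_id, meta_id) in enumerate(zip(dataset_ids, metadata_ids)):
        if data_id != meta_id:
            raise ValueError(f"row {position}: dataset ID {data_id!r} != metadata ID {meta_id!r}")


# With a real dataset, assuming an "object_id" column, this might be:
#   check_metadata_order(ds.ids(), metadata_table["object_id"])
check_metadata_order(["a1", "a2", "a3"], ["a1", "a2", "a3"])  # aligned, so no error
```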

_config
_metadata_table = None
tensorboardx_logger = None
classmethod is_iterable()

Returns True if the underlying dataset is iterable-style, supporting __iter__, as opposed to map-style, where __getitem__/__len__ are the preferred access methods.

Returns:

True if the underlying dataset is iterable-style

Return type:

bool

classmethod is_map()

Returns True if the underlying dataset is map-style, supporting __getitem__/__len__, as opposed to iterable-style, where __iter__ is the preferred access method.

Returns:

True if the underlying dataset is map-style

Return type:

bool
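To make the map-style versus iterable-style distinction concrete, here is a standalone sketch of how such a check could be written by inspecting which access methods a class defines. This is an assumption for illustration, not Hyrax's implementation (which may check class inheritance instead); the helper names `looks_map` and `looks_iterable` are hypothetical.

```python
def _defines(cls, name: str) -> bool:
    # Look only at the class hierarchy itself, not at the metaclass
    return any(name in vars(klass) for klass in cls.__mro__)


def looks_map(cls) -> bool:
    # Map-style: random access through __getitem__ plus a known length via __len__
    return _defines(cls, "__getitem__") and _defines(cls, "__len__")


def looks_iterable(cls) -> bool:
    # Iterable-style: provides __iter__; classes with __getitem__ are treated as map-style
    return _defines(cls, "__iter__") and not _defines(cls, "__getitem__")


class MapStyle:
    def __getitem__(self, idx):
        return idx

    def __len__(self):
        return 3


class StreamStyle:
    def __iter__(self):
        yield from range(3)


print(looks_map(MapStyle), looks_iterable(MapStyle))        # True False
print(looks_map(StreamStyle), looks_iterable(StreamStyle))  # False True
```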

property config
classmethod __init_subclass__()
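DATASET_REGISTRY maps class names to dataset classes, and __init_subclass__ is the standard Python hook that runs whenever a subclass is defined. The sketch below shows one way the two could combine so that every subclass registers itself by name; `REGISTRY` and `RegisteredBase` are hypothetical stand-ins, and Hyrax's actual registration logic may differ (for example in how name collisions are handled).

```python
REGISTRY: dict[str, type] = {}


class RegisteredBase:
    def __init_subclass__(cls, **kwargs):
        # Runs once per subclass definition; keep cooperative inheritance working
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


class MyRandomDataset(RegisteredBase):
    pass


class MyFilteredDataset(MyRandomDataset):
    # Subclasses of subclasses are registered too
    pass


print(sorted(REGISTRY))  # ['MyFilteredDataset', 'MyRandomDataset']
```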
ids() → collections.abc.Generator[str]

This is the default ids() implementation you get when you derive from HyraxDataset.

Returns:

A generator yielding all the string IDs of the dataset.

Return type:

Generator[str]

sample_data() → dict

Get a sample from the dataset. This is a convenience function that returns the first sample from the dataset, regardless of whether it is iterable or map-style. Often this will be used to instantiate a model that adjusts its form based on the shape of the data.
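The behavior described above, returning the first sample whichever access style the dataset uses, can be sketched in a few lines. `first_sample` is a hypothetical free function mirroring what sample_data() is documented to do, not Hyrax's code; it assumes a map-style dataset is indexable from 0.

```python
def first_sample(dataset):
    # Map-style datasets expose random access, so index 0 is the first sample
    if hasattr(dataset, "__getitem__") and hasattr(dataset, "__len__"):
        return dataset[0]
    # Iterable-style datasets only support iteration, so take the first yielded item
    return next(iter(dataset))


class MapDataset:
    def __init__(self):
        self._items = [{"image": [1, 2]}, {"image": [3, 4]}]

    def __getitem__(self, idx):
        return self._items[idx]

    def __len__(self):
        return len(self._items)


class StreamDataset:
    def __iter__(self):
        yield {"image": [5, 6]}
        yield {"image": [7, 8]}


print(first_sample(MapDataset()))     # {'image': [1, 2]}
print(first_sample(StreamDataset()))  # {'image': [5, 6]}
```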

metadata_fields() → list[str]

Returns a list of metadata fields supported by this object

Returns:

The column names of the metadata table passed in. An empty list if no metadata was provided during construction of the HyraxDataset (or derived class).

Return type:

list[str]

metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) → numpy.typing.ArrayLike

Returns a table representing the metadata given an array of indexes and a list of fields.

Parameters:
  • idxs (npt.ArrayLike) – The indexes of the relevant tensor objects

  • fields (list[str]) – The names of the fields you would like returned. All values must be among those returned by metadata_fields()

Returns:

A numpy record array of your metadata, with only the columns specified. Roughly equivalent to: metadata_table[idxs][fields].as_array() where metadata_table is the astropy table that the HyraxDataset (or derived class) was constructed with.

Return type:

npt.ArrayLike

Raises:

RuntimeError – When none of the provided fields are among those returned by metadata_fields().
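The equivalence metadata_table[idxs][fields].as_array() can be approximated with a plain NumPy structured array, which is the kind of array astropy's as_array() returns. The sketch below is a hypothetical stand-in for the default metadata() behavior under that assumption; keeping only the requested fields that exist, and raising RuntimeError only when none do, is an assumption based on the Raises note above. The catalog columns are made up for illustration.

```python
import numpy as np


def select_metadata(table: np.ndarray, idxs, fields: list[str]) -> np.ndarray:
    """Hypothetical stand-in for metadata(): rows idxs, columns fields, as a structured array."""
    known = [f for f in fields if f in table.dtype.names]
    if not known:
        raise RuntimeError(f"None of the requested fields {fields} are in {table.dtype.names}")
    # Row selection first, then multi-field selection, mirroring table[idxs][fields]
    return table[np.asarray(idxs)][known]


catalog = np.array(
    [("obj0", 10.0, -1.0, 0.1), ("obj1", 20.0, -2.0, 0.2), ("obj2", 30.0, -3.0, 0.3)],
    dtype=[("object_id", "U8"), ("ra", "f8"), ("dec", "f8"), ("redshift", "f8")],
)
subset = select_metadata(catalog, [2, 0], ["ra", "redshift"])
print(subset["ra"])  # [30. 10.]
```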

fetch_dataset_class(class_name: str) → type[HyraxDataset]

Fetch the dataset class from the registry.

Parameters:

class_name (str) – The name of the dataset class to fetch: either the class name of a built-in dataset or the fully qualified name of a user-defined dataset, e.g. “my_module.my_submodule.MyDatasetClass” or “HyraxRandomDataset”.

Returns:

The dataset class.

Return type:

type[HyraxDataset]

Raises:
  • ValueError – If a built-in dataset was requested but not found in the registry.

  • ValueError – If no dataset was specified in the runtime configuration.
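The lookup rule described above (a bare name is looked up among built-in datasets, a fully qualified name points at a user-defined class) can be sketched with importlib. `resolve_dataset_class` and its `registry` argument are hypothetical, and treating any name containing a dot as fully qualified is an assumption about how the two cases are told apart; only the two ValueError conditions come from the documentation above.

```python
import importlib


def resolve_dataset_class(class_name: str, registry: dict[str, type]) -> type:
    if not class_name:
        raise ValueError("No dataset class was specified in the runtime configuration")
    if "." in class_name:
        # Fully qualified name: import the module, then fetch the class attribute
        module_name, _, attr_name = class_name.rpartition(".")
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    # Bare name: must be a built-in dataset already present in the registry
    if class_name not in registry:
        raise ValueError(f"Built-in dataset {class_name!r} was not found in the registry")
    return registry[class_name]


class RandomDatasetStandIn:
    """Hypothetical placeholder for a registered built-in dataset."""


registry = {"HyraxRandomDataset": RandomDatasetStandIn}
print(resolve_dataset_class("HyraxRandomDataset", registry).__name__)  # RandomDatasetStandIn
print(resolve_dataset_class("collections.OrderedDict", registry).__name__)  # OrderedDict
```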

class HyraxImageDataset

This is a mixin for Image datasets primarily concerned with providing utility functions to allow derived classes to set and apply transformations based on configs.

The various set_*_transform functions add individual transformations to a single stack.

The stack can be applied with apply_transform().
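A transform stack like the one described here is essentially an ordered list of callables applied one after another. The sketch below uses plain nested lists so it runs without torch; `TransformStack`, `push`, `apply` and `center_crop` are hypothetical names that only loosely mirror _update_transform, apply_transform and set_crop_transform, and the real mixin may compose its transforms differently.

```python
from collections.abc import Callable
from typing import Any


class TransformStack:
    """Hypothetical stand-in for the mixin's transform stack."""

    def __init__(self):
        self._transforms: list[Callable[[Any], Any]] = []

    def push(self, transform: Callable[[Any], Any]) -> None:
        # Like _update_transform: append to the stack; earlier transforms run first
        self._transforms.append(transform)

    def apply(self, data: Any) -> Any:
        # Like apply_transform: feed the output of each transform into the next
        for transform in self._transforms:
            data = transform(data)
        return data


def center_crop(pixels, size=2):
    # Crop a 2D list of rows to a size x size patch around the center
    top = (len(pixels) - size) // 2
    left = (len(pixels[0]) - size) // 2
    return [row[left:left + size] for row in pixels[top:top + size]]


stack = TransformStack()
stack.push(center_crop)
stack.push(lambda img: [[v * 10 for v in row] for row in img])
image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(stack.apply(image))  # [[50, 60], [90, 100]]
```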

set_function_transform()
set_crop_transform(cutout_shape=None)
apply_transform(data_torch)
_update_transform(new_transform)
_get_np_function(transform_str: str) → collections.abc.Callable[..., Any]

Returns the numpy mathematical function that the supplied string maps to, or raises an error if the string cannot be mapped to a function.

Parameters:

transform_str (str) – The string to be mapped to a numpy function
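A string-to-numpy-function lookup like the one described can be written with getattr on the numpy module plus a callability check. This standalone `lookup_np_function` is a hypothetical sketch; raising ValueError (rather than some other exception type) is an assumption, since the documentation only says an error is raised.

```python
from collections.abc import Callable
from typing import Any

import numpy as np


def lookup_np_function(transform_str: str) -> Callable[..., Any]:
    func = getattr(np, transform_str, None)
    # Reject unknown names and non-callable attributes such as np.pi
    if func is None or not callable(func):
        raise ValueError(f"{transform_str!r} does not name a numpy function")
    return func


log_fn = lookup_np_function("log1p")
print(log_fn(np.array([0.0])))  # [0.]
```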

iterable_dataset_collate(batch: list[dict]) → dict

Collate function used for iterable datasets, since they do not work with the DataProvider's default collate function.

Enable with h.config["data_loader"]["collate_fn"] = "hyrax.data_sets.iterable_dataset_collate"

Parameters:

batch (list[dict]) – The batch of data dictionaries returned from the iterable dataset

Returns:

Dict where each non-dict value is a np.array of items, ready for further hyrax processing.

Return type:

dict

Raises:

RuntimeError – If internal dictionary logic fails. This usually means an error in the structure of the input dictionary.
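The collation described above, turning a list of per-sample dicts into one dict whose non-dict values are arrays, can be sketched recursively. `collate_dicts` is a hypothetical illustration, not Hyrax's implementation; it assumes every sample has the same keys and nesting, and raises RuntimeError otherwise, in the spirit of the Raises note above.

```python
import numpy as np


def collate_dicts(batch: list[dict]) -> dict:
    """Hypothetical sketch: stack a list of same-shaped sample dicts into one dict."""
    if not batch:
        return {}
    collated = {}
    for key, value in batch[0].items():
        if any(key not in sample for sample in batch):
            raise RuntimeError(f"key {key!r} is missing from some samples")
        values = [sample[key] for sample in batch]
        if isinstance(value, dict):
            if not all(isinstance(v, dict) for v in values):
                raise RuntimeError(f"key {key!r} is a dict in some samples but not others")
            # Nested dicts are collated recursively, preserving the structure
            collated[key] = collate_dicts(values)
        else:
            # Non-dict values become one np.array whose first axis is the batch
            collated[key] = np.array(values)
    return collated


batch = [
    {"data": {"image": [1.0, 2.0]}, "object_id": "a"},
    {"data": {"image": [3.0, 4.0]}, "object_id": "b"},
]
collated = collate_dicts(batch)
print(collated["data"]["image"].shape)  # (2, 2)
print(collated["object_id"])            # ['a' 'b']
```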