hyrax.datasets.dataset_registry#

Attributes#

Classes#

HyraxDataset

How to make a hyrax dataset:

HyraxImageDataset

This is a mixin for Image datasets primarily concerned with providing utility functions to allow derived classes to set and apply transformations based on configs.

Functions#

fetch_dataset_class(→ type[HyraxDataset])

Fetch the dataset class from the registry.

Module Contents#

logger[source]#
DATASET_REGISTRY: dict[str, type[HyraxDataset]][source]#
class HyraxDataset(config: dict, metadata_table=None, object_id_column_name=None)[source]#

How to make a hyrax dataset:

from hyrax.datasets import HyraxDataset

class MyDataset(HyraxDataset):
    def __init__(self, config: dict):
        super().__init__(config)

    def __len__(self):
        # Your len function goes here
        pass

Optional interfaces:

metadata -> Subclasses may pass an astropy table of metadata to __init__ in the superclass. This table of metadata will be available through the metadata_fields and metadata functions. If desired, a subclass may override these functions directly rather than using the astropy Table interface.

Further documentation is in the example notebook "Build a dataset class in a notebook".

__init__()[source]#

Overall initialization for all datasets, which saves the config.

Subclasses of HyraxDataset should call this at the end of their __init__, like:

from hyrax.datasets import HyraxDataset

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        super().__init__(config)

If per-tensor metadata is available, dataset authors should create an astropy Table of that metadata, in the same order as their data, and pass it as metadata_table as shown below:

from hyrax.datasets import HyraxDataset
from astropy.table import Table

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        metadata_table = Table(<Your catalog data goes here>)
        super().__init__(config, metadata_table)

Parameters:
  • config (dict) – The runtime configuration for hyrax

  • metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the same order in which your data will be enumerated.

  • object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.

_config[source]#
_metadata_table = None[source]#
property config[source]#
classmethod __init_subclass__()[source]#
metadata_fields() list[str][source]#

Returns a list of metadata fields supported by this object

Returns:

The column names of the metadata table passed. Empty list if no metadata was provided during construction of the HyraxDataset (or derived class).

Return type:

list[str]

augment_cache_key(idx: int, rng_seed: numpy.int64) numpy.int64 | None[source]#

Return a cache key for augmented data, or None to skip caching.

Base (non-augmented) data is always cached by index. This method is only called when augmentation is active, to decide whether the augmented result should also be cached. The default returns None (augmented data is regenerated each access), which is the standard expectation in ML training.

Override this when augmented results are deterministic and expensive to recompute.

Parameters:
  • idx (int) – The dataset-local index.

  • rng_seed (np.int64) – The rng_seed passed to augment_<field> methods.

Returns:

Cache key, or None to skip caching augmented data.

Return type:

np.int64 | None
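
To illustrate the kind of override described above, the sketch below derives a cache key from both the index and the seed. It is not hyrax's implementation: CachedAugmentDataset is a hypothetical stand-in that does not inherit from HyraxDataset, plain int stands in for numpy.int64, and the bit-packing scheme is an assumption chosen only for the example.

```python
from typing import Optional


class CachedAugmentDataset:
    """Hypothetical stand-in; a real dataset would subclass HyraxDataset."""

    def __init__(self, cache_augmented: bool = True):
        self.cache_augmented = cache_augmented

    def augment_cache_key(self, idx: int, rng_seed: int) -> Optional[int]:
        # Returning None mirrors the documented default: do not cache.
        if not self.cache_augmented:
            return None
        # Assumed scheme: idx goes in the high bits and the low 32 bits of
        # the seed go in the low bits. Keys are distinct for seeds below
        # 2**32 and fit a signed 64-bit integer while idx < 2**31.
        return (idx << 32) | (rng_seed & 0xFFFFFFFF)
```

Including the seed in the key matters because the same index augmented with a different seed would usually produce a different result.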

on_epoch_start(verb: str)[source]#

Called at the beginning of each epoch (or once for single-pass verbs).

Override in subclasses to respond to epoch-level lifecycle events.

Parameters:

verb (str) – Name of the verb that is running, e.g. "train", "infer", "test", or "engine".
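
A minimal sketch of such an override follows. EpochCountingDataset and its attributes are hypothetical and do not inherit from HyraxDataset; the example only shows one way a dataset might react differently depending on the verb.

```python
from collections import Counter


class EpochCountingDataset:
    """Hypothetical stand-in; a real dataset would subclass HyraxDataset."""

    def __init__(self):
        # Number of epochs started so far, keyed by verb name.
        self.epochs_started = Counter()
        self.samples_seen_this_epoch = 0

    def on_epoch_start(self, verb: str) -> None:
        self.epochs_started[verb] += 1
        if verb == "train":
            # Example reaction: reset per-epoch state only during training.
            self.samples_seen_this_epoch = 0
```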

metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) numpy.typing.ArrayLike[source]#

Returns a table representing the metadata given an array of indexes and a list of fields.

Parameters:
  • idxs (npt.ArrayLike) – The indexes of the relevant tensor objects

  • fields (list[str]) – The names of the fields you would like returned. All values must be among those returned by metadata_fields()

Returns:

A numpy record array of your metadata, with only the columns specified. Roughly equivalent to: metadata_table[idxs][fields].as_array() where metadata_table is the astropy table that the HyraxDataset (or derived class) was constructed with.

Return type:

npt.ArrayLike

Raises:

RuntimeError – When none of the provided fields are
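
The row-then-column selection described above can be sketched with a plain NumPy structured array standing in for the astropy table. The catalog contents, the select_metadata helper, and the exact error condition are assumptions made for illustration, not hyrax's code.

```python
import numpy as np

# Hypothetical catalog: three objects with three metadata columns.
catalog = np.array(
    [(101, 10.5, 0.1), (102, 11.0, 0.2), (103, 12.5, 0.3)],
    dtype=[("object_id", "i8"), ("ra", "f8"), ("z", "f8")],
)


def select_metadata(table: np.ndarray, idxs, fields: list[str]) -> np.ndarray:
    """Pick rows by index, then keep only the requested columns."""
    missing = [f for f in fields if f not in table.dtype.names]
    if missing:
        # Assumed behavior: reject any field the table does not have.
        raise RuntimeError(f"Unknown metadata fields: {missing}")
    return table[np.asarray(idxs, dtype=int)][fields]


subset = select_metadata(catalog, [2, 0], ["object_id", "z"])
```

Here subset is a structured array holding only the object_id and z columns, with rows in the order the indexes were given.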

fetch_dataset_class(class_name: str) type[HyraxDataset][source]#

Fetch the dataset class from the registry.

Parameters:

class_name (str) – The name of the dataset class to fetch. Either the class name of a built-in dataset, e.g. “HyraxRandomDataset”, or the fully qualified name of a user-defined dataset, e.g. “my_module.my_submodule.MyDatasetClass”.

Returns:

The dataset class.

Return type:

type[HyraxDataset]

Raises:
  • ValueError – If a built-in dataset was requested, but not found in the registry.

  • ValueError – If no dataset was specified in the runtime configuration.
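
The lookup described above can be sketched generically as follows. This is a sketch under stated assumptions rather than hyrax's implementation: REGISTRY, BaseDataset, RandomDataset, and fetch_class are hypothetical names, and treating any dotted name as a fully qualified import path is an assumption.

```python
import importlib

# Hypothetical registry mapping short class names to classes.
REGISTRY: dict[str, type] = {}


class BaseDataset:
    """Hypothetical base class that registers every subclass by name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


class RandomDataset(BaseDataset):
    pass


def fetch_class(class_name: str) -> type:
    if not class_name:
        raise ValueError("No dataset class was specified.")
    if "." in class_name:
        # Fully qualified name: import the module, then look up the class.
        module_name, _, attr = class_name.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)
    if class_name not in REGISTRY:
        raise ValueError(f"Dataset {class_name!r} is not in the registry.")
    return REGISTRY[class_name]
```

With this sketch, fetch_class("RandomDataset") resolves through the registry, while a dotted name such as "collections.OrderedDict" is resolved by importing its module.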

class HyraxImageDataset[source]#

This is a mixin for Image datasets primarily concerned with providing utility functions to allow derived classes to set and apply transformations based on configs.

The various set_*_transform functions each push an individual transformation onto a single stack.

The stack can be applied with apply_transform.
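
The stacking idea can be sketched with plain callables. TransformStack, push, and apply are hypothetical names, and the ordering shown (earlier transforms run first) is an assumption; the mixin's real methods are the ones listed below.

```python
from typing import Callable, Optional


class TransformStack:
    """Hypothetical stand-in for the mixin's transform-stacking behavior."""

    def __init__(self):
        self.transform: Optional[Callable] = None

    def push(self, new_transform: Callable) -> None:
        if self.transform is None:
            self.transform = new_transform
        else:
            previous = self.transform
            # Compose so that transforms pushed earlier run first.
            self.transform = lambda data: new_transform(previous(data))

    def apply(self, data):
        # With nothing pushed, the data passes through unchanged.
        return data if self.transform is None else self.transform(data)


stack = TransformStack()
stack.push(lambda x: x + 1)   # applied first
stack.push(lambda x: x * 10)  # applied second
result = stack.apply(2)       # (2 + 1) * 10
```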

set_function_transform()[source]#
set_crop_transform(cutout_shape=None)[source]#
apply_transform(data_torch)[source]#
_update_transform(new_transform)[source]#
_get_np_function(transform_str: str) collections.abc.Callable[Ellipsis, Any][source]#

Returns the NumPy mathematical function that the supplied string maps to, or raises an error if the string cannot be mapped to a function.

Parameters:

transform_str (str) – The string to be mapped to a numpy function