hyrax.datasets.dataset_registry#
Attributes#
Classes#
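- HyraxDataset: How to make a hyrax dataset.
- HyraxImageDataset: This is a mixin for Image datasets primarily concerned with providing utility functions to allow derived classes to set and apply transformations based on configs.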
Functions#
- fetch_dataset_class: Fetch the dataset class from the registry.
Module Contents#
- DATASET_REGISTRY: dict[str, type[HyraxDataset]][source]#
- class HyraxDataset(config: dict, metadata_table=None, object_id_column_name=None)[source]#
How to make a hyrax dataset:
    from hyrax.datasets import HyraxDataset

    class MyDataset(HyraxDataset):
        def __init__(self, config: dict):
            super().__init__(config)

        def __len__(self):
            # Your len function goes here
            pass
Optional interfaces:
metadata -> Subclasses may pass an astropy table of metadata to __init__ in the superclass. This table of metadata will be available through the metadata_fields and metadata functions. If desired, a subclass may override these functions directly rather than using the astropy Table interface.
Further documentation is in the "Build a dataset class in a notebook" example notebook.
Overall initialization for all Datasets, which saves the config.
Subclasses of HyraxDataset should call this at the end of their __init__, like:
    from hyrax.datasets import HyraxDataset

    class MyDataset(HyraxDataset):
        def __init__(self, config):
            # <your code>
            super().__init__(config)
If per-tensor metadata is available, it is recommended that dataset authors create an astropy Table of that data, in the same order as their data, and pass it as metadata_table as shown below:
    from hyrax.datasets import HyraxDataset
    from astropy.table import Table

    class MyDataset(HyraxDataset):
        def __init__(self, config):
            # <your code>
            metadata_table = Table(<Your catalog data goes here>)
            super().__init__(config, metadata_table)
- Parameters:
config (dict, Optional) – The runtime configuration for hyrax
metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the order your data will be enumerated.
object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.
- metadata_fields() list[str][source]#
Returns a list of metadata fields supported by this object
- Returns:
The column names of the metadata table passed. Empty string if no metadata was provided during construction of the HyraxDataset (or derived class).
- Return type:
list[str]
- augment_cache_key(idx: int, rng_seed: numpy.int64) numpy.int64 | None[source]#
Return a cache key for augmented data, or None to skip caching.
Base (non-augmented) data is always cached by index. This method is only called when augmentation is active, to decide whether the augmented result should also be cached. The default returns None (augmented data is regenerated on each access), which is the standard expectation in ML training.
Override this when augmented results are deterministic and expensive to recompute.
- Parameters:
idx (int) – The dataset-local index.
rng_seed (np.int64) – The rng_seed passed to augment_<field> methods.
- Returns:
Cache key, or None to skip caching augmented data.
- Return type:
np.int64 | None
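An override for the deterministic case might look like the following minimal sketch. It does not import hyrax: BaseStandIn is a hypothetical stand-in that mirrors only the documented signature and default, and the key formula is an arbitrary illustrative choice, not something the library prescribes. A real subclass would derive from HyraxDataset and return a numpy.int64.

```python
class BaseStandIn:
    """Hypothetical stand-in for HyraxDataset; mirrors only the documented default."""

    def augment_cache_key(self, idx, rng_seed):
        # Documented default: None, so augmented data is regenerated on each access.
        return None


class DeterministicAugmentDataset(BaseStandIn):
    """Hypothetical dataset whose augmentation depends only on (idx, rng_seed)."""

    def augment_cache_key(self, idx, rng_seed):
        # Pack the seed into the high bits and the index into the low 32 bits.
        # Assumes 0 <= idx < 2**32. The seed is reduced to 31 bits so the result
        # fits in a signed 64-bit integer; seeds that agree modulo 2**31 would
        # therefore share a key, which a real implementation must rule out.
        return ((int(rng_seed) % 2**31) << 32) | int(idx)


default_key = BaseStandIn().augment_cache_key(3, 42)
custom_key = DeterministicAugmentDataset().augment_cache_key(3, 42)
other_key = DeterministicAugmentDataset().augment_cache_key(3, 43)
```

The same (idx, rng_seed) pair always yields the same key, while a different seed yields a different one, so a cached augmented result is only reused when both inputs match.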
- on_epoch_start(verb: str)[source]#
Called at the beginning of each epoch (or once for single-pass verbs).
Override in subclasses to respond to epoch-level lifecycle events.
- Parameters:
verb (str) – Name of the verb that is running, e.g. "train", "infer", "test", or "engine".
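As a rough illustration of this hook, the sketch below counts how many epochs each verb has started. BaseStandIn is a hypothetical stand-in for HyraxDataset, and its no-op default is an assumption; only the on_epoch_start(verb) signature comes from the documentation above.

```python
class BaseStandIn:
    """Hypothetical stand-in for HyraxDataset; the default hook is assumed to do nothing."""

    def on_epoch_start(self, verb):
        pass


class EpochCountingDataset(BaseStandIn):
    """Hypothetical dataset that records epoch starts per verb."""

    def __init__(self):
        self.epochs_seen = {}

    def on_epoch_start(self, verb):
        # Increment the counter for whichever verb is currently running.
        self.epochs_seen[verb] = self.epochs_seen.get(verb, 0) + 1


dataset = EpochCountingDataset()
for _ in range(3):
    dataset.on_epoch_start("train")   # three training epochs
dataset.on_epoch_start("infer")       # one single-pass verb
```

A dataset could use the same hook to reshuffle, reset per-epoch state, or switch behavior depending on the verb.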
- metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) numpy.typing.ArrayLike[source]#
Returns a table representing the metadata given an array of indexes and a list of fields.
- Parameters:
idxs (npt.ArrayLike) – The indexes of the relevant tensor objects
fields (list[str]) – The names of the fields you would like returned. All values must be among those returned by metadata_fields()
- Returns:
A numpy record array of your metadata, with only the columns specified. Roughly equivalent to: metadata_table[idxs][fields].as_array() where metadata_table is the astropy table that the HyraxDataset (or derived class) was constructed with.
- Return type:
npt.ArrayLike
- Raises:
RuntimeError – When none of the provided fields are
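To make the shape of the return value concrete, here is a minimal sketch of the row-then-column selection using a plain NumPy structured array in place of the astropy Table. The catalog contents and the select_metadata helper are hypothetical; hyrax itself is not imported, and error handling is omitted.

```python
import numpy as np

# Hypothetical catalog standing in for the astropy Table passed at construction.
catalog = np.array(
    [(101, 10.5, -1.0), (102, 11.25, 0.5), (103, 9.75, 2.0)],
    dtype=[("object_id", "i8"), ("ra", "f8"), ("dec", "f8")],
)


def select_metadata(table, idxs, fields):
    """Pick rows by index, then keep only the requested columns."""
    rows = table[np.asarray(idxs)]   # integer-array indexing selects rows
    return rows[list(fields)]        # a list of names selects columns


subset = select_metadata(catalog, [0, 2], ["object_id", "ra"])
```

Here subset holds two records with only the object_id and ra columns, which is the kind of structure that Table.as_array() produces in astropy.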
- fetch_dataset_class(class_name: str) type[HyraxDataset][source]#
Fetch the dataset class from the registry.
- Parameters:
class_name (str) – The name of the dataset class to fetch. Either the class name of a built-in dataset or the fully qualified name of a user-defined dataset, e.g. “my_module.my_submodule.MyDatasetClass” or “HyraxRandomDataset”.
- Returns:
The dataset class.
- Return type:
type[HyraxDataset]
- Raises:
ValueError – If a built-in dataset was requested, but not found in the registry.
ValueError – If no dataset was specified in the runtime configuration.
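The two naming styles described above could be resolved along the following lines. This is a sketch under assumptions, not hyrax's implementation: REGISTRY_SKETCH, BuiltInDatasetStandIn, and fetch_class_sketch are hypothetical names, and treating any dotted name as an import path is a simplification chosen for the example.

```python
import importlib

# Hypothetical registry; the real DATASET_REGISTRY maps names to HyraxDataset subclasses.
REGISTRY_SKETCH = {}


class BuiltInDatasetStandIn:
    """Hypothetical built-in dataset class."""


REGISTRY_SKETCH["BuiltInDatasetStandIn"] = BuiltInDatasetStandIn


def fetch_class_sketch(class_name):
    """Resolve a class from a short registered name or a dotted import path."""
    if not class_name:
        raise ValueError("No dataset class name was specified.")
    if "." in class_name:
        # Fully qualified name: import the module, then look up the class on it.
        module_path, _, attr = class_name.rpartition(".")
        return getattr(importlib.import_module(module_path), attr)
    if class_name not in REGISTRY_SKETCH:
        raise ValueError(f"{class_name!r} is not in the registry.")
    return REGISTRY_SKETCH[class_name]


built_in_cls = fetch_class_sketch("BuiltInDatasetStandIn")
# Any importable dotted path works; a standard-library class is used for the demo.
ordered_dict_cls = fetch_class_sketch("collections.OrderedDict")
```

Short names are looked up in the registry, while dotted names go through a normal import, which is what lets user-defined classes be referenced without registering them first.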
- class HyraxImageDataset[source]#
This is a mixin for Image datasets primarily concerned with providing utility functions to allow derived classes to set and apply transformations based on configs.
The various set_*_transform functions push individual transformations onto a single stack.
The stack can be applied with apply_transform.
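The stacking idea can be sketched in a few lines of plain Python. Everything here is illustrative: TransformStackSketch and push_transform are hypothetical names, the apply_transform signature is assumed, and applying transformations in insertion order is an assumption about the mixin rather than documented behavior.

```python
class TransformStackSketch:
    """Hypothetical stand-in; method names and signatures are not the hyrax API."""

    def __init__(self):
        self._stack = []

    def push_transform(self, fn):
        # Each call appends one more transformation to the end of the stack.
        self._stack.append(fn)

    def apply_transform(self, value):
        # Apply the stacked transformations in the order they were added.
        for fn in self._stack:
            value = fn(value)
        return value


stack = TransformStackSketch()
stack.push_transform(lambda pixels: [p - 1.0 for p in pixels])  # shift
stack.push_transform(lambda pixels: [p * 2.0 for p in pixels])  # then scale
result = stack.apply_transform([1.0, 2.0, 3.0])
```

Because the transformations compose in sequence, the order in which they are stacked changes the result: shifting and then scaling differs from scaling and then shifting.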
- _get_np_function(transform_str: str) collections.abc.Callable[Ellipsis, Any][source]#
Returns the NumPy mathematical function that the supplied string maps to, or raises an error if the supplied string cannot be mapped to a function.
- Parameters:
transform_str (str) – The string to be mapped to a numpy function