hyrax.datasets.data_cache

Attributes

Classes

DataCache

Per-dataset caching layer for DataProvider.

Module Contents

logger
tensorboardx_logger
class DataCache(config: dict, datasets: dict[str, hyrax.datasets.dataset_registry.HyraxDataset], augment_active: dict[str, bool])

Per-dataset caching layer for DataProvider.

Each dataset (friendly name) gets two cache maps:

  • base cache: keyed by real_idx (an int); stores the results of get_<field> calls. No dataset method is called to produce the key.

  • augment cache: keyed by the return value of the dataset’s augment_cache_key method; stores augmented results. It is populated only when the dataset opts in by returning a non-None key.

try_fetch checks the augment cache first (when augmentation is active for that dataset and an rng_seed is given), then falls back to the base cache.
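The lookup order can be illustrated with plain dictionaries. This is a minimal sketch, not Hyrax's code: the function lookup is hypothetical, and its augment_key argument stands in for whatever the dataset's augment_cache_key method would return (None meaning no augment lookup is attempted).

```python
from typing import Any, Optional


def lookup(
    base: dict[int, dict[str, Any]],
    augment: dict[Any, dict[str, Any]],
    real_idx: int,
    augment_key: Optional[Any] = None,
) -> tuple[Optional[dict[str, Any]], bool]:
    """Return (data, already_augmented), checking the augment map first."""
    # Augment map first, but only when there is a key to look up.
    if augment_key is not None and augment_key in augment:
        return augment[augment_key], True
    # Fall back to the base map, keyed directly by real_idx.
    if real_idx in base:
        return base[real_idx], False
    # Miss in both maps.
    return None, False


base_cache = {0: {"image": "raw-0"}}
augment_cache = {123: {"image": "flipped-0"}}

print(lookup(base_cache, augment_cache, 0, augment_key=123))  # ({'image': 'flipped-0'}, True)
print(lookup(base_cache, augment_cache, 0, augment_key=999))  # ({'image': 'raw-0'}, False)
print(lookup(base_cache, augment_cache, 5))                   # (None, False)
```

An augment miss does not end the lookup: the un-augmented base entry is still returned, flagged False so the caller knows augmentation has yet to be applied.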

A single configuration key controls this behavior:

h.config["data_set"]["use_cache"]: when True, data dicts are cached after the first access, so subsequent accesses are served from memory.

Initialize the DataCache.

Parameters:
  • config (dict) – The Hyrax configuration.

  • datasets (dict[str, HyraxDataset]) – Mapping of friendly_name to dataset instance. Used to call each dataset’s augment_cache_key when caching augmented data.

  • augment_active (dict[str, bool]) – Mapping of friendly_name to whether augmentation is active for that dataset. When True, try_fetch will check the augment cache before falling back to the base cache.

_use_cache
_datasets
_augment_active
_data_size_bytes = 0
_insert_count = 0
logging_interval = 1000
_base_cache: dict[str, dict[int, dict]]
_augment_cache: dict[str, dict[numpy.int64, dict]]
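The two annotations above imply a two-level layout: friendly name first, then the per-dataset key. The sketch below reproduces that shape with hypothetical dataset names "images" and "labels". It creates inner maps lazily with setdefault; whether DataCache instead pre-populates them in its constructor is not stated here.

```python
from typing import Any

import numpy as np

# friendly_name -> real_idx -> field dict
base_cache: dict[str, dict[int, dict[str, Any]]] = {}
# friendly_name -> augment cache key -> augmented field dict
augment_cache: dict[str, dict[np.int64, dict[str, Any]]] = {}

# Inner maps are created on first use; "images" and "labels" are made-up names.
base_cache.setdefault("images", {})[0] = {"image": [0.1, 0.2]}
base_cache.setdefault("labels", {})[0] = {"label": 7}
augment_cache.setdefault("images", {})[np.int64(42)] = {"image": [0.2, 0.1]}

# Each friendly name has its own maps, so index 0 of "images" and
# index 0 of "labels" are separate entries.
print(sorted(base_cache))       # ['images', 'labels']
print(base_cache["labels"][0])  # {'label': 7}
```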
try_fetch(friendly_name: str, real_idx: int, rng_seed: numpy.int64 | None = None) → tuple[dict | None, bool]

Try to fetch cached data for a single dataset.

When augmentation is active for the dataset and rng_seed is provided, try_fetch checks the augment cache first. On a miss, it falls back to the base cache.

Parameters:
  • friendly_name (str) – The dataset friendly name.

  • real_idx (int) – The dataset-local index.

  • rng_seed (np.int64 | None) – The augmentation RNG seed, or None for non-augmented access.

Returns:

(data, already_augmented), where data is the cached field dict (or None on a miss) and already_augmented indicates whether that data already includes augmentation.

Return type:

tuple[dict | None, bool]
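A hedged sketch of how a caller might act on the returned tuple: load from the dataset on a miss, and augment only when the returned data does not already include augmentation. StubCache, fetch_one, load, and augment are all hypothetical; StubCache only mimics the (data, already_augmented) return shape documented above so the snippet runs on its own.

```python
from typing import Any, Callable, Optional

import numpy as np

FieldDict = dict[str, Any]


class StubCache:
    """Hypothetical stand-in returning the same (data, already_augmented) shape."""

    def __init__(self) -> None:
        self.base: dict[int, FieldDict] = {}

    def try_fetch(
        self, friendly_name: str, real_idx: int, rng_seed: Optional[np.int64] = None
    ) -> tuple[Optional[FieldDict], bool]:
        # Base-only stub: never reports already-augmented data.
        return self.base.get(real_idx), False


def fetch_one(
    cache: StubCache,
    name: str,
    real_idx: int,
    rng_seed: Optional[np.int64],
    load: Callable[[int], FieldDict],
    augment: Callable[[FieldDict, np.int64], FieldDict],
) -> FieldDict:
    data, already_augmented = cache.try_fetch(name, real_idx, rng_seed)
    if data is None:
        data = load(real_idx)  # miss: read from the dataset instead
    if rng_seed is not None and not already_augmented:
        data = augment(data, rng_seed)  # base data still needs augmenting
    return data


cache = StubCache()
cache.base[0] = {"x": 1}
out = fetch_one(
    cache, "images", 0, np.int64(7),
    load=lambda i: {"x": -1},
    augment=lambda d, s: {**d, "seed": int(s)},  # returns a new dict
)
print(out)            # {'x': 1, 'seed': 7}
print(cache.base[0])  # {'x': 1}
```

Because augment returns a new dict, the base-cache entry stays un-augmented and can be reused with other seeds; augmenting it in place would silently corrupt the base cache. The sketch does not write results back; a real caller would presumably follow a miss with insert_base or insert_augmented.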

insert_base(friendly_name: str, real_idx: int, data: dict[str, Any])

Insert base (non-augmented) field data into the cache.

Parameters:
  • friendly_name (str) – The dataset friendly name.

  • real_idx (int) – The dataset-local index (used directly as the cache key).

  • data (dict[str, Any]) – The field data dict to cache.

insert_augmented(friendly_name: str, real_idx: int, rng_seed: numpy.int64, data: dict[str, Any])

Insert augmented field data into the cache.

Calls augment_cache_key to determine the cache key. If the key is None, this is a no-op (the dataset opted out of caching augmented data).

Parameters:
  • friendly_name (str) – The dataset friendly name.

  • real_idx (int) – The dataset-local index.

  • rng_seed (np.int64) – The augmentation RNG seed.

  • data (dict[str, Any]) – The augmented field data dict to cache.
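The difference between the two insert paths can be sketched as below. This page does not give augment_cache_key's signature; the sketch assumes it receives real_idx and rng_seed. ToyDataset and its key scheme are hypothetical. Because the augment map holds nothing but this key, the sketch folds real_idx into the key so that different samples cannot collide.

```python
from typing import Any, Optional

import numpy as np


class ToyDataset:
    """Hypothetical dataset; the augment_cache_key signature is an assumption."""

    def __init__(self, cache_augmented: bool, n_variants: int = 4) -> None:
        self.cache_augmented = cache_augmented
        self.n_variants = n_variants

    def augment_cache_key(self, real_idx: int, rng_seed: np.int64) -> Optional[np.int64]:
        if not self.cache_augmented:
            return None  # opt out: augmented results are never cached
        # Illustrative scheme: collapse seeds onto n_variants distinct
        # augmentations per sample, giving each (real_idx, variant) a unique key.
        return np.int64(real_idx * self.n_variants + int(rng_seed) % self.n_variants)


def insert_base(base: dict[int, dict[str, Any]], real_idx: int, data: dict[str, Any]) -> None:
    base[real_idx] = data  # key is real_idx itself; no dataset method is called


def insert_augmented(
    augment: dict[np.int64, dict[str, Any]],
    dataset: ToyDataset,
    real_idx: int,
    rng_seed: np.int64,
    data: dict[str, Any],
) -> None:
    key = dataset.augment_cache_key(real_idx, rng_seed)
    if key is None:
        return  # dataset opted out: no-op
    augment[key] = data


base: dict[int, dict[str, Any]] = {}
aug: dict[np.int64, dict[str, Any]] = {}
insert_base(base, 2, {"x": 0})
insert_augmented(aug, ToyDataset(cache_augmented=True), 2, np.int64(9), {"x": 1})
insert_augmented(aug, ToyDataset(cache_augmented=False), 3, np.int64(9), {"x": 2})
print(len(base), len(aug))  # 1 1
```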

_do_insert(cache_map: dict, cache_key, data: dict[str, Any])
static _data_size(data, seen: set[int] | None = None) → int
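Neither private helper is described on this page. Judging only from the names, signatures, and the _data_size_bytes, _insert_count, and logging_interval attributes, a plausible reading is that _do_insert stores an entry and updates those counters, and _data_size estimates memory use recursively, with seen preventing shared objects from being counted twice. The sketch below implements that pattern with sys.getsizeof; data_size and InsertTracker are hypothetical names, and the real estimator may treat arrays or tensors differently.

```python
import logging
import sys
from typing import Any, Optional

logger = logging.getLogger(__name__)


def data_size(data: Any, seen: Optional[set[int]] = None) -> int:
    """Rough recursive size estimate in bytes; seen holds id()s already counted."""
    if seen is None:
        seen = set()
    if id(data) in seen:
        return 0  # shared object: count it only once
    seen.add(id(data))
    size = sys.getsizeof(data)  # the object's own footprint only
    if isinstance(data, dict):
        size += sum(data_size(k, seen) + data_size(v, seen) for k, v in data.items())
    elif isinstance(data, (list, tuple, set, frozenset)):
        size += sum(data_size(item, seen) for item in data)
    return size


class InsertTracker:
    """Hypothetical bookkeeping in the spirit of _do_insert."""

    def __init__(self, logging_interval: int = 1000) -> None:
        self.data_size_bytes = 0
        self.insert_count = 0
        self.logging_interval = logging_interval

    def do_insert(self, cache_map: dict, cache_key: Any, data: dict[str, Any]) -> None:
        cache_map[cache_key] = data
        self.data_size_bytes += data_size(data)
        self.insert_count += 1
        # Log a running total every logging_interval inserts (assumed behavior).
        if self.insert_count % self.logging_interval == 0:
            logger.info("cache: %d inserts, ~%d bytes", self.insert_count, self.data_size_bytes)


shared = [1.0, 2.0, 3.0]
# The shared list is charged once, so this prints True.
print(data_size({"a": shared, "b": shared}) < data_size({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}))
```

The recursion is needed because sys.getsizeof reports only a container's own footprint, not the objects it refers to.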