hyrax.datasets.data_cache

Attributes

`logger`
`tensorboardx_logger`

Classes

DataCache

Per-dataset caching layer for DataProvider.

Module Contents

logger

tensorboardx_logger

class DataCache(config: dict, datasets: dict[str, hyrax.datasets.dataset_registry.HyraxDataset], augment_active: dict[str, bool])

Per-dataset caching layer for DataProvider.

Each dataset (friendly name) gets two cache maps:

base cache — keyed by real_idx (an int), stores the result of get_<field> calls. No dataset method is called to produce the key.
augment cache — keyed by the return value of the dataset’s augment_cache_key method, stores augmented results. Only populated when the dataset opts in by returning a non-None key.

try_fetch checks the augment cache first (when applicable), then falls back to the base cache.

One config option controls this functionality:

h.config["data_set"]["use_cache"] — when True, data dicts are cached after the first access so subsequent accesses are served from memory.

Initialize the DataCache.

Parameters:

config (dict) – The Hyrax configuration.
datasets (dict[str, HyraxDataset]) – Mapping of friendly_name to dataset instance. Used to call augment_cache_key for augmented data caching.
augment_active (dict[str, bool]) – Mapping of friendly_name to whether augmentation is active for that dataset. When True, try_fetch will check the augment cache before falling back to the base cache.

_use_cache

_datasets

_augment_active

_data_size_bytes = 0

_insert_count = 0

logging_interval = 1000

_base_cache: dict[str, dict[int, dict]]

_augment_cache: dict[str, dict[numpy.int64, dict]]

try_fetch(friendly_name: str, real_idx: int, rng_seed: numpy.int64 | None = None) → tuple[dict | None, bool]

Try to fetch cached data for a single dataset.

When augmentation is active and rng_seed is provided, this checks the augment cache first. On a miss, it falls back to the base cache.

Parameters:

friendly_name (str) – The dataset friendly name.
real_idx (int) – The dataset-local index.
rng_seed (np.int64 | None) – The augmentation RNG seed, or None for non-augmented access.

Returns:

(data, already_augmented), where data is the cached field dict, or None on a miss, and already_augmented indicates whether the cached data already includes augmentation.

Return type:

tuple[dict | None, bool]

insert_base(friendly_name: str, real_idx: int, data: dict[str, Any])

Insert base (non-augmented) field data into the cache.

Parameters:

friendly_name (str) – The dataset friendly name.
real_idx (int) – The dataset-local index (used directly as cache key).
data (dict[str, Any]) – The field data dict to cache.

insert_augmented(friendly_name: str, real_idx: int, rng_seed: numpy.int64, data: dict[str, Any])

Insert augmented field data into the cache.

Calls augment_cache_key to determine the cache key. If the key is None, this is a no-op (the dataset opted out of caching augmented data).

Parameters:

friendly_name (str) – The dataset friendly name.
real_idx (int) – The dataset-local index.
rng_seed (np.int64) – The augmentation RNG seed.
data (dict[str, Any]) – The augmented field data dict to cache.

_do_insert(cache_map: dict, cache_key, data: dict[str, Any])

static _data_size(data, seen: set[int] | None = None) → int