hyrax.data_sets.tensor_cache_mixin

Attributes

logger

Classes

TensorCacheMixin

Mixin class providing in-memory tensor caching functionality for datasets.

Module Contents

logger

class TensorCacheMixin

Bases: abc.ABC

Mixin class providing in-memory tensor caching functionality for datasets.

This mixin provides:

- use_cache: Cache tensors in memory after first load
- preload_cache: Preload all tensors in background thread
- Efficient tensor cache management with hit/miss tracking
- Background preloading with parallel processing

Classes using this mixin must implement:

- _load_tensor_for_cache(object_id: str) -> torch.Tensor
- ids() -> Generator[str] (iterator over object IDs)
- __len__() -> int

_init_tensor_cache(config)

Initialize tensor caching. Call this from __init__ after other setup.

abstractmethod _load_tensor_for_cache(object_id: str)

Load the tensor for the given object_id. Must be implemented by subclasses.

Parameters:

object_id (str) – The object ID to load the tensor for

Returns:

The loaded tensor

Return type:

torch.Tensor

abstractmethod ids(log_every: int | None = None) → collections.abc.Generator[str, None, None]

Iterator over all object IDs. Must be implemented by subclasses.

Parameters:

log_every (Optional[int]) – Log progress every N objects

Yields:

str – Object IDs in the dataset

_check_object_id_to_tensor_cache(object_id: str)

Check if tensor is already cached.

_populate_object_id_to_tensor_cache(object_id: str)

Load tensor and populate cache.

_object_id_to_tensor_cached(object_id: str)

Get tensor for object_id with caching support.

Parameters:

object_id (str) – The object_id requested

Returns:

The tensor for the object

Return type:

torch.Tensor
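
The check/populate/cached-lookup trio above can be read as a conventional read-through cache. The following is only a sketch of that pattern under stated assumptions, not the hyrax implementation: SimpleTensorCache, fake_loader, and the hits/misses counters are hypothetical names, and plain Python lists stand in for torch tensors so the example runs without torch.

```python
from collections.abc import Callable


class SimpleTensorCache:
    """Hypothetical stand-in for the check/populate flow; not hyrax code."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader  # plays the role of _load_tensor_for_cache
        self._cache: dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    def check(self, object_id: str):
        """Return the cached value, or None if it has not been loaded yet."""
        return self._cache.get(object_id)

    def populate(self, object_id: str):
        """Load the value once and remember it."""
        value = self._loader(object_id)
        self._cache[object_id] = value
        return value

    def get(self, object_id: str):
        """Cached lookup: check first, then load and store on a miss."""
        value = self.check(object_id)
        if value is not None:
            self.hits += 1
            return value
        self.misses += 1
        return self.populate(object_id)


load_calls = []


def fake_loader(object_id: str) -> list[float]:
    # Stand-in for an expensive tensor load; records each call.
    load_calls.append(object_id)
    return [float(len(object_id))]


cache = SimpleTensorCache(fake_loader)
first = cache.get("obj-1")   # miss: loader runs
second = cache.get("obj-1")  # hit: loader is not called again
```

In this sketch the second lookup returns the same object as the first, which is the property a dataset relies on when use_cache is enabled.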

static _determine_numprocs_preload()

Determine number of processes for preloading.

_preload_tensor_cache()

Preload all tensors in the dataset using multiple threads.

_lazy_map_executor(executor: concurrent.futures.Executor, ids: collections.abc.Iterable[str])

Lazy evaluation version of concurrent.futures.Executor.map().

This limits memory usage during preloading by keeping only a small number of tensors in memory at once.

Parameters:

executor (concurrent.futures.Executor) – An executor for running futures
ids (Iterable[str]) – An iterable of object IDs

Yields:

Iterator[torch.Tensor] – An iterator over torch tensors, lazily loaded
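
One common way to get a lazy Executor.map() is to keep a bounded window of submitted futures and refill it as results are consumed. The sketch below illustrates that general technique only; it is not taken from the hyrax source. The function name lazy_map, the fn argument, and the max_in_flight parameter are assumptions introduced for the example (the documented method takes only an executor and ids).

```python
import concurrent.futures
from collections import deque
from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def lazy_map(
    executor: concurrent.futures.Executor,
    fn: Callable[[str], object],
    ids: Iterable[str],
    max_in_flight: int = 4,  # hypothetical knob bounding pending work
) -> Iterator[object]:
    """Yield fn(id) in input order with at most max_in_flight pending futures."""
    id_iter = iter(ids)
    # Prime the window with the first few submissions.
    pending = deque(executor.submit(fn, i) for i in islice(id_iter, max_in_flight))
    while pending:
        # Block on the oldest future so results come back in input order.
        result = pending.popleft().result()
        # Submit one more item (if any remain) to keep the window full.
        for next_id in islice(id_iter, 1):
            pending.append(executor.submit(fn, next_id))
        yield result


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(lazy_map(pool, str.upper, ["a", "b", "c", "d", "e"], max_in_flight=2))
```

Unlike Executor.map(), which submits every item up front, this version never holds more than max_in_flight unfinished or unconsumed results, which is the memory-limiting behavior the description above refers to.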

_log_duration_tensorboard(name: str, start_time: int)

Log a duration to tensorboardX if configured.

Parameters:

name (str) – The name of the scalar to log
start_time (int) – Start time in nanoseconds from time.monotonic_ns()
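
As a rough illustration of timing with time.monotonic_ns(), the sketch below computes an elapsed duration and forwards it to a writer when one is configured. It rests on assumptions: log_duration and FakeWriter are hypothetical names, the conversion to seconds is a guess at the logged unit, and FakeWriter only mimics the add_scalar(tag, scalar_value) call of tensorboardX's SummaryWriter so the example runs without that dependency.

```python
import time


class FakeWriter:
    """Hypothetical stand-in with an add_scalar method like a tensorboardX SummaryWriter."""

    def __init__(self) -> None:
        self.scalars: list[tuple[str, float]] = []

    def add_scalar(self, tag: str, scalar_value: float) -> None:
        self.scalars.append((tag, scalar_value))


def log_duration(writer, name: str, start_time: int) -> float:
    """Log seconds elapsed since start_time (nanoseconds from time.monotonic_ns())."""
    elapsed_s = (time.monotonic_ns() - start_time) / 1e9  # assumed unit: seconds
    if writer is not None:  # "if configured": do nothing when no writer is set
        writer.add_scalar(name, elapsed_s)
    return elapsed_s


writer = FakeWriter()
start = time.monotonic_ns()
time.sleep(0.01)  # stand-in for the work being timed
elapsed = log_duration(writer, "preload/duration_s", start)
```

A monotonic clock is the natural choice here because it is unaffected by system clock adjustments, so the difference between two readings is always a valid duration.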