hyrax.datasets.data_cache
=========================

.. py:module:: hyrax.datasets.data_cache


Attributes
----------

.. autoapisummary::

   hyrax.datasets.data_cache.logger
   hyrax.datasets.data_cache.tensorboardx_logger


Classes
-------

.. autoapisummary::

   hyrax.datasets.data_cache.DataCache


Module Contents
---------------

.. py:data:: logger

.. py:data:: tensorboardx_logger

.. py:class:: DataCache(config: dict, datasets: dict[str, hyrax.datasets.dataset_registry.HyraxDataset], augment_active: dict[str, bool])

   Per-dataset caching layer for DataProvider.

   Each dataset, identified by its friendly name, gets two cache maps:

   * **base cache**: keyed by ``real_idx`` (an int); stores the result of
     ``get_<field>`` calls.  The index is used directly as the key, so no
     dataset method is called to produce it.
   * **augment cache**: keyed by the return value of the dataset's
     ``augment_cache_key`` method; stores augmented results.  It is only
     populated when the dataset opts in by returning a non-None key.

   ``try_fetch`` checks the augment cache first (when applicable), then falls
   back to the base cache.

   A single configuration option controls caching:

   ``h.config["data_set"]["use_cache"]``: when True, data dicts are cached
   after the first access so that subsequent accesses are served from memory.
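
   The lookup order described above can be illustrated with a minimal,
   self-contained sketch.  ``ToyCache`` is a hypothetical stand-in and not the
   actual ``DataCache`` implementation; in particular, the tuple used as the
   augment key is an assumption made for the example.

   .. code-block:: python

      class ToyCache:
          """Hypothetical stand-in illustrating augment-then-base lookup."""

          def __init__(self):
              self.base = {}     # friendly_name -> {real_idx: data}
              self.augment = {}  # friendly_name -> {augment_key: data}

          def try_fetch(self, name, real_idx, rng_seed=None):
              # Check the augment cache first when a seed is supplied.
              if rng_seed is not None:
                  key = (real_idx, rng_seed)  # assumed key shape
                  hit = self.augment.get(name, {}).get(key)
                  if hit is not None:
                      return hit, True
              # Fall back to the base cache; None signals a miss.
              return self.base.get(name, {}).get(real_idx), False


      cache = ToyCache()
      cache.base["train"] = {0: {"image": [1, 2, 3]}}
      cache.augment["train"] = {(0, 42): {"image": [3, 2, 1]}}

      augmented_hit = cache.try_fetch("train", 0, rng_seed=42)
      base_fallback = cache.try_fetch("train", 0, rng_seed=7)
      miss = cache.try_fetch("train", 5)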

   Initialize the DataCache.

   :param config: The Hyrax configuration.
   :type config: dict
   :param datasets: Mapping of friendly_name to dataset instance. Used to call
                    ``augment_cache_key`` for augmented data caching.
   :type datasets: dict[str, HyraxDataset]
   :param augment_active: Mapping of friendly_name to whether augmentation is active
                          for that dataset. When True, ``try_fetch`` will check the
                          augment cache before falling back to the base cache.
   :type augment_active: dict[str, bool]


   .. py:attribute:: _use_cache


   .. py:attribute:: _datasets


   .. py:attribute:: _augment_active


   .. py:attribute:: _data_size_bytes
      :value: 0



   .. py:attribute:: _insert_count
      :value: 0



   .. py:attribute:: logging_interval
      :value: 1000



   .. py:attribute:: _base_cache
      :type:  dict[str, dict[int, dict]]


   .. py:attribute:: _augment_cache
      :type:  dict[str, dict[numpy.int64, dict]]


   .. py:method:: try_fetch(friendly_name: str, real_idx: int, rng_seed: numpy.int64 | None = None) -> tuple[dict | None, bool]

      Try to fetch cached data for a single dataset.

      When augmentation is active for the dataset and ``rng_seed`` is
      provided, this checks the augment cache first.  On a miss, it falls
      back to the base cache.

      :param friendly_name: The dataset friendly name.
      :type friendly_name: str
      :param real_idx: The dataset-local index.
      :type real_idx: int
      :param rng_seed: The augmentation RNG seed, or None for non-augmented access.
      :type rng_seed: numpy.int64 | None

      :returns: ``(data, already_augmented)`` where ``data`` is the cached
                field dict or ``None`` on miss, and ``already_augmented``
                indicates whether the cached data includes augmentation.
      :rtype: tuple[dict | None, bool]
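
      A caller would typically treat a ``None`` result as a miss, load the
      data, and insert it.  The sketch below shows that pattern against a
      hypothetical ``StubCache`` that only mimics the documented signatures
      of ``try_fetch`` and ``insert_base``; ``get_item`` and ``loader`` are
      illustrative names, not part of Hyrax.

      .. code-block:: python

         class StubCache:
             """Hypothetical stand-in mimicking the documented signatures."""

             def __init__(self):
                 self._base = {}

             def try_fetch(self, friendly_name, real_idx, rng_seed=None):
                 # This stub only has a base cache, so the flag is always False.
                 return self._base.get((friendly_name, real_idx)), False

             def insert_base(self, friendly_name, real_idx, data):
                 self._base[(friendly_name, real_idx)] = data


         def get_item(cache, loader, friendly_name, real_idx):
             data, already_augmented = cache.try_fetch(friendly_name, real_idx)
             if data is None:
                 # Cache miss: load once, then store for later accesses.
                 data = loader(real_idx)
                 cache.insert_base(friendly_name, real_idx, data)
             return data


         load_calls = []

         def loader(real_idx):
             load_calls.append(real_idx)
             return {"value": real_idx * 2}

         cache = StubCache()
         first = get_item(cache, loader, "train", 3)
         second = get_item(cache, loader, "train", 3)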



   .. py:method:: insert_base(friendly_name: str, real_idx: int, data: dict[str, Any])

      Insert base (non-augmented) field data into the cache.

      :param friendly_name: The dataset friendly name.
      :type friendly_name: str
      :param real_idx: The dataset-local index (used directly as cache key).
      :type real_idx: int
      :param data: The field data dict to cache.
      :type data: dict[str, Any]



   .. py:method:: insert_augmented(friendly_name: str, real_idx: int, rng_seed: numpy.int64, data: dict[str, Any])

      Insert augmented field data into the cache.

      Calls ``augment_cache_key`` to determine the cache key. If the key
      is ``None``, this is a no-op (the dataset opted out of caching
      augmented data).

      :param friendly_name: The dataset friendly name.
      :type friendly_name: str
      :param real_idx: The dataset-local index.
      :type real_idx: int
      :param rng_seed: The augmentation RNG seed.
      :type rng_seed: numpy.int64
      :param data: The augmented field data dict to cache.
      :type data: dict[str, Any]
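
      The opt-out behavior can be sketched as follows.  The two dataset
      classes and the free function are hypothetical, and the argument list
      ``augment_cache_key(real_idx, rng_seed)`` is an assumption; only the
      rule "a ``None`` key means nothing is stored" comes from the
      description above.

      .. code-block:: python

         class OptInDataset:
             def augment_cache_key(self, real_idx, rng_seed):
                 # Assumed signature; returns a hashable key to opt in.
                 return (real_idx, int(rng_seed))


         class OptOutDataset:
             def augment_cache_key(self, real_idx, rng_seed):
                 # Returning None opts out of caching augmented data.
                 return None


         def insert_augmented(cache_map, dataset, real_idx, rng_seed, data):
             key = dataset.augment_cache_key(real_idx, rng_seed)
             if key is None:
                 return False  # no-op: nothing is stored
             cache_map[key] = data
             return True


         opt_in_map, opt_out_map = {}, {}
         stored_in = insert_augmented(opt_in_map, OptInDataset(), 0, 42, {"x": 1})
         stored_out = insert_augmented(opt_out_map, OptOutDataset(), 0, 42, {"x": 1})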



   .. py:method:: _do_insert(cache_map: dict, cache_key, data: dict[str, Any])


   .. py:method:: _data_size(data, seen: set[int] | None = None) -> int
      :staticmethod:
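
      This method is undocumented in the source.  Judging only from its
      signature, it may estimate the in-memory size of a data dict while
      using ``seen`` to avoid counting the same object twice.  The sketch
      below shows one way such an estimate could be written with
      ``sys.getsizeof``; it is an assumption about the general technique,
      not the actual implementation.

      .. code-block:: python

         import sys


         def data_size(data, seen=None):
             """Hypothetical recursive size estimate in bytes."""
             seen = set() if seen is None else seen
             if id(data) in seen:
                 return 0  # already counted via another reference
             seen.add(id(data))

             size = sys.getsizeof(data)
             if isinstance(data, dict):
                 size += sum(
                     data_size(k, seen) + data_size(v, seen)
                     for k, v in data.items()
                 )
             elif isinstance(data, (list, tuple, set, frozenset)):
                 size += sum(data_size(item, seen) for item in data)
             return size


         shared = [1.5, 2.5, 3.5]
         seen_ids = set()
         first_visit = data_size(shared, seen_ids)
         second_visit = data_size(shared, seen_ids)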



