hyrax.datasets.result_dataset
=============================

.. py:module:: hyrax.datasets.result_dataset

.. autoapi-nested-parse::

   Lance-based storage for inference results.

   This module provides the ``ResultDatasetWriter`` and ``ResultDataset`` classes, which
   write and read inference results in Lance columnar format instead of batched .npy files.



Attributes
----------

.. autoapisummary::

   hyrax.datasets.result_dataset.logger
   hyrax.datasets.result_dataset.TABLE_NAME
   hyrax.datasets.result_dataset.LANCE_DB_DIR


Classes
-------

.. autoapisummary::

   hyrax.datasets.result_dataset.ResultDatasetWriter
   hyrax.datasets.result_dataset.ResultDataset


Module Contents
---------------

.. py:data:: logger

.. py:data:: TABLE_NAME
   :value: 'results'


.. py:data:: LANCE_DB_DIR
   :value: 'lance_db'


.. py:class:: ResultDatasetWriter(result_dir: Union[str, pathlib.Path])

   Writer for Lance-based inference results.

   Writes inference results incrementally to Lance format by calling ``table.add()``
   once per batch, so results do not accumulate in memory.

   Initialize the writer.

   :param result_dir: Directory where Lance database will be created
   :type result_dir: Union[str, Path]
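
   The incremental workflow can be sketched as a small helper. This is a minimal
   sketch, not part of hyrax: ``write_all`` is a hypothetical name, and it relies
   only on the documented ``write_batch`` and ``commit`` methods, so any object
   that provides them can be passed in.

   .. code-block:: python

      import numpy as np


      def write_all(writer, batches):
          """Write every (object_ids, tensors) batch, then commit once.

          ``writer`` is assumed to expose write_batch() and commit() as
          documented for ResultDatasetWriter. Returns the number of
          tensors handed to the writer.
          """
          written = 0
          for object_ids, tensors in batches:
              tensors = list(tensors)
              # write_batch takes an array of IDs and a list of arrays.
              writer.write_batch(np.asarray(object_ids), tensors)
              written += len(tensors)
          # Finalize only after the last batch has been written.
          writer.commit()
          return written

   With hyrax installed, a call might look like
   ``write_all(ResultDatasetWriter("results/run_01"), batches)``, where the
   directory name is only a placeholder.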


   .. py:attribute:: result_dir


   .. py:attribute:: lance_dir


   .. py:attribute:: db
      :value: None



   .. py:attribute:: table
      :value: None



   .. py:attribute:: schema
      :value: None



   .. py:attribute:: tensor_dtype
      :value: None



   .. py:attribute:: tensor_shape
      :value: None



   .. py:attribute:: batch_count
      :value: 0



   .. py:method:: write_batch(object_ids: numpy.ndarray, data: list[numpy.ndarray])

      Write a batch of results incrementally.

      :param object_ids: Array of object IDs (will be converted to strings)
      :type object_ids: np.ndarray
      :param data: List of numpy arrays (tensors) to write
      :type data: list[np.ndarray]



   .. py:method:: commit()

      Finalize the write by optimizing the table.



   .. py:method:: _create_schema(sample_tensor: numpy.ndarray)

      Create PyArrow schema with tensor metadata.

      :param sample_tensor: Sample tensor to determine dtype and shape
      :type sample_tensor: np.ndarray
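
      Why dtype and shape are worth recording can be illustrated without Lance
      or PyArrow. The sketch below assumes tensors are stored as flat sequences
      of values, which this page does not confirm; ``describe_tensor`` and
      ``restore_tensor`` are hypothetical helpers, and the metadata keys are
      illustrative only.

      .. code-block:: python

         import json

         import numpy as np


         def describe_tensor(sample_tensor):
             """Capture the dtype and shape of a sample tensor as strings."""
             return {
                 "tensor_dtype": str(sample_tensor.dtype),
                 "tensor_shape": json.dumps(list(sample_tensor.shape)),
             }


         def restore_tensor(flat_values, metadata):
             """Rebuild a tensor from flat values using the recorded metadata."""
             shape = json.loads(metadata["tensor_shape"])
             values = np.asarray(flat_values, dtype=metadata["tensor_dtype"])
             return values.reshape(shape)

      Without the recorded shape, a flat sequence of six values could equally
      be a ``(2, 3)`` or a ``(3, 2)`` tensor.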



.. py:class:: ResultDataset(config: dict, data_location: Union[pathlib.Path, str])

   Bases: :py:obj:`hyrax.datasets.dataset_registry.HyraxDataset`


   Reader for Lance-based inference results.

   Provides HyraxQL-compatible getters for results stored in Lance format.

   Initialize the dataset.

   :param config: Hyrax configuration dictionary
   :type config: dict
   :param data_location: Path to results directory containing lance_db/
   :type data_location: Union[Path, str]


   .. py:attribute:: data_location


   .. py:attribute:: lance_dir


   .. py:attribute:: db


   .. py:attribute:: table


   .. py:attribute:: lance_dataset


   .. py:attribute:: tensor_shape


   .. py:attribute:: tensor_dtype


   .. py:method:: __len__() -> int

      Return the number of records in the dataset.



   .. py:method:: __getitem__(idx: Union[int, numpy.ndarray])

      Get data by index.

      :param idx: Single index or array of indices
      :type idx: Union[int, np.ndarray]

      :returns: Data tensor(s)
      :rtype: np.ndarray

      :raises IndexError: If index is out of range
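
      Because ``idx`` may be an array of indices, a large result set can be
      read in fixed-size chunks rather than row by row. A minimal sketch,
      assuming only the documented ``__len__`` and ``__getitem__`` behavior;
      ``iter_chunks`` is a hypothetical helper, not part of hyrax.

      .. code-block:: python

         import numpy as np


         def iter_chunks(dataset, chunk_size):
             """Yield consecutive chunks of ``dataset`` using index arrays."""
             if chunk_size < 1:
                 raise ValueError("chunk_size must be at least 1")
             total = len(dataset)
             for start in range(0, total, chunk_size):
                 # The final chunk is shorter when total is not a multiple
                 # of chunk_size.
                 indices = np.arange(start, min(start + chunk_size, total))
                 yield dataset[indices]

      Every index passed to the dataset stays below ``len(dataset)``, so the
      documented ``IndexError`` should not be triggered by this loop.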



   .. py:method:: __get_all__()

      Get all data tensors in the dataset.

      This specialized method is intended for internal use (e.g. by ``visualize_v2``).
      It assumes fixed column names and reads the underlying array buffer directly,
      which avoids creating a Python object for each row.

      :returns: All data tensors
      :rtype: np.ndarray



   .. py:method:: get_data(idx: int)

      Get data tensor at index (HyraxQL getter).

      :param idx: Index of the data item
      :type idx: int

      :returns: Data tensor
      :rtype: np.ndarray



   .. py:method:: get_object_id(idx: int) -> str

      Get object ID at index (HyraxQL getter).

      :param idx: Index of the data item
      :type idx: int

      :returns: Object ID
      :rtype: str



   .. py:method:: ids() -> list[str]

      Generate all object IDs.

      :returns: Object IDs in order
      :rtype: list[str]
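
      When results need to be fetched by object ID rather than by position,
      the ID order can be turned into a lookup table. A minimal sketch that
      uses only the documented ``ids`` and ``get_data`` methods;
      ``build_id_index`` and ``data_for`` are hypothetical helper names.

      .. code-block:: python

         def build_id_index(dataset):
             """Map each object ID to its positional index."""
             # If an ID appears more than once, the last position wins.
             return {
                 object_id: position
                 for position, object_id in enumerate(dataset.ids())
             }


         def data_for(dataset, id_index, object_id):
             """Return the tensor for ``object_id``; KeyError if it is unknown."""
             return dataset.get_data(id_index[object_id])

      Building the index once and reusing it avoids scanning the ID list for
      every lookup.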



