hyrax.datasets.lancedb_dataset
==============================

.. py:module:: hyrax.datasets.lancedb_dataset


Attributes
----------

.. autoapisummary::

   hyrax.datasets.lancedb_dataset._ROW_CACHE_SIZE


Classes
-------

.. autoapisummary::

   hyrax.datasets.lancedb_dataset.LanceDBDataset


Module Contents
---------------

.. py:data:: _ROW_CACHE_SIZE
   :value: 16


.. py:class:: LanceDBDataset(config: dict, data_location: pathlib.Path | str | None = None)

   Bases: :py:obj:`hyrax.datasets.dataset_registry.HyraxDataset`


   A minimal Hyrax wrapper around a LanceDB table.

   .. py:method:: __init__

   Common initialization for all datasets, which saves the config.

   Subclasses of ``HyraxDataset`` should call this at the end of their ``__init__``, like so:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset

       class MyDataset(HyraxDataset):
           def __init__(self, config):
               # <your code>
               super().__init__(config)

   If per-tensor metadata is available, dataset authors are encouraged to create an
   astropy ``Table`` of that data, in the same order as their data, and pass it as
   ``metadata_table``, as shown below:

   .. code-block:: python

       from hyrax.datasets import HyraxDataset
       from astropy.table import Table

       class MyDataset(HyraxDataset):
           def __init__(self, config):
               # <your code>
               metadata_table = Table(...)  # replace ... with your catalog data
               super().__init__(config, metadata_table)

   :param config: The runtime configuration for hyrax
   :type config: dict
   :param metadata_table: An Astropy Table containing the metadata columns desired for
                          visualization, with rows in the same order your data will be enumerated.
   :type metadata_table: Optional[Table], optional
   :param object_id_column_name: The name of the column containing object IDs. If None, uses the default
                                 from config or creates one from the ids() method.
   :type object_id_column_name: Optional[str], optional


   .. py:attribute:: data_location
      :value: ''



   .. py:attribute:: table_name


   .. py:attribute:: connect_kwargs


   .. py:attribute:: open_table_kwargs


   .. py:attribute:: db


   .. py:attribute:: table


   .. py:attribute:: lance_dataset


   .. py:attribute:: _row_cache
      :type:  collections.OrderedDict


   .. py:method:: _all_available_fields() -> list[str]


   .. py:method:: _get_row(idx: int)

      Return the PyArrow record-batch for *idx*, using a small FIFO row cache.

      Caching avoids redundant ``lance_dataset.take`` calls when multiple
      ``get_<field>`` accessors are invoked for the same sample index, which is
      the common pattern when DataProvider resolves all fields for a single item.
      The cache holds at most ``_ROW_CACHE_SIZE`` rows; the oldest entry is
      evicted once that limit is reached.
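
      The eviction behaviour described above can be illustrated with a minimal,
      self-contained sketch. It is not the Hyrax implementation: ``FifoRowCache``,
      ``fake_fetch`` and ``ROW_CACHE_SIZE`` are hypothetical names, and ``fake_fetch``
      merely stands in for the ``lance_dataset.take`` call. The only assumption carried
      over from this page is that the cache is an ``OrderedDict`` evicted in insertion order.

      .. code-block:: python

          from collections import OrderedDict

          ROW_CACHE_SIZE = 16  # mirrors the documented _ROW_CACHE_SIZE value


          class FifoRowCache:
              """Hypothetical stand-in for the row cache (illustration only)."""

              def __init__(self, fetch, max_size=ROW_CACHE_SIZE):
                  self._fetch = fetch          # callable that loads one row by index
                  self._max_size = max_size
                  self._rows = OrderedDict()   # insertion order doubles as eviction order

              def get(self, idx):
                  if idx in self._rows:
                      # Cache hit: entries are not reordered, so eviction stays FIFO.
                      return self._rows[idx]
                  row = self._fetch(idx)
                  if len(self._rows) >= self._max_size:
                      self._rows.popitem(last=False)  # drop the oldest inserted entry
                  self._rows[idx] = row
                  return row


          fetch_calls = []


          def fake_fetch(idx):
              # Records each fetch so redundant loads are visible.
              fetch_calls.append(idx)
              return {"idx": idx}


          cache = FifoRowCache(fake_fetch, max_size=2)
          cache.get(0)
          cache.get(0)  # served from the cache, no second fetch
          cache.get(1)
          cache.get(2)  # cache is full, so row 0 (the oldest) is evicted
          cache.get(0)  # row 0 must be fetched again

      Because a hit does not move the entry to the end of the ``OrderedDict``, a
      frequently read row is still evicted once enough newer rows have been inserted.
      This is what distinguishes a FIFO policy from an LRU one.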



   .. py:method:: _resolve_table_name(configured_table_name) -> str


   .. py:method:: _register_getters() -> None


   .. py:method:: __len__() -> int


