hyrax.vector_dbs
================

.. py:module:: hyrax.vector_dbs


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hyrax/vector_dbs/chromadb_impl/index
   /autoapi/hyrax/vector_dbs/qdrantdb_impl/index
   /autoapi/hyrax/vector_dbs/vector_db_factory/index
   /autoapi/hyrax/vector_dbs/vector_db_interface/index


Classes
-------

.. autoapisummary::

   hyrax.vector_dbs.ChromaDB


Package Contents
----------------

.. py:class:: ChromaDB(config, context)

   Bases: :py:obj:`hyrax.vector_dbs.vector_db_interface.VectorDB`


   Implementation of the VectorDB interface using ChromaDB as the backend.

   .. py:method:: __init__

      Create a new instance of a `VectorDB` object.

      :param config: An instance of the runtime configuration, by default None
      :type config: dict, optional
      :param context: An instance of the context object, by default None
      :type context: dict, optional


   .. py:attribute:: chromadb_client
      :value: None



   .. py:attribute:: collection
      :value: None



   .. py:attribute:: shard_index
      :value: 0



   .. py:attribute:: shard_size
      :value: 0



   .. py:attribute:: shard_size_limit


   .. py:attribute:: vector_size_limit


   .. py:attribute:: min_shards_for_parallelization
      :value: 50



   .. py:method:: connect()

      Create a database connection.



   .. py:method:: create()

      Create a new database.



   .. py:method:: insert(ids: list[Union[str, int]], vectors: list[numpy.ndarray])

      Insert a batch of vectors into the database.

      :param ids: The ids to associate with the vectors
      :type ids: list[Union[str, int]]
      :param vectors: The vectors to insert into the database
      :type vectors: list[np.ndarray]
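
      As an illustration of the expected argument shapes, the sketch below
      pairs ids with vectors before a call to ``insert``. The helper
      ``prepare_batch`` is hypothetical and not part of hyrax, and plain lists
      of floats stand in for ``numpy.ndarray`` vectors so the snippet has no
      dependencies. Converting ids to strings follows the note under
      ``get_by_id`` that ChromaDB ids should always be strings; whether
      ``insert`` performs that conversion itself is not stated here.

      .. code-block:: python

         from typing import Sequence, Union


         def prepare_batch(
             ids: Sequence[Union[str, int]],
             vectors: Sequence[Sequence[float]],
         ) -> tuple[list[str], list[list[float]]]:
             """Hypothetical helper: validate and normalize one insert batch."""
             if len(ids) != len(vectors):
                 raise ValueError("ids and vectors must have the same length")
             # Assumption: string ids are the safe choice for a ChromaDB backend.
             str_ids = [str(i) for i in ids]
             flat_vectors = [[float(x) for x in v] for v in vectors]
             return str_ids, flat_vectors


         batch_ids, batch_vectors = prepare_batch([1, "b"], [[0.0, 1.0], [1.0, 0.0]])
         # With a connected ChromaDB instance ``db`` (not created here), the batch
         # would then be passed on, with the vectors as numpy arrays:
         # db.insert(batch_ids, batch_vectors)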



   .. py:method:: search_by_id(id: Union[str, int], k: int = 1) -> dict[int, list[Union[str, int]]]

      Get the ids of the k nearest neighbors for a given id in the database.

      :param id: The id of the vector in the database whose k nearest neighbors
                 are requested. An `int` id is converted to a string.
      :type id: Union[str, int]
      :param k: The number of nearest neighbors to return, by default 1. With
                ``k=1`` only the closest neighbor is returned, which is almost
                always the input vector itself.
      :type k: int, optional

      :returns: Dictionary mapping the input index to the ids of its k nearest
                neighbors. Because this function accepts only one id, the key is
                always 0, i.e. ``{0: [id1, id2, ...]}``.
      :rtype: dict[int, list[Union[str, int]]]

      :raises ValueError: If more than one vector is found for the given id
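
      Because the closest neighbor is almost always the queried vector itself,
      a caller who wants k other neighbors may need to request one extra
      result and drop the self-match. The sketch below shows one way to do
      that on a result of the documented shape. ``drop_self_match`` is a
      hypothetical helper, not part of hyrax, and the example result is
      hand-written rather than produced by a real database.

      .. code-block:: python

         from typing import Union


         def drop_self_match(
             result: dict[int, list[Union[str, int]]],
             query_id: Union[str, int],
             k: int,
         ) -> list[Union[str, int]]:
             """Hypothetical helper: remove the query id from a search_by_id result."""
             neighbors = result[0]  # one id was queried, so the only key is 0
             # Compare as strings because int ids are converted to strings.
             others = [n for n in neighbors if str(n) != str(query_id)]
             return others[:k]


         # Hand-written result in the documented shape; in practice it might come
         # from db.search_by_id("a", k=3) on a connected ChromaDB instance.
         example_result = {0: ["a", "b", "c"]}
         true_neighbors = drop_self_match(example_result, "a", k=2)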



   .. py:method:: search_by_vector(vectors: Union[numpy.ndarray, list[numpy.ndarray]], k: int = 1) -> dict[int, list[Union[str, int]]]

      Get the ids of the k nearest neighbors for one or more vectors.

      :param vectors: The vector, or list of vectors, to use when searching for
                      nearest neighbors
      :type vectors: Union[np.ndarray, list[np.ndarray]]
      :param k: The number of nearest neighbors to return, by default 1, which
                returns only the closest neighbor
      :type k: int, optional

      :returns: Dictionary with the index of each input vector as the key and the
                ids of its k nearest neighbors as the value.
      :rtype: dict[int, list[Union[str, int]]]
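
      The return shape for several query vectors can be illustrated with a toy
      in-memory stand-in. This is a brute-force sketch using Euclidean
      distance, not the ChromaDB query path; ``toy_store`` and
      ``toy_search_by_vector`` are hypothetical names, plain lists stand in
      for ``numpy.ndarray`` vectors, and the toy accepts only a list of
      vectors.

      .. code-block:: python

         import math
         from typing import Sequence

         # Hypothetical in-memory store: id -> vector.
         toy_store = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}


         def toy_search_by_vector(
             vectors: Sequence[Sequence[float]], k: int = 1
         ) -> dict[int, list[str]]:
             """Brute-force stand-in: one result list per query vector."""
             results = {}
             for index, query in enumerate(vectors):
                 # Rank every stored id by its distance to this query vector.
                 ranked = sorted(toy_store, key=lambda i: math.dist(query, toy_store[i]))
                 results[index] = ranked[:k]  # key is the position of the query
             return results


         hits = toy_search_by_vector([[0.9, 0.1], [4.0, 4.0]], k=2)
         # hits == {0: ['b', 'a'], 1: ['c', 'b']}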



   .. py:method:: get_by_id(ids: list[Union[str, int]]) -> dict[Union[str, int], list[float]]

      Retrieve the vectors associated with a list of ids.

      :param ids: The ids of the vectors to retrieve. For ChromaDB instances, these should
                  always be strings.
      :type ids: list[Union[str, int]]

      :returns: Dictionary with the ids as the keys and the vectors as the values.
      :rtype: dict[str, list[float]]



   .. py:method:: _get_ids(ids: list[Union[str, int]]) -> set[str]

      For the given list of ids, return the ids that are already in the database.

      :param ids: The ids to look up. For ChromaDB instances, these should
                  always be strings.
      :type ids: list[Union[str, int]]

      :returns: Set of ids that are already in the database.
      :rtype: set[str]
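
      One plausible use of such a set is to skip ids that are already stored
      before inserting a batch. That is an assumption about how the method is
      used, not something stated here. The sketch below shows the filtering
      step on its own; ``filter_new_entries`` is a hypothetical helper, and
      ``existing`` is a hand-written stand-in for the returned set.

      .. code-block:: python

         from typing import Sequence, Union


         def filter_new_entries(
             ids: Sequence[Union[str, int]],
             vectors: Sequence[Sequence[float]],
             existing_ids: set[str],
         ) -> tuple[list[str], list[Sequence[float]]]:
             """Hypothetical helper: keep only entries whose id is not yet stored."""
             new_ids, new_vectors = [], []
             for id_, vector in zip(ids, vectors):
                 # Compare as strings, since stored ids are strings.
                 if str(id_) not in existing_ids:
                     new_ids.append(str(id_))
                     new_vectors.append(vector)
             return new_ids, new_vectors


         # Stand-in for the set that _get_ids would return for this batch.
         existing = {"1"}
         fresh_ids, fresh_vectors = filter_new_entries([1, 2], [[0.0], [1.0]], existing)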



