hyrax.vector_dbs
Submodules
Classes
ChromaDB | Implementation of the VectorDB interface using ChromaDB as the backend.
Package Contents
- class ChromaDB(config, context)
Bases: hyrax.vector_dbs.vector_db_interface.VectorDB
Implementation of the VectorDB interface using ChromaDB as the backend.
Create a new instance of a VectorDB object.
- Parameters:
config (dict, optional) – An instance of the runtime configuration, by default None
context (dict, optional) – An instance of the context object, by default None
- chromadb_client = None
- collection = None
- shard_index = 0
- shard_size = 0
- shard_size_limit
- vector_size_limit
- min_shards_for_parallelization = 50
- insert(ids: list[str | int], vectors: list[numpy.ndarray])
Insert a batch of vectors into the database.
- Parameters:
ids (list[Union[str, int]]) – The ids to associate with the vectors
vectors (list[np.ndarray]) – The vectors to insert into the database
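A minimal sketch of preparing arguments for insert. The helper name prepare_batch is hypothetical and not part of hyrax. It assumes that ids and vectors are paired by position and that storing each vector as a flat float32 array is acceptable; neither assumption is confirmed by this page.

```python
import numpy as np


def prepare_batch(ids, vectors):
    """Hypothetical helper: pair ids with vectors before calling insert()."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    # String ids are the safe choice for a ChromaDB backend.
    str_ids = [str(i) for i in ids]
    # Assumption: flatten each vector to 1-D float32 so all entries are uniform.
    flat = [np.asarray(v, dtype=np.float32).ravel() for v in vectors]
    return str_ids, flat


ids, vectors = prepare_batch([101, "obj-7"], [np.zeros((2, 2)), np.ones(4)])
# With an already configured instance (not constructed here):
# db.insert(ids=ids, vectors=vectors)
```

Normalizing ids up front keeps later lookups consistent, since the search and retrieval methods on this class treat ids as strings.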
- search_by_id(id: str | int, k: int = 1) -> dict[int, list[str | int]]
Get the ids of the k nearest neighbors for a given id in the database.
- Parameters:
id (Union[str, int]) – The id of the vector in the database for which to find the k nearest neighbors. An int is converted to a string.
k (int, optional) – The number of nearest neighbors to return. Defaults to 1, which returns only the closest neighbor; this is almost always the input id itself.
- Returns:
Dictionary with the index of the input id as the key and the ids of the k nearest neighbors as the value. Because this function accepts only one id, the key is always 0, i.e. {0: [id1, id2, …]}
- Return type:
dict[int, list[Union[str, int]]]
- Raises:
ValueError – If more than one vector is found for the given id
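Because the closest neighbor is almost always the query itself, a caller will often want to drop it from the result. The sketch below works on the documented return shape only; neighbors_excluding_self is a hypothetical helper, and the example dictionary is a made-up stand-in for a real search_by_id call.

```python
def neighbors_excluding_self(result, query_id):
    """Hypothetical helper: drop the query id from a search_by_id result."""
    # search_by_id takes a single id, so the only key is 0.
    neighbor_ids = result[0]
    # Compare as strings because integer ids are converted to strings.
    return [n for n in neighbor_ids if str(n) != str(query_id)]


# Stand-in for db.search_by_id(42, k=3); the ids here are made up.
example_result = {0: ["42", "17", "99"]}
others = neighbors_excluding_self(example_result, 42)
```

To end up with k true neighbors under this approach, request k + 1 results and filter afterwards.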
- search_by_vector(vectors: numpy.ndarray | list[numpy.ndarray], k: int = 1) -> dict[int, list[str | int]]
Get the ids of the k nearest neighbors for a given vector.
- Parameters:
vectors (Union[np.ndarray, list[np.ndarray]]) – The vector, or list of vectors, to use when searching for nearest neighbors
k (int, optional) – The number of nearest neighbors to return. Defaults to 1, which returns only the closest neighbor.
- Returns:
Dictionary with the index of each input vector as the key and the ids of its k nearest neighbors as the value.
- Return type:
dict[int, list[Union[str, int]]]
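The return shape can be illustrated with a brute-force Euclidean search in NumPy. This is a stand-in for illustration only: it does not use ChromaDB, it is not how this class performs the search, and the function name and data are hypothetical.

```python
import numpy as np


def brute_force_search(stored_ids, stored_vectors, queries, k=1):
    """Illustrative stand-in that mimics the documented return shape."""
    stored = np.asarray(stored_vectors, dtype=float)
    # Accept one 1-D vector or a list of vectors; each row becomes a query.
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    results = {}
    for index, query in enumerate(queries):
        # Euclidean distance from this query to every stored vector.
        distances = np.linalg.norm(stored - query, axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        results[index] = [stored_ids[i] for i in nearest]
    return results


stored_ids = ["a", "b", "c"]
stored_vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
found = brute_force_search(stored_ids, stored_vectors, [[0.9, 0.1], [4.0, 4.0]], k=2)
# found == {0: ["b", "a"], 1: ["c", "b"]}
```

Each key is the position of a query in the input, so results for several vectors can be matched back to their queries without extra bookkeeping.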
- get_by_id(ids: list[str | int]) -> dict[str | int, list[float]]
Retrieve the vectors associated with a list of ids.
- Parameters:
ids (list[Union[str, int]]) – The ids of the vectors to retrieve. For ChromaDB instances, these should always be strings.
- Returns:
Dictionary with the ids as the keys and the vectors as the values.
- Return type:
dict[str, list[float]]
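A small sketch of turning the documented return value back into a single array, in the order the ids were requested. stack_vectors is a hypothetical helper, and the example dictionary is a made-up stand-in for a real get_by_id call; it assumes every requested id is present in the result.

```python
import numpy as np


def stack_vectors(result, ids):
    """Hypothetical helper: order get_by_id output into one 2-D array."""
    # Keys are looked up as strings, matching the ChromaDB id convention.
    return np.array([result[str(i)] for i in ids], dtype=float)


# Stand-in for db.get_by_id(["a", "b"]); the values are made up.
example = {"b": [3.0, 4.0], "a": [1.0, 2.0]}
matrix = stack_vectors(example, ["a", "b"])
```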
- _get_ids(ids: list[str | int]) -> set[str]
For the given list of ids, return the ids that are already in the database.
- Parameters:
ids (list[Union[str, int]]) – The ids of the vectors to retrieve. For ChromaDB instances, these should always be strings.
- Returns:
Set of ids that are already in the database.
- Return type:
set[str]
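One plausible use for a set of already-stored ids is to skip duplicates before inserting a batch. The sketch below shows that filtering step in isolation; filter_new is a hypothetical helper, the data is made up, and this page does not state whether insert performs such a check itself.

```python
def filter_new(ids, vectors, existing):
    """Hypothetical helper: keep only entries whose id is not yet stored."""
    # Compare as strings, since stored ids are strings.
    kept = [(str(i), v) for i, v in zip(ids, vectors) if str(i) not in existing]
    new_ids = [i for i, _ in kept]
    new_vectors = [v for _, v in kept]
    return new_ids, new_vectors


# Stand-in for the set returned by _get_ids; the values are made up.
existing = {"1", "3"}
new_ids, new_vectors = filter_new([1, 2, 3, 4], ["v1", "v2", "v3", "v4"], existing)
```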