hyrax.data_sets.inference_dataset
Attributes
Classes
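InferenceDataSet: This is a dataset class to represent the situations where we wish to treat the output of inference as a dataset.
InferenceDataSetWriter: Class to write out inference datasets. Used by infer, umap to consistently write out numpy files in batches which can be read by InferenceDataSet.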
Module Contents
- class InferenceDataSet(config, results_dir: pathlib.Path | str | None = None, verb: str | None = None)
Bases: hyrax.data_sets.data_set_registry.HyraxDataset, torch.utils.data.Dataset
This is a dataset class to represent the situations where we wish to treat the output of inference as a dataset, e.g. when performing umap/visualization operations.
Initialize an InferenceDataSet object.
As a user of this code, you should almost never create an instance of this class yourself. Instances of this class are returned by the umap and infer verbs. Prefer those over creating your own.
If you do end up creating your own instance, you will need a hyrax config, and you will need to know where the result you are interested in is stored.
- Parameters:
config (dict) – The hyrax config dictionary
results_dir (Optional[Union[Path, str]], optional) –
The results subdirectory of the inference or umap results you want to access, by default None. If no results subdirectory is provided, this function will attempt the following in order:
1. Use the directory specified in config['results']['inference_dir'] if set and the directory exists.
2. Look in the results configured in config['general']['results_dir'] (./results/ by default), then use the most recent results directory corresponding to the verb specified.
verb (Optional[str], optional) – The name of the verb that generated the results, only important when the most recent results are being fetched. If no verb is provided, “infer” will be assumed.
- Raises:
RuntimeError – When the provided results directory is corrupt, or cannot be found.
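The lookup order described for results_dir can be sketched roughly as follows. This is a minimal illustration and not hyrax's implementation: the function name resolve_results_dir is hypothetical, and the sketch assumes that a results directory's name contains the verb and that "most recent" means the latest modification time. Neither assumption is confirmed by this page.

```python
from pathlib import Path


def resolve_results_dir(config, results_dir=None, verb=None):
    """Hypothetical helper mirroring the documented lookup order."""
    verb = verb or "infer"  # documented default when no verb is given

    # An explicitly passed directory is used as-is.
    if results_dir is not None:
        path = Path(results_dir)
        if not path.is_dir():
            raise RuntimeError(f"Results directory {path} not found")
        return path

    # 1. config['results']['inference_dir'], if set and the directory exists.
    configured = config.get("results", {}).get("inference_dir")
    if configured and Path(configured).is_dir():
        return Path(configured)

    # 2. Most recent directory for the verb under config['general']['results_dir'].
    # ASSUMPTION: directory names contain the verb; recency is judged by mtime.
    root = Path(config["general"]["results_dir"])
    candidates = [p for p in root.glob(f"*{verb}*") if p.is_dir()]
    if not candidates:
        raise RuntimeError(f"No {verb} results found under {root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

In practice the infer and umap verbs return a ready-made InferenceDataSet, so a lookup like this is only relevant when constructing the class by hand.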
- _shape()
The shape of the dataset (Discovered from files)
- Returns:
Tuple with the shape of an individual element of the dataset
- Return type:
Tuple
- ids() -> collections.abc.Generator[str]
IDs of this dataset. Will return a string generator with IDs.
These IDs are the IDs of the dataset used originally to generate this dataset.
- Returns:
Generator that yields the string ids of this dataset
- Return type:
Generator[str]
- Yields:
Generator[str] – Yields the string ids of this dataset
- __getitem__(idx: int | numpy.ndarray)
Implements the [] operator.
- Parameters:
idx (Union[int, np.ndarray]) – Either an index or a numpy array of indexes. These are NOT the ID values of the dataset, but rather a zero-based index starting at the beginning of the inference dataset.
- Returns:
Either the tensor corresponding to a single result, or a tensor containing multiple results if multiple indexes were passed.
- Return type:
torch.Tensor
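The distinction between positional indexes and dataset IDs can be illustrated with a toy stand-in. This sketch is not the hyrax class: ToyResults and the ID strings are invented for illustration, and NumPy arrays are used in place of torch tensors to keep the example dependency-light.

```python
import numpy as np


class ToyResults:
    """Toy stand-in: one result vector per row, addressed by position."""

    def __init__(self, ids, vectors):
        self.id_list = list(ids)            # IDs carried over from the original dataset
        self.vectors = np.asarray(vectors)  # one row per inference result

    def __len__(self):
        return len(self.vectors)

    def __getitem__(self, idx):
        # An int selects one row; an index array selects several rows at once.
        return self.vectors[idx]


results = ToyResults(
    ids=["obj_907", "obj_12", "obj_455"],
    vectors=[[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]],
)

single = results[1]                  # second result; it belongs to ID "obj_12"
several = results[np.array([2, 0])]  # rows 2 and 0, in that order
```

Note that `results[1]` has nothing to do with an ID containing "1"; it is simply the second stored result.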
- __len__() -> int
Returns the length of the dataset.
- Returns:
Length of the dataset.
- Return type:
int
- property original_config: dict
Get the original configuration for the dataset used to generate this inference dataset
Since this sort of dataset is definitionally an intermediate product, this returns the runtime config used to construct that dataset rather than this one.
- Returns:
Configuration that can be used to create the original dataset that was used as input for whatever inference process created this dataset.
- Return type:
dict
- metadata_fields() -> list[str]
Get the metadata fields associated with the original dataset used to generate this one.
- Returns:
List of valid field names for metadata queries
- Return type:
list[str]
- metadata(idxs: numpy.typing.ArrayLike, fields: list[str]) -> numpy.typing.ArrayLike
Get metadata associated with the data in the InferenceDataSet. This metadata comes from the original dataset, but is indexed according to the InferenceDataSet.
- Parameters:
idxs (npt.ArrayLike) – Indexes in the InferenceDataSet for which metadata is desired
fields (list[str]) – Metadata fields requested
- Returns:
An array where the rows correspond to the passed list of indexes and the columns correspond to the fields passed. Order is preserved: metadata[i] corresponds to idxs[i].
- Return type:
npt.ArrayLike
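The row and column ordering described for metadata() can be sketched with a NumPy structured array. This is an illustration under assumptions: the table contents, the field names (ra, dec, mag), and the toy_metadata function are hypothetical, and the real method may return a differently shaped array.

```python
import numpy as np

# Hypothetical metadata table; field names are invented for illustration.
table = np.array(
    [(10.5, -3.2, 21.0), (11.0, -3.0, 19.5), (12.25, -2.5, 22.75)],
    dtype=[("ra", "f8"), ("dec", "f8"), ("mag", "f8")],
)


def toy_metadata(idxs, fields):
    """Return the requested fields for the requested rows, in the order given."""
    # Select the columns first, then pick rows by positional index.
    return table[list(fields)][np.asarray(idxs)]


rows = toy_metadata([2, 0], ["ra", "mag"])
```

Here `rows[0]` holds the values for index 2 and `rows[1]` the values for index 0, matching the order of the requested indexes.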
- class InferenceDataSetWriter(original_dataset: torch.utils.data.Dataset, result_dir: str | pathlib.Path)
Class to write out inference datasets. Used by infer, umap to consistently write out numpy files in batches which can be read by InferenceDataSet.
With the exception of building ID->Batch indexing info, this is implemented as a bag-o-functions that manipulate the filesystem directly as their primary effect.
- write_batch(ids: numpy.ndarray, tensors: list[numpy.ndarray])
Write a batch of tensors into the dataset. This writes the whole batch immediately. The caller is responsible for keeping batch sizes consistent and for ensuring that ids has the same length as tensors.
- Parameters:
ids (np.ndarray) – Array of IDs. The dtype of the elements must match the dtype of the IDs of the original dataset used to construct this InferenceDataSetWriter.
tensors (list[np.ndarray]) – List of consistently dimensioned numpy arrays to save.
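As a rough picture of batch-wise writing, the sketch below saves each batch as one .npy file that pairs every ID with its tensor. The class ToyBatchWriter, the file naming scheme, and the structured-array layout are all assumptions made for illustration; they do not describe the on-disk format hyrax actually uses.

```python
from pathlib import Path

import numpy as np


class ToyBatchWriter:
    """Hypothetical writer: one .npy file per batch, numbered in write order."""

    def __init__(self, result_dir):
        self.result_dir = Path(result_dir)
        self.result_dir.mkdir(parents=True, exist_ok=True)
        self.batch_index = 0

    def write_batch(self, ids, tensors):
        if len(ids) != len(tensors):
            raise ValueError("ids and tensors must have the same length")
        # Stack the per-item arrays; this fails if their shapes are inconsistent.
        stacked = np.stack(tensors)
        # ASSUMED layout: a structured array pairing each ID with its tensor.
        dtype = [("id", ids.dtype), ("tensor", stacked.dtype, stacked.shape[1:])]
        batch = np.zeros(len(ids), dtype=dtype)
        batch["id"] = ids
        batch["tensor"] = stacked
        path = self.result_dir / f"batch_{self.batch_index}.npy"
        np.save(path, batch)  # the whole batch is written immediately
        self.batch_index += 1
        return path
```

A batch written this way can be read back with `np.load(path)`, after which the `"id"` and `"tensor"` fields line up row by row.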