hyrax.verbs.reduction_algorithms.pca

Attributes

logger

Classes

PCA

PCA reduction implementation.

Module Contents

logger

class PCA(config: dict, reduction_results=None)

Bases: hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm

PCA reduction implementation.

reducer

save_model(results_dir: pathlib.Path)

Save the fitted PCA model to a pickle file.

Parameters:

results_dir (Path) – The directory where the model should be saved. The model will be saved as ‘pca.pickle’ in this directory.
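The save step described above can be sketched with the standard library alone. This is a minimal illustration, not hyrax's implementation: `save_reducer` and `load_reducer` are hypothetical helper names, and the only detail taken from this page is the ‘pca.pickle’ file name inside the results directory.

```python
import pickle
from pathlib import Path


def save_reducer(reducer, results_dir):
    """Pickle `reducer` to <results_dir>/pca.pickle and return the file path.

    Hypothetical helper; only the file name comes from the documentation.
    """
    results_dir = Path(results_dir)
    # Assumption: create the directory if it does not exist yet.
    results_dir.mkdir(parents=True, exist_ok=True)
    model_path = results_dir / "pca.pickle"
    with open(model_path, "wb") as f:
        pickle.dump(reducer, f)
    return model_path


def load_reducer(model_path):
    """Read back whatever object was pickled at `model_path`."""
    with open(model_path, "rb") as f:
        return pickle.load(f)
```

Unpickling can execute arbitrary code, so a model file should only be loaded when it comes from a trusted source.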

load_model(expected_input_dim: int, model_path: pathlib.Path | str | None = None)

Load a pre-existing PCA model from disk.

Parameters:

expected_input_dim (int) – The expected number of input features for the loaded model.
model_path (Path or str, optional) – The path of the file to load the model from. If not specified, the method looks in the config for a default model path.
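The fallback from an explicit `model_path` to a configured default might look like the sketch below. The config key `default_model_path` and the function name `resolve_model_path` are assumptions made for illustration; this page does not say where in the config the default lives or what happens when neither source provides a path.

```python
from pathlib import Path


def resolve_model_path(model_path, config):
    """Return the explicit path if given, otherwise a default from `config`.

    Hypothetical helper: the "default_model_path" key is an assumption,
    not a documented hyrax config entry.
    """
    if model_path is not None:
        # Accept both str and Path, as the documented signature does.
        return Path(model_path)
    default = config.get("default_model_path")
    if default is None:
        # Assumption: fail loudly when no path is available at all.
        raise ValueError("No model path was given and the config has no default.")
    return Path(default)
```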

_validate_pca_model(reducer, expected_input_dim: int) → None

Validate the loaded PCA model. Checks that the loaded object is a PCA instance and that its input and output dimensions match the expected values.

Parameters:

reducer (object) – The loaded model object to validate.
expected_input_dim (int) – The expected number of input features for the loaded model.

Raises:

ValueError – If the loaded model is not a PCA instance or if its input/output dimensions are incompatible.
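The checks listed above could be written roughly as follows. This is a sketch under stated assumptions: `StandInPCA` is a hypothetical stand-in class whose attribute names (`n_features_in_`, `n_components_`) follow scikit-learn's fitted-estimator convention, and `expected_output_dim` is passed explicitly here even though the documented method takes only `expected_input_dim` (the expected output size presumably comes from elsewhere, such as the config).

```python
from dataclasses import dataclass


@dataclass
class StandInPCA:
    """Hypothetical stand-in for a fitted PCA model."""

    n_features_in_: int  # number of input features seen during fit
    n_components_: int  # number of output dimensions


def validate_reducer(reducer, expected_input_dim, expected_output_dim):
    """Raise ValueError if `reducer` has the wrong type or dimensions."""
    if not isinstance(reducer, StandInPCA):
        raise ValueError(
            f"Loaded object is a {type(reducer).__name__}, not a PCA model."
        )
    if reducer.n_features_in_ != expected_input_dim:
        raise ValueError(
            f"Model expects {reducer.n_features_in_} input features, "
            f"but the data has {expected_input_dim}."
        )
    if reducer.n_components_ != expected_output_dim:
        raise ValueError(
            f"Model produces {reducer.n_components_} components, "
            f"but {expected_output_dim} were requested."
        )
```

Validating immediately after loading surfaces a mismatched model before any data is transformed.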

fit(data_sample: numpy.ndarray)

Fit the PCA model to a sample of inference data. The fitted model is stored in the instance variable self.reducer and can be used for transforming data.

Parameters:

data_sample (numpy.ndarray) – The data sample used to fit the model.
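To show what fitting on a sample involves, here is a NumPy-only sketch of PCA through the singular value decomposition. It is not hyrax's code and makes no claim about which library backs `self.reducer`; `fit_pca` and `project` are hypothetical names, and the number of components is passed directly rather than read from a config.

```python
import numpy as np


def fit_pca(data_sample, n_components):
    """Fit PCA on a 2-D sample of shape (n_samples, n_features).

    Returns the per-feature mean and the top `n_components` principal axes
    (one axis per row).
    """
    mean = data_sample.mean(axis=0)
    centered = data_sample - mean
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(data, mean, components):
    """Project data onto the fitted axes, using the mean learned during fit."""
    return (data - mean) @ components.T
```

Because the mean and axes are learned once from the sample, the same fitted state can then be applied to data that was not part of the sample.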

transform(args: dict, num_batches: int)

Transform the data with the fitted PCA model. Use parallel processing if specified in the config.

Parameters:

args (dict) – A dictionary containing the data to be transformed.
num_batches (int) – The total number of batches that the data is split into for transformation.
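Batched transformation can be illustrated with a short NumPy sketch. The layout of `args` and the way hyrax parallelizes the work are not described on this page, so the example below takes plain arrays and runs the batches sequentially; `transform_in_batches` is a hypothetical name, and `mean` and `components` stand for the state of an already fitted model.

```python
import numpy as np


def transform_in_batches(data, mean, components, num_batches):
    """Project `data` batch by batch and stitch the results back together.

    Assumption: batches are contiguous row slices of a single array.
    """
    # array_split tolerates row counts that do not divide evenly.
    batches = np.array_split(data, num_batches)
    reduced = [(batch - mean) @ components.T for batch in batches]
    return np.concatenate(reduced, axis=0)
```

Each batch is projected independently of the others, which is what makes it possible to hand batches to separate workers.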