hyrax.pytorch_ignite

Attributes#

`logger`
`_LEGACY_SPLIT_KEYS`
`_old__getattr__`
`_create_process_func`

Classes#

`SubsetSequentialSampler`	Samples elements sequentially from a given list of indices, without replacement.
`HyraxEvents`	Workaround event for a pytorch ignite bug. See fixup_engine for details.

Functions#

`_new__getattr__`(self, name)
`_auto_model`(→ torch.nn.Module)
`setup_dataset`(→ dict[str, ...)	Create DataProvider instances for each requested data group.
`_build_data_provider`(config, request)	Build the right provider for a data-request group.
`setup_model`(→ torch.nn.Module)	Create a model object based on the configuration.
`setup_model_from_sample`(→ torch.nn.Module)	Create a model from a pre-formatted batch dict instead of a DataProvider.
`dist_data_loader`(→ torch.utils.data.DataLoader)	Create Pytorch Ignite distributed data loaders.
`_inner_loop`(func, prepare_inputs, device, config, ...)	This wraps a model-specific function (func) to move data to the appropriate device.
`create_process_func`(funcname, device, model, config)	Build the per-batch processing function used by the Ignite engine loop.
`create_engine`(→ ignite.engine.Engine)	Unified creation of the pytorch engine object for either an evaluator or trainer.
`extract_model_method`(model, method_name)	Extract a method from a model, which may be wrapped in DistributedDataParallel.
`create_evaluator`(→ ignite.engine.Engine)	Creates an evaluator engine
`create_validator`(→ ignite.engine.Engine)	This function creates a Pytorch Ignite engine object that will be used to validate the model.
`create_tester`(→ ignite.engine.Engine)	This function creates a Pytorch Ignite engine object that will be used to test the model and compute metrics without updating model weights.
`attach_best_checkpoint`(→ None)	Attach a best-checkpoint handler to `engine`, scored on `engine.state.output["loss"]`.
`create_trainer`(→ ignite.engine.Engine)	Creates the engine object that will be used to train the model; originally copied from the pytorch-ignite examples.
`create_save_batch_callback`(results_dir)	Create a callback function for saving batch results during inference or testing.
`fixup_engine`(engine)	Workaround for a pytorch ignite bug (pytorch/ignite#3372) where engine.state.output is not available at EPOCH_COMPLETED or later.

Module Contents#

logger[source]#

_LEGACY_SPLIT_KEYS = ('train_size', 'validate_size', 'test_size')[source]#

_old__getattr__[source]#

_new__getattr__(self, name: str)[source]#

_auto_model(model: torch.nn.Module, sync_bn: bool = False, **kwargs: Any) → torch.nn.Module[source]#

class SubsetSequentialSampler(indices: collections.abc.Sequence[int], generator=None)[source]#

Bases: torch.utils.data.Sampler[int]

Samples elements sequentially from a given list of indices, without replacement.

Parameters:

indices (sequence) – a sequence of indices

indices: collections.abc.Sequence[int][source]#

generator = None[source]#

__iter__() → collections.abc.Iterator[int][source]#

__len__() → int[source]#
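As a rough illustration of the behaviour described above, the following torch-free sketch mimics a sequential subset sampler. The class name `SequentialIndexSampler` is hypothetical, and it deliberately does not subclass torch.utils.data.Sampler; it only shows that the indices are yielded in the order given, each once per pass.

```python
from collections.abc import Iterator, Sequence


class SequentialIndexSampler:
    """Hypothetical stand-in, not the Hyrax class: yields stored indices in order."""

    def __init__(self, indices: Sequence[int]) -> None:
        self.indices = indices

    def __iter__(self) -> Iterator[int]:
        # Walk the indices in the order supplied: no shuffling, no replacement.
        return iter(self.indices)

    def __len__(self) -> int:
        return len(self.indices)


sampler = SequentialIndexSampler([7, 2, 5])
order = list(sampler)
```

Iterating the sampler a second time yields the same order, which is what makes this kind of sampler suitable for deterministic inference passes.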

setup_dataset(config: dict, *, splits: tuple[str, Ellipsis] | None = None, shuffle: bool = True) → dict[str, hyrax.datasets.data_provider.DataProvider][source]#

Create DataProvider instances for each requested data group.

Parameters:

config (dict) – The runtime configuration.
splits (tuple[str, ...] | None, optional) – When provided, only create DataProvider instances for the listed groups. When None every group in the data_request is loaded.
shuffle (bool, optional) – Unused; kept for backward-compatibility with call sites that still pass it. Split shuffling is now handled by splitting_utils.create_splits.

Returns:

Mapping of data group names to DataProvider instances.

Return type:

dict[str, DataProvider]

_build_data_provider(config: dict, request: dict)[source]#

Build the right provider for a data-request group.

Streaming (IterableDataset) datasets are wrapped in a StreamingDataProvider; all other (map-style) datasets use DataProvider.

setup_model(config: dict, dataset: hyrax.datasets.data_provider.DataProvider) → torch.nn.Module[source]#

Create a model object based on the configuration.

Parameters:

config (dict) – The runtime configuration
dataset (DataProvider) – The dataset object that will provide data to the model for training or inference. Here it is only used to provide a data sample to the model so that it can resize itself at runtime if necessary.

Returns:

An instance of the model class specified in the configuration

Return type:

torch.nn.Module

setup_model_from_sample(config: dict, sample_batch: dict) → torch.nn.Module[source]#

Create a model from a pre-formatted batch dict instead of a DataProvider.

Like setup_model() but accepts a batch dict directly, bypassing the DataLoader/DataProvider pipeline. Used by InferStream to pre-flight model architecture without a dataset.

Parameters:

config (dict) – The runtime configuration.
sample_batch (dict) – A representative batch with the same structure as batches that will be processed later (e.g. {"object_id": [...], "data": {...}}).

Returns:

An instance of the model class specified in the configuration.

Return type:

torch.nn.Module

dist_data_loader(dataset: torch.utils.data.Dataset, config: dict, shuffle: bool = False) → torch.utils.data.DataLoader[source]#

Create Pytorch Ignite distributed data loaders.

It is recommended that each verb needing dataloaders only call this function once.

Parameters:

dataset (hyrax.datasets.dataset_registry.HyraxDataset) – A Hyrax dataset instance. When dataset is a hyrax.datasets.data_provider.DataProvider with split_indices set (by create_splits()), the loader is restricted to those indices via a Subset. When split_weights is also set, a WeightedRandomSampler is used so that under-represented classes are over-sampled to achieve the configured class distribution.
config (dict) – Hyrax runtime configuration
shuffle (bool, optional) – If True and no weights are present, a SubsetRandomSampler is used for uniform shuffling. If False and no weights, a sequential sampler preserves deterministic order. Ignored when split_weights is set (weighted sampling always draws with replacement). Defaults to False so non-training verbs preserve deterministic order.

Returns:

The distributed dataloader.

Return type:

DataLoader
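The sampler choice described in the parameters above can be summarised as a small decision function. This is only a sketch of the documented rules: `choose_sampler` is a hypothetical helper that returns sampler names as strings, and the real function may apply further conditions not listed here.

```python
def choose_sampler(has_weights: bool, shuffle: bool) -> str:
    """Hypothetical helper: name the sampler implied by the documented rules."""
    if has_weights:
        # Weighted sampling takes priority; shuffle is ignored in this case.
        return "WeightedRandomSampler"
    if shuffle:
        # Uniform shuffling over the split indices.
        return "SubsetRandomSampler"
    # Deterministic order, the default for non-training verbs.
    return "sequential"


choices = {
    (weights, shuffle): choose_sampler(weights, shuffle)
    for weights in (False, True)
    for shuffle in (False, True)
}
```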

_inner_loop(func, prepare_inputs, device, config, engine, batch)[source]#

This wraps a model-specific function (func) to move data to the appropriate device.

create_process_func(funcname, device, model, config)[source]#

Build the per-batch processing function used by the Ignite engine loop.

Returns a partial of _inner_loop with func, prepare_inputs, device, and config already bound. The remaining signature is (engine, batch) — pass None for engine when calling outside an Ignite engine (e.g. from InferStreamSession).
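The binding pattern described above can be sketched with functools.partial. Everything here is illustrative: `inner_loop` and `double_all` are hypothetical stand-ins, and no device transfer is performed; the point is only that four arguments are bound and (engine, batch) remain.

```python
from functools import partial


def inner_loop(func, prepare_inputs, device, config, engine, batch):
    """Hypothetical stand-in for _inner_loop; no device transfer is performed."""
    return func(prepare_inputs(batch))


def double_all(values):
    # Stand-in for a model method such as train_batch or infer_batch.
    return [2 * v for v in values]


# Bind the first four arguments; the remaining signature is (engine, batch).
process = partial(inner_loop, double_all, list, "cpu", {})

# Outside an Ignite engine, pass None for the engine argument.
result = process(None, (1, 2, 3))
```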

_create_process_func[source]#

create_engine(funcname: str, device: torch.device, model: torch.nn.Module, config: dict) → ignite.engine.Engine[source]#

Unified creation of the pytorch engine object for either an evaluator or trainer.

This function will automatically unwrap a distributed model to find the necessary function, and construct the necessary functions to transfer data to the device on every batch, so model code can be the same no matter where the model is being run.

Parameters:

funcname (str) – The name of the model method that is called in the core of the engine loop, once per batch
device (torch.device) – The device the engine will run the model on
model (torch.nn.Module) – The Model the engine will be using
config (dict) – The runtime config in use

extract_model_method(model, method_name)[source]#

Extract a method from a model, which may be wrapped in DistributedDataParallel. For instance, method_name could be train_batch or infer_batch.

Parameters:

model (nn.Module) – The model to extract the method from
method_name (str) – Name of the method to extract

Returns:

The method extracted from the model

Return type:

Callable
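A minimal sketch of the unwrapping idea, assuming the wrapper exposes the original model on a `.module` attribute as DistributedDataParallel does. `FakeParallelWrapper`, `TinyModel`, and `get_method` are hypothetical names used so the example runs without torch; this is not the Hyrax implementation.

```python
class TinyModel:
    def infer_batch(self, batch):
        return [x + 1 for x in batch]


class FakeParallelWrapper:
    """Hypothetical stand-in for DistributedDataParallel, which exposes `.module`."""

    def __init__(self, module):
        self.module = module


def get_method(model, method_name):
    # Unwrap once if the model is wrapped, then look the method up by name.
    wrapped = isinstance(model, FakeParallelWrapper)
    target = model.module if wrapped else model
    return getattr(target, method_name)


infer = get_method(FakeParallelWrapper(TinyModel()), "infer_batch")
plain_infer = get_method(TinyModel(), "infer_batch")
```

Both calls return a bound method, so calling code does not need to know whether the model was wrapped.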

create_evaluator(model: torch.nn.Module, save_function: collections.abc.Callable[[torch.Tensor, torch.Tensor], Any], config: dict) → ignite.engine.Engine[source]#

Creates an evaluator engine. The primary purpose of this function is to attach the appropriate handlers to an evaluator engine.

Parameters:

model (torch.nn.Module) – The model to evaluate
save_function (Callable[[torch.Tensor, torch.Tensor], Any]) – A function which will receive Engine.state.output at the end of each iteration. The intent is for the results of evaluation to be saved.
config (dict) – The runtime config in use

Returns:

Engine object which when run will evaluate the model.

Return type:

pytorch-ignite.Engine

create_validator(model: torch.nn.Module, config: dict, validation_data_loader: torch.utils.data.DataLoader, trainer: ignite.engine.Engine) → ignite.engine.Engine[source]#

This function creates a Pytorch Ignite engine object that will be used to validate the model.

Parameters:

model (torch.nn.Module) – The model to validate
config (dict) – Hyrax runtime configuration
validation_data_loader (DataLoader) – The data loader for the validation data
trainer (pytorch-ignite.Engine) – The engine object that will be used to train the model. We will use specific hooks in the trainer to determine when to run the validation engine.

Returns:

Engine object that will be used to validate the model.

Return type:

pytorch-ignite.Engine

create_tester(model: torch.nn.Module, config: dict) → ignite.engine.Engine[source]#

This function creates a Pytorch Ignite engine object that will be used to test the model and compute metrics without updating model weights.

Parameters:

model (torch.nn.Module) – The model to test
config (dict) – Hyrax runtime configuration

Returns:

Engine object that will be used to test the model and compute metrics.

Return type:

pytorch-ignite.Engine

attach_best_checkpoint(engine: ignite.engine.Engine, model: torch.nn.Module, trainer: ignite.engine.Engine, results_directory: pathlib.Path) → None[source]#

Attach a best-checkpoint handler to engine, scored on engine.state.output["loss"].

Call this function after both create_trainer and (optionally) create_validator have been called so that handler registration order is correct. When a validator is available, pass it as engine so that checkpointing is driven by validation loss. When no validator is available, pass the trainer as engine so that checkpointing falls back to training loss — preserving the previous behaviour.

The saved checkpoint format is identical to the one produced by create_trainer, so existing resume logic is fully backward-compatible.

Parameters:

engine (pytorch-ignite.Engine) – The engine whose output["loss"] is used as the checkpoint score. Pass the validator when one exists; otherwise pass the trainer. If the engine has a hyrax_label attribute, it will be included in the checkpoint filename.
model (torch.nn.Module) – The model being trained. Must expose model.optimizer and optionally model.scheduler.
trainer (pytorch-ignite.Engine) – The training engine. Used to derive the global step counter and to attach the end-of-training log handler.
results_directory (Path) – Directory where checkpoint files are written.
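The engine selection described above can be sketched as follows. The helper names are hypothetical, the engines are simple stand-in objects, and negating the loss is an assumption based on the common convention that a checkpoint handler keeps the highest-scoring state; the source does not state how the score is computed.

```python
from types import SimpleNamespace


def pick_scoring_engine(trainer, validator=None):
    # Prefer validation loss when a validator exists; otherwise use training loss.
    return validator if validator is not None else trainer


def neg_loss_score(engine) -> float:
    # Assumption: higher scores are treated as better, so the loss is negated.
    return -float(engine.state.output["loss"])


# Stand-in engines carrying only the state.output["loss"] value.
trainer = SimpleNamespace(state=SimpleNamespace(output={"loss": 0.8}))
validator = SimpleNamespace(state=SimpleNamespace(output={"loss": 0.5}))

score_with_validator = neg_loss_score(pick_scoring_engine(trainer, validator))
score_without_validator = neg_loss_score(pick_scoring_engine(trainer))
```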

create_trainer(model: torch.nn.Module, config: dict, results_directory: pathlib.Path) → ignite.engine.Engine[source]#

This function was originally copied from the pytorch-ignite examples (pytorch-ignite/examples).

It was substantially trimmed down to make it easier to understand.

Parameters:

model (torch.nn.Module) – The model to train
config (dict) – Hyrax runtime configuration
results_directory (Path) – The directory where training results will be saved

Returns:

Engine object that will be used to train the model.

Return type:

pytorch-ignite.Engine

create_save_batch_callback(results_dir)[source]#

Create a callback function for saving batch results during inference or testing.

This factory function creates a closure that captures the output directory, then returns a callback that can be used by pytorch_ignite engines to save model outputs batch by batch.

Parameters:

results_dir (Path) – Directory where results should be saved

Returns:

A callback function with signature (batch, batch_results) that saves results

Return type:

callable
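The closure pattern described above might look like the following sketch. `make_save_callback` is a hypothetical factory, and the JSON file layout and running batch counter are illustrative assumptions; the source does not describe the format Hyrax actually writes.

```python
import json
import tempfile
from pathlib import Path


def make_save_callback(results_dir: Path):
    """Hypothetical factory; the JSON file layout is illustrative only."""
    results_dir = Path(results_dir)
    counter = {"n": 0}

    def save_batch(batch, batch_results):
        # The closure remembers results_dir and a running batch counter.
        out_path = results_dir / f"batch_{counter['n']}.json"
        out_path.write_text(json.dumps({"ids": batch, "results": batch_results}))
        counter["n"] += 1
        return out_path

    return save_batch


with tempfile.TemporaryDirectory() as tmp:
    callback = make_save_callback(Path(tmp))
    first = callback(["a", "b"], [0.1, 0.2])
    saved = json.loads(first.read_text())
    first_name = first.name
```

Capturing the directory in a closure keeps the callback signature down to (batch, batch_results), so the engine does not need to know where results are written.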

class HyraxEvents(value: str, event_filter: collections.abc.Callable | None = None, name: str | None = None)[source]#

Bases: ignite.engine.EventEnum

Workaround event for a pytorch ignite bug. See fixup_engine for details.

HYRAX_EPOCH_COMPLETED = 'HyraxEpochCompleted'[source]#

fixup_engine(engine: ignite.engine.Engine)[source]#

Workaround for this pytorch ignite bug (pytorch/ignite#3372), where engine.state.output is not available at EPOCH_COMPLETED or later times (COMPLETED, etc).

We create a new event, HYRAX_EPOCH_COMPLETED, which triggers at ITERATION_COMPLETED, but only on the final iteration. This is just before the erroneous state reset.

This hack relies on pytorch ignite internal state, but can be removed as soon as our fix (pytorch/ignite#3373) is mainlined, expected in version 0.6.0 (estimated August 2025).
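The "final iteration only" condition can be illustrated with a small predicate. This is a sketch rather than the actual workaround: `is_final_iteration` is a hypothetical helper, and it assumes the iteration counter starts at 1 and keeps counting across epochs.

```python
def is_final_iteration(iteration: int, epoch_length: int) -> bool:
    """Hypothetical predicate; assumes `iteration` counts from 1 across all epochs."""
    return iteration % epoch_length == 0


# With 4 iterations per epoch, over two epochs only iterations 4 and 8 qualify.
fired = [i for i in range(1, 9) if is_final_iteration(i, 4)]
```

A handler attached to an event filtered this way would still see the output of the last batch, since it runs before the epoch-level reset described above.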