hyrax.verbs.create_splits

Attributes

logger

Classes

CreateSplits

Create and persist reproducible dataset splits.

Module Contents

logger
class CreateSplits(config)

Bases: hyrax.verbs.verb_registry.Verb

Create and persist reproducible dataset splits.

__init__()

Shared initialization for all verbs; stores the config.

cli_name = 'create_splits'
add_parser_kwargs
description = 'Compute and persist dataset splits for reproducible training workflows.'
REQUIRED_DATA_GROUPS = ()
OPTIONAL_DATA_GROUPS = ()
static setup_parser(parser)

No additional CLI options needed.

run_cli(args=None)

CLI stub for the CreateSplits verb.

run()

Compute dataset splits and write them to a results directory.

Reads the [split] and [balance] config tables to determine how to partition each data group, then persists .npz index files and a split_config.toml under a timestamped *-splits-* results directory. Subsequent verbs (train, infer, test) can point at this directory to reuse the same split without recomputing it.

Returns:

The populated dataset providers, keyed by group name.

Return type:

dict[str, DataProvider]
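
Because run() writes to a timestamped *-splits-* directory, a later step needs a way to find it again. The sketch below is not part of Hyrax: latest_splits_dir and describe_splits_dir are hypothetical helpers, the results root is whatever directory your configuration writes to, and picking the last directory by name assumes the timestamp prefix sorts lexicographically, which this page does not state. Only the *-splits-* pattern, the .npz extension, and the split_config.toml filename come from the description above.

```python
from pathlib import Path
from typing import Optional


def latest_splits_dir(results_root: Path) -> Optional[Path]:
    """Return the last *-splits-* directory under results_root, or None.

    Assumption: directory names begin with a sortable timestamp, so the
    lexicographically greatest name is the most recent run.
    """
    candidates = sorted(
        p for p in results_root.glob("*-splits-*") if p.is_dir()
    )
    return candidates[-1] if candidates else None


def describe_splits_dir(splits_dir: Path) -> dict:
    """List the persisted artifacts by name without parsing their contents."""
    return {
        "index_files": sorted(p.name for p in splits_dir.glob("*.npz")),
        "has_config": (splits_dir / "split_config.toml").is_file(),
    }
```

The helpers only inspect file names, so they make no claim about the keys stored inside the .npz files or the layout of split_config.toml.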