hyrax.datasets.hsc_dataset#

Attributes#

`logger`
`dim_dict`

Classes#

HSCDataset

Dataset for sets of HSC cutouts created by the fibad download command.

Module Contents#

logger[source]#

dim_dict[source]#

class HSCDataset(config: dict, data_location=None)[source]#

Bases: hyrax.datasets.fits_image_dataset.FitsImageDataset

Dataset for sets of HSC cutouts created by the fibad download command.

__init__()[source]#

_called_from_test = False[source]#

filters_config[source]#

_read_filter_catalog(filter_catalog_path: pathlib.Path | None)[source]#

_parse_filter_catalog(table) → None[source]#

Sets self.files by parsing the catalog.

Subclasses may override this function to control parsing of the table more directly. The overriding class must create the files dict, which has type dict[object_id -> dict[filter -> filename]], with object_id, filter, and filename all strings. If there is no filter distinction, a single flag value may be used as the filter key in each inner dict.

Parameters: table (Table) – The catalog we read in
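The nested files mapping that an overriding subclass must build can be illustrated with a minimal sketch. It assumes catalog rows that expose `object_id`, `filter`, and `filename` columns; those column names and the helper `build_files_dict` are hypothetical and are not part of hyrax.

```python
from collections import defaultdict


def build_files_dict(rows):
    """Build dict[object_id -> dict[filter -> filename]] from catalog rows.

    `rows` is any iterable of mappings. The column names used here are
    assumptions made for illustration only.
    """
    files = defaultdict(dict)
    for row in rows:
        # Coerce every key and value to str, as the files dict requires.
        files[str(row["object_id"])][str(row["filter"])] = str(row["filename"])
    return dict(files)


# Hypothetical catalog rows standing in for a real table.
catalog_rows = [
    {"object_id": 101, "filter": "HSC-G", "filename": "101_g.fits"},
    {"object_id": 101, "filter": "HSC-R", "filename": "101_r.fits"},
    {"object_id": 102, "filter": "HSC-G", "filename": "102_g.fits"},
]
files = build_files_dict(catalog_rows)
```

An override of _parse_filter_catalog would assign a mapping of this shape to self.files.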

_set_crop_transform()[source]#

Returns the crop transform on the image.

If overridden, the subclass must:

1) Set self.cutout_shape, at some point in the init flow, to a tuple of ints representing the size of the cutouts that will be returned.
2) Update the crop transform using self.set_crop_transform() from the HyraxImageDataset mixin.

_before_preload()[source]#

_scan_file_names(filters: list[str] | None, filter_obj_ids: list[str] | None = None) → hyrax.datasets.fits_image_dataset.files_dict[source]#

Class initialization helper

Parameters:

filters (list[str], Optional) – List of filters that we should look for in the data corpus.
filter_obj_ids (list[str], Optional) – Restricts the file scan to file names that have the provided object IDs, skipping other files. When not provided, all file names in the configured data directory that match the pattern from hyrax download are parsed.

Returns:

Nested dictionary where the first level maps object_id -> dict, and the second level maps filter_name -> file name. Corresponds to self.files

Return type:

dict[str,dict[str,str]]

static _determine_numprocs() → int[source]#

static _fixup_limit(nproc: int, res, est_limit, est_procs) → int[source]#

_scan_file_dimensions() → dim_dict[source]#

static _scan_file_dimension(processing_unit: tuple[str, list[str]]) → tuple[str, list[tuple[int, int]]][source]#

static _fits_file_dims(filepath) → tuple[int, int][source]#

_prune_objects(filters_ref: list[str], cutout_shape: tuple[int, int] | None)[source]#

Class initialization helper. Prunes objects from the list of objects.

Removes any objects which do not have all the filters specified in filters_ref
If a cutout_shape was provided in the constructor, prunes files that are too small for the chosen cutout size

This function deletes from self.files and self.dims via _prune_object

Parameters:

files (dict[str,dict[str,str]]) – Nested dictionary where the first level maps object_id -> dict, and the second level maps filter_name -> file name. This is created by _scan_file_names()
filters_ref (list[str]) – List of the filter names
cutout_shape (tuple[int, int]) – Cutout shape tuple provided from constructor
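The two pruning rules can be sketched as a standalone function. This is not the hyrax implementation: the function name `find_objects_to_prune`, the layout of `dims` (one size tuple per file for each object), and the element-by-element comparison against `cutout_shape` are all assumptions made for illustration.

```python
def find_objects_to_prune(files, dims, filters_ref, cutout_shape=None):
    """Return {object_id: reason} for every object that should be dropped.

    files: dict[object_id -> dict[filter -> filename]]
    dims:  dict[object_id -> list[tuple[int, int]]]  (assumed layout)
    """
    to_prune = {}
    for object_id, by_filter in files.items():
        # Rule 1: the object must have every filter in filters_ref.
        missing = set(filters_ref) - set(by_filter)
        if missing:
            to_prune[object_id] = f"missing filters: {sorted(missing)}"
            continue
        # Rule 2: every image must be at least as large as the cutout.
        if cutout_shape is not None and any(
            d[0] < cutout_shape[0] or d[1] < cutout_shape[1]
            for d in dims[object_id]
        ):
            to_prune[object_id] = "image smaller than cutout_shape"
    return to_prune


files = {
    "1": {"g": "1_g.fits", "r": "1_r.fits"},
    "2": {"g": "2_g.fits"},
    "3": {"g": "3_g.fits", "r": "3_r.fits"},
}
dims = {
    "1": [(100, 100), (100, 99)],
    "2": [(100, 100)],
    "3": [(60, 100), (100, 100)],
}
pruned = find_objects_to_prune(files, dims, ["g", "r"], cutout_shape=(64, 64))

# Delete outside the scan loop so the dicts are not mutated while iterating.
for object_id in pruned:
    del files[object_id]
    del dims[object_id]
```

Collecting the IDs first and deleting afterwards avoids changing a dictionary during iteration, which Python does not allow.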

_mark_for_prune(object_id, reason)[source]#

_prune_object(object_id, reason: str)[source]#

_check_file_dimensions() → tuple[int, int][source]#

Class initialization helper. Finds the maximal pixel size that all images can support.

It is assumed that all the cutouts will be of very similar size; however, HSC’s cutout server does not return exactly the same number of pixels for every query, even when it is given the same angular spread for every cutout.

Machine learning models expect all images to be the same size.

This function warns on significant differences (>2px) on any dimension between the largest and smallest images.

Returns: The minimum width and height in pixels of the entire dataset. In other words: the maximal image size in pixels that can be generated from ALL cutout images via cropping.
Return type: tuple[int, int]
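The minimum-size computation and the 2px warning threshold can be sketched as follows. This is an illustration only: the function name `smallest_common_size` and the layout of `dims` are assumptions, and the real method works on the dataset's own state rather than an argument.

```python
import warnings


def smallest_common_size(dims):
    """Return the smallest first and second dimension over all images.

    dims: dict[object_id -> list[tuple[int, int]]]  (assumed layout)
    """
    all_dims = [d for per_object in dims.values() for d in per_object]
    first = [d[0] for d in all_dims]
    second = [d[1] for d in all_dims]

    # Warn when the largest and smallest images differ by more than 2px
    # along either dimension.
    if max(first) - min(first) > 2 or max(second) - min(second) > 2:
        warnings.warn("Cutout sizes differ by more than 2px")

    # Every image can be cropped down to this size.
    return min(first), min(second)


dims = {"1": [(100, 101), (100, 100)], "2": [(99, 100)]}
common_size = smallest_common_size(dims)
```

Taking the minimum along each dimension independently gives the largest crop that fits inside every image, even when no single image has exactly that size.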

_rebuild_manifest(config)[source]#

__contains__(object_id: str) → bool[source]#

Allows `object_id in dataset` queries. Used by testing code.

Parameters: object_id (str) – The object ID to look up in the dataset
Returns: True if the given object_id is in the dataset
Return type: bool
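Python's `in` operator calls `__contains__`, which is what makes this query syntax work. The sketch below uses a stand-in class, `TinyDataset`, rather than HSCDataset itself; delegating the lookup to a `files` dict is an assumption, not a statement about how hyrax implements it.

```python
class TinyDataset:
    """Hypothetical stand-in that shows the membership protocol."""

    def __init__(self, files):
        # files: dict[object_id -> dict[filter -> filename]]
        self.files = files

    def __contains__(self, object_id: str) -> bool:
        # Assumed behavior: membership is decided by the files dict keys.
        return object_id in self.files


dataset = TinyDataset({"101": {"HSC-G": "101_g.fits"}})
has_101 = "101" in dataset
has_999 = "999" in dataset
```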

_all_files_full()[source]#

Private read-only iterator over all files that enforces a strict total order across objects and filters. Will not work before self.files and self.path are initialized in __init__.

Yields: Tuple[object_id, filter, filename, dim] – Members of this tuple are:

- The object_id as a string
- The filter name as a string
- The filename relative to self.path
- A tuple containing the dimensions of the FITS file in pixels
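One way to get a strict total order over objects and filters is to sort both levels of the nested dict before yielding. The generator below is a sketch under assumptions: the name `iter_all_files`, the use of `sorted`, and a `dims` list aligned with the sorted filter names are illustrative choices, not the documented hyrax behavior.

```python
def iter_all_files(files, dims):
    """Yield (object_id, filter, filename, dim) in a deterministic order.

    files: dict[object_id -> dict[filter -> filename]]
    dims:  dict[object_id -> list[tuple[int, int]]], assumed to be aligned
           with the sorted filter names of each object.
    """
    for object_id in sorted(files):
        by_filter = files[object_id]
        for index, filter_name in enumerate(sorted(by_filter)):
            yield (
                object_id,
                filter_name,
                by_filter[filter_name],
                dims[object_id][index],
            )


files = {
    "2": {"r": "2_r.fits", "g": "2_g.fits"},
    "1": {"g": "1_g.fits"},
}
dims = {"1": [(100, 100)], "2": [(100, 100), (101, 100)]}
ordered = list(iter_all_files(files, dims))
```

Because both levels are sorted, the output order does not depend on the insertion order of either dictionary.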

_object_files(object_id)[source]#

Private read-only iterator over all files for a given object. This enforces a strict total order across filters. Will not work before self.files and self.path are initialized in __init__.

Guaranteed to only return files that have filters in self.filters_ref.

Yields: Path – The path to the file.