hyrax.data_sets.hsc_data_set
Attributes
Classes
HSCDataSet: Dataset for sets of HSC cutouts created by the fibad download command.
Module Contents
- class HSCDataSet(config: dict, data_location=None)[source]
Bases: hyrax.data_sets.fits_image_dataset.FitsImageDataSet
Dataset for sets of HSC cutouts created by the fibad download command.
- _parse_filter_catalog(table) -> None [source]
Sets self.files by parsing the catalog.
Subclasses may override this function to control parsing of the table more directly, but the overriding class must create the files dict which has type dict[object_id -> dict[filter -> filename]] with object_id, filter, and filename all strings. In the case of no filter distinction, a single flag value may be used for the filter dict keys in the inner dicts.
- Parameters:
table (Table) – The catalog we read in
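The required shape of the files dict can be illustrated with a minimal sketch. The rows, file names, and the build_files helper below are hypothetical; the real method reads a Table whose column names are not given in this reference.

```python
from collections import defaultdict

# Hypothetical catalog rows as (object_id, filter, filename) triples.
# The actual catalog is a Table; its columns are not documented here.
rows = [
    ("1001", "g", "1001_g.fits"),
    ("1001", "r", "1001_r.fits"),
    ("1002", "g", "1002_g.fits"),
]


def build_files(rows):
    """Build dict[object_id -> dict[filter -> filename]], all strings."""
    files = defaultdict(dict)
    for object_id, filter_name, filename in rows:
        files[str(object_id)][str(filter_name)] = str(filename)
    # Return a plain dict so missing keys raise KeyError instead of
    # silently creating empty entries.
    return dict(files)


files = build_files(rows)
```

An overriding subclass would assign a structure like this to self.files.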
- _set_crop_transform()[source]
Returns the crop transform on the image
If overridden, the subclass must:
1) At some point in the init flow, set self.cutout_shape to a tuple of ints representing the size of the cutouts that will be returned.
2) Update the crop transform using self.set_crop_transform() from the HyraxImageDataset mixin.
- _scan_file_names(filters: list[str] | None, filter_obj_ids: list[str] | None = None) -> hyrax.data_sets.fits_image_dataset.files_dict [source]
Class initialization helper
- Parameters:
filters (list[str], optional) – List of filters that we should look for in the data corpus.
filter_obj_ids (list[str], optional) – Restricts the file scan to file names that have the provided object IDs, skipping all other files. When not provided, all file names in the configured data directory that match the pattern from hyrax download are parsed.
- Returns:
Nested dictionary where the first level maps object_id -> dict, and the second level maps filter_name -> file name. Corresponds to self.files
- Return type:
dict[str,dict[str,str]]
- static _scan_file_dimension(processing_unit: tuple[str, list[str]]) -> tuple[str, list[tuple[int, int]]] [source]
- _prune_objects(filters_ref: list[str], cutout_shape: tuple[int, int] | None)[source]
Class initialization helper. Prunes objects from the list of objects.
Removes any objects which do not have all the filters specified in filters_ref
If a cutout_shape was provided in the constructor, prunes files that are too small for the chosen cutout size
This function deletes from self.files and self.dims via _prune_object
- Parameters:
files (dict[str,dict[str,str]]) – Nested dictionary where the first level maps object_id -> dict, and the second level maps filter_name -> file name. This is created by _scan_files()
filters_ref (list[str]) – List of the filter names
cutout_shape (tuple[int, int]) – Cutout shape tuple provided from constructor
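The two pruning rules can be sketched as a standalone function. This is an illustration under assumptions, not the library's code: prune_objects is a hypothetical name, dims is assumed to map each object_id to a list of per-file dimension tuples, and an object is assumed to be dropped entirely if any of its files is smaller than the cutout shape on either axis.

```python
def prune_objects(files, dims, filters_ref, cutout_shape=None):
    """Delete incomplete or undersized objects from files and dims in place.

    Hypothetical helper: dims is assumed to map object_id to a list of
    two-element dimension tuples, one per file.
    """
    required = set(filters_ref)
    pruned = []
    # Iterate over a copy of the keys because entries are deleted in the loop.
    for object_id in list(files):
        missing_filter = not required.issubset(files[object_id])
        too_small = cutout_shape is not None and any(
            d[0] < cutout_shape[0] or d[1] < cutout_shape[1]
            for d in dims[object_id]
        )
        if missing_filter or too_small:
            del files[object_id]
            del dims[object_id]
            pruned.append(object_id)
    return pruned


files = {
    "1001": {"g": "1001_g.fits", "r": "1001_r.fits"},
    "1002": {"g": "1002_g.fits"},  # missing the r filter
    "1003": {"g": "1003_g.fits", "r": "1003_r.fits"},
}
dims = {
    "1001": [(100, 100), (100, 101)],
    "1002": [(100, 100)],
    "1003": [(100, 100), (90, 100)],  # one file smaller than the cutout
}
pruned = prune_objects(files, dims, ["g", "r"], cutout_shape=(96, 96))
```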
- _check_file_dimensions() -> tuple[int, int] [source]
Class initialization helper. Find the maximal pixel size that all images can support
It is assumed that all the cutouts will be of very similar size; however, HSC’s cutout server does not return exactly the same number of pixels for every query, even when it is given the same angular spread for every cutout.
Machine learning models expect all images to be the same size.
This function warns on significant differences (>2px) on any dimension between the largest and smallest images.
- Returns:
The minimum width and height in pixels of the entire dataset. In other words: the maximal image size in pixels that can be generated from ALL cutout images via cropping.
- Return type:
tuple(int,int)
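A minimal sketch of the size check described above, assuming the dimensions are available as a dict mapping object_id to a list of two-element tuples. The function name min_common_shape and the use of the logging module are assumptions for illustration; the axis order of the tuples is also not specified in this reference.

```python
import logging

logger = logging.getLogger(__name__)


def min_common_shape(dims):
    """Return the per-axis minimum over every image in dims.

    Hypothetical helper: dims maps object_id to a list of
    two-element dimension tuples.
    """
    all_dims = [d for per_object in dims.values() for d in per_object]
    first_axis = [d[0] for d in all_dims]
    second_axis = [d[1] for d in all_dims]
    smallest = (min(first_axis), min(second_axis))
    largest = (max(first_axis), max(second_axis))
    # Warn when the spread between largest and smallest image exceeds
    # 2 px on either axis, mirroring the behavior described in the text.
    if largest[0] - smallest[0] > 2 or largest[1] - smallest[1] > 2:
        logger.warning(
            "Image sizes differ by more than 2px: smallest=%s largest=%s",
            smallest,
            largest,
        )
    return smallest


shape = min_common_shape({"1001": [(100, 101)], "1002": [(102, 99)]})
```

Cropping every image to the returned shape yields inputs of identical size, which is the property the model requires.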
- __contains__(object_id: str) -> bool [source]
Allows membership queries of the form object_id in dataset. Used by testing code.
- Parameters:
object_id (str) – The object ID to look up in the dataset
- Returns:
True if the given object_id is in the dataset
- Return type:
bool
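Python routes the in operator to __contains__, so a membership test works as sketched below. TinyDataset is a hypothetical stand-in, and checking the keys of self.files is an assumption about how the lookup might be done, not a statement of the actual implementation.

```python
class TinyDataset:
    """Hypothetical stand-in for a dataset keyed by object ID."""

    def __init__(self, files):
        # files: dict[object_id -> dict[filter -> filename]]
        self.files = files

    def __contains__(self, object_id: str) -> bool:
        # Assumption: membership means the ID is a key of self.files.
        return object_id in self.files


dataset = TinyDataset({"1001": {"g": "1001_g.fits"}})
found = "1001" in dataset
missing = "9999" in dataset
```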
- _all_files_full()[source]
Private read-only iterator over all files that enforces a strict total order across objects and filters. Will not work before self.files and self.path are initialized in __init__.
- Yields:
Tuple[object_id, filter, filename, dim] – Members of this tuple are:
    - The object_id as a string
    - The filter name as a string
    - The filename relative to self.path
    - A tuple containing the dimensions of the FITS file in pixels
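One way to obtain a strict total order is to sort object IDs and, within each object, sort filter names. The sketch below takes that approach; it is an assumption, since this reference does not say how the ordering is enforced. The function name all_files_full and the layout of dims (a list per object, aligned with the sorted filter names) are likewise hypothetical.

```python
def all_files_full(files, dims):
    """Yield (object_id, filter, filename, dim) in a deterministic order.

    Hypothetical helper: dims[object_id] is assumed to be a list of
    dimension tuples aligned with the sorted filter names.
    """
    for object_id in sorted(files):
        for idx, filter_name in enumerate(sorted(files[object_id])):
            yield (
                object_id,
                filter_name,
                files[object_id][filter_name],
                dims[object_id][idx],
            )


files = {
    "1002": {"r": "1002_r.fits", "g": "1002_g.fits"},
    "1001": {"g": "1001_g.fits"},
}
dims = {
    "1002": [(100, 100), (101, 100)],  # order matches sorted filters: g, r
    "1001": [(99, 100)],
}
ordered = list(all_files_full(files, dims))
```

Because both levels are sorted, repeated iteration visits files in the same sequence regardless of dict insertion order.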
- _object_files(object_id)[source]
Private read-only iterator over all files for a given object. This enforces a strict total order across filters. Will not work before self.files and self.path are initialized in __init__.
Guaranteed to only return files that have filters in self.filters_ref.
- Yields:
Path – The path to the file.