hyrax.data_sets.hsc_data_set
============================

.. py:module:: hyrax.data_sets.hsc_data_set


Attributes
----------

.. autoapisummary::

   hyrax.data_sets.hsc_data_set.logger
   hyrax.data_sets.hsc_data_set.dim_dict


Classes
-------

.. autoapisummary::

   hyrax.data_sets.hsc_data_set.HSCDataSet


Module Contents
---------------

.. py:data:: logger

.. py:data:: dim_dict

.. py:class:: HSCDataSet(config: dict, data_location=None)

   Bases: :py:obj:`hyrax.data_sets.fits_image_dataset.FitsImageDataSet`


   Dataset for sets of HSC cutouts created by the ``hyrax download`` command.

   .. py:method:: __init__



   .. py:attribute:: _called_from_test
      :value: False



   .. py:attribute:: filters_config


   .. py:method:: _read_filter_catalog(filter_catalog_path: pathlib.Path | None)


   .. py:method:: _parse_filter_catalog(table) -> None

      Sets ``self.files`` by parsing the catalog.

      Subclasses may override this function to control parsing of the table more directly, but the
      overriding class must create the files dict, which has type ``dict[object_id -> dict[filter -> filename]]``
      with object_id, filter, and filename all strings. If there is no filter distinction, a single
      flag value may be used as the filter key in the inner dicts.

      :param table: The catalog we read in
      :type table: Table
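
      As a minimal sketch of the structure an override must produce, the following builds the nested
      mapping from plain rows. The helper ``build_files_dict`` and the column names ``object_id``,
      ``filter``, and ``filename`` are hypothetical; they stand in for whatever the catalog table
      actually provides.

      ```python
      def build_files_dict(rows):
          """Build dict[object_id -> dict[filter -> filename]] from catalog rows.

          ``rows`` is any iterable of mappings. The column names used here are
          assumptions for illustration, not the real catalog schema.
          """
          files = {}
          for row in rows:
              object_id = str(row["object_id"])
              # All keys and values are stored as strings, as the contract requires.
              files.setdefault(object_id, {})[str(row["filter"])] = str(row["filename"])
          return files


      # Illustrative rows only; real catalogs are read from disk.
      rows = [
          {"object_id": 101, "filter": "HSC-G", "filename": "101_g.fits"},
          {"object_id": 101, "filter": "HSC-R", "filename": "101_r.fits"},
          {"object_id": 102, "filter": "HSC-G", "filename": "102_g.fits"},
      ]
      files = build_files_dict(rows)
      ```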



   .. py:method:: _set_crop_transform()

      Returns the crop transform on the image

      If overridden, the subclass must:

      1) At some point in the init flow, set ``self.cutout_shape`` to a tuple of ints representing
         the size of the cutouts that will be returned.
      2) Update the crop transform using ``self.set_crop_transform()`` from the HyraxImageDataset mixin.



   .. py:method:: _before_preload()


   .. py:method:: _scan_file_names(filters: list[str] | None, filter_obj_ids: list[str] | None = None) -> hyrax.data_sets.fits_image_dataset.files_dict

      Class initialization helper

      :param filters: List of filters that we should look for in the data corpus.
      :type filters: list[str], optional
      :param filter_obj_ids: Restricts the file scan to file names that have the provided object IDs, skipping other files.
                             When not provided, all file names in the configured data directory that match the pattern from
                             ``hyrax download`` are parsed.
      :type filter_obj_ids: list[str], optional

      :returns: Nested dictionary where the first level maps object_id -> dict, and the second level maps
                filter_name -> file name. Corresponds to self.files
      :rtype: dict[str,dict[str,str]]
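
      The grouping step can be sketched as follows. This is only an illustration: the naming scheme
      ``<object_id>_<filter>.fits``, the ``FILE_PATTERN`` constant, and the ``scan_names`` helper are
      all hypothetical, and the pattern actually produced by the download step may differ.

      ```python
      import re

      # Hypothetical naming scheme "<object_id>_<filter>.fits" (an assumption).
      FILE_PATTERN = re.compile(r"^(?P<object_id>[^_]+)_(?P<filter>.+)\.fits$")


      def scan_names(file_names, filters=None, filter_obj_ids=None):
          """Group file names into dict[object_id -> dict[filter -> file name]]."""
          wanted_ids = set(filter_obj_ids) if filter_obj_ids is not None else None
          files = {}
          for name in file_names:
              match = FILE_PATTERN.match(name)
              if match is None:
                  continue  # Skip names that do not follow the assumed pattern.
              object_id, filter_name = match["object_id"], match["filter"]
              if filters is not None and filter_name not in filters:
                  continue  # Filter not requested.
              if wanted_ids is not None and object_id not in wanted_ids:
                  continue  # Object ID not requested.
              files.setdefault(object_id, {})[filter_name] = name
          return files


      names = ["1_HSC-G.fits", "1_HSC-R.fits", "2_HSC-G.fits", "notes.txt"]
      only_g = scan_names(names, filters=["HSC-G"])
      only_obj_1 = scan_names(names, filter_obj_ids=["1"])
      ```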



   .. py:method:: _determine_numprocs() -> int
      :staticmethod:



   .. py:method:: _fixup_limit(nproc: int, res, est_limit, est_procs) -> int
      :staticmethod:



   .. py:method:: _scan_file_dimensions() -> dim_dict


   .. py:method:: _scan_file_dimension(processing_unit: tuple[str, list[str]]) -> tuple[str, list[tuple[int, int]]]
      :staticmethod:



   .. py:method:: _fits_file_dims(filepath) -> tuple[int, int]
      :staticmethod:



   .. py:method:: _prune_objects(filters_ref: list[str], cutout_shape: tuple[int, int] | None)

      Class initialization helper. Prunes objects from the list of objects.

      1) Removes any objects which do not have all the filters specified in filters_ref
      2) If a cutout_shape was provided in the constructor, prunes files that are too small
         for the chosen cutout size

      This function deletes from self.files and self.dims via _prune_object

      :param files: Nested dictionary where the first level maps object_id -> dict, and the second level maps
                    filter_name -> file name. This is created by ``_scan_file_names()``
      :type files: dict[str,dict[str,str]]
      :param filters_ref: List of the filter names
      :type filters_ref: list[str]
      :param cutout_shape: Cutout shape tuple provided from constructor
      :type cutout_shape: tuple[int, int]
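
      The two pruning rules can be sketched in isolation. The helper ``find_objects_to_prune`` is
      hypothetical, and so is the layout assumed for ``dims`` (object_id mapped to a list of
      ``(width, height)`` tuples, compared against ``cutout_shape`` in the same order). The sketch
      only reports which objects would be pruned and why; it does not modify anything.

      ```python
      def find_objects_to_prune(files, dims, filters_ref, cutout_shape=None):
          """Return {object_id: reason} for objects that fail either pruning rule.

          ``files`` maps object_id -> {filter: filename}; ``dims`` maps
          object_id -> list of (width, height) tuples. Both layouts are assumptions.
          """
          to_prune = {}
          for object_id, by_filter in files.items():
              # Rule 1: every reference filter must be present for the object.
              missing = [f for f in filters_ref if f not in by_filter]
              if missing:
                  to_prune[object_id] = f"missing filters: {missing}"
                  continue
              # Rule 2: every image must be at least as large as the requested cutout.
              if cutout_shape is not None and any(
                  w < cutout_shape[0] or h < cutout_shape[1] for w, h in dims[object_id]
              ):
                  to_prune[object_id] = "too small for cutout"
          return to_prune


      files = {
          "a": {"g": "a_g.fits", "r": "a_r.fits"},
          "b": {"g": "b_g.fits"},
          "c": {"g": "c_g.fits", "r": "c_r.fits"},
      }
      dims = {"a": [(100, 100), (100, 100)], "b": [(100, 100)], "c": [(100, 90), (100, 100)]}
      pruned = find_objects_to_prune(files, dims, ["g", "r"], cutout_shape=(95, 95))
      ```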



   .. py:method:: _mark_for_prune(object_id, reason)


   .. py:method:: _prune_object(object_id, reason: str)


   .. py:method:: _check_file_dimensions() -> tuple[int, int]

      Class initialization helper. Finds the maximal pixel size that all images can support.

      It is assumed that all the cutouts will be of very similar size; however, HSC's cutout
      server does not return exactly the same number of pixels for every query, even when it
      is given the same angular spread for every cutout.

      Machine learning models expect all images to be the same size.

      This function warns on significant differences (>2px) on any dimension between the largest
      and smallest images.

      :returns: The minimum width and height in pixels of the entire dataset. In other words: the maximal image
                size in pixels that can be generated from ALL cutout images via cropping.
      :rtype: tuple[int, int]
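
      A minimal sketch of the sizing rule described above, assuming ``dims`` maps each object_id to a
      list of ``(width, height)`` tuples. The helper ``minimal_common_shape`` and its ``tolerance_px``
      parameter are hypothetical names introduced for illustration.

      ```python
      import logging

      logger = logging.getLogger(__name__)


      def minimal_common_shape(dims, tolerance_px=2):
          """Return the (width, height) that every image can be cropped to.

          ``dims`` maps object_id -> list of (width, height) tuples (an assumed layout).
          """
          all_dims = [shape for shapes in dims.values() for shape in shapes]
          widths = [w for w, _ in all_dims]
          heights = [h for _, h in all_dims]
          # Warn when the largest and smallest images differ by more than the tolerance.
          if max(widths) - min(widths) > tolerance_px or max(heights) - min(heights) > tolerance_px:
              logger.warning("Cutout sizes differ by more than %d px", tolerance_px)
          # The per-axis minimum is the largest crop that all images support.
          return min(widths), min(heights)


      shape = minimal_common_shape({"a": [(100, 101)], "b": [(99, 103)]})
      ```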



   .. py:method:: _rebuild_manifest(config)


   .. py:method:: __contains__(object_id: str) -> bool

      Allows ``object_id in dataset`` queries. Used by testing code.

      :param object_id: The object ID to look for in the dataset
      :type object_id: str

      :returns: True if the given object_id is in the data set
      :rtype: bool



   .. py:method:: _all_files_full()

      Private read-only iterator over all files that enforces a strict total order across
      objects and filters. Will not work before ``self.files`` and ``self.path`` are initialized in ``__init__``.

      :Yields: *Tuple[object_id, filter, filename, dim]* -- Members of this tuple are
               - The object_id as a string
               - The filter name as a string
               - The filename relative to self.path
               - A tuple containing the dimensions of the fits file in pixels.
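
      One way to obtain a strict total order is to sort by object_id and then by filter name, as in
      the sketch below. This is an assumption for illustration: the real iterator may order entries
      differently, and ``iter_files_in_order`` together with the ``dims`` layout (one tuple per file,
      aligned with the sorted filter names) is hypothetical.

      ```python
      def iter_files_in_order(files, dims):
          """Yield (object_id, filter, filename, dim) in a deterministic order.

          ``files`` maps object_id -> {filter: filename}; ``dims`` maps object_id ->
          list of dimension tuples aligned with the sorted filter names (assumed).
          """
          for object_id in sorted(files):
              for index, filter_name in enumerate(sorted(files[object_id])):
                  yield object_id, filter_name, files[object_id][filter_name], dims[object_id][index]


      files = {"b": {"r": "b_r.fits", "g": "b_g.fits"}, "a": {"g": "a_g.fits"}}
      dims = {"a": [(10, 10)], "b": [(11, 11), (12, 12)]}
      ordered = list(iter_files_in_order(files, dims))
      ```

      Because the order depends only on the sorted keys, repeated iteration over the same data yields
      the same sequence regardless of dict insertion order.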



   .. py:method:: _object_files(object_id)

      Private read-only iterator over all files for a given object. This enforces a strict total order
      across filters. Will not work before ``self.files`` and ``self.path`` are initialized in ``__init__``.

      Guaranteed to only return files that have filters in self.filters_ref.

      :Yields: *Path* -- The path to the file.



