hyrax.data_sets.downloaded_lsst_dataset

Attributes

logger

Classes

DownloadedLSSTDataset

DownloadedLSSTDataset: A dataset that inherits from LSSTDataset and downloads cutouts from the LSST butler.

Module Contents

logger[source]
class DownloadedLSSTDataset(config, data_location)[source]

Bases: hyrax.data_sets.lsst_dataset.LSSTDataset

DownloadedLSSTDataset: A dataset that inherits from LSSTDataset and downloads cutouts from the LSST butler, saving them as .pt files during first access. On subsequent accesses, it loads cutouts directly from these cached files.

This class also creates a manifest file that records the shape of each cutout and the corresponding filename.

Public Methods:
download_cutouts(indices=None, sync_filesystem=True, max_workers=None, force_retry=False):

Download cutouts with parallel processing. Automatically resumes from previous progress. Use max_workers to control thread count, force_retry to re-attempt failed downloads.

get_manifest_stats():

Returns dict with download statistics: total, successful, failed, pending counts and manifest file path.

get_download_progress():

Returns detailed progress metrics including completion percentage and failure rates.

reset_failed_downloads():

Resets all failed download attempts to allow retry without force_retry flag. Returns count of reset entries.

save_manifest_now():

Forces immediate manifest save (normally saved periodically during downloads).

get_cache_info():

Returns LRU cache statistics for patch fetching performance monitoring.

clear_cache():

Clears the patch LRU cache to free memory.

Usage Example:

    import hyrax

    # Initialize Hyrax
    h = hyrax.Hyrax()
    a = h.prepare()

    # Download all cutouts (resumes automatically)
    a.download_cutouts(max_workers=4)
    # WARNING: The LRU caching scheme is slightly complicated, so using the
    # default max_workers=1 is recommended for the first download. Adding
    # more workers does not always speed up the download.

    # Check progress
    a.get_download_progress()

    # Retry failed downloads
    a.download_cutouts(force_retry=True)

    # Access cutouts (loads from cache)
    cutout = a[0]      # Single cutout
    cutouts = a[0:10]  # Multiple cutouts

File Organization:

  • Cutouts saved as: cutout_{object_id}.pt or cutout_{index:04d}.pt

  • Manifest saved as: manifest.fits (Astropy) or manifest.parquet (HATS)

  • All files stored in config["general"]["data_dir"]
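The naming patterns above can be sketched as follows. This is a minimal illustration rather than the library's own code: the helper `cutout_filename` and its arguments are hypothetical, and it assumes the object-ID pattern is used whenever an ID is available, with the zero-padded index as the fallback.

```python
from pathlib import Path


def cutout_filename(index, object_id=None):
    """Return a cutout file name following the documented patterns.

    Hypothetical helper: assumes the object-ID pattern is preferred when an
    ID is available, and the zero-padded index pattern is the fallback.
    """
    if object_id is not None:
        return f"cutout_{object_id}.pt"
    return f"cutout_{index:04d}.pt"


# Example paths under an assumed data directory named "data".
data_dir = Path("data")
index_based = data_dir / cutout_filename(7)
id_based = data_dir / cutout_filename(7, object_id=31415926)
```

Note that `{index:04d}` pads to at least four digits, so indices of 10000 and above simply produce longer names.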

__init__()[source]

Initialize the dataset with either a HATS catalog or astropy table.

Config can specify either:

  • config["data_set"]["hats_catalog"]: path to HATS catalog

  • config["data_set"]["astropy_table"]: path to any file readable by Astropy Table

download_dir[source]
_config[source]
_manifest_lock[source]
_updates_since_save = 0[source]
_save_interval = 1000[source]
_band_failure_stats[source]
_band_failure_lock[source]
_setup_naming_strategy()[source]

Set up the file naming strategy based on catalog columns.

_initialize_manifest()[source]

Create new manifest or load/merge with existing manifest, with band filtering validation.

_load_existing_manifest()[source]

Load existing manifest file.

_merge_manifests(existing_manifest)[source]

Merge existing manifest with current catalog based on object_id.

_add_manifest_columns()[source]

Add cutout_shape, filename, and downloaded_bands columns to manifest.

_get_available_bands_from_manifest(manifest)[source]

Get available bands by checking first 10 successful downloads for consistency.

_setup_band_filtering(requested_bands, original_band_order)[source]

Set up band filtering to extract only the requested bands from cached cutouts.
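One way to picture band filtering is as a lookup from requested band names to channel positions in the cached cutout. The sketch below is illustrative only: `band_indices` and `filter_bands` are hypothetical helpers, the cutout is modelled as a plain list of per-band images rather than a tensor, and raising `ValueError` for an unavailable band is an assumption about error handling.

```python
def band_indices(requested_bands, original_band_order):
    """Map each requested band to its channel position in the cached cutout."""
    missing = [b for b in requested_bands if b not in original_band_order]
    if missing:
        # Assumption: an unavailable band is treated as an error.
        raise ValueError(f"Bands not present in cached cutouts: {missing}")
    return [original_band_order.index(b) for b in requested_bands]


def filter_bands(cutout, indices):
    """Select channels from a cutout stored as a list of per-band images."""
    return [cutout[i] for i in indices]


cached_order = ["g", "r", "i"]
cutout = [[[1.0]], [[2.0]], [[3.0]]]  # one 1x1 image per band, in cached order
idx = band_indices(["i", "g"], cached_order)
filtered = filter_bands(cutout, idx)
```

The output follows the order of the requested bands, not the order in which they were cached.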

_get_cutout_path(idx)[source]

Generate cutout file path for a given index.

_update_manifest_entry(idx, cutout_shape=None, filename='Attempted', downloaded_bands=None)[source]

Thread-safe manifest update with periodic saves.

Parameters:
  • idx – Index in the catalog

  • cutout_shape – Shape tuple of the cutout tensor, or None for failed downloads

  • filename – Basename of the saved file; set to 'Attempted' only when ALL bands fail

  • downloaded_bands – List of band names successfully downloaded in tensor order
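The pattern described here, a lock-protected update that triggers a save every fixed number of updates, can be sketched as below. `ManifestSketch` is a hypothetical stand-in, not the class's real implementation: rows live in a dict, "saving" only increments a counter, and the `>=` comparison against the save interval is an assumption. The attribute names mirror `_updates_since_save` and `_save_interval` from this page.

```python
import threading


class ManifestSketch:
    """Toy manifest: a dict of rows guarded by a lock, saved every N updates."""

    def __init__(self, save_interval=1000):
        self._rows = {}
        self._lock = threading.Lock()
        self._updates_since_save = 0
        self._save_interval = save_interval
        self.save_count = 0  # stands in for writes to disk

    def update_entry(self, idx, cutout_shape=None, filename="Attempted",
                     downloaded_bands=None):
        # Holding the lock makes the row update and the counter check atomic.
        with self._lock:
            self._rows[idx] = (cutout_shape, filename, downloaded_bands)
            self._updates_since_save += 1
            if self._updates_since_save >= self._save_interval:
                self._save()

    def _save(self):
        # A real implementation would write the manifest file here.
        self.save_count += 1
        self._updates_since_save = 0


manifest = ManifestSketch(save_interval=5)


def worker(start):
    for i in range(start, start + 5):
        manifest.update_entry(i, (3, 64, 64), f"cutout_{i:04d}.pt", ["g", "r", "i"])


threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 5, 10, 15)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the counter is only touched while the lock is held, the number of saves depends on the number of updates and not on how the threads interleave.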

_save_manifest()[source]

Save manifest in appropriate format (FITS for Astropy, Parquet for HATS).

_sync_manifest_with_filesystem()[source]

Sync manifest with actual downloaded files on disk.

static _request_patch_cached(tract_index, patch_index, butler_repo, butler_collections, skymap_name, bands_tuple)[source]

Cached patch fetching using static method.

Because this is a static method, 'self' is not part of the cache key, so the cache is shared globally across all instances. It is thread-safe because each call creates its own Butler instance.
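The effect of caching a static method can be shown with `functools.lru_cache` from the standard library. This is a minimal sketch under stated assumptions: `PatchFetcherSketch` and `request_patch` are hypothetical, the "patch" is just a string, and no Butler is involved. Passing the bands as a tuple matters because `lru_cache` requires hashable arguments.

```python
from functools import lru_cache


class PatchFetcherSketch:
    calls = []  # records every real (uncached) fetch

    @staticmethod
    @lru_cache(maxsize=32)
    def request_patch(tract_index, patch_index, bands_tuple):
        # No `self` parameter, so the cache key is only these hashable arguments.
        PatchFetcherSketch.calls.append((tract_index, patch_index, bands_tuple))
        return f"patch {tract_index}/{patch_index} {bands_tuple}"


a = PatchFetcherSketch()
b = PatchFetcherSketch()
first = a.request_patch(9813, 42, ("g", "r", "i"))
second = b.request_patch(9813, 42, ("g", "r", "i"))  # served from the shared cache
info = PatchFetcherSketch.request_patch.cache_info()

# Emptying the cache frees the stored results.
PatchFetcherSketch.request_patch.cache_clear()
```

Two different instances hit the same cache entry, which is the behaviour the "truly global" remark refers to.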

_fetch_single_cutout(row, idx=None)[source]

Fetch cutout, using saved cutout if available, with optional band filtering.

_fetch_cutout_with_cache(row)[source]

Generate cutout using cached patch fetching with NaN filling for failed bands.
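NaN filling keeps the band axis at a fixed length even when some bands fail to download. The sketch below illustrates the idea with nested Python lists in place of tensors; `stack_bands` is a hypothetical helper, and representing a failed band as `None` is an assumption.

```python
import math


def stack_bands(band_images, bands, shape):
    """Stack per-band images in `bands` order, NaN-filling any missing band.

    `band_images` maps band name -> 2D list, or None when the fetch failed.
    """
    height, width = shape
    stacked = []
    for band in bands:
        image = band_images.get(band)
        if image is None:
            # Failed band: substitute an all-NaN image of the same shape.
            image = [[math.nan] * width for _ in range(height)]
        stacked.append(image)
    return stacked


fetched = {"g": [[1.0, 2.0]], "r": None, "i": [[5.0, 6.0]]}
cutout = stack_bands(fetched, ["g", "r", "i"], shape=(1, 2))
```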

get_image(idxs)[source]

Fetch image cutout(s) for given index or indices, using caching and band filtering.

Parameters:

idxs: int or slice or list

Index or indices to fetch.

Returns:

torch.Tensor or list of torch.Tensor:

Single cutout tensor or list of cutout tensors.

__getitem__(idxs) → dict[source]

Overrides the parent implementation to pass the index along so that fetched cutouts can be saved.

Parameters:

idxs: int or slice or list

Index or indices to fetch.

Returns:

dict:

Dictionary with key ‘data’ containing another dict of default data fields to return. Currently only ‘image’ is supported.

download_cutouts(indices=None, sync_filesystem=True, max_workers=None, force_retry=False)[source]

Download cutouts using multiple threads with caching.

Parameters:
  • indices – List of indices to download, or None for all

  • sync_filesystem – Whether to sync manifest with existing files on disk

  • max_workers – Maximum number of worker threads, or None to use default

  • force_retry – Whether to retry previously failed downloads
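How these parameters interact can be sketched with a small stand-in that uses `concurrent.futures.ThreadPoolExecutor`. Everything here is hypothetical except the 'Attempted' sentinel, which comes from the manifest documentation on this page: the manifest is reduced to a list of file names, an empty string marking a not-yet-tried entry is an assumption, and `fetch` is a placeholder for the real download.

```python
from concurrent.futures import ThreadPoolExecutor

FAILED = "Attempted"  # sentinel file name for a failed download
PENDING = ""          # assumption: an empty file name marks an untried entry


def select_indices(filenames, indices=None, force_retry=False):
    """Pick the entries that still need downloading."""
    candidates = range(len(filenames)) if indices is None else indices
    wanted = {PENDING, FAILED} if force_retry else {PENDING}
    return [i for i in candidates if filenames[i] in wanted]


def download_all(filenames, fetch, indices=None, max_workers=1, force_retry=False):
    """Run `fetch(idx)` in a thread pool and record the outcome per index."""
    todo = select_indices(filenames, indices, force_retry)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `todo`.
        for idx, ok in zip(todo, pool.map(fetch, todo)):
            filenames[idx] = f"cutout_{idx:04d}.pt" if ok else FAILED
    return todo


def fake_fetch(idx):
    return idx != 2  # pretend index 2 always fails


manifest = ["cutout_0000.pt", PENDING, PENDING, FAILED]
first_pass = download_all(manifest, fake_fetch, max_workers=2)
retry_pass = download_all(manifest, lambda idx: True, max_workers=2, force_retry=True)
```

In this sketch the first pass skips both the completed entry and the previously failed one, which is what makes resuming cheap; only the pass with `force_retry=True` revisits failures.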

_download_single_cutout(idx)[source]

Helper method to download a single cutout.

get_cache_info()[source]

Get cache statistics.

clear_cache()[source]

Clear the LRU cache.

get_manifest_stats()[source]

Get manifest statistics including downloaded bands information.
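The counts such a method reports can be derived from the manifest's filename column alone. The following is a rough sketch, not the actual return format: `manifest_stats` and the dictionary keys are hypothetical, an empty filename is assumed to mean "pending", and the band information mentioned above is left out.

```python
def manifest_stats(filenames, manifest_path="manifest.fits"):
    """Count successful, failed, and pending entries in a filename column."""
    total = len(filenames)
    failed = sum(1 for name in filenames if name == "Attempted")
    pending = sum(1 for name in filenames if not name)  # assumption: "" = pending
    return {
        "total": total,
        "successful": total - failed - pending,
        "failed": failed,
        "pending": pending,
        "manifest_path": manifest_path,
    }


stats = manifest_stats(["cutout_0000.pt", "Attempted", "", "cutout_0003.pt", ""])
```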

get_band_filtering_info()[source]

Get information about current band filtering configuration.

save_manifest_now()[source]

Force immediate manifest save.

static _determine_numprocs_download()[source]

Determine number of threads for downloading.

reset_failed_downloads()[source]

Reset failed download attempts to allow retry.

get_download_progress()[source]

Get detailed download progress information.

get_download_summary()[source]

Get detailed download and band analysis, accounting for band filtering.