pydiva.gathering
pydiva.gathering.base
Gatherer
Bases: ABC
Abstract base class for data gathering operations.
This class defines the interface for all pydiva Gatherers that collect data from external sources like APIs, websites, or other remote services.
caching_location
instance-attribute
__init__
__init__(caching_base_location: Path | str = Path('/workspace/.cache/pydiva'), caching_name: str = 'pydiva_data', caching_sub_location: Path | str = '')
fetch
Starts the data gathering for the given parameters.
Returns the collected data and writes it to disk in the caching location. To redownload existing files, set the boolean parameter "overwrite_cache" to True.
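The caching behaviour described for fetch can be illustrated with a minimal, self-contained sketch. It does not reproduce pydiva's implementation: the class names, the `_download` hook, the cache file naming, and the way `caching_location` is assembled from the three constructor arguments are all assumptions made for illustration.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchGatherer(ABC):
    """Hypothetical stand-in for a Gatherer-like base class (not pydiva's code)."""

    def __init__(self, caching_base_location, caching_name="pydiva_data",
                 caching_sub_location=""):
        # Assumption: the cache directory is base / name / sub-location.
        self.caching_location = (
            Path(caching_base_location) / caching_name / caching_sub_location
        )

    @abstractmethod
    def _download(self, **params) -> bytes:
        """Hypothetical hook: return the raw payload for the given parameters."""

    def fetch(self, overwrite_cache: bool = False, **params) -> bytes:
        # Assumption: one cache file per parameter set, named from the sorted parameters.
        name = "_".join(f"{k}-{v}" for k, v in sorted(params.items())) or "default"
        target = self.caching_location / f"{name}.dat"
        if target.exists() and not overwrite_cache:
            return target.read_bytes()  # served from the cache
        data = self._download(**params)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return data


class CountingGatherer(SketchGatherer):
    """Toy subclass that counts how often a 'download' actually happens."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.downloads = 0

    def _download(self, **params) -> bytes:
        self.downloads += 1
        return repr(sorted(params.items())).encode()


with tempfile.TemporaryDirectory() as tmp:
    gatherer = CountingGatherer(tmp)
    gatherer.fetch(site="demo")                        # downloads and caches
    gatherer.fetch(site="demo")                        # read back from the cache
    gatherer.fetch(site="demo", overwrite_cache=True)  # forces a re-download
    download_count = gatherer.downloads
```

In this sketch the second call never reaches `_download`; only the call with `overwrite_cache=True` triggers a new download.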
pydiva.gathering.aeronet
AERONET_URL_PATTERN
module-attribute
AERONET_URL_PATTERN = '{source}/cgi-bin/{url_segment}site={site}&year={start_year}&month={start_month}&day={start_day}&year2={end_year}&month2={end_month}&day2={end_day}&{data_type}=1&AVG={averaging}&if_no_html=1'
DATA_TYPE_URLS
module-attribute
DATA_TYPE_URLS = {'ALM00': 'print_web_data_raw_sky_v3?', 'ALM15': 'print_web_data_inv_v3?product=ALL&', 'ALM20': 'print_web_data_inv_v3?product=ALL&', 'ALP00': 'print_web_data_raw_sky_v3?', 'AOD10': 'print_web_data_v3?', 'AOD15': 'print_web_data_v3?', 'AOD20': 'print_web_data_v3?', 'HYB00': 'print_web_data_raw_sky_v3?', 'HYB15': 'print_web_data_inv_v3?product=ALL&', 'HYB20': 'print_web_data_inv_v3?product=ALL&', 'HYP00': 'print_web_data_raw_sky_v3?', 'LWN10': 'print_web_data_v3?', 'LWN15': 'print_web_data_v3?', 'LWN20': 'print_web_data_v3?', 'PPL00': 'print_web_data_raw_sky_v3?', 'PPP00': 'print_web_data_raw_sky_v3?', 'SDA10': 'print_web_data_v3?', 'SDA15': 'print_web_data_v3?', 'SDA20': 'print_web_data_v3?', 'TOT10': 'print_web_data_v3?', 'TOT15': 'print_web_data_v3?', 'TOT20': 'print_web_data_v3?', 'ZEN00': 'print_web_data_zenith_radiance_v3?'}
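How the two module attributes fit together can be sketched as follows. The helper `build_aeronet_url`, the base URL used for `source`, and the mapping of the boolean `averaging` flag to `AVG=10` (all points) or `AVG=20` (daily averages) are assumptions for illustration; the source only documents the pattern and the lookup table.

```python
from datetime import date

# Copied from the module attributes documented above.
AERONET_URL_PATTERN = (
    "{source}/cgi-bin/{url_segment}site={site}"
    "&year={start_year}&month={start_month}&day={start_day}"
    "&year2={end_year}&month2={end_month}&day2={end_day}"
    "&{data_type}=1&AVG={averaging}&if_no_html=1"
)
# Two entries of DATA_TYPE_URLS are enough for this sketch.
DATA_TYPE_URLS = {
    "AOD15": "print_web_data_v3?",
    "ALM15": "print_web_data_inv_v3?product=ALL&",
}


def build_aeronet_url(site, data_type, start_date, end_date, averaging=False,
                      source="https://aeronet.gsfc.nasa.gov"):
    """Hypothetical helper: fill AERONET_URL_PATTERN for one request."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return AERONET_URL_PATTERN.format(
        source=source,  # assumption: base URL of the AERONET web service
        url_segment=DATA_TYPE_URLS[data_type],
        site=site,
        start_year=start.year, start_month=start.month, start_day=start.day,
        end_year=end.year, end_month=end.month, end_day=end.day,
        data_type=data_type,
        # Assumption: AVG=10 requests all points, AVG=20 daily averages.
        averaging=20 if averaging else 10,
    )


url = build_aeronet_url("Magurele_Inoe", "AOD15", "2023-05-01", "2023-05-21")
```

The data type appears twice in the resulting URL: once indirectly through the endpoint chosen from DATA_TYPE_URLS, and once as the `{data_type}=1` query flag.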
AeronetGatherer
Bases: Gatherer
Fetches data for the specified arguments using the AERONET Web Service.
Example
    from pydiva.gathering.aeronet import AeronetGatherer

    aeronet_gatherer = AeronetGatherer()
    aeronet_gatherer.fetch(
        site="Magurele_Inoe",
        data_type="AOD15",
        start_date="2023-05-01",
        end_date="2023-05-21",
    )
The fetched data is also stored in the caching_location by default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site | | The exact site name as found in AERONET, e.g. "Magurele_Inoe" | required |
| data_type | | The data type as found in AERONET, e.g. "AOD20" | required |
| start_date | | Start date of the measurement points, e.g. "2022-05-16" | required |
| end_date | | End date of the measurement points, e.g. "2022-08-25" | required |
| averaging | | False for all points, True for daily averages | False |

Other Parameters:
| Name | Type | Description |
|---|---|---|
| overwrite_cache | | Set True to redownload and overwrite existing cached files; default is False |
pydiva.gathering.actris_ares
ACTRIS_ARES_API_URL
module-attribute
ACTRIS_ARES_KIND
module-attribute
ACTRIS_KWARG_ALTERNATIVES
module-attribute
ACTRIS_KWARG_ALTERNATIVES = [('ewls', ['wavelength']), ('from_date', ['fromDate']), ('from_day_time', ['fromDayTime']), ('to_date', ['toDate']), ('to_day_time', ['toDayTime']), ('file_types', ['fileTypes', 'opticaltype', 'optical_type']), ('quality_control_version', ['qualityControlVersion', 'qa_version']), ('scc_version', ['sccVersion'])]
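The list pairs each canonical keyword with the alternative spellings it accepts. A minimal sketch of how such a table could be applied to incoming keyword arguments is shown below; the function `normalize_kwargs` and the rule that an explicitly given canonical name wins over its aliases are assumptions, not pydiva's documented behaviour.

```python
# Copied from the module attribute documented above.
ACTRIS_KWARG_ALTERNATIVES = [
    ("ewls", ["wavelength"]),
    ("from_date", ["fromDate"]),
    ("from_day_time", ["fromDayTime"]),
    ("to_date", ["toDate"]),
    ("to_day_time", ["toDayTime"]),
    ("file_types", ["fileTypes", "opticaltype", "optical_type"]),
    ("quality_control_version", ["qualityControlVersion", "qa_version"]),
    ("scc_version", ["sccVersion"]),
]


def normalize_kwargs(kwargs):
    """Hypothetical helper: rename alias keys to their canonical names."""
    normalized = dict(kwargs)  # work on a copy, leave the caller's dict untouched
    for canonical, alternatives in ACTRIS_KWARG_ALTERNATIVES:
        for alias in alternatives:
            if alias in normalized:
                value = normalized.pop(alias)
                # Assumption: a canonical key that is already present takes precedence.
                normalized.setdefault(canonical, value)
    return normalized


params = normalize_kwargs(
    {"wavelength": [355, 532], "fromDate": "2020-04-01", "stations": "waw"}
)
```

Keys without an entry in the table, such as `stations`, pass through unchanged.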
ActrisAresParameters
Bases: BaseModel
quality_control_version
class-attribute
instance-attribute
ActrisAresGatherer
Bases: Gatherer
Wrapper for the ACTRIS Ares REST API (https://data.earlinet.org/api/services/restapi?_wadl).
Example
    from datetime import datetime, time

    from pydiva.gathering.actris_ares import ActrisAresGatherer

    ares_gatherer = ActrisAresGatherer()
    params = {
        "kind": ["cloudmask", "optical"],
        "from_date": "2020-04-01",
        "to_date": datetime.strptime("2020-04-07", "%Y-%m-%d"),
        "from_day_time": "22:00:00",
        "to_day_time": time(22, 30, 0),
        "stations": "waw",
        "ewls": [355, 532],
        "file_types": "b",
        "levels": 1.0,
    }
    ares_gatherer.fetch(**params)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | str \| list[str] | Kind of data to fetch; 'optical' uses the specific endpoint, any other kind the generic one | 'optical' |
| from_date | str \| date \| datetime | Start of sensing time | required |
| to_date | str \| date \| datetime | End of sensing time | required |
| stations | str \| list[str] | ACTRIS stations from which to fetch data | required |
| ewls \| wavelength | str \| int \| float \| list[Any] | Wavelengths for which to fetch data | required |
| file_types \| optical_type | str \| list[str] | File types to fetch | required |
| tag | str \| list[str] \| None | | None |
| overwrite_cache | bool | Set True to redownload and overwrite existing cached files | False |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| from_day_time | str \| time \| datetime | Start of sensing time as a time of day |
| to_day_time | str \| time \| datetime | End of sensing time as a time of day |
| levels | str \| int \| float \| list[Any] | File levels to fetch |
| quality_control_version | str \| int | QA version for which to fetch files |
Note
    ActrisAresGatherer()
    ActrisAresGatherer.STATIONS
    ActrisAresGatherer.WAVELENGTHS
    ActrisAresGatherer.FILETYPES
    ActrisAresGatherer.LEVELS
    ActrisAresGatherer.QA_VERSIONS
    ActrisAresGatherer.TAGS
pydiva.gathering.actris_cloudnet
ActrisCloudnetParameters
Bases: BaseModel
updated_at_from
class-attribute
instance-attribute
ActrisCloudnetGatherer
Bases: Gatherer
Wrapper for the ACTRIS Cloudnet API client (https://pypi.org/project/cloudnet-api-client/).
Example
    from datetime import datetime

    from pydiva.gathering.actris_cloudnet import ActrisCloudnetGatherer

    cloudnet_gatherer = ActrisCloudnetGatherer()
    params = {
        "site_id": "hyytiala",
        "date": "2021-01-01",
        "product": ["mwr", "radar"],
        "updated_at_to": datetime.strptime(
            "2025-01-01T12:00:00", "%Y-%m-%dT%H:%M:%S"
        ),
    }
    cloudnet_gatherer.fetch(**params)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date | str \| date \| datetime | Sensing time on this date | required |
| date_from | str \| date \| datetime | Sensing time after this date | required |
| date_to | str \| date \| datetime | Sensing time before this date | required |
| site_id | str \| list[str] | ID of the site from which to fetch data | required |
| updated_at | str \| date \| datetime | Fetch files updated on this date | required |
| updated_at_from | str \| date \| datetime | Fetch files updated after this date | required |
| updated_at_to | str \| date \| datetime | Fetch files updated before this date | required |
| product | str \| list[str] | Which products to fetch | required |
| instrument_id | str \| list[str] | ID of the instruments from which to fetch | required |
| instrument_pid | str \| list[str] | PID of the instruments from which to fetch | required |
| overwrite_cache | bool | Set True to redownload and overwrite existing cached files | False |
Note
    ActrisCloudnetGatherer()
    ActrisCloudnetGatherer.SITES
    ActrisCloudnetGatherer.PRODUCTS
    ActrisCloudnetGatherer.INSTRUMENTS
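The date-like parameters of ActrisCloudnetGatherer accept `str | date | datetime`. The sketch below shows one way such inputs could be reduced to a single `date` type before a request is built. The helper `to_date` is hypothetical, and the assumption that strings are given in ISO 8601 form is taken from the example values only.

```python
from datetime import date, datetime


def to_date(value):
    """Hypothetical helper: coerce str | date | datetime to a date."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        # Assumption: strings are ISO 8601, e.g. "2021-01-01" or "2025-01-01T12:00:00".
        return datetime.fromisoformat(value).date()
    raise TypeError(f"Unsupported date value: {value!r}")


from_string = to_date("2021-01-01")
from_datetime = to_date(datetime.strptime("2025-01-01T12:00:00", "%Y-%m-%dT%H:%M:%S"))
```

Checking `datetime` before `date` matters because every `datetime` instance also passes `isinstance(value, date)`.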
pydiva.gathering.earthcare
Gatherer wrapper for oads-download (https://github.com/koenigleon/oads-download/).
FILE_TYPES
module-attribute
FILE_TYPES = ['ATL_NOM_1B', 'ATL_DCC_1B', 'ATL_CSC_1B', 'ATL_FSC_1B', 'MSI_NOM_1B', 'MSI_BBS_1B', 'MSI_SD1_1B', 'MSI_SD2_1B', 'BBR_NOM_1B', 'BBR_SNG_1B', 'BBR_SOL_1B', 'BBR_LIN_1B', 'CPR_NOM_1B', 'MSI_RGR_1C', 'AUX_MET_1D', 'AUX_JSG_1D', 'ATL_FM__2A', 'ATL_AER_2A', 'ATL_ICE_2A', 'ATL_TC__2A', 'ATL_EBD_2A', 'ATL_CTH_2A', 'ATL_ALD_2A', 'ATL_CLA_2A', 'MSI_CM__2A', 'MSI_COP_2A', 'MSI_AOT_2A', 'MSI_CLP_2A', 'CPR_FMR_2A', 'CPR_CD__2A', 'CPR_TC__2A', 'CPR_CLD_2A', 'CPR_APC_2A', 'CPR_ECO_2A', 'CPR_CLP_2A', 'AM__MO__2B', 'AM__CTH_2B', 'AM__ACD_2B', 'AC__TC__2B', 'AC__CLP_2B', 'BM__RAD_2B', 'BMA_FLX_2B', 'ACM_CAP_2B', 'ACM_COM_2B', 'ACM_RT__2B', 'ACM_CLP_2B', 'ALL_DF__2B', 'ALL_3D__2B', 'ALL_RAD_2B', 'MPL_ORBSCT', 'AUX_ORBPRE', 'AUX_ORBRES']
COLLECTIONS
module-attribute
COLLECTIONS = ['EarthCAREL1Validated', 'EarthCAREL2Validated', 'EarthCAREXMETL1DProducts10', 'JAXAL2Validated', 'EarthCAREAuxiliary', 'EarthCAREL1InstChecked', 'EarthCAREL2InstChecked', 'JAXAL2InstChecked', 'EarthCAREL0L1Products', 'EarthCAREL2Products', 'JAXAL2Products']
EarthCAREParameters
Bases: BaseModel
Pydantic model for EarthCARE download parameters. Uses validators to pass the expected types to the OADS downloader.
product_types
class-attribute
instance-attribute
radius_search
class-attribute
instance-attribute
bounding_box
class-attribute
instance-attribute
is_found_files_list_to_txt
class-attribute
instance-attribute
validate_collections
classmethod
Validate that all collections are in the valid list.
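The membership check performed by validate_collections can be sketched as below. To keep the example free of third-party dependencies it uses a standard-library dataclass with `__post_init__` instead of a Pydantic validator, so it is a stand-in for the technique rather than the model's actual code; the class name `EarthCAREParametersSketch` and the error message are hypothetical.

```python
from dataclasses import dataclass, field

# Copied from the COLLECTIONS module attribute documented above.
COLLECTIONS = [
    "EarthCAREL1Validated", "EarthCAREL2Validated", "EarthCAREXMETL1DProducts10",
    "JAXAL2Validated", "EarthCAREAuxiliary", "EarthCAREL1InstChecked",
    "EarthCAREL2InstChecked", "JAXAL2InstChecked", "EarthCAREL0L1Products",
    "EarthCAREL2Products", "JAXAL2Products",
]


@dataclass
class EarthCAREParametersSketch:
    """Hypothetical stand-in for the parameter model (dataclass, not Pydantic)."""

    product_types: list[str]
    collections: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject any collection name that is not in the valid list.
        unknown = [c for c in self.collections if c not in COLLECTIONS]
        if unknown:
            raise ValueError(f"Unknown collections: {unknown}")


valid = EarthCAREParametersSketch(
    product_types=["ATL_NOM_1B"],
    collections=["EarthCAREL1InstChecked"],
)
```

Passing a name outside the list, for example `collections=["NotACollection"]`, raises a `ValueError` in this sketch before any download is attempted.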
EarthCAREGatherer
Bases: Gatherer
Flexible wrapper for EarthCARE OADS download script.
Example
    from pydiva.gathering.earthcare import EarthCAREGatherer

    ec_gatherer = EarthCAREGatherer(username="user", password="pass")
    file_paths = ec_gatherer.fetch(
        product_types=["ATL_NOM_1B"],
        collections=["EarthCAREL1InstChecked"],
        start_time="2024-07-31T13:00:00Z",
        ...
    )
Fetches EarthCARE satellite data from ESA's Online Access and Distribution System (OADS) using the OpenSearch API. Downloaded files are stored in a temporary directory and file paths are returned.
Note
- Standard Users: EarthCAREL1Validated, EarthCAREL2Validated, JAXAL2Validated, EarthCAREXMETL1DProducts10, EarthCAREOrbitData
- Cal/Val Users: All standard collections plus InstChecked variants and EarthCAREAuxiliary
- Commissioning Team: All standard collections plus Products and *L0L1Products variants
For more details check the EarthCARE Online Dissemination Service: https://ec-pdgs-dissemination2.eo.esa.int/oads/access/collection
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| product_types | | List of EarthCARE products, e.g. ["ATL_NOM_1B", "MSI_NOM_1B"]. Supports short names, e.g. "ANOM", and version specification, e.g. "ANOM:AC" | required |
| collections | | List of accessible collections, e.g. ["EarthCAREL1Validated", "EarthCAREL2Validated"]. Must be from the valid collections list based on user access level | required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| start_time | | Start of sensing time, e.g. "2024-07-31T13:00:00Z" |
| end_time | | End of sensing time, e.g. "2024-07-31T14:00:00Z" |
| timestamps | | List of specific timestamps to search for |
| orbit_numbers | | List of orbit numbers, e.g. [981, 982, 983] |
| frame_ids | | List of frame IDs (A-H), e.g. ["A", "B", "C"] |
| radius_search | | Spatial search around point [radius_m, lat, lon], e.g. [25000, 51.35, 12.43] |
| bounding_box | | Spatial search in box [latS, lonW, latN, lonE], e.g. [14.9, 37.7, 14.99, 37.78] |
| product_version | | Two-letter processor baseline, e.g. "AC" |
| download_idx | | Select single file by index from results |
| orbit_and_frames | | Combined orbit/frame strings, e.g. ["00981E", "00982A"] |
| start_orbit_number/end_orbit_number | | Orbit range bounds |
| start_orbit_and_frame/end_orbit_and_frame | | Combined orbit/frame range bounds |
| is_download | | Download files; default True |
| is_unzip | | Extract downloaded archives; default True |
| is_delete | | Delete archives after extraction; default True |
| is_overwrite | | Overwrite existing files; default False |
| is_create_subdirs | | Create organized subdirectory structure; default True |
| is_debug | | Enable debug logging; default False |
| overwrite_cache | | Set True to redownload and overwrite existing cached files; default False |
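The combined orbit/frame strings used by orbit_and_frames can be assembled from orbit_numbers and frame_ids, as the following sketch shows. Both helper functions are hypothetical, and the five-digit zero padding is inferred from the example values "00981E" and "00982A" rather than taken from a specification.

```python
# Frame IDs A-H, as listed for the frame_ids parameter.
FRAME_IDS = set("ABCDEFGH")


def orbit_and_frame(orbit_number: int, frame_id: str) -> str:
    """Hypothetical helper: build one combined orbit/frame string."""
    if frame_id not in FRAME_IDS:
        raise ValueError(f"Frame ID must be one of A-H, got {frame_id!r}")
    # Assumption: orbit numbers are zero-padded to five digits, as in "00981E".
    return f"{orbit_number:05d}{frame_id}"


def orbit_and_frames(orbit_numbers, frame_ids):
    """Hypothetical helper: all combinations of the given orbits and frames."""
    return [orbit_and_frame(o, f) for o in orbit_numbers for f in frame_ids]


combined = orbit_and_frames([981, 982], ["A", "B"])
```

A list built this way has the same shape as the example value documented for orbit_and_frames.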