pydiva.gathering

pydiva.gathering.base

FILE_ENCODING module-attribute

FILE_ENCODING = 'utf-8'

Gatherer

Bases: ABC

Abstract base class for data gathering operations.

This class defines the interface for all pydiva Gatherers that collect data from external sources like APIs, websites, or other remote services.

caching_location instance-attribute

caching_location = Path(caching_base_location, caching_sub_location)

caching_name instance-attribute

caching_name = caching_name

__init__

__init__(caching_base_location: Path | str = Path('/workspace/.cache/pydiva'), caching_name: str = 'pydiva_data', caching_sub_location: Path | str = '')

fetch

fetch(overwrite_cache: bool = False, **kwargs) -> str | list[str] | None

Starts the data gathering for the given parameters.

Returns the collected data and writes it to disk in the caching location. To redownload files that are already cached, set the boolean parameter "overwrite_cache" to True.

list_cached_files

list_cached_files() -> list[str]

list_cached_directories

list_cached_directories() -> list[str]
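The caching contract described for fetch can be illustrated with a small stand-alone sketch. It does not import pydiva: SketchGatherer, CountingGatherer, the _download hook, and the ".txt" file name are hypothetical stand-ins, and the real class may organize its cache differently.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile

FILE_ENCODING = "utf-8"


class SketchGatherer(ABC):
    """Hypothetical stand-in for Gatherer; not the pydiva implementation."""

    def __init__(self, caching_base_location, caching_name="pydiva_data",
                 caching_sub_location=""):
        # Mirrors the documented attribute: base location joined with sub location.
        self.caching_location = Path(caching_base_location, caching_sub_location)
        self.caching_name = caching_name

    @abstractmethod
    def _download(self, **kwargs) -> str:
        """Subclasses return the raw text for the given parameters."""

    def fetch(self, overwrite_cache: bool = False, **kwargs) -> str:
        self.caching_location.mkdir(parents=True, exist_ok=True)
        target = self.caching_location / f"{self.caching_name}.txt"
        # Reuse the cached file unless the caller asks for a fresh download.
        if target.exists() and not overwrite_cache:
            return target.read_text(encoding=FILE_ENCODING)
        data = self._download(**kwargs)
        target.write_text(data, encoding=FILE_ENCODING)
        return data

    def list_cached_files(self) -> list[str]:
        return sorted(p.name for p in self.caching_location.iterdir() if p.is_file())


class CountingGatherer(SketchGatherer):
    """Toy subclass that records how often a download actually happens."""

    downloads = 0

    def _download(self, **kwargs) -> str:
        self.downloads += 1
        return f"payload {self.downloads}"


with tempfile.TemporaryDirectory() as tmp:
    gatherer = CountingGatherer(tmp, caching_sub_location="demo")
    first = gatherer.fetch()
    second = gatherer.fetch()                      # served from the cache
    third = gatherer.fetch(overwrite_cache=True)   # forces a new download
    cached = gatherer.list_cached_files()
```

In this sketch the second call never reaches _download, while the third call replaces the cached file.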

pydiva.gathering.aeronet

AERONET_SOURCE module-attribute

AERONET_SOURCE = 'https://aeronet.gsfc.nasa.gov'

AERONET_URL_PATTERN module-attribute

AERONET_URL_PATTERN = '{source}/cgi-bin/{url_segment}site={site}&year={start_year}&month={start_month}&day={start_day}&year2={end_year}&month2={end_month}&day2={end_day}&{data_type}=1&AVG={averaging}&if_no_html=1'

DATA_TYPE_URLS module-attribute

DATA_TYPE_URLS = {'ALM00': 'print_web_data_raw_sky_v3?', 'ALM15': 'print_web_data_inv_v3?product=ALL&', 'ALM20': 'print_web_data_inv_v3?product=ALL&', 'ALP00': 'print_web_data_raw_sky_v3?', 'AOD10': 'print_web_data_v3?', 'AOD15': 'print_web_data_v3?', 'AOD20': 'print_web_data_v3?', 'HYB00': 'print_web_data_raw_sky_v3?', 'HYB15': 'print_web_data_inv_v3?product=ALL&', 'HYB20': 'print_web_data_inv_v3?product=ALL&', 'HYP00': 'print_web_data_raw_sky_v3?', 'LWN10': 'print_web_data_v3?', 'LWN15': 'print_web_data_v3?', 'LWN20': 'print_web_data_v3?', 'PPL00': 'print_web_data_raw_sky_v3?', 'PPP00': 'print_web_data_raw_sky_v3?', 'SDA10': 'print_web_data_v3?', 'SDA15': 'print_web_data_v3?', 'SDA20': 'print_web_data_v3?', 'TOT10': 'print_web_data_v3?', 'TOT15': 'print_web_data_v3?', 'TOT20': 'print_web_data_v3?', 'ZEN00': 'print_web_data_zenith_radiance_v3?'}
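As a rough illustration of how AERONET_URL_PATTERN and DATA_TYPE_URLS fit together, the sketch below fills the pattern for one request. The helper build_aeronet_url is hypothetical, only two entries of DATA_TYPE_URLS are copied, and the mapping of the boolean averaging flag to AVG=10 (all points) or AVG=20 (daily averages) is an assumption about how the flag is translated.

```python
from datetime import date

AERONET_SOURCE = "https://aeronet.gsfc.nasa.gov"
AERONET_URL_PATTERN = (
    "{source}/cgi-bin/{url_segment}site={site}"
    "&year={start_year}&month={start_month}&day={start_day}"
    "&year2={end_year}&month2={end_month}&day2={end_day}"
    "&{data_type}=1&AVG={averaging}&if_no_html=1"
)
# Subset of DATA_TYPE_URLS, copied from the listing above.
DATA_TYPE_URLS = {
    "AOD15": "print_web_data_v3?",
    "ALM15": "print_web_data_inv_v3?product=ALL&",
}


def build_aeronet_url(site, data_type, start_date, end_date, averaging=False):
    """Hypothetical helper: fill AERONET_URL_PATTERN for one request."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return AERONET_URL_PATTERN.format(
        source=AERONET_SOURCE,
        url_segment=DATA_TYPE_URLS[data_type],
        site=site,
        start_year=start.year, start_month=start.month, start_day=start.day,
        end_year=end.year, end_month=end.month, end_day=end.day,
        data_type=data_type,
        # Assumed mapping: AVG=10 requests all points, AVG=20 daily averages.
        averaging=20 if averaging else 10,
    )


url = build_aeronet_url("Magurele_Inoe", "AOD15", "2023-05-01", "2023-05-21")
inversion_url = build_aeronet_url(
    "Magurele_Inoe", "ALM15", "2023-05-01", "2023-05-21", averaging=True
)
```

The url_segment already ends in "?" or "&", which is why the pattern appends "site=" without a separator of its own.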

AeronetGatherer

Bases: Gatherer

Fetches data for the specified arguments using the AERONET Web Service.

Example

aeronet_gatherer = AeronetGatherer()
aeronet_gatherer.fetch(
    site="Magurele_Inoe",
    data_type="AOD15",
    start_date="2023-05-01",
    end_date="2023-05-21",
)

The fetched data is also stored in the caching_location by default.

Parameters:

Name Type Description Default
site

The exact sitename as found in AERONET, e.g. "Magurele_Inoe"

required
data_type

The data type as found in AERONET, e.g. "AOD20"

required
start_date

Start date of the measurement points, e.g. "2022-05-16"

required
end_date

End date of the measurement points, e.g. "2022-08-25"

required
averaging

False for all points, True for daily averages; default is False

False

Other Parameters: overwrite_cache: Set True to redownload and overwrite existing cached files; default is False

available_data_types property

available_data_types

__init__

__init__(*, caching_sub_location: Path | str = Path('aeronet'), **kwargs)

pydiva.gathering.actris_ares

ACTRIS_ARES_API_URL module-attribute

ACTRIS_ARES_API_URL = 'https://api.actris-ares.eu/api/services/restapi/'

ACTRIS_ARES_KIND module-attribute

ACTRIS_ARES_KIND = ['cloudmask', 'eldec', 'elic', 'elpp', 'hirelpp', 'optical', 'garrlic']

ACTRIS_ARES_DEFAULT_TIMEOUT module-attribute

ACTRIS_ARES_DEFAULT_TIMEOUT = 60

ACTRIS_KWARG_ALTERNATIVES module-attribute

ACTRIS_KWARG_ALTERNATIVES = [('ewls', ['wavelength']), ('from_date', ['fromDate']), ('from_day_time', ['fromDayTime']), ('to_date', ['toDate']), ('to_day_time', ['toDayTime']), ('file_types', ['fileTypes', 'opticaltype', 'optical_type']), ('quality_control_version', ['qualityControlVersion', 'qa_version']), ('scc_version', ['sccVersion'])]
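One plausible use of ACTRIS_KWARG_ALTERNATIVES is to rename accepted aliases to their canonical keyword before validation. The sketch below shows that idea. The helper normalize_kwargs and its policy of rejecting duplicates are assumptions, not pydiva's documented behavior.

```python
ACTRIS_KWARG_ALTERNATIVES = [
    ("ewls", ["wavelength"]),
    ("from_date", ["fromDate"]),
    ("from_day_time", ["fromDayTime"]),
    ("to_date", ["toDate"]),
    ("to_day_time", ["toDayTime"]),
    ("file_types", ["fileTypes", "opticaltype", "optical_type"]),
    ("quality_control_version", ["qualityControlVersion", "qa_version"]),
    ("scc_version", ["sccVersion"]),
]


def normalize_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: rename accepted aliases to their canonical keyword."""
    alias_to_canonical = {
        alias: canonical
        for canonical, aliases in ACTRIS_KWARG_ALTERNATIVES
        for alias in aliases
    }
    normalized = {}
    for key, value in kwargs.items():
        canonical = alias_to_canonical.get(key, key)
        # Assumed policy: passing the same parameter twice is an error.
        if canonical in normalized:
            raise ValueError(f"'{canonical}' was given more than once")
        normalized[canonical] = value
    return normalized


params = normalize_kwargs(
    {"wavelength": [355, 532], "fromDate": "2020-04-01", "stations": "waw"}
)
```

Keys without a listed alternative, such as "stations", pass through unchanged.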

ActrisAresParameters

Bases: BaseModel

kind class-attribute instance-attribute

kind: str | list[str] = ['optical']

from_date class-attribute instance-attribute

from_date: str | date | datetime | None = None

from_day_time class-attribute instance-attribute

from_day_time: str | time | datetime | None = None

to_date class-attribute instance-attribute

to_date: str | date | datetime | None = None

to_day_time class-attribute instance-attribute

to_day_time: str | time | datetime | None = None

stations class-attribute instance-attribute

stations: str | list[str] | None = None

ewls class-attribute instance-attribute

ewls: str | int | float | list[Any] | None = None

file_types class-attribute instance-attribute

file_types: str | list[str] | None = None

levels class-attribute instance-attribute

levels: str | int | float | list[Any] | None = None

quality_control_version class-attribute instance-attribute

quality_control_version: str | int | None = None

tag class-attribute instance-attribute

tag: str | list[str] | None = None

scc_version class-attribute instance-attribute

scc_version: bool | None = None

validate_kind classmethod

validate_kind(v)

validate_from_date classmethod

validate_from_date(v)

validate_to_date classmethod

validate_to_date(v)

validate_from_day_time classmethod

validate_from_day_time(v)

validate_to_day_time classmethod

validate_to_day_time(v)

validate_stations classmethod

validate_stations(v)

validate_ewls classmethod

validate_ewls(v)

validate_file_types classmethod

validate_file_types(v)

validate_levels classmethod

validate_levels(v)

validate_quality_control_version classmethod

validate_quality_control_version(v)

validate_tag classmethod

validate_tag(v)

validate_scc_version classmethod

validate_scc_version(v)
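The validators are listed here without a description. As an illustration only, a field typed "str | date | datetime | None" could be normalized to an ISO date string as sketched below. The function to_iso_date is hypothetical, uses only the standard library, and may not match what validate_from_date or validate_to_date actually do.

```python
from datetime import date, datetime


def to_iso_date(value):
    """Hypothetical normalizer for a `str | date | datetime | None` field."""
    if value is None:
        return None
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.date().isoformat()
    if isinstance(value, date):
        return value.isoformat()
    # Strings are parsed so that malformed input fails early with ValueError.
    return date.fromisoformat(value).isoformat()
```

With such a step in place, "2020-04-01", date(2020, 4, 1), and datetime(2020, 4, 1, 22, 0) would all reach the API layer in the same form.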

ActrisAresGatherer

Bases: Gatherer

Wrapper for the ACTRIS ARES REST API (https://data.earlinet.org/api/services/restapi?_wadl)

Example

from datetime import datetime, time

ares_gatherer = ActrisAresGatherer()
params = {
    "kind": ["cloudmask", "optical"],
    "from_date": "2020-04-01",
    "to_date": datetime.strptime("2020-04-07", "%Y-%m-%d"),
    "from_day_time": "22:00:00",
    "to_day_time": time(22, 30, 0),
    "stations": "waw",
    "ewls": [355, 532],
    "file_types": "b",
    "levels": 1.0,
}
ares_gatherer.fetch(**params)

Parameters:

Name Type Description Default
kind

Kind of data to fetch. 'optical' uses the dedicated endpoint; every other kind uses the generic one; str | list[str]; default 'optical'

'optical'
from_date

Start of sensing time; str | date | datetime

required
to_date

End of sensing time; str | date | datetime

required
stations

ACTRIS stations from which to fetch data; str | list[str]

required
ewls | wavelength

Wavelengths for which to fetch data; str | int | float | list[Any]

required
file_types | optical_type

File types to fetch; str | list[str]

required
tag

str | list[str] | None = None

required
overwrite_cache

Set True to redownload and overwrite existing cached files; default is False; bool

False

Other Parameters:

Name Type Description
from_day_time

Start of sensing time in time of day; str | time | datetime

to_day_time

End of sensing time in time of day; str | time | datetime

levels

File levels to fetch; str | int | float | list[Any]

quality_control_version

For which QA version to fetch files; str | int

Note

ActrisAresGatherer()
ActrisAresGatherer.STATIONS
ActrisAresGatherer.WAVELENGTHS
ActrisAresGatherer.FILETYPES
ActrisAresGatherer.LEVELS
ActrisAresGatherer.QA_VERSIONS
ActrisAresGatherer.TAGS

STATIONS class-attribute instance-attribute

STATIONS = None

WAVELENGTHS class-attribute instance-attribute

WAVELENGTHS = None

FILETYPES class-attribute instance-attribute

FILETYPES = None

LEVELS class-attribute instance-attribute

LEVELS = None

QA_VERSIONS class-attribute instance-attribute

QA_VERSIONS = None

TAGS class-attribute instance-attribute

TAGS = None

__init__

__init__(*, caching_sub_location: Path | str = Path('actris_ares'), **kwargs)

pydiva.gathering.actris_cloudnet

ActrisCloudnetParameters

Bases: BaseModel

date class-attribute instance-attribute

date: str | date | datetime | None = None

date_from class-attribute instance-attribute

date_from: str | date | datetime | None = None

date_to class-attribute instance-attribute

date_to: str | date | datetime | None = None

site_id class-attribute instance-attribute

site_id: str | list[str] | None = None

updated_at class-attribute instance-attribute

updated_at: str | date | datetime | None = None

updated_at_from class-attribute instance-attribute

updated_at_from: str | date | datetime | None = None

updated_at_to class-attribute instance-attribute

updated_at_to: str | date | datetime | None = None

product class-attribute instance-attribute

product: str | list[str] | None = None

instrument_id class-attribute instance-attribute

instrument_id: str | list[str] | None = None

instrument_pid class-attribute instance-attribute

instrument_pid: str | list[str] | None = None

validate_date classmethod

validate_date(v)

validate_date_from classmethod

validate_date_from(v)

validate_date_to classmethod

validate_date_to(v)

validate_site_id classmethod

validate_site_id(v)

validate_updated_at classmethod

validate_updated_at(v)

validate_updated_at_from classmethod

validate_updated_at_from(v)

validate_updated_at_to classmethod

validate_updated_at_to(v)

validate_instrument_id classmethod

validate_instrument_id(v)

validate_instrument_pid classmethod

validate_instrument_pid(v)

validate_product classmethod

validate_product(v)

ActrisCloudnetGatherer

Bases: Gatherer

Wrapper for the ACTRIS Cloudnet API client (https://pypi.org/project/cloudnet-api-client/)

Example

from datetime import datetime

cloudnet_gatherer = ActrisCloudnetGatherer()
params = {
    "site_id": "hyytiala",
    "date": "2021-01-01",
    "product": ["mwr", "radar"],
    "updated_at_to": datetime.strptime(
        "2025-01-01T12:00:00", "%Y-%m-%dT%H:%M:%S"
    ),
}
cloudnet_gatherer.fetch(**params)

Parameters:

Name Type Description Default
date

Sensing time at this date; str | date | datetime

required
date_from

Sensing time after this date; str | date | datetime

required
date_to

Sensing time before this date; str | date | datetime

required
site_id

ID of the site from which to fetch data; str | list[str]

required
updated_at

Fetch files updated on this date; str | date | datetime

required
updated_at_from

Fetch files updated after this date; str | date | datetime

required
updated_at_to

Fetch files updated before this date; str | date | datetime

required
product

Which products to fetch; str | list[str]

required
instrument_id

ID of the instruments from which to fetch; str | list[str]

required
instrument_pid

PID of the instruments from which to fetch; str | list[str]

required
overwrite_cache

Set True to redownload and overwrite existing cached files; default is False; bool

False
Note

ActrisCloudnetGatherer()
ActrisCloudnetGatherer.SITES
ActrisCloudnetGatherer.PRODUCTS
ActrisCloudnetGatherer.INSTRUMENTS

SITES class-attribute instance-attribute

SITES = None

PRODUCTS class-attribute instance-attribute

PRODUCTS = None

INSTRUMENTS class-attribute instance-attribute

INSTRUMENTS = None

client instance-attribute

client = APIClient()

__init__

__init__(*, caching_sub_location: Path | str = Path('actris_cloudnet'), **kwargs)

pydiva.gathering.earthcare

Gatherer wrapper for oads-download https://github.com/koenigleon/oads-download/

ENV_USERNAME_KEY module-attribute

ENV_USERNAME_KEY = 'EARTHCARE_USERNAME'

ENV_PASSWORD_KEY module-attribute

ENV_PASSWORD_KEY = 'EARTHCARE_PASSWORD'

FILE_TYPES module-attribute

FILE_TYPES = ['ATL_NOM_1B', 'ATL_DCC_1B', 'ATL_CSC_1B', 'ATL_FSC_1B', 'MSI_NOM_1B', 'MSI_BBS_1B', 'MSI_SD1_1B', 'MSI_SD2_1B', 'BBR_NOM_1B', 'BBR_SNG_1B', 'BBR_SOL_1B', 'BBR_LIN_1B', 'CPR_NOM_1B', 'MSI_RGR_1C', 'AUX_MET_1D', 'AUX_JSG_1D', 'ATL_FM__2A', 'ATL_AER_2A', 'ATL_ICE_2A', 'ATL_TC__2A', 'ATL_EBD_2A', 'ATL_CTH_2A', 'ATL_ALD_2A', 'ATL_CLA_2A', 'MSI_CM__2A', 'MSI_COP_2A', 'MSI_AOT_2A', 'MSI_CLP_2A', 'CPR_FMR_2A', 'CPR_CD__2A', 'CPR_TC__2A', 'CPR_CLD_2A', 'CPR_APC_2A', 'CPR_ECO_2A', 'CPR_CLP_2A', 'AM__MO__2B', 'AM__CTH_2B', 'AM__ACD_2B', 'AC__TC__2B', 'AC__CLP_2B', 'BM__RAD_2B', 'BMA_FLX_2B', 'ACM_CAP_2B', 'ACM_COM_2B', 'ACM_RT__2B', 'ACM_CLP_2B', 'ALL_DF__2B', 'ALL_3D__2B', 'ALL_RAD_2B', 'MPL_ORBSCT', 'AUX_ORBPRE', 'AUX_ORBRES']

COLLECTIONS module-attribute

COLLECTIONS = ['EarthCAREL1Validated', 'EarthCAREL2Validated', 'EarthCAREXMETL1DProducts10', 'JAXAL2Validated', 'EarthCAREAuxiliary', 'EarthCAREL1InstChecked', 'EarthCAREL2InstChecked', 'JAXAL2InstChecked', 'EarthCAREL0L1Products', 'EarthCAREL2Products', 'JAXAL2Products']

FRAMES module-attribute

FRAMES = 'ABCDEFGH'

EarthCAREParameters

Bases: BaseModel

Pydantic model for EarthCARE download parameters. Uses validator to pass the expected types to the OADS downloader.

product_types class-attribute instance-attribute

product_types: list[str] = Field(..., min_length=1)

collections class-attribute instance-attribute

collections: list[str] = Field(..., min_length=1)

start_time class-attribute instance-attribute

start_time: str | None = None

end_time class-attribute instance-attribute

end_time: str | None = None

timestamps class-attribute instance-attribute

timestamps: list[str] | None = None

radius_search class-attribute instance-attribute

radius_search: list[str | float] | None = Field(None, min_length=3, max_length=3)

bounding_box class-attribute instance-attribute

bounding_box: list[str | float] | None = Field(None, min_length=4, max_length=4)

orbit_numbers class-attribute instance-attribute

orbit_numbers: list[int] | None = None

frame_ids class-attribute instance-attribute

frame_ids: list[str] | None = None

orbit_and_frames class-attribute instance-attribute

orbit_and_frames: list[str] | None = None

start_orbit_number class-attribute instance-attribute

start_orbit_number: int | None = None

end_orbit_number class-attribute instance-attribute

end_orbit_number: int | None = None

start_orbit_and_frame class-attribute instance-attribute

start_orbit_and_frame: str | None = None

end_orbit_and_frame class-attribute instance-attribute

end_orbit_and_frame: str | None = None

product_version class-attribute instance-attribute

product_version: str | None = None

download_idx class-attribute instance-attribute

download_idx: int | None = None

is_download class-attribute instance-attribute

is_download: bool = True

is_unzip class-attribute instance-attribute

is_unzip: bool = True

is_delete class-attribute instance-attribute

is_delete: bool = True

is_overwrite class-attribute instance-attribute

is_overwrite: bool = False

is_create_subdirs class-attribute instance-attribute

is_create_subdirs: bool = True

is_log class-attribute instance-attribute

is_log: bool = False

is_debug class-attribute instance-attribute

is_debug: bool = False

is_found_files_list_to_txt class-attribute instance-attribute

is_found_files_list_to_txt: bool = False

validate_product_types classmethod

validate_product_types(v)

Validate each product type.

validate_collections classmethod

validate_collections(v)

Validate collections are in the valid list.

validate_frame_ids classmethod

validate_frame_ids(v)

Validate frame IDs are A-H.

validate_radius_search classmethod

validate_radius_search(v)

Convert radius search to strings.

validate_bounding_box classmethod

validate_bounding_box(v)

Convert bounding box to strings.
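The two conversions above (radius search and bounding box) share one shape: check the number of elements, then turn each one into a string for the downloader. A minimal sketch under that reading follows. The helper to_string_list is hypothetical, and in the model itself the length constraint is already expressed through Field(min_length=..., max_length=...).

```python
def to_string_list(values, expected_length):
    """Hypothetical helper: check the length, then stringify every element."""
    if values is None:
        return None
    if len(values) != expected_length:
        raise ValueError(f"expected {expected_length} values, got {len(values)}")
    return [str(v) for v in values]


# [radius_m, lat, lon]
radius_search = to_string_list([25000, 51.35, 12.43], 3)
# [latS, lonW, latN, lonE]
bounding_box = to_string_list([14.9, 37.7, 14.99, 37.78], 4)
```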

validate_products_in_collections

validate_products_in_collections() -> Self

Verify that the selected collections contain the selected products.

EarthCAREGatherer

Bases: Gatherer

Flexible wrapper for EarthCARE OADS download script.

Example

ec_gatherer = EarthCAREGatherer(username="user", password="pass")
file_paths = ec_gatherer.fetch(
    product_types=["ATL_NOM_1B"],
    collections=["EarthCAREL1InstChecked"],
    start_time="2024-07-31T13:00:00Z",
    ...
)

Fetches EarthCARE satellite data from ESA's Online Access and Distribution System (OADS) using the OpenSearch API. Downloaded files are stored in a temporary directory and file paths are returned.

Note

Standard Users: EarthCAREL1Validated, EarthCAREL2Validated, JAXAL2Validated, EarthCAREXMETL1DProducts10, EarthCAREOrbitData
Cal/Val Users: All standard collections plus *InstChecked variants and EarthCAREAuxiliary
Commissioning Team: All standard collections plus *Products and *L0L1Products variants

For more details check the EarthCARE Online Dissemination Service: https://ec-pdgs-dissemination2.eo.esa.int/oads/access/collection

Parameters:

Name Type Description Default
product_types

List of EarthCARE products, e.g. ["ATL_NOM_1B", "MSI_NOM_1B"]. Supports short names, e.g. "ANOM", and version specifications, e.g. "ANOM:AC".

required
collections

List of accessible collections, e.g. ["EarthCAREL1Validated", "EarthCAREL2Validated"]. Must come from the list of valid collections for the user's access level.

required
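The version specification mentioned for product_types ("ANOM:AC") can be read as a product name, a colon, and a two-letter baseline. The sketch below splits such a string. The helper split_product_spec is hypothetical, and it does not resolve short names to full product types, since that mapping is not given here.

```python
def split_product_spec(spec: str):
    """Hypothetical helper: separate a product name from an optional version."""
    name, separator, version = spec.partition(":")
    if not separator:
        return name, None
    # product_version is described as a two-letter processor baseline.
    if len(version) != 2 or not version.isalpha():
        raise ValueError(f"expected a two-letter version in {spec!r}")
    return name, version


with_version = split_product_spec("ANOM:AC")
without_version = split_product_spec("ATL_NOM_1B")
```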

Other Parameters:

Name Type Description
start_time

Start of sensing time, e.g. "2024-07-31T13:00:00Z"

end_time

End of sensing time, e.g. "2024-07-31T14:00:00Z"

timestamps

List of specific timestamps to search for

orbit_numbers

List of orbit numbers, e.g. [981, 982, 983]

frame_ids

List of frame IDs (A-H), e.g. ["A", "B", "C"]

radius_search

Spatial search around point [radius_m, lat, lon], e.g. [25000, 51.35, 12.43]

bounding_box

Spatial search in box [latS, lonW, latN, lonE], e.g. [14.9, 37.7, 14.99, 37.78]

product_version

Two-letter processor baseline, e.g. "AC"

download_idx

Select single file by index from results

orbit_and_frames

Combined orbit/frame strings, e.g. ["00981E", "00982A"]

start_orbit_number/end_orbit_number

Orbit range bounds

start_orbit_and_frame/end_orbit_and_frame

Combined orbit/frame range bounds

is_download

Download files; default True

is_unzip

Extract downloaded archives; default True

is_delete

Delete archives after extraction; default True

is_overwrite

Overwrite existing files; default False

is_create_subdirs

Create organized subdirectory structure; default True

is_debug

Enable debug logging; default False

overwrite_cache

Set True to redownload and overwrite existing cached files; default False
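For the combined orbit/frame strings in the table above, the examples "00981E" and "00982A" suggest a zero-padded five-digit orbit number followed by one frame letter from FRAMES. The sketch below parses that assumed format. The helper parse_orbit_and_frame is hypothetical, and the five-digit width is inferred from the examples only.

```python
import re

FRAMES = "ABCDEFGH"
# Assumed format, inferred from the examples "00981E" and "00982A":
# a zero-padded five-digit orbit number followed by one frame letter.
_ORBIT_AND_FRAME = re.compile(rf"(\d{{5}})([{FRAMES}])")


def parse_orbit_and_frame(text: str):
    """Hypothetical helper: split "00981E" into (981, "E")."""
    match = _ORBIT_AND_FRAME.fullmatch(text)
    if match is None:
        raise ValueError(f"not an orbit/frame string: {text!r}")
    return int(match.group(1)), match.group(2)


parsed = [parse_orbit_and_frame(s) for s in ["00981E", "00982A"]]
```

A frame letter outside A-H, such as "00981Z", is rejected with a ValueError in this sketch.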

__init__

__init__(*, caching_sub_location: Path | str = Path('earthcare'), **kwargs)

set_credentials classmethod

set_credentials() -> None

Prompt the user for credentials and store them in environment variables.
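A minimal sketch of this prompt-and-store pattern, using the ENV_USERNAME_KEY and ENV_PASSWORD_KEY values listed earlier, might look as follows. The function store_credentials, its prompt texts, and its injectable arguments are hypothetical; they exist so the example can run without a terminal and are not part of set_credentials.

```python
import getpass
import os

ENV_USERNAME_KEY = "EARTHCARE_USERNAME"
ENV_PASSWORD_KEY = "EARTHCARE_PASSWORD"


def store_credentials(ask_username=input, ask_password=getpass.getpass,
                      environ=os.environ):
    """Hypothetical stand-in for set_credentials with injectable prompts."""
    environ[ENV_USERNAME_KEY] = ask_username("EarthCARE username: ")
    # getpass.getpass hides the typed password when run interactively.
    environ[ENV_PASSWORD_KEY] = ask_password("EarthCARE password: ")


# Non-interactive demonstration using canned answers and a plain dict.
demo_env = {}
store_credentials(lambda prompt: "user", lambda prompt: "pass", demo_env)
```

Values written to os.environ this way are visible only to the current process and its children; they do not persist after the interpreter exits.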

InterceptHandler

Bases: Handler

Straight from the loguru docs

emit

emit(record: LogRecord) -> None