Data Connector Validation Documentation
Documentation for the terrakit.validate modules.
terrakit.validate.pipeline_model
PipelineModel
Bases: BaseModel
A model for configuring the TerraKit Pipeline. This class defines the attributes common across all pipeline steps.
Attributes:
| Name | Type | Description |
|---|---|---|
| dataset_name | str | Name of the dataset. Default is "terrakit_curated_dataset". |
| working_dir | Path | Working directory for the pipeline. Default is "./tmp". The directory is created if it does not already exist. |
Source code in terrakit/validate/pipeline_model.py
check_dataset_name
Validate that the dataset_name does not contain special characters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | str | The dataset name to validate. | required |

Returns:
| Type | Description |
|---|---|
| str | The validated dataset name. |
Source code in terrakit/validate/pipeline_model.py
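The documentation does not say which characters count as "special". The following stdlib-only sketch assumes that only ASCII letters, digits, underscores and hyphens are allowed; `check_dataset_name_sketch` is a hypothetical name and not TerraKit's implementation.

```python
import re

# Assumption: allowed characters are ASCII letters, digits, "_" and "-".
# TerraKit's actual rule for "special characters" may differ.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_dataset_name_sketch(v: str) -> str:
    """Return v unchanged if it is a valid dataset name, else raise ValueError."""
    # fullmatch requires the whole string to match, so an empty name fails too.
    if not _ALLOWED_NAME.fullmatch(v):
        raise ValueError(f"dataset_name {v!r} contains special characters")
    return v
```

With this rule, `check_dataset_name_sketch("terrakit_curated_dataset")` returns the name unchanged, while `"my dataset!"` raises `ValueError`.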
check_working_dir
Validate and create the working directory if it does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | Path | The working directory path. | required |

Returns:
| Type | Description |
|---|---|
| Path | The validated and existing working directory path. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the provided path is not a directory. |
Source code in terrakit/validate/pipeline_model.py
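A minimal sketch of the create-then-check behaviour described above, using only `pathlib`. The name `check_working_dir_sketch` is hypothetical. It checks for an existing non-directory first because `Path.mkdir(exist_ok=True)` raises `FileExistsError`, not `ValueError`, when the path is an existing file.

```python
from pathlib import Path


def check_working_dir_sketch(v: Path) -> Path:
    """Create v (and any parents) if missing, then return it as a directory path."""
    v = Path(v)
    # An existing regular file cannot serve as the working directory.
    if v.exists() and not v.is_dir():
        raise ValueError(f"working_dir {v} is not a directory")
    # exist_ok=True makes this a no-op when the directory already exists.
    v.mkdir(parents=True, exist_ok=True)
    return v
```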
pipeline_model_validation
Validate the TerraKit Pipeline model configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_name | str | Name of the dataset. | required |
| working_dir | str | Working directory for the pipeline. | required |

Returns:
| Type | Description |
|---|---|
| PipelineModel | The validated PipelineModel instance. |

Raises:
| Type | Description |
|---|---|
| TerrakitValidationError | If the provided arguments are invalid. |
Source code in terrakit/validate/pipeline_model.py
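The pattern above (field-level checks whose failures surface as a single `TerrakitValidationError`) can be sketched with a stdlib dataclass standing in for the Pydantic model. Everything here is hypothetical: `PipelineConfigSketch`, `TerrakitValidationErrorSketch` and `pipeline_model_validation_sketch` are illustrative stand-ins, and the dataset-name rule is an assumption.

```python
import re
from dataclasses import dataclass
from pathlib import Path


class TerrakitValidationErrorSketch(Exception):
    """Hypothetical stand-in for terrakit's TerrakitValidationError."""


@dataclass
class PipelineConfigSketch:
    # Defaults mirror the documented PipelineModel defaults.
    dataset_name: str = "terrakit_curated_dataset"
    working_dir: Path = Path("./tmp")

    def __post_init__(self) -> None:
        # Assumed naming rule: letters, digits, "_" and "-" only.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", self.dataset_name):
            raise ValueError(f"invalid dataset_name {self.dataset_name!r}")
        self.working_dir = Path(self.working_dir)
        if self.working_dir.exists() and not self.working_dir.is_dir():
            raise ValueError(f"{self.working_dir} is not a directory")
        self.working_dir.mkdir(parents=True, exist_ok=True)


def pipeline_model_validation_sketch(dataset_name: str, working_dir: str) -> PipelineConfigSketch:
    """Build a validated config, converting any field error into one error type."""
    try:
        return PipelineConfigSketch(dataset_name=dataset_name, working_dir=Path(working_dir))
    except ValueError as err:
        raise TerrakitValidationErrorSketch(str(err)) from err
```

Wrapping field errors in one exception type lets callers handle every configuration problem with a single `except` clause.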
terrakit.validate.labels_model
LabelsModel
Bases: BaseModel
Model for configuring the process labels step of the TerraKit pipeline.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_config | ConfigDict | Configuration dictionary for the model. |
| labels_folder | Path | Path to the folder containing label files. |
| active | bool | Indicates whether the labels step is active. Default is True. |
| label_type | Literal['vector'] | Type of labels. Currently only 'vector' is supported. Default is 'vector'. |
| datetime_info | Literal['filename', 'csv'] | Specifies where datetime information is stored: 'filename' or 'csv'. Default is 'filename'. |
Source code in terrakit/validate/labels_model.py
check_labels_folder
Validates that the labels_folder exists, is not empty, and contains at least one supported file.
Raises:
| Type | Description |
|---|---|
| ValueError | If the labels_folder does not exist, is empty, or does not contain any supported files. |
Source code in terrakit/validate/labels_model.py
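The three checks listed above can be sketched with `pathlib`. The set of supported extensions is not given in this documentation, so `SUPPORTED_LABEL_SUFFIXES` below is an assumption, and `check_labels_folder_sketch` is a hypothetical name.

```python
from pathlib import Path

# Assumption: the supported label file extensions. TerraKit's actual list
# is not stated in this documentation; adjust as needed.
SUPPORTED_LABEL_SUFFIXES = {".geojson", ".shp"}


def check_labels_folder_sketch(labels_folder: Path) -> Path:
    """Raise ValueError unless the folder exists, is non-empty and holds a supported file."""
    folder = Path(labels_folder)
    if not folder.is_dir():
        raise ValueError(f"labels_folder {folder} does not exist or is not a directory")
    entries = list(folder.iterdir())
    if not entries:
        raise ValueError(f"labels_folder {folder} is empty")
    # Compare suffixes case-insensitively so "LABELS.GEOJSON" also counts.
    if not any(p.is_file() and p.suffix.lower() in SUPPORTED_LABEL_SUFFIXES for p in entries):
        raise ValueError(f"labels_folder {folder} contains no supported label files")
    return folder
```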
check_datetime_info
Placeholder for future validation of datetime_info.
Currently, no specific checks are implemented for datetime_info.
terrakit.validate.download_model
DateAllowance
Bases: BaseModel
Model for specifying date allowance around the target date.
Attributes:
| Name | Type | Description |
|---|---|---|
pre_days |
int
|
Number of days before the target date to include. Default is 0. |
post_days |
int
|
Number of days after the target date to include. Default is 7. |
Source code in terrakit/validate/download_model.py
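To make the allowance concrete, the sketch below turns `pre_days` and `post_days` into a search window around a target date. It assumes the window is inclusive at both ends; `DateAllowanceSketch` and its `window` method are hypothetical and not part of TerraKit.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DateAllowanceSketch:
    # Defaults mirror the documented DateAllowance defaults.
    pre_days: int = 0
    post_days: int = 7

    def window(self, target: date) -> tuple[date, date]:
        """Return the (start, end) search window around target, assumed inclusive."""
        start = target - timedelta(days=self.pre_days)
        end = target + timedelta(days=self.post_days)
        return start, end
```

With the defaults, a target date of 2024-01-10 gives the window 2024-01-10 to 2024-01-17.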
Transform
Bases: BaseModel
Model for specifying data transformation options.
Attributes:
| Name | Type | Description |
|---|---|---|
| scale_data_xarray | bool | Whether to scale the data using xarray. Default is True. |
| impute_nans | bool | Whether to impute NaN values. Default is True. |
| reproject | bool | Whether to reproject the data. Default is True. |
Source code in terrakit/validate/download_model.py
reproject: bool = True
class-attribute
instance-attribute
Whether to reproject the data. Default is True.
DataSource
Bases: BaseModel
Model for specifying data source configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| data_connector | str | The data connector to use. Default is "sentinel_aws". |
| collection_name | str | The collection name to download. Default is "sentinel-2-l2a". |
| bands | list[str] | The bands to download. Default is ["blue", "green", "red"]. |
| save_file | str \| None | The file path to save the downloaded data. Default is None. |
Source code in terrakit/validate/download_model.py
DownloadModel
Bases: BaseModel
Model for configuring the download process.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_config | ConfigDict | Configuration dictionary. |
| transform | Transform | Transformation options. |
| date_allowance | DateAllowance | Date allowance around the target date. |
| active | bool | Whether the download step is active. Default is True. |
| max_cloud_cover | int | Maximum cloud cover allowed. Default is 80. |
| keep_files | bool | Whether to keep redundant shapefiles. Default is False. |
| datetime_bbox_shp_file | str | File path for the datetime bounding box shapefile. Default is "./terrakit_curated_dataset_all_bboxes.shp". |
| labels_shp_file | str | File path for the labels shapefile. Default is "./tmp/terrakit_curated_dataset_labels.shp". |
| data_sources | list[DataSource] | List of data sources to download. Default is an empty list. |
Source code in terrakit/validate/download_model.py
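The download configuration nests `Transform`, `DateAllowance` and a list of `DataSource` entries. The stdlib dataclasses below mirror the documented defaults to show how such a nested configuration fits together. All class names ending in `Sketch` are hypothetical stand-ins for the Pydantic models, not TerraKit's API.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TransformSketch:
    scale_data_xarray: bool = True
    impute_nans: bool = True
    reproject: bool = True


@dataclass
class DateAllowanceSketch:
    pre_days: int = 0
    post_days: int = 7


@dataclass
class DataSourceSketch:
    data_connector: str = "sentinel_aws"
    collection_name: str = "sentinel-2-l2a"
    # default_factory gives each instance its own list instead of a shared one.
    bands: list[str] = field(default_factory=lambda: ["blue", "green", "red"])
    save_file: Optional[str] = None


@dataclass
class DownloadConfigSketch:
    transform: TransformSketch = field(default_factory=TransformSketch)
    date_allowance: DateAllowanceSketch = field(default_factory=DateAllowanceSketch)
    active: bool = True
    max_cloud_cover: int = 80
    keep_files: bool = False
    datetime_bbox_shp_file: str = "./terrakit_curated_dataset_all_bboxes.shp"
    labels_shp_file: str = "./tmp/terrakit_curated_dataset_labels.shp"
    data_sources: list[DataSourceSketch] = field(default_factory=list)


# Override one top-level value and add a single data source with default settings.
config = DownloadConfigSketch(max_cloud_cover=20, data_sources=[DataSourceSketch()])
print(asdict(config)["data_sources"][0]["bands"])  # ['blue', 'green', 'red']
```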
terrakit.validate.tiling_model
terrakit.validate.data_connector
ConnectorType
Bases: BaseModel
Attributes:
| Name | Type | Description |
|---|---|---|
| connector_type | Literal | The type of connector to use to download data. |
Source code in terrakit/validate/data_connector.py
connector_type: Literal['nasa_earthdata', 'sentinelhub', 'sentinel_aws', 'IBMResearchSTAC', 'TheWeatherCompany']
instance-attribute
The type of connector to use to download data: one of nasa_earthdata, sentinelhub, sentinel_aws, IBMResearchSTAC or TheWeatherCompany.
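In a Pydantic model, a `Literal[...]` annotation rejects any value outside the listed options during validation. The stdlib sketch below performs the equivalent check by hand with `typing.get_args`; `ConnectorTypeSketch` and `check_connector_type_sketch` are hypothetical names.

```python
from typing import Literal, get_args

# The allowed values, copied from the connector_type annotation above.
ConnectorTypeSketch = Literal[
    "nasa_earthdata", "sentinelhub", "sentinel_aws", "IBMResearchSTAC", "TheWeatherCompany"
]


def check_connector_type_sketch(value: str) -> str:
    """Return value if it is one of the documented connector types, else raise ValueError."""
    allowed = get_args(ConnectorTypeSketch)
    # Membership is exact and case-sensitive, as with a Literal annotation.
    if value not in allowed:
        raise ValueError(f"connector_type must be one of {allowed}, got {value!r}")
    return value
```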
terrakit.validate.helpers
check_collection_exists
Check if the provided data_collection_name exists in the collections list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_collection_name | str | The name of the collection to check. | required |
| collections | list | A list of available collections. | required |

Raises:
| Type | Description |
|---|---|
| TerrakitValueError | If the collection does not exist. |
Source code in terrakit/validate/helpers.py
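The check reduces to a membership test. The sketch below uses a hypothetical `TerrakitValueErrorSketch` (assumed here to subclass `ValueError`) in place of TerraKit's `TerrakitValueError`, and includes the available collections in the message to help the caller fix the input.

```python
class TerrakitValueErrorSketch(ValueError):
    """Hypothetical stand-in for terrakit's TerrakitValueError."""


def check_collection_exists_sketch(data_collection_name: str, collections: list) -> None:
    """Raise TerrakitValueErrorSketch if data_collection_name is not in collections."""
    if data_collection_name not in collections:
        raise TerrakitValueErrorSketch(
            f"Collection {data_collection_name!r} not found. Available: {collections}"
        )
```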
check_start_end_date
Validate the start and end dates, ensuring the end date is after the start date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_start | str | The start date in ISO format (YYYY-MM-DD). | required |
| date_end | str | The end date in ISO format (YYYY-MM-DD). | required |

Raises:
| Type | Description |
|---|---|
| TerrakitValueError | If the date range is invalid. |
Source code in terrakit/validate/helpers.py
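A minimal sketch of the date-range check with `datetime.date.fromisoformat`. Whether TerraKit accepts a start date equal to the end date is not stated, so rejecting equal dates here is an assumption based on the wording "after". `TerrakitValueErrorSketch` is a hypothetical stand-in.

```python
from datetime import date


class TerrakitValueErrorSketch(ValueError):
    """Hypothetical stand-in for terrakit's TerrakitValueError."""


def check_start_end_date_sketch(date_start: str, date_end: str) -> None:
    """Raise if either date is not an ISO date or if date_end is not after date_start."""
    try:
        start = date.fromisoformat(date_start)
        end = date.fromisoformat(date_end)
    except ValueError as err:
        raise TerrakitValueErrorSketch(f"Dates must be in YYYY-MM-DD format: {err}") from err
    # Assumption: equal dates are rejected, following "end date is after the start date".
    if end <= start:
        raise TerrakitValueErrorSketch(
            f"date_end {date_end} must be after date_start {date_start}"
        )
```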
check_datetime
Validate a date string, ensuring it is in ISO format and not in the future.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start | bool | True if validating the start date, False for the end date. | required |
| date_str | str | The date string to validate. | required |

Raises:
| Type | Description |
|---|---|
| TerrakitValueError | If the date format is incorrect or the date is in the future. |
Source code in terrakit/validate/helpers.py
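The sketch below parses the string and compares it with today's date. It assumes the `start` flag only changes the wording of the error message; `check_datetime_sketch` and `TerrakitValueErrorSketch` are hypothetical names.

```python
from datetime import date


class TerrakitValueErrorSketch(ValueError):
    """Hypothetical stand-in for terrakit's TerrakitValueError."""


def check_datetime_sketch(start: bool, date_str: str) -> None:
    """Raise unless date_str is an ISO date (YYYY-MM-DD) that is not in the future."""
    # Assumption: start only selects the label used in error messages.
    label = "start date" if start else "end date"
    try:
        parsed = date.fromisoformat(date_str)
    except ValueError as err:
        raise TerrakitValueErrorSketch(
            f"Invalid {label} {date_str!r}: expected YYYY-MM-DD"
        ) from err
    if parsed > date.today():
        raise TerrakitValueErrorSketch(f"The {label} {date_str} is in the future")
```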
check_area_polygon
Check that 'bbox' is used instead of 'area_polygon' for connector types that do not yet support 'area_polygon'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| area_polygon | | The area polygon to check. | required |
| connector_type | str | The type of connector. | required |

Raises:
| Type | Description |
|---|---|
| TerrakitValueError | If 'area_polygon' is provided instead of 'bbox'. |
Source code in terrakit/validate/helpers.py
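This documentation does not list which connectors accept `area_polygon`, so the sketch keeps that list in a hypothetical, initially empty set, `CONNECTORS_SUPPORTING_AREA_POLYGON`. It also assumes that an absent polygon is passed as `None`. The function and error class names are illustrative only.

```python
from typing import Any, Optional


class TerrakitValueErrorSketch(ValueError):
    """Hypothetical stand-in for terrakit's TerrakitValueError."""


# Hypothetical placeholder: connectors that already support area_polygon.
CONNECTORS_SUPPORTING_AREA_POLYGON: set[str] = set()


def check_area_polygon_sketch(area_polygon: Optional[Any], connector_type: str) -> None:
    """Raise if area_polygon is given to a connector that only accepts bbox."""
    if area_polygon is not None and connector_type not in CONNECTORS_SUPPORTING_AREA_POLYGON:
        raise TerrakitValueErrorSketch(
            f"connector_type {connector_type!r} does not support 'area_polygon'; use 'bbox' instead"
        )
```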
check_bbox
Validate the bounding box, ensuring it is a list of four floats and not a degenerate rectangle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bbox | list | The bounding box to check. | required |
| connector_type | str | The type of connector. | required |

Raises:
| Type | Description |
|---|---|
| TerrakitValueError | If the bounding box is invalid. |
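A sketch of the bounding-box check under stated assumptions. It assumes the order [west, south, east, north], treats zero or negative width or height as degenerate, and also accepts integers as coordinates (booleans are rejected). It uses `connector_type` only in error messages. `check_bbox_sketch` and `TerrakitValueErrorSketch` are hypothetical.

```python
class TerrakitValueErrorSketch(ValueError):
    """Hypothetical stand-in for terrakit's TerrakitValueError."""


def check_bbox_sketch(bbox: list, connector_type: str) -> None:
    """Raise unless bbox is four numbers [west, south, east, north] spanning a non-zero area."""
    if not isinstance(bbox, list) or len(bbox) != 4:
        raise TerrakitValueErrorSketch(f"bbox for {connector_type} must be a list of four values")
    # bool is a subclass of int, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in bbox):
        raise TerrakitValueErrorSketch(f"bbox for {connector_type} must contain only numbers")
    west, south, east, north = bbox
    # Assumption: a degenerate box has zero or negative width or height.
    if west >= east or south >= north:
        raise TerrakitValueErrorSketch(f"bbox {bbox} for {connector_type} is degenerate")
```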