TerraKit: CLI
We can also run TerraKit via a CLI. This notebook demonstrates going from a set of vector labels to a taco dataset using a single data connector. To try out the TerraKit CLI, just type terrakit!
All parameters specified in this notebook can be placed in a YAML configuration file and run with the commands below. In this example, the configuration file is docs/examples/config.yaml.
# Let's take a look at the TerraKit CLI options
!terrakit -h
We can add some basic imports to get started, including a helper function to download some example labels if they don't already exist.
import os
from glob import glob
import warnings
from pathlib import Path
from terrakit.general_utils.labels_downloader import (
    rapid_mapping_geojson_downloader,
    EXAMPLE_LABEL_FILES,
)
warnings.filterwarnings("ignore")
0. Download example labels, define a working directory and a dataset name
Download some labels if none already exist. Here we download the labels to ./test_wildfire_vector.
LABELS_FOLDER = "./test_wildfire_vector"
if (
    not Path(LABELS_FOLDER).is_dir()
    or not set(EXAMPLE_LABEL_FILES).issubset(glob(f"{LABELS_FOLDER}/*.json"))
):
    rapid_mapping_geojson_downloader(
        event_id="748",
        aoi="01",
        monitoring_number="05",
        version="v1",
        dest=LABELS_FOLDER,
    )
    rapid_mapping_geojson_downloader(
        event_id="801",
        aoi="01",
        monitoring_number="02",
        version="v1",
        dest=LABELS_FOLDER,
    )
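Note that `glob` returns paths that include the folder prefix, so a comparison against bare file names has to use the base names. The following is a minimal sketch of such a check. It assumes `EXAMPLE_LABEL_FILES` holds bare file names, which is not confirmed here, and `missing_label_files` is a hypothetical helper rather than part of TerraKit.

```python
from pathlib import Path


def missing_label_files(folder, expected_names):
    """Return the expected file names that are not present in `folder`.

    Hypothetical helper, not part of TerraKit. Assumes `expected_names`
    are bare file names rather than full paths.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing has been downloaded yet, so everything is missing.
        return sorted(expected_names)
    # Compare base names only, so the folder prefix does not matter.
    present = {p.name for p in folder.glob("*.json")}
    return sorted(set(expected_names) - present)
```

Under that assumption, `if missing_label_files(LABELS_FOLDER, EXAMPLE_LABEL_FILES):` could guard the download calls.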
Assuming this notebook is run from ./docs/examples, let's jump back to the project root so that the same config.yaml can be used if running the TerraKit CLI from this project's root directory.
root_dir = os.getcwd().split("/docs/examples")[0]
os.chdir(root_dir)
print(os.getcwd())
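Splitting the path on the string "/docs/examples" relies on forward slashes as the path separator. A sketch of the same idea with `pathlib`, which compares whole path components instead, is shown below. `find_project_root` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def find_project_root(cwd, marker=("docs", "examples")):
    """Return the part of `cwd` that precedes the first `docs/examples`.

    Hypothetical helper. If the marker is not found, `cwd` is returned
    unchanged, matching the behaviour of the string split above.
    """
    path = Path(cwd)
    parts = path.parts
    n = len(marker)
    for i in range(len(parts) - n + 1):
        if parts[i:i + n] == tuple(marker):
            # Keep everything before the marker; fall back to "." if
            # the marker is at the very start of a relative path.
            return Path(*parts[:i]) if i > 0 else Path(".")
    return path
```

Calling `os.chdir(find_project_root(os.getcwd()))` would then replace the two lines above.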
Use the config.yaml file to define a working directory for all TerraKit outputs, including the downloaded tiles, the chipped data and the dataset stored in the chosen format. By default, the working directory is set to ./tmp and the dataset name is set to terrakit_curated_dataset. The working directory does not need to exist beforehand. It is best to start with an empty working directory, because TerraKit looks up and also deletes certain files in this directory.
# config.yaml
"""
dataset_name: "terrakit_curated_dataset"
working_dir: "./tmp/terrakit_curated_dataset"
"""
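Since an empty working directory is recommended, it can help to look at what is already there before running any step. This is a small sketch using only the standard library. `existing_entries` is a hypothetical helper, and the path in the usage note is the one from the config above.

```python
from pathlib import Path


def existing_entries(working_dir):
    """List the names already inside `working_dir`.

    Hypothetical helper. Returns an empty list when the directory does
    not exist, since TerraKit does not need it to exist beforehand.
    """
    path = Path(working_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir())
```

For example, `existing_entries("./tmp/terrakit_curated_dataset")` returning a non-empty list is a hint to pick a fresh directory or clear the old one first.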
1. Process labels using TerraKit CLI
Specify a labels folder in the config.yaml file using the labels_folder parameter to get started processing labels. All of the other arguments are optional. Here we also set the label_type to vector and specify that the label datetime information can be found in the filename by setting datetime_info to filename.
# config.yaml
"""
labels:
  labels_folder: "./docs/examples/test_wildfire_vector"
  label_type: vector
  datetime_info: filename
  active: True
"""
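With datetime_info set to filename, each label file name has to carry a date. The exact pattern TerraKit expects is not stated here, so the sketch below only illustrates the idea under an assumption: the file name contains an ISO-style YYYY-MM-DD date. Both `date_from_filename` and the example file name are hypothetical.

```python
import re
from datetime import date

# Assumed pattern: an ISO-style date somewhere in the file name.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_filename(filename):
    """Return the first YYYY-MM-DD date found in `filename`, or None.

    Hypothetical helper, not TerraKit's own parser.
    """
    match = ISO_DATE.search(filename)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real date.
        return None


print(date_from_filename("example_label_2024-08-30.json"))
```

A quick pass like this over the labels folder can flag files without a recognisable date before the labels step is run.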
Run the TerraKit CLI with the labels option to process the labels inside the labels_folder. This writes a pair of shapefiles to the working directory, which the download step coming up next uses as input.
!terrakit --config docs/examples/config.yaml labels
2. Download data using TerraKit CLI
Here we specify the download parameters.
To specify more than one data connector, add another entry to the data_source list. For example:
data_source:
  - data_connector: "sentinel_aws"
    collection_name: "sentinel-2-l2a"
    bands: ["blue", "green", "red"]
  - data_connector: "sentinelhub"
    collection_name: s2_l2a
    bands: ["B04", "B03", "B02"]
To find out about the different data connectors available, take a look at the Data Connector section of the docs.
# config.yaml
"""
download:
  active: True
  max_cloud_cover: 80
  keep_files: False
  data_source:
    - data_connector: "sentinel_aws"
      collection_name: "sentinel-2-l2a"
      bands: ["blue", "green", "red"]
      save_file: "sentinelaws_s2_l2a_cli_test" # Default: ./{working_dir}/{collection_name}/{tile_id}.tif
  date_allowance:
    pre_days: 0
    post_days: 21
  transform:
    scale_data_xarray: True
    impute_nans: True
    reproject: True
"""
!terrakit --config docs/examples/config.yaml download
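To make the date_allowance setting concrete, the sketch below computes a search window around a label date. It assumes that pre_days and post_days extend the window before and after the label date respectively. That reading follows from the parameter names and is not confirmed by TerraKit here. `search_window` is a hypothetical helper.

```python
from datetime import date, timedelta


def search_window(label_date, pre_days=0, post_days=21):
    """Return the (start, end) dates of the assumed search window.

    Hypothetical helper. Assumes the window runs from `pre_days` before
    the label date to `post_days` after it.
    """
    start = label_date - timedelta(days=pre_days)
    end = label_date + timedelta(days=post_days)
    return start, end


start, end = search_window(date(2024, 8, 30))
print(start, end)  # 2024-08-30 2024-09-20
```

With the values from the config above, imagery would be considered only on or after the label date, for up to three weeks.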
3. Chip the downloaded data using TerraKit CLI
# config.yaml
"""
chip:
  sample_dim: 256
"""
!terrakit --config docs/examples/config.yaml chip
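As a rough illustration of what a sample_dim of 256 means, the sketch below lists the top-left corners of non-overlapping 256 x 256 chips that fit inside a tile. How TerraKit actually chips (overlap, padding, filtering) is not described here, so this is an assumption-laden sketch, and `chip_origins` is a hypothetical helper.

```python
def chip_origins(height, width, sample_dim=256):
    """Return (row, col) top-left corners of full, non-overlapping chips.

    Hypothetical helper. Assumes chips do not overlap and that partial
    chips at the right and bottom edges are dropped.
    """
    rows = range(0, height - sample_dim + 1, sample_dim)
    cols = range(0, width - sample_dim + 1, sample_dim)
    return [(row, col) for row in rows for col in cols]


print(len(chip_origins(1024, 768)))  # 12
```

Under these assumptions a 1024 x 768 pixel tile yields 4 x 3 = 12 chips, and a tile smaller than sample_dim in either dimension yields none.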
4. Store the downloaded data in a taco using TerraKit CLI
# config.yaml
"""
store:
  active: True
  tortilla_name: "terrakit_curated_tortilla"
"""
!terrakit --config docs/examples/config.yaml store
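Outside a notebook, the four commands used above can be run in order from a plain Python script. The sketch below is one way to do that with the standard library. It uses only the command form shown in this notebook (`terrakit --config <file> <step>`). `build_command` and `run_pipeline` are hypothetical helper names.

```python
import shutil
import subprocess

# The four steps, in the order used in this notebook.
STEPS = ("labels", "download", "chip", "store")


def build_command(step, config="docs/examples/config.yaml"):
    """Build the argument list for one TerraKit CLI step."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    return ["terrakit", "--config", config, step]


def run_pipeline(config="docs/examples/config.yaml", steps=STEPS):
    """Run each step in order and stop at the first failure."""
    if shutil.which("terrakit") is None:
        raise RuntimeError("terrakit executable not found on PATH")
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(step, config), check=True)
```

Calling `run_pipeline()` from the project root would then mirror the four `!terrakit` cells above.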