Fine-tuning
geostudio.backends.v2.gtune.client
Client
Client(
    api_config: GeoFmSettings = None,
    session: Session = None,
    api_token: str = None,
    api_key: str = None,
    api_key_file: str = None,
    geostudio_config_file: str = None,
    *args,
    **kwargs
)
Bases: BaseClient
Source code in geostudio/backends/base_client.py
list_tunes
list_tunes(output: str = 'json')
Lists all fine tuning jobs in the studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the list of tunes found. |
Source code in geostudio/backends/v2/gtune/client.py
get_tune
Retrieves a tune by ID. If the tune's status is Failed, a pre-signed url for the logs is generated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune to retrieve. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The tune's status and information. |
Source code in geostudio/backends/v2/gtune/client.py
update_tune
update_tune(tune_id: str, data: TuneUpdateIn, output: str = "json")
Updates a tune in the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune to be updated. | required |
| data | TuneUpdateIn | A dictionary containing the data to update for the tune. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary of the updated tune. |
Source code in geostudio/backends/v2/gtune/client.py
delete_tune
delete_tune(tune_id, output: str = 'json')
Deletes a specified tune using its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The ID of the tune to be deleted. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A message confirming that the tune was deleted. |
Source code in geostudio/backends/v2/gtune/client.py
submit_tune
submit_tune(data: TuneSubmitIn, output: str = 'json')
Submits a fine-tuning job to the Geospatial Studio platform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | TuneSubmitIn | Parameters for the tuning job. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The server's response containing the submitted tune info. |
Source code in geostudio/backends/v2/gtune/client.py
submit_hpo_tune
submit_hpo_tune(data: HpoTuneSubmitIn, output: str = "json")
Submits a fine-tuning job with terratorch-iterate enabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | HpoTuneSubmitIn | Parameters for the tuning job. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The server's response containing the submitted tune info. |
Source code in geostudio/backends/v2/gtune/client.py
upload_completed_tunes
upload_completed_tunes(data: UploadTuneInput)
Uploads a completed fine-tuning job to the Geostudio platform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | UploadTuneInput | Parameters to update the tune with. | required |
Returns:
| Type | Description |
|---|---|
| dict | A message confirming that the tune was uploaded. |
Source code in geostudio/backends/v2/gtune/client.py
try_out_tune
try_out_tune(tune_id: str, data: TryOutTuneInput)
Tries out inference on a tune without deploying the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune experiment. | required |
| data | TryOutTuneInput | The inference configuration to try the tune on. | required |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the details of the submitted inference. |
Source code in geostudio/backends/v2/gtune/client.py
download_tune
Downloads a tuned model from the server.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tuned model to download. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary with tune details, including pre-signed URLs to download the artifacts. |
Source code in geostudio/backends/v2/gtune/client.py
get_mlflow_metrics
Retrieves the MLflow URLs for the training and testing metrics of a given Tune experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The ID of the Tune experiment. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the MLflow URLs for the training and testing metrics. The dictionary will have the keys "train_mlflow_url" and "test_mlflow_url". If no metrics are found, the value for "train_mlflow_url" will be None. |
Source code in geostudio/backends/v2/gtune/client.py
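The documented return shape can be handled defensively, as in the minimal sketch below. It does not call the Studio: `describe_mlflow_urls` is a hypothetical helper and the example URL is made up. Only the two key names and the "None when no metrics are found" behaviour come from the description above.

```python
from typing import Optional


def describe_mlflow_urls(metrics: dict) -> str:
    """Summarise a get_mlflow_metrics-style dictionary (hypothetical helper)."""
    # Key names are taken from the Returns description above.
    train_url: Optional[str] = metrics.get("train_mlflow_url")
    test_url: Optional[str] = metrics.get("test_mlflow_url")

    if train_url is None:
        # Documented behaviour: no metrics found -> "train_mlflow_url" is None.
        return "no metrics found yet"
    if test_url is None:
        return f"train: {train_url}"
    return f"train: {train_url} | test: {test_url}"


# Made-up response, for illustration only.
example = {"train_mlflow_url": "https://mlflow.example/run/abc", "test_mlflow_url": None}
summary = describe_mlflow_urls(example)
```

Checking for None before using the URL avoids passing a missing link on to a browser or HTTP client while a tune is still starting up.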
get_tune_metrics
Retrieves the MLflow metrics for a specific tune.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The metrics of the tune in the specified format. |
Source code in geostudio/backends/v2/gtune/client.py
get_tune_metrics_df
Retrieves the MLflow metrics for a specific tune and returns them as a pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | A pandas DataFrame containing the tuning metrics. |
Source code in geostudio/backends/v2/gtune/client.py
list_tuning_artefacts
list_tuning_artefacts(tune_id: str)
Resolve the MLflow training run referenced by a tune and list artefact paths.
This function:
- Calls gfm_client.get_tune(tune_id) to obtain tune metadata that contains a reference to the MLflow training run (expected under a metric named 'Train').
- Queries the MLflow server's artifacts list endpoint for that run to obtain the available artifact file paths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | Identifier of the tune (used to look up metrics that contain the MLflow train run). | required |
Returns:
| Type | Description |
|---|---|
| tuple[list[str], str] | A tuple (art_files, train_run_id), where art_files is the list of artifact paths returned by MLflow (the 'path' of each element in the 'files' array) and train_run_id is the MLflow run ID extracted from the tune metadata. |
Notes:
- The function expects the tune metadata (gfm_client.get_tune) to include a metric mapping containing a 'Train' entry whose value includes the MLflow run UUID (the run ID is taken as the last path segment after splitting on '/').
- The MLflow artifacts list response is assumed to include a JSON 'files' array where each item has a 'path' key.
- Example: art_files, run_id = list_tuning_artefacts('geotune-xxxxx')
Source code in geostudio/backends/v2/gtune/client.py
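The two parsing steps described in the notes can be sketched in isolation. This is an illustration under stated assumptions, not the library's implementation: `run_id_from_reference` and `artefact_paths` are hypothetical helper names, the example URL and file names are invented, and tolerating a trailing slash is an added convenience.

```python
from typing import List


def run_id_from_reference(reference: str) -> str:
    """Return the last '/'-separated segment of an MLflow run reference."""
    # Assumption: a trailing slash, if present, is ignored before splitting.
    return reference.rstrip("/").split("/")[-1]


def artefact_paths(listing: dict) -> List[str]:
    """Collect the 'path' of every item in the 'files' array of a listing."""
    return [item["path"] for item in listing.get("files", [])]


# Made-up inputs, for illustration only.
run_id = run_id_from_reference("https://my-mlflow/#/experiments/1/runs/0123abcd")
paths = artefact_paths({"files": [{"path": "epoch_0_1.png"}, {"path": "epoch_1_1.png"}]})
```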
get_tuning_artefacts
Download fine‑tuning artefact images from an MLflow run referenced by a tune.
This function:
- Resolves the MLflow training run ID for the given tune via gfm_client.get_tune(...).
- Lists artefacts for that run from the MLflow server.
- Optionally filters artefact filenames by epoch and/or image number.
- Downloads matching artefacts in parallel and returns a list of records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | Identifier of the tune (used to look up metrics that contain the MLflow train run). | required |
| epochs | list[int] | If provided, only artefacts whose filename encodes an epoch contained in this list are retained. Filenames are assumed to contain the epoch as the second underscore-separated token (e.g. "epoch_4_5.png" -> epoch 4). | optional |
| image_numbers | list[int] | If provided, only artefacts whose filename encodes an image number contained in this list are retained. Filenames are assumed to contain the image number as the third underscore-separated token (e.g. "epoch_4_5.png" -> image_number 5). | optional |
Returns:
| Type | Description |
|---|---|
| list[dict] | A list of dictionaries, one per downloaded artefact, with keys 'filename' (str, artefact path from MLflow), 'image' (bytes, raw downloaded bytes), 'epoch' (int, parsed epoch number) and 'image_number' (int, parsed image/sample number). |
Notes:
- Downloads are performed in parallel using joblib (threads).
- The function assumes artefact filenames follow a pattern such as "epoch_<epoch>_<image_number>.<ext>".
Source code in geostudio/backends/v2/gtune/client.py
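The filename convention and the epoch/image-number filtering can be illustrated without any network access. The sketch below is not the library's code: `parse_artefact_name` and `filter_artefacts` are hypothetical names, silently skipping files that do not match the pattern is an assumption, and the download step (and therefore the 'image' key) is left out.

```python
import posixpath
from typing import Iterable, List, Optional, Tuple


def parse_artefact_name(path: str) -> Tuple[int, int]:
    """Return (epoch, image_number) for a name such as 'epoch_4_5.png'."""
    stem, _ext = posixpath.splitext(posixpath.basename(path))
    tokens = stem.split("_")
    # Second token is the epoch, third token is the image number.
    return int(tokens[1]), int(tokens[2])


def filter_artefacts(
    paths: Iterable[str],
    epochs: Optional[List[int]] = None,
    image_numbers: Optional[List[int]] = None,
) -> List[dict]:
    """Keep only the artefacts that match the requested epochs and image numbers."""
    selected = []
    for path in paths:
        try:
            epoch, image_number = parse_artefact_name(path)
        except (IndexError, ValueError):
            continue  # Assumption: files outside the naming pattern are skipped.
        if epochs is not None and epoch not in epochs:
            continue
        if image_numbers is not None and image_number not in image_numbers:
            continue
        selected.append({"filename": path, "epoch": epoch, "image_number": image_number})
    return selected


# Made-up artefact listing, for illustration only.
listing = ["epoch_4_5.png", "epoch_4_6.png", "epoch_9_5.png", "model.ckpt"]
epoch_four = filter_artefacts(listing, epochs=[4])
```

Leaving a filter as None keeps every value for that field, which mirrors the "if provided" wording of the parameter descriptions.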
list_tune_templates
list_tune_templates(output: str = 'json')
Lists the tune templates in the studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the list of tune templates in the studio. |
Source code in geostudio/backends/v2/gtune/client.py
create_task
Creates a new task using the provided data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | TaskIn | The data required to create a new task. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The response from the server containing the details of the newly created task. |
Source code in geostudio/backends/v2/gtune/client.py
get_task
Retrieves a task by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to retrieve. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The response from the server containing the task details. |
Source code in geostudio/backends/v2/gtune/client.py
delete_task
delete_task(task_id, output: str = 'json')
Deletes a task with the given task_id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The unique identifier of the task to be deleted. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A message confirming that the task was deleted. |
Source code in geostudio/backends/v2/gtune/client.py
get_task_template
Retrieves the task template YAML for the selected task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to retrieve. | required |
| output | str | The format of the response. Can be "cell", "text" or "file". Defaults to "text". | 'text' |
Returns:
| Type | Description |
|---|---|
| dict | The response from the server containing the task template YAML. |
Source code in geostudio/backends/v2/gtune/client.py
update_task
Updates a task's content from a YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to update. | required |
| file_path | str | The path to the file containing the new template. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A message confirming that the task content was uploaded. |
Source code in geostudio/backends/v2/gtune/client.py
update_task_schema
Update the JSONSchema of a task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to update. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A message confirming that the task was updated. |
Source code in geostudio/backends/v2/gtune/client.py
get_task_param_defaults
get_task_param_defaults(task_id: str)
Retrieves the default parameter values for a given task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The unique identifier of the task. | required |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the default parameter values for the task. The keys are the parameter names and the values are the default values. |
Source code in geostudio/backends/v2/gtune/client.py
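Because the result maps parameter names to default values, a common follow-up is to overlay a few user overrides on top of it. The sketch below assumes nothing about the Studio beyond that documented shape: `merge_with_defaults` is a hypothetical helper, the parameter names are invented, and rejecting unknown names is a design choice rather than library behaviour.

```python
def merge_with_defaults(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with the overrides applied on top of the task defaults."""
    # Design choice in this sketch: fail early on names the task does not define.
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown task parameters: {unknown}")
    # A new dictionary is built, so the defaults are left untouched.
    return {**defaults, **overrides}


# Made-up defaults, shaped like the documented return value.
defaults = {"learning_rate": 0.001, "max_epochs": 10}
params = merge_with_defaults(defaults, {"max_epochs": 50})
```

Failing on unknown names catches typos such as a misspelt parameter before a tuning job is submitted with it.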
check_task_content
Checks that the task renders correctly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to check. | required |
| output | str | The format of the returned template. Can be "text", "cell", or "file". Defaults to "text". | 'text' |
Returns:
| Type | Description |
|---|---|
| dict | The task content. |
Source code in geostudio/backends/v2/gtune/client.py
render_template
Checks that the user-defined task renders correctly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_id | str | The ID of the task to check. | required |
| dataset_id | str | The ID of the dataset associated with the task. | required |
| output | str | The format of the returned template. Can be "text", "cell", or "file". Defaults to "text". | 'text' |
Returns:
| Type | Description |
|---|---|
| dict | The rendered template in the specified output format. |
Source code in geostudio/backends/v2/gtune/client.py
list_datasets
list_datasets(output: str = 'json')
Lists all datasets available in the studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing a list of datasets found in the dataset factory. |
Source code in geostudio/backends/v2/gtune/client.py
pre_scan_dataset
pre_scan_dataset(data: PreScanDatasetIn, output: str = "json")
Scans a new dataset: checks that the dataset URL is accessible, ensures that corresponding data and label files are present, and extracts the bands and their descriptions from the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | PreScanDatasetIn | Link to the dataset to scan. | required |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the scan results. |
Source code in geostudio/backends/v2/gtune/client.py
get_sample_images
Retrieves a sample of images from a specified dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | The unique identifier of the dataset. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing the sample data in the requested format. |
Source code in geostudio/backends/v2/gtune/client.py
update_dataset
update_dataset(dataset_id: str, data: DatasetUpdateIn, output: str = "json")
Updates a dataset's metadata in the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | The unique identifier of the dataset to be updated. | required |
| data | DatasetUpdateIn | A dictionary containing the data to update for the dataset. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary of the updated dataset. |
Source code in geostudio/backends/v2/gtune/client.py
get_dataset
Retrieves a dataset from the studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | The unique identifier of the dataset to retrieve. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | Information about the dataset found. |
Source code in geostudio/backends/v2/gtune/client.py
delete_dataset
Deletes a dataset with the given ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | The ID of the dataset to delete. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary with a message confirming that the dataset was deleted. |
Source code in geostudio/backends/v2/gtune/client.py
onboard_dataset
onboard_dataset(data: DatasetOnboardIn, output: str = "json")
Onboards a new dataset to the Geospatial Studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DatasetOnboardIn | The dataset information to be onboarded. | required |
| output | str | The desired output format. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing information about the onboarded dataset. |
Source code in geostudio/backends/v2/gtune/client.py
list_base_models
list_base_models(output: str = 'json')
Lists all available base foundation models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing a list of base foundation models available in the studio. |
Source code in geostudio/backends/v2/gtune/client.py
create_base_model
create_base_model(data: BaseModelsIn, output: str = 'json')
Create a base foundation model in the Studio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | BaseModelsIn | Parameters for creating the base model. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | A dictionary containing a list of base foundation models available in the studio. |
Source code in geostudio/backends/v2/gtune/client.py
get_base_model
Gets a base foundation model by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_id | str | The base model ID. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The base model found. |
Source code in geostudio/backends/v2/gtune/client.py
update_base_model_params
update_base_model_params(base_id: str, data: BaseModelParamsIn, output: str = "json")
Updates base foundation model params.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_id | str | The base model ID. | required |
| data | BaseModelParamsIn | The base model params to update. | required |
| output | str | The format of the response. Defaults to "json". | 'json' |
Returns:
| Type | Description |
|---|---|
| dict | The updated base model params. |
Source code in geostudio/backends/v2/gtune/client.py
poll_onboard_dataset_until_finished
Polls the status of a dataset being onboarded until it finishes processing. The poll interval has a minimum of 5 seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | The unique identifier of the dataset being onboarded. | required |
| poll_frequency | int | The time interval in seconds between polls. Defaults to 10 seconds. | 10 |
Returns:
| Type | Description |
|---|---|
| dict | The final status of the dataset, either "Succeeded" or "Failed". |
Source code in geostudio/backends/v2/gtune/client.py
poll_finetuning_until_finished
Polls the status of a tune until it finishes or fails.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tune_id | str | The unique identifier of the tune to poll. | required |
| poll_frequency | int | The time interval in seconds between polls. Defaults to 10 seconds. | 10 |
Returns:
| Type | Description |
|---|---|
| dict | The final status of the tune, including details such as the number of epochs and any error messages if the tune failed. |
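Submitting a tune and then waiting for it is a natural pairing of two methods on this page. The sketch below shows that pairing against a stand-in object so it runs without the library or a Studio deployment. `submit_and_wait` and `FakeClient` are hypothetical, the `"tune_id"` response key and the `"Finished"` status value are assumptions, and only the method names and their documented parameters come from this page.

```python
from typing import Any, Dict


def submit_and_wait(
    client: Any, data: Any, id_key: str = "tune_id", poll_frequency: int = 10
) -> Dict:
    """Submit a tune, then block until the client reports that it finished or failed."""
    submitted = client.submit_tune(data=data)
    # Assumption: the submit response exposes the new tune's ID under id_key.
    tune_id = submitted[id_key]
    return client.poll_finetuning_until_finished(
        tune_id=tune_id, poll_frequency=poll_frequency
    )


class FakeClient:
    """Stand-in with the two documented method names; it makes no network calls."""

    def submit_tune(self, data, output="json"):
        return {"tune_id": "geotune-xxxxx"}

    def poll_finetuning_until_finished(self, tune_id, poll_frequency=10):
        return {"id": tune_id, "status": "Finished"}


result = submit_and_wait(FakeClient(), data={"name": "demo"})
```

With a real client, the key that holds the tune ID should be checked against an actual submit response before relying on the default used here.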