004-Create-User-Defined-Tuning-Templates¶
📥 Download 004-Create-User-Defined-Tuning-Templates.ipynb and try it out
Introduction¶
This notebook introduces creating and managing tuning templates with the Geospatial Studio SDK. It assumes you want to use an existing template and data in the studio.
For more information about the Geospatial Studio see the docs page: Geospatial Studio Docs.
Prerequisites¶
It is assumed that you have installed the Geospatial Studio SDK and have a network connection to the platform. Instructions for both can be found here: Geospatial Studio SDK Docs
%load_ext autoreload
%autoreload 2
# first import the required packages
import json
import uuid
import yaml
import base64
from IPython.display import display, Markdown
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
from geostudio import Client
from geostudio import gswidgets
Connecting to the platform¶
First, we set up the connection to the platform backend. To do this we need the base url for the studio UI and an API key.
To get an API Key:
- Go to the Geospatial Studio UI page and navigate to the Manage your API keys link.
- This should pop up a window where you can generate, access and delete your API keys. NB: every user is limited to a maximum of two active API keys at any one time.
Store the API key and geostudio ui base url in a credentials file locally, for example in /User/bob/.geostudio_config_file. You can do this by:
echo "GEOSTUDIO_API_KEY=<paste_api_key_here>" > .geostudio_config_file
echo "BASE_STUDIO_UI_URL=<paste_ui_base_url_here>" >> .geostudio_config_file
Copy and paste the file path to this credentials file into the call below.
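Before passing the file to `Client`, it can help to confirm that it contains both entries. The following is a minimal sketch, not part of the SDK: `read_config` and `missing_keys` are hypothetical helpers that parse simple `KEY=VALUE` lines like the ones written by the `echo` commands above, assuming no quoting or multi-line values. The demonstration uses a throwaway file with placeholder values.

```python
import os
import tempfile

# The two entries written by the echo commands in this tutorial
REQUIRED_KEYS = ("GEOSTUDIO_API_KEY", "BASE_STUDIO_UI_URL")


def read_config(path):
    """Parse KEY=VALUE lines into a dict (hypothetical helper, not an SDK call)."""
    config = {}
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments and anything without an '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    return config


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demonstration with a throwaway file and placeholder values
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write("GEOSTUDIO_API_KEY=example-key\n")
    tmp.write("BASE_STUDIO_UI_URL=https://studio.example.com\n")
    demo_path = tmp.name

demo_config = read_config(demo_path)
os.remove(demo_path)
print(missing_keys(demo_config))
```

To check your own file, call `missing_keys(read_config(".geostudio_config_file"))`; an empty list means both entries were found.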
#############################################################
# Initialize Geostudio client using a geostudio config file
#############################################################
gfm_client = Client(geostudio_config_file=".geostudio_config_file")
Browse existing task templates¶
tasks = gfm_client.list_tune_templates(output="df")
display(tasks[["id","description","created_by","updated_at"]])
Browse existing datasets¶
datasets = gfm_client.list_datasets(output="df")
display(datasets[['id','description','created_by','updated_at']])
selected_dataset = "geodata-gdctf3vb3znbbtgptqvuku"
# selected_dataset = datasets["dataset_id"][0]
Create your own task¶
A new task is expected to have these params in the payload:
{
    "name": "",
    "description": "",
    "purpose": "Other",  # Must be Other
    "content": "",  # base64 encoding of the template
    "extra_info": {
        "runtime_image": "us.icr.io/gfmaas/geostudio-ft-deploy:v0.99.9.post1-117",
        "model_framework": "terratorch-v2"
    },
    "model_params": {},
    "dataset_id": selected_dataset  # dataset_id
}
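The payload shape above can be assembled and sanity-checked locally before anything is sent. This is a sketch only: `build_task_payload` and `check_task_payload` are hypothetical helpers, not SDK functions, and the checks mirror the fields listed above rather than whatever validation the service itself performs. The `content` and `dataset_id` values in the example are placeholders.

```python
# Field names taken from the payload description in this tutorial
REQUIRED_FIELDS = ("name", "description", "purpose", "content",
                   "extra_info", "model_params", "dataset_id")


def build_task_payload(name, description, content, dataset_id,
                       runtime_image, model_framework="terratorch-v2"):
    """Assemble the payload fields listed above (hypothetical helper)."""
    return {
        "name": name,
        "description": description,
        "purpose": "Other",  # the tutorial states this must be "Other"
        "content": content,  # base64 encoding of the template
        "extra_info": {
            "runtime_image": runtime_image,
            "model_framework": model_framework,
        },
        "model_params": {},
        "dataset_id": dataset_id,
    }


def check_task_payload(payload):
    """Return a list of problems found; an empty list means none were detected."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if field not in payload]
    if payload.get("purpose") != "Other":
        problems.append('purpose must be "Other"')
    for field in ("name", "content", "dataset_id"):
        if field in payload and not payload[field]:
            problems.append(f"empty field: {field}")
    return problems


# Example with placeholder content and dataset id
example_payload = build_task_payload(
    name="user-new-task",
    description="user new task",
    content="cGxhY2Vob2xkZXI=",  # base64 of the word "placeholder"
    dataset_id="example-dataset-id",
    runtime_image="us.icr.io/gfmaas/geostudio-ft-deploy:v0.99.9.post1-117",
)
print(check_task_payload(example_payload))
```

Keeping the check separate from the builder means it can also be run on a payload dictionary written by hand.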
Generate the base64 encoding of your template below and add it to the payload.
def encode_file_to_base64(file_path):
    with open(file_path, "rb") as file:
        # Read the file in binary mode
        file_content = file.read()
    # Encode the content to base64
    base64_encoded = base64.b64encode(file_content)
    # Decode the Base64 bytes into a string (if needed)
    base64_string = base64_encoded.decode('utf-8')
    return base64_string
encoded_content = encode_file_to_base64("../sample_files/sample-convnext-config.yaml")
encoded_content
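A quick way to confirm the encoding is to decode the string back and compare it with the original text. The sketch below uses only the standard library; `decode_base64_to_text` is a hypothetical helper name, and the short YAML snippet is a stand-in for a real template file.

```python
import base64


def decode_base64_to_text(encoded):
    """Decode a base64 string back to UTF-8 text (hypothetical helper)."""
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(encoded, validate=True).decode("utf-8")


# Stand-in for the template file contents
sample_yaml = "model:\n  name: example\n"

# Same steps as encode_file_to_base64, applied to an in-memory string
sample_encoded = base64.b64encode(sample_yaml.encode("utf-8")).decode("utf-8")

# The round trip should give back exactly the original text
round_trip = decode_base64_to_text(sample_encoded)
print(round_trip == sample_yaml)
```

In the notebook, `print(decode_base64_to_text(encoded_content))` would show the template that is about to be uploaded.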
created_task = gfm_client.create_task(
    data={
        "name": "user-new-task",
        "description": "user new task",
        "purpose": "Other",  # DO NOT CHANGE THIS
        "content": encoded_content,  # base64 encoding of the template
        "extra_info": {
            "runtime_image": "us.icr.io/gfmaas/geostudio-ft-deploy:feat-update_tt_version-142",
            "model_framework": "terratorch-v2",
        },
        "model_params": {},
        "dataset_id": selected_dataset  # dataset id
    }
)
created_task
Review the created task with the original template¶
# Now we can get the task template yaml for the selected task. This can
# be returned as a string in a new cell which can be updated and edited in the
# notebook, as a file (by setting output='file') or a text string ('text').
tt = gfm_client.get_task_template(created_task["id"], output='cell')
Review the created task with the template rendered for the provided dataset¶
print(selected_dataset)
print(created_task['id'])
rendered_template = gfm_client.render_template(task_id = created_task['id'], dataset_id=selected_dataset, output="cell")
display(Markdown(f"```yaml\n{rendered_template}\n```"))
Update the created task if the above blocks are not correct¶
If the template is not formatted as expected, uncomment the code below to update it.
# updated_file_path = "../sample_files/sample-convnext-config.yaml"
# gfm_client.update_task(task_id=created_task["id"],
#                        file_path=updated_file_path)
# rendered_template = gfm_client.render_template(task_id = created_task['id'], dataset_id = selected_dataset, output="text")
# display(Markdown(f"```yaml\n{rendered_template}\n```"))
Now you can use this task to run a fine-tuning job¶
You can now submit a tune. Change the name and description as needed.
tune_payload = {
    "name": "burns-user-defined",
    "description": "test-user-defined-config with burnscars data",
    "dataset_id": selected_dataset,
    "tune_template_id": created_task['id'],
    "model_parameters": {
        "runner": {
            "max_epochs": "10"
        }
    },
    "train_options": {
        "tune_type": "user-defined",  # Required.
        "image": "us.icr.io/gfmaas/geostudio-ft-deploy:feat-update_tt_version-142",  # Specific terratorch version to override
    }
}
submitted_tune = gfm_client.submit_tune(data=tune_payload)
submitted_tune
# If you wish, you can keep polling the tuning task to monitor its progress. Note that this operation is very expensive.
# r = gfm_client.poll_finetuning_until_finished(tune_id=submitted_tune['tune_id'])
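Because continuous polling is expensive, one option is to check less often as time passes. The sketch below illustrates exponential backoff in plain Python and is not an SDK feature: `poll_with_backoff` is a hypothetical helper, `get_status` is any zero-argument callable you supply that returns the tune's current state, and the state names `"Finished"` and `"Failed"` are assumptions that may not match what the studio reports.

```python
import time


def poll_with_backoff(get_status, finished_states=("Finished", "Failed"),
                      initial_delay=30.0, max_delay=600.0, factor=2.0,
                      max_attempts=20, sleep=time.sleep):
    """Call get_status() with growing pauses until a finished state appears.

    Hypothetical helper; the state names are assumptions, not SDK constants.
    """
    delay = initial_delay
    status = None
    for _ in range(max_attempts):
        status = get_status()
        if status in finished_states:
            return status
        # Wait, then lengthen the next pause up to the configured ceiling
        sleep(delay)
        delay = min(delay * factor, max_delay)
    raise TimeoutError(
        f"tune still in state {status!r} after {max_attempts} checks"
    )


# Demonstration with made-up states and a recorder in place of real sleeping
fake_statuses = iter(["Pending", "In_progress", "Finished"])
recorded_delays = []
result = poll_with_backoff(lambda: next(fake_statuses),
                           sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies the pauses.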
# Get MLflow URLs for the tune
gfm_client.get_mlflow_metrics(submitted_tune['tune_id'])
Now that you have created and updated your task, you can proceed to carry out fine-tuning following the instructions in the fine-tuning tutorial.
Delete a task¶
gfm_client.delete_task(task_id=created_task['id'])
# Try rendering the template again to check that the task is no longer available
rendered_template = gfm_client.render_template(task_id=created_task['id'], dataset_id=selected_dataset)
rendered_template