Initializing and serving a model with vLLM in image-to-image mode
This section shows how to bootstrap a TerraTorch model on vLLM and perform image-to-image inference, from a GeoTiff input to a GeoTiff output, using an IOProcessor plugin. It assumes that you have prepared your model for serving with vLLM and identified the IOProcessor plugin to use.
The example in the rest of this document uses the Prithvi-EO-2.0-300M-TL model, fine-tuned to segment the extent of floods on Sentinel-2 images from the Sen1Floods11 dataset, together with the terratorch_segmentation IOProcessor plugin. However, the commands can be adapted to work with any other supported model and plugin.
Starting the vLLM serving instance
Starting the serving instance requires two pieces of information: the model identifier on Hugging Face and the name of the IOProcessor plugin. In this example:
- Model identifier: ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11
- IOProcessor plugin name: terratorch_segmentation
To start the serving instance, run the following command:
vllm serve ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 \
    --io-processor-plugin terratorch_segmentation \
    --model-impl terratorch \
    --skip-tokenizer-init \
    --enforce-eager \
    --enable-mm-embeds
A successfully initialized vLLM serving instance ends its startup logs with output similar to the following:
(APIServer pid=532339) INFO 10-01 09:01:06 [launcher.py:44] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=532339) INFO 10-01 09:01:06 [launcher.py:44] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=532339) INFO 10-01 09:01:06 [launcher.py:44] Route: /invocations, Methods: POST
(APIServer pid=532339) INFO 10-01 09:01:06 [launcher.py:44] Route: /metrics, Methods: GET
(APIServer pid=532339) INFO: Started server process [532339]
(APIServer pid=532339) INFO: Waiting for application startup.
(APIServer pid=532339) INFO: Application startup complete.
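When the server is launched from a script rather than watched by hand, it can be useful to wait for it to become ready before sending requests. The following is a minimal sketch, assuming the server listens on localhost:8000 and exposes vLLM's GET /health endpoint. The helper names `http_probe` and `wait_until_ready`, as well as the timeout values, are hypothetical and not part of vLLM or TerraTorch.

```python
import time
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_until_ready(base_url="http://localhost:8000", timeout_s=600.0,
                     interval_s=5.0, probe=http_probe, sleep=time.sleep):
    """Poll <base_url>/health until it succeeds or `timeout_s` elapses.

    Assumption: the server exposes a /health route; adjust if yours differs.
    """
    health_url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(health_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` after launching `vllm serve` returns True once the health check succeeds, or False if the timeout expires first. The `probe` and `sleep` parameters exist only to make the loop easy to exercise without a running server.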
Sending an inference request to the model
TerraTorch models are served in vLLM through the /pooling endpoint. The snippet below shows an example payload for an inference request to the model when using the terratorch_segmentation IOProcessor plugin. Refer to the documentation of the available IOProcessors for more information on the expected payload format.
request_payload = {
    "data": {
        "data": "https://huggingface.co/christian-pinto/Prithvi-EO-2.0-300M-TL-VLLM/resolve/main/valencia_example_2024-10-26.tiff",
        "data_format": "url",
        "out_data_format": "path",
        "image_format": "geoTiff"
    },
    "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    "softmax": False
}
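To send the request from the command line, save the payload as JSON in a file named payload.json. Note that JSON spells the boolean as false, whereas the Python dictionary above uses False.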
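One way to produce payload.json from the Python dictionary is to let the standard `json` module do the conversion, which also takes care of writing `False` as `false`. This is a minimal sketch; the helper name `save_payload` is hypothetical, and the payload values are copied from the example above.

```python
import json

request_payload = {
    "data": {
        "data": "https://huggingface.co/christian-pinto/Prithvi-EO-2.0-300M-TL-VLLM/resolve/main/valencia_example_2024-10-26.tiff",
        "data_format": "url",
        "out_data_format": "path",
        "image_format": "geoTiff",
    },
    "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    "softmax": False,
}


def save_payload(payload: dict, path: str = "payload.json") -> str:
    """Serialize `payload` to JSON, write it to `path`, and return the JSON text."""
    text = json.dumps(payload, indent=2)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return text
```

Calling `save_payload(request_payload)` writes payload.json to the current working directory.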
With this payload, the IOProcessor plugin downloads the input GeoTiff from the URL and returns the path of the output GeoTiff on the local filesystem.
Assuming the vLLM server is listening on localhost:8000, the snippet below shows how to send the inference request and retrieve the output file path.
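As one possible way to do this from Python, here is a minimal sketch that uses only the standard library. The helper names `build_pooling_request` and `request_output_path` are hypothetical. The location of the output path in the response (`body["data"]["data"]`) is an assumption about the response layout and should be checked against the documentation of the IOProcessor plugin in use.

```python
import json
import urllib.request


def build_pooling_request(payload: dict, base_url: str = "http://localhost:8000"):
    """Build a POST request carrying `payload` as JSON for the /pooling endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/pooling",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_output_path(payload: dict, base_url: str = "http://localhost:8000",
                        timeout_s: float = 300.0) -> str:
    """Send the inference request and return the output GeoTiff path."""
    request = build_pooling_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    # Assumption: the plugin output is nested under body["data"]["data"].
    return body["data"]["data"]
```

With the server running, `request_output_path(request_payload)` would return the path of the generated GeoTiff, provided the response follows the assumed layout.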