# Serving TerraTorch models with vLLM
TerraTorch models can be served using the vLLM serving engine. Currently, only models built on the SemanticSegmentationTask or PixelwiseRegressionTask task can be served with vLLM.
TerraTorch models can be served with vLLM in tensor-to-tensor or image-to-image mode. Tensor-to-tensor is the default mode and is natively enabled by vLLM.

For the image-to-image mode, TerraTorch uses a vLLM feature called IOProcessor plugins, which enables processing and generation of data in any modality (e.g., GeoTIFF). TerraTorch provides pre-defined IOProcessor plugins; check the list of available plugins.
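The difference between the two modes can be sketched from the client's perspective. The payload field names below are illustrative assumptions for this sketch, not the exact vLLM request schema; consult the vLLM documentation for the precise format.

```python
# Sketch of the two serving modes from the client side. All field names
# below are illustrative assumptions, not the exact vLLM request schema.
import base64
import struct


def tensor_to_tensor_payload(model, values, shape):
    """Tensor-to-tensor mode (the vLLM default): the client sends
    model-ready tensor data and receives raw output tensors back."""
    raw = struct.pack(f"{len(values)}f", *values)  # pack as float32 bytes
    return {
        "model": model,
        "data": base64.b64encode(raw).decode("ascii"),
        "shape": shape,
    }


def image_to_image_payload(model, tiff_url):
    """Image-to-image mode: a server-side IOProcessor plugin decodes the
    GeoTIFF into model inputs and encodes the output back into an image."""
    return {
        "model": model,
        "data": {
            "data": tiff_url,          # where the plugin fetches the input
            "data_format": "url",
            "image_format": "tiff",
        },
    }


t2t = tensor_to_tensor_payload("my-org/my-model", [0.0, 1.0, 2.0, 3.0], [2, 2])
i2i = image_to_image_payload("my-org/my-model", "https://example.com/scene.tif")
```

In tensor-to-tensor mode the client is responsible for all pre- and post-processing; in image-to-image mode that work moves into the server-side plugin.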
To enable your model to be served via vLLM, follow the steps below:
- Ensure TerraTorch Integration: Verify that the model you want to serve is already a core model, or learn how to add your model to TerraTorch.
- Create a Model config.json: Write a vLLM-compatible config.json for your model.
- Determine IOProcessor Plugin Needs: If serving in image-to-image mode, identify an IOProcessor plugin that suits your model or build one yourself.
- Make your Model Accessible to vLLM: Host your model weights and config.json on Hugging Face, or store them in a local directory accessible to the vLLM instance.
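As a sketch of the config.json step, the file might look like the following. Every key and value here is illustrative only; the exact schema is defined by vLLM's TerraTorch integration, so refer to the vLLM documentation for the authoritative format.

```json
{
  "architectures": ["Terratorch"],
  "pretrained_cfg": {
    "comment": "the TerraTorch task configuration for your model goes here"
  }
}
```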
To validate the steps above, start a vLLM serving instance that loads your model and perform an inference in tensor-to-tensor mode or in image-to-image mode.
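A validation request in image-to-image mode can be sketched as below. The endpoint path, model id, and payload fields are assumptions for this sketch (modeled on a pooling-style API); adjust them to your vLLM version and the IOProcessor plugin you use.

```python
# Sketch: validating an image-to-image deployment by posting a request to a
# running vLLM instance. Endpoint path and payload fields are assumptions;
# adjust them to your vLLM version and plugin.
import json
import urllib.request

SERVER = "http://localhost:8000/pooling"  # assumed endpoint path

payload = {
    "model": "my-org/my-terratorch-model",        # hypothetical model id
    "data": {
        "data": "https://example.com/scene.tif",  # input GeoTIFF location
        "data_format": "url",
        "out_data_format": "b64_json",            # ask for base64 output
    },
}

req = urllib.request.Request(
    SERVER,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once a vLLM serving instance is running:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

If the server accepts the request and returns a result in the expected output format, the model, config.json, and plugin are wired up correctly.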