Serving a TerraTorch model with vLLM#
All models to be served with vLLM via IOProcessor plugins must adhere to a
specific configuration structure. In the specific all models are required to
profide a configuration file named config.json that can be hosted on
HuggingFace, or in a local folder alongside the model weights (.pt file).
vLLM compatible model configuration#
The snippet below shows the structure of the config.json file for a Prithvi
300M model finetuned on the Sen1Floods11 dataset:
{
"architectures": ["Terratorch"],
"num_classes": 0,
"pretrained_cfg": {
"seed_everything": 0,
"input": {
"target": "pixel_values",
"data": {
"pixel_values": {
"type": "torch.Tensor",
"shape": [6, 512, 512]
},
"location_coords": {
"type": "torch.Tensor",
"shape": [1, 2]
}
}
},
"model": {
"class_path": "terratorch.tasks.SemanticSegmentationTask",
"init_args": {
"model_args": {
"backbone_pretrained": true,
"backbone": "prithvi_eo_v2_300_tl",
"decoder": "UperNetDecoder",
"decoder_channels": 256,
"decoder_scale_modules": true,
"num_classes": 2,
"rescale": true,
"backbone_bands": [
"BLUE",
"GREEN",
"RED",
"NIR_NARROW",
"SWIR_1",
"SWIR_2"
],
"head_dropout": 0.1,
"necks": [
{
"name": "SelectIndices",
"indices": [5, 11, 17, 23]
},
{
"name": "ReshapeTokensToImage"
}
]
},
"model_factory": "EncoderDecoderFactory",
"loss": "ce",
"ignore_index": -1,
"lr": 0.001,
"freeze_backbone": false,
"freeze_decoder": false,
"plot_on_val": 10
}
}
}
}
from the above we highlight two main sections: 1) vLLM required info, 2) model configuration.
vLLM Required Information Section#
At the top of the configuration file we find
These values are mandatory and must be kept unchanged.
Model Specification Section#
The model specification section is contained in the pretrained_cfg section of
the configuration file and is in turn composed of three sub-sections: 1) model
input specification, 2) model configuration and 3) datamodule configuration.
Model Input Specification#
The model info specification is necessary to support vLLM in performing a set of warm-up runs and to properly prepare the input for inference.
Below is an extract from the Prithvi 300M configuration file.
"input":{
"target": "pixel_values",
"data":{
"pixel_values":{
"type": "torch.Tensor",
"shape": [6, 512, 512]
},
"location_coords":{
"type":"torch.Tensor",
"shape": [1, 2]
}
}
}
From the above example, the data field is mandatory and contains one entry for
each input field, described with their type (e.g., torch.Tensor) and shape
(e.g., [6, 512, 512]). Also, we support models whose forward function accept
arguments in the form or named argument, or as a combination of one positional
argument and named arguments. The one positional argument is specified with the
target field. With the above input configuration vLLM would invoke the model
forward function as in the snippet below:
If no positional argument is required, the target field can be omitted and all
entries in the data field would be passed as named arguments.
Model and Datamodule Configuration#
The model configuration is contained in the model section of the configuration
file, while the datamodule configuration is in the data section. Please refer
to the full snippet at the beginning of this page. Both model and datamodule
configuration come straight from the model yaml configuration used when
performing inference via the LightningCLI. The presence of the data section is
mandatory.
Automatic generation of vLLM model configurations#
To help the user we have developed a script
(vllm_config_generator.py)
for automatically generating the config.json file. The script takes two
arguments: 1) the model configuration in yaml format, 2) the input
specification. The specification of the input can be passed to the script as
both a json string or the path to a json file. See the examples below.
python vllm_config_generator.py \
--ttconfig config.yaml \
-i '{"data":{"pixel_values":{"type": "torch.Tensor","shape": [6, 512, 512]},"location_coords":{"type":"torch.Tensor","shape": [1, 2]}}}'
cat << EOF > ./input_sample.json
{
"target": "pixel_values",
"data": {
"pixel_values": {
"type": "torch.Tensor",
"shape": [6,512,512]
},
"location_coords": {
"type": "torch.Tensor",
"shape": [1,2]
}
}
}
EOF
python vllm_config_generator.py \
--ttconfig config.yaml \
-i ./input_sample.json