Terramind Segmentation IOProcessor Plugin#

This plugin targets segmentation tasks for TerraMind models and assumes multimodal input data provided via URLs or file paths, organized in separate directories by modality. The plugin performs tiled inference according to the parameters given in the model configuration. During initialization, it reads the model's data module configuration from the vLLM configuration and dynamically instantiates a DataModule object.

Currently, the plugin targets TerraMind models finetuned on the ImpactMesh dataset and expects exactly three input modalities: DEM, S1RTC, and S2L2A.

This plugin is installed as terratorch_tm_segmentation.

Plugin specification#

Model requirements#

This plugin expects the model to take three parameters for inference, one per input modality.

Below is an example input model specification accepted by this plugin. Users can change the tensor shapes according to their model requirements, but the number and names of the fields must be kept unchanged to work with ImpactMesh data modules.

Model input specification accepted by the Terramind Segmentation IOProcessor plugin

"input": {
  "data": {
    "S2L2A": {
      "type": "torch.Tensor",
      "shape": [
        12,
        4,
        256,
        256
      ]
    },
    "S1RTC": {
      "type": "torch.Tensor",
      "shape": [
        2,
        4,
        256,
        256
      ]
    },
    "DEM": {
      "type": "torch.Tensor",
      "shape": [
        1,
        4,
        256,
        256
      ]
    }
  }
}

Full details on TerraTorch models input model specification for vLLM are available here.

Plugin configuration#

This plugin accepts additional configuration data through the TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG environment variable. If set, the variable must contain the plugin configuration as a JSON string.

The plugin configuration format is defined in the PluginConfig class.

`terratorch.vllm.plugins.segmentation.types.PluginConfig` #

Bases: BaseModel

Source code in terratorch/vllm/plugins/segmentation/types.py

class PluginConfig(BaseModel):
    output_path: str = None
    """
    Default output folder path to be used when the out_data_format is set to path. 
    If omitted, the plugin will default to the current user home directory.
    """

    @model_validator(mode="after")
    def validate_values(self) -> Self:
        if not self.output_path:
            self.output_path = str(Path.home())
        elif os.path.exists(self.output_path):
            if not os.access(self.output_path, os.W_OK):
                raise ValueError(f"The path: {self.output_path} is not writable")
        else:
            raise ValueError(f"The path: {self.output_path} does not exist")

        return self

`output_path = None` `class-attribute` `instance-attribute` #

Default output folder path to be used when the out_data_format is set to path. If omitted, the plugin will default to the current user home directory.
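As a minimal sketch, the configuration can be serialized and exported from Python before the server process is started. The directory used here is a temporary placeholder chosen for illustration, not a value required by the plugin; the only field assumed is output_path, as defined in PluginConfig above.

```python
import json
import os
import tempfile

# Placeholder output directory for illustration only. PluginConfig
# rejects paths that do not exist or are not writable.
output_dir = tempfile.mkdtemp()

# Serialize the plugin configuration as a JSON string.
plugin_config = json.dumps({"output_path": output_dir})

# Export the variable so that a server process started from this
# environment can read it.
os.environ["TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG"] = plugin_config
```

Because the validator raises an error for a missing or read-only directory, it can help to create the directory before setting the variable.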

Request Data Format#

The input format for the plugin is defined in the RequestData class.

`terratorch.vllm.plugins.segmentation.types.RequestData` #

Bases: BaseModel

Source code in terratorch/vllm/plugins/segmentation/types.py

class RequestData(BaseModel):
    data_format: Literal["path", "url"]
    """
    Data type for the input image.
    Allowed values are: [`path`, `url`]
    """

    out_data_format: Literal["b64_json", "path"]
    """
    Data type for the output image.
    Allowed values are: [`b64_json`, `path`]
    """

    data: Any
    """
    Input image data
    """

    indices: Optional[list[int]] = None
    """
    Indices for bands to be processed in the input file
    """

    out_path: Optional[str] = None
    """
    Path to store the output image. Only used when out_data_format is set to 'path'
    """

`data` `instance-attribute` #

Input image data

`data_format` `instance-attribute` #

Data type for the input image. Allowed values are: [path, url]

`indices = None` `class-attribute` `instance-attribute` #

Indices for bands to be processed in the input file

`out_data_format` `instance-attribute` #

Data type for the output image. Allowed values are: [b64_json, path]

`out_path = None` `class-attribute` `instance-attribute` #

Path to store the output image. Only used when out_data_format is set to 'path'

The indices field is ignored by this plugin.

The optional out_path field lets you specify a custom output directory for the generated GeoTIFF file on a per-request basis when out_data_format is set to "path". If out_path is not provided, the plugin uses the default output path from the plugin configuration (set via the TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG environment variable).

Example request payload with URL input and base64 output:

{
  "data_format": "url",
  "out_data_format": "b64_json",
  "data": {
    "DEM": "https://example.com/path/to/dem_file",
    "S1RTC": "https://example.com/path/to/S1RTC_file",
    "S2L2A": "https://example.com/path/to/S2L2A_file"
  }
}

Example request payload with path input and path output:

{
  "data_format": "path",
  "out_data_format": "path",
  "data": "/path/to/input/directory"
}

Example request payload with URL input and custom path output:

{
  "data_format": "url",
  "out_data_format": "path",
  "out_path": "/custom/output/directory",
  "data": {
    "DEM": "https://example.com/path/to/dem_file",
    "S1RTC": "https://example.com/path/to/S1RTC_file",
    "S2L2A": "https://example.com/path/to/S2L2A_file"
  }
}
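The payloads above can also be assembled programmatically. The following is a sketch only: build_url_request is a hypothetical helper, not part of the plugin, and it simply checks that the three modalities named in this document are present before producing a dictionary shaped like RequestData. How the payload is sent to the server is outside its scope.

```python
from typing import Optional

# Modalities named in this document for ImpactMesh-finetuned models.
MODALITIES = ("DEM", "S1RTC", "S2L2A")


def build_url_request(
    urls: dict[str, str],
    out_data_format: str = "b64_json",
    out_path: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble a RequestData-shaped payload."""
    missing = [m for m in MODALITIES if m not in urls]
    extra = [m for m in urls if m not in MODALITIES]
    if missing or extra:
        raise ValueError(f"missing modalities: {missing}, unexpected: {extra}")
    if out_data_format not in ("b64_json", "path"):
        raise ValueError(f"unsupported out_data_format: {out_data_format}")

    payload = {
        "data_format": "url",
        "out_data_format": out_data_format,
        "data": {m: urls[m] for m in MODALITIES},
    }
    # out_path is only meaningful when the output is written to disk.
    if out_path is not None and out_data_format == "path":
        payload["out_path"] = out_path
    return payload


payload = build_url_request(
    {
        "DEM": "https://example.com/path/to/dem_file",
        "S1RTC": "https://example.com/path/to/S1RTC_file",
        "S2L2A": "https://example.com/path/to/S2L2A_file",
    },
    out_data_format="path",
    out_path="/custom/output/directory",
)
```

Validating the modality names on the client side surfaces a malformed request before any data is downloaded by the server.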

Multimodal Data Organization#

The structure of the data field in the RequestData structure depends on the data_format field. When using URL-based input (data_format: "url"), the plugin expects one URL for each modality file.

For example, a URL-based request includes the following fields:

{
  "data_format": "url",
  "data": {
    "DEM": "https://example.com/path/to/dem_file",
    "S1RTC": "https://example.com/path/to/S1RTC_file",
    "S2L2A": "https://example.com/path/to/S2L2A_file"
  }
}

When using path-based input (data_format: "path"), provide the path of a root directory that already contains one subdirectory per modality:

{
  "data_format": "path",
  "data": "/path/to/input/directory/"
}

Your directory structure should look like this:

/path/to/input/directory/
├── DEM/
│   └── FILE_NAME_DEM.tiff
├── S1RTC/
│   └── FILE_NAME_S1RTC.zarr.zip
└── S2L2A/
    └── FILE_NAME_S2L2A.zarr.zip

Each modality has its own subdirectory containing the respective data files.

One input bundle per request supported

The plugin currently supports only one input bundle per request (one file per modality). Do not place more than one file in each subfolder.
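A pre-flight check along these lines can catch layout problems before a request is sent. This is a sketch under the assumptions stated in this section (one subdirectory per modality, one file in each); check_input_directory is a hypothetical helper and is not provided by the plugin.

```python
from pathlib import Path

# Modality subdirectories expected under the input root.
MODALITIES = ("DEM", "S1RTC", "S2L2A")


def check_input_directory(root: str) -> dict[str, Path]:
    """Hypothetical check: exactly one entry per modality subdirectory."""
    found = {}
    for modality in MODALITIES:
        subdir = Path(root) / modality
        if not subdir.is_dir():
            raise FileNotFoundError(f"missing modality subdirectory: {subdir}")
        # Ignore hidden entries such as .DS_Store when counting files.
        entries = sorted(p for p in subdir.iterdir() if not p.name.startswith("."))
        if len(entries) != 1:
            raise ValueError(
                f"expected exactly one file in {subdir}, found {len(entries)}"
            )
        found[modality] = entries[0]
    return found
```

The function returns the single file found for each modality, which makes it easy to log what will be processed.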

Request Output Format#

The output format for the plugin is defined in the RequestOutput class.

`terratorch.vllm.plugins.segmentation.types.RequestOutput` #

Bases: BaseModel

Source code in terratorch/vllm/plugins/segmentation/types.py

class RequestOutput(BaseModel):
    data_format: Literal["b64_json", "path"]
    """
    Data type for the output image.
    Allowed values are: [`b64_json`, `path`]
    """

    data: Any
    """
    Output image data
    """

    request_id: Optional[str] = None
    """
    The vLLM request ID if applicable
    """

`data` `instance-attribute` #

Output image data

`data_format` `instance-attribute` #

Data type for the output image. Allowed values are: [b64_json, path]

`request_id = None` `class-attribute` `instance-attribute` #

The vLLM request ID if applicable
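When data_format is b64_json, the output can be decoded and written to disk on the client side. The sketch below assumes that the data field holds the base64 text of the output image bytes; that encoding detail is an assumption and is not spelled out in this section. save_b64_output is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_b64_output(response: dict, destination: str) -> Path:
    """Hypothetical helper: decode a b64_json RequestOutput and write it to disk."""
    if response.get("data_format") != "b64_json":
        raise ValueError("response does not carry base64 data")
    # Assumption: `data` is the base64 text of the output image bytes.
    raw = base64.b64decode(response["data"])
    target = Path(destination)
    target.write_bytes(raw)
    return target
```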

Plugin Defaults#

Tiled Inference Parameters#

By default, the plugin uses a crop size of 512 for both the horizontal and vertical dimensions when computing image tiles. Users can set different crop values in their model config.json file. The example below overrides the defaults with vertical and horizontal crop values of 256.

Custom tiled inference parameters in model configuration

{
  "pretrained_cfg": {
    "model": {
      "init_args": {
        "tiled_inference_parameters": {
          "h_crop": 256,
          "w_crop": 256,
          "delta": 8
        }
      }
    }
  }
}

Note that the tiled_inference_parameters field is not mandatory in the model configuration. Full details on the model configuration file can be found here.

Full details on the available tiled inference parameters are available in the TiledInferenceParameters class.

`terratorch.vllm.plugins.segmentation.types.TiledInferenceParameters` #

Bases: BaseModel

Source code in terratorch/vllm/plugins/segmentation/types.py

class TiledInferenceParameters(BaseModel):
    h_crop: int = 512
    h_stride: int = None
    w_crop: int = 512
    w_stride: int = None
    average_patches: bool = True
    delta: int = 8
    blend_overlaps: bool = True
    padding: str | bool = "reflect"
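To see which values result from a given config.json, the overrides can be overlaid on the class defaults. The following sketch illustrates the outcome only and is not how the plugin itself reads the configuration; effective_tiled_parameters is a hypothetical helper, and the defaults are copied from the class listing above.

```python
import json

# Defaults copied from the TiledInferenceParameters class shown above.
DEFAULTS = {
    "h_crop": 512,
    "h_stride": None,
    "w_crop": 512,
    "w_stride": None,
    "average_patches": True,
    "delta": 8,
    "blend_overlaps": True,
    "padding": "reflect",
}


def effective_tiled_parameters(model_config: dict) -> dict:
    """Hypothetical helper: overlay config.json overrides on the defaults."""
    overrides = (
        model_config.get("pretrained_cfg", {})
        .get("model", {})
        .get("init_args", {})
        .get("tiled_inference_parameters", {})
    )
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown tiled inference parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Same override as the model configuration example in this section.
config = json.loads("""
{"pretrained_cfg": {"model": {"init_args": {
    "tiled_inference_parameters": {"h_crop": 256, "w_crop": 256, "delta": 8}
}}}}
""")
params = effective_tiled_parameters(config)
```

Fields that are not overridden, such as padding and blend_overlaps, keep their default values.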

Default Output Directory#

If no out_path is specified in the request payload and no output folder is configured in the plugin configuration (via the TERRATORCH_SEGMENTATION_IO_PROCESSOR_CONFIG environment variable), the plugin will default to writing output files to the user's home directory. This default only impacts requests that set out_data_format: "path".
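The precedence described above can be summarized in a few lines. This is a sketch of the documented behavior, not the plugin's code; resolve_output_dir is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(
    request_out_path: Optional[str],
    configured_output_path: Optional[str],
) -> str:
    """Hypothetical helper mirroring the documented precedence."""
    # 1. A per-request out_path wins when present.
    if request_out_path:
        return request_out_path
    # 2. Otherwise use the output_path from the plugin configuration.
    if configured_output_path:
        return configured_output_path
    # 3. Fall back to the current user's home directory.
    return str(Path.home())
```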

Data Module Configuration#

This plugin dynamically instantiates a data module based on the configuration in the model's config.json file. The data module is then used for loading the input data. By default, the plugin configures the data module in predict mode and sets the predict_data_root of the DataModule to the input data folder. More info on the data module configuration can be found here.

Using a different data module

This plugin currently supports the ImpactMesh data modules, which impose a specific structure on the input data. For example, a DEM input file or subfolder must always be present, because it is used to retrieve the input file metadata. Users who want to use a different data module can do so, but it must exhibit the same behavior as the ImpactMesh ones.
