Architecture Overview

Understanding the Geospatial Studio architecture helps you make the most of the platform and troubleshoot issues effectively.

🏗️ High-Level Architecture

Geospatial Studio follows a layered microservices architecture deployed on Kubernetes/OpenShift:

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65'}}}%%
graph LR
    subgraph "Client Layer"
        UI[Studio UI]
        SDK[Python SDK]
        API[REST API]
    end

    subgraph "Security Layer"
        Auth[IAM Provider<br/>Keycloak/IBM Verify/Okta]
    end

    subgraph "Application Layer"
        Gateway[Gateway API<br/>Orchestration]
    end

    subgraph "Data & Storage Layer"
        DB[(PostgreSQL)]
        Cache[(Redis)]
        Storage[(Object Storage<br/>MinIO/S3/COS)]
        GeoServer[GeoServer<br/>Visualization]
    end

    subgraph "ML Platform Layer"
        MLflow[MLflow<br/>Tracking]
    end

    subgraph "Processing Layer"
        Pipelines[Kubernetes Pipelines<br/>Dataset • Training • Inference]
    end

    UI --> Auth
    SDK --> Auth
    API --> Auth
    Auth --> Gateway
    Gateway --> DB
    Gateway --> Cache
    Gateway --> Storage
    Gateway --> MLflow
    Gateway --> GeoServer
    Gateway --> Pipelines
    Pipelines --> Storage
    Pipelines --> MLflow
    Pipelines --> GeoServer

    style UI fill:#0f62fe,stroke:#fff,stroke-width:2px,color:#fff
    style SDK fill:#0f62fe,stroke:#fff,stroke-width:2px,color:#fff
    style API fill:#0f62fe,stroke:#fff,stroke-width:2px,color:#fff
    style Auth fill:#8a3ffc,stroke:#fff,stroke-width:2px,color:#fff
    style Gateway fill:#33b1ff,stroke:#fff,stroke-width:2px,color:#fff
    style DB fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff
    style Cache fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff
    style Storage fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff
    style GeoServer fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff
    style MLflow fill:#42be65,stroke:#fff,stroke-width:2px,color:#fff
    style Pipelines fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff

Architecture Layers:

  1. Client Layer: Multiple interfaces (UI, SDK, API) for different user needs
  2. Security Layer: OAuth2-based authentication with flexible IAM provider options (Keycloak, IBM Verify, Okta)
  3. Application Layer: Gateway API orchestrates all backend services and workflows
  4. Data & Storage Layer: Flexible storage options, either in-cluster (PostgreSQL, MinIO) or cloud-managed (IBM Cloud, AWS, Azure, GCP), plus GeoServer for geospatial visualization
  5. ML Platform Layer: MLflow for experiment tracking and model versioning
  6. Processing Layer: Kubernetes-native pipelines for dataset processing, model training, and inference execution

🔧 Core Components

1. Gateway API

Purpose: Central orchestration point for all backend services

Responsibilities:

  • Route requests to appropriate services
  • Manage authentication and authorization
  • Coordinate complex workflows
  • Handle API versioning

Technology: FastAPI (Python)

Endpoints:

  • /v2/models: Model management
  • /v2/datasets: Dataset operations
  • /v2/tunes: Fine-tuning tasks
  • /v2/inferences: Inference execution
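
The endpoint layout above could be addressed from a client roughly as follows. This is a minimal sketch, not the Python SDK: it assumes the local port-forward for the API (localhost:4181), plain HTTP, and a hypothetical `X-API-Key` header. The real header name and request schemas are not stated on this page.

```python
import json
import urllib.request

# Assumptions: local API port-forward and a hypothetical "X-API-Key" header.
BASE_URL = "http://localhost:4181"
RESOURCES = ("models", "datasets", "tunes", "inferences")


def build_request(resource, api_key, payload=None, base_url=BASE_URL):
    """Return an unsent urllib Request for one of the /v2 resources."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"X-API-Key": api_key, "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    # A body implies a create-style call; without one we only read.
    method = "GET" if data is None else "POST"
    return urllib.request.Request(
        f"{base_url}/v2/{resource}", data=data, headers=headers, method=method
    )


req = build_request("models", api_key="example-key")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it. Building the request separately keeps the sketch usable without a running deployment.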

2. Studio UI

Purpose: No-code web interface for visual interaction

Features:

  • Interactive map visualization
  • Dataset catalog and preview
  • Model catalog and management
  • Training progress monitoring
  • Inference configuration and execution

Technology: Web Components, Carbon Design System

Pages:

  • Home: Quick access to key features
  • Data Catalog: Browse and manage datasets
  • Model Catalog: Browse and manage models
  • Inference Lab: Run and visualize inferences

3. Authentication (Keycloak)

Purpose: Secure access control and user management

Features:

  • OAuth2/OpenID Connect
  • User and role management
  • API key generation
  • Single sign-on (SSO)

Default Credentials (Development):

  • Admin: admin / admin
  • Test User: testuser / testpass123

4. Database (PostgreSQL)

Purpose: Store metadata and application state

Stores:

  • User information
  • Dataset metadata
  • Model configurations
  • Training job details
  • Inference history
  • API keys

Version: PostgreSQL 15.x

5. Object Storage (MinIO)

Purpose: Store large files and artifacts

Stores:

  • Training datasets
  • Model checkpoints
  • Inference outputs
  • Logs and artifacts

S3-Compatible: Can be replaced with AWS S3, IBM Cloud Object Storage, etc.

Buckets:

  • datasets: Training data
  • models: Model artifacts
  • inferences: Inference outputs
  • mlflow: MLflow artifacts

6. Cache & Queue (Redis)

Purpose: Performance optimization and task management

Uses:

  • API response caching
  • Session management
  • Message queuing for async tasks
  • Rate limiting

Version: Redis 8.x

7. Experiment Tracking (MLflow)

Purpose: Track and compare model training experiments

Features:

  • Log training metrics
  • Store model artifacts
  • Compare experiments
  • Model registry

Access: http://localhost:5000 (local deployment)

8. Visualization (GeoServer)

Purpose: Serve geospatial data for map visualization

Features:

  • WMS/WFS services
  • Dynamic styling
  • Layer management
  • Tile caching

Access: http://localhost:3000/geoserver (local deployment)
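
To illustrate what consuming a published layer over WMS looks like, here is a hedged sketch that builds a standard WMS 1.1.1 GetMap URL. It assumes GeoServer's WMS endpoint sits under the local URL given above; the layer name `studio:flood_prediction` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the WMS endpoint is served under the local GeoServer URL.
GEOSERVER_WMS = "http://localhost:3000/geoserver/wms"


def wms_getmap_url(layer, bbox, width=512, height=512, base=GEOSERVER_WMS):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",            # empty string selects the layer's default style
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
        "transparent": "true",
    }
    return f"{base}?{urlencode(params)}"


# Hypothetical layer name, used only for illustration.
url = wms_getmap_url("studio:flood_prediction", (-1.0, 51.0, 0.0, 52.0))
print(parse_qs(urlsplit(url).query)["bbox"])
```

Version 1.1.1 is used here because its EPSG:4326 bounding boxes are always in longitude, latitude order; WMS 1.3.0 swaps the axis order for that CRS.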

🔄 Data Flow

Fine-Tuning Workflow

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65','noteBkgColor':'#262626','noteTextColor':'#fff'}}}%%
sequenceDiagram
    participant User
    participant UI/SDK
    participant Gateway
    participant DB
    participant K8s
    participant Storage
    participant MLflow

    User->>UI/SDK: Configure tuning task
    UI/SDK->>Gateway: Submit tuning request
    Gateway->>DB: Store task metadata
    Gateway->>K8s: Create training job
    K8s->>Storage: Load dataset
    K8s->>MLflow: Log metrics
    K8s->>Storage: Save checkpoint
    K8s->>Gateway: Job complete
    Gateway->>DB: Update status
    Gateway->>UI/SDK: Notify completion
    UI/SDK->>User: Display results

Inference Workflow

High-Level Sequence

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65','noteBkgColor':'#262626','noteTextColor':'#fff'}}}%%
sequenceDiagram
    participant User
    participant UI/SDK
    participant Gateway
    participant DB
    participant K8s
    participant Storage
    participant GeoServer

    User->>UI/SDK: Configure inference
    UI/SDK->>Gateway: Submit inference request
    Gateway->>DB: Store request metadata
    Gateway->>K8s: Create inference pipeline
    K8s->>Storage: Load model & data
    K8s->>Storage: Save outputs
    K8s->>GeoServer: Publish layers
    K8s->>Gateway: Pipeline complete
    Gateway->>DB: Update status
    Gateway->>UI/SDK: Notify completion
    UI/SDK->>User: Display on map

Detailed Pipeline Orchestration

The inference pipeline orchestrates parallel processing of geospatial data through modular, scalable microservices. When an inference request is received, the Inference Planner analyzes the spatial and temporal requirements, then breaks down the work into parallel subtasks. Each subtask processes a specific geographic area and time period independently, enabling efficient large-scale processing.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65'}}}%%
flowchart LR
 subgraph subGraph0["Pipeline Process"]
        D["Inference Planner<br/>Spatial-Temporal Analysis"]
  end
 subgraph subGraph1["Pipeline Subtask 0"]
        F["TerraKit Data Pull<br/>Satellite Imagery"]
        G["Run Inference<br/>Model Prediction"]
        H["Post-Processing<br/>Masking & Filtering"]
        I["Push to GeoServer<br/>Visualization"]
  end
 subgraph subGraph2["Pipeline Subtask 1"]
        J["TerraKit Data Pull"]
        K["Run Inference"]
        L["Post-Processing"]
        M["Push to GeoServer"]
  end
 subgraph subGraph3["Pipeline Subtask 2"]
        N["TerraKit Data Pull"]
        O["Run Inference"]
        P["Post-Processing"]
        Q["Push to GeoServer"]
  end
    A(["Inference Request"]) --> B["Gateway API"]
    B --> C["PlannerTask Database"]
    C --> D
    D --> E["PipelineTasks Database"]
    E --> F
    E --> J
    E --> N
    F --> G
    G --> H
    H --> I
    J --> K
    K --> L
    L --> M
    N --> O
    O --> P
    P --> Q

    style D fill:#42be65,stroke:#fff,stroke-width:2px,color:#fff
    style F fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style G fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style H fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style I fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style J fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style K fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style L fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style M fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style N fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style O fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style P fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style Q fill:#fa4d56,stroke:#fff,stroke-width:2px,color:#fff
    style A fill:#8a3ffc,stroke:#fff,stroke-width:2px,color:#fff
    style B fill:#33b1ff,stroke:#fff,stroke-width:2px,color:#fff
    style C fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff
    style E fill:#007d79,stroke:#fff,stroke-width:2px,color:#fff

Pipeline Components:

| Component | Purpose |
|---|---|
| Inference Planner | Analyzes spatial-temporal requirements and creates parallel subtasks |
| TerraKit Data Pull | Acquires satellite imagery from Sentinel Hub, NASA Earthdata, AWS, etc. |
| URL Connector | Processes user-provided geospatial data |
| Run Inference | Executes model predictions using TerraTorch |
| Post-Processing | Applies masks for cloud cover, water, ice, snow, etc. |
| Push to GeoServer | Publishes results as WMS/WFS layers for visualization |
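
The planner's spatial-temporal breakdown can be pictured with a small sketch. This is an illustration under assumptions, not the Inference Planner's actual logic: the grid size, the window length, and the `Subtask` type are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product


@dataclass(frozen=True)
class Subtask:
    bbox: tuple   # (min_lon, min_lat, max_lon, max_lat)
    start: date
    end: date     # inclusive


def split_bbox(bbox, nx, ny):
    """Cut a bounding box into an nx-by-ny grid of equal tiles."""
    min_lon, min_lat, max_lon, max_lat = bbox
    dx = (max_lon - min_lon) / nx
    dy = (max_lat - min_lat) / ny
    return [
        (min_lon + i * dx, min_lat + j * dy,
         min_lon + (i + 1) * dx, min_lat + (j + 1) * dy)
        for i in range(nx)
        for j in range(ny)
    ]


def split_dates(start, end, days):
    """Cut an inclusive date range into windows of at most `days` days."""
    windows = []
    cur = start
    while cur <= end:
        win_end = min(cur + timedelta(days=days - 1), end)
        windows.append((cur, win_end))
        cur = win_end + timedelta(days=1)
    return windows


def plan(bbox, start, end, nx=2, ny=2, days=10):
    """One subtask per (tile, time window) pair."""
    tiles = split_bbox(bbox, nx, ny)
    windows = split_dates(start, end, days)
    return [Subtask(tile, s, e) for tile, (s, e) in product(tiles, windows)]


subtasks = plan((10.0, 45.0, 12.0, 47.0), date(2024, 1, 1), date(2024, 1, 25))
print(len(subtasks))  # 4 tiles x 3 time windows = 12
```

Each resulting subtask covers one tile and one time window, so it can run through data pull, inference, post-processing, and publishing independently of the others.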

Key Features:

  • Parallel Processing: Subtasks run concurrently for faster completion
  • Scalable: Each component can scale independently based on workload
  • Modular: Components are loosely coupled and can be updated independently
  • Fault Tolerant: Failed tasks can be retried without affecting other subtasks
  • Priority Queuing: Tasks can be prioritized for urgent processing
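
The parallel, fault-tolerant, and prioritized behavior listed above can be sketched with a thread pool. This is a simplification under stated assumptions: the step names are taken from the diagram, retries restart the whole subtask, and priority only affects submission order. The real pipeline runs as separate services rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Step names mirror the diagram; the functions behind them are placeholders.
STEPS = ("data_pull", "run_inference", "post_process", "push_to_geoserver")


def run_subtask(subtask_id, step_fn, max_attempts=3):
    """Run all steps of one subtask in order, retrying the subtask on failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            for step in STEPS:
                step_fn(subtask_id, step)
            return {"id": subtask_id, "status": "done", "attempts": attempt}
        except Exception as exc:  # a failure only affects this subtask
            last_error = exc
    return {"id": subtask_id, "status": "failed",
            "attempts": max_attempts, "error": str(last_error)}


def run_all(subtasks, step_fn, workers=3):
    """subtasks is a list of (priority, subtask_id); lower values start first."""
    ordered = sorted(subtasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_subtask, sid, step_fn) for _, sid in ordered]
        return [f.result() for f in futures]


failed_once = set()


def demo_step(subtask_id, step):
    # Simulated transient fault: subtask 1 fails its first inference attempt.
    if subtask_id == 1 and step == "run_inference" and subtask_id not in failed_once:
        failed_once.add(subtask_id)
        raise RuntimeError("transient failure")


results = run_all([(1, 0), (0, 1), (1, 2)], demo_step)
for result in results:
    print(result)
```

In the demo, subtask 1 succeeds on its second attempt while subtasks 0 and 2 finish on their first, which is the isolation the Fault Tolerant bullet describes.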

🚀 Deployment Architecture

Local Deployment (Lima VM)

┌─────────────────────────────────────┐
│         Host Machine                │
│  ┌───────────────────────────────┐  │
│  │      Lima VM (K8s)            │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │  Geospatial Studio      │  │  │
│  │  │  - Gateway API          │  │  │
│  │  │  - UI                   │  │  │
│  │  │  - PostgreSQL           │  │  │
│  │  │  - MinIO                │  │  │
│  │  │  - Keycloak             │  │  │
│  │  │  - MLflow               │  │  │
│  │  │  - GeoServer            │  │  │
│  │  │  - Redis                │  │  │
│  │  └─────────────────────────┘  │  │
│  └───────────────────────────────┘  │
│         ↕ Port Forwarding           │
│    localhost:4180 → UI              │
│    localhost:4181 → API             │
└─────────────────────────────────────┘

Characteristics:

  • Single-node Kubernetes
  • CPU-only (no GPU)
  • Data persisted in ~/studio-data
  • Port forwarding for access
  • Fixed in-cluster services only (no external service configuration)
  • All services (PostgreSQL, MinIO, Keycloak) deployed within Lima VM

Cluster Deployment (OpenShift/K8s)

┌─────────────────────────────────────────────────────────┐
│         Kubernetes/OpenShift Cluster                    │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Namespace: geospatial-studio                     │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │  Core Services                              │  │  │
│  │  │  - Gateway (3 replicas)                     │  │  │
│  │  │  - UI (3 replicas)                          │  │  │
│  │  │  - PostgreSQL (HA) OR External DB           │  │  │
│  │  │  - Redis (HA)                               │  │  │
│  │  │  - MinIO OR External Object Storage         │  │  │
│  │  │  - Keycloak OR External OAuth               │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │  ML Services                                │  │  │
│  │  │  - MLflow                                   │  │  │
│  │  │  - GeoServer                                │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │  Processing (GPU Nodes)                     │  │  │
│  │  │  - Fine-tuning Jobs                         │  │  │
│  │  │  - Inference Pipelines                      │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
│         ↕ Ingress/Routes                                │
│    https://studio.domain.com                            │
│                                                         │
│  External Services (Optional):                          │
│  - IBM Cloud Databases / AWS RDS / Azure PostgreSQL     │
│  - IBM COS / AWS S3 / Azure Blob Storage                │
│  - IBM Security Verify / External Keycloak / Okta       │
└─────────────────────────────────────────────────────────┘

Characteristics:

  • Multi-node cluster
  • GPU acceleration available
  • High availability
  • Flexible service configuration:
      • In-cluster services (PostgreSQL, MinIO, Keycloak)
      • OR external cloud-managed services
  • Load balancing
  • Auto-scaling

Service Configuration Comparison

| Service | Local Deployment | Cluster Deployment |
|---|---|---|
| PostgreSQL | In-cluster only | In-cluster OR cloud-managed (IBM Cloud, AWS RDS, Azure, GCP) |
| Object Storage | MinIO in-cluster only | MinIO in-cluster OR cloud storage (IBM COS, AWS S3, Azure, GCP) |
| Authentication | Keycloak in-cluster only | Keycloak in-cluster OR external OAuth (IBM Verify, Okta, Azure AD) |
| Configuration | Fixed, no options | Fully configurable via values.yaml |
| Use Case | Learning, testing | Development, staging, production |
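
The comparison above can be encoded as a small pre-flight check. This sketch only transcribes the table; the service keys and the `validate` helper are hypothetical and do not correspond to actual values.yaml settings.

```python
# Options per service, transcribed from the comparison table.
# The key names are illustrative, not real values.yaml keys.
SUPPORTED = {
    "local": {
        "postgresql": {"in-cluster"},
        "object_storage": {"in-cluster"},
        "authentication": {"in-cluster"},
    },
    "cluster": {
        "postgresql": {"in-cluster", "external"},
        "object_storage": {"in-cluster", "external"},
        "authentication": {"in-cluster", "external"},
    },
}


def validate(deployment, choices):
    """Return a list of problems; an empty list means the plan is valid."""
    if deployment not in SUPPORTED:
        return [f"unknown deployment type: {deployment}"]
    problems = []
    for service, mode in choices.items():
        allowed = SUPPORTED[deployment].get(service)
        if allowed is None:
            problems.append(f"unknown service: {service}")
        elif mode not in allowed:
            problems.append(
                f"{service}: {mode} is not available for {deployment} deployments"
            )
    return problems


print(validate("local", {"postgresql": "external"}))
print(validate("cluster", {"postgresql": "external", "object_storage": "in-cluster"}))
```

The first call reports a problem because local deployments are fixed to in-cluster services; the second returns an empty list.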

🔐 Security Architecture

Authentication Flow

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65','noteBkgColor':'#262626','noteTextColor':'#fff'}}}%%
sequenceDiagram
    participant User
    participant UI
    participant Keycloak
    participant Gateway

    User->>UI: Access application
    UI->>Keycloak: Redirect to login
    Keycloak->>User: Login page
    User->>Keycloak: Credentials
    Keycloak->>UI: OAuth2 token
    UI->>Gateway: Request + token
    Gateway->>Keycloak: Validate token
    Keycloak->>Gateway: Token valid
    Gateway->>UI: Response
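
One way the "Validate token" step could work is OAuth 2.0 token introspection (RFC 7662), which Keycloak exposes per realm. The sketch below only builds the request and interprets the response. Whether the Gateway uses introspection or verifies token signatures locally is not stated here, and the host, realm, and client names are hypothetical.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode


def introspection_request(keycloak_url, realm, client_id, client_secret, token):
    """Build an unsent RFC 7662 introspection request for a Keycloak realm."""
    endpoint = (
        f"{keycloak_url}/realms/{realm}/protocol/openid-connect/token/introspect"
    )
    # The calling service authenticates itself with HTTP Basic credentials.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    body = urlencode({"token": token}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def is_token_active(response_body):
    """RFC 7662 responses are JSON objects with a boolean 'active' member."""
    return json.loads(response_body).get("active") is True


# Hypothetical host, realm, and client; nothing is sent over the network.
req = introspection_request(
    "https://keycloak.example.com", "studio", "gateway", "secret", "abc"
)
print(req.full_url)
print(is_token_active('{"active": true, "username": "testuser"}'))
```

Only a response with `"active": true` should be treated as valid; an expired or revoked token yields `"active": false`.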

API Key Authentication

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#0f62fe','primaryTextColor':'#fff','primaryBorderColor':'#fff','lineColor':'#8a3ffc','secondaryColor':'#33b1ff','tertiaryColor':'#42be65','noteBkgColor':'#262626','noteTextColor':'#fff'}}}%%
sequenceDiagram
    participant SDK
    participant Gateway
    participant DB

    SDK->>Gateway: Request + API key
    Gateway->>DB: Validate key
    DB->>Gateway: Key valid + user
    Gateway->>SDK: Response
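
The key lookup in this flow might look like the sketch below. It is an illustration under assumptions: SQLite stands in for PostgreSQL so the example runs anywhere, the `api_keys` table is hypothetical, and storing only a SHA-256 digest of each key is a common practice rather than a confirmed detail of the Gateway.

```python
import hashlib
import secrets
import sqlite3


def hash_key(api_key):
    """Store and compare digests so raw keys never sit in the database."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def create_key(conn, username):
    """Generate a random key, persist its digest, and return the raw key once."""
    api_key = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO api_keys (key_hash, username) VALUES (?, ?)",
        (hash_key(api_key), username),
    )
    return api_key


def validate_key(conn, api_key):
    """Return the owning username, or None if the key is unknown."""
    row = conn.execute(
        "SELECT username FROM api_keys WHERE key_hash = ?",
        (hash_key(api_key),),
    ).fetchone()
    return row[0] if row else None


# In-memory SQLite database standing in for the metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_keys (key_hash TEXT PRIMARY KEY, username TEXT NOT NULL)"
)
key = create_key(conn, "testuser")
print(validate_key(conn, key))      # testuser
print(validate_key(conn, "wrong"))  # None
```

The "Key valid + user" message in the diagram corresponds to the username returned by `validate_key`.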

📊 Resource Requirements

Minimum (Local Deployment)

  • CPU: 8 cores
  • RAM: 16 GB
  • Disk: 100 GB
  • GPU: None (CPU only)

Cluster Deployment

  • Worker Nodes: 3-5
  • CPU per Node: 8-16 cores
  • RAM per Node: 32-64 GB
  • GPU: NVIDIA GPU for training
  • Storage: 200-500 GB

🔍 Monitoring & Observability

Logs

  • Application Logs: Captured by Kubernetes
  • Access Logs: Gateway API logs
  • Training Logs: Stored in object storage

Metrics

  • MLflow: Training metrics and model performance
  • Kubernetes: Resource utilization
  • Prometheus: (Optional) System metrics

Health Checks

  • Gateway API: /health endpoint
  • Database: Connection pooling
  • Object Storage: Bucket accessibility
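
A basic probe of the Gateway's /health endpoint could be written as below. This is a sketch: it assumes the endpoint answers with a 2xx status when healthy, and the `opener` parameter is a hypothetical hook added so the function can be exercised without a running deployment.

```python
import urllib.error
import urllib.request


def check_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with a 2xx status, else False."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False


# Against the local port-forward described earlier, a call might look like:
# check_health("http://localhost:4181/health")
```

Returning a boolean instead of raising keeps the probe easy to use in scripts and readiness loops.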

🛠️ Extensibility

Custom Processors

Add custom processing steps to inference pipelines:

  • Python-based processors
  • Docker container integration
  • Configurable parameters
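
To make the idea of a Python-based processor with configurable parameters concrete, here is a hypothetical sketch. The registry, the decorator, and the step format are invented for illustration and are not the Studio's actual processor interface.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]

# Hypothetical registry mapping processor names to functions.
PROCESSORS: Dict[str, Callable[..., Grid]] = {}


def processor(name):
    """Register a function under a name so pipeline configs can refer to it."""
    def register(fn):
        PROCESSORS[name] = fn
        return fn
    return register


@processor("threshold")
def threshold(grid, cutoff=0.5):
    """Turn scores into a binary map."""
    return [[1.0 if v >= cutoff else 0.0 for v in row] for row in grid]


@processor("mask")
def apply_mask(grid, mask=None, fill=0.0):
    """Replace masked cells (for example cloud cover) with a fill value."""
    if mask is None:
        return grid
    return [
        [fill if m else v for v, m in zip(row, mask_row)]
        for row, mask_row in zip(grid, mask)
    ]


def run_chain(grid, steps):
    """Apply each configured step in order, passing its parameters through."""
    for step in steps:
        grid = PROCESSORS[step["name"]](grid, **step.get("params", {}))
    return grid


result = run_chain(
    [[0.2, 0.9], [0.7, 0.4]],
    [
        {"name": "threshold", "params": {"cutoff": 0.5}},
        {"name": "mask", "params": {"mask": [[False, True], [False, False]]}},
    ],
)
print(result)  # [[0.0, 0.0], [1.0, 0.0]]
```

Keeping parameters in plain dictionaries means a chain like this could be described in configuration rather than code.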

Model Integration

Support for various model formats:

  • PyTorch checkpoints
  • ONNX models
  • TensorFlow SavedModel
  • Custom model loaders

Data Connectors

Integrate with data sources:

  • Sentinel Hub
  • AWS S3
  • Google Earth Engine
  • Custom APIs
