Troubleshooting Guide
Common issues and solutions when working with IBM Geospatial Studio.
Deployment-Specific Commands
This guide provides commands for both Local (Lima VM) and Cluster (Kubernetes/OpenShift) deployments. Use the tabs to switch between deployment types where applicable.
- Local Deployment: Uses Lima VM running Kubernetes on your laptop/workstation
- Cluster Deployment: Uses production Kubernetes or OpenShift clusters
Deployment Issues
Services Fail to Start
Problem: Services fail to start or pods/containers exit immediately.
Solutions:
Local Deployment:
- Check Lima VM status:
- Check pod status in Lima VM:
- View pod logs:
- Ensure sufficient resources:
  - Minimum 16GB RAM
  - 100GB free disk space
- Check Lima VM disk space:
limactl shell studio df -h
- Restart Lima VM if needed:
Cluster Deployment:
- Check pod status:
- View pod logs:
- Describe pod for details:
- Check resource quotas:
- Verify node resources:
- Check for ImagePullBackOff errors:
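For the local resource minimums listed above, the free-disk requirement can also be checked from Python. This is a minimal sketch, not part of the Studio tooling: the helper names are hypothetical, 1 GB is assumed to mean 10^9 bytes, and the check looks at the host path you pass in rather than inside the Lima VM (the `limactl shell studio df -h` step covers the VM). The 16GB RAM minimum is not checked here.

```python
import shutil

MIN_FREE_DISK_GB = 100  # minimum free disk space stated in this guide


def free_disk_gb(path="."):
    """Return the free disk space at `path` in GB (assumed 10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_disk(free_gb, required_gb=MIN_FREE_DISK_GB):
    """Return True when the free space meets the stated minimum."""
    return free_gb >= required_gb


if __name__ == "__main__":
    free = free_disk_gb()
    status = "OK" if has_enough_disk(free) else "LOW"
    print(f"Free disk: {free:.1f} GB (need {MIN_FREE_DISK_GB} GB): {status}")
```

Pass the directory that holds your Lima data (for example your home directory) to `free_disk_gb` if it lives on a different volume than the current directory.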
Port Forwarding Issues
Problem: Port forwarding fails or disconnects frequently.
Solutions:
Local Deployment:
- Check if port forwarding is active:
- Restart port forwarding:
# Kill existing port-forwards
pkill -f "kubectl port-forward"
# Set kubeconfig and namespace
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
export OC_PROJECT=default
# Restart all port-forwards
kubectl port-forward -n $OC_PROJECT svc/keycloak 8080:8080 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/postgresql 54320:5432 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/geofm-geoserver 3000:3000 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-ui 4180:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-gateway 4181:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-mlflow 5000:5000 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/minio 9001:9001 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/minio 9000:9000 >> studio-pf.log 2>&1 &
- Check Lima VM network:
Cluster Deployment:
- Check if port forwarding is active:
- Restart port forwarding:
# Kill existing port-forwards
pkill -f "port-forward"
# Restart required port-forwards
kubectl port-forward -n <namespace> svc/minio 9000:9000 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> svc/minio 9001:9001 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> svc/postgresql 54320:5432 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> svc/keycloak 8080:8080 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> svc/geofm-geoserver 3000:3000 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> deployment/geofm-ui 4180:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> deployment/geofm-gateway 4181:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n <namespace> deployment/geofm-mlflow 5000:5000 >> studio-pf.log 2>&1 &
- Use kubectl proxy as an alternative:
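To see which forwarded ports are actually accepting connections, a small TCP probe can complement the steps above. This is a minimal sketch, assuming the local port numbers from the port-forward commands in this section; the `STUDIO_PORTS` mapping and the `is_port_open` helper are hypothetical names, not part of the Studio tooling.

```python
import socket

# Local ports taken from the port-forward commands in this section.
STUDIO_PORTS = {
    "keycloak": 8080,
    "postgresql": 54320,
    "geofm-geoserver": 3000,
    "geofm-ui": 4180,
    "geofm-gateway": 4181,
    "geofm-mlflow": 5000,
    "minio (9000)": 9000,
    "minio (9001)": 9001,
}


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in STUDIO_PORTS.items():
        state = "open" if is_port_open(port) else "closed"
        print(f"{name:18s} localhost:{port:<6d} {state}")
```

A port reported as closed usually means the matching `kubectl port-forward` process has exited; `studio-pf.log` is a reasonable first place to look for the reason.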
Permission Denied Errors
Problem: Permission errors when running scripts or accessing files.
Solutions:
- Check service account permissions:
- Check RBAC permissions:
- For OpenShift, check Security Context Constraints (SCC):
- Check pod security policies:
Configuration Not Loading
Problem: Services can't find configuration or environment variables.
Solutions:
- Verify workspace env files exist:
- Check environment variable format:
- Validate environment variables:
- Source environment files:
- Verify kubeconfig is set:
- Check ConfigMaps:
- Check Secrets:
- Verify environment variables in pod:
- Update ConfigMap and restart pods:
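As a supplement to the "Check environment variable format" step above, the sketch below flags lines in an env file that are not blank, comments, or simple assignments. It is an illustration under stated assumptions, not the logic of `validate-env-files.py`: the accepted line shapes (`KEY=value` and `export KEY=value`, with no spaces around `=`) and the helper name `find_malformed_lines` are assumptions, and an `env.sh` that contains other shell statements will be reported even if it is valid shell.

```python
import re

# Accepts KEY=value and "export KEY=value". This pattern is an assumption
# about the expected format, not something the Studio tooling defines.
ASSIGNMENT = re.compile(r"^(export\s+)?[A-Za-z_][A-Za-z0-9_]*=.*$")


def find_malformed_lines(text):
    """Return (line_number, line) pairs that are not blank, comments, or assignments."""
    malformed = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if not ASSIGNMENT.match(line):
            malformed.append((number, line))
    return malformed
```

To use it, read a file such as `workspace/<deployment-env>/env/.env` into a string and pass it to `find_malformed_lines`; an empty list means no suspicious lines were found.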
Storage Issues
Problem: PVC not binding or storage errors.
Solutions:
- Check PVC status:
- Check storage classes:
- Check PV availability:
- Verify IBM Object CSI Driver (for S3 storage):
- Check node labels (for local storage):
Authentication Issues
Cannot Generate API Key
Problem: API key generation fails in UI.
Solutions:
- Check if you have existing keys:
  - Maximum 2 active keys per user
  - Delete old keys before creating new ones
- Verify authentication:
  - Log out and log back in
  - Clear browser cache and cookies
- Check backend logs:
Keycloak Authentication Fails
Problem: Cannot log in or Keycloak returns errors.
Solutions:
Local Deployment:
- Check Keycloak pod:
- Verify Keycloak setup:
- Check port forwarding:
- Restart Keycloak port-forward if needed:
Cluster Deployment:
- Check Keycloak pod:
- Verify Keycloak configuration:
- Check OAuth environment variables:
- For OpenShift, check routes:
SDK Authentication Fails
Problem: Client() initialization fails with authentication error.
Solutions:
- Verify API key format:
- Ensure correct URL:
- Check SSL certificate:
- Verify API key is valid:
Data Issues
MinIO/S3 Connection Fails
Problem: Cannot connect to object storage.
Solutions:
Local Deployment:
- Check MinIO pod:
- Verify MinIO credentials:
- Test MinIO connection:
- Restart MinIO port-forwards if needed:
- Verify buckets were created:
Cluster Deployment:
- Check MinIO pod:
- Verify MinIO service:
- Check MinIO TLS secret:
- Test MinIO connectivity:
- Verify buckets were created:
Dataset Onboarding Fails
Problem: Dataset upload or onboarding process fails.
Solutions:
- Check file format:
  - Must be a ZIP file
  - Contains matching data and label pairs
  - Files have correct suffixes
- Verify file structure:
- Check file size limits:
  - Individual files: < 2GB
  - Total dataset: < 10GB
- Validate band configuration:
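The file-format and size checks above can be run locally before uploading. The sketch below is one way to do that with the standard library; it is not the Studio's own validation. Assumptions: 1 GB means 10^9 bytes, the limits apply to uncompressed sizes, and the data and label suffixes are passed in by the caller because this guide does not state them. The function name `check_dataset_zip` is hypothetical.

```python
import zipfile

GB = 10**9
MAX_FILE_BYTES = 2 * GB    # guide: individual files < 2GB
MAX_TOTAL_BYTES = 10 * GB  # guide: total dataset < 10GB


def check_dataset_zip(path, data_suffix=None, label_suffix=None,
                      max_file_bytes=MAX_FILE_BYTES,
                      max_total_bytes=MAX_TOTAL_BYTES):
    """Return a list of problems found; an empty list means none were found."""
    if not zipfile.is_zipfile(path):
        return [f"{path} is not a ZIP file"]

    with zipfile.ZipFile(path) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]

    problems = []
    for info in files:
        if info.file_size >= max_file_bytes:
            problems.append(
                f"{info.filename}: {info.file_size} bytes is not under the per-file limit")
    total = sum(info.file_size for info in files)
    if total >= max_total_bytes:
        problems.append(f"total size {total} bytes is not under the dataset limit")

    # Optional pairing check. The suffixes are caller-supplied assumptions.
    if data_suffix and label_suffix and data_suffix != label_suffix:
        stems = {data_suffix: set(), label_suffix: set()}
        for info in files:
            matches = [s for s in stems if info.filename.endswith(s)]
            if matches:
                suffix = max(matches, key=len)  # longest matching suffix wins
                stems[suffix].add(info.filename[:-len(suffix)])
        # Stems present on only one side have no partner file.
        for stem in sorted(stems[data_suffix] ^ stems[label_suffix]):
            problems.append(f"{stem}: data/label pair is incomplete")
    return problems
```

Call it with the suffixes your dataset actually uses; without suffixes only the ZIP and size checks run.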
Cannot Access Pre-computed Examples
Problem: Example datasets not visible in UI or SDK.
Solutions:
- Check if examples are loaded:
- Verify backend is running:
- Check database initialization:
File Not Found Errors in Notebooks
Problem: You see errors like:
Solutions:
Option 1: Clone the repository (Recommended)
git clone https://github.com/terrastackai/geospatial-studio.git
cd geospatial-studio/workshop/docs/notebooks
jupyter notebook
Option 2: Download missing files
If you downloaded notebooks individually, you also need to download the JSON configuration files:
- Lab 3 requires:
  - template-seg.json
  - tune-prithvi-eo-flood.json
- Lab 4 requires:
  - backbone-Prithvi_EO_V2_300M.json
  - dataset-burn_scars.json
  - template-seg.json
Download these files from the notebooks directory and place them in the same directory as your notebook.
Verify files are in the correct location:
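One way to run this check from a notebook cell or a terminal is sketched below. The file names come from the lists above; the `REQUIRED_FILES` mapping and the `missing_files` helper are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names as listed in this guide for each lab.
REQUIRED_FILES = {
    "Lab 3": ["template-seg.json", "tune-prithvi-eo-flood.json"],
    "Lab 4": [
        "backbone-Prithvi_EO_V2_300M.json",
        "dataset-burn_scars.json",
        "template-seg.json",
    ],
}


def missing_files(lab, notebook_dir="."):
    """Return the required files for `lab` that are absent from `notebook_dir`."""
    directory = Path(notebook_dir)
    return [name for name in REQUIRED_FILES[lab] if not (directory / name).is_file()]


if __name__ == "__main__":
    for lab in REQUIRED_FILES:
        missing = missing_files(lab)
        print(f"{lab}: {'all files present' if not missing else 'missing ' + ', '.join(missing)}")
```

Run it from the directory that contains the notebook, or pass that directory as `notebook_dir`.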
Model Training Issues
Fine-tuning Job Fails
Problem: Training job fails or gets stuck.
Solutions:
- Check GPU availability:
# Check GPU nodes
kubectl get nodes -o json | jq '.items[].status.capacity."nvidia.com/gpu"'
# Check GPU operator
kubectl get pods -n gpu-operator-resources
# Verify node labels
kubectl get nodes --show-labels | grep nvidia
# Check GPU resource allocation
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
- Verify dataset is onboarded:
- Check training parameters:
- Monitor MLflow logs:
  - Access MLflow UI at http://localhost:5000
  - Check experiment logs for errors
- Ensure MLflow port-forward is active:
Out of Memory (OOM) Errors
Problem: Training fails with CUDA out of memory.
Solutions:
- Reduce batch size:
- Use gradient accumulation:
- Enable mixed precision:
- Clear GPU cache:
- Check GPU memory:
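Reducing the batch size and using gradient accumulation work together: a smaller per-step batch lowers peak GPU memory, and accumulating gradients over several steps keeps the effective batch size unchanged. The arithmetic can be sketched as below. The function name is hypothetical, and how the two values map onto the Studio's tuning parameters is not shown in this guide, so treat the result as numbers to plug into whatever batch-size and accumulation settings your tuning template exposes.

```python
def accumulation_plan(target_batch_size, max_batch_that_fits):
    """Return (per_step_batch_size, accumulation_steps).

    Picks the largest per-step batch that fits in memory and divides the
    target evenly, so per_step * steps equals the target batch size.
    """
    if target_batch_size < 1 or max_batch_that_fits < 1:
        raise ValueError("batch sizes must be positive integers")
    start = min(target_batch_size, max_batch_that_fits)
    for per_step in range(start, 0, -1):
        if target_batch_size % per_step == 0:
            return per_step, target_batch_size // per_step


if __name__ == "__main__":
    per_step, steps = accumulation_plan(target_batch_size=32, max_batch_that_fits=12)
    print(f"batch size {per_step} with {steps} accumulation steps")
```

For a target of 32 with at most 12 samples fitting in memory, this yields a per-step batch of 8 accumulated over 4 steps.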
Inference Issues
Inference Request Fails
Problem: Inference submission returns error.
Solutions:
- Verify model is deployed:
- Check spatial domain format:
- Validate temporal domain:
- Check data availability:
  - Ensure satellite data exists for your date range
  - Try a different date if no data is available
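Many failed submissions come down to a malformed bounding box or date range, so a quick local sanity check before submitting can help. The sketch below is illustrative only: it assumes a bounding box given as `[min_lon, min_lat, max_lon, max_lat]` in degrees and ISO `YYYY-MM-DD` dates, which may not match the exact request schema of your Studio version. The helper names are hypothetical.

```python
from datetime import date


def bbox_problems(bbox):
    """Return problems with an assumed [min_lon, min_lat, max_lon, max_lat] box."""
    if len(bbox) != 4:
        return ["bbox must have 4 numbers"]
    min_lon, min_lat, max_lon, max_lat = bbox
    problems = []
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        problems.append("longitude out of range [-180, 180]")
    if not (-90 <= min_lat <= 90 and -90 <= max_lat <= 90):
        problems.append("latitude out of range [-90, 90]")
    if min_lon >= max_lon or min_lat >= max_lat:
        problems.append("min values must be smaller than max values")
    return problems


def date_range_problems(start, end, today=None):
    """Return problems with an assumed ISO-formatted start/end date pair."""
    today = today or date.today()
    try:
        start_date = date.fromisoformat(start)
        end_date = date.fromisoformat(end)
    except ValueError:
        return ["dates must be ISO formatted (YYYY-MM-DD)"]
    problems = []
    if start_date > end_date:
        problems.append("start date is after end date")
    if end_date > today:
        problems.append("end date is in the future")
    return problems
```

An empty list from both helpers only means the values are well-formed; it does not confirm that satellite data exists for that area and date range.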
Inference Takes Too Long
Problem: Inference job runs for hours without completing.
Solutions:
- Reduce spatial extent:
- Check task status:
- Monitor backend logs:
Cannot Download Inference Results
Problem: Download links expired or files not found.
Solutions:
- Check task completion:
- Regenerate download links:
- Use SDK download widget:
Network Issues
Cannot Access UI
Problem: Cannot reach the Studio UI in browser.
Solutions:
Local Deployment:
Problem: Browser cannot reach https://localhost:4180.
- Check Lima VM is running:
- Check if pods are running:
- Verify port forwarding is active:
- Test endpoint:
- Check firewall settings:
- Try a different browser:
  - Clear cache and cookies
  - Try incognito/private mode
  - Accept self-signed certificate
- Check Lima VM logs:
Cluster Deployment:
Problem: Cannot reach Studio UI via ingress/route.
- Check ingress/route status:
- Verify DNS resolution:
- Test internal connectivity:
- Check ingress controller:
- Verify TLS certificates:
SSL Certificate Errors
Problem: Browser shows SSL/TLS errors.
Solutions:
- Accept self-signed certificate:
  - Click "Advanced" → "Proceed to localhost"
  - Add exception in browser settings
- For cluster deployments, check certificate:
- Regenerate certificates if needed:
Geoserver Connection Issues
Problem: Cannot access Geoserver or layers not loading.
Solutions:
Local Deployment:
- Check Geoserver pod:
- Verify Geoserver port-forward:
- Test Geoserver endpoint:
- Verify Geoserver credentials:
- Re-run Geoserver setup:
Cluster Deployment:
- Check Geoserver pod:
- Verify Geoserver service:
- Test Geoserver connectivity:
- Re-run Geoserver setup:
- For OpenShift with SCC issues:
Database Issues
Database Connection Fails
Problem: Services cannot connect to PostgreSQL.
Solutions:
Local Deployment:
- Check PostgreSQL pod:
- Verify database credentials:
- Check database port-forward:
- Test database connection:
- Check database logs:
- Re-create databases if needed:
Cluster Deployment:
- Check PostgreSQL pod:
- Verify database credentials:
- Check database connectivity:
- Check PVC status:
- Re-create databases:
Database Migration Fails
Problem: Database schema migration errors.
Solutions:
- Check database logs for errors:
- Verify the database exists:
- Re-run database creation:
Debugging Tips
Enable Debug Logging
Check Service Health
# Set kubeconfig
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
# Check all pods
kubectl get pods -n default
# Check specific pod logs
kubectl logs <pod-name> -n default -f
# Check resource usage
kubectl top pods -n default
kubectl top nodes
# Check Lima VM status
limactl list
# Check Lima VM resources
limactl shell studio
# Inside VM:
df -h
free -h
top
Inspect Container/Pod
Local Deployment:
# Set kubeconfig
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
# Enter running pod
kubectl exec -it <pod-name> -n default -- /bin/bash
# Check environment variables
kubectl exec <pod-name> -n default -- env
# Check file system
kubectl exec <pod-name> -n default -- ls -la
# Copy files from pod
kubectl cp default/<pod-name>:/path/to/file ./local-file
# Access Lima VM directly
limactl shell studio
Cluster Deployment:
# Enter running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
# Check environment variables
kubectl exec <pod-name> -n <namespace> -- env
# Check file system
kubectl exec <pod-name> -n <namespace> -- ls -la
# Copy files from pod
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file
Monitor Events
Check Helm Deployment
Validate Environment Configuration
# Use the validation script
python deployment-scripts/validate-env-files.py \
--env-file workspace/<deployment-env>/env/.env \
--env-sh-file workspace/<deployment-env>/env/env.sh \
--env-variables "studio_api_key,access_key_id,secret_access_key" \
--env-sh-variables "DEPLOYMENT_ENV,OC_PROJECT,CLUSTER_URL"
Getting Help
If you're still experiencing issues:
- Collect logs:
# Set kubeconfig
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
# Collect all pod logs
kubectl logs -l app=geospatial-studio -n default --all-containers=true > logs.txt
# Collect events
kubectl get events -n default --sort-by='.lastTimestamp' > events.txt
# Collect Lima VM info
limactl list > lima-status.txt
limactl shell studio df -h > lima-disk.txt
# Collect port-forward logs
cat studio-pf.log > port-forward-logs.txt
- Gather system information:
# Host system info
limactl --version
kubectl version --client
helm version
python --version
pip list
# Lima VM info
limactl list
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
kubectl version
kubectl get nodes -o wide
# Workspace info
ls -la workspace/lima/env/
cat workspace/lima/env/env.sh | grep -E "DEPLOYMENT_ENV|OC_PROJECT"
- Check deployment configuration:
Local Deployment:
# Review workspace environment files
cat workspace/lima/env/env.sh
cat workspace/lima/env/.env
# Review Helm values
cat workspace/lima/values/geospatial-studio/values-deploy.yaml
# Check Lima VM configuration
cat deployment-scripts/lima/studio.yaml # macOS
cat deployment-scripts/lima/studio-linux.yaml # Linux
Cluster Deployment:
# Review workspace environment files
cat workspace/<deployment-env>/env/env.sh
cat workspace/<deployment-env>/env/.env
# Review Helm values
cat workspace/<deployment-env>/values/geospatial-studio/values-deploy.yaml
- Search existing issues:
  - Geospatial Studio Issues
- Create a new issue:
  - Include error messages
  - Provide steps to reproduce
  - Share relevant logs
  - Mention your environment (OS, deployment type, cluster version, etc.)
- Community support:
  - Check FAQ for common questions
  - Review Additional Resources for documentation