Building an OGC API: The Foundation for AI GIS

šŸ“… February 11, 2026 āœļø Dhruv Koli

Introduction

Enabling AI agents to perform complex spatial analysis tasks such as identifying new city parks within a five‑minute walk of a school requires a reliable, standards‑compliant map engine. The OGC API - Processes specification provides exactly that: a well‑defined set of HTTP endpoints that any client, including AI systems, can invoke consistently.

This post documents the backend I built for my Google Summer of Code (GSoC) 2026 application with 52°North. I focus on the technical decisions, the trade‑offs I evaluated, and how you can reproduce a production‑grade, OGC‑compliant spatial service.

The Code Challenge

Task 1: Deploy and rigorously configure a backend that fully implements the OGC API – Processes standard, including containerization and production considerations.

Task 2: Validate the setup by implementing a meaningful geospatial process, registering it, and exercising the complete execution lifecycle (submission, asynchronous monitoring, and result retrieval).

The following sections walk through my approach, the stack I selected, and the architectural reasoning behind each choice.


Approach Task 1: The Backend & Standards

Before writing any code, it is essential to understand what the OGC API – Processes standard actually defines.

The specification establishes a common contract for exposing geoprocessing capabilities over HTTP. By adhering to it, we avoid the fragmentation that would otherwise force every GIS backend to invent its own API, making it very difficult for AI agents or generic clients to interoperate.

| Endpoint | Description |
| --- | --- |
| GET /processes | Lists all processes the server can execute. |
| GET /processes/{id} | Returns the definition (inputs/outputs) of a specific process. |
| POST /processes/{id}/execution | Triggers execution of the chosen process with supplied parameters. |
| GET /jobs/{jobId} | Retrieves the current status of an asynchronous job. |
| GET /jobs/{jobId}/results | Returns the results of a completed job. |

Choosing the Tech Stack

For the implementation I selected pygeoapi. As the reference open‑source implementation of the OGC API standards, it guarantees strict compliance out of the box and provides a minimal, extensible Python framework.

Why pygeoapi over a custom solution? Building a standards‑compliant server from scratch would require re‑implementing request validation, OpenAPI generation, and many edge‑case behaviours that the OGC community has already codified. Leveraging pygeoapi lets me focus on the domain‑specific process logic while trusting the underlying stack to handle protocol details.

OGC API Architecture Diagram showing a client requesting an HTTP endpoint hitting Gunicorn, pygeoapi, and Shapely.

Approach Task 2: Executing a Spatial Process

To demonstrate the backend, I implemented a classic geometry buffer process.

A buffer expands a geometry outward by a specified distance, producing a new polygon that represents the area of influence.

In pygeoapi, a process is defined by a Python class that declares its inputs (a GeoJSON geometry and a distance) and its output (the buffered geometry).

Code snippet showing the implementation of the GeometryBufferProcessor class.
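A dependency-light sketch of what such a class looks like. This is a sketch, not my exact code: the real processor subclasses pygeoapi.process.base.BaseProcessor and carries fuller metadata, both trimmed here so the snippet runs standalone:

```python
from shapely.geometry import mapping, shape

# Trimmed process description; the real metadata dict also carries
# titles, descriptions, and JSON-schema details per the OGC Processes spec.
PROCESS_METADATA = {
    'id': 'geometry-buffer',
    'title': 'Geometry Buffer',
    'inputs': {'geometry': 'GeoJSON geometry', 'distance': 'Buffer distance'},
    'outputs': {'buffered': 'Buffered GeoJSON geometry'},
}


class GeometryBufferProcessor:
    """Buffers a GeoJSON geometry outward by a given distance."""

    def execute(self, data):
        geom = shape(data['geometry'])      # GeoJSON dict -> Shapely geometry
        distance = float(data['distance'])
        buffered = geom.buffer(distance)    # the actual spatial operation
        # pygeoapi processors return a (mimetype, outputs) tuple
        return 'application/json', mapping(buffered)
```

pygeoapi discovers the real class through the plugin name given in the configuration file; the execute contract shown here is the part the challenge exercises.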

The processor is registered via pygeoapi's YAML configuration:

Code snippet showing the pygeoapi YAML configuration for the GeometryBufferProcessor.
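In pygeoapi's config file, a process is just another resource of type process; a fragment of roughly this shape registers it (the dotted module path reflects my assumed project layout, not necessarily yours):

```yaml
resources:
  geometry-buffer:
    type: process
    processor:
      # Dotted path to the custom processor class on PYTHONPATH
      name: geometry_buffer.GeometryBufferProcessor
```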

Here is how the buffer process looked in the end.

Info about the buffer process

Understanding the Async Lifecycle

The challenge explicitly required exercising the full lifecycle of an asynchronous process execution.

Spatial operations can be computationally intensive; a request that buffers an entire country would exceed typical HTTP timeouts. Therefore, the API must support asynchronous job handling.

My client script follows these steps:

Client script for the GeometryBufferProcessor.
  1. Submission: POST a square geometry to /processes/geometry-buffer/execution.
  2. Tracking: The server immediately returns 201 Created with a unique jobId.
  3. Monitoring: The client polls GET /jobs/{jobId} until the status changes to successful.
  4. Retrieval: Finally, GET /jobs/{jobId}/results returns the buffered GeoJSON.
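The four steps above can be sketched with only the standard library. The host, port, payload field names, and helper names here are my assumptions for illustration, not a verbatim copy of the script:

```python
import json
import time
import urllib.request

BASE = "http://localhost:5000"  # assumed local deployment


def execution_request(geometry, distance):
    """Build the OGC API - Processes execution body for the buffer process."""
    return {"inputs": {"geometry": geometry, "distance": distance}}


def post_json(url, body, headers=None):
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read())


def get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def run_buffer_job(geometry, distance, poll_s=0.5):
    # 1. Submission: async mode is requested via the Prefer header
    status, job = post_json(
        f"{BASE}/processes/geometry-buffer/execution",
        execution_request(geometry, distance),
        headers={"Prefer": "respond-async"},
    )
    job_id = job["jobID"]  # 2. Tracking: field name per the OGC status schema
    while True:            # 3. Monitoring: poll until a terminal status
        state = get_json(f"{BASE}/jobs/{job_id}")["status"]
        if state in ("successful", "failed", "dismissed"):
            break
        time.sleep(poll_s)
    # 4. Retrieval
    return get_json(f"{BASE}/jobs/{job_id}/results")


# Example (requires the server running on localhost):
#   square = {"type": "Polygon",
#             "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
#   print(run_buffer_job(square, 0.5))
```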
JSON output showing the expanded coordinates of the buffered polygon

The mathematical result: our 1x1 square, expanded outward and represented as GeoJSON geometry.
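That expansion can be sanity-checked in closed form: a square of side s buffered by distance d gains four s-by-d side rectangles plus four quarter-circle corners, so the area is s² + 4sd + πd². A quick check (d = 0.5 is just an example distance, not necessarily the one used above):

```python
import math


def buffered_square_area(side, d):
    """Exact area of a square buffered outward by d with rounded corners:
    the original square + four side rectangles + four quarter circles."""
    return side * side + 4 * side * d + math.pi * d * d


print(buffered_square_area(1, 0.5))  # 1 + 2 + pi/4, about 3.785
```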

Here's a visualization of our 1x1 square before and after buffering.

Visualization graphic showing a 1x1 polygon being expanded into a larger buffered polygon with rounded corners.

Visualizing the Buffer: Note how the original square's sharp corners become perfectly rounded in the expansion.

Debugging a Config Error

Containerizing the application initially caused an infinite restart loop; docker ps showed the container stuck restarting:

CONTAINER ID   IMAGE              STATUS
0713fd49497f   ogc-mcp-pygeoapi   Restarting (255) 4 seconds ago

pygeoapi loads custom processors dynamically, so the container needed to expose the directory containing my code via PYTHONPATH.

ENV PYTHONPATH="/pygeoapi"

Adding this line to the Dockerfile resolved the import error, and the full lifecycle test suite passed.
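For context, the relevant part of the Dockerfile ends up looking roughly like this (the base image and file names are assumptions about the layout, not a verbatim copy):

```dockerfile
FROM geopython/pygeoapi:latest
COPY geometry_buffer.py /pygeoapi/geometry_buffer.py
COPY local.config.yml /pygeoapi/local.config.yml
# Without this, pygeoapi's dynamic import of the custom processor fails
# at startup and the container dies, triggering the restart loop above.
ENV PYTHONPATH="/pygeoapi"
```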

Terminal output showing 17 passed tests in a row with green checkmarks

Performance Metrics

After fixing all tests, I ran validate_tests.py against a local Docker container to capture real-world latency for each endpoint. The suite performs 35 assertions and a 5× repeated latency benchmark on the lightweight endpoints.

All measurements are single-node, unwarmed, on localhost (no network hop). They represent the baseline overhead of the full pygeoapi + Gunicorn + Shapely stack.

| Endpoint / Operation | p50 latency | avg latency | Notes |
| --- | --- | --- | --- |
| GET / | 7 ms | 20 ms | Cold-start variance; warm avg ā‰ˆ 7 ms |
| GET /conformance | 6 ms | 7 ms | Pure metadata, sub-10 ms consistently |
| GET /processes | 9 ms | 10 ms | Process registry scan |
| GET /processes/geometry-buffer | 9 ms | 9 ms | Single-process description |
| POST /execution — hello-world (sync) | 21 ms | 21 ms | Trivial echo process |
| POST /execution — buffer Point (sync) | 37 ms | 37 ms | Shapely buffer call included |
| POST /execution — buffer LineString (sync) | 14 ms | 14 ms | Simpler geometry, faster |
| POST /execution — async submit | 15 ms | 15 ms | Returns 201 + Location header immediately |
| Async job polling (total wait) | 16 ms | 16 ms | Job finished within the first poll cycle |
| GET /jobs/{id}/results | 9 ms | 9 ms | TinyDB read + JSON serialisation |

Important takeaway: even the heaviest synchronous buffer operation completes in under 40 ms on localhost. The async path is even faster to respond (15 ms) because the server returns a job handle immediately and the geometry computation happens in the background.

Bottleneck to watch: At scale, the bottleneck would shift from Shapely (which is pure-C and very fast for typical inputs) to TinyDB's file-lock contention under concurrent writes. A PostgreSQL-backed job manager would eliminate this.

What’s Next

1. MCP (Model Context Protocol) Integration

The longer-term goal is to expose these OGC endpoints as MCP tools, so that LLM agents (Claude, Gemini, etc.) can invoke spatial processes directly via natural language. This means wrapping each OGC endpoint in an MCP tool definition and hosting a compliant MCP server alongside pygeoapi.

2. Replace TinyDB with a Production Job Store

TinyDB is a single-file JSON store with no concurrency guarantees. Under load, parallel job submissions will cause write conflicts. The next step is to replace it with a PostgreSQL-backed manager or a dedicated queue like Celery + Redis, which pygeoapi already supports via its manager plugin architecture.
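Swapping the store is a configuration change rather than a rewrite: the job manager is declared in pygeoapi's server block, roughly like this today (the connection path is an assumption), and a PostgreSQL-backed manager plugin would replace the name and connection values:

```yaml
server:
  manager:
    name: TinyDB
    connection: /tmp/pygeoapi-process-manager.db
    output_dir: /tmp/
```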

3. True Async Workers

Currently, the "async" mode in pygeoapi runs the process in a thread pool managed by the WSGI server, not a separate worker process. For CPU-intensive tasks (e.g., buffering a country-scale geometry with millions of vertices), the computation should be offloaded to a dedicated Celery worker so the HTTP server remains non-blocking.

4. Additional OGC Processes

The buffer process demonstrates the pattern, but a useful MCP-connected backend would expose a richer library: intersection, union, simplification, area/perimeter calculation, and coordinate reprojection. Each can be implemented as a new BaseProcessor subclass with minimal boilerplate.

5. Authentication & Rate Limiting

A public OGC API endpoint needs OAuth 2.0 / API-key authentication and per-client rate limiting to prevent abuse. pygeoapi supports security plugins, and a reverse proxy layer (NGINX or AWS API Gateway) can enforce per-IP limits at the edge.
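At the proxy layer, per-IP throttling takes only a few NGINX directives; the zone size and rates below are illustrative defaults, not tuned values:

```nginx
# In the http block: a shared-memory zone keyed by client IP
limit_req_zone $binary_remote_addr zone=ogc_api:10m rate=10r/s;

server {
    listen 80;
    location / {
        limit_req zone=ogc_api burst=20 nodelay;
        proxy_pass http://pygeoapi:80;
    }
}
```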

6. CI/CD Pipeline

The natural next step is to run tests on every pull request via GitHub Actions, building the Docker image, spinning up the container, running the test suite, and failing the build if any check regresses.
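A minimal workflow of that shape might look like this (the file paths, image name, and port mapping are assumptions from my setup):

```yaml
# .github/workflows/ci.yml
name: CI
on: [pull_request]

jobs:
  lifecycle-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the Docker image
        run: docker build -t ogc-mcp-pygeoapi .
      - name: Start the container
        run: docker run -d -p 5000:80 --name api ogc-mcp-pygeoapi
      - name: Run the lifecycle test suite
        run: python validate_tests.py
```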


Thanks for reading

You can check out my full open-source codebase, including the Docker setup and validation scripts, right here on GitHub.