FlatGeobuf: Streaming Spatial Data Without a Server

A common problem in web mapping: you have a large vector dataset (say, 2 million parcel polygons) and you want to display it on a map, filtered to the current viewport. The options can feel like a binary choice: either serve vector tiles (complex infrastructure, pre-generation cost) or send the entire dataset to the client (impractical beyond a few thousand features).

FlatGeobuf offers a third path: a binary format with a built-in spatial index, laid out so that it can be queried with HTTP range requests. A web client reads the index, works out which byte ranges hold the features inside a bounding box, and requests only those ranges. The server can be any HTTP server, including a static file host or an S3 bucket; it returns the requested bytes and never needs to understand the spatial query. No vector tile pipeline. No API server. Just a file and an HTTP server that supports range requests.

# What Makes FlatGeobuf Different

FlatGeobuf is built on FlatBuffers, Google’s zero-copy serialisation library. The key design properties:

No parsing overhead: FlatBuffers are designed so that the binary data can be accessed in place without deserialization. Reading a field means reading a few bytes at a known offset, not decoding a JSON string or deserialising a protobuf.

Built-in spatial index: FlatGeobuf embeds a packed Hilbert R-tree index near the start of the file, between the header and the feature data (the index is optional, but writers such as GDAL include it by default). The index stores the bounding box of each feature and the byte offset at which that feature’s data begins. Given a query bbox, the index tells you exactly which byte ranges to request.

HTTP range request compatibility: because the index sits near the start of the file and each feature has a known byte offset, a client can:

1. Fetch the header and the index nodes it needs (for a small file this can be a single range request; for a large file, clients typically walk the tree level by level rather than download the whole index)
2. Compute which feature offsets intersect the query bbox
3. Fetch only those byte ranges (potentially many small range requests, or a few merged ranges)

This turns a static file into a spatially queryable data source.
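The last two steps above can be sketched in a few lines of Python. This is a simplified illustration, not FlatGeobuf's actual reader: the real index is a tree and stores only offsets, whereas the hypothetical `IndexEntry` below is a flat list that also carries each feature's byte size. The names `IndexEntry`, `plan_ranges`, `max_gap`, and `to_range_header` are invented for this example.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical, flattened stand-in for one leaf of the spatial index."""
    bbox: tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    offset: int  # byte offset of the feature within the file
    size: int    # byte length of the feature (a simplification for this sketch)


def intersects(a, b) -> bool:
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def plan_ranges(entries, query_bbox, max_gap=0):
    """Return merged (start, end) byte ranges, end exclusive, covering every hit."""
    hits = sorted(
        (e.offset, e.offset + e.size)
        for e in entries
        if intersects(e.bbox, query_bbox)
    )
    merged = []
    for start, end in hits:
        # Extend the previous range if this hit starts within max_gap bytes of it.
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


def to_range_header(start: int, end: int) -> str:
    """HTTP Range values are inclusive, so the exclusive end is reduced by one."""
    return f"bytes={start}-{end - 1}"


entries = [
    IndexEntry((0, 0, 1, 1), 0, 100),
    IndexEntry((1, 0, 2, 1), 100, 80),
    IndexEntry((5, 5, 6, 6), 180, 120),
    IndexEntry((1.5, 0.5, 2.5, 1.5), 300, 60),
]
query = (0.5, 0.5, 1.8, 0.9)

print(plan_ranges(entries, query))               # [(0, 180), (300, 360)]
print(plan_ranges(entries, query, max_gap=200))  # [(0, 360)]
print(to_range_header(0, 180))                   # bytes=0-179
```

Raising `max_gap` trades a few wasted bytes for fewer round trips, which is usually the better deal on high-latency object storage.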

# Reading FlatGeobuf in Python

import geopandas as gpd

# Read entire file
gdf = gpd.read_file("parcels.fgb")

# Read with bounding box filter — only fetches relevant features from disk/HTTP
bbox = (152.9, -27.6, 153.1, -27.4)  # (xmin, ymin, xmax, ymax)
gdf_subset = gpd.read_file("parcels.fgb", bbox=bbox)
print(f"Loaded {len(gdf_subset):,} features in bbox")

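When bbox= is specified, GDAL (which backs GeoPandas file I/O) uses the FlatGeobuf spatial index to read only the relevant features. A bbox given as a plain tuple is interpreted in the dataset's own CRS, so the example above assumes parcels.fgb stores longitude/latitude coordinates.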

# Writing FlatGeobuf

import geopandas as gpd

gdf = gpd.read_parquet("parcels.parquet")

# Write as FlatGeobuf; GDAL builds the spatial index by default
gdf.to_file("parcels.fgb", driver="FlatGeobuf")

The output file includes the Hilbert R-tree index by default (GDAL's SPATIAL_INDEX layer creation option defaults to YES). No additional configuration is needed.
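As a quick sanity check on a written file, the first few bytes can be inspected directly. The sketch below assumes the layout described in the FlatGeobuf specification: eight magic bytes (`fgb`, a major version byte, `fgb`, a patch version byte) followed by a little-endian 32-bit header size. The function name `read_preamble` is made up for this example, and the sample bytes are synthetic rather than taken from a real file.

```python
import struct


def read_preamble(data: bytes) -> dict:
    """Parse the magic bytes and header size from the start of a FlatGeobuf file."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    if data[0:3] != b"fgb" or data[4:7] != b"fgb":
        raise ValueError("not a FlatGeobuf file")
    # The header is a size-prefixed FlatBuffer: uint32, little-endian.
    (header_size,) = struct.unpack_from("<I", data, 8)
    return {
        "major_version": data[3],
        "patch_version": data[7],
        "header_size": header_size,
    }


# Synthetic preamble for illustration; with a real file you would use
#   with open("parcels.fgb", "rb") as f: read_preamble(f.read(12))
sample = b"fgb\x03fgb\x00" + struct.pack("<I", 256)
print(read_preamble(sample))
# {'major_version': 3, 'patch_version': 0, 'header_size': 256}
```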

# Remote HTTP Streaming

The most powerful use of FlatGeobuf is streaming from remote HTTP. Using the flatgeobuf Python library (which supports range requests directly):

import asyncio
import flatgeobuf
from shapely.geometry import shape

async def stream_bbox(url: str, query_bbox: tuple):
    """Fetch features within bbox from a remote FlatGeobuf file via HTTP range requests."""
    features = []
    # Check the loader name and its rect= keyword against the documentation
    # of the flatgeobuf release you have installed.
    async for feature in flatgeobuf.load_http(url, rect=query_bbox):
        geom = shape(feature['geometry'])
        props = feature['properties']
        features.append({'geometry': geom, **props})
    return features

# Example: read from a public S3 bucket or GitHub release
url = "https://storage.example.com/australia/parcels.fgb"
bbox = (152.9, -27.6, 153.1, -27.4)

features = asyncio.run(stream_bbox(url, bbox))
print(f"Fetched {len(features):,} features")

The number of HTTP requests depends on how the matching features are clustered in the file, not on how many features match. FlatGeobuf sorts features along a Hilbert curve when it builds the index, so geographically adjacent features tend to be adjacent in the file, and a query typically needs on the order of 5–50 range requests rather than one per feature.
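To see why Hilbert ordering keeps neighbours close together, here is a minimal sketch of the idea: map each feature's bounding-box centre onto a grid, compute its position along a Hilbert curve, and sort by that value. It uses the textbook coordinate-to-index algorithm and is not FlatGeobuf's own implementation; `hilbert_index` and `hilbert_sort` are names chosen for this example.

```python
def hilbert_index(x: int, y: int, order: int = 16) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(bboxes, extent, order: int = 16):
    """Sort (xmin, ymin, xmax, ymax) boxes by the Hilbert index of their centre."""
    xmin, ymin, xmax, ymax = extent
    cells = (1 << order) - 1
    width = (xmax - xmin) or 1.0    # avoid division by zero on a degenerate extent
    height = (ymax - ymin) or 1.0

    def key(b):
        cx = (b[0] + b[2]) / 2
        cy = (b[1] + b[3]) / 2
        gx = int((cx - xmin) / width * cells)
        gy = int((cy - ymin) / height * cells)
        return hilbert_index(gx, gy, order)

    return sorted(bboxes, key=key)


# Four point-sized boxes at the corners of a 2 x 2 extent, in arbitrary order.
corners = [(2, 0, 2, 0), (0, 2, 0, 2), (0, 0, 0, 0), (2, 2, 2, 2)]
print(hilbert_sort(corners, extent=(0, 0, 2, 2)))
# [(0, 0, 0, 0), (0, 2, 0, 2), (2, 2, 2, 2), (2, 0, 2, 0)]
```

Each step along the curve moves to an adjacent grid cell, so a viewport-sized query tends to hit a handful of contiguous runs in the file instead of scattered single features.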

# Performance Comparison for Bbox Queries

Scenario: 2 million parcel polygons, bounding box query returning ~50,000 features (2.5% of dataset).

Method	Infrastructure	Latency (local)	Latency (S3)
Load full GeoParquet	None	4.1s	~30s
GeoParquet + row group filter	None	0.8s	~8s
FlatGeobuf bbox query	None	0.3s	~2s
Vector tiles (MVT)	Tile server	~50ms	~50ms
PostGIS API	PostgreSQL	~0.1s	~0.1s

FlatGeobuf’s streaming approach is not as fast as a vector tile server for display purposes, but it requires zero infrastructure beyond file hosting. For analytical queries over a specific area of interest, it is significantly faster than loading a full GeoParquet file without spatial partitioning.

# Writing FlatGeobuf in Python with Fiona

For more control over the write process:

import fiona
import geopandas as gpd
from fiona.crs import from_epsg

gdf = gpd.read_parquet("input.parquet")

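# Write via Fiona with an explicit schema.
# Every attribute is declared (and written) as a string to keep the example
# short; map dtypes to 'int', 'float', etc. if you need to preserve types.
schema = {
    'geometry': 'Polygon',
    'properties': {col: 'str' for col in gdf.columns if col != 'geometry'}
}

# The CRS is hard-coded, so this assumes the data is already in EPSG:4326
with fiona.open("output.fgb", 'w', driver='FlatGeobuf', schema=schema,
                crs=from_epsg(4326)) as f: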
    for _, row in gdf.iterrows():
        f.write({
            'geometry': row.geometry.__geo_interface__,
            'properties': {col: str(row[col]) for col in gdf.columns if col != 'geometry'}
        })

# FlatGeobuf vs GeoParquet: Which to Use

These formats solve different problems and are complementary, not competing:

Concern	FlatGeobuf	GeoParquet
Web streaming / HTTP range	Excellent (design goal)	Coarse (row-group granularity; no per-feature spatial index)
Columnar analytics	Poor (row-oriented)	Excellent (design goal)
Compression ratio	Good (~3–5× vs GeoJSON)	Excellent (~8–15× vs GeoJSON)
DuckDB / Spark support	Limited	Native
Browser JavaScript support	Yes (flatgeobuf npm package)	Limited
Write once, read many (analytical)	OK	Preferred
Bbox streaming without infrastructure	Yes	No

The common pattern is to maintain both: a GeoParquet file (or partitioned set) for analytical pipelines, and a FlatGeobuf file for web-facing bbox queries.

# JavaScript/Browser Integration

FlatGeobuf has first-class JavaScript support via the flatgeobuf npm package, enabling direct browser streaming:

import { geojson } from 'flatgeobuf';
import maplibregl from 'maplibre-gl';

async function loadBbox(map, bbox) {
  const url = 'https://data.example.com/parcels.fgb';
  const features = [];

  // Streams GeoJSON features via HTTP range requests
  for await (const feature of geojson.deserialize(url, bbox)) {
    features.push(feature);
  }

  map.getSource('parcels').setData({
    type: 'FeatureCollection',
    features
  });
}

// Update on map move
map.on('moveend', () => {
  const bounds = map.getBounds();
  loadBbox(map, {
    minX: bounds.getWest(),
    minY: bounds.getSouth(),
    maxX: bounds.getEast(),
    maxY: bounds.getNorth()
  });
});

This pattern replaces a vector tile server with a static file on any CDN or object storage. For datasets up to a few hundred MB, the performance is sufficient for interactive web mapping — features typically stream in under 200ms for a viewport-sized bbox query.

# Serving FlatGeobuf from S3

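The main server requirement is that your HTTP host supports range requests, which AWS S3 does by default. If browsers on another origin will read the file, the bucket also needs a CORS rule that permits those requests. To upload the file: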

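# Upload to S3 (--acl public-read only works if the bucket allows ACLs;
# otherwise grant public read access with a bucket policy)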
aws s3 cp parcels.fgb s3://your-bucket/data/parcels.fgb \
  --content-type "application/octet-stream" \
  --acl public-read

# The file is now queryable via FlatGeobuf's range-request API
# URL: https://your-bucket.s3.amazonaws.com/data/parcels.fgb

GitHub Releases, Cloudflare R2, and most CDNs also support range requests. FlatGeobuf turns any static file host into a spatially queryable data service.
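Before relying on a host, it is worth confirming that it honours range requests. The sketch below, using only the Python standard library, asks for the first eight bytes of a file and checks for a `206 Partial Content` response with a matching `Content-Range` header. The function names are made up for this example, and the URL in the comment is the placeholder used earlier in this article.

```python
import urllib.request


def is_partial_response(status: int, headers: dict) -> bool:
    """True if a reply to 'Range: bytes=0-7' actually honoured the range."""
    lowered = {k.lower(): v for k, v in headers.items()}
    content_range = lowered.get("content-range", "")
    # A server that ignores Range replies 200 and sends the whole file instead.
    return status == 206 and content_range.startswith("bytes 0-7/")


def supports_range_requests(url: str, timeout: float = 10.0) -> bool:
    """Send a small ranged GET to url and report whether the host honoured it."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-7"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return is_partial_response(resp.status, dict(resp.headers))


# Example call (placeholder URL):
#   supports_range_requests("https://your-bucket.s3.amazonaws.com/data/parcels.fgb")

print(is_partial_response(206, {"Content-Range": "bytes 0-7/1048576"}))  # True
print(is_partial_response(200, {"Content-Length": "1048576"}))           # False
```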