A common problem in web mapping: you have a large vector dataset (say, 2 million parcel polygons) and you want to display them on a map filtered to the current viewport. The options feel like a binary choice — either serve vector tiles (complex infrastructure, pre-generation cost) or send the entire dataset to the client (impractical beyond a few thousand features).
FlatGeobuf offers a third path: a binary format with a built-in spatial index that is designed to be read through HTTP range requests. A web client uses the index to work out which byte ranges hold the features inside a bounding box, then requests only those ranges. The server never sees the bounding box, so it can be any HTTP server that supports range requests, including a static file host or an S3 bucket. No vector tile pipeline. No API server. Just a file and a range-capable HTTP server.
# What Makes FlatGeobuf Different
FlatGeobuf is built on FlatBuffers, Google’s zero-copy serialisation library. The key design properties:
No parsing overhead: FlatBuffers are designed so that the binary data can be accessed in place without deserialisation. Reading a field means reading a few bytes at a known offset, not decoding a JSON string or deserialising a protobuf.
Built-in spatial index: FlatGeobuf embeds a packed Hilbert R-tree index near the start of the file, directly after the header. The index stores the bounding box of each feature and the byte offset at which that feature’s data begins. Given a query bbox, the index tells you exactly which byte ranges to request.
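To get a feel for how large such an index can become, here is a back-of-the-envelope sketch. It assumes 16 entries per tree node and 40 bytes per entry (four 64-bit bbox coordinates plus one 64-bit offset); both numbers are assumptions about the default layout rather than figures from this article, and `packed_index_size` is a hypothetical helper, not a library function.

```python
import math

NODE_ITEM_BYTES = 40  # assumed: 4 x float64 bbox + 1 x uint64 offset


def packed_index_size(num_features: int, node_size: int = 16) -> int:
    """Estimate the byte size of a packed R-tree over num_features items."""
    if num_features < 1:
        raise ValueError("num_features must be positive")
    level = num_features
    total_entries = level  # leaf level: one entry per feature
    while True:
        level = math.ceil(level / node_size)  # entries in the parent level
        total_entries += level
        if level == 1:  # reached the root
            break
    return total_entries * NODE_ITEM_BYTES


size = packed_index_size(2_000_000)
print(f"Estimated index size: {size / 1e6:.1f} MB")
```

Under these assumptions, the index for 2 million features comes to roughly 85 MB, so a client would want to read only the tree nodes a query touches rather than download the index whole.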
HTTP range request compatibility: because the index is at the start of the file and each feature has a known byte offset, a client can:
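- Fetch the header and the index nodes relevant to the query (a few small range requests near the start of the file; the full index does not have to be downloaded)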
- Compute which feature offsets intersect the query bbox
- Fetch only those byte ranges (potentially many small range requests, or a few merged ranges)
This turns a static file into a spatially queryable data source.
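The steps above can be sketched on a toy in-memory index. This is an illustration only, not the FlatGeobuf client implementation: the index here is a flat list scanned linearly instead of an R-tree, each entry carries an explicit byte length for simplicity, and `ranges_for_query`, `range_header`, and `max_gap` are hypothetical names.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
IndexEntry = Tuple[BBox, int, int]         # (bbox, byte offset, byte length)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def ranges_for_query(index: List[IndexEntry], query: BBox,
                     max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return merged, inclusive (start, end) byte ranges for matching features.

    Ranges separated by at most max_gap unused bytes are merged, trading a
    little wasted transfer for fewer HTTP requests.
    """
    hits = sorted((offset, offset + length - 1)
                  for bbox, offset, length in index
                  if intersects(bbox, query))
    merged: List[Tuple[int, int]] = []
    for start, end in hits:
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


toy_index: List[IndexEntry] = [
    ((0.0, 0.0, 1.0, 1.0), 1000, 200),   # bytes 1000-1199
    ((1.0, 0.0, 2.0, 1.0), 1200, 300),   # bytes 1200-1499
    ((5.0, 5.0, 6.0, 6.0), 1500, 100),   # bytes 1500-1599, outside the query
    ((0.5, 0.5, 1.5, 1.5), 1600, 50),    # bytes 1600-1649
]
query = (0.2, 0.2, 1.2, 0.8)

for start, end in ranges_for_query(toy_index, query):
    print(range_header(start, end))
```

With `max_gap=0` the toy query yields two ranges, because the non-matching feature sits between the hits. Raising `max_gap` to 128 collapses them into a single request at the cost of 100 unneeded bytes.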
# Reading FlatGeobuf in Python
import geopandas as gpd
# Read entire file
gdf = gpd.read_file("parcels.fgb")
# Read with bounding box filter — only fetches relevant features from disk/HTTP
bbox = (152.9, -27.6, 153.1, -27.4) # (xmin, ymin, xmax, ymax)
gdf_subset = gpd.read_file("parcels.fgb", bbox=bbox)
print(f"Loaded {len(gdf_subset):,} features in bbox")
When bbox= is specified, GDAL (which backs GeoPandas file I/O) uses the FlatGeobuf spatial index to read only the relevant features.
# Writing FlatGeobuf
import geopandas as gpd
gdf = gpd.read_parquet("parcels.parquet")
# Write as FlatGeobuf; GDAL builds the spatial index by default
gdf.to_file("parcels.fgb", driver="FlatGeobuf")
The output file includes the Hilbert R-tree index by default (GDAL's SPATIAL_INDEX layer creation option defaults to YES). No additional configuration is needed.
# Remote HTTP Streaming
The most powerful use of FlatGeobuf is streaming from a remote HTTP host. Using the flatgeobuf Python library (which supports range requests directly):
import asyncio
import flatgeobuf
from shapely.geometry import shape
async def stream_bbox(url: str, query_bbox: tuple) -> list:
    """Fetch features within bbox from a remote FlatGeobuf file via HTTP range requests."""
    features = []
    # Each feature arrives as a GeoJSON-like dict with 'geometry' and 'properties'
    async for feature in flatgeobuf.load_http(url, rect=query_bbox):
        geom = shape(feature['geometry'])
        props = feature['properties']
        features.append({'geometry': geom, **props})
    return features
# Example: read from a public S3 bucket or GitHub release
url = "https://storage.example.com/australia/parcels.fgb"
bbox = (152.9, -27.6, 153.1, -27.4)
features = asyncio.run(stream_bbox(url, bbox))
print(f"Fetched {len(features):,} features from {url}")
The number of HTTP requests depends on the spatial clustering of matching features. With a good Hilbert curve ordering (which FlatGeobuf uses by default), geographically adjacent features tend to be adjacent in the file, so the index typically generates 5–50 range requests rather than one per feature.
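To illustrate why Hilbert ordering keeps neighbours close together, here is a minimal sketch of the textbook grid-to-curve mapping. It is not FlatGeobuf's own implementation, which differs in detail; `hilbert_index` and the 4×4 grid are illustrative assumptions.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map grid cell (x, y) on a 2**order x 2**order grid to its position
    along a Hilbert curve (standard iterative algorithm)."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell outside grid")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Sort the cells of a 4x4 grid along the curve
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_index(2, c[0], c[1]))
print(ordered)
```

Every pair of consecutive cells in `ordered` shares an edge, so features sorted this way before writing end up stored next to their spatial neighbours, and a viewport query maps to a small number of contiguous byte ranges.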
# Performance Comparison for Bbox Queries
Scenario: 2 million parcel polygons, bounding box query returning ~50,000 features (2.5% of dataset).
| Method | Infrastructure | Latency (local) | Latency (S3) |
|---|---|---|---|
| Load full GeoParquet | None | 4.1s | ~30s |
| GeoParquet + row group filter | None | 0.8s | ~8s |
| FlatGeobuf bbox query | None | 0.3s | ~2s |
| Vector tiles (MVT) | Tile server | ~50ms | ~50ms |
| PostGIS API | PostgreSQL | ~0.1s | ~0.1s |
FlatGeobuf’s streaming approach is not as fast as a vector tile server for display purposes, but it requires zero infrastructure beyond file hosting. For analytical queries over a specific area of interest, it is significantly faster than loading a full GeoParquet file without spatial partitioning.
# Writing FlatGeobuf in Python with Fiona
For more control over the write process:
import fiona
import geopandas as gpd
from fiona.crs import from_epsg
gdf = gpd.read_parquet("input.parquet")
attr_cols = [col for col in gdf.columns if col != 'geometry']
# Write via Fiona with an explicit schema.
# For simplicity every attribute is written as a string; use 'int' or 'float'
# in the schema to preserve numeric types. The geometry type assumes
# single-part polygons, and the CRS assumes the data is already in EPSG:4326.
schema = {
    'geometry': 'Polygon',
    'properties': {col: 'str' for col in attr_cols}
}
with fiona.open("output.fgb", 'w', driver='FlatGeobuf', schema=schema,
                crs=from_epsg(4326)) as f:
    for _, row in gdf.iterrows():
        f.write({
            'geometry': row.geometry.__geo_interface__,
            'properties': {col: str(row[col]) for col in attr_cols}
        })
# FlatGeobuf vs GeoParquet: Which to Use
These formats solve different problems and are complementary, not competing:
| Concern | FlatGeobuf | GeoParquet |
|---|---|---|
| Web streaming / HTTP range | Excellent (design goal) | Limited (row-group granularity, no per-feature spatial index) |
| Columnar analytics | Poor (row-oriented) | Excellent (design goal) |
| Compression ratio | Good (~3–5× vs GeoJSON) | Excellent (~8–15× vs GeoJSON) |
| DuckDB / Spark support | Limited | Native |
| Browser JavaScript support | Yes (flatgeobuf npm package) | Limited |
| Write once, read many (analytical) | OK | Preferred |
| Bbox streaming without infrastructure | Yes | No |
The common pattern is to maintain both: a GeoParquet file (or partitioned set) for analytical pipelines, and a FlatGeobuf file for web-facing bbox queries.
# JavaScript/Browser Integration
FlatGeobuf has first-class JavaScript support via the flatgeobuf npm package, enabling direct browser streaming:
import { geojson } from 'flatgeobuf';
import maplibregl from 'maplibre-gl';
async function loadBbox(map, bbox) {
  const url = 'https://data.example.com/parcels.fgb';
  const features = [];
  // Streams features via HTTP range requests
  for await (const feature of geojson.deserialize(url, bbox)) {
    features.push(feature);
  }
  map.getSource('parcels').setData({
    type: 'FeatureCollection',
    features
  });
}
// Update on map move
map.on('moveend', () => {
  const bounds = map.getBounds();
  loadBbox(map, {
    minX: bounds.getWest(),
    minY: bounds.getSouth(),
    maxX: bounds.getEast(),
    maxY: bounds.getNorth()
  });
});
This pattern replaces a vector tile server with a static file on any CDN or object storage. For datasets up to a few hundred MB, the performance is sufficient for interactive web mapping — features typically stream in under 200ms for a viewport-sized bbox query.
# Serving FlatGeobuf from S3
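The only server requirement is that your HTTP host supports range requests. Browser clients loading the file from a different origin also need a CORS policy on the host that allows GET requests carrying the Range header. AWS S3 supports range requests by default: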
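# Upload to S3
# (--acl public-read requires ACLs to be enabled on the bucket;
#  otherwise grant public read access through a bucket policy)
aws s3 cp parcels.fgb s3://your-bucket/data/parcels.fgb \
  --content-type "application/octet-stream" \
  --acl public-read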
# The file is now queryable via FlatGeobuf's range-request API
# URL: https://your-bucket.s3.amazonaws.com/data/parcels.fgb
GitHub Releases, Cloudflare R2, and most CDNs also support range requests. FlatGeobuf turns any static file host into a spatially queryable data service.
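Before relying on a host, it can be worth checking that it actually honours range requests. The sketch below uses only the Python standard library; `build_range_request`, `honours_range`, and `probe` are hypothetical helper names, and the check (status 206 plus a matching Content-Range header) is one reasonable heuristic rather than an exhaustive test.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for the inclusive byte range start..end."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def honours_range(status: int, headers, start: int, end: int) -> bool:
    """True if a response looks like a valid partial-content reply.

    A server that ignores the Range header typically answers 200 with the
    full body instead of 206 with a Content-Range header.
    """
    content_range = headers.get("Content-Range", "") or ""
    return status == 206 and content_range.startswith(f"bytes {start}-{end}/")


def probe(url: str, timeout: float = 10.0) -> bool:
    """Request the first 8 bytes of url and report whether ranges are honoured."""
    request = build_range_request(url, 0, 7)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return honours_range(response.status, response.headers, 0, 7)


# Example (performs a network request, so it is left commented out):
# print(probe("https://your-bucket.s3.amazonaws.com/data/parcels.fgb"))
```

A `False` result means the host would send the whole file on every query, which defeats the purpose of the format.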
Related reading: GeoParquet: The Analytical Format for Vector Geodata · Vector Tiles and the Modern Web Mapping Stack · DuckDB Spatial Analytics on GeoParquet