Using H3 for Aggregated Spatial Analytics at Scale

One of the most common performance problems in spatial web applications is the attempt to render too many individual features at once. A map showing 500,000 delivery locations, 2 million GPS pings, or 10 million transaction events cannot render individual points at a national zoom level — and even if it could, the resulting visualisation would be a solid mass of overlapping dots that communicates nothing.

The standard solution is spatial aggregation: group nearby features into summary cells and represent the count or value in each cell, rather than the individual features. The question is which grid system to use for this aggregation, and how to handle the transition between zoom levels efficiently.

H3, Uber’s open-source hierarchical hexagonal grid system, has become a widely adopted answer to both questions. It provides a globally consistent grid at 16 resolutions, efficient cell lookup and traversal operations, and a hierarchical relationship between resolutions that makes zoom-level transitions straightforward to implement.

# Why Hexagons?

Before examining H3 specifically, it is worth understanding why hexagons are preferred over the more obvious square grid.

A square grid has one fundamental geometric problem: cells are not equidistant from all their neighbours. The four edge-adjacent cells in a square grid are closer to the cell centre than the four corner-adjacent cells (by a factor of √2). This asymmetry introduces directional bias into any analysis that uses adjacency or distance.

Hexagonal grids do not have this problem. Every hexagon has exactly six neighbours, and the centre of each neighbour is equidistant from the centre of the reference cell. All adjacencies are equal. This property makes hexagons mathematically cleaner for aggregation, smoothing, and any analysis that considers neighbouring cells.
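
The difference can be checked numerically on an idealised flat grid. The sketch below is only an illustration with hypothetical helper names; it places hexagon centres using axial lattice coordinates and ignores the small distortions that real H3 cells pick up on the sphere.

```python
import math

# Axial-coordinate offsets of the six neighbours in a hexagonal lattice.
HEX_NEIGHBOURS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def square_neighbour_distances(spacing=1.0):
    """Centre-to-centre distances to the 8 neighbours of a square cell."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    return sorted(math.hypot(dx * spacing, dy * spacing) for dx, dy in offsets)


def hex_neighbour_distances(spacing=1.0):
    """Centre-to-centre distances to the 6 neighbours of a hexagonal cell."""
    distances = []
    for q, r in HEX_NEIGHBOURS:
        # Convert axial lattice coordinates to Cartesian coordinates.
        x = spacing * (q + r / 2)
        y = spacing * (r * math.sqrt(3) / 2)
        distances.append(math.hypot(x, y))
    return sorted(distances)


square = square_neighbour_distances()
hexagon = hex_neighbour_distances()
print([round(d, 3) for d in square])   # four values of 1.0, then four of 1.414
print([round(d, 3) for d in hexagon])  # six values of 1.0
```

The square grid yields two distance classes that differ by √2, while every hexagonal neighbour sits at the same distance.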

Hexagons are also more compact than squares. For a given cell area, a hexagon has a shorter perimeter than a square, meaning that a hexagonal cell approximates a circle more closely, so fewer points fall near a cell edge. This does not remove the “modifiable areal unit problem” (the dependence of results on the choice of aggregation unit), but it makes the shape of the grid a smaller source of distortion.
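
The perimeter comparison follows directly from the area formulas. A minimal sketch for cells of unit area, using hypothetical function names and a circle as the lower bound:

```python
import math


def square_perimeter(area):
    return 4 * math.sqrt(area)


def hexagon_perimeter(area):
    # A regular hexagon with side s has area (3 * sqrt(3) / 2) * s**2.
    side = math.sqrt(2 * area / (3 * math.sqrt(3)))
    return 6 * side


def circle_perimeter(area):
    return 2 * math.sqrt(math.pi * area)


for name, fn in [("square", square_perimeter),
                 ("hexagon", hexagon_perimeter),
                 ("circle", circle_perimeter)]:
    print(f"{name:8s}{fn(1.0):.3f}")
# square  4.000
# hexagon 3.722
# circle  3.545
```

For the same area, the hexagon's perimeter is about 7% shorter than the square's and about 5% longer than the circle's.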

# H3: The Hierarchical Hexagonal Grid

H3 was developed by Uber to support their ride-sharing analytics at global scale and open-sourced in 2018. It has since become a standard component of spatial analytics stacks at many organisations.

H3 divides the globe into hexagonal cells at 16 resolution levels (0–15). At resolution 0, the globe is divided into 122 cells (110 hexagons and 12 pentagons — required by the topology of a sphere). Each resolution step divides each cell into approximately 7 child cells.

| Resolution | Avg. cell area | Typical use case |
|---|---|---|
| 2 | ~86,700 km² | Continental regions |
| 3 | ~12,400 km² | Large countries |
| 4 | ~1,770 km² | States/provinces |
| 5 | ~252 km² | Major cities |
| 6 | ~36 km² | City districts |
| 7 | ~5.2 km² | Neighbourhoods |
| 8 | ~0.74 km² | City blocks |
| 9 | ~0.11 km² | Street segments |
| 10 | ~15,000 m² | Individual buildings |
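
The areas in the table can be approximated from the hierarchy itself: starting from 122 base cells with roughly sevenfold subdivision, the number of cells at a resolution is 2 + 120 × 7^res. The sketch below divides the Earth's surface area by that count; the Earth radius constant and the function names are assumptions made for illustration, and the result is an average rather than the area of any particular cell.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this sketch


def h3_cell_count(resolution):
    """Total number of H3 cells at a resolution (0-15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** resolution


def average_cell_area_km2(resolution):
    """Earth surface area divided evenly across all cells at a resolution."""
    earth_area = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return earth_area / h3_cell_count(resolution)


for res in (0, 4, 8):
    print(res, h3_cell_count(res), round(average_cell_area_km2(res), 3))
```

Resolution 0 gives the 122 base cells, and resolutions 4 and 8 come out close to the 1,770 km² and 0.74 km² listed above.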

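The key H3 operations for aggregation workflows are listed below under their h3-js (v4) names; the Python library exposes the same operations in snake_case, for example latlng_to_cell: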

latLngToCell(lat, lng, resolution) — find the H3 cell containing a point at a given resolution
cellToBoundary(h3index) — get the polygon boundary of a cell (for rendering)
cellToParent(h3index, parentResolution) — get the parent cell at a coarser resolution
cellToChildren(h3index, childResolution) — get the child cells at a finer resolution

# The Pre-Aggregation Pattern

The key to performant H3 analytics is pre-processing. Rather than aggregating on the fly at query time, you compute the H3 aggregations at multiple resolutions in advance and store the results. At query time, you simply look up the appropriate resolution table.

Here is a complete Python processing pipeline using pandas and the h3 library (v4 API):

import json
import os

import pandas as pd
import h3

# ── Step 1: Load your point data ────────────────────────────────────────────
# Example: delivery events with lat/lon and a value to aggregate
df = pd.read_parquet('delivery_events.parquet')
# df has columns: lat, lon, value, timestamp

# ── Step 2: Aggregate into H3 cells at multiple resolutions ─────────────────
RESOLUTIONS = [4, 6, 7, 8]   # coarse → fine

aggregated = {}

for res in RESOLUTIONS:
    # Assign each point to its H3 cell at this resolution.
    # Iterating over the two columns avoids the per-row overhead of df.apply(axis=1).
    df[f'h3_{res}'] = [
        h3.latlng_to_cell(lat, lon, res)
        for lat, lon in zip(df['lat'], df['lon'])
    ]

    # Aggregate: count events, then sum and average value per cell
    agg = df.groupby(f'h3_{res}').agg(
        count=('value', 'count'),
        total_value=('value', 'sum'),
        mean_value=('value', 'mean'),
    ).reset_index().rename(columns={f'h3_{res}': 'h3index'})

    aggregated[res] = agg
    print(f"Resolution {res}: {len(agg):,} cells from {len(df):,} points")

# ── Step 3: Save each resolution as GeoJSON for static serving ───────────────
os.makedirs('public', exist_ok=True)   # make sure the output directory exists

for res, agg_df in aggregated.items():
    features = []
    for _, row in agg_df.iterrows():
        # Get the hexagon boundary as (lat, lng) pairs; the ring is not closed
        boundary = h3.cell_to_boundary(row['h3index'])
        # GeoJSON uses [lng, lat] order; h3 returns (lat, lng)
        coords = [[lng, lat] for lat, lng in boundary]
        coords.append(coords[0])   # close the ring

        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [coords]},
            "properties": {
                "h3index":     row['h3index'],
                "count":       int(row['count']),
                "total_value": float(row['total_value']),
                "mean_value":  round(float(row['mean_value']), 2),
            }
        })

    geojson = {"type": "FeatureCollection", "features": features}

    with open(f'public/h3_res{res}.geojson', 'w') as f:
        json.dump(geojson, f)

    print(f"Written h3_res{res}.geojson ({len(features):,} cells)")

This produces four GeoJSON files, one per resolution. Resolution 4 (state-sized cells) might have 50 cells; resolution 8 (block-sized) might have 80,000, so the finest resolution dominates the total download size. Each file is served statically, with no server-side query at render time.
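
An alternative to re-indexing every point at each resolution is to aggregate once at the finest resolution and roll the totals up through the hierarchy. The sketch below shows the idea with a generic, hypothetical `roll_up` helper and toy cell ids; with real H3 indexes the `parent_of` argument could be something like `lambda c: h3.cell_to_parent(c, 6)`. Because H3 parents only approximately contain their children geometrically, counts near cell edges can land in a different coarse cell than they would if the points were indexed directly at the coarse resolution.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Stats = Tuple[int, float]  # (count, total_value)


def roll_up(fine: Dict[str, Stats],
            parent_of: Callable[[str], str]) -> Dict[str, Stats]:
    """Sum the counts and totals of fine cells into their parent cells."""
    coarse = defaultdict(lambda: (0, 0.0))
    for cell, (count, total) in fine.items():
        parent = parent_of(cell)
        parent_count, parent_total = coarse[parent]
        coarse[parent] = (parent_count + count, parent_total + total)
    return dict(coarse)


def mean_value(stats: Stats) -> float:
    """Recompute the mean from the totals; averaging child means would be wrong."""
    count, total = stats
    return total / count if count else 0.0


# Toy data: hypothetical cell ids whose first character names the parent.
fine_cells = {"a1": (3, 30.0), "a2": (1, 50.0), "b1": (2, 8.0)}
coarse_cells = roll_up(fine_cells, parent_of=lambda cell: cell[0])

print(coarse_cells)  # {'a': (4, 80.0), 'b': (2, 8.0)}
print({cell: mean_value(s) for cell, s in coarse_cells.items()})  # {'a': 20.0, 'b': 4.0}
```

Counts and sums add up cleanly across levels, which is why the pipeline stores `count` and `total_value` alongside `mean_value`: the mean for a parent has to be recomputed from the two totals.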

# Serving from Pre-Processed H3 Files

For the best performance, serve the H3 GeoJSON files from cloud storage or a CDN. For datasets that update infrequently (daily batch), this is the right approach. For continuously updating data, consider a lightweight API that returns H3 cells for a given bounding box and resolution — but even then, caching the responses aggressively reduces load.

Alternatively, use Tippecanoe to pre-tile the H3 GeoJSON into PMTiles for very large cell counts:

# Combine multiple resolutions into a single PMTiles archive
# Each resolution becomes a separate source-layer
# Input paths match the public/ directory written by the Python pipeline
tippecanoe -o h3_aggregated.pmtiles \
  --named-layer=res4:public/h3_res4.geojson \
  --named-layer=res6:public/h3_res6.geojson \
  --named-layer=res7:public/h3_res7.geojson \
  --named-layer=res8:public/h3_res8.geojson \
  --minimum-zoom=0 \
  --maximum-zoom=16

# The Leaflet Implementation: Dynamic Resolution Switching

The Leaflet example below loads all four resolution GeoJSON files on startup and switches between them based on the current zoom level — with no additional network requests after the initial load.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>H3 Aggregated Analytics</title>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <style>
    html, body, #map { height: 100%; margin: 0; }
    #resolution-badge {
      position: absolute; top: 10px; right: 10px; z-index: 1000;
      background: rgba(5,9,15,0.85); color: #00e5ff;
      padding: 6px 12px; border-radius: 6px; font-family: monospace;
      font-size: 13px; border: 1px solid rgba(0,229,255,0.2);
    }
  </style>
</head>
<body>
<div id="map"></div>
<div id="resolution-badge">Resolution: —</div>

<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
<script>
// ── Map setup ────────────────────────────────────────────────────────────────
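// preferCanvas draws vector layers on a canvas, which copes better with tens of thousands of polygons
const map = L.map('map', { center: [51.5, -0.1], zoom: 6, preferCanvas: true });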

L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '© OpenStreetMap contributors',
  opacity: 0.3   // subtle basemap so our data layer is dominant
}).addTo(map);

// ── Which H3 resolution to show at each Leaflet zoom level ──────────────────
function h3ResForZoom(zoom) {
  if (zoom <= 6)  return 4;   // ~1,770 km² — region level
  if (zoom <= 9)  return 6;   // ~36 km²    — district level
  if (zoom <= 12) return 7;   // ~5.2 km²   — neighbourhood level
  return 8;                   // ~0.74 km²  — block level
}

// ── Colour scale: low → high count ──────────────────────────────────────────
function countToColour(count, maxCount) {
  // Linear (equal-interval) 5-colour scale: each colour covers one fifth of 0..maxCount
  const t = maxCount > 0 ? Math.min(count / maxCount, 1) : 0;
  const colours = ['#0a1628','#0d3b6e','#0066cc','#00aaff','#00e5ff'];
  const idx = Math.floor(t * colours.length);
  return colours[Math.min(idx, colours.length - 1)];
}

// ── Load all resolution GeoJSON files on startup ─────────────────────────────
const resolutionData = {};
const resolutions = [4, 6, 7, 8];
let loaded = 0;

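resolutions.forEach(res => {
  fetch(`/h3_res${res}.geojson`)
    .then(r => {
      if (!r.ok) throw new Error(`HTTP ${r.status} for resolution ${res}`);
      return r.json();
    })
    .then(data => {
      // Pre-compute max count for colour scaling.
      // reduce() avoids the argument limit that Math.max(...array) can hit on large arrays.
      const maxCount = data.features.reduce(
        (max, f) => Math.max(max, f.properties.count), 0);
      resolutionData[res] = { data, maxCount };
    })
    .catch(err => console.error(err))
    .finally(() => {
      // Render once every request has settled, even if one file failed to load
      loaded++;
      if (loaded === resolutions.length) {
        renderCurrentResolution();
      }
    });
});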

// ── Active layer management ──────────────────────────────────────────────────
let activeLayer = null;
let currentRes = null;

function renderCurrentResolution() {
  const targetRes = h3ResForZoom(map.getZoom());
  if (targetRes === currentRes) return;   // already showing this resolution
  if (!resolutionData[targetRes]) return; // data not loaded yet

  if (activeLayer) map.removeLayer(activeLayer);

  const { data, maxCount } = resolutionData[targetRes];

  activeLayer = L.geoJSON(data, {
    style: feature => ({
      fillColor:   countToColour(feature.properties.count, maxCount),
      fillOpacity: 0.75,
      color:       '#ffffff',
      weight:      0.4,
      opacity:     0.6,
    }),
    onEachFeature: (feature, layer) => {
      const p = feature.properties;
      layer.bindTooltip(
        `<strong>${p.count.toLocaleString()} events</strong><br>
         Avg value: ${p.mean_value}<br>
         H3 index: <code>${p.h3index}</code>`,
        { sticky: true }
      );
    }
  }).addTo(map);

  currentRes = targetRes;
  document.getElementById('resolution-badge').textContent =
    `H3 Resolution: ${targetRes} (~${['','','','','1,770 km²','252 km²','36 km²','5.2 km²','0.74 km²'][targetRes]})`;
}

map.on('zoomend', renderCurrentResolution);
</script>
</body>
</html>

The result is a map that switches between coarser and finer H3 resolutions as the user zooms in, each transition swapping the Leaflet GeoJSON layer for the pre-loaded version at the appropriate resolution. Because all four GeoJSON files are loaded at startup, a transition needs no network round trip; its cost is the time Leaflet takes to draw the new layer. The trade-off is the initial download, which ranges from a few hundred KB to several MB or more depending on data density.
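
The zoom thresholds in h3ResForZoom are a tuning choice. One way to reason about them is to ask how many pixels wide a cell appears at each zoom level. The sketch below is a rough Python calculation under stated assumptions: standard 256-pixel Web Mercator tiles, cells treated as regular hexagons with the average area for their resolution, and a hypothetical `min_cell_px` target. The function names are illustrative and are not part of H3 or Leaflet.

```python
import math

EARTH_AREA_KM2 = 510_065_622     # approximate surface area of the Earth
M_PER_PX_ZOOM0 = 156_543.03      # Web Mercator metres per pixel at zoom 0, equator


def avg_edge_length_m(resolution):
    """Average edge length, treating cells as regular hexagons of average area."""
    cell_count = 2 + 120 * 7 ** resolution
    area_m2 = EARTH_AREA_KM2 * 1e6 / cell_count
    return math.sqrt(2 * area_m2 / (3 * math.sqrt(3)))


def metres_per_pixel(zoom, latitude):
    return M_PER_PX_ZOOM0 * math.cos(math.radians(latitude)) / 2 ** zoom


def resolution_for_zoom(zoom, latitude=0.0, min_cell_px=30.0, max_res=15):
    """Finest resolution whose cells are still at least min_cell_px wide on screen."""
    m_per_px = metres_per_pixel(zoom, latitude)
    best = 0
    for res in range(max_res + 1):
        # Vertex-to-vertex width of a regular hexagon is twice its edge length.
        width_px = 2 * avg_edge_length_m(res) / m_per_px
        if width_px >= min_cell_px:
            best = res
    return best


for zoom in (6, 9, 12):
    print(zoom, resolution_for_zoom(zoom, latitude=51.5))
```

At the latitude of the example map, a 30-pixel target reproduces the resolution 4 and resolution 6 choices at zoom 6 and zoom 9, and suggests a finer resolution than 7 at zoom 12. Raising `min_cell_px` gives larger, fewer cells on screen.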

# When to Use H3 vs Other Approaches

H3 is not always the right tool. It is excellent for:

Event counting and density mapping at multiple scales (delivery events, transactions, incidents)
Joining datasets that come from different sources (H3 provides a common spatial key)
Ride-sharing and logistics analytics (H3’s origin use case)
Any application that needs smooth multi-scale aggregation in a web context

It is less appropriate for:

Administrative reporting: H3 cells do not respect administrative boundaries. If you need counts by council ward or census area, use those boundaries directly.
Precise area calculations: H3 cells at the same resolution are only approximately equal in area. The variation depends on where a cell sits on the icosahedron that H3 projects onto the globe rather than on latitude, and cells close to the twelve pentagons are the smallest. Use the library's cell-area function when exact figures matter.
Narrow corridor analysis: roads, rivers, and linear features do not aggregate naturally into hexagons.

# Indexing Existing Spatial Data with H3

H3 indexes can be stored in extra columns on existing PostGIS tables, enabling hybrid workflows that mix H3 group-bys with ordinary spatial queries:

-- Add H3 index columns at multiple resolutions
ALTER TABLE events ADD COLUMN h3_r6 TEXT;
ALTER TABLE events ADD COLUMN h3_r8 TEXT;

-- Populate using the h3-pg PostgreSQL extension.
-- h3_lat_lng_to_cell takes a point (or, with the h3_postgis extension, a PostGIS
-- geometry or geography) plus a resolution; it has no separate lat/lng arguments.
UPDATE events
SET
  h3_r6 = h3_lat_lng_to_cell(location::geometry, 6)::text,
  h3_r8 = h3_lat_lng_to_cell(location::geometry, 8)::text;

-- Create indexes for fast groupby queries
CREATE INDEX ON events (h3_r6);
CREATE INDEX ON events (h3_r8);

-- Now aggregation is extremely fast — no spatial join required
SELECT h3_r8, COUNT(*) as event_count
FROM events
WHERE event_date >= '2024-01-01'
GROUP BY h3_r8;

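This approach turns a spatial aggregation query (which would require a spatial join or point-in-polygon test) into a simple group-by on an indexed text column. That is often considerably faster, although the size of the gain depends on data volume and on how the spatial alternative is indexed.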

H3 represents a genuinely useful abstraction for the broad middle ground of spatial analytics problems: scale-independent, globally consistent, and with excellent tooling across Python, JavaScript, Go, Java, and SQL. For any workflow that involves aggregating large volumes of point events for web visualisation, it is worth making it your default aggregation grid.