Numba-Accelerated Spatial Kernels: JIT-Compiling Custom Geometry Operations

Shapely 2.0 covers the standard geometry operations. NumPy covers the standard array operations. But real spatial analysis frequently involves custom algorithms — weighted distance decay functions, bespoke spatial smoothing kernels, specialised clustering metrics — that do not map cleanly onto any standard library function.

When you find yourself writing a nested Python loop over coordinate arrays because no existing library function does exactly what you need, that is the use case for Numba.

# What Numba Does

Numba is a JIT (just-in-time) compiler for Python functions that operate on NumPy arrays. You write a Python function with standard Python syntax. Numba compiles it to machine code the first time it is called, and all subsequent calls execute the compiled version.

The key constraint: Numba works on numerical operations over NumPy arrays. It does not understand Python objects, Pandas DataFrames, or Shapely geometries. The workflow is:

Extract coordinates from geometries as NumPy arrays (using Shapely 2.0’s array API)
Write a Numba-compiled function that operates on those coordinate arrays
Reconstruct geometries from the result arrays if needed

This restriction is actually a feature: it forces you to think in arrays, which is the only way to write fast spatial code anyway.

# The Haversine Distance Matrix

Computing the full pairwise distance matrix between N points on a sphere is a canonical example of a custom spatial kernel. scipy.spatial.distance.cdist can do this with a custom metric, but Numba gives you explicit control over memory layout and parallelism.

import numpy as np
from numba import njit, prange
import time

@njit(parallel=True, fastmath=True)
def haversine_matrix(lats_a, lons_a, lats_b, lons_b):
    """
    Compute haversine distance (km) between all pairs in A × B.
    Returns (len_a, len_b) float32 matrix.
    """
    n_a = len(lats_a)
    n_b = len(lons_b)
    result = np.empty((n_a, n_b), dtype=np.float32)

    R = 6371.0  # Earth radius in km

    for i in prange(n_a):    # prange → parallel over rows
        lat1 = np.radians(lats_a[i])
        lon1 = np.radians(lons_a[i])
        for j in range(n_b):
            lat2 = np.radians(lats_b[j])
            lon2 = np.radians(lons_b[j])
            dlat = lat2 - lat1
            dlon = lon2 - lon1
            a = np.sin(dlat * 0.5)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon * 0.5)**2
            result[i, j] = np.float32(2.0 * R * np.arcsin(np.sqrt(a)))
    return result

# ── Benchmark ─────────────────────────────────────────────────────────────────
rng = np.random.default_rng(0)
n_points = 5_000

lats_a = rng.uniform(-44.0, -10.0, n_points).astype(np.float64)
lons_a = rng.uniform(113.0, 154.0, n_points).astype(np.float64)
lats_b = rng.uniform(-44.0, -10.0, n_points).astype(np.float64)
lons_b = rng.uniform(113.0, 154.0, n_points).astype(np.float64)

# First call: JIT compilation (one-time cost)
_ = haversine_matrix(lats_a[:10], lons_a[:10], lats_b[:10], lons_b[:10])

# Subsequent calls: compiled machine code
t0 = time.perf_counter()
dist_matrix = haversine_matrix(lats_a, lons_a, lats_b, lons_b)
elapsed = time.perf_counter() - t0
print(f"Haversine matrix (5k × 5k = 25M pairs): {elapsed:.3f}s")
# → 0.18s with parallel=True on 8 cores

# Pure Python equivalent would take ~45s

The parallel=True flag with prange distributes the outer loop across CPU cores. On an 8-core machine, this gives roughly 7–8× additional speedup on top of the Numba compilation speedup.

# Warmup: Handling the First-Call JIT Compilation

The first call to a @njit function triggers JIT compilation, which typically takes 0.5–3 seconds. In production code, you want to trigger this at startup, not during your first real request:

def warmup_kernels():
    """Call with tiny dummy data to trigger JIT compilation at import time."""
    dummy = np.array([0.0])
    haversine_matrix(dummy, dummy, dummy, dummy)
    spatial_smooth(dummy.reshape(1, 1), dummy.reshape(1, 1), 1.0)
    print("Numba kernels compiled and ready.")

# In your module or application startup:
warmup_kernels()

# Spatial Smoothing Kernel: Inverse Distance Weighting

A common analysis pattern: given a grid of output locations and a set of observed values at irregular input locations, interpolate values using inverse distance weighting (IDW). This is not available as a Shapely or GeoPandas function — it is a custom kernel.

import numpy as np
from numba import njit, prange

@njit(parallel=True, fastmath=True)
def idw_interpolate(
    grid_x: np.ndarray,   # (G,) output grid x-coordinates
    grid_y: np.ndarray,   # (G,) output grid y-coordinates
    obs_x:  np.ndarray,   # (N,) observation x-coordinates
    obs_y:  np.ndarray,   # (N,) observation y-coordinates
    obs_v:  np.ndarray,   # (N,) observed values
    power:  float = 2.0,  # IDW power (typically 1 or 2)
    max_dist: float = 50.0  # ignore observations beyond this distance
) -> np.ndarray:
    """
    Inverse distance weighted interpolation.
    Returns (G,) array of interpolated values at grid locations.
    """
    n_grid = len(grid_x)
    n_obs  = len(obs_x)
    result = np.zeros(n_grid, dtype=np.float64)

    for i in prange(n_grid):
        weight_sum = 0.0
        value_sum  = 0.0
        gx = grid_x[i]
        gy = grid_y[i]

        for j in range(n_obs):
            dx = gx - obs_x[j]
            dy = gy - obs_y[j]
            dist = np.sqrt(dx*dx + dy*dy)

            if dist < 1e-10:
                # Exactly at an observation point
                value_sum  = obs_v[j]
                weight_sum = 1.0
                break
            if dist > max_dist:
                continue

            w = 1.0 / (dist ** power)
            weight_sum += w
            value_sum  += w * obs_v[j]

        if weight_sum > 0:
            result[i] = value_sum / weight_sum
        else:
            result[i] = np.nan

    return result

# ── Example: interpolate pollution readings to a 500×500 grid ─────────────────
import time
rng = np.random.default_rng(1)

# 2,000 pollution monitoring stations (lon, lat in GDA94 zone 55 — easting/northing)
n_obs = 2_000
obs_x = rng.uniform(300_000, 700_000, n_obs).astype(np.float64)
obs_y = rng.uniform(5_800_000, 6_500_000, n_obs).astype(np.float64)
obs_v = rng.uniform(0, 100, n_obs).astype(np.float64)  # PM2.5 μg/m³

# 250k grid cells
grid_x_1d = np.linspace(300_000, 700_000, 500)
grid_y_1d = np.linspace(5_800_000, 6_500_000, 500)
grid_x, grid_y = np.meshgrid(grid_x_1d, grid_y_1d)
grid_x = grid_x.ravel().astype(np.float64)
grid_y = grid_y.ravel().astype(np.float64)

# Warmup
_ = idw_interpolate(grid_x[:10], grid_y[:10], obs_x, obs_y, obs_v)

t0 = time.perf_counter()
interpolated = idw_interpolate(grid_x, grid_y, obs_x, obs_y, obs_v, power=2.0, max_dist=50_000)
elapsed = time.perf_counter() - t0
print(f"IDW interpolation (250k grid × 2k obs): {elapsed:.3f}s")
# → 0.22s

grid_result = interpolated.reshape(500, 500)

The equivalent Python implementation (nested loops without Numba) takes approximately 38 seconds. 170× speedup.

# Point-in-Polygon for Custom Polygon Representation

Shapely’s contains is excellent for most point-in-polygon tasks, but sometimes your polygons are stored as flat NumPy arrays (e.g. loaded from a custom binary format) rather than Shapely objects. A Numba ray-casting implementation:

import numpy as np
from numba import njit, prange

@njit
def point_in_polygon(px: float, py: float, poly_x: np.ndarray, poly_y: np.ndarray) -> bool:
    """
    Ray-casting algorithm for point-in-polygon.
    poly_x, poly_y are the exterior ring coordinates (first == last is OK).
    """
    n = len(poly_x)
    inside = False
    j = n - 1
    for i in range(n):
        xi, yi = poly_x[i], poly_y[i]
        xj, yj = poly_x[j], poly_y[j]
        if ((yi > py) != (yj > py)) and (px < (xj - xi) * (py - yi) / (yj - yi) + xi):
            inside = not inside
        j = i
    return inside

@njit(parallel=True)
def points_in_polygon_batch(
    px: np.ndarray, py: np.ndarray,
    poly_x: np.ndarray, poly_y: np.ndarray
) -> np.ndarray:
    """Check N points against a single polygon."""
    result = np.empty(len(px), dtype=np.bool_)
    for i in prange(len(px)):
        result[i] = point_in_polygon(px[i], py[i], poly_x, poly_y)
    return result

# The Numba–Shapely Bridge Pattern

The practical workflow for Numba-based spatial analysis:

import shapely
import numpy as np
from numba import njit, prange

# Step 1: Extract coordinates from Shapely geometry array
geom_array = gdf.geometry.values   # (N,) geometry array of Points

# For Points: use get_x/get_y (returns flat arrays)
x = shapely.get_x(geom_array)   # (N,) float64
y = shapely.get_y(geom_array)   # (N,) float64

# Step 2: Run Numba kernel on coordinate arrays
result_values = my_numba_kernel(x, y, ...)   # (N,) or (N, M) float array

# Step 3: Attach result back to GeoDataFrame
gdf['result'] = result_values

This separation of concerns — Shapely for geometry manipulation, Numba for numerical computation — is clean, fast, and testable. The Numba functions are pure numerical functions that can be tested with NumPy arrays independently of any GIS context.

# When to Use Numba vs Alternatives

Scenario	Best tool
Standard geometry ops (buffer, union, distance)	Shapely 2.0 array API
Spatial joins, nearest-neighbour	Shapely STRtree
CRS reprojection	PyProj Transformer array API
Custom distance metrics, kernels	Numba `@njit`
Very large raster operations	Rasterio windowed read + Numba or NumPy
GPU-accelerated custom kernels	cuSpatial or CuPy + CUDA kernels

Numba’s sweet spot is custom numerical algorithms over coordinate arrays that do not fit any existing library function. If a Shapely function covers your operation, use it — Shapely’s C implementation will be comparable to Numba for standard ops and requires no JIT warmup.

# Caching Compiled Functions

Numba can cache compiled functions to disk, eliminating the warmup cost on subsequent runs:

@njit(cache=True, parallel=True, fastmath=True)
def haversine_matrix(lats_a, lons_a, lats_b, lons_b):
    ...

With cache=True, the compiled binary is saved to a __pycache__ directory. On the next Python interpreter start, Numba loads the cached binary directly — the first call is as fast as subsequent calls.