Shapely 2.0 covers the standard geometry operations. NumPy covers the standard array operations. But real spatial analysis frequently involves custom algorithms — weighted distance decay functions, bespoke spatial smoothing kernels, specialised clustering metrics — that do not map cleanly onto any standard library function.
When you find yourself writing a nested Python loop over coordinate arrays because no existing library function does exactly what you need, that is the use case for Numba.
# What Numba Does
Numba is a JIT (just-in-time) compiler for Python functions that operate on NumPy arrays. You write a Python function with standard Python syntax. Numba compiles it to machine code the first time it is called, and all subsequent calls execute the compiled version.
The key constraint: Numba works on numerical operations over NumPy arrays. It does not understand Python objects, Pandas DataFrames, or Shapely geometries. The workflow is:
- Extract coordinates from geometries as NumPy arrays (using Shapely 2.0’s array API)
- Write a Numba-compiled function that operates on those coordinate arrays
- Reconstruct geometries from the result arrays if needed
This restriction is actually a feature: it forces you to think in arrays, which is the only way to write fast spatial code anyway.
# The Haversine Distance Matrix
Computing the full pairwise distance matrix between N points on a sphere is a canonical example of a custom spatial kernel. scipy.spatial.distance.cdist can do this with a custom metric, but Numba gives you explicit control over memory layout and parallelism.
import numpy as np
from numba import njit, prange
import time
@njit(parallel=True, fastmath=True)
def haversine_matrix(lats_a, lons_a, lats_b, lons_b):
"""
Compute haversine distance (km) between all pairs in A × B.
Returns (len_a, len_b) float32 matrix.
"""
n_a = len(lats_a)
n_b = len(lons_b)
result = np.empty((n_a, n_b), dtype=np.float32)
R = 6371.0 # Earth radius in km
for i in prange(n_a): # prange → parallel over rows
lat1 = np.radians(lats_a[i])
lon1 = np.radians(lons_a[i])
for j in range(n_b):
lat2 = np.radians(lats_b[j])
lon2 = np.radians(lons_b[j])
dlat = lat2 - lat1
dlon = lon2 - lon1
a = np.sin(dlat * 0.5)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon * 0.5)**2
result[i, j] = np.float32(2.0 * R * np.arcsin(np.sqrt(a)))
return result
# ── Benchmark ─────────────────────────────────────────────────────────────────
rng = np.random.default_rng(0)
n_points = 5_000
lats_a = rng.uniform(-44.0, -10.0, n_points).astype(np.float64)
lons_a = rng.uniform(113.0, 154.0, n_points).astype(np.float64)
lats_b = rng.uniform(-44.0, -10.0, n_points).astype(np.float64)
lons_b = rng.uniform(113.0, 154.0, n_points).astype(np.float64)
# First call: JIT compilation (one-time cost)
_ = haversine_matrix(lats_a[:10], lons_a[:10], lats_b[:10], lons_b[:10])
# Subsequent calls: compiled machine code
t0 = time.perf_counter()
dist_matrix = haversine_matrix(lats_a, lons_a, lats_b, lons_b)
elapsed = time.perf_counter() - t0
print(f"Haversine matrix (5k × 5k = 25M pairs): {elapsed:.3f}s")
# → 0.18s with parallel=True on 8 cores
# Pure Python equivalent would take ~45s
The parallel=True flag with prange distributes the outer loop across CPU cores. On an 8-core machine, this gives roughly 7–8× additional speedup on top of the Numba compilation speedup.
# Warmup: Handling the First-Call JIT Compilation
The first call to a @njit function triggers JIT compilation, which typically takes 0.5–3 seconds. In production code, you want to trigger this at startup, not during your first real request:
def warmup_kernels():
"""Call with tiny dummy data to trigger JIT compilation at import time."""
dummy = np.array([0.0])
haversine_matrix(dummy, dummy, dummy, dummy)
spatial_smooth(dummy.reshape(1, 1), dummy.reshape(1, 1), 1.0)
print("Numba kernels compiled and ready.")
# In your module or application startup:
warmup_kernels()
# Spatial Smoothing Kernel: Inverse Distance Weighting
A common analysis pattern: given a grid of output locations and a set of observed values at irregular input locations, interpolate values using inverse distance weighting (IDW). This is not available as a Shapely or GeoPandas function — it is a custom kernel.
import numpy as np
from numba import njit, prange
@njit(parallel=True, fastmath=True)
def idw_interpolate(
grid_x: np.ndarray, # (G,) output grid x-coordinates
grid_y: np.ndarray, # (G,) output grid y-coordinates
obs_x: np.ndarray, # (N,) observation x-coordinates
obs_y: np.ndarray, # (N,) observation y-coordinates
obs_v: np.ndarray, # (N,) observed values
power: float = 2.0, # IDW power (typically 1 or 2)
max_dist: float = 50.0 # ignore observations beyond this distance
) -> np.ndarray:
"""
Inverse distance weighted interpolation.
Returns (G,) array of interpolated values at grid locations.
"""
n_grid = len(grid_x)
n_obs = len(obs_x)
result = np.zeros(n_grid, dtype=np.float64)
for i in prange(n_grid):
weight_sum = 0.0
value_sum = 0.0
gx = grid_x[i]
gy = grid_y[i]
for j in range(n_obs):
dx = gx - obs_x[j]
dy = gy - obs_y[j]
dist = np.sqrt(dx*dx + dy*dy)
if dist < 1e-10:
# Exactly at an observation point
value_sum = obs_v[j]
weight_sum = 1.0
break
if dist > max_dist:
continue
w = 1.0 / (dist ** power)
weight_sum += w
value_sum += w * obs_v[j]
if weight_sum > 0:
result[i] = value_sum / weight_sum
else:
result[i] = np.nan
return result
# ── Example: interpolate pollution readings to a 500×500 grid ─────────────────
import time
rng = np.random.default_rng(1)
# 2,000 pollution monitoring stations (lon, lat in GDA94 zone 55 — easting/northing)
n_obs = 2_000
obs_x = rng.uniform(300_000, 700_000, n_obs).astype(np.float64)
obs_y = rng.uniform(5_800_000, 6_500_000, n_obs).astype(np.float64)
obs_v = rng.uniform(0, 100, n_obs).astype(np.float64) # PM2.5 μg/m³
# 250k grid cells
grid_x_1d = np.linspace(300_000, 700_000, 500)
grid_y_1d = np.linspace(5_800_000, 6_500_000, 500)
grid_x, grid_y = np.meshgrid(grid_x_1d, grid_y_1d)
grid_x = grid_x.ravel().astype(np.float64)
grid_y = grid_y.ravel().astype(np.float64)
# Warmup
_ = idw_interpolate(grid_x[:10], grid_y[:10], obs_x, obs_y, obs_v)
t0 = time.perf_counter()
interpolated = idw_interpolate(grid_x, grid_y, obs_x, obs_y, obs_v, power=2.0, max_dist=50_000)
elapsed = time.perf_counter() - t0
print(f"IDW interpolation (250k grid × 2k obs): {elapsed:.3f}s")
# → 0.22s
grid_result = interpolated.reshape(500, 500)
The equivalent Python implementation (nested loops without Numba) takes approximately 38 seconds. 170× speedup.
# Point-in-Polygon for Custom Polygon Representation
Shapely’s contains is excellent for most point-in-polygon tasks, but sometimes your polygons are stored as flat NumPy arrays (e.g. loaded from a custom binary format) rather than Shapely objects. A Numba ray-casting implementation:
import numpy as np
from numba import njit, prange
@njit
def point_in_polygon(px: float, py: float, poly_x: np.ndarray, poly_y: np.ndarray) -> bool:
"""
Ray-casting algorithm for point-in-polygon.
poly_x, poly_y are the exterior ring coordinates (first == last is OK).
"""
n = len(poly_x)
inside = False
j = n - 1
for i in range(n):
xi, yi = poly_x[i], poly_y[i]
xj, yj = poly_x[j], poly_y[j]
if ((yi > py) != (yj > py)) and (px < (xj - xi) * (py - yi) / (yj - yi) + xi):
inside = not inside
j = i
return inside
@njit(parallel=True)
def points_in_polygon_batch(
px: np.ndarray, py: np.ndarray,
poly_x: np.ndarray, poly_y: np.ndarray
) -> np.ndarray:
"""Check N points against a single polygon."""
result = np.empty(len(px), dtype=np.bool_)
for i in prange(len(px)):
result[i] = point_in_polygon(px[i], py[i], poly_x, poly_y)
return result
# The Numba–Shapely Bridge Pattern
The practical workflow for Numba-based spatial analysis:
import shapely
import numpy as np
from numba import njit, prange
# Step 1: Extract coordinates from Shapely geometry array
geom_array = gdf.geometry.values # (N,) geometry array of Points
# For Points: use get_x/get_y (returns flat arrays)
x = shapely.get_x(geom_array) # (N,) float64
y = shapely.get_y(geom_array) # (N,) float64
# Step 2: Run Numba kernel on coordinate arrays
result_values = my_numba_kernel(x, y, ...) # (N,) or (N, M) float array
# Step 3: Attach result back to GeoDataFrame
gdf['result'] = result_values
This separation of concerns — Shapely for geometry manipulation, Numba for numerical computation — is clean, fast, and testable. The Numba functions are pure numerical functions that can be tested with NumPy arrays independently of any GIS context.
# When to Use Numba vs Alternatives
| Scenario | Best tool |
|---|---|
| Standard geometry ops (buffer, union, distance) | Shapely 2.0 array API |
| Spatial joins, nearest-neighbour | Shapely STRtree |
| CRS reprojection | PyProj Transformer array API |
| Custom distance metrics, kernels | Numba @njit |
| Very large raster operations | Rasterio windowed read + Numba or NumPy |
| GPU-accelerated custom kernels | cuSpatial or CuPy + CUDA kernels |
Numba’s sweet spot is custom numerical algorithms over coordinate arrays that do not fit any existing library function. If a Shapely function covers your operation, use it — Shapely’s C implementation will be comparable to Numba for standard ops and requires no JIT warmup.
# Caching Compiled Functions
Numba can cache compiled functions to disk, eliminating the warmup cost on subsequent runs:
@njit(cache=True, parallel=True, fastmath=True)
def haversine_matrix(lats_a, lons_a, lats_b, lons_b):
...
With cache=True, the compiled binary is saved to a __pycache__ directory. On the next Python interpreter start, Numba loads the cached binary directly — the first call is as fast as subsequent calls.
Related reading: Shapely 2.0: The Vectorisation Revolution in Python Geometry · cuSpatial: GPU-Accelerated Spatial Analytics with RAPIDS · Rasterio Windowed Reading for Large Rasters