Python Performance

cuSpatial: GPU-Accelerated Spatial Analytics with RAPIDS

cuSpatial brings GPU acceleration to spatial operations such as point-in-polygon tests, nearest-neighbour search, and trajectory analysis, all within the RAPIDS ecosystem. When CPU-based tools hit their ceiling on very large datasets, cuSpatial can deliver a further 10–100x speedup.

12 min read
Python Performance

PostGIS, GeoPandas, and DuckDB: A Decision Framework for Spatial Analytics

Three tools dominate Python spatial analytics: PostGIS for indexed server queries, GeoPandas for in-memory Python-native analysis, and DuckDB for serverless SQL on files. Understanding which to reach for — and when to combine them — is the difference between a fast pipeline and a slow one.

12 min read
Python Performance

Dask-GeoPandas: Parallel Spatial Processing Across CPU Cores

Dask-GeoPandas partitions a GeoDataFrame across CPU cores for parallel spatial operations. When Shapely's vectorised API saturates a single core, Dask-GeoPandas scales the work across all cores. Covers partitioning strategy, operation compatibility, and a real 50-million-parcel workflow.

11 min read
Python Performance

H3 Hierarchical Smoothing: Continuous Spatial Surfaces from Sparse Data

Sparse point data creates noisy hexbin maps with many empty cells. H3's k-ring neighbourhood and parent-child hierarchy enable spatial smoothing that fills gaps and creates continuous surfaces, without interpolation libraries or complex kernels.

10 min read
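The gap-filling idea can be sketched without the h3 library itself. The example below is a stand-in, not H3: it uses a square grid keyed by (i, j) and a square neighbourhood of radius k in place of H3's hexagonal k-ring (grid_disk(cell, k) in h3-py v4). Each cell within k of any populated cell receives the mean of the populated cells in its neighbourhood, so empty cells next to data get filled.

```python
def k_ring_smooth(values, k=1):
    """Smooth sparse grid values by averaging each cell's k-neighbourhood.

    Stand-in for H3 k-ring smoothing on a square grid (an assumption):
    values maps (i, j) -> value for populated cells only. Returns values for
    every cell within k of a populated cell, so nearby gaps are filled.
    """
    offsets = [(di, dj) for di in range(-k, k + 1) for dj in range(-k, k + 1)]

    # Candidate cells: every cell inside the neighbourhood of some data cell.
    candidates = {(i + di, j + dj) for (i, j) in values for di, dj in offsets}

    out = {}
    for i, j in candidates:
        # The neighbourhood is symmetric, so every candidate has >= 1 neighbour.
        neighbours = [values[(i + di, j + dj)] for di, dj in offsets
                      if (i + di, j + dj) in values]
        out[(i, j)] = sum(neighbours) / len(neighbours)
    return out
```

Increasing k widens the smoothing footprint and fills larger gaps at the cost of blurring local detail; a distance-weighted mean is a common refinement.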
Python Performance

H3 as a Spatial Join Accelerator: Approximate Joins at Scale

Exact spatial joins via STRtree are precise but can be slow for very large datasets. H3 cell assignment reduces spatial joins to fast hash joins, trading a controlled approximation for orders-of-magnitude speed improvements. Learn when this tradeoff makes sense and how to implement it.

10 min read
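The hash-join idea can be sketched in plain Python. This is a simplified stand-in that uses a square grid with a hypothetical cell_size (in degrees) instead of H3 cells (h3.latlng_to_cell(lat, lng, res) in h3-py v4). The zone_cells mapping is assumed to be precomputed; with H3 it would come from a polygon-fill step. Like the H3 version, the join is approximate: a point matches every zone that claims its cell.

```python
import math
from collections import defaultdict


def cell_id(lon, lat, cell_size):
    """Map a coordinate to an integer grid cell (stand-in for H3 cell assignment)."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def grid_hash_join(points, zone_cells, cell_size):
    """Approximate point-to-zone join via cell keys instead of geometry tests.

    points: iterable of (point_id, lon, lat)
    zone_cells: dict of zone_id -> set of cell ids covering that zone
    Returns a list of (point_id, zone_id) pairs.
    """
    # Build side: cell id -> zones covering that cell.
    cell_to_zones = defaultdict(list)
    for zone_id, cells in zone_cells.items():
        for c in cells:
            cell_to_zones[c].append(zone_id)

    # Probe side: one dictionary lookup per point, no point-in-polygon tests.
    matches = []
    for point_id, lon, lat in points:
        for zone_id in cell_to_zones.get(cell_id(lon, lat, cell_size), ()):
            matches.append((point_id, zone_id))
    return matches


zones = {"A": {(0, 0), (1, 0)}, "B": {(0, 1)}}
pts = [(1, 0.5, 0.5), (2, 1.5, 0.2), (3, 0.5, 1.5), (4, 5.0, 5.0)]
pairs = grid_hash_join(pts, zones, cell_size=1.0)
```

Point 4 falls in a cell no zone claims, so it drops out of the result. The approximation error comes entirely from cells that straddle a zone boundary, which is why cell size (H3 resolution) controls the accuracy/speed tradeoff.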
Python Performance

H3 Parent Rollup: Multi-Scale Aggregation Without Re-aggregating

H3's strict parent-child hierarchy enables pre-computing aggregations at fine resolution and rolling them up to coarser scales in microseconds. Build a multi-resolution data cube that answers queries at any resolution without reprocessing the source data.

10 min read
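A minimal sketch of the rollup step, assuming a quadtree-style square grid as a stand-in for H3. Here a cell is a hypothetical (res, x, y) tuple and its parent is found by halving x and y once per level, whereas H3 uses aperture-7 hexagons and h3.cell_to_parent(cell, res) in h3-py v4. The point it illustrates holds for both: store additive statistics (sum and count) at the finest resolution, so coarser levels are just sums and means can be derived correctly afterwards.

```python
from collections import defaultdict


def parent(cell, parent_res):
    """Stand-in for h3.cell_to_parent on a quadtree grid; cell = (res, x, y)."""
    res, x, y = cell
    if parent_res > res:
        raise ValueError("parent_res must not be finer than the cell's resolution")
    shift = res - parent_res  # each level up halves x and y
    return (parent_res, x >> shift, y >> shift)


def rollup(fine_stats, parent_res):
    """Roll (sum, count) pairs up to a coarser resolution without raw data.

    Averaging child means would weight cells incorrectly; summing the
    additive parts and dividing at query time does not.
    """
    acc = defaultdict(lambda: [0.0, 0])
    for cell, (total, count) in fine_stats.items():
        p = parent(cell, parent_res)
        acc[p][0] += total
        acc[p][1] += count
    return {cell: (total, count) for cell, (total, count) in acc.items()}


fine = {(2, 0, 0): (10.0, 2), (2, 1, 1): (6.0, 1), (2, 2, 0): (4.0, 4)}
coarse = rollup(fine, 1)
mean_by_cell = {cell: total / count for cell, (total, count) in coarse.items()}
```

Because every level is derived from the same fine-resolution table, any resolution can be answered on demand without reprocessing the source points.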
Python Performance

GDAL VRT: Virtual Raster Mosaics Without Copying Data

A GDAL Virtual Raster (VRT) file stitches hundreds of GeoTIFFs into a single virtual mosaic in milliseconds — no data copying, no reprojection, no storage cost. Essential for national-scale raster workflows in Python.

9 min read
Python Performance

Rasterio Windowed Reading: Processing Large Rasters Without Loading Them

Rasterio's windowed read API lets you process arbitrarily large rasters by reading and processing small tiles at a time. Learn the block-aligned reading pattern, CRS-to-pixel coordinate conversion, and a complete tiling pipeline for a 50GB DEM.

10 min read
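Rasterio wraps this logic in Window objects, DatasetReader.block_windows() and DatasetReader.index(). The pure-Python sketch below shows only the arithmetic underneath, assuming a north-up raster with no rotation terms in its geotransform: a generator of block-aligned tiles with clipped edge tiles, and a CRS-to-pixel conversion from the raster's top-left origin and pixel size.

```python
import math


def block_windows(width, height, block_w, block_h):
    """Yield (col_off, row_off, w, h) tiles aligned to the raster's blocks.

    Edge tiles are clipped so the tiles cover the raster exactly once.
    """
    for row_off in range(0, height, block_h):
        for col_off in range(0, width, block_w):
            yield (col_off, row_off,
                   min(block_w, width - col_off),
                   min(block_h, height - row_off))


def world_to_pixel(x, y, origin_x, origin_y, pixel_w, pixel_h):
    """Map CRS coordinates to (row, col) for a north-up raster.

    origin_x/origin_y is the top-left corner and pixel_h is a positive
    pixel height, so rows increase as y decreases.
    """
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((origin_y - y) / pixel_h)
    return row, col
```

Aligning reads to the file's internal block size means each window decompresses whole blocks exactly once, which is what keeps memory flat regardless of total raster size.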
Python Performance

stackstac: Lazy Satellite Time Series from STAC Catalogues

stackstac turns STAC API search results into lazy Dask-backed XArray DataArrays — loading only the pixels you need, when you need them. The practical guide to building cloud-native satellite time series pipelines without downloading data.

11 min read
Python Performance

XArray and Zarr: Chunked Raster Storage for Cloud Workflows

Zarr is a cloud-native array storage format that pairs with XArray to make petabyte raster datasets queryable without full downloads. Understand chunk strategy, compression, and how to build read-optimised Zarr stores for spatial workflows.

12 min read
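Why chunk shape matters can be shown with a little arithmetic. The sketch below (a hypothetical helper, not part of the Zarr API) lists which chunks a read of the half-open range [start, stop) touches; in a cloud store each touched chunk typically costs one object request. Comparing two chunkings of a (time, y, x) cube for a single-pixel time series makes the tradeoff concrete.

```python
import math
from itertools import product


def chunks_touched(start, stop, chunk_shape):
    """Return the chunk-grid indices touched by a read of [start, stop).

    start, stop and chunk_shape are per-dimension tuples, e.g. (time, y, x).
    """
    ranges = [
        range(s // c, math.ceil(e / c))  # first to last chunk along this axis
        for s, e, c in zip(start, stop, chunk_shape)
    ]
    return list(product(*ranges))


# One pixel's full 365-step time series.
start, stop = (0, 100, 100), (365, 101, 101)
map_chunks = chunks_touched(start, stop, (1, 512, 512))     # optimised for maps
series_chunks = chunks_touched(start, stop, (365, 64, 64))  # optimised for series
```

The map-oriented chunking forces 365 requests for one pixel's history, while the time-oriented chunking needs one; the reverse holds when reading a single full-extent scene, so chunk shape should follow the dominant access pattern.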
Python Performance

XArray GroupBy for Temporal Raster Aggregation: Monthly Means at Scale

XArray's GroupBy and resample operations run on lazy Dask arrays, enabling temporal aggregation of multi-year satellite raster stacks without loading data into memory. Walk through computing monthly mean NDVI for a decade of Landsat tiles.

11 min read
Python Performance

DuckDB Spatial Analytics on GeoParquet: In-Process SQL for Geodata

DuckDB's spatial extension brings PostGIS-style spatial SQL to GeoParquet files without a database server. Run spatial joins, aggregations, and geometry operations directly on files, often at speeds comparable to PostGIS, in-process and with zero infrastructure.

12 min read
Python Performance

FlatGeobuf: Streaming Spatial Data Without a Server

FlatGeobuf is a binary vector format with a built-in spatial index that supports HTTP range requests — enabling bbox-filtered streaming of large datasets directly from cloud storage without any server infrastructure. Here is how it works and when to use it.

9 min read
Python Performance

GeoParquet Spatial Partitioning: Optimising for Parallel Reads

Spatial partitioning of GeoParquet files turns a slow sequential scan into a parallelisable set of file reads. Learn how to partition by H3 cell, bounding grid, and administrative region — with DuckDB and Dask-GeoPandas integration patterns.

10 min read
Python Performance

GeoParquet: The Analytical Format for Vector Geodata

GeoParquet brings the columnar, compressed, analytics-optimised Parquet format to vector geometry. Understand the format internals, why it outperforms Shapefile and GeoJSON for analytical workloads, and how to use it effectively in Python.

11 min read
Python Performance

Numba-Accelerated Spatial Kernels: JIT-Compiling Custom Geometry Operations

When Shapely's built-in functions don't cover your algorithm, Numba's JIT compiler can accelerate custom spatial kernels by up to 100x over pure Python. Learn the patterns for writing compiled spatial loops with Numba, including a Haversine distance matrix and a weighted spatial smoothing example.

13 min read
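A minimal sketch of the Haversine distance-matrix kernel, assuming NumPy is installed. Numba is optional here: if it is missing, a no-op njit fallback (an assumption added so the sketch always runs) executes the same loops as plain, slow Python, which also makes the JIT speedup easy to measure. The explicit loops are deliberate: Numba compiles them to machine code, and prange lets it spread the outer loop across cores when parallel=True.

```python
import math

import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback: run the same kernel as plain Python
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):  # used as bare @njit
            return args[0]
        return lambda fn: fn            # used as @njit(...)

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


@njit(parallel=True)
def haversine_matrix(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between every pair of points (degrees in)."""
    n, m = lat1.shape[0], lat2.shape[0]
    out = np.empty((n, m))
    for i in prange(n):
        phi1 = math.radians(lat1[i])
        for j in range(m):
            phi2 = math.radians(lat2[j])
            dphi = phi2 - phi1
            dlmb = math.radians(lon2[j] - lon1[i])
            a = (math.sin(dphi / 2) ** 2
                 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
            out[i, j] = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return out
```

The first call includes compilation time, so benchmark a second call when comparing against the pure-Python or NumPy-broadcast versions.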
Python Performance

PyProj Transformer Array API: Reprojecting Millions of Coordinates

PyProj's Transformer.transform() accepts NumPy arrays directly, enabling reprojection of millions of coordinates in a single C-level call. Eliminate per-point CRS transforms that throttle your spatial pipelines.

9 min read
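With PyProj the whole array goes through one call, for example Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True).transform(lons, lats). To show the array-at-once pattern without depending on PyProj, the sketch below implements the spherical Web Mercator forward formula (EPSG:3857) directly in NumPy. It is a stand-in for illustration only; PyProj handles datum shifts and thousands of other CRSs that this formula does not.

```python
import numpy as np

R = 6_378_137.0  # WGS84 semi-major axis, the sphere radius EPSG:3857 uses


def lonlat_to_web_mercator(lon, lat):
    """Vectorised EPSG:4326 -> EPSG:3857 forward projection (spherical formula).

    Accepts scalars, lists or arrays; every coordinate is processed in one
    NumPy expression instead of a Python-level loop per point.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    x = R * np.radians(lon)
    y = R * np.log(np.tan(np.pi / 4 + np.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator([0.0, 180.0], [0.0, 0.0])
```

The same principle applies to PyProj: pass whole arrays rather than calling transform() inside a loop, so the per-call overhead is paid once instead of once per coordinate.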
Python Performance

Shapely STRtree: Mass Nearest-Neighbour and Intersection Queries

The STRtree spatial index in Shapely 2.0 enables nearest-neighbour and intersection queries on millions of geometries without nested loops. Learn query, query_nearest, and bulk query patterns, with benchmarks.

12 min read
Python Performance

GeoPandas at Scale: Escaping the .apply() Trap

Using .apply() for spatial operations is the single most common GeoPandas performance anti-pattern. Learn the vectorised replacements that deliver 10–100x improvements, with benchmarks from a real 2-million-ping GPS classification task.

11 min read
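The shape of the fix can be shown with plain NumPy. The sketch below is a stand-in for the GeoPandas case (a hypothetical bounding-box classification of GPS pings, assuming NumPy is installed): the first function makes one Python-level call per ping, as .apply() does, while the second expresses the same test as whole-array comparisons. In GeoPandas the equivalent move is from gdf.geometry.apply(lambda p: p.within(zone)) to gdf.within(zone).

```python
import numpy as np


def classify_loop(xs, ys, bbox):
    """Anti-pattern: one Python-level test per ping (what .apply() does)."""
    minx, miny, maxx, maxy = bbox
    return np.array([minx <= x <= maxx and miny <= y <= maxy
                     for x, y in zip(xs, ys)])


def classify_vectorised(xs, ys, bbox):
    """Same test as whole-array comparisons evaluated in compiled code."""
    minx, miny, maxx, maxy = bbox
    return (xs >= minx) & (xs <= maxx) & (ys >= miny) & (ys <= maxy)


rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 2.0, 100_000)
ys = rng.uniform(-1.0, 2.0, 100_000)
inside = classify_vectorised(xs, ys, (0.0, 0.0, 1.0, 1.0))
```

Both functions return identical masks; the difference is that the vectorised version crosses the Python/C boundary a handful of times instead of once per row, which is where the 10–100x gap comes from.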
Python Performance

Shapely 2.0: The Vectorisation Revolution in Python Geometry

Shapely 2.0 rewrote geometry operations from Python-level loops to C-level array operations, delivering 50–200x speedups for batch geometry tasks. Understand what changed, why it matters, and how to rewrite your code to take advantage.

10 min read