A national elevation dataset arrives as 2,847 1-degree tiles, each a separate GeoTIFF. A Sentinel-2 mosaic for Australia covers 847 scenes. A LIDAR survey is delivered as 3,400 LAZ-derived rasters. In every case, the data you need is conceptually a single continuous raster — but physically it is thousands of separate files.
The brute-force solution: merge all tiles into a single GeoTIFF. For a national 1m DEM that is a 50GB write operation that creates a 50GB file and doubles your storage cost. And when the source data is updated, you have to redo it.
The correct solution: a GDAL VRT file. It is a tiny XML file (typically a few KB) that describes a virtual mosaic — how the source tiles fit together, how they should be reprojected or resampled, and what the overall bounds and resolution are. Opening the VRT with GDAL, Rasterio, or XArray looks exactly like opening a single raster. No data is copied. No storage is consumed. Construction takes milliseconds.
# Building a VRT from Python
import subprocess
from pathlib import Path
import glob
# List of source GeoTIFF files
tif_files = sorted(glob.glob("/data/dem_tiles/*.tif"))
print(f"Found {len(tif_files)} tiles")
# Build a VRT mosaic
output_vrt = "/data/national_dem.vrt"
result = subprocess.run(
["gdalbuildvrt", output_vrt] + tif_files,
capture_output=True, text=True
)
print(result.stdout)
print(f"VRT size: {Path(output_vrt).stat().st_size / 1024:.1f} KB")
# → VRT size: 148.3 KB (for 2847 tiles)
Or using GDAL’s Python bindings directly:
from osgeo import gdal
vrt = gdal.BuildVRT(
"/data/national_dem.vrt",
tif_files,
options=gdal.BuildVRTOptions(
resampleAlg="bilinear",
outputBounds=None, # auto-compute from tiles
xRes=None, yRes=None, # use native resolution
targetAlignedPixels=True
)
)
vrt = None # flush and close
# Reading a VRT with Rasterio
Once built, the VRT is transparent to any GDAL-backed reader:
import rasterio
from rasterio.windows import from_bounds
with rasterio.open("/data/national_dem.vrt") as src:
print(f"Total extent: {src.bounds}")
print(f"CRS: {src.crs}")
print(f"Shape: {src.width} × {src.height} pixels")
# → 150,000 × 120,000 (if 1m resolution, national scale)
# Read a specific region — only the relevant tiles are accessed
brisbane_window = from_bounds(
152.9, -27.6, 153.2, -27.3,
transform=src.transform
)
data = src.read(1, window=brisbane_window)
print(f"Brisbane DEM shape: {data.shape}")
# → (33000, 33000) for a 33km × 33km area at 1m
# Only the tiles that intersect this window are read from disk
# VRT Options: Reprojection and Resampling
VRT files can include on-the-fly reprojection:
from osgeo import gdal
# Build a VRT in a different CRS than the source tiles
vrt = gdal.BuildVRT(
"/data/national_dem_wgs84.vrt",
tif_files,
options=gdal.BuildVRTOptions(
outputSRS="EPSG:4326", # reproject to WGS84 on-the-fly
resampleAlg="bilinear",
xRes=0.0001, yRes=0.0001 # ~10m at this latitude
)
)
vrt = None
Reading through this VRT applies the reprojection transparently on each pixel read.
For a mosaic with tiles at different resolutions (e.g. a mix of 1m and 2m tiles):
# Build VRT at a consistent target resolution
vrt = gdal.BuildVRT(
"/data/dem_1m_mosaic.vrt",
tif_files,
options=gdal.BuildVRTOptions(
xRes=1.0, yRes=1.0, # 1m target resolution
resampleAlg="bilinear",
targetAlignedPixels=True # align to whole-pixel grid
)
)
# Banded VRT: Multi-temporal Stack
VRT can also stack multiple single-band rasters into a multi-band dataset, useful for composing band stacks from per-band files (common in satellite products):
from osgeo import gdal
# Sentinel-2 scene delivered as separate band files
band_files = {
"B02": "S2_20220615_B02.tif",
"B03": "S2_20220615_B03.tif",
"B04": "S2_20220615_B04.tif",
"B08": "S2_20220615_B08.tif",
}
# Build a 4-band VRT without copying data
vrt_path = "S2_20220615_stack.vrt"
vrt = gdal.BuildVRT(
vrt_path,
list(band_files.values()),
options=gdal.BuildVRTOptions(separate=True) # each file → separate band
)
vrt = None
# Now read all 4 bands as a single array
with rasterio.open(vrt_path) as src:
print(f"Band count: {src.count}") # 4
data = src.read() # (4, height, width)
# XArray Integration
XArray’s Rasterio engine reads VRT files natively:
import xarray as xr
import rioxarray # extends xarray with rasterio support
# Open a VRT as an XArray DataArray (lazy)
dem = xr.open_dataset("/data/national_dem.vrt", engine="rasterio")
print(dem)
# Or with rioxarray
dem = rioxarray.open_rasterio("/data/national_dem.vrt", chunks={"x": 1024, "y": 1024})
# This creates a Dask-backed lazy DataArray — the VRT handles tile access,
# Dask handles parallel computation
# VRT for Overviews (Pyramid Levels)
When the full-resolution mosaic is too slow to browse or thumbnail, add overview levels to the VRT:
# Build overview pyramids inside the VRT (no data copy — stores overview tiling info)
gdaladdo /data/national_dem.vrt 2 4 8 16 32
Or from Python:
from osgeo import gdal
ds = gdal.Open("/data/national_dem.vrt", gdal.GA_Update)
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])
ds = None
After adding overviews, a web mapping application rendering the mosaic at a small zoom level reads the low-resolution overview data rather than the full-resolution tiles.
# Tiled VRT for Large Mosaics
For mosaics with thousands of tiles, GDAL has to inspect all tile extents at open time. Pre-tile the VRT to reduce this overhead:
# Build with internal tiling — faster open time for large tile counts
gdalbuildvrt -tileindex /data/national_dem.vrt /data/dem_tiles/*.tif
Or use a VRT with an OGR tile index sidecar:
from osgeo import gdal
# Build a tile-indexed VRT — O(1) tile lookup vs O(N) scan
gdal.BuildVRT(
"/data/dem_tileindex.vrt",
tif_files,
options=gdal.BuildVRTOptions(outputBounds=None, resolution="highest")
)
# Comparing VRT to Physical Merges
| Operation | VRT | gdal_merge / rasterio merge |
|---|---|---|
| Construction time | Milliseconds | Minutes to hours |
| Storage cost | ~100KB metadata | Same as total input data |
| Read performance | Same as tiled GeoTIFF | Same |
| Source update propagation | Automatic | Requires re-merge |
| Edge blending (seamlines) | None (nearest tile) | Yes (with feathering) |
| COG compatibility | VRT → COG conversion available | Yes |
VRT is not the right choice if you need seamline blending or if you need a single portable file. For everything else — national-scale analysis, pipeline automation, lazy mosaic access — VRT is the correct tool and physical merge is the wrong one.
Related reading: Rasterio Windowed Reading for Large Rasters · XArray and Zarr: Chunked Raster Storage for Cloud Workflows · The Modern Case for Rasters