Notes on processing large files with GDAL

Context

I have recently had a project where I was provided with a reasonable number of large aerial image files. These were 600-800MB each and there were 10-15 of them. I needed to cut out the area of interest (which covered some part of all of the images), mosaic the images into a single representation of the site and find a way to share the data with non-specialists.

The workflow in my head was Clip – Mosaic – Webmap

Things to note

The images were too slow to render for me to mess about trying to do this using a GUI system, and the GDAL functions in QGIS3 (for me at least) don’t seem to be up and running properly (milage may vary).

Certain things that I have found when doing this in GDAL are that it’s best to define a data type for the bands, and to set the data structure as cloud optimised. The easiest way to create cloud optimised geotiffs is to use the following:

gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE

Initially I was going to use gdal_merge.py to mosaic the clipped images as that is what is used in QGIS. However, gdal_merge.py reads everything into memory and I quickly found out that 16GB of RAM wasn’t enough as my swap partition started to be used and the whole thing ground to a halt.

The trick is to use the CPU if you can. First make a virtual raster using gdalbuildvrt, and then use gdal_translate (which utilises the CPU) to change the vrt file to whatever format you want (e.g. cloud optimised geotiff). This was actually really fast, and used the structure found in the following commands:

gdalbuildvrt output.vrt /path/to/folder/of/*.tif
gdal_translate -of GTiff output.vrt mosaic.tif

This ends up with an image that will be relatively large (1.2 GB in my case) which is still tricky to share. But fear not! We can use gdal2tiles.py to create a whole load of tiles and automatically generate some leaflet code to allow you to see the data. Then it’s just a question of moving that folder onto a web server and sending your contacts the link.

A quick thing to note/remember is that the webmap zoom factors relate to the following:

  • 0 represents the whole world (1:500,000,000)
  • 19 is very close up (1:1000)

Workflow

The workflow that I used in the end was as follows:

First crop each aerial image to the site shapefile

maindir="/path/to/folder/of/incoming/aerial/files"
cd ${maindir}

for f in *.tif

do
gdalwarp -ot Byte -of GTiff -cutline siteBoundary.shp 
-crop_to_cutline -dstnodata 0.0 -overwrite ${f} 
outputfolder/${f}_cropped.tif-co TILED=YES -co COPY_SRC_OVERVIEWS=YES
 -co COMPRESS=DEFLATE
done

Then build the virtual image

gdalbuildvrt output.vrt /path/to/folder/of/*.tif

Output the virtual image to a cloud optimised geotiff

gdal_translate -ot Byte -of GTiff -co TILED=YES 
-co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE output.vrt mosaic.tif

Create the web tiles and map file using the following command

gdal2tiles.py -s EPSG:27700 -z 11-19 mosaic.tif

Sources

https://lostingeospace.blogspot.com/2011/04/rapid-mosaicking-with-gdal.html

http://www.cogeo.org/

Advertisement

One thought on “Notes on processing large files with GDAL

Add yours

  1. nice post… but to be honest, I’m not a big fan of gdal2tiles.py, and love mapserv CGI to serve whole tiles from single .tif raster.

    adding simple .htaccess REWRITE will serve tile better IMHO,
    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    RewriteRule myuniqname/(.*)/(.*)/(.*)\.png$ “/cgi-bin/mapserv?map=/path/to/my/map/file.map&mode=tile&layers=layername&tilemode=gmap&tile=$2 $3 $1” [L]

    The drawback is it’s only serve EPSG 3785 google Pseudo Mercator XYZ tile… and It may use more CPU on server.. but less un-necessary tiles generated …

    btw, TIA

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

A WordPress.com Website.

Up ↑

%d bloggers like this: