7.5 File formats

Geographic datasets are usually stored as files or in spatial databases. File formats can store either vector or raster data, while spatial databases such as PostGIS can store both (see also Section 9.6.2). Today the variety of file formats may seem bewildering, but there has been much consolidation and standardization since the beginnings of GIS software in the 1960s, when the first widely distributed program (SYMAP) for spatial analysis was created at Harvard University (Coppock and Rhind 1991).

GDAL (which should be pronounced “goo-dal”, with the double “o” making a reference to object-orientation), the Geospatial Data Abstraction Library, has resolved many issues associated with incompatibility between geographic file formats since its release in 2000. GDAL provides a unified and high-performance interface for reading and writing many raster and vector data formats. Many open and proprietary GIS programs, including GRASS, ArcGIS and QGIS, use GDAL behind their GUIs for doing the legwork of ingesting and spitting out geographic data in appropriate formats.

GDAL provides access to more than 200 vector and raster data formats. Table 7.2 presents some basic information about selected and often used spatial file formats.

Table 7.2: Selected spatial file formats.
| Name | Extension | Info | Type | Model |
|---|---|---|---|---|
| ESRI Shapefile | .shp (the main file) | Popular format consisting of at least three files. No support for: files > 2 GB; mixed types; names > 10 chars; cols > 255. | Vector | Partially open |
| GeoJSON | .geojson | Extends the JSON exchange format by including a subset of the simple feature representation. | Vector | Open |
| KML | .kml | XML-based format for spatial visualization, developed for use with Google Earth. Zipped KML file forms the KMZ format. | Vector | Open |
| GPX | .gpx | XML schema created for exchange of GPS data. | Vector | Open |
| GeoTIFF | .tiff | Popular raster format similar to .tif format but stores raster header. | Raster | Open |
| Arc ASCII | .asc | Text format where the first six lines represent the raster header, followed by the raster cell values arranged in rows and columns. | Raster | Open |
| R-raster | .gri, .grd | Native raster format of the R-package raster. | Raster | Open |
| SQLite/SpatiaLite | .sqlite | Standalone relational database, SpatiaLite is the spatial extension of SQLite. | Vector and raster | Open |
| ESRI FileGDB | .gdb | Spatial and nonspatial objects created by ArcGIS. Allows: multiple feature classes; topology. Limited support from GDAL. | Vector and raster | Proprietary |
| GeoPackage | .gpkg | Lightweight database container based on SQLite allowing an easy and platform-independent exchange of geodata. | Vector and raster | Open |
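
To make the Arc ASCII entry in Table 7.2 more concrete, the following is a minimal sketch in Python (standard library only) of how such a file could be parsed by hand. The function name `read_arc_ascii` and the tiny example grid are hypothetical and introduced only for illustration. The sketch assumes the common layout with exactly six header lines (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`) and one raster row per text line; in practice, a GDAL-based reader would normally be used instead.

```python
# Hypothetical example grid: six header lines, then 2 rows x 3 columns of values.
EXAMPLE_ASC = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 10.0
NODATA_value -9999
1 2 3
4 -9999 6
"""


def read_arc_ascii(text):
    """Parse Arc ASCII text into (header dict, list of rows).

    Assumes six header lines and one raster row per text line.
    Cells equal to the NODATA value are returned as None.
    """
    lines = text.strip().splitlines()

    # The first six lines are "key value" pairs describing the raster.
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)

    ncols = int(header["ncols"])
    nrows = int(header["nrows"])
    nodata = header["nodata_value"]

    # The remaining lines hold the cell values, row by row from the top.
    rows = [[float(v) for v in line.split()] for line in lines[6:]]
    if len(rows) != nrows or any(len(row) != ncols for row in rows):
        raise ValueError("cell values do not match ncols/nrows in the header")

    grid = [[None if v == nodata else v for v in row] for row in rows]
    return header, grid


header, grid = read_arc_ascii(EXAMPLE_ASC)
print(header["cellsize"], grid)
```

Because the header is stored as plain text alongside the values, the georeferencing (lower-left corner and cell size) travels with the data and can be inspected in any text editor.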

An important development ensuring the standardization and open-sourcing of file formats was the founding of the Open Geospatial Consortium (OGC) in 1994. Beyond defining the simple features data model (see Section 2.2.1), the OGC also coordinates the development of open standards, for example as used in file formats such as KML and GeoPackage. Open file formats of the kind endorsed by the OGC have several advantages over proprietary formats: the standards are published, ensure transparency and open up the possibility for users to further develop and adjust the file formats to their specific needs.
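
One practical consequence of a published, text-based specification is that a file can be inspected with general-purpose tools. As an illustration, the sketch below reads a small GeoJSON document (one of the open formats in Table 7.2) using only Python's built-in `json` module. The example data and the helper `summarize_features` are hypothetical; a real workflow would normally rely on a GDAL-based reader rather than manual parsing.

```python
import json

# Hypothetical GeoJSON document with two simple features.
# GeoJSON positions are ordered longitude, latitude.
EXAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-0.1, 51.5]},
      "properties": {"name": "London"}
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [[-0.1, 51.5], [2.35, 48.86]]
      },
      "properties": {"name": "route"}
    }
  ]
}
"""


def summarize_features(text):
    """Return a list of (name, geometry type) pairs for a FeatureCollection."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")

    summary = []
    for feature in collection["features"]:
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        properties = feature.get("properties") or {}
        summary.append((properties.get("name"), feature["geometry"]["type"]))
    return summary


summary = summarize_features(EXAMPLE_GEOJSON)
print(summary)
```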

ESRI Shapefile is the most popular vector data exchange format. However, it is not an open format (though its specification is open). It was developed in the early 1990s and has a number of limitations. First of all, it is a multi-file format, which consists of at least three files. It only supports 255 columns, column names are restricted to ten characters and the file size limit is 2 GB. Furthermore, ESRI Shapefile does not support all possible geometry types; for example, it is unable to distinguish between a polygon and a multipolygon. Despite these limitations, a viable alternative had been missing for a long time.

In the meantime, GeoPackage emerged, and seems to be a more than suitable replacement candidate for ESRI Shapefile. GeoPackage is a format for exchanging geospatial information and an OGC standard. The GeoPackage standard describes the rules on how to store geospatial information in a tiny SQLite container. Hence, GeoPackage is a lightweight spatial database container, which allows the storage of vector and raster data but also of non-spatial data and extensions. Aside from GeoPackage, there are other geospatial data exchange formats worth checking out (Table 7.2).
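
The idea that a GeoPackage is "just" a SQLite container can be checked at the byte level. The sketch below, in Python with the standard library only, reads the 100-byte SQLite file header and looks at two fields: the magic string at the start of the file and the 4-byte application ID at offset 68. It assumes the application ID `GPKG` used by GeoPackage version 1.2 and later; files written to earlier versions of the standard carry different identifiers. The helpers `inspect_container` and `make_toy_container` are hypothetical, and the toy file created here is not a valid GeoPackage, because it lacks the required metadata tables such as `gpkg_contents` and `gpkg_spatial_ref_sys`.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"          # first 16 bytes of any SQLite file
GPKG_APP_ID = int.from_bytes(b"GPKG", "big")   # assumption: GeoPackage 1.2+


def inspect_container(path):
    """Report whether a file is SQLite and whether it is tagged as GeoPackage."""
    with open(path, "rb") as f:
        header = f.read(100)  # the SQLite database header is 100 bytes long
    is_sqlite = header[:16] == SQLITE_MAGIC
    # The application ID is a big-endian integer at byte offset 68.
    app_id = int.from_bytes(header[68:72], "big") if len(header) >= 72 else None
    return {
        "is_sqlite": is_sqlite,
        "is_gpkg_tagged": is_sqlite and app_id == GPKG_APP_ID,
    }


def make_toy_container(path):
    """Create a SQLite file tagged with the GPKG application ID.

    This is only a toy: it does not contain the tables that the
    GeoPackage standard requires.
    """
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA application_id = {GPKG_APP_ID}")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, txt TEXT)")
    conn.commit()
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.gpkg")
    make_toy_container(toy_path)
    result = inspect_container(toy_path)

print(result)
```

Since everything lives in a single SQLite file, the multi-file bookkeeping required by ESRI Shapefile is avoided, and any SQLite client can open the container for inspection.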