Wednesday, 2 February 2011
An Exploration of Bad Polygons
I'd hoped to get further with my Urban Atlas simulation, but have been distracted by badly formed polygons. I relied on osm2pgsql to convert OSM data to polygons in PostGIS. The polygons get created and load fine, but once I started clipping sets of data I noticed that I was losing some landuse polygons. Specifically, the residential landuse polygons for the two large villages (or small towns) of Bingham and Radcliffe-on-Trent were missing. The image above shows the Harlequin area of Radcliffe with residential landcover (red), but the polygon for the rest of the village has gone.
I was not at all sure where the problem lay. I reimported the data with a different version of osm2pgsql; I used an older data set; I even rendered the area using a modified version of the OSM mapnik stylesheet (see image to right with the missing polygons highlighted). In all cases the Bingham and Radcliffe polygons could be retrieved and displayed in QGIS, and were successfully rendered by mapnik, but disappeared on clipping. When I tried to perform the clipping in PostGIS the error messages were much more explicit. A bit of delving in the PostGIS manual led me to the ST_IsValid and ST_IsValidReason functions. Even better, a quick search found a nifty function called CleanGeometry (link to code here) which I have now installed in my template OSM database on Postgres. Running this on the landuse polygons got rid of the intersections, so problem solved.
Only partially. It's really much better to find the problem at source and resolve it there. I had tried the JOSM validator on the data but it did not report any problems, so I was still uncertain if it was a hidden bug in osm2pgsql or a data problem.
I left the issue for a couple of days until, whilst checking some address data, I noticed that Geofabrik's OSM Inspector had a set of geometry validation tools. I'd never found a use for these in the past. Of course, Jochen Topf and Frederik Ramm thought about this sort of problem long ago, and I could instantly see the exact location of the problems, and even click on an icon to start up Potlatch in the right location. Just another illustration of the rich endowment of the OSM ecosystem.
Great Britain has around 1500 badly formed polygons (based on data from Jan 22), or about 0.2% of the total data. Of these, about 80% are self-intersecting and the rest are mainly self-intersecting rings. Many are buildings (400 or so, as seen in the screen-shot above), with the rest more or less evenly distributed between landuse, woodland (natural=wood) and water (natural=water). Overall the error rate is extraordinarily low, given that most OSM contributors, like me, probably don't do formal checks on the geometry of their data.
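For anyone unfamiliar with the term, a self-intersecting polygon is one whose boundary crosses or touches itself, the classic case being a "bow-tie" where two vertices have been entered in the wrong order. The following is a minimal sketch in plain Python of how such a ring can be detected by testing every pair of non-adjacent edges. It is only an illustration: it is not how PostGIS, osm2pgsql or OSM Inspector do their validation, the function names are my own inventions, and the brute-force pairwise comparison would be slow on very large rings.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a); zero means a, b, c are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a: Point, b: Point, p: Point) -> bool:
    """For p already known to be collinear with a-b: is p inside the segment's bounding box?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _segments_intersect(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True if segment p1-p2 crosses or touches segment p3-p4."""
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    # Proper crossing: the endpoints of each segment lie strictly on
    # opposite sides of the other segment.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Touching cases: an endpoint of one segment lies on the other.
    if d1 == 0 and _on_segment(p3, p4, p1):
        return True
    if d2 == 0 and _on_segment(p3, p4, p2):
        return True
    if d3 == 0 and _on_segment(p1, p2, p3):
        return True
    if d4 == 0 and _on_segment(p1, p2, p4):
        return True
    return False


def ring_self_intersects(ring: List[Point]) -> bool:
    """Hypothetical helper: does any pair of non-adjacent edges of the ring meet?

    The ring may be given closed (first point repeated at the end) or open.
    Adjacent edges are skipped because they always share a vertex.
    """
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the repeated closing vertex
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            # Skip neighbouring edges, including the first/last pair
            # that meet at the ring's start vertex.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_intersect(pts[i], pts[(i + 1) % n],
                                   pts[j], pts[(j + 1) % n]):
                return True
    return False
```

With this sketch, a bow-tie such as `[(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]` is reported as self-intersecting, while the properly ordered square `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]` is not. A ring that merely touches itself at a repeated vertex is also flagged, which roughly corresponds to the "self-intersecting ring" category above.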
So I can go back to my simulation, having once more found that OSM provides the tools I need.