Thursday, 31 October 2013

Not very INSPIREd: Land Registry 'Open' Data

Many OpenStreetMap contributors have been very excited about the potential for importing address information from data sets released under the European Union's INSPIRE directive. Our experience in the UK tends to make us more cautious in our expectations, and so it proved with the latest release of Open Government Data under this programme.

Figure: Comparison of Land Registry parcels with gardens on OSM in Sutton Coldfield (scatter plot, log-log scale).

Firstly, (another) quick word about the complexities of how cadastral data is organised in the UK. There are separate cadastral agencies for each of England and Wales, Scotland, and Northern Ireland (note the Oxford comma), and they operate in different ways, not least because the legal framework of Scots Law is quite different from that of English Law. Thus the released data relates only to England and Wales.

Although the data is released under the Open Government Licence (OGL), there is an important caveat:

  • third party rights the Information Provider is not authorised to license;

As all Land Registry parcels are created using very detailed Ordnance Survey maps, in effect the polygons are restricted by Ordnance Survey licensing terms (and, probably, by an over-onerous interpretation of these terms by OS lawyers). As a result, I am not going to show a single land parcel in this post, which makes it quite difficult to illustrate. [Chris Hill beat me to posting about Land Registry data on his blog; he's been braver.]

When the data were first made available I was sceptical about how they could be used. I was proved right when John Flitton wrote to the Ordnance Survey and received a long list of things he could not do.

Under ordinary circumstances most OSM contributors in Great Britain have not evinced a great deal of interest in land parcels: they are not obvious on the ground, and add somewhere between nothing and not a lot to how OSM data is used in practice. Therefore, although the highly restrictive licensing wasn't a surprise, not being able to use this data wasn't particularly annoying. In fact, I only returned to look at it because I was irritated by the Ordnance Survey's high-handed list of restrictions (not to mention the absurdly expensive cost of licensing the data, starting at £0.19 per polygon).

As with many other open data sets, one of the under-appreciated uses is to check the quality of OSM data. I know of several places in England where the gardens of most houses have been mapped, and a garden is highly likely to correspond to the freehold land parcel of its house. Therefore I chose Sutton Coldfield, mapped about 3 years ago from Bing imagery by blackadder, to compare OSM private gardens with LR parcels.

To start with I downloaded fairly arbitrary bounding boxes (bboxes) from both datasets around Sutton Coldfield. I didn't try to use the same bbox for both because I grabbed the data from different sources (bbbike for OSM, the Birmingham download in QGIS for LR).

To reduce the number of comparisons I calculated the centroid of every parcel in both datasets, and worked only with those parcels that contained a centroid from the other dataset within their bounds. Parcels containing two or more centroids were filtered out to simplify the comparison. Comparisons were run both ways (OSM centroids in LR parcels, LR centroids in OSM parcels). This provided around 7000 comparable parcels, with under 10% rejected because of the two-centroid criterion.
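
The centroid-matching step can be sketched in Python roughly as follows. This is a minimal illustration, not the method actually used for this post: it assumes each parcel is a simple polygon given as a list of (x, y) vertices in a projected coordinate system and keyed by a parcel id, and the function names and sample parcels are hypothetical. It loops over every centroid for every parcel; with thousands of parcels a spatial index or bounding-box pre-filter would be needed in practice.

```python
from collections import defaultdict


def centroid(poly):
    """Area-weighted centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    a = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; the sign cancels in the division below
    return (cx / (6 * a), cy / (6 * a))


def contains(poly, pt):
    """Ray-casting point-in-polygon test (points exactly on an edge are not special-cased)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through pt
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def match_one_way(targets, sources):
    """Map each target parcel id to the one source parcel whose centroid falls inside it.

    Targets containing no source centroid, or two or more, are dropped.
    Brute force: O(len(targets) * len(sources)).
    """
    src_centroids = {sid: centroid(p) for sid, p in sources.items()}
    hits = defaultdict(list)
    for tid, tpoly in targets.items():
        for sid, c in src_centroids.items():
            if contains(tpoly, c):
                hits[tid].append(sid)
    return {tid: sids[0] for tid, sids in hits.items() if len(sids) == 1}


# Hypothetical example: two LR parcels and two slightly offset OSM gardens.
lr = {"LR1": [(0, 0), (10, 0), (10, 10), (0, 10)],
      "LR2": [(10, 0), (20, 0), (20, 10), (10, 10)]}
osm = {"W1": [(1, 1), (11, 1), (11, 11), (1, 11)],
       "W2": [(11, 1), (21, 1), (21, 11), (11, 11)]}

osm_in_lr = match_one_way(lr, osm)  # OSM centroids in LR parcels: {'LR1': 'W1', 'LR2': 'W2'}
lr_in_osm = match_one_way(osm, lr)  # LR centroids in OSM parcels: {'W1': 'LR1', 'W2': 'LR2'}
```

Running the match in both directions mirrors the two-way comparison described above; a parcel big enough to swallow two centroids simply disappears from the result.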

Visualisation of the two datasets showed a clear, good correspondence, with the OSM data frequently slightly displaced to the north-west. Unfortunately I am not going to risk showing any of these visualisations in case they draw the ire of OSGB's legal eagles.

Figure: Scatter plot of plot-size correspondence. The outlier is an OSM house parcel whose centroid placed it on a golf course (this may have reflected construction work at the time).

With the parcels from the two datasets linked by primary key, it was easy to generate a number of comparisons (total area, overlapping area, centroid displacement, etc.). The scatter plot at the top of the post shows a good correspondence between the areas of plots assumed to be the same place because the centroid of one lay within the other. I haven't done any formal stats, but a quick plot of the distribution suggests that the correlation is OK:

Figure: Histogram of OSM plot sizes expressed as a percentage of the corresponding LR plot sizes.
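
The per-pair comparisons behind plots like these can be sketched as below, again as a hypothetical illustration rather than the analysis that produced the figures. It assumes coordinates are in metres in a projected system (for example British National Grid), so that shoelace areas come out in square metres. The overlapping-area measure is left out because it needs a polygon-intersection routine from a geometry library such as Shapely; the function names and the sample parcel are made up for the example.

```python
import math
from collections import Counter


def area_and_centroid(poly):
    """Shoelace formula: (absolute area, (cx, cy)) of a simple polygon [(x, y), ...]."""
    a = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return abs(a), (cx / (6 * a), cy / (6 * a))


def compare_pair(osm_poly, lr_poly):
    """One matched pair: OSM area as % of LR area, plus the OSM-minus-LR centroid offset."""
    osm_area, (ox, oy) = area_and_centroid(osm_poly)
    lr_area, (lx, ly) = area_and_centroid(lr_poly)
    dx, dy = ox - lx, oy - ly
    return {"area_pct": 100.0 * osm_area / lr_area,
            "dx": dx, "dy": dy,
            "displacement": math.hypot(dx, dy)}


def histogram(values, bin_width=10):
    """Count values into bins of width bin_width, keyed by each bin's lower edge."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical pair: a 9 m x 9 m OSM garden traced inside a 10 m x 10 m LR parcel.
lr_parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
osm_garden = [(1, 1), (10, 1), (10, 10), (1, 10)]
m = compare_pair(osm_garden, lr_parcel)
print(round(m["area_pct"], 1), round(m["displacement"], 2))  # 81.0 0.71
```

Feeding every matched pair's `area_pct` into `histogram` gives the kind of distribution shown in the histogram, with bins such as 80 to 90% keyed as 80.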


  • Ordnance Survey Polygon Pricing is Absurd. I used an initial subset of 30,000 LR polygons; even at the cheapest price (19p per polygon) I would have needed to pay the OS nearly £6,000 to perform this analysis, had I shared the data. There must be thousands of businesses (big and small) that would occasionally find one-off analyses using this sort of data useful. Obviously, at these prices, they won't do them. These huge up-front costs are also a clear barrier for any organisation that might want to do GIS consultancy in the UK. So the Ordnance Survey and its resellers enjoy a nice little pseudo-monopoly. Note that OS pricing does not reflect the marginal utility of their product to the end user: a classic symptom of a non-competitive market.
  • OS legal advice. By explicitly stating that the polygons cannot be shown in a blog post, the OS may be misinterpreting the fair-dealing provisions of the Copyright Act under which their data is protected. It does not behove a government-funded body to misrepresent the rights of the public under the relevant legislation. Thanks to Chris Hill for pointing this out.
  • OSM Polygons are pretty good. The differences in size and position between the OSM polygons and the LR data are small, well within the accuracy expected for data traced from aerial imagery. Rear boundaries of gardens were particularly hard to trace accurately because they are often obscured by trees.
  • OSM Accuracy can be further improved. The data shows a general displacement of the OSM polygons of around 3 metres. Simply shifting the polygons by this amount would increase accuracy. More accurate tools for aligning aerial images (such as accurately located ground control points) may also improve accuracy.
  • Conflation. I haven't looked at this in detail, but it is obvious that matching LR and OSM polygons well enough to perform conflation (merging data from the two sets) requires a more sophisticated method than I have used (my centroid method was simple, but produced some extreme outliers). As overall enthusiasm for using INSPIRE data increases, the OSM community MUST work to develop robust tools for managing conflation. I know that Osmose has some features supporting conflation, but am not familiar with it in detail.
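
The simple shift suggested under "OSM Accuracy can be further improved" could be sketched like this: estimate one systematic offset from the matched centroid pairs, then translate every OSM polygon by it. Using the median is my own assumption, not something measured for this post; it is less sensitive than the mean to extreme outliers like the golf-course parcel. A single global translation also assumes the imagery offset is uniform across the whole area. The function names and sample coordinates are hypothetical.

```python
import math
from statistics import mean, median


def estimate_offset(pairs):
    """Estimate one systematic (dx, dy) correction from matched centroid pairs.

    pairs: ((osm_x, osm_y), (lr_x, lr_y)) tuples in metres. The returned shift
    moves OSM towards LR. Median, not mean, so a few bad matches barely move it.
    """
    pairs = list(pairs)
    dx = median(lr[0] - osm[0] for osm, lr in pairs)
    dy = median(lr[1] - osm[1] for osm, lr in pairs)
    return dx, dy


def shift_polygon(poly, dx, dy):
    """Translate every vertex of a polygon by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in poly]


# Hypothetical centroids: OSM sits 2 m west and 2 m north of LR (about 2.8 m),
# plus one bad match of the golf-course kind.
pairs = [((98.0, 102.0), (100.0, 100.0)),
         ((198.0, 302.0), (200.0, 300.0)),
         ((48.0, 52.0), (50.0, 50.0)),
         ((0.0, 0.0), (500.0, 500.0))]

dx, dy = estimate_offset(pairs)
print(dx, dy, round(math.hypot(dx, dy), 2))       # 2.0 -2.0 2.83
print(mean(lr[0] - osm[0] for osm, lr in pairs))  # 126.5: the outlier swamps the mean
print(shift_polygon([(98.0, 102.0), (99.0, 102.0), (99.0, 103.0)], dx, dy))
# [(100.0, 100.0), (101.0, 100.0), (101.0, 101.0)]
```

The comparison of median and mean on the same pairs shows why a robust estimator matters when the matching step is as crude as a centroid test.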

1 comment:

  1. Thanks for this interesting blog. The limitations put on use of INSPIRE polygons do seem crazy! I'm interested in the OSM polygons - what kind of size are they? What boundaries are they based on? Any help or signposting would be really helpful! Thanks!