Wednesday 4 December 2013

British Postcodes on OpenStreetMap

British contributors to OpenStreetMap are known for an apparently unhealthy obsession with postcodes.
OSM_Postcodes_by_Postsector_over5pct_completeness_London
London Postcode Sectors with more than 5% of regular postcodes mapped on OpenStreetMap.
See image for copyright notices. All completeness percentages are shown below.
We are not alone in Britain: there are many groups who need access to postcode data. This is because postcodes in the United Kingdom provide an excellent proxy for geolocation, and as such are widely used: in SatNavs; for geodemographics (such as Acorn, Mosaic, and the (open) ONS Output Area Classification); in a host of analytic applications; as well as in their more prosaic role of delivering the post. In 2010 postcode centroids were made available through the Ordnance Survey Open Data scheme, under the brand CodePoint Open. Subsequently it was found that the license associated with this data prevented it being used directly in OSM. More recently the Office of National Statistics have released an (identical) data set which is not encumbered by the license of CodePoint Open.



Current State of Postcodes on OSM


OSM_Postcodes_by_Postsector_completeness_GB
Percentage of regular Great Britain postcodes mapped on OSM
shown by Postcode Sector.
However, despite the availability of this data, we had (as of October 2013) only 43,000 postcodes on OSM for Great Britain: in other words, less than 3% of the current total of 1,685,340.

WE NEED TO DO BETTER THAN THIS! 


The map shown above is also available as a PDF file and the raw data as a CSV. To highlight the places which have a reasonable amount of mapping of postcodes, I also show ONLY those places with at least 5% of normal postcodes mapped. 
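For anyone wanting to reproduce figures of this kind, here is a minimal sketch of a per-sector completeness calculation. It assumes two plain lists of postcodes (a reference list such as the ONS one, and the postcodes found on OSM objects); the function names and the made-up "ZZ9" postcodes are purely illustrative. Unlike the published CSV, this sketch only counts mapped postcodes that also appear in the reference list, so it can never exceed 100%.

```python
from collections import Counter


def normalise(postcode: str) -> str:
    """Upper-case a postcode and strip all spaces, e.g. 'zz9 1aa' -> 'ZZ91AA'."""
    return postcode.replace(" ", "").upper()


def postcode_sector(postcode: str) -> str:
    """Return the sector: the outward code plus the first character of the inward code."""
    compact = normalise(postcode)
    return f"{compact[:-3]} {compact[-3]}"


def sector_completeness(reference, mapped):
    """Percentage of reference postcodes in each sector that also occur in `mapped`."""
    ref = {normalise(p) for p in reference}
    found = ref & {normalise(p) for p in mapped}
    totals = Counter(postcode_sector(p) for p in ref)
    hits = Counter(postcode_sector(p) for p in found)
    return {sector: 100.0 * hits[sector] / totals[sector] for sector in totals}


# Made-up postcodes, for illustration only.
reference = ["ZZ9 1AA", "ZZ9 1AB", "ZZ9 2AA", "ZZ9 2AB"]
mapped = ["zz91aa", "ZZ9 2AA", "ZZ9 2AB", "ZZ9 3XX"]
completeness = sector_completeness(reference, mapped)
print(completeness)
```

The same ratio applied to the national totals quoted above (43,000 out of 1,685,340) gives roughly 2.6%.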

osm_pc_by_ps_pct5a
Same map as before but only showing postal sectors with more than 5% of standard postcodes mapped.

Eliminating the Oddities

Out of the nearly 1.7 million postcodes quite a few are irrelevant to our quest. It is therefore very useful to identify these oddities and eliminate them from the basic work of assignment. Nearly all oddities (and quite a few useful postcodes) can be identified by being co-located with other postcodes, or from having unusual postcodes. Eliminating them seriously reduces (probably by around 10%) the number of postcodes which need to be included in any address mapping strategy.

OSGB Null Island

About 380 postcodes are located at the OSGB grid reference with eastings and northings both 0. This is the local equivalent of Null Island. I have no idea what these postcodes are used for!
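Both checks (co-location and the zero grid reference) are easy to script. The following is a rough sketch, assuming the centroid data has already been read into (postcode, easting, northing) tuples; the function names and the sample values are hypothetical.

```python
from collections import defaultdict


def colocated_groups(centroids, min_size=2):
    """Group postcodes by identical grid reference and keep points shared by several."""
    by_point = defaultdict(list)
    for postcode, easting, northing in centroids:
        by_point[(easting, northing)].append(postcode)
    return {point: pcs for point, pcs in by_point.items() if len(pcs) >= min_size}


def at_grid_origin(centroids):
    """Postcodes whose easting and northing are both zero."""
    return [pc for pc, easting, northing in centroids if easting == 0 and northing == 0]


# Made-up sample: three postcodes share one point, one sits at the grid origin.
sample = [
    ("ZZ1 1AA", 457000, 340000),
    ("ZZ1 1AB", 457000, 340000),
    ("ZZ1 1AD", 457000, 340000),
    ("ZZ1 2AA", 457350, 340120),
    ("ZZ9 9ZZ", 0, 0),
]
groups = colocated_groups(sample)
origin = at_grid_origin(sample)
print(groups, origin)
```

Raising `min_size` picks out the large clusters (sorting offices, big firms, council offices) while leaving alone the occasional pair of ordinary postcodes that happen to share a building.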

Postal Delivery Offices

The largest group of co-located postcodes represents post offices (main or general post offices), sorting offices, or delivery offices. These include any postcodes assigned to PO Box addresses and presumably some others used for internal purposes.


I've been using the wonderful todo plugin in JOSM to examine a few of these. Often they are mapped, but I've usually been able to add detail. The most frequently used tag is amenity=post_depot.
When they aren't mapped it is usually very easy to confirm that these locations on industrial estates are owned by the Royal Mail: a large number of red vehicles in the car park is a dead giveaway (Swinton Delivery Office is shown in the above excerpt from Bing Maps).

Large Firms

Most large firms with extensive use of postal services will have not just their own postcodes, but individual postcode districts. Experian and Boots in Nottingham use postcodes in NG80 and NG90 respectively. Similarly DVLA uses SA99.  Assuming most postal districts with numeric codes over 50 have such a purpose (the exceptions are probably in the Scottish Highlands) then there are over 100,000 such postcodes.
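The "numeric code over 50" rule of thumb can be sketched as below. The regular expression is my own simplification of the outward code layout (one or two letters, one or two digits, optional trailing letter), and the threshold of 50 is just the assumption made in the paragraph above; the inward parts of the sample postcodes are invented.

```python
import re

# Simplified outward code pattern: area letters, district digits, optional letter.
OUTWARD_RE = re.compile(r"^([A-Z]{1,2})(\d{1,2})([A-Z]?)$")


def district_number(postcode):
    """Return the numeric part of the postcode district, or None if it cannot be parsed."""
    compact = postcode.replace(" ", "").upper()
    match = OUTWARD_RE.match(compact[:-3])  # drop the three-character inward code
    return int(match.group(2)) if match else None


def is_probable_large_user(postcode, threshold=50):
    """True when the district number exceeds the (assumed) threshold."""
    number = district_number(postcode)
    return number is not None and number > threshold


for pc in ["NG80 1ZZ", "NG90 1ZZ", "SA99 1ZZ", "NG5 6ZZ"]:
    print(pc, district_number(pc), is_probable_large_user(pc))
```

Anything flagged this way still needs a sanity check, given the likely exceptions in the Scottish Highlands.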

Not every firm operates this way. I found one on the outskirts of Swindon which belongs to the Nationwide Building Society. I worked out which firm it was because a friend did some risk management consultancy there in the 1990s!

Local Authorities

A number of cases of many postcodes in one place involve the head offices of local authorities. For instance this building in Workington, and Aberdeen council offices.  Again I have no idea why this is a feature for only a small number of councils.

Using Centroids Directly

There are a number of basic heuristics which can be used for assigning postcodes to houses and streets: unfortunately they often have minor exceptions.
  • The combination of a house number and a postcode is typically unique. (The exceptions are rare and usually reflect what the Royal Mail refer to as a subsidiary street: my current example is Naranjan Mews off Gedling Grove. In practice these exceptions have more in common with houses divided into flats.) So far I only know of one example of the exception in Nottingham, so perhaps around 1:6000 postcodes.
  • A postcode will only refer to one named street. (see above for exceptions). In country areas where streets and roads may not be named then a postcode may cover an entire village (such as Kilchenzie in Argyll).
  • Blocks of flats, even when built on a plot where the original housenumber can be inferred, will usually have separate postcodes. Beddington Gardens in Wallington, Surrey is an excellent (although somewhat complicated) example.
  • Named streets with a small number of houses will have only one postcode.
  • It is not possible to tell without examining how postcodes are distributed whether postcodes are assigned to even and odd sides of the road or in contiguous blocks along a road. A particular convention tends to be consistent in any one area.
  • The postcode centroid is always placed over one of the buildings (delivery points) making up the postcode. This means a certain care needs to be taken in determining to which street a postcode applies.
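The odd/even versus contiguous-block question in the list above can be given a first, rough answer from whatever house numbers are already known. This is only a heuristic sketch: the function name, the input structure (a dictionary of house numbers per postcode for one street) and the labels it returns are all my own invention.

```python
def numbering_convention(numbers_by_postcode):
    """Guess how postcodes are laid out along one street from known house numbers."""
    parities = [{n % 2 for n in nums} for nums in numbers_by_postcode.values() if nums]
    if len(parities) < 2:
        return "undetermined"  # need at least two postcodes to compare
    if all(len(p) == 1 for p in parities):
        return "odd/even sides"  # every postcode holds numbers of a single parity
    return "contiguous blocks"  # at least one postcode mixes odd and even numbers


# Made-up examples.
print(numbering_convention({"ZZ1 1AA": [1, 3, 5], "ZZ1 1AB": [2, 4, 6]}))
print(numbering_convention({"ZZ1 1AA": [1, 2, 3, 4], "ZZ1 1AB": [5, 6, 7, 8]}))
```

With only a handful of known numbers the answer can easily be wrong, so it is best treated as a hint about the local convention rather than a rule.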

ncc_road_length
Shorter streets in Nottingham: candidates for streets with a single postcode.
With these points in mind the best practice to use postcode centroids directly is to work on short streets first. These postcodes are usually the easiest to assign, and once assigned make identifying the correct assignment of other postcodes easier. Also the postcode can be directly added to the street way without having to know any of the house numbers. This type of assignment therefore does not require a detailed survey of addresses up-front.
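Picking out the short streets can be automated. Below is a minimal sketch, assuming the named ways have already been extracted from OSM as lists of (lat, lon) nodes; the 150 m cut-off, the function names and the street names are all assumptions for illustration, not values I have tested.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def way_length_m(nodes):
    """Length of one way given its ordered (lat, lon) nodes."""
    return sum(haversine_m(*a, *b) for a, b in zip(nodes, nodes[1:]))


def short_street_candidates(streets, max_length_m=150.0):
    """Names of streets whose ways add up to no more than max_length_m."""
    lengths = {name: sum(way_length_m(w) for w in ways) for name, ways in streets.items()}
    return sorted(name for name, length in lengths.items() if length <= max_length_m)


# Made-up streets: one of about 111 m, one of about 1.1 km.
streets = {
    "Short Close": [[(52.950, -1.150), (52.951, -1.150)]],
    "Long Road": [[(52.950, -1.160), (52.960, -1.160)]],
}
candidates = short_street_candidates(streets)
print(candidates)
```

Summing over all ways with the same name matters because a single street is often split into several ways in OSM; two different streets sharing a name in the same extract would need separating first.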

It is important to remember that absolutely accurate assignment of postcodes is relatively unimportant. If one or two houses are assigned to an adjacent postcode it is not going to affect things like navigational use-cases. Furthermore as more open data (see below) becomes available and we make more use of tools such as Open PostcodeFinder then we can refine the accuracy of the assignments iteratively.

Other Available Data

Food Hygiene

In my view this is the single most useful source of detailed postcode data. The Food Hygiene Rating Scheme covers most local authorities in the United Kingdom, and at present includes over 400,000 premises where food is sold or prepared. The data set contains over 230,000 individual postcodes (> 15 % of total) with detailed address information. In many cases the premises will already have been mapped on OSM, or can be identified from aerial images or OS OpenData StreetView maps (shops, schools, etc. are usually fairly distinctive). (I don't recommend mapping POI data solely from FHRS because retail POIs tend to change fairly rapidly: I've seen a figure of 20% a year quoted.)

fhrs_postcodes_assigned_to_named_highway
Enriching open postcode data with street names.
By merging OSM road data with FHRS postcode locations and addresses
it is possible to assign postcode centroids to a single street (thick red lines).
Postcodes at bottom centre are in the University of Nottingham and don't have street names in their addresses.
I have already made a start on using this data (see above image from London OSM Hack Weekend), and will blog shortly on the details of manipulating the data to get something useful out of it.
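The core of the matching idea can be sketched in a few lines. This is not the processing used for the image above, just an illustration: it assumes that for each postcode we have the free-text addresses from FHRS and a list of OSM street names near the centroid, and it only accepts a postcode when exactly one nearby street name occurs in the addresses. All names in the example are invented.

```python
def normalise(text):
    """Upper-case, drop commas and full stops, and collapse whitespace."""
    return " ".join(text.upper().replace(",", " ").replace(".", " ").split())


def street_for_postcode(addresses, nearby_streets):
    """Return the one nearby street named in the addresses, or None if ambiguous."""
    matches = set()
    for address in addresses:
        padded = f" {normalise(address)} "
        for street in nearby_streets:
            if f" {normalise(street)} " in padded:  # whole-word match only
                matches.add(street)
    return matches.pop() if len(matches) == 1 else None


# Made-up addresses and street names.
nearby = ["Example Road", "Sample Lane"]
print(street_for_postcode(["12 Example Road, Anytown"], nearby))
print(street_for_postcode(["The Refectory, University Park"], nearby))
```

Returning nothing for ambiguous or unmatched cases is deliberate: addresses without a street name, like the university ones in the image, are better left for manual checking.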

Land Registry Price Paid

Whilst I was writing this article, the Land Registry of England and Wales released several years of price paid data (from 1995 onwards) for property transactions. This data includes addresses and postcodes. It is therefore a very useful complement to the FHRS data as it is mainly concerned with residential rather than commercial property.

Stop Press: I downloaded this data whilst finishing off some maps for this blog. It is very useful: providing the associated street for over 1 million postcodes in England and Wales.
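Extracting the street for each postcode is essentially a grouping exercise. Here is a minimal sketch, assuming the postcode and street fields have already been pulled out of the download as (postcode, street) pairs; I have not shown the file parsing, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def streets_by_postcode(rows):
    """Collect the distinct street names seen for each postcode."""
    result = defaultdict(set)
    for postcode, street in rows:
        if postcode and street:  # skip transactions missing either field
            key = " ".join(postcode.upper().split())
            result[key].add(" ".join(street.upper().split()))
    return dict(result)


def single_street_postcodes(rows):
    """Postcodes for which every transaction names the same street."""
    return {pc: next(iter(names)) for pc, names in streets_by_postcode(rows).items() if len(names) == 1}


# Made-up rows.
rows = [
    ("ZZ1 1AA", "EXAMPLE ROAD"),
    ("zz1 1aa", "Example  Road"),
    ("ZZ1 1AB", "EXAMPLE ROAD"),
    ("ZZ1 1AB", "SAMPLE MEWS"),
    ("", "NOWHERE STREET"),
]
singles = single_street_postcodes(rows)
print(singles)
```

Postcodes that turn up with two street names are worth a second look: they may be subsidiary streets of the kind mentioned earlier, or simply inconsistent address entry.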

Other Open Data 

Any other source of open data which contains addresses with postcodes can be used to assist resolving postcodes. Nottingham Open Data, for instance, includes Licensed Premises, Planning Applications, Places of Worship, Community Centres, Schools, Childcare facilities etc.

Basic Postcode Assignment


NG5 6xx Postcodes and Road line data
Postcode centroids for part of the Daybrook area of Nottingham.
Lines show roads closest to postcode centroids.
When direct evidence of a postcode for a range of addresses is not available, we need to assign postcodes from the postcode centroid. The accompanying maps show how postcodes were assigned for an area of suburban Nottingham. In most cases the appropriate street is very easy to identify. If assignment is carried out at the property level it is also possible to check for errors by computing the OSM postcode centroid and comparing it with the one from ONS or CodePoint Open.
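The error check can be sketched as follows. It assumes the centroid definition given in the Code-Point documentation (the delivery point nearest to the mean position of the delivery points in the unit), and that the OSM addresses for a postcode have been converted to OSGB eastings and northings in metres. The function names, coordinates and any tolerance you might apply are illustrative assumptions.

```python
import math


def codepoint_style_centroid(points):
    """Return the point nearest to the mean position of all points."""
    mean_e = sum(e for e, _ in points) / len(points)
    mean_n = sum(n for _, n in points) / len(points)
    return min(points, key=lambda p: math.hypot(p[0] - mean_e, p[1] - mean_n))


def centroid_offset_m(osm_points, reference):
    """Distance in metres between the OSM-derived centroid and a reference centroid."""
    east, north = codepoint_style_centroid(osm_points)
    return math.hypot(east - reference[0], north - reference[1])


# Made-up address points for one postcode (eastings, northings in metres).
osm_points = [(457000, 344000), (457010, 344000), (457020, 344000), (457100, 344000)]
print(codepoint_style_centroid(osm_points))
print(centroid_offset_m(osm_points, (457020, 344030)))
```

A large offset does not prove an error (the postcode may only be partly mapped), but it is a cheap way to produce a list of assignments worth re-examining.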

Postcode Validation (Daybrook area)
Comparison of calculated postcode centroids from OSM (red) with original postcode centroids (blue).
Postcodes were deduced from open postcode centroid data and assigned to
individual properties as these were mapped in OSM.
Postcodes tend to have been assigned the same way over quite large areas: so once a few have been established from two or more sources (survey, open data etc) it is often possible to safely infer most others. Even if aerial imagery data are not available, the Ordnance Survey StreetView tiles show most properties which is usually the key to sensible allocation of postcodes.

Setting some targets

I suggest four targets for the medium term (say through the end of 2014).

OSM_Postcodes_by_Postsector_completeness_London
London Postcode Sectors showing the percentage of regular postcodes mapped on OpenStreetMap.
See image for copyright notices.

  1. The simplest, but not necessarily the easiest target, is to map at least one postcode in each postcode sector. This is harder than it appears because obvious things to map in sparsely populated rural areas may require surveys. For instance FHRS data has two B&Bs in Port Wemyss on Islay, but the names are not shown on the OS Open Data StreetView. Similarly a degree of caution must be exercised on farms in the Rhinns of Islay and on the Oa because individual farmsteads may include two or three properties (perhaps all owned by the same extended family, but nonetheless distinct).

  2. Achieve 5% completion. This reflects a DOUBLING of current postcode data, and therefore must be regarded as ambitious. This is, however, the minimum condition for breaking the back of the postcode problem. I believe with a concerted effort we could achieve this in 3 months, using conventional crowd-sourcing techniques.

  3. Achieve 10% completion. A second doubling will probably require more tool based support. The obvious targets are semi-automated matching of FHRS & Land Registry data, and semi-automated identification of single postcode streets.

  4. Postcodes along major roads (A & B roads). These may require some survey work, but again because many retail outlets are along such roads there is already a decent amount of information available from FHRS.
One last caveat. When I first drafted this I called it UK postcodes, but I realised most, if not all, available material is much harder to use in Northern Ireland than in the rest of the UK. In principle everything can be applied to Northern Ireland too, but the relative absence of open data from Stormont and local authorities makes everything less straightforward. (Incidentally the FHRS data from Northern Ireland is the one with most unusual characters).

7 comments:

  1. Interesting post. Coincidentally, yesterday I tried using the Food Hygiene data to add addresses to pubs in south Nottinghamshire (you, of course, have already done this for most pubs in the city itself). It is certainly a good way of adding addresses and postcodes quickly. I think I added around 100 addresses (although I was left with a list of another 20 pubs that I couldn't find in the Food Hygiene data and therefore might have closed).

    In general, while address mapping, I am now making constant use of the open data sources available. Part of this is adding as many accurate postcodes as possible. For NG9, making full use of all open sources I can find, I have been able to add postcodes to about 99% of addresses. This is while taking a cautious approach and only adding a postcode when I'm reasonably confident it is correct.

    I find digging through the different open sources quite time consuming and, mostly for my own use, have created this page. It just roughly combines all the Nottingham sources into one searchable list - I find it useful for quickly making best use of all the information available.

    Regarding your postcode completeness list, the percentage of NG9 postcodes mapped is interesting. I would expect all the postcode sectors to be around 100%, but NG9 1 is 91.6% and NG9 2 is just 77.5%. A handful of postcodes in the ONS list aren't in OSM because all the addresses associated with them have recently been demolished, but that doesn't account for the shortfall. NG9 has three points where a considerable number of postcodes are co-located and my guess is that at least one of these groups is being counted: two of these points are close together at the Nottingham Mail Centre (main sorting office for all of the NG postcode area) and the third is at Beeston Town Hall. All the other NG9 sectors are over 100%, which makes sense due to the postcodes created since the ONS list was created, plus a few inevitable errors either in the source data or by me in inputting it.

  2. More fantastic work, well done!

    Quick question: when you say a postcode is "complete", do you mean that it has been added somewhere?

    I have tried to add postcodes using Matt William's land registry data tool, adding them to every property with the postcode. It's sometimes easy to work out where one begins and another ends, so you can get the complete set. From this you can work out a centroid, and be sure that if you pop a postcode into the search you get the right place. But if I were to just add one postcode to one pub at the end of the postcode area, that's less useful.

    Second quick question: when you suggest adding postcodes for short streets, do you mean to the properties or to the street way itself?

  3. Thanks Tom.

    Yes, I have a very crude measure of completeness: at least one OSM object with a postcode assigned. I should cross-check that these are correct (e.g., by comparing a computed centroid to the ONS/Code Point open one). The process of moving from one postcode done to all delivery points in a postcode mapped may be simple or highly complex: but in most cases it won't affect things like use on SatNavs.

    I really want to use postcodes as a series of stepping stones to addresses: with nearly 30 million addresses they are a lot harder to break up into smaller tasks which can be visualised at the national level.

    For short streets I just add the postcode to the street (unless I also have house numbers). A couple of examples from the last couple of days: http://www.openstreetmap.org/way/4237773 and http://www.openstreetmap.org/way/8030929

  4. How is the "centroid" calculated? I would have thought a true centroid wouldn't necessarily fall inside a building's outline.

    1. see http://digimap.edina.ac.uk/webhelp/os/data_files/os_manuals/codepoint_v2.1.pdf where it says "The point is given the ADDRESS-POINT coordinates of the nearest delivery point to the calculated mean position of the delivery points in the unit."

  5. Does your statement that "the Office of National Statistics have released an (identical) data set which is not encumbered by the license of CodePoint Open." need to be updated in the light of the update at the end of this post: http://mapgubbins.tumblr.com/post/69079667760/the-ons-postcode-directory-open-data-but-which ?

  6. I don't know is the short answer.

    Clearly it was not part of discussions which the DWL WG, specifically Mike Collinson, had with the Ordnance Survey. If it is genuinely released under OSGB OGL then it's OK (notwithstanding your own personal opinions on the matter). However, as the ONS didn't get the metadata right first time round I'm not convinced that it won't change again. I closely follow Owen Boswarva's blog because it is by far and away the most rational source of information on open government data.

    However, I think it would be useful to try and create a postcode dataset which avoids use of postcode centroids entirely. The Land Registry Prices Paid is entirely suitable and when combined with OSM named roads can provide decent approximations which will meet most needs.

