Thursday, 22 August 2013

PfAFing About : opening UK address data


The Postcode Address File doesn't work all the time.
I recently mentioned that the Royal Mail's Postcode Address File (PAF) had stirred up a lot of controversy. That controversy continues: the UK government has announced that the PAF business will be privatised along with the Royal Mail, despite protests from some well-known people.

Naturally this has led to discussion on Twitter as to whether one can create a fully open alternative to PAF.

The rest of this post is probably mainly of interest to mappers and OpenData advocates in the UK. To the outside world UK OpenStreetMappers seem to be oddly obsessed with postcodes. I hope some of the reasons for this may become apparent from this post.

PAF: a personal history


I first came across PAF some 20 years ago when doing a mail shot to health service professionals: a company checked the addresses, stuffed the envelopes and posted about 2000 flyers. As far as I remember this was a little firm based in a small unit on an industrial estate in Brentford. They were an example of the numerous, mainly smallish, firms which acted as PAF resellers for Royal Mail. For our purposes they were perfect: handling an activity which was a one-off for us. Addresses which had been pre-processed against PAF received a discount from the Royal Mail, as they were less likely to cause errors in the sorting and delivery process.

My next encounter with PAF was at a major teaching hospital in South London. This was so long ago that Windows had not been widely adopted. They were able to install a small terminate-and-stay-resident (TSR) program which would look up a postcode and return the address from PAF, which could then be pasted into any underlying application. This was great because it required no changes to the multitude of different applications which are (or were) run in hospitals. Again there were expected to be good financial benefits in having more consistent and checked address data captured at the point of entry.

It is usually possible to identify firms which use PAF because they can ask for the postcode first (either on the phone or on a web page) before any other address information. The bulk of the address is then automatically filled in from the underlying database. It is equally obvious that many (even quite large) companies do not do this. When I was regularly working with customer address data a rule of thumb was that 10-15% of addresses would not be accurate. Of course a lot of this was because addresses were out of date, but a reasonable proportion were because good quality addresses had not been captured in the first place. For a company with even a few hundred thousand actual (or potential) customers, substantial savings in a number of activities (mailing of statements, direct mail advertising, staff time, customer support desk, etc.) could often be achieved through better management of addresses.

In all these cases organisations using PAF could reduce both direct and indirect costs. Use of the software had tangible benefits. Thus when it started being licensed by the Royal Mail there was a set of mutually beneficial relationships: the Royal Mail recovered some of the costs of an essential expense; their customers could save money and improve customer service. It also created a niche for the PAF resellers.

Twenty years on, the problems with this are much clearer. The main one is that the Royal Mail has a monopoly on address data and, despite relying on Local Authorities for a substantial part of the process of creating new addresses, expects them to pay for data they themselves have created. (They are not alone: the Ordnance Survey has done the same.) Equally for private firms, with only one supplier there is no market, so it is impossible to set a sensible price for address data. In this context statements that the PAF business is worth £500-900 million should tax people's credulity; on the other hand, plenty of folk bought into the absurd valuations of the 2000 internet boom, and several booms after that. Governments are particularly bad at valuing their businesses, either selling them off for a song, or hopelessly over-valuing them.

Address mapping in OpenStreetMap

This account is my own take on adding address data to OpenStreetMap. Whilst this post was being drafted, a couple of excellent entries in the OSM User diaries were made by Ed Loach and Will Phillips. Both belong to the small coterie of OSM mappers who have mapped addresses in a large contiguous area. I find Will's comments particularly useful, because in a dataset of 30,000 addresses one starts coming across the edge cases which really have to work at a national level (if one encounters these in a 0.1% sample, then it's a good indication that they need to be handled).

From addresses on OSM towards OpenPAF

Most OSM addresses are tagged using the Karlsruhe scheme, which is very useful and quite simple. In most cases all that is necessary is to tag a street name, a house number and a postcode. Usually the place (post town) can be deduced. Will's comments discuss how this scheme has been extended to handle blocks of flats. It would be interesting to know how many addresses mapped in this form correspond accurately to the equivalent PAF rows.
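As a minimal sketch of what the simple version of the scheme looks like in practice, the snippet below checks an element's tags for the three core keys (`addr:housenumber`, `addr:street`, `addr:postcode`) and builds a one-line address. The helper names are my own for illustration, and the post town is passed in separately on the assumption that it is deduced rather than tagged.

```python
# Core Karlsruhe-scheme keys needed for a simple UK address.
REQUIRED_KEYS = ("addr:housenumber", "addr:street", "addr:postcode")


def missing_address_tags(tags):
    """Return the core address keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not tags.get(key)]


def format_address(tags, post_town=None):
    """Build a one-line address from OSM tags.

    post_town is supplied by the caller because it is usually
    deduced from the location rather than tagged on the element.
    """
    missing = missing_address_tags(tags)
    if missing:
        raise ValueError("missing tags: " + ", ".join(missing))
    parts = [f"{tags['addr:housenumber']} {tags['addr:street']}"]
    if post_town:
        parts.append(post_town.upper())
    parts.append(tags["addr:postcode"])
    return ", ".join(parts)


# Illustrative element using an address mentioned later in this post.
example = {
    "addr:housenumber": "1",
    "addr:street": "Gedling Grove",
    "addr:postcode": "NG7 4DU",
}
print(format_address(example, post_town="Nottingham"))
```

Anything that falls outside these three keys (named blocks, dependent streets and so on) is exactly the kind of case that needs comparing against PAF.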

I would expect something like 90-95% of addresses to fit fairly comfortably within the simple version of the Karlsruhe scheme. Matt Williams has developed a tool to show OSM addresses by postcode, the OSM Postcode Finder. Data such as those created for NG9 and Tendring ought to allow this proposition to be assessed. If we are likely to achieve even 75% correspondence with PAF for areas where addresses have been completely mapped, then I think OSM is a good platform for trying to build an OpenPAF.

In the shorter term, using OSM to provide useful postcode information is more achievable. Many bodies use postcode-derived data for a whole host of purposes, and although centroids are available as Open Data these are not linked to addresses. As most postcodes apply to a single street, just identifying the street to which a postcode applies can be useful. This is one reason why I have been using the Nottingham Licensed Premises and Food Hygiene Open Data: it provides a means to identify around 20% of all postcodes in the city (approx 1200 from 6000).
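The idea of linking a postcode to its street can be sketched in a few lines. This is an illustration only: it assumes the open data has already been reduced to (street, postcode) pairs, the function name is hypothetical, and the sample records use made-up postcodes rather than real Nottingham data.

```python
from collections import defaultdict


def single_street_postcodes(records):
    """Map each postcode seen with exactly one street name to that street.

    records is an iterable of (street, postcode) pairs. Postcodes are
    normalised to upper case with single spaces; street names are
    compared case-insensitively.
    """
    streets_by_postcode = defaultdict(dict)
    for street, postcode in records:
        pc = " ".join(postcode.upper().split())
        name = " ".join(street.split())
        # Keep the first spelling seen for each distinct street name.
        streets_by_postcode[pc].setdefault(name.casefold(), name)
    return {
        pc: next(iter(names.values()))
        for pc, names in streets_by_postcode.items()
        if len(names) == 1
    }


# Made-up sample records (not real postcodes).
sample_records = [
    ("Example Road", "ZZ1 1AA"),
    ("example road", "zz1  1aa"),
    ("Sample Street", "ZZ1 1AB"),
    ("Other Lane", "ZZ1 1AB"),
]
print(single_street_postcodes(sample_records))
```

Postcodes that turn up with more than one street name are left out rather than guessed at, since they may well involve dependent streets.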

Streets which are likely to have a single postcode for all addresses.
Identified by total length of OpenStreetMap ways highway=*.
Using OSM data to identify short streets which will usually only have one postcode is another way in which the number of un-mapped postcodes could be reduced quite quickly. The postcode assigned to the street can just be added to the way marking the street pending detailed collection of house numbers.
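A rough sketch of that selection follows. It assumes the ways have already been extracted into dictionaries holding their tags and a list of (lat, lon) node coordinates; that structure, the function names and the 200 m cut-off are all my own assumptions for illustration, not values used for the map above.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def way_length_m(coords):
    """Length of a way as the sum of its segment lengths."""
    return sum(haversine_m(p, q) for p, q in zip(coords, coords[1:]))


def short_streets(ways, max_length_m=200.0):
    """Total the length of named highway=* ways per street name and
    return the streets at or under max_length_m (an assumed threshold)."""
    totals = defaultdict(float)
    for way in ways:
        tags = way["tags"]
        if "highway" in tags and tags.get("name"):
            totals[tags["name"]] += way_length_m(way["coords"])
    return {name: length for name, length in totals.items()
            if length <= max_length_m}


# Made-up sample ways: one short street, one long street in two pieces,
# and one way without a highway tag that should be ignored.
sample_ways = [
    {"tags": {"highway": "residential", "name": "Short Close"},
     "coords": [(52.950, -1.150), (52.951, -1.150)]},
    {"tags": {"highway": "residential", "name": "Long Road"},
     "coords": [(52.950, -1.160), (52.952, -1.160)]},
    {"tags": {"highway": "residential", "name": "Long Road"},
     "coords": [(52.952, -1.160), (52.954, -1.160)]},
    {"tags": {"name": "Some Stream"},
     "coords": [(52.950, -1.170), (52.9505, -1.170)]},
]
print(short_streets(sample_ways))
```

Grouping by name alone would merge distinct streets which share a name, so a real version would also need to group ways spatially before totalling.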

Outstanding Issues

We know that there are a number of problems. Starting at the bottom end of the scale:
  • Flats with complex addresses. My example is the flats in a former tannery local to me. The building is called Leen Court, and it is on the street Leen Gate. The flats are organised around staircases or entrances which have slightly twee names, such as "The Garland". Flats are numbered for each entrance, but numbers are not unique for the whole complex. Technically the postcode identifies the complex so that the street name is a redundant element in the address, but I suspect that Royal Mail treat Leen Court as a 'dependent street', with the staircases being treated as individual blocks of flats. I haven't seen the PAF data for these addresses so this is just speculation.
  • Dependent Streets. When houses on a street have non-unique house numbers it is usually because part of the street has an additional name. Often these also have a distinct postcode, but Naranjan Mews off Gedling Grove does not, so there are two houses with the number 1 for postcode NG7 4DU: one on the main street, 1, Gedling Grove, and the other 1, Naranjan Mews, Gedling Grove. This phenomenon is quite common in areas where older houses have been partially replaced by new developments.
  • More than one street with the same name. Several streets in Nottingham have identical names and can only be separated as addresses by either the postcode or adding additional place information before the post town. Again, I presume these additional elements exist in the PAF for reasons of redundant checking.
  • Streets with no names. Apparently this was prevalent in Keith, where more than 50 streets lacked a name until earlier this year. Presumably each street has a separate postcode. I have no idea how they organised postal delivery before postcodes were created. In the countryside many, if not most, houses do not have a street name.
  • Post Towns appended to Place name. More recently the Royal Mail has extended the usage of the post town so that it is the main place element, and village names become dependent. So one gets Whitwell, Worksop. The post town may have nothing to do with the administrative structure: Whitwell is in Derbyshire, and West Bridgford, Nottingham is not in Nottingham. We have no way to handle this type of relationship in OSM at present.
  • Use of 'by'. The most absurd example I can think of is Kinlochbervie, by Lairg. Kinlochbervie is actually about 80 km from Lairg. I have no idea what the 'by' means but it applies to more remote locations mainly in Scotland. Given the huge distances between some of these places there is no way we can infer this type of relationship.
  • Non-existent places. A large portion of addresses in Greater London include Middlesex in the address. Middlesex ceased to exist in 1965, and many places in the former county of Middlesex always had London addresses anyway. Obviously we do not have a locality called Middlesex in OSM. The former county name is needed in a few cases as there are duplicated place names in Greater London: notably Hayes, Middlesex and Hayes, Kent. When I worked in the former town we regularly lost visitors who had gone to the wrong Hayes (in one case to their considerable financial loss as they missed out on the project we were putting together). Wikipedia has an account of this and other discrepancies, although the county was formally removed from PAF in 2000.
There are no real conclusions to be drawn here, just some ideas about what we need to think about in the future, and a suggestion of focusing address capture on the target of at least one valid address per postcode. That at least is what I am working towards for the City of Nottingham.
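Progress towards that target is easy to measure once there is a list of known postcodes (for instance from the open centroid data) and a list of mapped addresses. The sketch below is only an illustration: the function name and input shapes are assumptions, and the sample postcodes are made up.

```python
def postcode_coverage(all_postcodes, mapped_addresses):
    """Split known postcodes into those with at least one mapped address
    and those with none.

    all_postcodes is an iterable of postcode strings; mapped_addresses is
    an iterable of tag dictionaries which may carry addr:postcode.
    """
    def normalise(pc):
        return " ".join(pc.upper().split())

    known = {normalise(pc) for pc in all_postcodes}
    seen = {
        normalise(tags["addr:postcode"])
        for tags in mapped_addresses
        if tags.get("addr:postcode")
    }
    covered = known & seen
    return covered, known - covered


# Made-up sample data (not real postcodes).
sample_postcodes = ["ZZ1 1AA", "ZZ1 1AB", "ZZ1 1AC"]
sample_addresses = [
    {"addr:housenumber": "1", "addr:street": "Example Road", "addr:postcode": "zz1 1aa"},
    {"addr:housenumber": "2", "addr:street": "Example Road", "addr:postcode": "ZZ1 1AA"},
    {"addr:housenumber": "5", "addr:street": "Sample Street"},
]
covered, uncovered = postcode_coverage(sample_postcodes, sample_addresses)
print(f"{len(covered)} of {len(covered) + len(uncovered)} postcodes have an address")
```

The list of uncovered postcodes doubles as a to-do list for survey.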


 



3 comments:

  1. IIRC "by" is effectively the equivalent of "Via" - it's too far away for any sane person to want to write it as part of the same post town but goes through there on the way to the destination.

    The best address to illustrate the absurdity of the POSTTOWN was one eBay package I had to post to:
    House with Welsh Name
    name of lane in Welsh
    nr Welsh Village Name (I forget the actual prefix, was in Welsh though and it was actually about 10km from there)
    HEREFORD
    Cymru
    HR3...


    Hereford being in the English county of, erm, Herefordshire...

  2. My gran's from near Kinlochbervie. I've never travelled there by train myself, but I understand that if you do, Lairg is the nearest station.

    The ITO map of railways seems to bear this out:

    http://www.itoworld.com/map/171?lon=-4.28386&lat=58.03433&zoom=9

    I assume the incoming post mostly travels by rail. In fact, I just googled it and it appears there is (or was?) a "post bus" service, which also takes passengers, from Lairg onward in that direction.

    Is anyone actually working on the project mooted here (and in the twitter links) of patching all the open data together to create a free alternative? Seems like an interesting challenge.

  3. "Middlesex ceased to exist in 1965". Only as an official administrative unit. In my opinion it is beyond the power of a transitory administration to abolish a historic county. Napoleon didn't successfully 'abolish' Artois, Picardy, Gascony and the other historic French provinces, and I don't think we English need take any cognisance of the absurd actions taken in the late 20th century, especially the later vandalism of Heath and Walker. Yes, we should map existing administrative divisions; I'm not arguing against that. But we shouldn't give in to the idea that our history has been destroyed.
