Saturday, 15 September 2018

A few oddities of OpenStreetMap history files

I've been working with the OSM history data for Great Britain for a while now. This has mainly involved removing bugs in the initial processing (yup, out-by-one errors mainly) and, as a side effect, working out how to improve the speed of extracting way geometries. En route I have noted a few quirks which may be helpful for others working with the data.

Some of these should be obvious, but I feel that they are worth stating nonetheless.

Sunday, 9 September 2018

Where the streets have no name: Coach Road Estate, Washington

Serlby Close, Usworth. The houses on the left are part of Coach Road Estate
Source: © Alex McGregor at Geograph CC-BY-SA-2.0
Microsoft, like a number of other large tech firms, now has teams working to improve OpenStreetMap. A couple of weeks ago they turned their attention to the UK, and asked about some roads apparently missing street names.

I checked a few of these examples using available Open Data sources, primarily those of the Ordnance Survey. For the most part, our existing mapping seems correct: there is no official street name. Many were places like caravan parks and industrial estates, but one really stood out. This was an area on the north side of Washington, County Durham called Coach Road Estate. As I investigated it soon became apparent that the estate is interesting for other reasons too.

Wednesday, 25 July 2018

Coda on shop completion rates on OSM

Thanks to John Baker (Rovastar) for a few suggestions made while discussing my recent blog post in the pub last night:

  • What do the graphs of numbers of unique shop tags look like with heavier filtering of relatively poorly used tags?
  • E-cigarette shops are a recent phenomenon, and should represent a genuinely novel tag rather than the mix of typos, synonyms etc which characterise much of the long tail of shop tags.

These were easy to follow up, so I present the graphs here:

Unique shop tags over time on OpenStreetMap for Great Britain,
filtered to remove tags with a restricted number of uses as at June 2017.
For virtually any level of filtering the curves level out around 2010-2011. Thus the core set of shop tags looks to be very stable. A good place to judge the extent of likely synonymy for shops in Britain is the LUA script used by SomeoneElse for his "Useful Maps".
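The filtering behind these curves can be sketched roughly as follows. This is a minimal illustration in Python, not the code used to produce the graphs: the function name `unique_tags_by_year`, the `(year, shop_value)` record layout and the sample data are all assumptions made for the example, and the usage threshold is applied to the whole sample rather than to counts as at June 2017.

```python
from collections import Counter

def unique_tags_by_year(records, min_uses):
    """Cumulative count of distinct shop values per year.

    records  -- iterable of (year, shop_value) pairs, one per mapped shop
                (hypothetical layout, not the OSM history format itself)
    min_uses -- values used fewer times than this overall are ignored
    """
    records = list(records)
    totals = Counter(value for _, value in records)
    kept = {value for value, n in totals.items() if n >= min_uses}

    # Year in which each retained value first appears.
    first_seen = {}
    for year, value in records:
        if value in kept:
            first_seen[value] = min(year, first_seen.get(value, year))

    # Running total of values first seen up to and including each year.
    new_per_year = Counter(first_seen.values())
    result, running = {}, 0
    for year in sorted({year for year, _ in records}):
        running += new_per_year.get(year, 0)
        result[year] = running
    return result

# Made-up sample: "supermarkt" is a one-off typo in the long tail.
sample = [
    (2008, "supermarket"), (2008, "bakery"), (2009, "supermarket"),
    (2009, "bakery"), (2010, "supermarkt"), (2010, "butcher"),
    (2011, "butcher"), (2012, "supermarket"),
]
print(unique_tags_by_year(sample, min_uses=1))
print(unique_tags_by_year(sample, min_uses=2))
```

With `min_uses=2` the typo drops out and the curve flattens sooner, which is the same effect the heavier filters have in the graph.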

Growth of mapped e-cigarette shops in GB on OSM
As expected, e-cigarette shops first appeared rather late, at the end of 2013, and there are a decent number mapped (over 200 by mid 2017). I haven't checked, but I suspect the sharp increase in 2017 was caused by some tagging rationalisation. It's not unusual for new things to acquire a range of synonyms before tagging stabilises and one value becomes favoured. (It's equally true that in some cases this does not happen.)

I've had a couple of other requests which it will take rather longer to look at, but if you have ideas relating to shops in Great Britain I can look at the data right now.

Tuesday, 24 July 2018

Can we identify 'completeness' of OpenStreetMap features from the data?

At the Milan SotM conference Stefan Keller from the Geometalab at HSR (Rapperswil) will talk about recent work of his group on identifying "Areas of Interest" (AoI) from OpenStreetMap data. Stefan has been kind enough to involve me in some discussions about this work as it has progressed, but in this post I am solely concerned with a separate issue arising from the use of points of interest in this work.

Growth of shops mapped on OSM for selected Local Authorities
(See Analysis section below for commentary)

Areas of Interest were introduced on Google Maps back in 2016. Loosely they correspond to shopping, entertainment and cultural areas with large clusters of relevant points of interest. No doubt Google used not only map features but also other sources of data, such as the location of Android phones, to calculate the footprints for Areas of Interest (shown in a pale orange or salmon colour on Google Maps).

There are issues with the Google implementation, some discussed in this CityLab article from 2016. My own examination of Google Maps confirms that shopping areas which are otherwise equivalent in range and type of shops are chosen as AoI in wealthy areas, but not in poorer areas dominated by social housing. I also found some places, notably the UBS IT centre in Altstetten, Zurich, which have erroneously been identified as AoI by Google. The work of Geometalab is therefore interesting not just in terms of whether OSM data can be used to calculate similar areas, but also to provide suitable data where biases based on socioeconomic status can, at least, be identified and corrected because data and code are open.

Zurich, centre and Aussersihl districts, showing Areas of Interest.
Work of Geometalab, derived from OpenStreetMap data.
The starting point for this type of work relies on areas where POI mapping density is high and reasonably complete (for instance, the areas of Switzerland which Stefan's group have looked at, and areas of the English East Midlands and Germany which I have looked at both recently and in the past). Given that it is possible to calculate reasonable AoIs from OSM data where POI density is high, the question arises "Can we identify which areas are 'reasonably' complete?". Normally, this type of work has involved comparing OSM data to some external reference data which are assumed for the purposes of comparison to be complete (for instance Peter Reed's work on UK retail). However, in many parts of the world, and for many topic domains, there is no readily usable data for this purpose. So the ancillary clause for the question is ", and can we do this with OSM data alone?"

This post is a first look at the problem for one class of POIs: shops.

Wednesday, 25 April 2018

Linear or 1D maps from OpenStreetMap

1-D map of Clumber Street
Clumber Street, a pedestrian shopping street in Nottingham

We are all familiar with 1D, or linear, maps.

Wednesday, 30 August 2017

Mapping a specific building form

Arts-and-Crafts style semi-detached houses, Edwards Lane Estate, Nottingham

My interest in many aspects of urban environments has increased greatly since I started contributing to OpenStreetMap.

I suppose this was always there but largely latent. Wandering around familiar places to capture details to add to OSM often forces me to ask questions about the area. Why is it there? Why is it laid out in that way? Who designed the buildings? When was it built? Why are there gaps in house numbering? What was planned for the little stub street? What used to be on the land with newer houses?

Monday, 24 October 2016

Using Open Data for Statistical Purposes

A tweet by Owen Boswarva drew my attention to a recent report by Public Health England (PHE) on the correlation of density of fast food outlets and deprivation.

Number of Fast Food outlets normalised to 100,000 population for Local Authorities in England
Source: Food Hygiene Rating Scheme (Takeaway class)
Specifically, my interest was directed at the source of fast food outlet counts. PHE used data from PointX, a joint venture of Landmark Information and the Ordnance Survey. I instantly wondered if one could do the same thing with Food Hygiene Ratings (FHRS) open data. This is a quick report on doing exactly that.

I already had a complete set of FHRS data for September 2016. I needed to download various administrative and census geographies, population figures for Lower Layer Super Output Areas (LSOAs), Index of Multiple Deprivation (IMD) Scores for LSOAs and various files showing the linkages between the geographies.

A certain amount of data wrangling was needed to merge this data: for instance, linkages, population and IMD all came in spreadsheets with awkward column names, multiple sheets and other minor inconveniences. Once these were sorted out I had a table with base figures at LSOA level which could be readily aggregated to Middle Layer Super Output Areas (MSOAs) and local authorities. The IMD score is rebased by summing LSOA scores multiplied by population and then dividing by total population.
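The rebasing is just a population-weighted mean, and the outlet counts are normalised to 100,000 population in the same pass. A minimal sketch in Python (the wrangling itself was not done with this code; the function name `aggregate_lsoas`, the dictionary keys and the figures are invented for illustration):

```python
from collections import defaultdict

def aggregate_lsoas(lsoas):
    """Aggregate LSOA rows to their parent area (MSOA or local authority).

    lsoas -- iterable of dicts with the hypothetical keys
             'parent', 'population', 'imd' and 'outlets'
    Returns {parent: {'population', 'imd', 'outlets_per_100k'}}.
    """
    population = defaultdict(int)
    weighted_imd = defaultdict(float)
    outlets = defaultdict(int)

    for row in lsoas:
        parent = row["parent"]
        population[parent] += row["population"]
        # Each LSOA score is weighted by the LSOA's population.
        weighted_imd[parent] += row["imd"] * row["population"]
        outlets[parent] += row["outlets"]

    return {
        parent: {
            "population": pop,
            "imd": weighted_imd[parent] / pop,
            "outlets_per_100k": outlets[parent] * 100_000 / pop,
        }
        for parent, pop in population.items()
        if pop > 0  # skip areas with no recorded population
    }

# Made-up figures: two LSOAs in one parent area.
rows = [
    {"parent": "A", "population": 1000, "imd": 10.0, "outlets": 1},
    {"parent": "A", "population": 3000, "imd": 30.0, "outlets": 3},
]
summary = aggregate_lsoas(rows)
print(summary["A"])
```

Here the larger LSOA pulls the rebased score to 25 rather than the unweighted mean of 20.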

Using R I constructed simple scatter plots with a regression line and 95% confidence limits for both MSOA and Local authorities.
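For readers who want to see what the regression line amounts to, here is an ordinary least-squares fit written out in Python. It is only a sketch: the plots themselves were made in R, the function name `fit_line` and the data are invented, and the 95% confidence limits are left out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x, and sum of cross products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: IMD score (x) against outlets per 100,000 population (y).
imd_scores = [10.0, 20.0, 30.0, 40.0]
outlet_rates = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_line(imd_scores, outlet_rates)
print(slope, intercept)
```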

Number of Fast Food outlets (normalised) vs calculated
Index of Multiple Deprivation for Middle Super Output Areas

Number of Fast Food outlets (normalised) vs calculated
Index of Multiple Deprivation for Local Authorities
(outlier of City of London excluded)

For comparison the relevant plot from the PHE report is shown below:

Scatter plot from PHE report for Local Authorities

The final comparison I made was perhaps one I should have done at the outset: comparing raw counts of fast food outlets from the Open Data source (FHRS) and the PointX data. PHE provided a table of counts at ward level. It took me a while to find a shape file and codes which fitted (the codes change year-on-year), but then it was easy to do a Point-in-Polygon count of the FHRS data for a direct comparison. The correlation of values was plotted in R again.
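A Point-in-Polygon count can be sketched with the standard ray-casting test. The Python below is an illustration under simplifying assumptions, not the method actually used: each ward is a single ring of `(x, y)` coordinates with no holes or multiple parts, and the function names, ward codes and coordinates are invented.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_by_ward(points, wards):
    """Count points per ward; wards maps a ward code to its ring."""
    counts = {code: 0 for code in wards}
    for x, y in points:
        for code, ring in wards.items():
            if point_in_polygon(x, y, ring):
                counts[code] += 1
                break  # assume wards do not overlap
    return counts

# Two made-up square wards side by side, and four outlet locations.
wards = {
    "W1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "W2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
outlets = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]
counts = count_points_by_ward(outlets, wards)
print(counts)
```

For real ward boundaries a spatial database or GIS library with a spatial index would be the practical choice; the loop above checks every point against every ward.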

Comparison of number of Fast Food outlets by 2015 ward boundaries
derived from Food Hygiene Data or from Landmark/Ordnance Survey

Doing this took longer than I hoped, but almost entirely because I don't know my way around the various formats of boundary data related to the census and more changeable boundaries such as the wards.

I haven't done a formal comparison of the outputs, but the visuals presented above strongly suggest that FHRS data is just as useful as the PointX data for this purpose. The main explanation for the lower count coming from FHRS is that the PointX data includes outlets which do food delivery, and these may include places classified as Restaurants in FHRS.

I had expected more issues with FHRS because there is clearly an under-reporting issue in inner city areas due to rapid turnover of management of takeaways (see the recent Guardian article for an in-depth appreciation of this issue). The other week at the London OpenStreetMap pub meeting in Islington I insisted that we should check the 'scores-on-the-doors' before choosing where to eat our Burritos (a habit I've learnt from Dr Sian Thomas). The three fast food outlets next to the pub didn't feature at all on the FHRS data.

In conclusion: now that FHRS data covers nearly every major authority in the country (Rutland was the last to hold out) it is entirely suitable for a range of statistical purposes.