Wednesday 25 July 2018

Coda on shop completion rates on OSM

Thanks to John Baker (Rovastar) for a few suggestions made while discussing my recent blog post in the pub last night:

  • What do the graphs of numbers of unique shop tags look like with heavier filtering of relatively poorly used tags?
  • E-cigarette shops are a recent phenomenon, and should represent a genuinely novel tag rather than the mix of typos, synonyms etc which characterise much of the long tail of shop tags.

These were easy to follow up, so I present the graphs here:

Unique shop tags over time on OpenStreetMap for Great Britain,
filtered to remove tags with a restricted number of uses as at June 2017.
For virtually any level of filtering the curves level out around 2010-2011. Thus the core set of shop tags looks to be very stable. A good place to judge the extent of likely synonymy for shops in Britain is the Lua script used by SomeoneElse for his "Useful Maps".
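The filtering behind these curves can be sketched roughly as follows. This is a minimal illustration rather than the code used for the graphs: the function name, the (year, shop value) input format and the sample values in the usage note are all assumptions, and the whole dataset stands in for the June 2017 snapshot.

```python
from collections import Counter

def unique_tags_by_year(records, min_uses):
    """Cumulative count of distinct shop values per year.

    records: iterable of (year, shop_value) pairs, one per mapped shop
             (hypothetical input format, not the author's).
    min_uses: values used fewer times than this overall are ignored.
    """
    records = list(records)
    # Total uses of each value across the whole dataset.
    totals = Counter(value for _, value in records)
    # Earliest year in which each sufficiently used value appears.
    first_seen = {}
    for year, value in records:
        if totals[value] >= min_uses:
            first_seen[value] = min(year, first_seen.get(value, year))
    if not first_seen:
        return {}
    # Number of new values introduced in each year.
    new_per_year = Counter(first_seen.values())
    last_year = max(year for year, _ in records)
    result = {}
    running = 0
    for year in range(min(new_per_year), last_year + 1):
        running += new_per_year.get(year, 0)
        result[year] = running
    return result
```

With a small made-up sample in which "bakry" is a one-off typo, raising min_uses from 1 to 2 removes the typo and the single-use value from the curve, which then flattens earlier.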

Growth of mapped e-cigarette shops in GB on OSM
As expected, e-cigarette shops first appeared rather late, at the end of 2013, and there are a decent number mapped (over 200 by mid 2017). I haven't checked, but I suspect the sharp increase in 2017 was caused by some tagging rationalisation. It's not unusual for new things to acquire a range of synonyms before tagging stabilises and one value becomes favoured. (It's equally true that in some cases this does not happen.)
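To see why rationalisation alone could produce such a jump, here is a small sketch with invented data. The snapshot structure and the synonym values "vape" and "ecig" are purely illustrative assumptions, not values taken from the OSM data.

```python
def count_by_year(snapshots, value):
    """snapshots: {year: {shop_id: shop_value}} (hypothetical structure).

    Returns {year: number of shops tagged with `value`}.
    """
    return {
        year: sum(1 for v in tags.values() if v == value)
        for year, tags in sorted(snapshots.items())
    }

# Invented example: four shops, three of them mapped by 2015 under
# different values, all retagged to one favoured value in 2017.
snapshots = {
    2015: {1: "e-cigarette", 2: "vape", 3: "ecig"},
    2016: {1: "e-cigarette", 2: "vape", 3: "ecig", 4: "e-cigarette"},
    2017: {1: "e-cigarette", 2: "e-cigarette", 3: "e-cigarette", 4: "e-cigarette"},
}

favoured = count_by_year(snapshots, "e-cigarette")
total_shops = {year: len(tags) for year, tags in sorted(snapshots.items())}
```

In this toy case the count for the favoured value doubles between 2016 and 2017 although no new shop was mapped, so a curve for a single value can rise sharply through retagging alone.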

I've had a couple of other requests which it will take rather longer to look at, but if you have ideas relating to shops in Great Britain I can look at the data right now.

Tuesday 24 July 2018

Can we identify 'completeness' of OpenStreetMap features from the data?

At the Milan SotM conference Stefan Keller from the Geometalab at HSR (Rapperswil) will talk about recent work of his group on identifying "Areas of Interest" (AoI) from OpenStreetMap data. Stefan has been kind enough to involve me in some discussions about this work as it has progressed, but in this post I am solely concerned with a separate issue arising from the use of points of interest in this work.

Growth of shops mapped on OSM for selected Local Authorities
(See Analysis section below for commentary)

Areas of Interest were introduced on Google Maps back in 2016. Loosely they correspond to shopping, entertainment and cultural areas with large clusters of relevant points of interest. No doubt Google not only used map features, but also other sources of data such as location of Android phones to calculate the footprints for Areas of Interest (shown in a pale orange or salmon colour on Google Maps).

There are issues with the Google implementation, some discussed in this CityLab article from 2016. My own examination of Google Maps confirms that shopping areas which are otherwise equivalent in range and type of shops are chosen as AoI in wealthy areas, but not in poorer areas dominated by social housing. I also found some places, notably the UBS IT centre in Altstetten, Zurich, which have erroneously been identified as AoI by Google. The work of Geometalab is therefore interesting not just in terms of whether OSM data can be used to calculate similar areas, but also to provide suitable data where biases based on socioeconomic status can, at least, be identified and corrected because data and code are open.

Zurich, centre and Aussersihl districts, showing Areas of Interest.
Work of Geometalab, derived from OpenStreetMap data.
The starting point for this type of work relies on areas where POI mapping density is high and reasonably complete (for instance, the areas of Switzerland which Stefan's group have looked at, and areas of the English East Midlands and Germany which I have looked at both recently and in the past). Given that it is possible to calculate reasonable AoIs from OSM data where POI density is high, the question arises "Can we identify which areas are 'reasonably' complete?". Normally, this type of work has involved comparing OSM data to some external reference data which are assumed, for the purposes of comparison, to be complete (for instance Peter Reed's work on UK retail). However, in many parts of the world, and for many topic domains, there is no readily usable data for this purpose. So the ancillary clause for the question is ", and can we do this with OSM data alone?"

This post is a first look at the problem for one class of POIs: shops.