Showing posts with label Open Data. Show all posts

Sunday, 15 January 2023

A little surprise hidden in OS Terrain50 open elevation data

The reporting and discussion of a mountain rescue on the minor Lakeland peak of Barf continues to generate a mass of things to investigate and follow up. I've already written two OSM diary entries on assigning slope angle to paths because of it: here and here.

To recap, recently the Keswick Mountain Rescue were called out to help 3 people who were stuck ("cragfast") on steep ground on Barf. They didn't feel confident to retrace their steps back down very steep (30 degrees or over) scree and had reached a small crag with no obvious visible route through it. They had been using one of the mobile phone outdoor hiking apps which uses OpenStreetMap data for suggesting walks.

The event was reported in the national press (The Guardian) and, of most relevance, by a specialist magazine, The Great Outdoors. The latter did a great job of thoroughly researching a news article, getting comments from the OpenStreetMap Foundation. Their specialist advisor Alex Roddie decided to publish the content of his initial response to Carey Davies, the author of the article. Both articles are well worth reading for an in-depth initial response to the situation and the issues arising. There have also been lively exchanges both on Twitter and Mastodon, again containing useful comments from people with a lot of experience of the outdoor scene in the UK.

I won't comment here on the whys and wherefores of what kinds of paths and hiking routes should get mapped on OSM. The subject is complicated enough in Great Britain as all this discussion shows. One of the side distractions this did prompt for me was looking at how we could show contour data for OSM in the British Isles (and specifically on Andy Townsend's rural hiking map).

A toot by Nigel Parish did highlight that typically maps based on OpenStreetMap have contours which are greatly smoothed compared with the actual terrain. His example was from Andy Allan's Thunderforest Outdoor style which is incorporated into a number of hiking apps. Like most regular OSM contributors, I'm used to OSM-based maps using SRTM data to create contours: both because it's free and because it has more-or-less worldwide coverage. However, we are also aware of the deficiencies of this data: it has a relatively low resolution of around 1 second of arc (so roughly 30 metres resolution) which smooths contour lines and hillshade overlays.
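The "roughly 30 metres" figure can be checked with a little arithmetic. The sketch below assumes a spherical Earth and a latitude of about 54.6° N for the northern Lake District (both are my assumptions, not figures from the SRTM documentation). It shows that one arc second is about 31 m north-south but only about 18 m east-west at this latitude.

```python
import math

# Mean Earth radius in metres; treating the Earth as a sphere is an assumption.
EARTH_RADIUS_M = 6_371_008.8


def arcsecond_size_m(latitude_deg):
    """Return the (north-south, east-west) ground size in metres of 1 arc second."""
    one_arcsec_rad = math.radians(1 / 3600)
    north_south = EARTH_RADIUS_M * one_arcsec_rad
    # Meridians converge towards the poles, so the east-west size shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# 54.6 N is an assumed, approximate latitude for the area around Barf.
ns, ew = arcsecond_size_m(54.6)
print(f"{ns:.1f} m north-south, {ew:.1f} m east-west")
```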

I had recently downloaded the Ordnance Survey's own open data elevation model Terrain50, which I used for the first of my diary entries. This has a grid interval of 50 m, so although undoubtedly more accurate than SRTM, it is of somewhat lower resolution. I therefore expected contours generated from the DTM to be comparable with those of SRTM.

I'd forgotten that I'd used the vector dataset, so my first step was to generate contours from the raster tiles provided by Terrain50.

 

Barf with contours derived from the raster DTM of Terrain50.

After I'd done this, I remembered that the Geopackage was easy to use, so I added that. To my surprise the contours had rather more detail than those derived from ostensibly the same dataset.

With contours from Terrain50 vector data

Lastly, I created 10 m contours from the 1m Lidar data available from the Environment Agency for NW22NW. This, as expected, showed both high resolution and accuracy.


With 10m contours derived from Environment Agency 1m Lidar.

To compare the three types of 10 m contours I changed colours and their background. Here are all three at a scale of 1:1000 on the steep SW slope of Barf between the Bishop of Barf and the crag which balks some walkers. In the image below it can be seen that the vector Terrain50 data is of much higher resolution than the raster data from the same source. In most places vector Terrain50 data is much closer in alignment with the contours derived from 1m Lidar data. As Nigel pointed out the more detailed contours give a significantly different impression of the shape of the hillside.

All three types of contours shown together. Red derived from OS Terrain50 raster; Orange: OS Terrain50 vector; and Green derived from Environment Agency 1m Lidar DTM. Scale 1:1000.

Nigel also pointed out that lots of relevant terrain features, shown on printed maps (Harvey & OS), and available as online tiles, were missing. This type of detail is often difficult to generate automatically using software, and even then is unlikely to match the output of a skilled draughtsman working to detailed composition rules. I tried adding a hillshade layer from EA data, but it didn't really pick out this detail. Then I remembered that Luke Smith of Grough had used a layer provided with OS VectorMap District called "Ornament" when he prototyped maps based on open data. I therefore added this layer. It works quite well at low zooms, but is rather ugly at anything higher than 1:5000.

Added hillshade from Environment Agency data and Ornament from OS VectorMap District open data.
Hillshade is less effective than I expected at high zooms, and the ornaments only really work in a range 1:5k (as here) to 1:25k or thereabouts.

Lastly, and more for fun than anything else, I regenerated contours at 15 m intervals and clipped these by scree layers mapped on OpenStreetMap and changed the colour of the contours to a shade of grey to replicate how this information is shown on the Harvey Map. The end result is a good illustration of these differences: the Harvey Map generalises features such as cliffs and places them to enhance the map users' impression of the terrain, whereas the OS ornament looks nice, but is nothing like as good a guide for the walker.

With 15 m contours (from EA 1m DTM) and with contours on scree coloured grey rather than brown (as done by Harvey Maps), plus OS VMD ornament.



The bottom line from this is quite simple: don't bother with Environment Agency Lidar data if you just want contours: Terrain50 vector data is much easier to get up and running. The data is also packaged as mbtiles, but I couldn't find a way to style this in QGIS.

Lastly a big thank you to Nigel Parrish who caused me to look at the data, and thus discover the difference between the vector & raster versions.


Monday, 24 October 2016

Using Open Data for Statistical Purposes

A tweet by Owen Boswarva drew my attention to a recent report by Public Health England (PHE) on the correlation of density of fast food outlets and deprivation.

Number of Fast Food outlets normalised to 100,000 population for Local Authorities in England
Source: Food Hygiene Rating Scheme (Takeaway class)
Specifically my interest was directed at the source of fast food outlet counts. PHE used data from PointX, a joint venture of Landmark Information and the Ordnance Survey. I instantly wondered if one could do the same thing with Food Hygiene Ratings (FHRS) open data. This is a quick report on doing exactly that.

I already had a complete set of FHRS data for September 2016. I needed to download various administrative and census geographies, population figures for Lower Layer Super Output Areas (LSOAs), Index of Multiple Deprivation (IMD) Scores for LSOAs and various files showing the linkages between the geographies.

A certain amount of data wrangling was needed to merge this data: for instance, linkages, population and IMD all came in spreadsheets with awkward column names, multiple sheets and other minor inconveniences. Once these were sorted out I had a table with base figures at LSOA level which could be readily aggregated to Middle Layer Super Output Areas (MSOAs) and local authorities. The IMD score is rebased by summing LSOA scores multiplied by population and then dividing by total population.
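The rebasing step is a population-weighted mean. A minimal sketch of the aggregation from LSOA to MSOA follows, in Python rather than the tools actually used. The field names (`msoa`, `population`, `imd_score`, `outlets`) and the example figures are invented for illustration.

```python
from collections import defaultdict


def aggregate_lsoas(rows):
    """Aggregate LSOA rows to MSOA level.

    rows: iterable of dicts with the hypothetical keys
    'msoa', 'population', 'imd_score' and 'outlets'.
    """
    totals = defaultdict(lambda: {"population": 0, "weighted_imd": 0.0, "outlets": 0})
    for row in rows:
        t = totals[row["msoa"]]
        t["population"] += row["population"]
        # Sum of (score x population); divided by total population below.
        t["weighted_imd"] += row["imd_score"] * row["population"]
        t["outlets"] += row["outlets"]

    result = {}
    for msoa, t in totals.items():
        result[msoa] = {
            "population": t["population"],
            "imd_score": t["weighted_imd"] / t["population"],
            # Outlet count normalised to 100,000 population.
            "outlets_per_100k": t["outlets"] * 100_000 / t["population"],
        }
    return result


# Made-up example: two LSOAs in one MSOA.
example = [
    {"msoa": "M1", "population": 1000, "imd_score": 10.0, "outlets": 1},
    {"msoa": "M1", "population": 3000, "imd_score": 30.0, "outlets": 3},
]
summary = aggregate_lsoas(example)
print(summary["M1"])
```

The larger LSOA pulls the rebased score towards its own value, which is the point of weighting by population rather than taking a plain average of LSOA scores.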

Using R I constructed simple scatter plots with a regression line and 95% confidence limits for both MSOA and Local authorities.

Number of Fast Food outlets (normalised) vs calculated
Index of Multiple Deprivation for Middle Super Output Areas

Number of Fast Food outlets (normalised) vs calculated
Index of Multiple Deprivation for Local Authorities
(outlier of City of London excluded)

For comparison the relevant plot from the PHE report is shown below:

Scatter plot from PHE report for Local Authorities

The final comparison I made was perhaps one I should have done at the outset: comparing raw counts of fast food outlets from the Open Data source (FHRS) and the PointX data. PHE provided a table of counts at ward level. It took me a while to find a shape file and codes which fitted (the codes change year-on-year), but then it was easy to do a Point-in-Polygon count of the FHRS data for a direct comparison. The correlation of values was plotted in R again.
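For readers unfamiliar with a Point-in-Polygon count, the idea can be sketched in a few lines of plain Python. This is an illustration only, not the GIS workflow used for the plots: it uses the standard ray-casting test, ignores holes and multipart wards, and assumes wards do not overlap. The ward names and coordinates are invented.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_by_polygon(points, polygons):
    """Count how many points fall in each named polygon."""
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # assumes wards do not overlap
    return counts


# Two invented square "wards" and four invented outlet locations.
wards = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
outlets = [(1, 1), (5, 5), (15, 5), (25, 5)]
ward_counts = count_points_by_polygon(outlets, wards)
print(ward_counts)
```

With real ward boundaries a spatial index would be needed to avoid testing every point against every polygon.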

Comparison of number of Fast Food outlets by 2015 ward boundaries
derived from Food Hygiene Data or from Landmark/Ordnance Survey

Doing this took longer than I hoped: but almost entirely because I don't know my way around the various formats of boundary data related to the census and more changeable boundaries such as the wards.

I haven't done a formal comparison of the outputs, but the visuals presented above strongly suggest that FHRS data is just as useful as the PointX data for this purpose. The main explanation for the lower count coming from FHRS is that the PointX data includes outlets which do food delivery which may include places classified as Restaurants in FHRS.

I had expected more issues with FHRS because there is clearly an under-reporting issue in inner city areas due to rapid turnover of management of takeaways (see the recent Guardian article for an in-depth appreciation of this issue). The other week at the London OpenStreetMap pub meeting in Islington I insisted that we should check the 'scores-on-the-doors' before choosing where to eat our Burritos (a habit I've learnt from Dr Sian Thomas). The three fast food outlets next to the pub didn't feature at all on the FHRS data.

In conclusion: now that FHRS data covers nearly every major authority in the country (Rutland were the last to hold out) it is entirely suitable for a range of statistical purposes.

Friday, 1 July 2016

How far are Hedgehogs from a road?

My last hedgehog sighting in Britain: Elston, Nottinghamshire 2010.


One of my great joys with OpenStreetMap (and other (mainly) geographical Open Data) is that it provides a way into answering intriguing analytical questions.

A few weeks ago the query was from a Hedgehog ecologist: naturally I learnt of the query through OSM (via IRC to be precise).

The question was very simple:  

What proportion of Britain's land area is more than 100 m from a road?  

The reason it is germane for hedgehogs is that historically they have had a very high mortality from crossing roads. These days they are so rare, that spotting a squashed hedgehog is itself a rarity. Certainly this cartoon would not have the same resonance it did when it first appeared in the 1970s.

To answer the query is fairly straightforward: providing one has either a GIS tool or database to hand AND a full data set of British roads. QGIS and PostGIS were available & I also have a full set of OSM data for May 2015 in the latter.
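One way to picture the calculation (not necessarily how it was done in QGIS or PostGIS) is to sample the land on a regular grid and ask, for each sample, whether the nearest road is more than 100 m away. The sketch below is a toy version under stated assumptions: the "land" is a plain rectangle, roads are straight segments in metre coordinates, and every name in it is invented for illustration.

```python
import math


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the projection of P along A-B, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def fraction_far_from_roads(width, height, roads, threshold=100.0, cell=10.0):
    """Share of grid-cell centres further than `threshold` from every road.

    roads: list of (ax, ay, bx, by) segments in the same units as width/height.
    """
    cols = int(width // cell)
    rows = int(height // cell)
    far = 0
    for r in range(rows):
        y = (r + 0.5) * cell
        for c in range(cols):
            x = (c + 0.5) * cell
            nearest = min(distance_to_segment(x, y, *seg) for seg in roads)
            if nearest > threshold:
                far += 1
    return far / (rows * cols)


# Toy case: a 1 km square crossed by one straight road through the middle.
roads = [(0.0, 500.0, 1000.0, 500.0)]
share = fraction_far_from_roads(1000.0, 1000.0, roads)
print(f"{share:.0%} of the square is more than 100 m from the road")
```

A brute-force scan like this would be far too slow for the whole of Britain, which is why a GIS or spatial database is the practical choice.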


Monday, 18 January 2016

UK Open Data and Buildings in OpenStreetMap

I've finally (after 8 months) got around to looking at the OpenMap Local buildings. This new dataset was launched at the first OpenDataCamp, and I've had the SU 100 kilometre square data on the PC since then (it contains Southampton, where Ordnance Survey are based). I use Meridian 2 OS Open Data regularly and extensively, but these days don't make much use of the larger scale vector data.

Nottingham City Centre: OSM/OSGB Building Comparison
Comparison of Building polygons for Central Nottingham
OSM has more detail and does not merge discrete buildings.
Contains Ordnance Survey data (c) copyright and database right 2015, OSM data (c) OpenStreetMap contributors 2015, Lidar data from Environment Agency under OGL 3.0, (c) Crown Copyright and database right 2015. Image CC-BY-SA, the author.
I needed them for something else which caused me to download the SK data. Co-incidentally Christian Ledermann had asked on talk-gb about using this data to add buildings to OpenStreetMap for Newark-on-Trent. A little earlier the Environment Agency had released Lidar data for England, and this is also useful as input for mapping buildings.

OpenMap Local

Apart from the area I originally needed, which was in SK41 (no buildings in OSM), I've also looked at areas which I know much better & compared some selected areas where we have good building coverage around Nottingham. The comparisons I made are shown visually, with my main observations summarised at the end. Note that comparisons have not been made on any systematic basis.

University Park, University of Nottingham
an area of predominantly large academic buildings.
OpenStreetMap and OS OpenMap are largely in agreement: the minor differences applying to newer buildings which post-date the Bing imagery.
Contains Ordnance Survey data (c) copyright and database right 2015, OSM data (c) OpenStreetMap contributors 2015, Lidar data from Environment Agency under OGL 3.0, (c) Crown Copyright and database right 2015. Aerial Imagery via Bing, (c) as in image. Image CC-BY-SA, the author.

The Science City part of the University Park campus.
A new large lecture theatre block is not present in OpenMap data, and the outline of the building top centre (Tower Building) is over-simplified.
Contains Ordnance Survey data (c) copyright and database right 2015, OSM data (c) OpenStreetMap contributors 2015, Lidar data from Environment Agency under OGL 3.0, (c) Crown Copyright and database right 2015. Image CC-BY-SA, the author.


Central Newark. OpenMap vs Lidar.
Many instances of building merging & over-simplification are apparent here, notably with the outline of the parish church.
Contains Ordnance Survey data (c) copyright and database right 2015, Lidar data from Environment Agency under OGL 3.0, (c) Crown Copyright and database right 2015. Image CC-BY-SA, the author.

Newark-on-Trent, residential areas, showing inconsistency in size for similar houses, and merging of terraced housing.
Contains Ordnance Survey data (c) copyright and database right 2015, OSM data (c) OpenStreetMap contributors 2015, Lidar data from Environment Agency under OGL 3.0, (c) Crown Copyright and database right 2015. Image CC-BY-SA, the author.
I have not made systematic comparisons, but these are my main observations (in brackets the 1km grid square where I've noted any particular issue):
  • Best for larger buildings. The data seem much more reliable (actually matching building footprints fairly well) for larger buildings. Even for large detached houses I would regard the data as unreliable: on our road of 40 detached houses, at least 16 are represented as terraces (SK5439). Similar artefacts occur in other areas with detached houses: apparently caused when a garage is close to both houses. Smaller houses are inherently simplified: no better than drawing one in JOSM and then copying the outline in fact.
  • Building fusion. This is particularly clearly seen in the city centre image, where a whole block of buildings has been simplified to a single building (centre of image), but also occurs in suburban housing (see above).
  • Inconsistency in geometry simplification. This is most noticeable in the city centre (SK5739). For instance compare the OSM and the OpenMap Local outlines for St Peter's Church (bottom right in map above). In OpenMap Local the church is just shown as a rectangle, whereas in practice it is more complex. Modern buildings on the Jubilee Campus of Nottingham University are generally shown with more detail.
  • Inconsistency in building size. In SK5439 there are a very large number of houses which were identical when built. However, in the OpenMap Local they are often of different sizes. (This is also probably true of OSM, if buildings have not been created by duplication).
  • Voids. Gaps between closely packed buildings in the city centre appear slightly arbitrary in both placing and whether such a void exists or not.
  • Some selection inconsistency with small size buildings. Only 2 garages are shown in an area of around 500 houses. With OSM the figure is nearer 200+. (SK5439)
  • Demolished buildings. Whilst I would not expect the data to show the building demolished in the past month, I would expect it to not show one demolished 2 years ago, and I would certainly expect it not to show one demolished in 1970 (although MasterMap shows this too). (SK5439)
  • Better locational accuracy. If using the full transform it may be useful to take advantage of the better locational accuracy of this data. In the main OSM buildings are rarely more than 3 m displaced from the OS OpenMap Local. (SK5439) In general the more recently mapped buildings in Nottingham city centre have better locational accuracy than this (SK5739).
Taken together, my use of this directly within OSM would be along the following lines :
  • Selective transfer of larger buildings (schools, offices, public buildings, factories, warehouses, larger shops) on a case-by-case basis from a shapefile to a JOSM editing layer, or to Potlatch 2. Some minor refinement will probably be needed (for instance a university building here has long narrow courtyards which act as light wells which are not shown in OpenMap Local).
  • Only use it for houses and similar when shapes are very simple and everything has been double checked, at the very least, against aerial imagery. For simple shapes it's as quick to draw & copy in JOSM anyway. A similar principle holds for more complex building shapes on modern estates, where one building can be cloned.
  • Watch out for demolished buildings. This requires not just checking against Bing/MapBox imagery, but some local knowledge for sense checking.

Environment Agency Lidar Data

Another source of building data is the recently released Environment Agency Lidar data. This does not cover the whole country, and in many places may only be at 1 or 2 m resolution. It may also be quite old. However, because it does not suffer from parallax artefacts it can be used in conjunction with both aerial imagery (whether from Bing, MapBox or more local sources) and OS OpenData. I have provided examples from Nottingham, Newark, and Melton Mowbray of this data, combined with one or more of OSM buildings data, OS OpenMap or Bing aerial imagery.

Melton Mowbray. EA Lidar DSM (1m) overlaid on OSM.
The Lidar data was used to refine the OSM building outlines
which originally were traced from OS StreetView as block-sized polygons.
(see commentary)
Melton Mowbray illustrates many of the benefits of Lidar data. It is a fairly typical country town, with many of the buildings in the town centre ranging in age from 10 to 500 years old. Many extend back from the street in a series of outbuildings (e.g., stables) which have eventually been incorporated into the main building, but this process leaves lots of small courtyards, service yards, etc which are more or less impossible to discern on aerial imagery.
Butter Cross in Market Place, Melton Mowbray
Despite the different styles & ages of the buildings, several have long ranges at the rear.
By doing a street-level ground survey one can identify which buildings are distinct on the street front. Lidar then helps to construct a building outline which is consistent with this. I surveyed the centre of Melton in September, and this was the first place where I used Lidar data to aid in the interpretation of aerial imagery. In this case I find it essential to have adequate street level pictures to be able to relate to the aerial imagery: most useful are the presence and distribution of chimneys: because they throw shadows they are often visible even on poor quality imagery.

The Lidar data also allows one to do some other things: notably find building heights. I've done this for a 1980s estate on the edge of Maidenhead: particularly easy as the residential buildings fall into a small number of categories: bungalows, two-storey houses & maisonettes (purpose built flats in a house-like structure).

A 1980s housing estate with building heights mapped from English Environment Agency LIDAR Open Data. Buildings fall into 3 height categories: bungalows (green: approx 4m high), 2-storey houses of various kinds (blue: approx 6 m high), and maisonettes (condominiums) which are about 7 m high (red). Heights were calculated in m, so the values represent minimum heights of the highest part of the building, which is nearly always the gable line.
Output via Overpass Turbo, styled with MapCSS.
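As a rough illustration of how a height might be read off Lidar for one building, the sketch below takes the highest surface-model (DSM) value inside the footprint, subtracts an estimate of ground level, and rounds down to whole metres. The method, the function names, the sample values and the class thresholds are all my assumptions for illustration. The thresholds are loosely based on the approximate 4 m, 6 m and 7 m figures in the caption above, and this is not a description of how the heights on the map were actually derived.

```python
import math
from statistics import median


def building_height_m(dsm_inside, ground_nearby):
    """Estimate height in whole metres.

    dsm_inside: DSM elevations sampled inside the building footprint.
    ground_nearby: ground elevations sampled just outside it (assumed available).
    """
    roof = max(dsm_inside)          # highest part, usually the ridge or gable line
    ground = median(ground_nearby)  # robust estimate of ground level
    # Rounding down means the result is a minimum height for the highest part.
    return math.floor(roof - ground)


def classify(height_m):
    """Assign one of three hypothetical classes from a whole-metre height."""
    if height_m <= 4:
        return "bungalow"
    if height_m <= 6:
        return "two-storey house"
    return "maisonette"


# Invented sample values for a single building.
height = building_height_m([31.2, 33.9, 32.0], [27.1, 27.0, 27.3])
print(height, classify(height))
```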
There are many other useful blog posts about using this Lidar data, both specifically for OSM, but also generally. See posts by Chris Hill ("More Lidar Goodness" and "Building Heights") and Ed Loach for some of the specifics, and the write-up on the wiki. A nice post and map (v. slow in my browser) showing building heights in London on OpenMap Local may also be of interest. HousePrices has processed all the Lidar data from EA and Natural Resources Wales as a hillshaded slippy map which is useful to look at what is available. Slightly unfortunately the map is in OSGB projection (EPSG:27700) and is not shown with other slippy maps which would make it a bit easier to locate oneself.

What kind of building data should be added to OSM?

From past experience, single building outlines traced from OS StreetView turn out to represent tens of buildings on the ground. Such simplified outlines just make the work of splitting the buildings properly quite a lot harder. This can be particularly bad in town/city centres.

Usually if adding detail of POIs and addresses it is important to have individual buildings mapped: this makes it much easier to correlate photos to roofline features such as chimneys, gables etc. A single very simple outline may be OK, because for more detailed mapping it should just be a question of deleting the original outline. However, the question must be asked, as to what purpose such an outline fulfils on OSM, when the source data can be readily combined with OSM data for downstream consumption.
Granby Street, Leicester (geograph 2296099)
Granby Street, Leicester.
The multiple buildings shown here are represented in OSM as single buildings for each block
(imported from OS StreetView Open Data).
CC-BY-SA   © Copyright Malc McDonald and licensed for reuse under this Creative Commons Licence.
I think the fundamental question about straight imports into OpenStreetMap should be "Will it make life easier or harder for subsequent mappers?".

If the work involved in refining a building outline takes longer than re-drawing the building then I doubt if it's worth importing the building at all. This is particularly true if the outline is actually of multiple buildings. This is why large building outlines are most valuable: they are generally pretty good compared with what an initial hand-traced outline might look like, and they lend themselves better to stepwise refinement. One group of buildings I find particularly tedious to do well are schools which tend to be a sprawling mass of interconnected buildings. Starting with a decent polygon with orthogonalised angles makes adding such detail much easier. The current quarterly project for UK-based mappers might be the time to test this.

Of course it may be that adding buildings assists in some other mapping goal. I've already mentioned that details of buildings are very useful for addresses. However OpenMap Local lacks the detail in precisely the areas where it would be most useful (city & town centres). For suburban or inner-city housing similar polygons can be created as quickly in OSM editors (notably in JOSM, by duplicating existing buildings or using the Terracer (or even UberTerracer) plugins).

The other thing which many people want is rendered maps largely derived from OSM, but showing more buildings. In practice, because many mappers do not have the know-how, wherewithal or time to create such a rendered view, they tend to want to import buildings. Historically, OSM tools for importing data are often much easier to use than ways to incorporate the same data and OSM data to render maps and make them accessible on the web. Perhaps we need to do more to help people in the latter task: which is now getting more complicated again with the move to vector tiles (at least outwith use of MapBox Studio), and TileMill's effective status of being a legacy application.

Summary 

Sadly, although the new building outlines are better than what preceded them, in most cases they don't offer a decent route for iterative refinement with OpenStreetMap.

This absence of a simple way to improve building outlines means that ideally people wishing to use this data would merge it with OSM data outside of OSM. I do recognise this is often too much work, or too big a learning curve for many, and consequently there will always be a desire to add buildings to OSM because many people are much more comfortable with consuming only OSM data for their purposes.

Existing tools for drawing buildings in OSM are pretty powerful & getting more powerful all the time. Many of us, and I include myself in this group, are unaware of the full extent of these utilities. See bdiscoe's diary post about mass adjustment of circular buildings (huts) for some insights.

Wednesday, 2 December 2015

How accurately have Townlands in Northern Ireland been mapped?

From time to time newly released Open Data provides a nice opportunity to check OpenStreetMap for its accuracy in all its forms (see Haklay (2008) for a breakdown of what this can mean).

Coastal Townlands, Counties Derry/Londonderry and Antrim.
Boundary lines see below. The deeper the colour of the area, the greater discrepancy in the area of the OSM polygon and the OSNI one. The pale base colour represents a divergence of under 2%. Townlands on the coast and on the UK/Ireland border seem to be most likely to diverge in size. The small cluster centre right is caused by different ways of handling townlands which cross a Civil Parish boundary (OSM & the original source GSGS 3906 split these, the OSNI data does not).
We have known for a while that both the Ordnance Survey of Northern Ireland and the Ordnance Survey of Ireland were planning OpenData releases. When they came it was all in a rush. Now the hard work starts: checking licence conditions for suitability for use in OSM and other places, and then working out what is really useful. However, because the townland boundaries of Northern Ireland are complete, it was an ideal opportunity to look at accuracy.

View along N side of Magilligan Peninsula towards Inishowen from Umbra

My reasons for doing this are not just pure interest. The usefulness of the Irish Vice County boundaries depends on their positional accuracy. Earlier my prediction was that such boundaries ought to be within 10 metres of their true location on the ground where they were based on townland boundaries, but this was largely based on experience with other OSM data rather than an objective statement. Thus investigating the accuracy using an independent data set provides an excellent way of testing this statement. The tests need to be done now, because (as we shall see) the nature of OSM is to fix issues spotted very quickly, and thus datasets become loosely coupled.

I adopted two approaches:
  1. A straight comparison of areas (or their ratios).
  2. Using a series of buffered boundaries from one source (OSNI) and seeing what proportion of the other source (OSM) was included in each buffer.
To choose which townlands to compare I followed a suggestion of Rory McCann and for each OSM townland selected the one which shared the most area in common from the OSNI data set. (I have also done it on matching names for a smaller set of data & get similar results). Note that I am comparing townland with townland, not boundary segment with boundary segment. This means that each boundary segment (other than coastal, lacustrine or riverine ones) will be included twice.

Buffering approach to investigating boundary accuracy.
Demonstrated with Umbra townland in County Derry/Londonderry.
This is predominantly coastal sand dunes, with a small river running along its S boundary.
Northern Ireland using the same colouring.
At this scale very few boundary mismatches are apparent.
The buffering approach is based on that described by Haklay (2008). I used buffers of 5, 10, 15 and 20 m, and then clipped the initial OSM way by each in turn. On the scale of the whole country it is clear that most boundaries match closely. This is confirmed by checking what proportion of the boundaries fall into each buffer class: over 80% are within 5 m, over 90% within 10 m and nearly 95% within 20 m.
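The buffer test can be approximated without a GIS by sampling short pieces along the OSM boundary and measuring how far each lies from the reference boundary: a piece is "inside the 5 m buffer" if that distance is 5 m or less. The sketch below is a simplified stand-in for the clip-by-buffer step described above, assuming both lines are in a projected coordinate system in metres. The function names and the toy geometry are invented for illustration.

```python
import math


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_along(line, step):
    """Return (x, y, length) samples: midpoints of pieces no longer than `step`."""
    samples = []
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        length = math.hypot(bx - ax, by - ay)
        if length == 0:
            continue
        pieces = max(1, math.ceil(length / step))
        for k in range(pieces):
            t = (k + 0.5) / pieces
            samples.append((ax + t * (bx - ax), ay + t * (by - ay), length / pieces))
    return samples


def share_within_buffers(test_line, reference_line, buffers=(5, 10, 15, 20), step=1.0):
    """Share of test_line's length lying within each buffer distance of reference_line."""
    ref_segments = list(zip(reference_line, reference_line[1:]))
    total = 0.0
    within = {b: 0.0 for b in buffers}
    for x, y, piece_length in sample_along(test_line, step):
        d = min(distance_to_segment(x, y, ax, ay, bx, by)
                for (ax, ay), (bx, by) in ref_segments)
        total += piece_length
        for b in buffers:
            if d <= b:
                within[b] += piece_length
    return {b: within[b] / total for b in buffers}


# Toy geometry: a straight reference boundary and a test boundary that drifts away from it.
reference = [(0.0, 0.0), (100.0, 0.0)]
osm_way = [(0.0, 3.0), (50.0, 3.0), (50.0, 12.0), (100.0, 12.0)]
shares = share_within_buffers(osm_way, reference)
print({b: round(s, 3) for b, s in shares.items()})
```

The shares are cumulative, matching the way the percentages are quoted above (within 5 m, within 10 m and so on).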


Closer inspection (as with the Umbra) shows much of the discrepancy to be present along the coast. This is not surprising, coastlines on OSM were originally derived automatically, and even when refined by hand are unlikely to accord with Mean High Water (MHW). Certainly, for my purposes, it is merely important that the OSM coastlines do not stray above MHW.

OSM townland Boundaries within 5m of OSNI data
The analysis described so far focusses on positional accuracy. Looking at areas highlights a range of other accuracy issues.

Area comparison. Townlands are coloured according to absolute variance of ratio of areas from 1.
The redder they are the further the ratio is from 1.
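The measure used for the colouring, the absolute difference of the area ratio from 1, is easy to state in code. The sketch below computes polygon areas with the shoelace formula, assuming simple rings (no holes) in a projected coordinate system. The function names and the two rectangles are invented for illustration.

```python
def ring_area(ring):
    """Area of a simple polygon via the shoelace formula; ring is a list of (x, y)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def area_divergence(osm_ring, osni_ring):
    """Absolute difference of the OSM/OSNI area ratio from 1."""
    return abs(ring_area(osm_ring) / ring_area(osni_ring) - 1)


# Invented example: the OSM polygon is 2% larger than the reference one.
osni = [(0, 0), (100, 0), (100, 100), (0, 100)]
osm = [(0, 0), (100, 0), (100, 102), (0, 102)]
divergence = area_divergence(osm, osni)
print(f"{divergence:.1%}")
```

Note that a ratio close to 1 does not prove the boundaries agree, since gains in one place can cancel losses in another, which is why the buffer comparison is also needed.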
Area discrepancies of over, say 5%, may be the result of any of the following:
  • Boundary discrepancy. Mainly caused by coastlines, or by difficulty in delineating some boundary feature (such as the course of the Umbra river above).
  • Erroneous interpretation of the boundary on old maps causing selection of the wrong feature. This transfers land from one townland to another, therefore these should cluster.
  • Missing townlands. When a single townland has been created without noticing one or more others inside it (Town Parks townland at Ballymoney is an example).
  • Different treatment of townlands bisected by a Civil Parish. See caption of first image above.
  • Incorrect tagging. Higher level administrative units having tags appropriate to a townland. I've noted two cases of this, one of which was Ballyphilip CP on the Ards peninsula in County Down.
  • Islands. Some offshore islands appear to be missing from the OSNI data (see The Skerries N of Portrush).
We've already caught a few examples in each of these classes through this analysis, and no doubt will find a few more. I have not yet investigated the very apparent discrepancy along the borders.

To conclude, townland boundaries show exactly the kind of positional accuracy we expected (or perhaps hoped). Perhaps 1% of the total data (90-100 townlands from about 9000) may need some form of correction. I'm biased, but this seems pretty good for a project principally relying on rectified photo-reduced maps from 1939! It's also worth remembering that, unlike road comparisons, there is no widely available sensor data (i.e. GPS tracks/points) to help boundary alignments.

When time permits I'll extend this to include OSI Open Data too. A big thanks to both organisations for releasing their Open Data. OSNI staff have been contributors to OSM for a while: they host Missing Maps lunchtime sessions in their offices.

How accurately have Townlands in Northern Ireland been mapped?

From time to time, newly released Open Data provides a nice opportunity to check the accuracy of OpenStreetMap in all its forms (see Haklay (2008) for a breakdown of what this can mean).

Coastal Townlands, Counties Derry/Londonderry and Antrim.
For boundary lines see below. The deeper the colour of an area, the greater the discrepancy between the area of the OSM polygon and the OSNI one. The pale base colour represents a divergence of under 2%. Townlands on the coast and on the UK/Ireland border seem most likely to diverge in size. The small cluster centre right is caused by different ways of handling townlands which cross a Civil Parish boundary (OSM and the original source, GSGS 3906, split these; the OSNI data does not).
We have known for a while that both the Ordnance Survey of Northern Ireland and the Ordnance Survey of Ireland were planning Open Data releases. When they came it was all in a rush. Now the hard work starts: checking licence conditions for suitability for use in OSM and elsewhere, and then working out what is really useful. However, because the townland boundaries of Northern Ireland are complete, it was an ideal opportunity to look at accuracy.

View along N side of Magilligan Peninsula towards Inishowen from Umbra

My reasons for doing this are not just pure interest. The usefulness of the Irish Vice County boundaries depends on their positional accuracy. Earlier my prediction was that such boundaries ought to be within 10 metres of their true location on the ground where they were based on townland boundaries, but this was largely based on experience with other OSM data rather than an objective statement. Thus investigating the accuracy using an independent data set provides an excellent way of testing this statement. The tests need to be done now, because (as we shall see) the nature of OSM is to fix issues spotted very quickly, and thus datasets become loosely coupled.

I adopted two approaches:
  1. A straight comparison of areas (or their ratios).
  2. Using a series of buffered boundaries from one source (OSNI) and seeing what proportion of the other source (OSM) was included in each buffer.
To choose which townlands to compare I followed a suggestion of Rory McCann and for each OSM townland selected the one which shared the most area in common from the OSNI data set. (I have also done it on matching names for a smaller set of data & get similar results). Note that I am comparing townland with townland, not boundary segment with boundary segment. This means that each boundary segment (other than coastal, lacustrine or riverine ones) will be included twice.

Buffering approach to investigating boundary accuracy.
Demonstrated with Umbra townland in County Derry/Londonderry.
This is predominantly coastal sand dunes, with a small river running along its S boundary.
Northern Ireland Townlands OSNI comparison
Northern Ireland using the same colouring.
At this scale very few boundary mismatches are apparent.
The buffering approach is based on that described by Haklay (2008). I used buffers of 5, 10, 15 and 20 m, and then clipped the initial OSM way by each in turn. On the scale of the whole country it is clear that most boundaries match closely. This is confirmed by checking what proportion of the boundaries fall into each buffer class: over 80% are within 5 m, over 90% within 10 m and nearly 95% within 20 m.
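As a rough illustration of the buffer test, here is a minimal pure-Python sketch. It is not the workflow described above (which clipped OSM ways against buffer polygons); instead it samples points along one line and measures their distance to the other, which approximates the same proportion. The function names, the 1 m sampling step and the assumption that coordinates are already in a metric projected system are all mine.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_line(line, step):
    """Yield points roughly every `step` metres along a polyline."""
    for a, b in zip(line, line[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(length / step))
        for i in range(n):
            t = i / n
            yield (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    yield line[-1]


def share_within_buffers(test_line, reference_line, widths=(5, 10, 15, 20), step=1.0):
    """Fraction of sampled points on test_line lying within each buffer width
    of reference_line (e.g. OSM boundary tested against OSNI boundary)."""
    segments = list(zip(reference_line, reference_line[1:]))
    distances = [
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in sample_line(test_line, step)
    ]
    return {w: sum(d <= w for d in distances) / len(distances) for w in widths}
```

Calling `share_within_buffers(osm_boundary, osni_boundary)` with two lists of `(x, y)` tuples returns a dictionary keyed by buffer width. For a whole country a spatial database or GIS library would be the sensible tool; this brute-force version is only meant to show the idea.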


Closer inspection (as with the Umbra) shows much of the discrepancy to be present along the coast. This is not surprising: coastlines on OSM were originally derived automatically, and even when refined by hand are unlikely to accord with Mean High Water (MHW). Certainly, for my purposes, it is merely important that the OSM coastlines do not stray above MHW.

NI Townlands, all boundaries within 5 m of OSNI
OSM townland Boundaries within 5m of OSNI data
The analysis described so far focusses on positional accuracy. Looking at areas highlights a range of other accuracy issues.

Area comparison. Townlands are coloured according to absolute variance of ratio of areas from 1.
The redder they are the further the ratio is from 1.
Area discrepancies of over, say 5%, may be the result of any of the following:
  • Boundary discrepancy. Mainly caused by coastlines, or by difficulty in delineating some boundary feature (such as the course of the Umbra river above).
  • Erroneous interpretation of the boundary on old maps, causing selection of the wrong feature. This transfers land from one townland to another, so these should cluster.
  • Missing townlands. A single townland has been created without noticing one or more others inside it (Town Parks townland at Ballymoney is an example).
  • Different treatment of townlands bisected by a Civil Parish. See caption of first image above.
  • Incorrect tagging. Higher level administrative units having tags appropriate to a townland. I've noted two cases of this, one of which was Ballyphilip CP on the Ards peninsula in County Down.
  • Islands. Some offshore islands appear to be missing from the OSNI data (see The Skerries N of Portrush).
We've already caught a few examples in each of these classes through this analysis, and no doubt will find a few more. I have not yet investigated the very apparent discrepancy along the borders.
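The area comparison itself is simple arithmetic. A minimal sketch, assuming each townland is available as a single ring of projected `(x, y)` coordinates in metres (function names are illustrative, and multipolygons or holes are ignored):

```python
def polygon_area(ring):
    """Planar area of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) tuples; it may or may not repeat the first
    point at the end, since a repeated point contributes nothing to the sum.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_divergence(osm_ring, osni_ring):
    """Absolute deviation of the OSM/OSNI area ratio from 1."""
    return abs(polygon_area(osm_ring) / polygon_area(osni_ring) - 1.0)


def needs_review(osm_ring, osni_ring, threshold=0.05):
    """True when the areas differ by more than the threshold (default 5%)."""
    return area_divergence(osm_ring, osni_ring) > threshold
```

The 5% default mirrors the threshold suggested above; a townland flagged this way still needs a human to decide which of the listed causes applies.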

To conclude, townland boundaries show exactly the kind of positional accuracy we expected (or perhaps hoped). Perhaps 1% of the total data (90-100 townlands from about 9000) may need some form of correction. I'm biased, but this seems pretty good for a project principally relying on rectified photo-reduced maps from 1939! It's also worth remembering that, unlike road comparisons, there is no widely available sensor data (i.e. GPS tracks/points) to help boundary alignments.

When time permits I'll extend this to include OSI Open Data too. A big thanks to both organisations for releasing their Open Data. OSNI staff have been contributors to OSM for a while: they host Missing Maps lunchtime sessions in their offices.

Friday, 28 February 2014

Floods of Transient Data

OpenStreetMap was not conceived as a platform for storing transient geographical data: traffic jams, road works, road closures, earthquake damage, barricades in city centres or floods. However it is used widely for holding such information for all cases except the first.

The recent flooding in Britain has led to a degree of questioning about how we should hold such data, but it has also reawakened an interest of mine in seeing if OpenStreetMap data can be used for simple modelling of hydrological systems.

First I discuss transient data and look at a different aspect of flooding, the availability of suitable data, in the second part of this post.

Flooding in the Somerset Levels mapped on OSM
using key natural=water
copyright OSM contributors, CC-BY-SA

Thursday, 16 January 2014

Lamp-posts and Botanical Records

Readers of this blog will know I have a fondness for using Local Government Open Data on the positions of street lights (see here, here, here and here). Over the last couple of weeks I've found another one.

Newly planted Quercus robur next to Derby Road Nottingham
note lamp post!

My original reason for buying a GPS was to make it easier to record wildlife: I'm only 5 years into the minor distraction called OpenStreetMap! However, over the New Year I managed to drag myself away from addresses and postcodes and participated in the BSBI (the national botanical society) New Year's Day Plant Hunt. This is a relatively recent innovation, but has captured the attention of more and more botanists. The idea is to find as many different species of plants in flower as possible during a 3 hour survey window.

Common Knapweed
One that got away. I missed this fine Common Knapweed Centaurea nigra on Jan 1.

Many people have been intrigued by common flowers appearing out of season. (Of course, some, such as Gorse, are rarely seen out of flower: "When gorse is out of bloom, kissing is out of season".) Milder winters, almost certainly due to global climate change, are an important factor in allowing some plants to continue flowering. In other cases management of the land can cause plants to make new growth in the Autumn. Another possibility is that, because the length of the day is the same in the Autumn as in the Spring, plants respond to some photoperiod effect. I've been noticing these things for years but never recorded them in a systematic way.

Green Alkanet
Green Alkanet Pentaglottis sempervirens
An attractive enough flower, both for bees and people, but a pernicious weed which seems to infest everywhere.

Tim Rich's very successful expedition in Cardiff on 1st January 2012 really set the ball rolling. Tim is a professional botanist, co-author of the Plant Crib, an indispensable guide to difficult plants, and author of a number of specialised to very specialised identification guides. Amazingly for someone who has been a bedrock for supporting amateur botanists across the country over the past 20 years, he is also out of work. He's not the only one: it seems each year I get more mail bounced back or a new email address for a museum professional who has 'been let go'. (For non-naturalist readers, taxonomists are roughly equivalent to sysadmins in the botanical world: lots of things wouldn't happen without them, but they're not noticed when things are going OK.)

Yarrow Achillea millefolium : 7057
Yarrow Achillea millefolium
Yarrow plants seem to thrive at this time of year, rather than just 'hanging on in'.

Another aside: people like Tim also play another non-obvious role, that of encouraging future generations of scientists. Many kids start out with an interest in natural history which can, if encouraged, lead to careers in a whole slate of economically important areas such as medicine and molecular biology, and not just ecology. Reducing the number of able scientists in museums who also perform substantial outreach and education activities is eating the seed-corn for far more than just their direct specialisms. At least one British Nobel Laureate was inspired in this way.

Anyway, back two years ago I was really pleased to hear Tim on the Radio 4 Today programme describing his first plant hunt: my casual interest was validated by a similar interest by a serious professional botanist.

Red Valerian Centranthus ruber
Red Valerian Centranthus ruber, growing in Churchyard car park.
Normally a very vigorous population of these pops up all over the churchyard in Summer.

I managed to do about an hour's surveying on January 1, 2013, but still managed to find 20 plants in flower. This year I was determined to do better. I wasn't able to join the Leicester botanists who set the bar high with 59 or so species. The weather and my health prevented me finding much before the end of the year.

However, the rain stayed off on the afternoon of New Year's Day, which meant I could visit all the places I'd already seen plants in flower, and spend more time looking for others. My main locations for searching for plants were: two areas of parkland, both campuses of Nottingham University; grass verges along some local roads; a local churchyard; and the car parking around local shops. Many plants survive in the road gutters, or at the edge of the pavement and a wall, or on little patches which only get cut from time to time.

Ribwort Plantain: 7079
Short daylight hours meant that it was getting dark towards the end of my survey
Ribwort Plantain Plantago lanceolata

My advantage is that I have used most of these places to learn how to identify plants: I thus know exactly where to find around 400 species of flowering plant within 1 km of home. Naturalists call this 'patch watching' (The Urban Birder talks a lot about the concept): it works because one builds up an intimate knowledge of one's local flora. Things which are new or unfamiliar stand out remarkably quickly in this context (in computing terms, it's a very efficient vdiff). Unfortunately familiarity with a patch can also mean that one struggles to identify ordinary plants in an unfamiliar context.

A bit of unprepossessing highway verge, a typical place worth hunting in.
At least 6 different plants in flower here, fortunately it hadn't been cut in late Autumn.
Plants: Yarrow, Groundsel, Dandelion, Chickweed, Wall Barley and Shepherd's Purse.

For my recording I used the Obsmapp Android app, which meant that I didn't have to carry much with me. The survey time was about right before the battery ran down. I also collected samples of many of the plants to double check my identifications down a microscope when I got home. (A cold dank January day with poor light is not the place to check if a plant has star-shaped hairs on the undersides of its leaves.)

One minor problem with Obsmapp is that you need to make an identification at the time (or at least use a known species as a place holder). Now like many naturalists I'm not great at identifying grasses or dandelion-like flowers. For the latter I used a place holder species and checked the identification when I got home. However there were several grass specimens which I was not at all sure about. I've never chunked grasses in such a way that I can make a good approximate identification: I have to take them home and use a key.

Red Dead-nettle Lamium purpureum : 7047
Red Dead-nettle Lamium purpureum

This meant that I could not record the geolocation of these specimens through Obsmapp. Fortunately, street lights were to hand: instead I took a photo (geolocated, but that's not too important) of the nearest lamp-post's asset identifier. I then used the Nottingham Open Data Street Lamp dataset to get the lat/lon for the lamp-post and added these records into Observado once I got home.
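The lamp-post lookup amounts to a join on the asset identifier. A minimal sketch follows; the column names (`asset_id`, `latitude`, `longitude`) and the sample rows are made up for illustration and will not match the real Nottingham file, so adjust them to whatever the published dataset actually uses.

```python
import csv
import io

# Illustrative rows only: identifiers and coordinates are invented.
SAMPLE_CSV = """asset_id,latitude,longitude
NC1234,52.9530,-1.1870
NC1235,52.9532,-1.1874
"""


def load_lamp_posts(csv_file):
    """Build a dict of asset identifier -> (lat, lon) from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {
        row["asset_id"].strip().upper(): (float(row["latitude"]), float(row["longitude"]))
        for row in reader
    }


def locate(asset_id, lamp_posts):
    """Return (lat, lon) for the asset number read off a photo, or None."""
    return lamp_posts.get(asset_id.strip().upper())


lamp_posts = load_lamp_posts(io.StringIO(SAMPLE_CSV))
```

With the real file, `load_lamp_posts(open(path, newline=""))` would replace the `io.StringIO` call, and each record's position comes from `locate(...)` using the identifier visible in the photo.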

A tuft of False Oat Grass was under this lamp post


All in all I managed to get 40 different species (my unknown grasses all turned out to be False Oat Grass). This was a very gratifying total. Annoyingly, I walked straight past a fine example of Knapweed which I found a couple of days later, and I noticed a Gorse bush which I was unaware of only on Monday!

One other aspect of looking for flowering plants at this time of year is that they are often stunted and don't always have the same habit as in the Spring and Summer. It is therefore an excellent way to challenge and improve one's identification skills, not least now that in Britain we have Poland and Clement's Vegetative Key for identifying plants when not in flower. I also used Eggenberg's lovely Flora Vegetativa, which has many fine illustrations which complement the Vegetative Key (even though this is a book on Swiss plants, it works very well in Britain).

The other day I noticed a newly planted oak tree (one of 10 trees planted at the behest of our local Councillor Sally Longford to replace ones which had to be felled because of disease). Although I took geo-located photos I took great care to include 2 lamp-posts to enable me to locate this new tree accurately before adding it to OpenStreetMap.

Ophrys apifera Bee Orchid
Bee Orchid at Nottingham University Jubilee Campus
(location originally noted using lighting asset numbers)

I've previously used the asset numbers on the lighting in the University to check on locations of some Orchids in the Jubilee Campus, and I've also used them as markers when trying to map the trees in a belt of woodland more accurately.

So for urban wildlife, lamp-posts can be an invaluable resource for accurately locating specimens!

Thursday, 26 December 2013

Assigning addresses from Land Registry Prices Paid data

After the disappointment of the Land Registry INSPIRE land parcels, it is nice to report a large and useful open dataset from the same source: the Prices Paid data (LRPP). These are the actual prices paid for houses and flats in England and Wales from late-1995 or thereabouts to the present.

New Residential Roads in England and Wales
Roads were identified from Land Registry Prices Paid data
and matched by name to OSM highways within 2km of postcode centroid

Thursday, 31 October 2013

Not very INSPIREd: Land Registry 'Open' Data

Many OpenStreetMap contributors have been very excited about the potential for importing address information from data sets released under the European Union's INSPIRE directive. Our experience in the UK tends to make us more cautious in our expectations, and so it proved with the latest release of Open Government Data under this programme.

Comparison of Land Registry parcels with gardens on OSM in Sutton Coldfield
Scatter plot with log-log scale.
Firstly, (another) quick word about the complexities of organisation of cadastral data in the UK. There are separate cadastral agencies for each of England and Wales, Scotland, and Northern Ireland (note the Oxford comma), which run in different ways: not least because the legal framework of Scot's Law is quite different from English Law. Thus the data released only relates to England and Wales.


Wednesday, 24 July 2013

Persistence in the Urban Environment : 2 Portland Oregon Buildings and experimental clustering

Areas of Multnomah County, Oregon grouped into 5 clusters based on patterns of building age.
Each building is plotted individually using the colour assigned to its own cluster.
(data from City of Portland Open Data Initiative, background OSM)
A few months ago I wrote about how different parts of cities have radically different development patterns. I illustrated this with examples from three cities which I am familiar with, but all my interpretation was subjective. I was therefore very interested in a blog post by MapBox about the Open Data building footprints for the metro area of Portland, Oregon. Apart from the fact that this is a comprehensive data set, the most interesting thing is that most (around 87%) of the buildings have a date associated with them. It therefore seemed like a perfect data set to see if one could classify areas of a city in terms of the profile of building activity.

My basic idea was to do something along the following lines:
  • Partition the buildings into contiguous equal sized groups
  • Create some kind of time-series for each group reflecting building history
  • Run a clustering algorithm against the latter data
  • See if it produced interesting results

Friday, 28 June 2013

The Shopping News: mapping retail outlets in Nottingham

Nottingham Open Data 6


Nottingham City Centre retail areas with mapped retail units
(shops, banks, pubs, restaurants, cafes, fast food outlets etc.;
the large areas are shop=mall (an unsatisfactory tag)).

Retail landuse and retail outlets in the city of Nottingham
(a buffer is used to accentuate smaller retail outlets)

This morning I achieved one of my targets for using Nottingham Open Data. This was 90% reconciliation of the Licensed Premises dataset (this compares with around 40% when I first blogged about it).

Tabulation of reconciliation of Nottingham Open Data Licensed Premises File vs OSM
(loaded as an image because turning my Excel stuff into an HTML table is a PITA).
It seems like a good time to take stock (pun intended) of my retail mapping within the city of Nottingham.

Licensed premises from Nottingham Open Data not reconciled to OpenStreetMap
(plotted as number of premises at postcode centroid) cf. with original map.
I started doing this when my mother asked me to take her to church and I realised that I could do a short productive mapping session whilst she was at the service. For the following 2-3 Sundays I mopped up as much as I could in the area close to her Church. Then at the end of April I got serious and instead of doing my shopping locally I drove to other groups of shops when I had an errand. In this way I've visited the majority of local shopping areas (with two major gaps: Mansfield and Carlton Roads).

Most mapping sessions have been just over an hour in length, mainly involve photo mapping and seem to generate a huge amount of data. With a small number of exceptions in the City Centre I haven't done repeat surveys. Apart from trying to take lots of photos I have not tried to map everything I came across, which has been my usual approach in the past. I started out that way on my first outing: it took 30 minutes before I mapped my first shop, leaving only 10 minutes for other shops, so it was going to take forever doing it my old way. So I stopped worrying about grabbing everything and just tried to get shops, but did collect other information if it was convenient and readily accessible.

I distinguish between these two styles of mapping, by analogy with farming, as intensive and extensive. In one we put all our efforts into maximising yield (of crop, or OSM data) from a small number of hectares; in the other we are happy if the yield is good enough.

What I've done.

  • Added postcodes to as many as 5,000 objects. (It's a little difficult to check as I have touched objects which already had postcodes).
  • Added around 1,200 different postcodes, about 20% of the city. (Again some may have already been present.)
  • Added around 5,800 housenumbers. These are not just retail premises, but houses close to shopping areas, and when I've walked along streets I've tried to add house numbers at intersecting streets.
  • Added over 2,500 buildings.
  • Taken 7,000 photos, of which around 6,000 are now available on OpenStreetView.
  • Recorded about 13 hours of mapping audio.
  • Loaded around 200 kilometres of GPX tracks.
What I've still to do.
  • Finish adding POIs for shops (particularly in the City Centre).
  • Indoor mapping in the two main Shopping Centres (Malls) in the middle of the city.
  • Finish adding building outlines in the retail landuse polygons (I'm tending to do smaller ones first)

Things I've learnt (and why)

  • Map all shops in a group together. If a single shop changes use or closes and the row of shops has been mapped it is often impossible to reconcile which shop has been affected. It's far better to be systematic for a small area than mapping patchily. Exceptions can be made for very recognisable buildings or POIs. (This also helps check that POIs are in the correct location, see below).
  • Open Data Addresses are great. The Open Data is not accurately geo-located (only to postcode), but it does contain the address. This meant that as long as I could locate the business on Bing aerials I did not need to collect detailed address data. This made surveying less arduous. 
  • Good high-quality building outlines help. A single building outline covering a whole block is useless. A lot of Nottingham City Centre had building polygons mapped not from aerial photography but from OSGB StreetView. Firstly the building outlines were not very accurate. Secondly, it is very time consuming to divide and correct such amorphous polygons.
  • Good photos and decent aerial photos are critical. I have taken a huge number of photos (all available on OpenStreetView) to assist this mapping. I try to get photos of the roof line, as I can correlate chimneys, dormer windows and other roof-line features between the aerial photos and my street-level ones. It is amazingly easy to displace a POI a few tens of metres even with all this information.
  • Android apps aren't much use in a City Centre. I made some use of KeyPadMapper3, but found the data often needed to be tweaked because my Android phone's GPS location wasn't too good. In the City Centre the canyon effect is too much even with a Garmin. A further reason not to use the phone is that I was already using a camera, two GPS units (one in the backpack) and a digital voice recorder; juggling these and the phone was too much. The phone did come into its own when the batteries ran out on the dictaphone. In the end I used the voice recorder for most addresses I collected. I didn't try Vespucci.
  • History of POIs is enormously helpful. Most of the errors in the Open Data are failures to update historical data (POIs closing, changing ownership, re-branding, or moving elsewhere). In many cases Nottingham mappers have kept the historical information when updating POIs, and this means that it's much easier to reconcile OSM with the Local Open Data.
  • It's really difficult to tell if some POIs are still open. See the associated post on Vanishing Pubs.
  • Night-time surveying is the only way to check the status of some Bars, Nightclubs and Fast Food outlets. I'm too old to be a night owl, so someone else needs to do this.
  • POIs change fast. (Well I already knew this) My re-surveying of Market Street, Mansfield Road and Upper Parliament Street/Forman Street, which were all done 2 years ago by Paul Williams enables the rate of change to be quantified.
  • A 5% error rate in local government open data seems a reasonable assumption. This is not too different from rates found with NaPTAN and Ordnance Survey Open Data Locator. It does mean that it's far better to use this data as the basis for survey (as we have done with Locator) rather than import (as was done with NaPTAN).
  • Local Government Open Data needs significant interpretation. It is collected for discrete purposes, and there is no integration across data sets. I presume licences are granted for a number of years. Therefore there are no checks as to whether the licence is still in use, or even has ever been used, until renewal time.
  • Extensive surveying is more fun, and less exhausting, than intensive surveying. By an intensive survey I mean one intended to collect all types of mapping data in a discrete area. Extensive surveying involves covering a larger area perhaps with some specific targets, but most information is collected as a side product rather than with deliberation.
  • It was a good mapping project. A targeted set of POIs makes for a reasonable mapping project over a shortish term.
  • More Systematic Coverage. Extensive surveying means more systematic coverage of the city: even if not in great detail.

What to do next

The next steps are fairly obvious. 

  • Repeat for Food Hygiene Data. I have an additional data source from the City Council which covers POIs which serve food (anything from fast food outlets to schools and hospitals). This is about twice the size of the Licensed Premises file (2400+ cf. 1200 POIs) and at the moment I have only reconciled 70% of the data. In the main this means checking more day nurseries, care homes and similar establishments.  

    Premises from Nottingham Open Data Food Hygiene file
    not reconciled with OpenStreetMap (cf. with image above).


  • Change Detection. Build a mechanism for automatically detecting change in the source data. So far I have just used a snapshot of the data, but it would be very useful to find changes in the source data files and use them to drive surveys.

  • Create additional tools for Food Hygiene data. The Food Hygiene data is actually available for many parts of the UK and is Open Data. There are at least 350,000 POIs available. It is usually safe to assume that it is accurate at the postcode level, but in the nature of retail outlets several are usually present in each postcode. It would be nice to be able to create layers for mapping (e.g., in JOSM, Potlatch etc.) which spread the FHRS POIs out around their postcode location, preferably ordered by housenumber in the right direction. It would also be good to be able to load subsets of this data as POIs or similar into Garmin or Android devices.

  • Developing sensible categories for retail. In some of the images in this blog post I have used an ad hoc categorisation of available amenity=* and shop=* values. It would be useful to develop a more considered version of these categories.
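The change detection item in the list above could start as something very simple: compare two snapshots of the same file keyed on a stable identifier. The sketch below assumes each snapshot has already been read into a list of dicts (for example with `csv.DictReader`) and that a column such as `licence_id` exists; that column name is hypothetical.

```python
def diff_snapshots(old_rows, new_rows, key="licence_id"):
    """Compare two snapshots of an open data file.

    Returns the identifiers that were added, removed, or whose other
    fields changed between the old and the new snapshot.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Each of the three lists is then a candidate survey target: new premises to find, closed ones to check on the ground, and changed ones (a new name or address) to re-photograph.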

Conclusion

The most important thing is that this project would never have started without the availability of Local Government Data. Although I could have tried to find and map retail outlets I would have missed many isolated ones, and would have had no idea how many more there were to find.

With retail data mapped systematically it becomes possible to evaluate exactly how we use tags and if there are any obvious improvements. Remember that Nottingham is the 8th largest retail centre in the United Kingdom and is therefore a reasonable exemplar for all but the largest retail centres in Europe and North America.

A consequence of trying to be systematic is that I have visited areas of the city which have had very little on-the-ground mapping. I have been able to collect other POIs, addresses, correct road alignments etc.

Lastly, this is a very productive and rewarding means of mapping. If you have any local open data on shops I recommend a bit of Retail Therapy.