Saturday 28 May 2011

On the histories of OpenStreetMap data

Peter Körner (MaZderMind) has recently made available extracts of selected areas from the OpenStreetMap full planet history. These cover various Länder of Germany: big enough to be interesting, yet small enough for repeated processing and analysis.

Handling the history of OSM edits is interesting in its own right, but it also has an immediate practical importance as some edits will not be carried forward with OSM once the ODbL licence change is completed. Tools to process and analyse the complete edit history are needed as part of this change.

My main interest is to visualise past states of the database conveniently. The image below is of Berlin from 31st March 2009. This image was generated by re-extracting data from the history using osmosis (regular snapshot schema tables were replaced by views on history tables) and re-importing with osm2pgsql. This process is too involved and slow for rapid visualisation of historical views of the data, but does demonstrate that reasonable results can be achieved without elaborate changes to the snapshot schema.

Berlin 2009.12.31

I'm also interested in the technical aspects of handling OSM history: I have worked on temporal database schemas in the past. Some of these were more complex than OSM, as they stored not just history of the system (transaction time), but history of the real-world (valid time). This is also something which intrigues me: how one might store real-world history in something like OSM. In OSM we don't make any attempt to discover when a pub opened or when it closed, we just know that someone added the pub to the system and that later someone removed it. Although it might seem reasonable to assume that the pub closed between the two events, it might be the original edit was by someone who used the pub 20 years ago, and it closed 19 years ago.

My initial approach is simple, some might say naive. This is to basically take a schema very similar to the current API schema and make minor modifications to it. The main change is to add another timestamp column to the main tables, with the time period between the two timestamps representing the period when the given primitive (node, way, relation) was valid. This is partly a convenience for querying the data, but it also has the advantage that we can work with partial histories if some versions are missing. The main reason for doing this, though, is to use timestamps as an additional part of the compound key for way geometries. This avoids having to generate another system key to add to the identifier and version.
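A minimal sketch of what such tables might look like. The table names node_hist and way_hist and the columns tstamp_from and tstamp_to match the query fragments in this post; way_geom_hist, the tags and geom columns, the 'infinity' default and the half-open validity period are assumptions made for illustration, and the hstore and PostGIS extensions are assumed to be installed.

```sql
-- Hypothetical history tables: one row per version of each primitive.
CREATE TABLE node_hist (
    node_id     bigint    NOT NULL,
    version     integer   NOT NULL,
    tstamp_from timestamp NOT NULL,                     -- when this version was created
    tstamp_to   timestamp NOT NULL DEFAULT 'infinity',  -- when it was superseded (assumed convention)
    tags        hstore,
    geom        geometry,
    PRIMARY KEY (node_id, version)
);

CREATE TABLE way_hist (
    way_id      bigint    NOT NULL,
    version     integer   NOT NULL,
    tstamp_from timestamp NOT NULL,
    tstamp_to   timestamp NOT NULL DEFAULT 'infinity',
    tags        hstore,
    PRIMARY KEY (way_id, version)
);

-- One way version can have several geometries, so the start timestamp
-- becomes part of the compound key instead of a new surrogate key.
CREATE TABLE way_geom_hist (
    way_id      bigint    NOT NULL,
    version     integer   NOT NULL,
    tstamp_from timestamp NOT NULL,
    tstamp_to   timestamp NOT NULL,
    geom        geometry,
    PRIMARY KEY (way_id, version, tstamp_from)
);

-- Everything valid at one instant (the date is an arbitrary example).
SELECT way_id, version, geom
FROM way_geom_hist
WHERE tstamp_from <= TIMESTAMP '2010-01-01 00:00:00'
  AND tstamp_to   >  TIMESTAMP '2010-01-01 00:00:00';
```

With both timestamps on every row, a snapshot query needs only a range test and no self-join to find the next version.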

Way geometries are the crucial problem in making historical data convenient to render. A given version of a way may have many geometries, depending on how many of the nodes comprising the way have changed position, and, indeed, how frequently. Each time a node changes the geometry of all its parent ways may change: of course a node might just be touched (updated with no change), or might only have its tags modified. In the first instance I ignored these issues, although about 40% of all node changes in the Berlin data do not affect position.
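One way to estimate how many node changes leave the position untouched is to compare each node version with its predecessor. This is only a sketch: it assumes a node_hist table with a PostGIS geom column (an assumed name) and one row per version.

```sql
-- Count node versions whose position is identical to the previous version.
SELECT count(*) AS node_changes,
       sum(CASE WHEN ST_Equals(geom, prev_geom) THEN 1 ELSE 0 END) AS position_unchanged
FROM (
    SELECT geom,
           lag(geom) OVER (PARTITION BY node_id ORDER BY version) AS prev_geom
    FROM node_hist
) v
WHERE prev_geom IS NOT NULL;   -- skip the first version of each node
```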

My first step was to find for each way version all the start timestamps for the nodes belonging to that way version and where the node version validity period overlapped that of the way version. As the way version start and end timestamps must also be considered I did this using a UNION:

SELECT w.way_id, w.version,
       greatest(n.tstamp_from, w.tstamp_from) AS tstamp_from
FROM way_hist w
JOIN way_node_hist wn
  ON w.way_id = wn.way_id
 AND w.version = wn.version
JOIN node_hist n
  ON wn.node_id = n.node_id
 AND w.tstamp_from <= n.tstamp_to
 AND w.tstamp_to >= n.tstamp_from
UNION
SELECT w.way_id, w.version, w.tstamp_from
FROM way_hist w

It's easy to add a valid end date column to this data using a SQL windowing function.
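As a sketch of that step, the query below wraps the UNION in a common table expression and uses lead() to take the next start timestamp within the same way version as the end of the current period. The name change_points and the extra way_tstamp_to column are introduced here for illustration only.

```sql
WITH change_points AS (
    SELECT w.way_id, w.version,
           greatest(n.tstamp_from, w.tstamp_from) AS tstamp_from,
           w.tstamp_to AS way_tstamp_to
    FROM way_hist w
    JOIN way_node_hist wn
      ON w.way_id = wn.way_id
     AND w.version = wn.version
    JOIN node_hist n
      ON wn.node_id = n.node_id
     AND w.tstamp_from <= n.tstamp_to
     AND w.tstamp_to >= n.tstamp_from
    UNION
    SELECT w.way_id, w.version, w.tstamp_from, w.tstamp_to
    FROM way_hist w
)
SELECT way_id, version, tstamp_from,
       -- the last period of a way version ends when the way version itself ends
       coalesce(lead(tstamp_from) OVER (PARTITION BY way_id, version
                                        ORDER BY tstamp_from),
                way_tstamp_to) AS tstamp_to
FROM change_points;
```

Because the overlap test uses closed comparisons, a node version that starts exactly when the way version ends would produce a zero-length period; such rows may need filtering out afterwards.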

This gives us the total number of historical records we will need to store to access all distinct geometries of ways in the data set.

Exactly the same approach can be used to extend the way_nodes table with node version, and a validity date range based both on the way version and the node version. Indeed the range of way geometries can be derived from this data, but because we also need to slice up the validity range of an unchanged node for each way geometry it is convenient both conceptually and practically to keep these data separate.

With this data we can now start to generate all the historical way geometries. Unfortunately this is computationally expensive in PostGIS, which is one of the reasons Osmosis provides options for doing this before loading into the snapshot schema. I used the aggregate ST_MakeLine PostGIS function, which requires that the nodes be correctly sorted in the input.
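One way to guarantee the node order is to put an ORDER BY inside the aggregate call itself (PostgreSQL 9.0 or later), rather than relying on the order of a subquery. The sketch below is not the query I actually ran: way_geom_periods stands for the table of geometry validity periods derived earlier, and sequence_id and geom are assumed column names.

```sql
-- Build one linestring per (way version, geometry period),
-- picking the node version that was valid when the period starts.
SELECT g.way_id, g.version, g.tstamp_from,
       ST_MakeLine(n.geom ORDER BY wn.sequence_id) AS geom
FROM way_geom_periods g
JOIN way_node_hist wn
  ON wn.way_id = g.way_id
 AND wn.version = g.version
JOIN node_hist n
  ON n.node_id = wn.node_id
 AND n.tstamp_from <= g.tstamp_from
 AND n.tstamp_to   >  g.tstamp_from    -- assumes half-open node validity periods
GROUP BY g.way_id, g.version, g.tstamp_from;
```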

With historic tables for nodes, ways, and way geometries created, the next step was to create history tables which mimic the tables created by osm2pgsql for mapnik rendering. For nodes this is relatively simple: it's just a big query converting relevant hstore data into columns, along the lines of tags->'highway' as highway, with the addition of my two timestamp validity columns. For ways it is more complex: the z_order and area columns require population, and some ways need to be treated as polygons, not lines. Thanks to asciiphil I was pointed to the relevant routine in the back-end processing of osm2pgsql which handles z_order. This is simple enough to replicate in SQL. For deciding which ways to add to the lines and polygon tables I used the file from osm2pgsql, storing this in a table.
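For the node case the query has roughly the following shape. The target table name point_hist and the short list of tag keys are placeholders, and the tags and geom columns on node_hist are assumed; the real osm2pgsql point table has many more columns.

```sql
-- Turn selected hstore keys into columns, keeping the validity period.
CREATE TABLE point_hist AS
SELECT node_id            AS osm_id,
       tags -> 'amenity'  AS amenity,
       tags -> 'highway'  AS highway,
       tags -> 'name'     AS name,
       tstamp_from,
       tstamp_to,
       geom               AS way
FROM node_hist
WHERE tags ?| ARRAY['amenity', 'highway', 'name'];  -- keep only nodes with a relevant tag
```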

My first pass seemed to work OK: I didn't try and turn linestrings into polygon geometries, and I didn't do anything with relations. This was the result:

Failed render of snapshot from Berlin history

Really this is a dismal failure. Firstly, running mapnik to generate the image took forever: each query was doing a table scan on the lines table rather than using the GiST index on the geometry. Secondly, it turned out that I had a problem with the generation of geometries in PostGIS: hence the lines all over the place. Node order in ways was not carried over properly into the ST_MakeLine function.

Obviously, I'd hoped to get further with this, but I'm going to have to concentrate on getting the geometries right: testing using a correlated query seems to generate sensible results. That being said, small volume queries with aggregation seemed to work properly too. A kludgey solution to mapnik performance would be to just extract data in the map extent bounding box before running the mapnik style, and I may do this before looking at the PostGIS performance issues systematically.
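The kludge might look something like this. The table name lines_hist, the bounding box coordinates, the SRID and the date are all placeholders; the SRID passed to ST_MakeEnvelope has to match that of the way column.

```sql
-- Pull out only the rows valid at one instant inside the map extent,
-- then point the mapnik style at the much smaller table.
CREATE TEMP TABLE lines_snapshot AS
SELECT *
FROM lines_hist
WHERE way && ST_MakeEnvelope(13.30, 52.48, 13.48, 52.56, 4326)  -- xmin, ymin, xmax, ymax, srid
  AND tstamp_from <= TIMESTAMP '2010-01-01 00:00:00'
  AND tstamp_to   >  TIMESTAMP '2010-01-01 00:00:00';
```

The && operator compares bounding boxes and can use the GiST index on the geometry column, so the extraction itself should not need a full table scan.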

A few other things of potential interest: the Berlin data has around 250k ways, collectively with over 600k versions. I identified around 1.6 million potential geometries for these ways. The actual number of distinct geometries is substantially smaller than this because of the high proportion of node version changes which do not affect position.

No doubt far more sophisticated things will be implemented at the upcoming hack weekend. Personally, from my experience so far, I'd like to see at least some history support in the existing tools.

Sunday 15 May 2011

Access Land

Snowdon from Moel Eilio : 3351a

Snowdon has experienced a flurry of edits on OpenStreetMap after the last two long weekends. I happen to be one of the culprits as I've been adding some detail from last July: not least because a family member is planning an attempt on the Welsh 3000ers next month.

The availability of Bing aerial imagery allows a lot more detail to be added, particularly when I can relate it to geotagged photos. I've been able to add:
  • walls and fences alongside or crossing tracks;
  • bits of marshy ground;
  • extend paths which I saw, but didn't walk;
  • and, correct alignment of streams and rivers.
In doing so I've also noticed quite a few things which I recorded but had not transferred to OSM: notably stiles and gates.

Access Land roundel : 2684b

Whilst looking at these photos I noticed one of a stile had a roundel indicating the start of Access Land. I've suggested before that it's worth mapping these locations as the complete area covered by access land is only likely to be worked out through gradual accumulation of access points. However, when push comes to shove, I'd not done it myself and couldn't think of an obvious tag for it.

Top of Glyn Rhonwy Access Land : 9225a

I also found a photo of a large notice for the area, Glyn Rhonwy. This contains a lot of old slate quarries including one huge hole. The notice states that the upper part of the area is access land. However, the stile we went over when we came out of the old quarry area had a rather threatening warning notice. This appears from the Ordnance Survey mapping to be exactly where the Access Land starts.

Bottom of Glyn Rhonwy Access Land : 9187a

There was another puzzle on the edge of Llanberis. We wanted to avoid walking straight down the road, so had aimed for a footpath marked on the map leaving at a corner of the road above the tea shop at Penceunant Isaf. Checking the OS map today I see that this is also Access Land. All we found was a firmly locked gate.

All in all the Access Land areas in Snowdonia are not that well marked, and are often huge: making them poor targets for mapping. In most cases they're fairly obvious, or are areas where access has been presumed for many years, so adding them to OSM is not a huge help for the walker. Closer to towns and villages there still seems to be plenty of ambiguity. I'm not a fan of the huge intrusive notices that do exist either. They seem inappropriate in a National Park.

Although I added a lot of detail in Snowdonia, I needed to find somewhere else to look at mapping Access Land.

Chrome Hill : 2633a

The one place I have been in the past couple of years which is access land and where I probably have enough information to map access land is Parkhouse Hill. This striking little hill, one of a pair of fossil coral reefs (the other, Chrome Hill, is shown above), lies just inside Derbyshire in the upper reaches of the valley of the Dove. It is also one of the places which was pretty inaccessible until Access Land was introduced by the CRoW Act 2000.

View N from Parkhouse Hill : 2666a

I took a whole series of photos from the ridge of the hill in September 2009, but without good aerial imagery it was not possible to delineate the outline of the accessible area. Only when writing this post did it occur to me that I could now complete this mapping. Accordingly, I've mapped the access land itself as a relation, and added the tag entrance=access_land where I recorded the relevant sign. The tagging is still pretty tentative, but I hope having something concrete will lead to a common approach.

Saturday 14 May 2011

Radburn : exemplar or nightmare

Radburn is a small community in New Jersey.

Until a few weeks ago I'd never heard of it, but at a friend's party we got talking about The Meadows area of Nottingham, how it's changed and how it might change again. The council plan to partially revert the area to a gridded layout and reintroduce a linear shopping street. The area as it is now consists of many short cul-de-sacs off orbital distributor roads, with many houses accessed along paths or through communal green space. Radburn, NJ is the archetype for this type of layout.

In The Meadows : 9964a

I have been mapping The Meadows in a desultory way now for a while: it's not the best part of town so I like to keep mapping visits fairly short. The areas which were re-developed in the early 1980s have an amazing complexity of footpaths and short side roads. Buildings of varying height also sometimes contribute to a canyon effect with the GPS, particularly in the parts which still consist of late 19th century terraced housing.

The Meadows were originally the water meadows of the town of Nottingham, famous in the spring and autumn for displays of crocuses (Crocus vernus in Spring and C. nudiflorus in Autumn). These non-native plants are supposed to have been introduced to the Nottingham area by the monks of the Cluniac Lenton Priory. They still survive in various sites in and around the city (for instance the General Cemetery, Waterloo Promenade, University Park). The area started being developed after the Nottingham Enclosure Act of 1845, when certain parts of the area were dedicated as green spaces (e.g., Queen's Walk). The remainder was largely developed as high density terraced housing with some small scale industry. A colliery opened by the River Trent around 1910, followed by a council-owned power station (both now the site of a retail park). This was what led to the story of an ignorant DJ introducing a request with "Crocus Street, The Meadows. What a lovely address!".

By the 1960s this was one of the areas the council was anxious to re-develop in what were then known as slum-clearance programmes. The whole western portion was demolished in the mid-1970s, to be replaced by lower density housing with much more green space and small gardens for most properties. Private developers also built housing in this area. The remainder was re-developed piecemeal, with the larger original houses being kept and smaller, less salubrious places being replaced. This accorded with changing planning fashions, with 'cellular renewal' replacing big bang clearance projects. Probably this was because by the late '70s it was clear that there were problems with many of the replacement buildings of the 1960s.

What staggers me is that now, thirty years later, the council wants to 'regenerate' the area. Apparently, its problem is all to do with the Radburn layout.

One of the great things about worldwide on-line maps is that we can look at places as far apart as New Jersey and Nottingham in a similar way. On OpenStreetMap the cartography is the same worldwide too. This really brings home the differences between the real Radburn, and "Radburn", the planners' concept.

The Meadows is obviously much more densely populated: it has a highly convoluted network of paths, whereas in Radburn the purpose of each path is pretty clear just from the map. The green spaces are smaller, and less connected too. However, The Meadows housing is far pleasanter than its predecessors and the green spaces do sustain a decent amount of wildlife.

I also doubt if Radburn has a deeply obscure house-numbering scheme, with houses on the same road having different street names. Some named roads don't have any addresses associated with them at all.

There are plenty of other places in the UK which are tarred with the now tainted "Radburn" name, like Oxmoor, Huntingdon, a suburb of Havant, suburbs of Telford such as Woodside, Dunclug, Ballymena (NI), and Deanbank, Penicuik (Scotland). Many of these bear only a tenuous resemblance to the original: a planning concept having been transmitted by Chinese whispers. Some of the problems are glaringly obvious when one tries to map these places: it's difficult to determine addresses, there are far too many paths around the back of properties, few shops have survived, the green space is festooned with "NO BALL GAMES" notices. BUT, just as these designs were a planning fashion, it looks like denigrating them is one too. It seems too easy to lay the blame for problems in these areas on the original planners and architects, rather than understanding them as aspects of more complex social issues.

In the case of The Meadows, the money for its 'regeneration' has disappeared in the great wave of government cuts. Perhaps this will lead to a more incremental approach, rather than the traditional waving of the planner's magic wand to fix social problems.

Monday 9 May 2011

Catmose Campus : an Oakham update

On the way back from Barnack I diverted from the Oakham ring road to go and check how much of the Catmose Campus was open. I mapped this a few days before its scheduled opening in February, so it was good to have a simple quick mapping objective.


It proved that my quick photos taken in the rain were not a very good guide to the actual location of the building (compare with the actual GPS traces in this screen snapshot):

Catmose Campus in Potlatch2

As I walked up to the entrance to the Sports Centre I reflected that at some stage most mapping should become like this. A short stop off to grab the new details of the changed area, and a quick and simple edit session to finish off. We're a long way from that point, but before we get there we'll need better ways to identify areas with changes and to manage checking their status.

All-in-all a more satisfying mapping experience than last time I was in Oakham: the weather was nicer too.

Saturday 7 May 2011

Along the Fosse Way: mapping a new road

The A46 between Widmerpool and Farndon (near Newark) is being reconstructed as a dual carriageway and is due to open next year.

Kev Swindells has done a great job mapping the southern section of these road works to the junction with the A6097 at East Bridgford. Several sections are already in use, and once again OpenStreetMap is more up-to-date than other online maps. The most obvious of these is the big new roundabout West of Bingham (below, also compare with Google Maps).

At our first pub meetup in Nottingham I asked Kev if he was going to do any mapping North of East Bridgford, but apparently it's beyond his normal bike runs. I'd already taken some audio notes last summer and had marked in a new overbridge near Elston. So on a nice spring evening I decided to see how much information I could collect for the remaining part of the route. I tried to cross the line of the road works on as many minor roads as I could, starting from Newark and working my way back towards East Bridgford.


Even with decent waypoints for the crossings it's not straightforward to get the rest of the alignment. Several things do help:
  • Powerlines. Lots of powerlines cross the Trent S of Farndon, and all have been mapped (from surveys, Bing and OS OpenData). From photos I can check how the road alignment compares to pylon positions.
  • Field Boundaries. Most field boundaries visible on Bing are still there: obviously they've gone along the construction zone, but again it's useful information.
  • Hedgerow Trees. The farmland on the E side of the Fosse Way towards the River Devon has little in the way of trees and no patches of woodland. Each large tree which can be located on a photo can be matched to the Bing imagery.
  • Aerial Photography. In a couple of places the Bing imagery is recent enough to be able to see that the ground has been cleared in preparation for construction. This was particularly useful just N of Flintham close to the point where the new road crosses the old road (nothing on the ground yet: presumably this will be the last phase of construction).
This mapping is very preliminary: Kev mapped the Southern section in multiple visits by noting where the roadworks obstructed cycle routes. Ideally more of the local minor roads and footpaths could be mapped to add to the detail as the road nears completion. This is particularly true near East Bridgford. The precise location of the two carriageways is also not clear: round RAF Syerston it looks as if the cutting is not completely excavated yet.

All Saints Syerston : 3581a

All Saints Hawton : 3535a

I didn't forget to collect other data: the little village of Syerston was missing all its roads. It has a nice little church too. I was able to do a bit of general tidying up of the other minor roads. Another nice church, which was already mapped, is at Hawton: this has a very fine Easter Sepulchre in the chancel. Both churches are dedicated to All Saints.