Thursday, 11 August 2011

Creating off-line OpenStreetMap tiles along a route

Generation of mapnik tiles for a route

Gordonjcp was wanting to have a set of offline tiles for his 'phone along a route he is taking tomorrow. The ability to create a limited set of tiles would get round some of the problems associated with tile scraping, limited storage space, and time taken to create the tiles.

However, there aren't any obvious apps which do this. It occurred to me that it might be possible to limit this on the SQL side with little or no change to the standard generate_tiles.py script which is distributed with the OpenStreetMap mapnik renderer.

The idea was so simple it was worth trying out. I already had a PostGIS database containing (fairly) recent data for Great Britain, so all I had to do was import a route into the same database and create views which limited the OSM data to areas adjacent to the route. This is a very simple-minded hack, but it seems to work reasonably well and it may be possible to improve it by adding a little code to the existing python script.

For a route I chose a GPX track I uploaded recently. There are plenty of other ways of generating a similar track: notably by using an on-line routing engine and saving the route as a GPX (for instance Open MapQuest, OpenRouteService or Cloudmade). To get the GPX track into PostGIS I used Quantum GIS to save the track as a shapefile and the SPIT extension to upload this to PostGIS. In PostGIS I added two extra geometry columns to this one row table, and updated these with the track itself and the track surrounded by a 1 km buffer:

UPDATE route_track
SET geom_goog = ST_SETSRID(ST_TRANSFORM(geom_wgs,900913),900913)
, geom_goog_buffer = ST_SETSRID(ST_BUFFER(
ST_TRANSFORM(geom_wgs,900913),1000),900913);
For each of the 4 main planet_osm_* tables I created a view as follows:
CREATE OR REPLACE VIEW planet_osm_lineX
AS
SELECT a.*
FROM planet_osm_line a
, route_track b
WHERE ST_INTERSECTS(a.way,b.geom_goog_buff)
I ran SELECT populate_geometry_columns() to ensure the geometry columns were accessible to PostGIS operations. It would be possible to rename the planet_osm_* tables and create the views instead of them. This would avoid the need to create an extra copy of the mapnik XML file, which is what I did next.

I made a copy of the mapnik.xml file and replaced all examples of planet_osm_* with the same string with an "X" appended.

All I had to do then was add a suitable bounding box to the python script and set local variables for the map tile directory & the mapnik map file. Really the bbox should be determined from the database, but the objective was to do something quick and dirty. For my example I used a bounding box which was 2 degrees wide and 1 degree high.

Tile generation runs pretty smoothly: as one might expect lots of empty tiles are created, but the query overhead is quite reasonable. As the empty tiles are only 103 bytes each they don't take up too much space either.

There are lots of fairly simple optimisations which could be added:
  • Find the bbox of the route & only generate tiles within that bbox.
  • Identify empty tiles en masse and avoid running the mapnik queries against the database for these tiles.
  • Replace empty tiles with links.




Monday, 8 August 2011

Augmenting Residential Landuse : building density

Building Density on Residential polygons, Nottingham

Andy Allan (gravitystorm) asked a very pertinent question after my talk at SotMEU: Why had I not used building outlines to automatically identify different classes of residential landuse?

Briefly, my answer was: "I had tried, but the results weren't satisfactory". This is a much longer answer than I could give at Vienna, and fills in some detail about the technical approach I used. I hope that readers may have some good suggestions to improve this approach.

It is clear that Urban Atlas data is a good model for studying landuse mapping through replication/simulation in OSM. After working with this data for six months I realise that one of the main reasons for this is that UA's main classification is simpler than the current OSM consensus for landuse mapping. Thus most UA landuse classes have a 1:M mapping to OSM tags. Other landuse and landcover schemes, such as CORINE, EUNIS, have many more classes and in thus in many cases the classes have N:1 or N:M mappings to OSM tags. Most of these relate to agriculture, woodland or natural vegetation: for the most part where OSM tag usage is only fitfully consistent, or the same tag has multiple meanings.

There are, however, 5 UA classes (Continuous and Discontinuous Urban Cover — 11100, 11210,11220, 11230 and 11240) which map on to a single tag in OSM : landuse=residential. The main criterion for separating this classes in UA is the degree of surface sealing, with continuous urban cover meaning that over 80% of the land surface is effectively sealed (buildings, asphalt, concrete etc.). As a significant proportion of surface sealing will be buildings it seems plausible to use the proportion of residential landuse covered by buildings as a proxy for surface sealing.

Buildings from OSM & OS OpenData
Buildings generated from OS OpenData using mapseg (orange) compared with those mapped in OSM using Bing imagery (light green) for OS grid square SK5038.

Firstly, I needed a decent source for building outlines. My test data area, Nottingham, does have a large number of mapped buildings (47000+), but is nothing like complete. Partially mapped building sets are no use at all for this purpose. I therefore made use of Mapseg a set of python routines written by Tim Sheeman-Chase to extract building outlines from the Ordnance Survey OpenData StreetView (OSSV) tiles. I had originally created this data to make address mapping easier, but was dis-satisfied with artefacts (spiky buildings, failed orthogonalisation, etc.) so I had only used a tiny proportion of the data directly in OSM. My original data set covered the whole of the county of Nottingham and adjacent areas and included over 600000 building outlines: processing and manipulating this was somewhat tedious as each OSSV tile or sub-tile generated a single OSM XML file and I therefore had around 200 files with duplicate node and way identifiers. In the end I loaded each file into PostGIS and merged the data there. I then clipped it to the bounds of the target test area to end up with a data set of 189, 420 buildings. Some geometries needed to be cleaned to remove self-intersections.

Buildings were then clipped to match the residential landuse parcels, and then assigned to individual parcels. Landuse parcels which had originally been created as multipolygons on a 1 km grid were decomposed to individual polygons (using ST_DUMP) before this step. Polygons meeting on grid boundaries were not re-assembled. At this point it is simple to sum the building area for each polygon and determine the ratio of the total polygon area covered by buildings. The results are graphed below with the data divided into 20 buckets.

For comparison here is the relative distribution of the EEA UA data for the Nottingham area:

UA Code Area (ha) % Total Residential
11100 137 1.0%
11210 7132 52.0%
11220 5298 38.7%
11230 846 6.2%
11240 18 0.1%
11300 272 2.0%

The vast majority of residential landuse falls in two classes with surface sealing in the range 30 to 80%, with more than half of the landuse with greater than 50% surface sealing. A cursory comparison between the two sets of figures suggests that, at best building cover may only account for half factors used to assign surface sealing in Urban Atlas. Fortunately because I created a datasets of differences and similarities of data from UA and OSM it's possible to compare individual landuse polygons: a bit more work as these may be slightly different shapes, and therefore the whole clipping of buildings and area calculation must be repeated.

Comparison of UA Codes with OSM building density

The box-plot above amply confirms the initial impression. No doubt if and when I learn more about R, a proper statistical comparison could be done. Building density from OSM does not explain enough of the UA surface sealing value to be reliable for classification. At least for now. There is enough in the plot to hint that enrichment of the data with other variables may facilitate separation, but that's a different project again.

So that's the (very) long answer to Andy's question.

What does it mean for OSM:
  • Using buildings to derive sub-classes of residential landuse is not generally useful.
  • The requirement for completeness in building outlines on its own means that such an approach would at the best of times only work in certain places (good external sources of building data, excellent imagery). It may also require fanatical devotion to mapping buildings.
  • The basic classes of landuse we are interested can probably be derived from local knowledge, ground survey, aerial imagery etc. To make this more generally useful this requires some guidelines and consensus on tagging.
  • For now we could use tags similar to the Urban Atlas codes for urban areas in Europe and North America. Significant extension of these values might be needed to adequately cope with other areas in the world. (I'll be blogging in depth on this subject soon).
  • Looking at extending this type of data with other OSM derived variables may be interesting. Alexander Zipf mentioned that his group at Heidelberg had used a data mining approach to identify residential areas from OSM data.
If you've got any ideas of how to extend this type of classification drop me an email or comment below.

Tuesday, 26 July 2011

Footpaths in Carmarthenshire: What's the point?

IMG_6566a

Late in June I spent nearly a week near Carmarthen. I thought I'd get to know some footpaths local to where I was staying.


Monday, 20 June 2011

The Church which turned into a Gurdwara: pre-pub mapping

Church? 4827a

This is a very belated write-up of the May Nottingham pub meet-up.

First of all it was great to meet will_p, our most productive local mapper, and to welcome Paul the Archivist who has mapped actively in his home town Derby, Birmingham and Cardiff. A few of us met at the pub for an hour walk and a bit of light mapping: this is always a good opportunity to compare notes about how different people map and talk about specific issues as they arise (such as how does one represent the surface of the road below: we didn't have a good answer!).

surface=? 4804a

I find it difficult to think of an area comparable to that around the Lincolnshire Poacher which has a similar variety of different sorts of places within a couple of hundred metres: modern social housing, schools, a university campus, retail and commercial areas, pre 20th century housing, large houses, parks, gardens etc. It's also not that well-mapped, something which only really becomes apparent by walking through and checking OSM with what's on the ground. In fact our little mapping walk showed, once again, how much can be missed when one only uses aerial imagery.

We again chased down a few missing road names, but failed to see any signs on one road. Marcus pointed out a few streets along Woodthorpe Road which were blocked and not mapped properly (these are tree-lined so guessing that they connected from aerial imagery was wrong).

This time we headed into the S edge of Mapperley Park, one of the wealthier parts of the city, and a great contrast from St Anns, just the other side of the ridge line. We only followed three or four roads but it was very clear that the original sources of OSM data were pretty poor: this was a mix of NPE maps and aerial imagery. Density of mature trees in Mapperley Park made it difficult to identify the actual road lines from aerial imagery. We walked down Redcliffe Road the upper parts deeply cut into the rock and supported by Bulwell Stone walls . These stone walls area a classic sign locally of a private development between about 1880 and 1960. After 1960 the quarries were running short of this attractive Magnesian Limestone. Many are now used for housing or industry close to Bulwell town centre.

Old Boundary Marker : 4802aThe most interesting site along this road were a couple of cast-iron boundary markers from 1859: these list the mayor and town clerk of the Town of Nottingham (it didn't become a city until some time later). In style they are very similar to the markers used when the City boundaries were extended in the 1930s (see an earlier post). I was intrigued because I didn't know which area was on the other side of the boundary. Fortunately we have an old map at home (Salmon's map) which is virtually contemporaneous with these boundary markers. I took a couple of pictures of the same area that we walked through showing the state of Nottingham in the early 1860s. This clearly shows the boundary, and that Mapperley Park and much of the Forest were in the parish of Basford. Basford was incorporated into Nottingham in the 1880s.

Nottingham boundary 1861 : 4947b

Also of interest is that the road was then called Red Hill Lane. Some road names in St Anns have been preserved although in altered form after re-development, but quite a few on the roads around Bellevue Reservoir seem to have been lost. It's not clear what name the paths round the reservoir have themselves: this old map calls it St. Anns Hill, which now belongs to a road which was then called Blenheim Villas, whereas modern maps (and some from the 1930s) name it as part of Corporation Oaks.

Gurdwara Noticeboard: 4832a

Returning to the pub along Mansfield Road we had a better look at the prominent church shown at the top of the blog. Paul had noted that it seemed to be a Gurdwara, a Sikh Temple, and as we passed it took a 'photo of the notice board. The following day I felt compelled to do a bit of research: How long had it been a Gurdwara? Who built it? what was it's recent history? I was relieved to discover that it had recently been acquired from the Christian Scientists who have moved their place of worship into their fine office building next door. But originally it was built as a Presbyterian church. So it had been a church when first mapped on OSM, but another object lesson that things ain't what they seem and the importance of ground surveys.

OSMers in the snug :4837a

We ended up in the snug at the Poacher for a pie (Paul) and several pints.

Friday, 10 June 2011

Hyperlocal Naming

Stanage Edge from Stanage Plantation
Stanage Edge, Derbyshire
The site of hundreds of hyperlocal names used by rock climbers.
Photo courtesy of Earthwatcher BY-NC-ND

We had an interesting a discussion in March on the OpenStreetMap talk-gb mailing list arising out of some controversial naming practices in Loch Lomond and Trossachs National Park ("Giro Bay"). Someone suggested that no-one would or should ever add newly minted names on OSM.

This is an extended version of why I think that names widely recognised in specialist communities should be used when appropriate. I call these names "hyperlocal", because many of them are used for naming features or localities which non-members of the relevant community would not recognise, or would see no need for naming them.

Naturalists. This whole post started because someone expressed surprise that 'non-official' names might be entered in OpenStreetMap (all my examples are from Attenborough Nature Reserve, Nottinghamshire). Naturalists, and particularly those managing and using nature reserves, are creating new names all the time. In part this is because often these places themselves are new (for instance old industrial workings and quarries — "The Delta", "The Bund"), but mainly because much greater precision is required for location information. This is really important if one wants to find the only example of a rare plant, or check on a bird-box location, or photograph a rare fungus, or make sure the right tree gets chopped down. Also one becomes increasingly aware of minor differences which dramatically affect the range of plants and animals over a small area. In principle this could be communicated by using grid references and at a fine scale a GPS. In practice names are more memorable, help to delineate areas which grids don't, and are much easier to communicate. Some names stick, others may be concocted but never enter common usage. Some places have multiple names, and it takes a long time to find a consensus ("Old Car Park", "Fishermans Car Park", "Education Meadow", "Corbett's Meadow"). Many names have multiple intelligible variants ("Butterflies", "Butterfly Triangle", "Butterfly Patch"). In other cases it's possible to see how a name changes. For several years charcoal has been made at Attenborough: the area by the kiln is now nearly always referred to as "by the Kiln", whereas it used to be called "Redwings". In other places locations are known by number alone, for instance 'compartment 73'. I have a map of compartment numbers for Clumber Park, but it's an internal National Trust document, they are used in the national Fungal Records database though.

I have added several of these 'made-up' names which are in common usage around Attenborough Nature Reserve: the Delta, Warbler Dell, Dirty Island Bank, Butterfly Patch (in use since the 1960's), Corbett's Meadow (a recent 'official' coinage, in memory of Keith Corbett who was reserve manager for over 30 years until his death in 2007), which is also known locally as "The Fisherman's Car Park", and "The Old Car Park", and Education Wood (a recent unofficial coinage, around 2005). As all the water bodies were created by gravel working, their names have evolved recently too. I have only added those which are in widespread usage: there are perhaps 50-odd names which were coined in the '60s and '70s, mostly eponymic toponyms, but many never caught on.

Carsington Reservoir is another location where local birders have evolved a significant number of toponyms. Any of the Helm "Where to Watch Birds" regional guides will provide many more examples.

Climbers. Hill and mountain areas are chock full of named features which might never be noticed. This isn't surprising, the cliff shown at the top of this post Stanage Edge has 100s of climbs along its length. The convention is that the first people to successfully climb a route have naming rights, but it's more complicated than that. An even smaller crag further S in Derbyshire, Birchen Edge, has a convention that climbs names should have a nautical allusion (Trafalgar Wall, Camperdown Crawl). I would think in Derbyshire alone there are probably well over 5000 named climbing routes. Many of these will be less than 20 m ascent. On the harder ones each hold will be memorable (I'm not yet aware of these being named yet). On the other hand some places fall into disuse. My father and his friends used to climb at Laddow Rocks because it was accessible by bus from the Manchester area. Much of Laddow is now mossy and green, but everything on it still has a name: several guidebooks were published.

Oddly, in bigger mountains such as the Alps there is not this degree of minutiae in naming: or if it exists it's not published so widely. A number of factors influence this. Climbing in the Alps tends to follow obvious lines because there are a lot more of them, and they're much longer. Sport climbing has used pictorial 'topos' for a long time, so they avoid the same need for names as a narrative descriptive system. There are still plenty of these names though, the Eiger Nordwand has lots, all used in many mountaineering books, and giving a title to one: Hinterstossier Traverse, The White Spider, the Third Ice Field, Traverse of the Gods, Exit Cracks and so on. That is just one face of one mountain, albeit a famous one, and there's enough variety in the type of names to give a feel for how they were created.

Skiers are another group who go into the mountains for recreation. In many ski resorts many of the names are invented and often follow a stereotyped pattern typically with lots of alpine fauna: White Hares, Pika, Marmots, Chamois, Ibex, Ptarmigan. The older European resorts still use lots of names which arose naturally from older uses.

St Anton - Ulmerh¨tte from SchindlergratSt Anton in the Tirol has many ski runs and locations named after alpine meadows. Some of these, such as Gampen (often Gampli) have an interesting linguistic history indicating the presence of speakers of a latin-derived language in the area similar to modern Romansch. Mattun, is another name ultimately derived from latin, referring to a characterstic member of the mountain vegetation. Both of these are fairly common place name elements close by in the Eastern Graubünden. (Place name etymology from the Swiss Alpine Club guides). In addition to the regular runs, St. Anton has a host of well-known, named, off-piste runs. The photo shows the top of a broad but steep gully off the Schindlerspitze: but, I never learnt the names of these gullies. Some of the seriously steep chutes have names which are closely guarded secrets passed on only to those who have skiied them.

Photo courtesy of rickardengstrom CC BY-NC-SA

On the Grands Montets many apparently non-obvious features have names (often different names in different languages): Combe des Amethystes or the Italian Bowl; Canadian Bowl. Again these are not to be found on maps, but are widely used by people who know the area. One or two specialised publications use them too. It's possible to ski a line a few metres from the Italian Bowl and not be aware that its there: even then finding the direct entry which gives an exhilarating steep start requires precise knowledge of the local geography. So once again, these names have arisen from a need for much greater precision than might otherwise be expected.

Ski racing requires a heightened awareness of terrain which again tends to drive increasing precision in naming. One of the oldest open downhill races in the Alps is the Parsenn Derby. When I first visited Davos the piste which formed the lower part of the course had markers giving the names of each turn and schuss. nowadays the course is shorter still, but still features places like the 'S-bends' and the 'Derby Schuss'. It's not for wimps like me:




Motor Racing. Any one who has followed a Grand Prix will know that every part of the track has a name. Of course on a modern track these names will have been invented and assigned by marketeers, but on courses with a long history, like Silverstone, it's easy to guess how the names came about. Early users probably said things like 'I came off at the corner near Stowe', which being cumbersome would get rapidly shortened to Stowe Corner. Of course names were useful to everyone: drivers, instructors, marshals, commentators and spectators. Now they integral to what people expect of a motor racing track. Many of these are captured on OSM, including the circuit featured on Top Gear.


Fishing. I know little about fishing, but it struck me that fishermen must have names for favourite spots, and then I came across the wonderful annotations of Walking Papers by Kirk Lombard. The one I show above is near Fishermans Wharf, San Francisco, with one fisherman's name "Striped Perch Hole". Incidentally, this seems to be a great example of using Walking Papers (and OSM) in an innovative and unexpected way.

Names seem to evolve in each of this situations in similar ways. Often they are created by pragmatic shortening of a description. Memorable events, and the people associated with them, are another rich source of names. Things also get named after people as a token of appreciation, respect or a with to remember them. Thematic naming sometimes works, but often ends up being insufferably twee, and is often resisted or subverted, particularly by wit.

One last one example are local names to the swimming holes on Fairham Brook by Keyworth Meadow NR: see Neil Pinder's article on the parish website. I've asked around for other examples, and someone with extensive experience of forestry noted that workers named landscape features in a very similar way to naturalists I'd be very interested to hear of other examples, particularly outside the interest domains I've covered in this post.

The ability to map these hyper-local toponyms is a very attractive part of OSM. Of course, they need to be researched accurately to ensure they are names which are used rather than 'book-names'.

Wednesday, 8 June 2011

POSSUMs

Check the tongue!
Check the tongue!
courtesy of wollombi CC-BY

Not cute animals or vicious pests, but Persistent Open Streetmap Unique MapIds!

I've been intrigued by the idea of creating persistent IDs for geographical objects in OSM for a while. Although things in OSM do get a unique ID there is absolutely no guarantee that a given object will retain a single ID, or that a given ID will not be used for multiple objects over time. A classic example is the pub initially mapped as a node, then with a building outline traces, and finally the tags are moved from the node to the building and the node deleted.

Persistent IDs would make it much easier to link external data sources to OSM data. It may obviate the need to add certain types of data directly to OSM. SteveC pointed out the problem of business names in a comment on another post here.

The basic problem is conceptually not too difficult: I've done similar things with bank customer and account data. In a typical business application the problem is, however, that although individual systems usually have persistent IDs there is no single one which links across systems (imagine say multiple bank accounts, a credit card and an insurance policy). With OSM we don't even have this luxury.

My basic notion would be to choose something like the minimum node ID of an object when it is first created and use that as a reference throughout its OSM life-cycle (including, potentially, several deletion and restoration events). Some kind of collision avoidance is needed for objects which share nodes, and handling of certain border cases of ways sharing all their nodes and some types of relations).

Each time an object is touched in a changeset a process is needed to determine what happens to its persistent ID. In most cases nothing will have affected it, but it may be difficult to ensure that these cases are identified. In general geometry changes are unlikely to have an impact whereas changes in topology will: but, I don't know what one should do in the case of inaccurately mapped objects which change position significantly. Changes in topology may result in logical merges and splits of objects previously regarded as atomic, so I imagine the relationships between POSSUMs will need to be stored. Consider a road segment (A->B) between two junctions which gets edited and an additional junction is added between the two (A->C->B): two new identifiers need to be created for A->C and C->B which are children of the original A->B.

Here'a s list of other characteristics, in no particular order, which I think they should possess:
  • I think POSSUMs should be approached as an enabling technology: there may be many different persistent ID schemes depending on individual use cases. Some people may be interested in the road network, others might be interested in building outlines, or it might be a specific geographical area, and so-on.
  • POSSUMs therefore should be independent of the OSM data infrastructure: of course if they proved very useful then OSMF might want to provide them as a service.
  • POSSUM creation needs to be rule based: starting rules will be based on tags, and some aspects of geometry (e.g., road junctions).
  • They need only cover a subset of data. New subsets should be capable of being added to an existing POSSUM data store (the cage?).
  • It should be possible to add history of elements retrospectively and a pre-existing POSSUM to be used to identify the older versions of an element.
  • POSSUMs would only be created for tagged objects (excluding things like fixme, note, source etc.)
  • POSSUMs and OSM objects would have a many-to-many relationship. Some means of identifying parts of a way (e.g., road segments), and POSSUM roles will be required (e.g., pub as premises or building, pub as business).
  • Relationships between POSSUMs may require some manual maintenance.
  • Applying changes to rules means they also need to be applied retrospectively, thus some POSSUMs will be superseded (just as in the new junction case). This is probably technically the most complex part of the idea.
The primary use case for POSSUMs is for the maintenance of linkages between external data (e.g., public open data) and OSM objects in a way which minimises the amount of external data which needs to be imported into OSM. This in turn should reduce the maintenance overhead of keeping OSM data in sync with external sources.

POSSUMs might also be useful for OSM mappers when frequent uploads to the planet database are impossible. This commonly occurs in third-world countries, but also in crisis mapping and disasters when telecom resources are unavailable. At the moment mapping in this situation can give rise to large numbers of conflicts which are difficult to resolve. Use of persistent IDs might make the conflict resolution process easier.

Aaron Cope has already done some much more sophisticated thinking for the use case of a worldwide building register. Check out his building=yes site using WOEIDs. I am sure there are others thinking about this: let's get the discussion rolling.

I'm grateful to Bob Chell of 1-spatial for a brief discussion of the basic notion at SotM10, Girona. This encouraged me to believe that the idea was not completely barking mad. But any remaining nuttiness is entirely my own.


Saturday, 28 May 2011

On the histories of OpenStreetMap data

Peter Körner (MaZderMind) has the recently made available extracts of selected areas from the OpenStreetMap full planet history. These are of various Länder of Germany, but are big enough to be interesting and small enough for repeated processing and analysis.

Handling the history of OSM edits is interesting in its own right, but it also has an immediate practical importance as some edits will not be carried forward with OSM once the OdBL licence change is completed. Tools to process and analyse the complete edit history are needed as part of this change.

My main interest is to visualise past states of the database conveniently. The image below is of Berlin from 31st March 2009. This image was generated by re-extracting data from the history using osmosis (regular snapshot schema tables were replaced by views on history tables) and re-importing with osm2pg2sql. This process is too involved and slow for rapid visualisation of historical views of the data, but does demonstrate that reasonable results can be achieved without elaborate changes to the snapshot schema.

Berlin 2009.12.31

I'm also interested in the technical aspects of handling OSM history: I have worked on temporal database schemas in the past. Some of these were more complex than OSM, as they stored not just history of the system (transaction time), but history of the real-world (valid time). This is also something which intrigues me: how one might store real-world history in something like OSM. In OSM we don't make any attempt to discover when a pub opened or when it closed, we just know that someone added the pub to the system and that later someone removed it. Although it might seem reasonable to assume that the pub closed between the two events, it might be the original edit was by someone who used the pub 20 years ago, and it closed 19 years ago.

My initial approach is simple, some might say naive. This is to basically take a schema very similar to the current API schema and make minor modifications to it. The main change is to add another timestamp column to the main tables, with the time period between the two timestamps representing the period when the given primitive (node, way, relation) was valid. This is partly a convenience for querying the data, but it also has the advantage that we can work with partial histories if some versions are missing. The main reason for doing this, though, is to use timestamps as an additional part of the compound key for way geometries. This avoids having to generate another system key to add to the identifier and version.

Way geometries are the crucial problem in making historical data convenient to render. A given version of a way may have many geometries, depending on how many of the nodes comprising the way have changed position, and, indeed, how frequently. Each time a node changes the geometry of all its parent ways may change: of course a node might just be touched (updated with no change), or might only have its tags modified. In the first instance I ignored these issues, although about 40% of all node changes in the Berlin data do not affect position.

My first step was to find for each way version all the start timestamps for the nodes belonging to that way version and where the node version validity period overlapped that of the way version. As the way version start and end timestamps must also be considered I did this using a UNION:

SELECT
w.way_id,
w.version,
greatest(n.tstamp_from, w.tstamp_from) AS tstamp_from

FROM way_hist w
JOIN way_node_hist wn
ON w.way_id = wn.way_id
AND w.version = wn.version
JOIN node_hist n
ON wn.node_id = n.node_id
AND w.tstamp_from <= n.tstamp_to
AND w.tstamp_to >= n.tstamp_from
UNION
SELECT w.way_id, w.version, w.tstamp_from
FROM way_hist w
It's easy to add a valid end date column to this data using a SQL windowing function.

This gives us the total number of historical records we will need to store to access all distinct geometries of ways in the data set.

Exactly the same approach can be used to extend the way_nodes table with node version, and a validity date range based both on the way version and the node version. Indeed the range of way geometries can be derived from this data, but because we also need to slice up the validity range of an unchanged node for each way geometry it is convenient both conceptually and practically to keep these data separate.

With this data we can now start to generate all the historical way geometries. Unfortunately this is computationally expensive in PostGIS: one of the reasons Osmosis provides options for doing this before loading into the snapshot schema. I used the aggregate ST_MakeLine postgis function, which requires that the nodes be correctly sorted in the input.

With historic tables for nodes, ways, and way geometries created the next step was to create history tables which mimic the tables created by osm2pgsql for mapnik rendering. For nodes this is relatively simple, its just a big query converting relevant hstore data into columns along the lines of tags->'highway' as highway with the addition of my two timestamp validity columns. For ways it is more complex: the z_order and area columns requires population, and some ways need to be treated as polygons not lines. Thanks to asciiphil I was pointed to the relevant routine in the back-end processing of osm2pgsql which handles z_order. This is simple enough to replicate in SQL. For deciding which ways to add to the lines and polygon tables I used the default.style file from osm2pgsql storing this in a table.

My first pass seemed to work OK: I didn't try and turn linestrings into polygon geometries, and I didn't do anything with relations. This was the result:

Failed render of snapshot from Berlin history

Really this is a dismal failure. Firstly running mapnik to generate the image took forever: each query was doing a table scan on the lines table rather than using the GIST index on the geometry. Secondly, it turned out that I had a problem with the generation of geometries in PostGIS: hence the lines all over the place. Node order in ways was not carried over properly into the MakeLinestring function.

Obviously, I'd hoped to get further with this, but I'm going to have to concentrate on getting the geometries right: testing using a correlated query seems to generate sensible results. That being said, small volume queries with aggregation seemed to work properly too. A kludgey solution to mapnik performance would be to just extract data in the map extent bounding box before running the mapnik style, and I may do this before looking at the PostGIS performance issues systematically.

A few other things of potential interest: the Berlin data has around 250k ways, collectively with over 600k versions. I identified around 1.6 million potential geometries for these ways. The actual number of distinct geometries is substantially smaller than this because of the high proportion of node version changes which do not affect position.

No doubt far more sophisticated things will be implemented at the upcoming hack weekend. Personally, from my experience so far, I'd like to see at least some history support in the existing tools.