Wednesday, 8 June 2011

POSSUMs

Check the tongue!
courtesy of wollombi CC-BY

Not cute animals or vicious pests, but Persistent Open Streetmap Unique MapIds!

I've been intrigued by the idea of creating persistent IDs for geographical objects in OSM for a while. Although things in OSM do get a unique ID, there is absolutely no guarantee that a given object will retain a single ID, or that a given ID will not be used for multiple objects over time. A classic example is the pub initially mapped as a node, then with a building outline traced around it, and finally the tags are moved from the node to the building and the node deleted.

Persistent IDs would make it much easier to link external data sources to OSM data. It may obviate the need to add certain types of data directly to OSM. SteveC pointed out the problem of business names in a comment on another post here.

The basic problem is conceptually not too difficult: I've done similar things with bank customer and account data. In a typical business application the problem is, however, that although individual systems usually have persistent IDs there is no single one which links across systems (imagine say multiple bank accounts, a credit card and an insurance policy). With OSM we don't even have this luxury.

My basic notion would be to choose something like the minimum node ID of an object when it is first created and use that as a reference throughout its OSM life-cycle (including, potentially, several deletion and restoration events). Some kind of collision avoidance is needed for objects which share nodes, along with handling of certain border cases (ways sharing all their nodes, and some types of relations).
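A minimal sketch of this notion, not a worked-out design: the `PossumRegistry` class, its method names and the (minimum node ID, suffix) form of the key are all assumptions made for illustration. The suffix is one simple way to avoid collisions between objects which share their lowest-numbered node.

```python
from dataclasses import dataclass, field


@dataclass
class PossumRegistry:
    """Hypothetical in-memory store: persistent ID -> current OSM element."""
    assigned: dict = field(default_factory=dict)

    def mint(self, osm_type, osm_id, node_ids):
        """Create a persistent ID from the minimum node ID at first creation."""
        base = min(node_ids)
        suffix = 0
        # Collision avoidance: objects sharing the same minimum node
        # get successive suffixes.
        while (base, suffix) in self.assigned:
            suffix += 1
        possum_id = (base, suffix)
        self.assigned[possum_id] = (osm_type, osm_id)
        return possum_id

    def rebind(self, possum_id, osm_type, osm_id):
        """Point an existing persistent ID at a different OSM element."""
        self.assigned[possum_id] = (osm_type, osm_id)


registry = PossumRegistry()

# The pub is first mapped as node 101 ...
pub = registry.mint("node", 101, [101])
# ... and later its tags move to building way 900; the persistent ID survives.
registry.rebind(pub, "way", 900)

# Two roads sharing their lowest-numbered node 200 get distinct IDs.
road_a = registry.mint("way", 910, [200, 201])
road_b = registry.mint("way", 911, [200, 205])
```

Deciding when a rebind is the right thing to do is the hard part, and this sketch does not attempt it.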

Each time an object is touched in a changeset a process is needed to determine what happens to its persistent ID. In most cases nothing will have affected it, but it may be difficult to ensure that these cases are identified. In general geometry changes are unlikely to have an impact whereas changes in topology will, but I don't know what one should do in the case of inaccurately mapped objects which change position significantly. Changes in topology may result in logical merges and splits of objects previously regarded as atomic, so I imagine the relationships between POSSUMs will need to be stored. Consider a road segment (A->B) between two junctions which is edited so that an additional junction is added between the two (A->C->B): two new identifiers need to be created for A->C and C->B, which are children of the original A->B.
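The road segment example could be stored along the following lines. This is only a sketch: the `Possum` and `Cage` names, the sequential integer IDs and the `superseded` flag are assumptions for illustration, and detecting that a split has happened in a changeset is left out entirely.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Possum:
    pid: int
    start: str                    # junction at one end of the segment
    end: str                      # junction at the other end
    parent: Optional[int] = None  # POSSUM this one was split from, if any
    superseded: bool = False


class Cage:
    """Hypothetical POSSUM data store that remembers parent/child links."""

    def __init__(self):
        self._ids = count(1)
        self.possums = {}

    def create(self, start, end, parent=None):
        possum = Possum(next(self._ids), start, end, parent)
        self.possums[possum.pid] = possum
        return possum

    def split(self, pid, junction):
        """Split a segment at a new junction; the original is kept as the parent."""
        original = self.possums[pid]
        original.superseded = True
        first = self.create(original.start, junction, parent=pid)
        second = self.create(junction, original.end, parent=pid)
        return first, second

    def children(self, pid):
        return [p for p in self.possums.values() if p.parent == pid]


cage = Cage()
ab = cage.create("A", "B")
ac, cb = cage.split(ab.pid, "C")  # new junction C between A and B
```

External data linked to A->B can still be resolved after the edit by following `children(ab.pid)` to A->C and C->B.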

Here's a list of other characteristics, in no particular order, which I think they should possess:
  • I think POSSUMs should be approached as an enabling technology: there may be many different persistent ID schemes depending on individual use cases. Some people may be interested in the road network, others might be interested in building outlines, or it might be a specific geographical area, and so-on.
  • POSSUMs therefore should be independent of the OSM data infrastructure: of course if they proved very useful then OSMF might want to provide them as a service.
  • POSSUM creation needs to be rule based: starting rules will be based on tags, and some aspects of geometry (e.g., road junctions).
  • They need only cover a subset of data. New subsets should be capable of being added to an existing POSSUM data store (the cage?).
  • It should be possible to add the history of elements retrospectively, and to use a pre-existing POSSUM to identify the older versions of an element.
  • POSSUMs would only be created for tagged objects (excluding things like fixme, note, source etc.)
  • POSSUMs and OSM objects would have a many-to-many relationship. Some means of identifying parts of a way (e.g., road segments), and POSSUM roles will be required (e.g., pub as premises or building, pub as business).
  • Relationships between POSSUMs may require some manual maintenance.
  • Applying changes to rules means they also need to be applied retrospectively, thus some POSSUMs will be superseded (just as in the new junction case). This is probably technically the most complex part of the idea.
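Two of the points above, rule-based creation from tags and POSSUM roles, might look something like this in code. It is a rough sketch: the rule table, the role names and the choice of tags are illustrative assumptions rather than a proposed rule set, and geometry-based rules such as road junctions are not covered.

```python
# Keys which on their own should not cause a POSSUM to be created.
IGNORED_KEYS = {"fixme", "note", "source"}


def significant_tags(tags):
    """Drop annotation-only tags such as fixme, note and source."""
    return {k: v for k, v in tags.items() if k.lower() not in IGNORED_KEYS}


# Hypothetical starting rules: each pairs a POSSUM role with a tag test.
RULES = [
    ("business", lambda t: t.get("amenity") == "pub"),
    ("building", lambda t: "building" in t),
    ("road", lambda t: "highway" in t),
]


def roles_for(tags):
    """Return the roles for which this element should get a POSSUM."""
    tags = significant_tags(tags)
    return [role for role, matches in RULES if matches(tags)]


# A pub mapped as a tagged building plays two roles, so it would get two
# POSSUMs: one for the business and one for the building.
pub_roles = roles_for({"amenity": "pub", "building": "yes", "source": "survey"})

# An element carrying only annotation tags gets none.
untagged_roles = roles_for({"FIXME": "check name", "source": "survey"})
```

Keeping the rules in a plain table makes it easy to add a new subset later, although re-applying changed rules to existing POSSUMs is the complex part and is not attempted here.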

The primary use case for POSSUMs is for the maintenance of linkages between external data (e.g., public open data) and OSM objects in a way which minimises the amount of external data which needs to be imported into OSM. This in turn should reduce the maintenance overhead of keeping OSM data in sync with external sources.

POSSUMs might also be useful for OSM mappers when frequent uploads to the planet database are impossible. This commonly occurs in third-world countries, but also in crisis mapping and disasters when telecom resources are unavailable. At the moment mapping in this situation can give rise to large numbers of conflicts which are difficult to resolve. Use of persistent IDs might make the conflict resolution process easier.

Aaron Cope has already done some much more sophisticated thinking for the use case of a worldwide building register. Check out his building=yes site using WOEIDs. I am sure there are others thinking about this: let's get the discussion rolling.

I'm grateful to Bob Chell of 1-spatial for a brief discussion of the basic notion at SotM10, Girona. This encouraged me to believe that the idea was not completely barking mad. But any remaining nuttiness is entirely my own.


2 comments:

  1. Indeed, we definitely need permanent object IDs! Have you read the OSM wiki article on UUIDs? I think that might be a way to go, but it would be good if you could comment on what's missing or in error:
    http://wiki.openstreetmap.org/wiki/Proposed_Features/UUID

  2. I perhaps should have mentioned that POSSUMs could use UUIDs, but the management of persistence is entirely independent of key management.

    For instance, Aaron Cope's building=yes approach looks to use WOEIDs. For me a key part of the issue is that a capability is provided for people who need it, but the actual mechanics of the data and process should be what they feel is appropriate. In the UK there might be a desire to link OSM objects to the Ordnance Survey's persistent IDs (TOIDs), or to a local authority's street furniture register, or to the NLPG.

    UUIDs do remove collision problems: indeed the use of UUIDs as the OSM system key was something that was discussed in the context of crisis mapping with Fran Boon from Sahana at WhereCampUK in November.