Tuesday 31 March 2015

Bat Bridges, or why deleting lonely tags is a bad idea

The other day I was idly browsing the blog of Mark Avery, the former conservation director of the British bird protection society, the RSPB. One item caught my attention: it was about 'bat bridges'. Although I hadn't heard of them before, it was pretty obvious what they might be.

"Bat bridge" - geograph.org.uk - 872775
A bat bridge on the A590 in Cumbria

Bats tend to follow linear features in the landscape when foraging at night, at least in part because they provide protection from predators. Bats tend to avoid flying over open spaces. Hedgerows, edges of woods, and so on, form commuting routes between roosting and feeding sites for bats. When these are damaged or destroyed, for instance by road building, bats either lose feeding locations or have to cross the open space. Usually they do this by flying low: effective against their age-old predators, but not much help when confronted by a car.

Bat bridges are an attempt to mitigate the loss of the landscape features used by bats. Often, mitigation is a requirement of the planning permission for a given project. Notoriously, there is often no requirement to check whether the implemented mitigation measure is effective. A group at Leeds University have studied several bat bridges in the UK, and have come to the conclusion that they are more or less useless.

I don't remember seeing a bat bridge and I wondered where they had been installed. Wikipedia has quite a comprehensive list, but without detailed location information. A quick check on taginfo showed precisely two instances mapped on OSM, simply as "description=bat bridge", which turned out to be on the A38 bypassing Dobwalls in Cornwall. Not surprisingly, the user who mapped them was at a loss for a suitable tag, and had added man_made=wildlife_crossing and bridge=yes together with the description. Not totally accurate, but entirely useful for finding these structures.

Mapping & Tagging Bat Bridges

As the Derbyshire Bat Group noticed my tweet, I decided that it would be interesting to make the tagging more regular and to add more examples based on the Wikipedia listing, and on Wikimedia and Geograph photographs. It was extremely helpful that there were two existing examples because this gave me a good idea of what to look for on aerial imagery: essentially something like power line wires but with obvious (because all are new installations) concrete footings on either side of the road. I had to search up and down the lengths of roadway more than once before I located the bridges, but in a relatively short time had added 9 more to the two originally mapped.

In deciding how to tag them I checked the documentation for wildlife crossings and it seemed appropriate to use adjectival or sub-tagging to separate bat bridges from other types of wildlife crossings. Thus the tag wildlife_crossing=bat_bridge was born.
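As a rough illustration of how this tagging makes the structures findable, here is a minimal sketch that builds an Overpass QL query for objects carrying both tags. The function name and the timeout value are assumptions made for the example; only the two tags come from the text above. The sketch just builds the query string and leaves sending it to an Overpass server to the reader.

```python
# Tag combination described in the post.
BAT_BRIDGE_TAGS = {
    "man_made": "wildlife_crossing",
    "wildlife_crossing": "bat_bridge",
}


def build_overpass_query(tags, timeout=25):
    """Return an Overpass QL query matching nodes, ways and relations
    that carry every key=value pair in `tags`.

    `build_overpass_query` is a hypothetical helper, not part of any library.
    """
    # One ["key"="value"] filter per tag; filters chained together are ANDed.
    filters = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    return (
        f"[out:json][timeout:{timeout}];\n"
        f"nwr{filters};\n"
        "out center;"
    )


if __name__ == "__main__":
    print(build_overpass_query(BAT_BRIDGE_TAGS))
```

The same helper works for any other rarely used tag, which is rather the point: a consistent tag, however uncommon, can be queried.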

Various people maintain that tags with a small number of uses are useless. Whilst this is true in cases of mis-typing, a low count more usually reflects something else: a specialised interest, lack of observations, lack of a suitable tag, or simply a lack of time to map such detail. In this case being able to show where bat bridges are located is of immediate interest, and part of the data which conservationists might need when discussing transport policy, as at a conference hosted by Bat Conservation on Tuesday. Curious structures across main roads might also pique the interest of people passing, and general inquisitiveness is certainly a common trait in OpenStreetMappers.

I hope describing the background to a rarely used tag shows that such things are often meaningful and useful, even if the usage might be niche.

Low occurrence tags in general

There are two general aspects of using such a tag which are of more widespread relevance to OpenStreetMap as a whole:
  1. The danger of removing meaningful tags inadvertently
  2. Specific keys versus a more generic key and sub-tagging. 
I'll only talk about the former in any depth in this post. The second subject is very broad and involves interesting issues about how to choose an appropriate abstraction level for database design in general.

Before getting into the nitty-gritty of this post, I thought I'd lighten the tone by listing some of my favourite rarely used tags:

Stile with dog gate OS SY567846
Stile with dog 'guillotine', more formally a dog gate.
Source: Wikimedia Commons.
  • Dog Guillotine (dog_guillotine=yes; 52 uses). Not a device for murdering dogs, but a simple vertically operated gate to allow dogs to pass through a fence, usually alongside a stile which many dogs find difficult. This tag is very memorable, much more so than its more formal, but also more prosaic, moniker of 'dog gate'.
  • Research Institute (amenity=research_institution; 54 uses). Perhaps I'm attached to this one because I worked in a few, as did my father and many friends.
  • Soft Play Centres (leisure=soft_play; 11 uses). Indoor areas where small children can play protected by soft matting. Usually commercial. There was deep scepticism about these when the tag was first raised, and all are located in the UK. However, I did see one in Ushuaia, so they're not necessarily a British thing (see also this). Probably an excellent example of a tag where mappers aren't noticing the feature, or can't think how to tag it. (Incidentally, in writing this I find that another tag (indoor_play) has been introduced for the same thing since I last used it. This is a common phenomenon for low-occurrence tags because they inevitably have a low profile. This is also a good example of a more generic versus more specific tag. Re-tagging soft play areas to indoor play loses some precision of meaning.)
  • Sitting Disability (sitting_disability=yes; 48 uses). Used to indicate that swings (and other equipment) in a playground are appropriate for children with postural disabilities. Part of a well thought out, and moving, tagging proposal. I have yet to map playgrounds in sufficient detail to add playground equipment, but this is the type of thing which makes doing so worthwhile.
  • Pet Crematorium (amenity=crematorium, crematorium=pet; 2 uses). I came across one of these the other day whilst out surveying, and there's even a professional association. (A good example of where choice of generic vs specific key is hard, see below.)
These are either things which interest me, or which I've come across. I'm sure every mapper has their own list. This is one reason why low-occurrence tags exist.

Why cleaning up low-occurrence tags requires care

There is a tendency to treat all rarely used tags as rubbish. It is therefore important to understand how and why they might be created.

A recent example was the removal of some tags which we use locally in Nottingham. We use an extended version of the not:name tag to highlight discrepancies between Open Data from the city council and OSM: partly to help mappers who might be tempted to use the open data without checking how up-to-date it is, and partly to provide the ability to feed back to the providers. (At the recent OpenDataCamp I was delighted to learn from Dr Sian Thomas of the Food Standards Agency that a small bit of feedback I provided to a local authority had been very effective. I am now more determined to help where I can in improving data quality for Open Data providers, and this is one mechanism towards such a goal.)

The mapper who removed the tags was unaware of this usage and could see no value in them. In this case the new changeset comment feature enabled us to discuss the change with that mapper and resolve the issue. Given the structure of the key (not:<data provider>:<data set>:<osm key>), these do not lend themselves to a wiki page, as there are around 40 data sets we use, and perhaps 5-6 OSM keys as well.
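To show that keys of this shape are machine-readable even without a wiki page, here is a minimal sketch of a parser for the not:<data provider>:<data set>:<osm key> pattern. The class and function names, and the provider and data set names in the usage note, are hypothetical; only the four-part structure comes from the description above. It assumes the provider and data set parts contain no colons, while the trailing OSM key may (as in addr:street).

```python
from typing import NamedTuple, Optional


class DiscrepancyKey(NamedTuple):
    """The three variable parts of a not:<provider>:<data set>:<osm key> key."""
    provider: str
    data_set: str
    osm_key: str


def parse_discrepancy_key(key: str) -> Optional[DiscrepancyKey]:
    """Split a key of the form not:<provider>:<data set>:<osm key>.

    Returns None for anything that does not fit the pattern, including
    the plain not:name key. Hypothetical helper for illustration only.
    """
    # Split at most three times so that an OSM key such as addr:street
    # stays in one piece at the end.
    parts = key.split(":", 3)
    if len(parts) != 4 or parts[0] != "not" or not all(parts[1:]):
        return None
    return DiscrepancyKey(*parts[1:])
```

For instance, parse_discrepancy_key("not:example_council:street_list:name") would yield provider "example_council", data set "street_list" and OSM key "name", whereas a key that is merely unusual, such as dog_guillotine, yields None. A check like this is one cheap way for a would-be cleaner to notice that a family of rare keys follows a deliberate scheme.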

Of course there are many rarely used keys and values which can be altered to make the data more valuable, but it is rare that they are directly amenable to an automated edit. The most usual ones which can be caught are typos of widely used keys, which can be identified when the value is a typical value of the intended key (e.g., hihghway=secondary).
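The typo check described above can be sketched in a few lines using Python's standard difflib module. The table of common keys and values here is a tiny illustrative sample rather than real taginfo data, and the function name and similarity cutoff are assumptions. A key is only flagged when it is both close in spelling to a common key and carries a value that is typical for that key.

```python
import difflib

# Tiny illustrative sample of widely used keys and some of their usual values.
# A real check would draw on much fuller statistics; this is an assumption.
COMMON_TAGS = {
    "highway": {"primary", "secondary", "tertiary", "residential"},
    "amenity": {"pub", "fuel", "school"},
    "shop": {"bakery", "antiques", "optician"},
}


def suggest_key_fix(key, value, cutoff=0.8):
    """Return the common key that `key` is probably a typo of, or None.

    A suggestion is made only if the spelling is close AND the value is a
    known value of the suggested key. Hypothetical helper for illustration.
    """
    if key in COMMON_TAGS:
        return None  # already a widely used key, nothing to fix
    close = difflib.get_close_matches(key, list(COMMON_TAGS), n=3, cutoff=cutoff)
    for candidate in close:
        if value in COMMON_TAGS[candidate]:
            return candidate
    return None
```

With this sample table, hihghway=secondary is matched to highway, while dog_guillotine=yes is left alone because nothing resembles it. Even then the result is a suggestion for a human to review, not a licence for a bulk edit.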

Here's a sample list of single values for shop in Australia:
"gps", "garden_equipment", "real estate", "Pet Supplies", "building_supplies", "art_supplies", "random_crap", "hunting", "loan", "Coles Express", "hydroponics", "kayak; canoe", "solicitor", "electronics;hifi", "TV repair", "wines/art craft", "Linen", "tax_agent", "baby_goods", "video_rental", "beads", "watches", "alterations", "skater", "optometry", "photography", "fresh_produce", "antique_shop", "sewing_machines", "air_conditioning", "farm_supplies", "farm_machinery", "office", "Onsite Rentals", "nails", "Homes", "carwash", "property", "home_brew".
Every one of these is intelligible. Some, such as antique_shop, can be mapped to a more widely used value (shop=antiques). random_crap is just pure genius: we all come across shops of this type, but are rarely brave enough to say so. Changing many of these values may remove information; for instance, optometrists do different things to opticians, although many may well be opticians as well. Furthermore, the locations may have been mapped some time ago and things may have changed.
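One way to approach a list like this cautiously is to sort values into rough buckets before touching anything. The sketch below is only an illustration: the bucket names and the formatting heuristics (capital letters, spaces, slashes) are assumptions, and the only synonym included is the antique_shop example given above. Nothing here edits data; it merely suggests which values deserve a closer look.

```python
# The only mapping taken from the text; anything further would need checking.
KNOWN_SYNONYMS = {"antique_shop": "antiques"}


def triage_shop_value(value):
    """Sort a rare shop=* value into a rough bucket.

    Returns (bucket, suggestion): suggestion is a more widely used value,
    or None. Bucket names are hypothetical labels for this example.
    """
    if value in KNOWN_SYNONYMS:
        # A widely used equivalent exists; still worth a manual look.
        return ("candidate_retag", KNOWN_SYNONYMS[value])
    if value != value.lower() or " " in value or "/" in value:
        # Looks like free text or a shop name: ask the mapper or survey.
        return ("ask_mapper", None)
    # Well-formed and intelligible: no reason to change it.
    return ("leave_alone", None)


if __name__ == "__main__":
    for sample in ["antique_shop", "Coles Express", "random_crap", "optometry"]:
        print(sample, triage_shop_value(sample))
```

Note that optometry lands in the "leave_alone" bucket: the tool has no way of knowing whether the shop is also an optician, which is exactly why such a change needs local knowledge.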

Places in Europe with gas lighting via Overpass
An example of a tag which is rarely used due to historical change.

All keys and tag values started out with a low count. Some prove to be universally applicable, others might be specific to small areas (see lit_by_gaslight), and others still are of minority interest. Low counts are not a bug in themselves. Indeed, it is often possible to rapidly increase the number of mapped objects once a sensible tag appears.

I am not against making obvious corrections; I just believe they need to be done with care, and that usually means not by a single worldwide edit. In many cases there is ample scope to improve map data in the vicinity of the initial object of interest; you can see my other edits around bat bridges here.

So before making changes to these low-usage tags, here are the questions I ask myself:
  1. Am I improving the semantic value of OSM by changing a tag?
    If the meaning of a tag is obvious (as this one is) and you do not know of a more widely used alternative, leave it alone.
  2. Has the tag been accepted by other mappers?
    If a tag has been added and other mappers have edited the object subsequently, then this implies some degree of acceptance.
  3. Am I fixing something?
    If it ain't broke, don't fix it.
  4. Is the tag related to other more widely used tags?
    Often one can recognise that a rarely used tag has properties similar to more commonly used tags (e.g., builder cf. architect; maker cf. manufacturer). Using an existing pattern and extending it is a great way to create new keys/tags because it's easy to work out how they might be used.
  5. Am I clear about the intentions of the original tagger?
    If not, I'll ask them or leave it for another day. Our new changeset discussion feature is really valuable for this.
  6. Do I have appropriate knowledge to make the edit?
  7. Should I try and document the usage rather than change it?
    Often a little bit of documentation helps everyone.
I also try and keep my changesets local, and document what I have changed to make it easy for someone to see what has changed. However, I mainly don't do this kind of thing: the marginal value of adjusting tags is usually tiny compared with active mapping! (It's also not as much fun).
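Question 2 is the one that lends itself most readily to a mechanical check. As a minimal sketch, assume an object's history is available as a list of versions, oldest first, each holding the editing user and the tags at that version. This simplified data shape and the function name are assumptions for the example, not the format of any particular API. The idea is that if someone other than the original tagger edited the object later and left the tag untouched, that hints at acceptance.

```python
def accepted_by_others(history, key):
    """Return True if a different mapper edited the object after `key`
    first appeared and kept its original value.

    `history` is a list of versions, oldest first, each a dict with
    "user" (str) and "tags" (dict). Hypothetical, simplified structure.
    """
    for index, version in enumerate(history):
        if key in version["tags"]:
            author = version["user"]
            value = version["tags"][key]
            later_versions = history[index + 1:]
            return any(
                later["user"] != author and later["tags"].get(key) == value
                for later in later_versions
            )
    return False  # the key never appears on this object


if __name__ == "__main__":
    sample_history = [
        {"user": "mapper_a", "tags": {"man_made": "wildlife_crossing"}},
        {"user": "mapper_a", "tags": {"man_made": "wildlife_crossing",
                                      "wildlife_crossing": "bat_bridge"}},
        {"user": "mapper_b", "tags": {"man_made": "wildlife_crossing",
                                      "wildlife_crossing": "bat_bridge",
                                      "bridge": "yes"}},
    ]
    print(accepted_by_others(sample_history, "wildlife_crossing"))
```

This is weak evidence at best, since the later mapper may simply not have looked at the tag, so it can only supplement the other questions.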

Choosing the abstraction level of a tag

Bat bridges are so obviously a niche structure that it immediately made sense to me to continue with the man_made=wildlife_crossing used by the original mapper, particularly as this was already documented. However, such a choice is not always obvious.

In fact, a general trend I have noticed in OSM is that it is often a more specific tag which gains acceptance rather than a generic one. Sometimes this occurs because the use case for the specific tag is very broad and it directly maps onto common ways of describing things. Often a more generic tag actually moves work from data consumers to mappers (for instance bus stops, wind turbines, etc.), or worse, degrades existing semantics. These more generic tags often do work, but usually only with specific editor support (power is a good example). Generic tags tend to be chosen to strongly support a single use case, but OSM data is often consumed in ways people don't imagine, so an apparently more generic tag may hinder some uses.

I've made this short note because I've noticed a couple of discussions on the tagging mailing list which in my view are tending towards over-generalising tags. A couple of examples are: tourism=camping (to include places which are not camp sites) and amenity=fuel (to include all kinds of fuel, rather than just petrol/gasoline/diesel for motor vehicles).

As I stated above, finding an appropriate abstraction level for representing data is often difficult even in commercial database design. In my experience people often do not realise that it is possible to describe the same system perfectly accurately using completely different approaches to abstraction. I have found this a fascinating subject for quite a while. The development of tags in OSM provides many interesting examples. But it is a very broad topic, which I hope to cover at some stage.

Conclusion


Many of these tags are not wrong; on the contrary, many represent mappers trying to codify how to represent some of the less obvious things we encounter. Removing them without considering this has the same effect as zapping hedgerows for bats: it reduces diversity in the OSM ecosystem.
