Tuesday, 19 March 2019

Social Housing polygons for England : generalisation from point data

A likely Addison Street candidate, Cefn Fforest, Blackwood
cc-by-sa/2.0 - © Jaggery - geograph.org.uk/p/2
John Boughton (Municipal Dreams) was recently looking for streets named after Christopher Addison, a pioneer of post-WWI housing legislation in Britain. It was easy to find all the roads with Addison in the name from OpenStreetMap, but much less easy to spot those which were likely to be named after him rather than other Addisons.

Merseyside, NW Cheshire & SW Lancs, showing areas of social housing.
These are concave hull polygons derived from clusters of NROSH postcodes.

To reduce the number of roads to be searched, one would ideally have information about when the buildings were built, and whether or not they were built to provide social (council) housing. There is limited open data on the overall age of British housing stock, but no direct information on the original developer of housing. Both are things which may ultimately be of interest to add to OSM, but it will be many years before such information has any utility on a national scale. Furthermore, both are hard to check on the ground, at least for the typical mapper.

It occurred to me that one national open data set, that of the National Register of Social Housing (hereafter NROSH), could be useful. This stopped being maintained in 2013, but provides addresses for millions of houses (approx 4 million in 350k postcodes) as of that time. Given that, since then, very few new homes have been added to social housing stock, and many have been removed, this can identify likely areas of social housing.

The NROSH data therefore seemed a good place to get to grips with clustering in PostGIS, particularly as I had a specific objective in mind.

Clustering NROSH Data

Normally one sees clustering as a means of reducing clutter on webmaps, but it's only relatively recently that I realised that these techniques have great potential for performing various generalisations on detailed geographic data (particularly OSM, which tends to the detail rather than the general).

NROSH data is only geocoded at the postcode level. There may be tens of addresses at an individual postcode or just one. At the outset I treated all postcodes equally, ignoring the number of addresses, as I was mainly concerned to aggregate them into coherent clusters. I grabbed some code from a GIS StackOverflow question & tweaked it very lightly:

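-- One row per cluster of postcode points
SELECT row_number() over () AS id,
  ST_NumGeometries(gc) AS num_postcodes,
  gc AS geom_collection,
  ST_Centroid(gc) AS centroid,
  ST_MinimumBoundingCircle(gc) AS circle,
  -- radius recovered from the circle's area (area = pi * r^2)
  sqrt(ST_Area(ST_MinimumBoundingCircle(gc)) / pi()) AS radius
FROM (
  -- ST_ClusterWithin returns an array of geometry collections; unnest gives one row each
  SELECT unnest(ST_ClusterWithin(geom, 100)) gc
  FROM nrosh_pc_geo
) f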
To my mind ST_ClusterWithin is still rather like magic. It groups individual postcodes which are within (in the example) 100 metres of each other. It returns all the clusters in an array, so this needs to be unnested to get each cluster. It is an aggregate function, so other columns can be used in a GROUP BY to partition the clustering (for instance local authority might have been a useful one if I'd included it in the imported data).
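The grouping rule can be illustrated outside the database. The following is a minimal pure-Python sketch of distance-threshold (single-linkage) grouping, which matches the documented behaviour of ST_ClusterWithin: every member of a cluster is within the given distance of at least one other member. It is not the PostGIS implementation, and the function name cluster_within and the coordinates are hypothetical.

```python
from math import hypot


def cluster_within(points, distance):
    """Group points so that each one is within `distance` of at least one
    other member of its cluster (single-linkage grouping)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive pairwise comparison: fine for a small illustration.
    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, pt in enumerate(points):
        clusters.setdefault(find(i), []).append(pt)
    return list(clusters.values())


# Hypothetical postcode centroids in metres (projected coordinates).
postcodes = [(0, 0), (80, 0), (160, 0), (1000, 1000), (1050, 1000), (5000, 5000)]
clusters = cluster_within(postcodes, 100)
for members in clusters:
    print(len(members), members)
```

The first three points end up in one cluster even though the end points are 160 m apart, because each is within 100 m of the middle one. The isolated point forms a cluster of its own.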

I initially experimented with NG8 postcodes: this area of Nottingham (see my last post) has many council estates built between the early 1920s and the 1970s (see the Municipal Dreams blog for details). Trying various distances for clustering, I found 150 m worked pretty well. In London, and possibly other large cities with many postcodes on a road, this was too high.

The cluster itself is a geometry collection of the original points. It is therefore trivial to calculate a hull for the collection. Fortunately these days ST_ConcaveHull does not break with target percents of less than 99%, and it produced sensible results.


Odd-shaped polygons for Irby on the Wirral. Individual postcodes only have a few NROSH entries. Presumably both roads around the school were built as council housing, but most have now become privately owned.
I extended the code to the entire data set. I soon realised that it was excluding areas of social housing sharing a single postcode. As there are some interesting examples of rural council housing, I wanted them in the overall data set too.


One of the more unusual social housing forms, Stoford
cc-by-sa/2.0 - © Nigel Mykura - geograph.org.uk/p/4314325

My solution was simple: instead of using points I buffered them by 10 metres. This simply ensures that no data gets thrown away in subsequent steps. It does not mean that one gets a very accurate polygon when there are only a few postcodes in the cluster (fewer than 5 perhaps). If actual geocoded addresses are available then it will be possible to produce more accurate polygons. I haven't tested this, but it should be possible for any local authority where a decent number of addresses are mapped on OSM. In my local patch there are several areas of Nottingham, Gedling, Broxtowe and Erewash which meet this requirement.

Overview of the Resultant Data




Throughout the Midlands, North-west England and parts of East Anglia the data looks pretty sensible. In general I've looked at places I know and checked that the edges of the polygons accord with what I know of housing patterns in those areas. For now I've tried to put a sample area for Notts, Derbys up on umap.

Hampton, Hanworth & Teddington area of SW London.
In general this is a pretty prosperous part of London. The data does pick out some areas of social housing (e.g., near Apex Corner (A312/A junction)), but the notion that most of Teddington is social housing is absurd. The 2015 Index of Multiple Deprivation gives a better picture of this area. See additional notes at the foot of the blog.

NW1 postcode. 150 m clusters with 100 m clusters overlaid.
Reducing the cluster distance to 100 m greatly improves the elimination of false positives. Places like the Ossulston Estate between Euston & the British Library show clearly. I'm less convinced that this does not result in many false negatives.

I have not scrutinised everywhere, but a few obvious oddities I've noticed:
  • Sheffield & Redditch seem to be data deficient.
  • Areas in London are far too large to be usable, and even reducing the clustering distance does not make a massive difference (see images & commentary in the captions above).

The data is quite large so I haven't yet been able to publish it somewhere readily accessible. In the meantime I can share it in various geoformats if you are interested. There's also scope to use IMD & the housing age stats to separate things out a bit more, but I'm just as interested in places which are now predominantly privately owned, but were built as social housing.

I hope this data can be used for various things. In particular, I have long been interested in the possibility of finding Radburn layouts using OSM data. (Ian Waites has more on these estates.) A reduction in the total area to search is always valuable. I'm sure other uses will occur to both social historians and mappers. On the technical side I hope this might also provoke others to explore the potential for clustering in PostGIS: there's a lot to learn.

Further Notes on Hanworth, Hampton & Teddington

I looked at this area because I lived in three places here during the 1980s and 1990s.

Hanworth, the area in the London Borough of Hounslow, had a lot of social housing, especially north of the Great Chertsey Road. South of the road, housing was extremely mixed, with small private speculative developments, older properties, infill, fields with horses grazing and so on. I bought a house in this area in 1986 which was built in the 1960s. Today this property appears in the NROSH data, so it has moved from the private to the social sector in the past 25 years. We bought it because the location was convenient (I used public transport and caught the bus at Apex Corner) and the house had been extended and was larger than equivalent properties we had looked at. It was sold in the early 1990s to a family of South Asian heritage, who probably bought it for similar reasons.

Further south is the Hampton Nurserylands Estate. In the 1980s this was full of young professionals, many with young children. However, it changed demographically rather quickly. Many of the original buyers moved out to bigger houses within 2-3 years, and were replaced by older less-prosperous families. I remember looking at a flat here in 1993 and being staggered how much the area had changed in 3-4 years. The houses on the W side of Oak Avenue were social housing in the 1980s. Clearly these changes have continued.

I can't really explain how Teddington has so many social housing postcodes. It is really one of the most prosperous places in Britain.


Monday, 11 March 2019

Mapping roof-top Solar Panels

We've all noticed that solar panels have become increasingly frequent on the roofs of residential buildings. It's one of the things I have taken note of ever since I started contributing to OpenStreetMap. However, I had never tried to add any to OSM. Until now!

Solar panels (6925093968)
Roof mounted solar PV panels on a semi-detached house.
There are 18 panel modules (in 2 rows of 7 and one of 4). Each module consists of a 12 by 6 array of solar cells (see below for further discussion).
Photo by Phil Sangwell on Flickr via Wikimedia Commons. CC-BY-SA
A couple of days ago Jack Kelly suggested that perhaps we could use OSM to capture the presence of solar panels across the UK. A lively Twitter discussion ensued.

It seemed sensible to have a go at scoping what was involved. Over the past few months I've been improving the mapping of inter-war housing estates in Nottingham, with the current focus on the Aspley Estate which contains perhaps 2,400 houses. In the course of visits and scrutinising aerial imagery I already knew that there were a fair number of roof-top photovoltaic (PV) panels already installed. Unfortunately on my last visit to get representative photos of the buildings I only caught one house in the background.

Housing in the Aspley Estate: note the solar PV panel on house in background (this one on OSM).
Particularly useful is that the best quality imagery layer available for Aspley is Bing, and it shows the panels clearly. In fact sufficiently clearly that I decided to map them as areas.

In a relatively short time (15-20 minutes) I had found just over 200. Unfortunately I was also reminded that there were quite a few houses in the NW sector of the estate which I had not mapped, so I then spent a while adding houses and addresses, followed by fixing a lot of QA issues pointed out by the JOSM validator tool. Only then was I able to align the houses & solar panels, which took another hour.

This was a little long-winded for first-cut mapping, so the following morning I gave myself 30 minutes and searched through adjacent housing estates which I suspected would have a similar density of panels to the one I had found in Aspley.

Solar Panels on new-build replacement housing at Rutland Close, The Meadows.
A recent example of housing I've surveyed without paying particular attention to solar.
One reason for this is that social housing often has a much higher density of solar panel installation than private housing. Firstly, the housing stock is often of similar or identical buildings under one ownership, enabling economies of scale. Secondly, owners of social housing are much exercised by fuel poverty of their tenants: reducing fuel costs through providing electricity from solar power therefore has much to commend it. Thirdly, Nottingham City Homes, the arm's-length housing provider for Nottingham City Council, has a great deal of expertise in applying greener energy policies to their housing. Through an odd coincidence I saw a tweet about what they have achieved as I was doing my second scoping task. This set a target of around 4,500 panels to find in the city.

With my second run just mapping each panel as a node I found 320, or just over 10 a minute. This was partially targeted because I pretty much restricted myself to examining areas of social housing. It therefore represents a rather efficient data acquisition rate.

Jack extrapolated this to the whole country being mapped in 5 days by 33 people working full-time. This is of course highly optimistic because I was mapping areas I know well from having mapped them for 10 years on OSM. However, the OSM community is full of people with detailed knowledge of their local areas, so this ought to apply to many parts of the country. Even if it took 5-10 times as much effort, say 1 a minute, 100 people mapping for a couple of hours a week for a quarter might find 150k. This suggests solar panels may be a good subject for a Quarterly Project.
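The back-of-envelope figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "a couple of hours" means 2 hours and that a quarter is 13 weeks, neither of which is stated exactly in the post.

```python
# Figures from the second scoping run described in the post.
panels_found = 320
minutes_spent = 30
observed_rate = panels_found / minutes_spent  # panels per minute

# Pessimistic scenario: roughly 1 panel per minute.
pessimistic_rate = 1      # panels per minute
mappers = 100
hours_per_week = 2        # assumption: "a couple of hours"
weeks_per_quarter = 13    # assumption: a quarter taken as 13 weeks

quarter_total = (
    pessimistic_rate * 60 * hours_per_week * weeks_per_quarter * mappers
)

print(f"Observed rate: {observed_rate:.1f} panels per minute")
print(f"Quarterly total at 1 panel per minute: {quarter_total:,}")
```

Under these assumptions the quarterly total comes out a little above the 150k quoted, so the estimate is of the right order.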

Solar PV panels mapped in Nottingham, via Overpass Turbo

Now, 48 hours after I started, I have added 1760 solar panels to OSM in Nottingham. It's time to summarise what I have learnt. In no particular order:
  • All available imagery layers need to be searched. New installations are occurring all the time and it's unlikely that the better quality imagery layers will be recent enough to enable adequate coverage.
  • Newer imagery, such as the Digital Globe layers can be quite grainy & hard to interpret. However once panels have been spotted it is usually possible to then find many more.
  • The huge variability in available imagery is likely to make any attempt to use machine learning to identify targets fraught. I would also expect things like glass-roofed extensions to generate many false positives.
  • Knowing where panels are likely to be installed helps a great deal: both at the neighbourhood and building level. Christian Quest's OpenSolarMap used some crowd-sourced information from aerial photos to train a system to identify buildings in France with potential for installation of photovoltaic panels. Such information for the UK could reduce the total number of buildings needing to be inspected.
  • Larger detached houses with solar panels are very difficult (impossible?) to pick out from aerial imagery. Shadows from chimneys, and changing roof lines obscure the presence of the panels.
  • Mapping panels as nodes is the best approach initially. I used iD, which has a suitable preset (and checked what others had already done, for instance brianboru around Birmingham). Thereafter I just copied the original node.
  • Adding a tag to show that they are roof-mounted is useful (particularly if the building has not been mapped yet). I've used generator:location=roof. Indicating domestic use might also be helpful. The basic tags I used are also used for complete solar farm installations & clearly it is important to distinguish them. (I've subsequently learnt that generator:place=roof is the established tag).
  • Many installations are sufficiently clear on aerial imagery to allow estimation of the number of panel modules involved. Virtually all the ones at Aspley are 2 rows of 5 modules. Unfortunately I don't know the exact module size, but they are probably 10 or 12 by 6 cells. Tagging the module array explicitly is probably better than guesstimating the area (as I have done).
  • If module size is known (see top photo) the array area can be calculated directly. Each solar cell is likely to be 156 mm square, so a 10 by 6 array will be 1.56 x 0.94 m (1.46 sq m).
  • Cell size, module size and number of modules allow optimal power rating to be estimated. I think these arrays of roughly 15 sq m are around 3500-3700 W.
  • Adding compass orientation of the array in degrees, and angle from the horizontal would also be helpful for using the data for estimating likely power output. (Both could be derived from simple 3D building tags, but adding these is much more complex).
  • The last few items (no. of modules, module size, area, power rating, orientation & angle) represent data which can be added iteratively.
  • It's worthwhile surveying at least some of those added from aerial imagery to capture other information.
  • Even if the panel is mapped as an area, most of these tags are still useful as it is unlikely that enough information will be present on the underlying buildings to derive them.
  • Surprisingly few public buildings, education establishments or industrial buildings have solar panel installations. I've only noticed a few on buildings of Derby Hall, a hall of residence at the University of Nottingham, and a couple of warehouses in Bulwell.
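The area estimates in the list above can be reproduced as follows. This is a rough sketch that assumes 156 mm square cells, as suggested in the list, and ignores module frames and gaps between cells; the helper names are hypothetical. The power density at the end is simply what the 3500-3700 W estimate implies, not a manufacturer's figure.

```python
CELL_MM = 156  # assumed cell edge length, as suggested in the list above


def module_size_m(cells_long, cells_wide, cell_mm=CELL_MM):
    """Return (length, width) in metres for a module of
    cells_long x cells_wide cells, ignoring frame and cell spacing."""
    return cells_long * cell_mm / 1000, cells_wide * cell_mm / 1000


def array_area_m2(modules, cells_long, cells_wide):
    """Total area in square metres of an array of identical modules."""
    length, width = module_size_m(cells_long, cells_wide)
    return modules * length * width


length, width = module_size_m(10, 6)
module_area = length * width

# Typical Aspley installation: 2 rows of 5 modules.
aspley_area = array_area_m2(2 * 5, 10, 6)

# Power density implied by the 3500-3700 W estimate for such an array.
low_w_per_m2 = 3500 / aspley_area
high_w_per_m2 = 3700 / aspley_area

print(f"Module: {length:.2f} x {width:.3f} m = {module_area:.2f} sq m")
print(f"Aspley array: {aspley_area:.1f} sq m")
print(f"Implied density: {low_w_per_m2:.0f}-{high_w_per_m2:.0f} W per sq m")
```

If the modules turn out to be 12 by 6 cells, calling array_area_m2(10, 12, 6) gives the corresponding larger area.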
Other than quickly looking for whether anyone had mapped roof-mounted solar panels in the UK, I haven't looked at activity in other countries. There may be places with a more developed approach to mapping and tagging.

A couple of caveats:
  • solar panel distribution is likely to be very patchy; 
  • aerial imagery may not be good enough to pick out panels, or recent enough (many of the Nottingham panels have been installed since 2014). 
To help judge what the latter point may mean I provide below a selection of available aerial imagery of various locations in Nottingham, and Basingstoke (Hampshire Council Open Data). The latter includes false colour infra-red (FCIR). 

Aspley Estate (Bing Imagery), roughly here.

Aspley Estate (Bing Imagery) roughly here.

Aspley Estate (Digital Globe Standard Imagery) location as above

Aspley Estate (Digital Globe Premium and ESRI World Imagery) location as above

Broxtowe Lane (Bing Imagery), about here.

Broxtowe Lane (Digital Globe Premium Imagery), same location as above. Obviously newer as a solar panel can be made out on the terrace in the centre. Note on the next terrace down a dark area on the roof. This does not appear to have the same visual appearance as other solar panels, so may be a solar hot water system. 

Broxtowe Lane, as above (Digital Globe Standard imagery)

Deptford Crescent area, Highbury Vale (Bing imagery). Area bottom right is Highbury Hospital.
No solar panels visible

Deptford Crescent (ESRI World Imagery)

Deptford Crescent (Digital Globe Standard Imagery)

Astrid Gardens, Bestwood Estate (ESRI Imagery).
Just occasionally panels have very strong reflections as here.

Astrid Gardens, Bestwood Estate (Digital Globe Standard Imagery). 

Astrid Gardens, Bestwood Estate (Bing Imagery). 

Bilborough Estate (Digital Globe Standard Imagery)
Note the panel lower right which is pretty hard to pick out.
Britten Road, Basingstoke (Hampshire false-colour Infra-red imagery)
This estate has quite a few solar hot water installations but only a few solar PV. The hot water ones are smaller and less obviously modular.

Britten Road, Basingstoke (Hampshire visible spectrum RGB imagery)

So if there are any other takers I suggest this is a suitable topic for a future UK Quarterly Project, perhaps in association with Jez Nicholson's current interest in solar & wind farms.

Lastly I'd like to thank Jack Kelly & Dan Stowell for comments & ideas whilst mapping & writing this up.




Sunday, 14 October 2018

Creating MapMate Picture Files for Ireland

H34 Base Map for Map Mate Hillshaded
H34 West Donegal: Vice County raster map suitable for MapMate
Source data: (c) OpenStreetMap contributors; hill shading NASA SRTM via viewfinderpanoramas.com

One of the things I'd always planned to do once I had a half decent data set for the Irish Vice Counties was to create detailed raster maps for each of them. This is mainly because I've found using such raster maps useful for my own biological recording using MapMate software. They make choosing the correct recording location much easier, and reduce the hassle in producing more attractive (and communicative) outputs. I've detailed the principles behind the creation of such maps here in the past.

With the first such map I produced I struggled to get it to align properly with the Irish Grid displayed with MapMate. I could upload MapInfo .MIF files which would align, but these offered nothing like the degree of detail I want to show. Furthermore the import process with my copy of MapMate only seems to work with polygons. I tried a variety of ways, mainly trying to tweak the .TAB file format which I'd used successfully with files for Great Britain. So I put the idea to one side.

Very recently, prompted by Julia Nunn, VC recorder for County Down, I re-looked at the problem. Once again I was getting nowhere and I was on the verge of seeking expert help from Richard Cantwell on the arcana of MapInfo's formats, many of which date back to the 1980s. (Richard works professionally with MapInfo data a lot and has written some very informative articles available from his firm's website.)

Note. Much of this post will mainly be of interest to users of MapMate. However, towards the end I digress into geonerd territory when discussing the pros and cons of MapMate's technical choices about projections. Also please note that this has been sitting in my draft folder for a while before being published.


Saturday, 15 September 2018

A few oddities of OpenStreetMap history files

I've been working with the OSM history data for Great Britain for a while now. This has mainly involved removing bugs in the initial processing (yup, out-by-one errors mainly) and, as a side effect, working out how to improve the speed of extracting way geometries. En route I have noted a few quirks which may be helpful for others working with the data.

Some of these should be obvious, but I feel that they are worth stating nonetheless.


Sunday, 9 September 2018

Where the streets have no name: Coach Road Estate, Washington

Serlby Close, Usworth. The houses on the left are part of Coach Road Estate
Source: © Alex McGregor at Geograph CC-BY-SA-2.0
Microsoft, like a number of other large tech firms, now has teams working to improve OpenStreetMap. A couple of weeks ago they turned their attention to the UK, and asked about some roads apparently missing street names.

I checked a few of these examples using available Open Data sources, primarily those of the Ordnance Survey. For the most part, our existing mapping seems correct: there is no official street name. Many were places like caravan parks and industrial estates, but one really stood out. This was an area on the north side of Washington, County Durham called Coach Road Estate. As I investigated it soon became apparent that the estate is interesting for other reasons too.


Wednesday, 25 July 2018

Coda on shop completion rates on OSM

Thanks to John Baker (Rovastar) for a few suggestions discussing my recent blog post in the pub last night:

  • What do the graphs of numbers of unique shop tags look like with heavier filtering of relatively poorly used tags?
  • E-cigarette shops are a recent phenomenon, and should represent a genuinely novel tag rather than the mix of typos, synonyms etc which characterise much of the long tail of shop tags.

These were easy to follow up, so I present the graphs here:

Unique shop tags over time on OpenStreetMap for Great Britain,
filtered to remove tags with a restricted number of uses as at June 2017.
For virtually any level of filtering the curves level out around 2010-2011. Thus the core set of shop tags looks to be very stable. A good place to judge the extent of likely synonymy for shops in Britain is the Lua script used by SomeoneElse for his "Useful Maps".
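The kind of filtering behind these curves can be sketched as below. This is an illustration with hypothetical toy data and a hypothetical function name, not the processing actually used for the graphs: it counts distinct shop tag values cumulatively by year, keeping only values whose total number of uses at the end of the period reaches a threshold.

```python
from collections import Counter

# Hypothetical toy data: (year first mapped, shop tag value) for each shop.
shops = [
    (2008, "supermarket"), (2008, "bakery"), (2009, "supermarket"),
    (2009, "convenience"), (2010, "bakery"), (2010, "convenience"),
    (2011, "supermarkt"),    # typo, used once
    (2012, "convenience"), (2013, "e-cigarette"), (2014, "e-cigarette"),
    (2015, "bakery"), (2016, "corner_shop"),  # synonym, used once
]


def unique_tags_by_year(shops, min_uses=1):
    """Cumulative count of distinct tag values per year, keeping only values
    whose total number of uses over the whole period is at least min_uses."""
    totals = Counter(value for _, value in shops)
    kept = {value for value, n in totals.items() if n >= min_uses}
    seen, result = set(), {}
    for year, value in sorted(shops):
        if value in kept:
            seen.add(value)
        result[year] = len(seen)
    return result


unfiltered = unique_tags_by_year(shops)
filtered = unique_tags_by_year(shops, min_uses=2)
for year in unfiltered:
    print(year, unfiltered[year], filtered[year])
```

In the toy data the unfiltered count keeps creeping up as one-off typos and synonyms appear, while the filtered count only rises when a genuinely new value such as e-cigarette arrives.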

Growth of mapped e-cigarette shops in GB on OSM
As expected, e-cigarette shops first appeared rather late, at the end of 2013, and there are a decent number mapped (over 200 by mid-2017). I haven't checked, but I suspect the sharp increase in 2017 was caused by some tagging rationalisation. It's not unusual for new things to acquire a range of synonyms before tagging stabilises and one value becomes favoured. (It's equally true that in some cases this does not happen.)

I've had a couple of other requests which it will take rather longer to look at, but if you have ideas relating to shops in Great Britain I can look at the data right now.

Tuesday, 24 July 2018

Can we identify 'completeness' of OpenStreetMap features from the data?

At the Milan SotM conference Stefan Keller from the Geometalab at HSR (Rapperswil) will talk about recent work of his group on identifying "Areas of Interest" (AoI) from OpenStreetMap data. Stefan has been kind enough to involve me in some discussions about this work as it has progressed, but in this post I am solely concerned with a separate issue arising from the use of points of interest in this work.

Growth of shops mapped on OSM for selected Local Authorities
(See Analysis section below for commentary)


Areas of Interest were introduced on Google Maps back in 2016. Loosely they correspond to shopping, entertainment and cultural areas with large clusters of relevant points of interest. No doubt Google not only used map features, but also other sources of data such as location of Android phones to calculate the footprints for Areas of Interest (shown in a pale orange or salmon colour on Google Maps).

There are issues with the Google implementation, some discussed in this CityLab article from 2016. My own examination of Google Maps confirms that shopping areas which are otherwise equivalent in range and type of shops are chosen as AoI in wealthy areas, but not in poorer areas dominated by social housing. I also found some places, notably the UBS IT centre in Altstetten, Zurich, which have erroneously been identified as AoI by Google. The work of Geometalab is therefore interesting not just in terms of whether OSM data can be used to calculate similar areas, but also to provide suitable data where biases based on socioeconomic status can, at least, be identified and corrected because data and code are open.

Zurich, centre and Aussersihl districts, showing Areas of Interest.
Work of Geometalab, derived from OpenStreetMap data.
The starting point for this type of work relies on areas where POI mapping density is high and reasonably complete (for instance, the areas of Switzerland which Stefan's group have looked at, and areas of the English East Midlands and Germany which I have looked at both recently and in the past). Given that it is possible to calculate reasonable AoIs from OSM data where POI density is high, the question arises "Can we identify which areas are 'reasonably' complete?". Normally, this type of work has involved comparing OSM data to some external reference data which are assumed for the purposes of comparison to be complete (for instance Peter Reed's work on UK retail). However, in many parts of the world, and for many topic domains, there is no readily usable data for this purpose. So the ancillary clause for the question is ", and can we do this with OSM data alone?"

This post is a first look at the problem for one class of POIs: shops.