John Boughton (Municipal Dreams) was recently looking for streets named after Christopher Addison, a pioneer of post-WWI housing legislation in Britain. It was easy to find all the roads with Addison in the name from OpenStreetMap, but much harder to spot those likely to be named after him rather than other Addisons.
|Merseyside, NW Cheshire & SW Lancs, showing areas of social housing.|
These are concave hull polygons derived from clusters of NROSH postcodes.
To reduce the number of roads to be searched, one would ideally have information about when the buildings were built, and whether they were built to provide social (council) housing or not. There is limited open data on the overall age of British housing stock, but no direct information on the original developer of housing. Both are things which may ultimately be of interest to add to OSM, but it will be many years before such information has any utility on a national scale. Furthermore, both are hard to check on the ground, at least for the typical mapper.
It occurred to me that one national open data set, that of the National Register of Social Housing (hereafter NROSH), could be useful. This stopped being maintained in 2013, but provides addresses for millions of houses (approx 4 million in 350k postcodes) as of that time. Given that, since then, very few new homes have been added to social housing stock, and many have been removed, this can identify likely areas of social housing.
The NROSH data therefore seemed a good place to get to grips with clustering in PostGIS, particularly as I had a specific objective in mind.
Clustering NROSH Data
Normally one sees clustering as a means of reducing clutter on webmaps, but it's only relatively recently that I realised that these techniques have great potential for performing various generalisations on detailed geographic data (particularly OSM, which tends to the detail rather than the general).
NROSH data is only geocoded at the postcode level. There may be tens of addresses at an individual postcode or just one. At the outset I treated all postcodes equally, ignoring the number of addresses, as I was mainly concerned to aggregate them into coherent clusters. I grabbed some code from a GIS StackOverflow question & tweaked it very lightly:
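SELECT row_number() over () AS id,
       gc AS geom_collection,
       ST_Centroid(gc) AS centroid,
       ST_MinimumBoundingCircle(gc) AS circle,
       -- radius recovered from the circle's area (area = pi * r^2)
       sqrt(ST_Area(ST_MinimumBoundingCircle(gc)) / pi()) AS radius
FROM (
    -- ST_ClusterWithin returns an array of geometry collections; unnest gives one row per cluster
    SELECT unnest(ST_ClusterWithin(geom, 100)) gc
    FROM nrosh_postcodes  -- placeholder table name; geom assumed to be in a metre-based projection
) f;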
To my mind ST_ClusterWithin is still rather like magic. It groups individual postcodes which are within (in the example) 100 metres of each other. It returns all the clusters in an array, so this needs to be unnested to get each cluster. It is an aggregate function, so it can be combined with GROUP BY on other columns to cluster within each group (for instance local authority might have been a useful one if I'd included it in the imported data).
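To take some of the magic out of it, here is a minimal plain-Python sketch of distance-based grouping of the kind described above: any two points no more than a given distance apart end up in the same cluster, and clusters grow by chaining. This is an illustration only, not how PostGIS implements ST_ClusterWithin; the function name `cluster_within` and the coordinates are made up for the example.

```python
from math import hypot

def cluster_within(points, distance):
    """Group points so that each point is within `distance` of at least one
    other point in its cluster (single linkage), using union-find."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise comparison: fine for a sketch, slow for 350k postcodes.
    for i, (x1, y1) in enumerate(points):
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            if hypot(x2 - x1, y2 - y1) <= distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return list(clusters.values())

# Hypothetical postcode centroids in metres (made-up coordinates).
postcodes = [(0, 0), (80, 0), (160, 0), (1000, 1000), (1050, 1000), (5000, 5000)]
clusters = cluster_within(postcodes, 100)
```

Note that (0, 0) and (160, 0) land in the same cluster even though they are more than 100 m apart, because (80, 0) links them. This chaining is probably part of why densely packed postcodes can merge into very large clusters.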
I initially experimented with NG8 postcodes: this area of Nottingham (see my last post) has many council estates built between the early 1920s and the 1970s (see Municipal Dreams blog for details). Trying various distances for clustering, I found 150 m worked pretty well. In London, and possibly other large cities with many postcodes on a road, this was too high.
The cluster itself is a geometry collection of the original points. It is therefore trivial to calculate a hull for the collection. Fortunately these days ST_ConcaveHull does not break with target percents of less than 99%, and it produced sensible results.
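As a rough illustration of turning a cluster of points into a polygon, the sketch below computes a convex hull with Andrew's monotone chain. To be clear, this is the convex hull, not the concave hull used for the maps here: it is the simpler limiting case (per the PostGIS documentation, what ST_ConcaveHull gives when the target is 1), and a concave hull would pull the boundary further in towards the points. The sample coordinates are made up.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns left (counter-clockwise).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def polygon_area(ring):
    """Shoelace formula for a simple polygon given as a list of vertices."""
    n = len(ring)
    total = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(total) / 2

# Hypothetical cluster of postcode points in metres (made-up coordinates).
cluster = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50), (20, 80)]
hull = convex_hull(cluster)
hull_area = polygon_area(hull)
```

The two interior points are dropped and only the four corners remain, so the hull covers the full 100 m square.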
|Odd-shaped polygons for Irby on the Wirral. Individual postcodes only have a few NROSH entries. Presumably both roads around the school were built as council housing, but most have now become privately owned.|
I extended the code to the entire data set. I soon realised that it was excluding areas of social housing sharing a single postcode. As there are some interesting examples of rural council housing I wanted them in the overall data set too.
My solution was simple: instead of using points I buffered them by 10 metres. This simply ensures that no data gets thrown away in subsequent steps. It does not mean that one gets a very accurate polygon when there are very few postcodes in the cluster (fewer than 5 perhaps). If actual geocoded addresses are available then it will be possible to produce more accurate polygons. I haven't tested this, but it should be possible for any local authority where a decent number of addresses are mapped on OSM. In my local patch there are several areas of Nottingham, Gedling, Broxtowe and Erewash which meet this requirement.
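A small sketch of why buffering helps: a lone point has no area, so any hull built from it is degenerate, whereas a 10 m buffer turns it into a small polygon that survives later steps. The code approximates the buffer as a regular polygon. The 32 segments are an assumption, chosen to mirror what I understand to be the ST_Buffer default of 8 segments per quarter circle, and the coordinates are made up.

```python
from math import cos, sin, pi

def buffer_point(x, y, radius, segments=32):
    """Approximate a circular buffer around a point as a closed polygon ring."""
    ring = [(x + radius * cos(2 * pi * k / segments),
             y + radius * sin(2 * pi * k / segments))
            for k in range(segments)]
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring

def ring_area(ring):
    """Shoelace area of a closed ring (first vertex repeated at the end)."""
    total = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2

# A single isolated postcode, with a made-up easting/northing in metres.
lone_postcode = (457000.0, 340000.0)
patch = buffer_point(*lone_postcode, radius=10)
patch_area = ring_area(patch)
```

The resulting area is a little under that of a true 10 m circle (about 314 square metres), which is small enough not to suggest false precision about where the houses actually are.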
Overview of the Resultant Data
Throughout the Midlands, North-west England and parts of East Anglia the data looks pretty sensible. In general I've looked at places I know and checked that the edges of the polygons accord with what I know of housing patterns in those areas. For now I've tried to put a sample area for Notts & Derbys up on umap.
|Hampton, Hanworth & Teddington area of SW London.|
In general this is a pretty prosperous part of London. The clustering does pick out some areas of social housing (e.g., near Apex Corner (A312/A junction)), but the notion that most of Teddington is social housing is absurd. The 2015 Index of Multiple Deprivation gives a better picture of this area. See additional notes at the foot of the blog.
|NW1 postcode. 150 m clusters with 100 m clusters overlaid.|
Reducing the cluster distance to 100 m greatly improves the elimination of false positives. Places like the Ossulston Estate between Euston & the British Library show clearly. I'm less convinced that this does not result in many false negatives.
I have not scrutinised everywhere, but a few obvious oddities I've noticed:
- Sheffield & Redditch seem to be data deficient.
- Areas in London are far too large to be usable, and even reducing the clustering distance does not make a massive difference (see images & commentary in the captions above).
The data is quite large so I haven't yet been able to publish it somewhere readily accessible. In the meantime I can share it in various geoformats if you are interested. There's also scope to use IMD & the housing age stats to separate things out a bit more, but I'm just as interested in places which are now predominantly privately owned, but were built as social housing.
I hope this data can be used for various things. In particular, I have long been interested in the possibility of finding Radburn layouts using OSM data. (Ian Waites has more on these estates.) A reduction in the total area to search is always valuable. I'm sure other uses will occur to both social historians and mappers. On the technical side I hope this might also provoke others to explore the potential for clustering in PostGIS: there's a lot to learn.
Further Notes on Hanworth, Hampton & Teddington
I looked at this area because I lived in three places here during the 1980s and 1990s.
Hanworth, the area in the London Borough of Hounslow, had a lot of social housing, especially north of the Great Chertsey Road. South of the road, housing was extremely mixed, with small private speculative developments, older properties, infill, fields with horses grazing and so on. I bought a house in this area in 1986 which was built in the 1960s. Today this property appears in the NROSH data, so it has moved from the private to the social sector in the past 25 years. We bought it because the location was convenient (I used public transport and caught the bus at Apex Corner) and the house had been extended and was larger than equivalent properties we had looked at. It was sold in the early 1990s to a family of South Asian heritage, who probably bought it for similar reasons.
Further south is the Hampton Nurserylands Estate. In the 1980s this was full of young professionals, many with young children. However, it changed demographically rather quickly. Many of the original buyers moved out to bigger houses within 2-3 years, and were replaced by older, less prosperous families. I remember looking at a flat here in 1993 and being staggered at how much the area had changed in 3-4 years. The houses on the west side of Oak Avenue were social housing in the 1980s. Clearly these changes have continued.
I can't really explain how Teddington has so many social housing postcodes. It is really one of the most prosperous places in Britain.