Saturday, April 25, 2015

Unemployment of Europe in 2014 by NUTS 2 region

During the Christmas break I worked on some code to show unemployment by NUTS 2 region. At that point no 2014 data was available. When I noticed the 2014 data was available, I dug up the code and plotted again.

Data and Code

As mentioned, the code was written at the beginning of this year. At that point it seemed Eurostat's directories had all moved, and hence SmarterPoland was not used. Consequently, it is not used here either. However, using it seems to be much easier than doing the data extraction work myself.
The mapping is based on code found at Mapping Data from Eurostat using R. The main change I made was using as continuous a scale as possible. I wanted to see details in southern Germany, but also let the scale show details in the worst regions. It appears that not Greece but southern Spain has the worst unemployment. But since Spain also has some better regions, as a country it seems a bit better off than Greece. The colors, then, are a trade-off between 'red is bad, green is good' and an ordering where colors are unique. Hence orange sits between yellow and blue, to avoid yellow and blue mixing to deliver green again. I have the nagging feeling that it will be horrific for colorblind people.
In addition, I removed France's overseas parts and added a legend.

# create a new empty object called 'temp' in which to store a zip file
# containing boundary data
# temp <- tempfile(fileext = ".zip")
# now download the zip file from its location on the Eurostat website and
# put it into the temp object
# new Eurostat website
# old:
# new:

# download.file(
# "",temp)
# now unzip the boundary data
# unzip(temp)

library(rgdal)  # provides readOGR()
EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")
ToRemove <- EU_NUTS@data$STAT_LEVL!=2 | grepl('FR9',EU_NUTS@data$NUTS_ID)
EUN <- EU_NUTS[!ToRemove,]

## OGR data source with driver: ESRI Shapefile 
## Source: "./NUTS_2010_60M_SH/data", layer: "NUTS_RG_60M_2010"
## with 1920 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions

# ':' marks missing values in the Eurostat export; the header sits on line 11
myunempl <- read.csv('lfst_r_lfu3rt.csv',na.strings=':',skip=10)

#EU_NUTS <- spTransform(EU_NUTS, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"))

EUN@data = data.frame(EUN@data[,1:4], myunempl[
        match(EUN@data[, "NUTS_ID"],myunempl[, "GEO.TIME"]),   ])

# keep only regions that have a 2014 value
EUN <- EUN[!is.na(EUN$X2014),]
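
The join above relies on match() to line up rows of the unemployment table with the regions of the shapefile, in shapefile order. A minimal sketch of that idea on made-up data; the region codes and rates below are invented for illustration and are not Eurostat figures:

```r
# toy stand-in for EUN@data: region codes in shapefile order (invented)
regions <- data.frame(NUTS_ID = c("AA11", "BB22", "CC33"),
                      stringsAsFactors = FALSE)

# toy stand-in for myunempl: different order, one region absent (invented)
unempl <- data.frame(GEO.TIME = c("CC33", "AA11", "ZZ99"),
                     X2014    = c(12.5, 4.0, 7.1),
                     stringsAsFactors = FALSE)

# match() gives, for each region, the row of unempl holding its code,
# or NA when the code does not occur there
idx <- match(regions$NUTS_ID, unempl$GEO.TIME)

# indexing with idx reorders unempl to follow the regions; NA rows stay NA
joined <- data.frame(regions, unempl[idx, ])

# drop regions without a 2014 value so that only regions with data remain
joined <- joined[!is.na(joined$X2014), ]
print(joined)
```

Because the table is reordered to follow the regions, rather than the other way around, the polygons and their data rows stay aligned.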

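# continuous colour ramp; colorRamp() maps values in 0..1 to RGB
ramp <- colorRamp(c('Dark Blue','Light Blue','Purple','Red'))
# colour each region by its 2014 rate, scaled by the maximum (assumed scaling)
plot(EUN, col = rgb(ramp(EUN@data$X2014/max(EUN@data$X2014))/255),
     axes = FALSE, border = NA)

# 50 fill colours along the same ramp, for use as legend(fill = ...)
legend_fill <- rgb(ramp(seq(0,1,length.out=50))/255)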
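
To illustrate how a continuous scale of this kind works: colorRamp() returns a function that maps numbers between 0 and 1 to RGB values, so the rates have to be scaled first, here by their maximum. The sketch below uses invented rates and a hypothetical helper named rate_to_col(). The 50 legend colours are one possible way to build a legend, which is an assumption rather than the legend call used for the map.

```r
# colour ramp with the same anchor colours as the map code
ramp <- colorRamp(c("darkblue", "lightblue", "purple", "red"))

# hypothetical helper: scale rates to 0..1 and convert them to hex colours
rate_to_col <- function(rate, top = max(rate)) {
  rgb(ramp(rate / top) / 255)
}

rates <- c(2.5, 8, 16, 34)   # invented unemployment rates (%)
cols  <- rate_to_col(rates)  # one colour per region
print(cols)

# 50 evenly spaced colours from the low end to the high end of the ramp,
# suitable as the fill argument of legend()
legend_cols <- rgb(ramp(seq(0, 1, length.out = 50)) / 255)
```

Scaling by the maximum means the worst region always gets the final colour of the ramp, while low-unemployment regions still spread over the first part of it.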


  1. Thanks for sharing. I found this very interesting. I would find it easier to interpret the colors with a color scheme associated with discrete steps in the data, e.g. unemployment rate between 0 and 10% in green shades, between 10% and 20% in blue shades, etc., instead of the division into 4 shades you currently have. It looks like a straightforward tweak inside the rgb() function would do it. Will report back if I can tweak that in finite time. Thanks!

  2. Hi Wiekvoet,

    thanks for sharing your interesting code. I have been trying to run your code, but admittedly I am a junior to R. So if you could help me out, I would appreciate it.

    It hangs very early with the following:
    > EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")
    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
    Cannot open file

    The rgdal library was installed:

    > library(rgdal)
    Loading required package: sp
    rgdal: version: 0.9-2, (SVN revision 526)
    Geospatial Data Abstraction Library extensions to R successfully loaded
    Loaded GDAL runtime: GDAL 1.9.2, released 2012/10/08
    Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgdal/gdal
    Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
    Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgdal/proj
    Warning messages:
    1: package ‘rgdal’ was built under R version 3.1.3
    2: package ‘sp’ was built under R version 3.1.3

    I have R installed:
    R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
    Copyright (C) 2014 The R Foundation for Statistical Computing
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    Can you help, please, thanks.

    1. Sorry, to rule out the most obvious cause, I have also just upgraded to

      R version 3.1.3 (2015-03-09) -- "Smooth Sidewalk"
      Copyright (C) 2015 The R Foundation for Statistical Computing
      Platform: x86_64-apple-darwin10.8.0 (64-bit)

      but the error is still the same as above.

    2. Hi Sailor man,

      I think there are two problems. One is that I accidentally commented out a bit too much to avoid unnecessary downloading. The following three lines should not have been commented:

      temp <- tempfile(fileext = ".zip")
      download.file("", temp)
      unzip(temp)

      The other problem is that somehow an additional directory level is needed in that call to readOGR(). The why for that eludes me.

      EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")

      Does that help?

  3. Hi Wingfeet,
    yes, it helped and I was able to make progress, thanks a lot.
    Unfortunately, now I am stuck again because I do not know where you got the file lfst_r_lfu3rt.csv from.

    Please help me once again. I looked on the internet, but the file I found does not have the column titles set in line with the following code:

    > EUN@data = data.frame(EUN@data[,1:4], myunempl[
    + match(EUN@data[, "NUTS_ID"],myunempl[, "GEO.TIME"]), ])

    This is the exact point I am hanging again. Thanks again for helping thus far.

  4. Hi Wingfeet,
    I found the file on the Eurostat website. So thanks.

  5. Hi Wingfeet,
    it is me again, this time with real name, always intrigued by your nice visualisation.
    From the Eurostat website, it seems that the file lfst_r_lfu3rt.csv is available in many variants and I am not sure which is the right one.
    Can you please show me the head of your file lfst_r_lfu3rt.csv, so that I can make mine like yours? Thanks in advance.

    1. That is indeed sometimes difficult. The A1 cell reads: Unemployment rates by sex, age and NUTS 2 regions (%) [lfst_r_lfu3rt]

      There are 9 rows with miscellaneous information.
      Line 11 then, reads GEO/TIME 2010 2011 2012 2013 2014 for the first six columns.
      It is obvious that the latter columns contain the data. The GEO/TIME column contains codes for the region. It has to be codes, since that is what the shapefile has. My first row has EU28 for the whole EU. It then continues with EU27 etc.
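
As a sketch of how that layout is read: with the header on line 11, skip = 10 makes read.csv() take that line as the column names, and R's name checking turns GEO/TIME into GEO.TIME and 2014 into X2014. The file below is a made-up stand-in written to a temporary location; the comma separator, the region code XX11 and all values are assumptions for illustration, not Eurostat data.

```r
# made-up file with the layout described: 10 leading lines, then the
# header on line 11, with ':' marking a missing value
lines <- c(
  "Unemployment rates by sex, age and NUTS 2 regions (%) [lfst_r_lfu3rt]",
  rep("miscellaneous information", 8),
  "",
  "GEO/TIME,2010,2011,2012,2013,2014",
  "EU28,1.1,2.2,3.3,4.4,5.5",
  "XX11,5.0,:,4.5,4.2,4.0"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# skip the 10 leading lines and treat ':' as missing
demo <- read.csv(path, na.strings = ":", skip = 10)
print(names(demo))
print(demo)
```

If the column names come out differently, the match() on GEO.TIME and the use of X2014 in the post will not work, so checking names() first is a quick way to see whether a downloaded variant fits.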

  6. Many many thanks for your help, Wiekvoet. I managed now and am very happy to have made the map, too.