30th April 2019

The openair project

openair started about 10 years ago to provide open-source, dedicated data analysis tools for the air quality community

  • Widely used internationally by academia, governmental bodies and the private sector
  • Downloaded > 150,000 times
  • Coded in R — a programming language developed for data analysis and statistics
  • Lots of functions for air quality analysis, including directional analyses, robust trends, clustering of polar plots, back trajectories, easy access to AURN, SAQN, WAQN, KCL networks, …

Approach to analysis

  • In some ways it is useful to think in terms of a crime scene investigation
    • Perhaps it is a crime scene — air quality limits set in law exceeded, naughty vehicle manufacturers and lots of early deaths!
  • Thinking more in terms of forensic science
    • Forensic scientists collect, preserve, and analyse scientific evidence during the course of an investigation
    • A whodunnit for air quality scientists — Mrs White in the library … or VW on the Old Kent Road
  • Tricky for air quality because the ‘suspects’ do not have unique fingerprints or DNA
    • Although particle composition might be a close analogy …

Asking questions of your data

General questions:

  • What is it you really want to know?
  • Are the measurements dominated by specific source(s)?
  • What changes occur over time (trends)?
  • How does site ‘X’ compare with other sites?
  • Is there evidence that the measurements are affected by street canyons?

Harder questions:

  • What is the contribution from road traffic emissions?
  • How much are concentrations affected by non-local sources?
  • When will site ‘X’ meet an air quality limit?

Conditional analysis is useful

  • Don’t need to analyse the full data set
  • Filter data for conditions of interest
    • Focus on weekdays, and certain hours e.g. 7 am to 7 pm for traffic data
    • Certain months of the year
    • Other data e.g. vehicle flows
    • Geographic areas
    • By wind direction and wind speed
    • Trends — don’t have to consider all data if question is about recent changes
  • Trade-off: more filtering ⇒ less data
    • Considering uncertainties can be useful
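
The trade-off between filtering and data volume can be sketched in a few lines of base R. This is only an illustration on synthetic hourly data (the values, the column names and the 180 to 270 degree wind sector are assumptions made for the example, not taken from any real site); openair's selectByDate function offers a more convenient route for date-based filters on real data.

```r
# Synthetic hourly data for one year (illustrative values only)
set.seed(1)
dates <- seq(as.POSIXct("2018-01-01 00:00", tz = "UTC"),
             as.POSIXct("2018-12-31 23:00", tz = "UTC"), by = "hour")
aq <- data.frame(
  date = dates,
  nox = rlnorm(length(dates), meanlog = 4, sdlog = 0.5),
  wd = runif(length(dates), 0, 360)
)

# Helper columns derived from the timestamp
aq$hour <- as.integer(format(aq$date, "%H"))
aq$dow <- as.integer(format(aq$date, "%u"))  # 1 = Monday ... 7 = Sunday

# Successive filters: weekdays, then 7 am to 7 pm, then a wind sector
weekday <- aq[aq$dow <= 5, ]
daytime <- weekday[weekday$hour >= 7 & weekday$hour < 19, ]
sector <- daytime[daytime$wd >= 180 & daytime$wd < 270, ]

# Hours remaining, mean and a naive standard error at each stage.
# Real hourly data are autocorrelated, so this standard error would
# understate the true uncertainty.
subsets <- list(all = aq, weekday = weekday, daytime = daytime, sector = sector)
summary_tab <- data.frame(
  subset = names(subsets),
  n = sapply(subsets, nrow),
  mean_nox = sapply(subsets, function(d) mean(d$nox)),
  se_nox = sapply(subsets, function(d) sd(d$nox) / sqrt(nrow(d)))
)
print(summary_tab)
```

Each extra condition shrinks the sample and widens the uncertainty on the mean, which is why it helps to report both together.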

Example of Oxford roadside site

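library(openair)

# import hourly data for the Oxford roadside AURN site, 2010 to 2018
oxford <- importAURN(site = "ox", year = 2010:2018)

# bivariate polar plot of NOx by wind speed and direction
polarPlot(oxford, pollutant = "nox", cols = "inferno")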

Cluster analysis

  • Can be tricky to filter data by ‘eyeballing’
  • Use clustering instead

results <- polarCluster(
  oxford, pollutant = "nox",
  n.clusters = 4, cols = "Set1"
)
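
To give a feel for what clustering in wind speed and direction space does, here is a rough base R sketch. It is not polarCluster's implementation: it assumes a k-means grouping on standardised wind components and concentration, applied directly to synthetic hourly points rather than to a smoothed polar plot surface. The two sources, their strengths and all variable names are made up for the example.

```r
set.seed(123)
n <- 2000
wd <- runif(n, 0, 360)     # wind direction, degrees from north
ws <- runif(n, 0.5, 10)    # wind speed, m/s

# Two hypothetical sources on top of a constant background:
# a local source strongest at low wind speed from any direction, and
# a distant source seen only for south-westerly winds at high wind speed
background <- 20
local <- 80 * exp(-ws / 2)
distant <- ifelse(wd > 200 & wd < 250 & ws > 5, 60, 0)
nox <- background + local + distant + rnorm(n, sd = 5)

# Wind components, so that 0 and 360 degrees sit next to each other
u <- ws * sin(pi * wd / 180)
v <- ws * cos(pi * wd / 180)

# k-means on standardised features (assumed stand-in for the real method)
features <- scale(cbind(u, v, nox))
km <- kmeans(features, centers = 3, nstart = 20, iter.max = 50)
cluster <- factor(km$cluster)

# Mean wind speed and concentration per cluster, to help interpret them
summary_by_cluster <- aggregate(
  cbind(ws, nox) ~ cluster,
  data = data.frame(cluster, ws, nox),
  FUN = mean
)
print(summary_by_cluster)
```

Once each hour carries a cluster label, the data can be split by cluster and analysed separately, for example for temporal variation or trends.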

Temporal variations by cluster

Trends by cluster

Closer look at Cluster 2 (road source)

  • Some evidence that concentrations have levelled off recently
  • Smooth trends are useful in this respect — not all data should have a straight-line trend calculated
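
A small base R sketch shows why a straight line can mislead when concentrations level off. The monthly series below is synthetic (a steady decline for six years, then flat), and lm and loess are used as simple stand-ins; openair has its own TheilSen and smoothTrend functions for real data.

```r
set.seed(7)
month <- 1:108   # nine years of monthly means

# Hypothetical series: steady decline for six years, then no further change
true_level <- ifelse(month <= 72, 100 - 0.6 * month, 100 - 0.6 * 72)
trend <- data.frame(month, nox = true_level + rnorm(length(month), sd = 3))

# Straight-line fit versus a smooth (loess) fit
linear <- lm(nox ~ month, data = trend)
smooth <- loess(nox ~ month, data = trend, span = 0.5)
trend$smooth_fit <- fitted(smooth)

# Slope (per month) over the whole record from the straight line, and
# over the final two years from the smooth fit
recent <- trend$month > 84
slope_linear <- coef(linear)[["month"]]
slope_smooth <- coef(lm(smooth_fit ~ month, data = trend[recent, ]))[["month"]]

# Residual sum of squares for each fit
rss_linear <- sum(residuals(linear)^2)
rss_smooth <- sum(residuals(smooth)^2)

print(c(slope_linear = slope_linear, slope_smooth = slope_smooth))
print(c(rss_linear = rss_linear, rss_smooth = rss_smooth))
```

The straight line reports a continuing decline across the whole record, while the smooth fit can show that little has changed in recent years.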

Extensions to openair

  • openair itself has not changed hugely over the past few years
  • Preference is to have other focused packages that deal with particular things
  • worldmet for accessing meteorological data
  • openairmaps for polar plots on interactive maps (still a bit developmental), see https://github.com/davidcarslaw/openairmaps
  • Polly’s aqtrends package you have already heard about
  • Stuart Grange’s rmweather package to remove the meteorological variation from air quality data, see https://github.com/skgrange/rmweather

Access to met data

  • For data since about 2010, importAURN provides wind speed and direction from a numerical model (WRF)
  • importKCL with option met = TRUE gives a London ‘mean’ value based on surface measurements
  • R packages exist to access data from the likes of NOAA’s Integrated Surface Database (ISD)
  • Typically two 10-minute periods in the hour, so not full hourly

library(worldmet)

# search for the 100 nearest met sites based on lat/lon
getMeta(lat = 51.5, lon = -0.5, n = 100)

Better access to met data

Data from Heathrow in 2019

# no site code is given, so the package default station is used (Heathrow here)
met <- importNOAA(year = 2019)
met
## # A tibble: 2,710 x 23
##    date                usaf  wban  code  station   lat    lon  elev    wd    ws ceil_hgt visibility
##    <dttm>              <chr> <chr> <chr> <chr>   <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>      <dbl>
##  1 2019-01-01 00:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  284.  3.27     657.      22000
##  2 2019-01-01 01:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  287.  3.43     667.      28000
##  3 2019-01-01 02:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  290   4.1      728       45000
##  4 2019-01-01 03:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  290   3.43     768       45000
##  5 2019-01-01 04:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  276.  3.27     778.      35000
##  6 2019-01-01 05:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  267.  2.77   14917.      35000
##  7 2019-01-01 06:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  265.  2.77   11503       28000
##  8 2019-01-01 07:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  273.  3.6      787.      23000
##  9 2019-01-01 08:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  274.  3.1      627.      30000
## 10 2019-01-01 09:00:00 0377… 99999 0377… HEATHR…  51.5 -0.457    25  275.  2.93     718       26000
## # … with 2,700 more rows, and 11 more variables: air_temp <dbl>, dew_point <dbl>, atmos_pres <dbl>,
## #   RH <dbl>, cl_1 <dbl>, cl_1_height <dbl>, cl_2 <dbl>, cl_2_height <dbl>, cl_3 <dbl>,
## #   cl_3_height <dbl>, cl <dbl>

Polar plots on an interactive map

  • polarPlot(s) on maps are useful for triangulating sources — especially in an industrial emissions context
  • Evidence of a street canyon?
  • Tough test of air quality models

library(openairmaps)

polarMap(polar_data, 
         latitude = "latitude", longitude = "longitude",
         type = "site", 
         provider = "Stamen.Toner", cols = "inferno")

Dominant meteorology

  • Much of the time we are happy to analyse data as we find it i.e. strongly affected by the weather
  • However, meteorology can mask or emphasise things we are interested in and can be misleading
    • Trends affected more by the weather than by changes in emissions
    • Detecting change: is it due to an intervention that affects emissions or to the weather?
    • Many sophisticated change-point methods but the elephant in the room is often meteorology
  • Things would be much easier if the weather was the same every day!
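
The idea of making the weather "the same every day" can be sketched as follows. This is a deliberately simplified illustration: rmweather uses random forest models, whereas the sketch swaps in an ordinary linear model on synthetic daily data, and the step change, wind speeds, coefficients and the normalise helper are all hypothetical. The mechanism shown is the resampling one: predict each day many times with the weather drawn at random from the whole record, then average.

```r
set.seed(42)
n <- 730
day <- seq_len(n)

# Hypothetical intervention: emissions-driven level drops by 20% after year 1
level <- ifelse(day <= 365, 50, 40)

# The second year is calmer, which pushes concentrations back up
ws <- pmax(0.5, rnorm(n, mean = ifelse(day <= 365, 5, 3.5), sd = 1.5))
conc <- level * exp(-0.15 * (ws - 4)) * exp(rnorm(n, sd = 0.1))

dat <- data.frame(
  day, ws, conc,
  period = factor(ifelse(day <= 365, "before", "after"),
                  levels = c("before", "after"))
)

# Raw period means: the weather largely hides the change
raw <- tapply(dat$conc, dat$period, mean)

# Simple model with a time term (period) and a weather term (ws);
# a linear model stands in for the random forest used by rmweather
fit <- lm(log(conc) ~ period + ws, data = dat)

# Hypothetical helper: average predictions over randomly resampled weather
normalise <- function(fit, dat, n_samples = 200) {
  preds <- replicate(n_samples, {
    shuffled <- dat
    shuffled$ws <- sample(dat$ws, nrow(dat), replace = TRUE)
    exp(predict(fit, newdata = shuffled))
  })
  rowMeans(preds)
}

dat$normalised <- normalise(fit, dat)
norm <- tapply(dat$normalised, dat$period, mean)

raw_ratio <- raw[["after"]] / raw[["before"]]
norm_ratio <- norm[["after"]] / norm[["before"]]
print(c(raw_ratio = raw_ratio, norm_ratio = norm_ratio))
```

In this made-up case the raw means barely differ between the two years, while the normalised series recovers a drop close to the 20% built into the data.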

Example from the port of Dover

  • Useful example: SO2 concentrations that can be related to changes in fuel sulphur content

Effects of meteorology ‘removed’

Concluding remarks

  • openair continues to develop but there has been a move towards producing smaller, more focused R packages
  • No silver bullet for data analysis
    • Tools provide a rapid way of exploring data and asking questions of it
    • Guided by domain knowledge
    • Useful to learn from what others do
  • Attend a training course to learn more!