Senior Researcher in Data Science and Machine Learning
Selected projects (most recent first):
DIET – In collaboration with British Gas (energy provider), EDMI (smart meter manufacturer) and MDS (data service provider) and funded by Innovate UK, I was the lead data scientist on a project that aimed to detect energy theft from smart meter data using advanced machine learning techniques. The proprietary method for anomaly detection that I developed in this project has been filed through Oxford University Innovation as British patent application 1713896.7. (Tools employed: Matlab, R, RStudio, Git, Bitbucket, MongoDB, Mongo Atlas)
Strategic Blue KTP – I mentored the KTP associate in data science at Strategic Blue, teaching data science techniques in R and RStudio for deployment on shinyapps.io.
CLOUDWATCH – For this project, funded by the European Commission, I performed a detailed analysis of the relationships among the EC's portfolio of funded cloud computing enterprises using advanced unsupervised machine learning methods and the National Institute of Standards and Technology's defining characteristics of cloud computing. The results of the analysis were published in the Journal of Cloud Computing. (Tools employed: Matlab)
INFORM – In collaboration with the Global Canopy Program and the European Forest Institute, I developed a program to trace the impact of commercial supply chains on Amazonian deforestation. A paper detailing the methods has been submitted to PLOS ONE. (Tools employed: Matlab and a range of data discovery and manipulation tools)
LEFT – In collaboration with the University of Oxford's Zoology Department, I implemented the Local Ecological Footprint Tool, used to assess the impact of mining exploration and prospecting. This work was sponsored by Statoil and resulted in several publications. It was the prototype for an advanced global ecological impact assessment tool that is still used by environmental impact assessment professionals.
VIBRANT – In collaboration with the Natural History Museum, London, I led the data science and cloud computing initiatives to build a 'virtual laboratory' for taxonomic and biodiversity researchers. I was the principal developer and the research and development team manager.
I am primarily interested in unsupervised machine learning for high-dimensional time series data (such as Industrial Big Data or IIoT). I have developed and applied new techniques in fields ranging from species distribution modelling to system condition monitoring and anomaly detection. I usually work in R or Matlab, but also have experience in many other languages and platform tools. More generally, I am a data scientist; I research interesting problems and implement solutions.
Selection: Most Cited
C Yesson, PW Brewer, T Sutton, N Caithness, JS Pahwa, M Burgess, et al.
How global is the global biodiversity information facility?
PLoS One, 2 (11), e1124, 2007.
MP Robertson, N Caithness, MH Villet.
A PCA-based modelling technique for predicting environmental suitability for organisms from presence records.
Diversity and Distributions, 7 (1-2), 15-27, 2001.
A Hardisty, D Roberts, +75 co-authors.
A decadal view of biodiversity informatics: challenges and priorities.
BMC Ecology, 13 (16), 2013.
N Davies, D Field, +63 co-authors.
Sequencing data: A genomic network to monitor Earth.
Nature, 481, 145, 2012.
See https://scholar.google.com/citations?hl=en&user=T0b1MsoAAAAJ for complete list of publications.
(Journal articles 20, Citations 775, h-index 12, i10-index 17, updated February 2018)
Detection of Anomalous Systems, filed on 25 and 30 August 2017 by Oxford University Innovation as British patent applications 1713703.5 and 1713896.7.