Understanding and improving the reliability of disease monitoring in GP surgeries is the extensive, yet critical task taken on by the team of researchers at the Royal College of General Practitioners (RCGP) and the University of Surrey, headed by Professor Simon de Lusignan and Dr Mark Joy. With this goal in mind, the team challenged participants of the first Turing Network Data Study Group to develop a predictive machine learning algorithm that corrects sub-optimal data, allowing for better disease monitoring.

In part two of our blog series focusing on a Challenge Owner’s perspective of the Turing Network Data Study Group, Professor de Lusignan and his team tell us about their experience of the DSG and the challenge they presented: improving our ability to use routine data to inform the management of key disease areas. You can read part one of this series, where we spoke to another Challenge Owner, the University of Bristol’s Danielle Paul, about her experience of the event on the JGI blog.

Challenge Owner team – who was involved?
- Prof Simon de Lusignan, University of Oxford/University of Surrey/Royal College of General Practitioners
- Dr Mark Joy, University of Surrey
- Rachel Byford, University of Oxford/University of Surrey
- Dr John Williams, University of Oxford/University of Surrey
- Dr Nadia Smith, University of Surrey/National Physical Laboratory
Can you give us a brief overview of the challenge you presented to the Data Study Group participants?

It is essential to monitor blood pressure in various chronic diseases (e.g. heart disease, diabetes). However, GPs’ recorded measurements tend to exhibit certain biases, for example a preference for round numbers. We have 47 million blood pressure readings and 7 million glycated haemoglobin (HbA1c) readings (a measure of diabetes control), and we were interested in recovering the true blood pressure and HbA1c trends from the inaccurate data, comparing trends for different groups of patients (e.g. on various medications). Participants were challenged to develop a predictive machine learning algorithm that corrects sub-optimal data, allowing for better disease monitoring. The challenge ended up being split into three sub-challenges:
- Identifying whether a case is new (incident) or a follow-up (prevalent) when this information is not recorded in the computerised medical record
- What is the true underlying blood pressure (BP) in a population where there is marked end-digit preference for zero, when data are recorded?
- What is the trend in diabetes control when there is additional testing at the time of ill health?
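The end-digit preference described in the second sub-challenge can be illustrated with a small, self-contained sketch (using made-up readings, not the RCGP dataset): count the terminal digits of systolic readings and compare the share of zeros with the roughly 10% expected if end-digits were uniform.

```python
from collections import Counter

def end_digit_shares(readings):
    """Share of each terminal digit (0-9) in a list of integer BP readings."""
    counts = Counter(r % 10 for r in readings)
    n = len(readings)
    return {d: counts.get(d, 0) / n for d in range(10)}

# Hypothetical readings: a mix of precise values and values rounded to the
# nearest 10, mimicking the zero end-digit preference described above.
readings = [118, 120, 120, 132, 130, 140, 140, 121, 137, 140,
            150, 128, 130, 130, 119, 140, 126, 130, 120, 150]

shares = end_digit_shares(readings)
excess_zero = shares[0] - 0.10  # excess over the ~10% expected under uniform digits
print(f"share of zero end-digits: {shares[0]:.2f} (excess: {excess_zero:+.2f})")
```

Real analyses would test this excess formally (and, as in the challenge, go on to infer the underlying distribution), but even this simple count makes the rounding bias visible.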
What kind of solutions did the challenge team come up with?

The solutions suggested for the three sub-challenges were as follows:
- Tree classifiers, as this is essentially a binary classification problem (is a GP visit a follow-up or a new, incident visit?): decision trees and random forests to classify episodes as new or ongoing, plus data-driven approaches to finding a threshold and min–max range for the number of days between two episodes, per disease
- Latent variable and time series approaches
- A Bayesian-type approach with an iterative procedure for uncovering the posterior (incorporating neural network classifiers for patient characteristics)
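The data-driven threshold idea in the first bullet can be sketched as follows. This is a minimal, hypothetical version: a single-feature decision stump that learns a cut-off on the gap in days since the previous visit for the same disease, whereas the DSG team used full decision trees and random forests on richer features.

```python
def best_gap_threshold(gaps, labels):
    """Find the gap-in-days cut-off best separating follow-up (1) from new (0) visits."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(gaps)):
        # Classify a visit as a follow-up if the gap since the last visit is short.
        preds = [1 if g <= t else 0 for g in gaps]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical labelled data: follow-up visits tend to have short inter-visit gaps.
gaps   = [3, 7, 14, 21, 28, 90, 120, 180, 365, 400]
labels = [1, 1, 1,  1,  1,  0,  0,   0,   0,   0]   # 1 = follow-up, 0 = new episode

threshold, accuracy = best_gap_threshold(gaps, labels)
print(f"learned threshold: {threshold} days (training accuracy: {accuracy:.2f})")
```

On this toy data the stump learns a 28-day cut-off; in practice the threshold and min–max range would be learned per disease, as the bullet describes.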
What are your hopes for the potential applications of the team’s findings from this week?
We have two members of the group interested in carrying on this work. We hope to explore the team’s approach to Sub-challenge 1 further, as we feel this is a promising area. The team’s contribution to Sub-challenge 2 is already planned to be incorporated into the RCGP report to Public Health England; it increases the scope and applicability of this report on the nation’s health in certain key disease areas. Sub-challenge 3 was arguably the most difficult, and the team’s feedback has led us to reconsider how we engineer our data to better address this prediction problem.

As a Challenge Owner, what was your favourite part of the Data Study Group week?

New perspectives, and the opportunity to make more use of our data. We enjoyed engaging with the enthusiasm and energy of the team. Our favourite part was listening to the presentations at the end of the week.

Were there any surprises for you at the event?

How narrow population health and epidemiological techniques are compared with the wealth of ideas and approaches available.

Is there anything else you would like to tell us?

Two members of the group have been in contact about continuing this work: one to work on “episode types”, the other on end-digit preference in blood pressure recording. The event was immensely enjoyable, truly challenging for the team members, and a joy to participate in.
The Alan Turing Institute and Data Study Groups
The inaugural Turing Network Data Study Group was hosted in August 2019 by the Jean Golding Institute at the University of Bristol – one of The Alan Turing Institute’s 13 partner universities. The event united six Challenge Owners with 50 students, postdocs and senior academics to tackle real-world data science challenges spanning a variety of fields, from spectroscopy and analytical chemistry to text mining and digital humanities.

Building on the popular Data Study Groups (DSGs), held three times a year at Turing HQ in London, this ‘Turing Network’ event was the first of its kind to be hosted by a partner university. It followed the tried-and-tested format of a five-day collaborative hackathon: the Challenge Owners – organisations from industry, government and the third sector – provided real-world data challenges that were tackled by small groups of highly talented researchers, with results presented on the final day.

Find out more about Data Study Groups, including how you can get involved as a researcher or Challenge Owner, on The Alan Turing Institute website.