Across a five-day workshop in July 2018, the 138th European Study Group with Industry brought together mathematicians and industrialists to work side by side on the real and important problems that companies are facing today.
GW4 Seed Corn Funding enabled the free attendance of a non-profit organisation: NHS Digital.
The problem brought by NHS Digital focused on developing automatic classifiers of the dialogue between a member of the public calling an NHS helpline and the call handler. Clinicians report that when callers feel less anxious, they are more likely to listen to the advice given and to follow it. A simple method for classifying the distress level of a call as ‘good’ or ‘bad’ could be used to provide real-time feedback to the call handler.
This was one of the most popular problems among study group participants and we had a lively group working on it for the whole week.
Because of data protection requirements, actual NHS helpline phone calls were not available for analysis by study group participants. Instead, we were provided with transcripts from radio programmes consisting of heated political interviews (representing ‘bad’ calls) and relaxed conversational discussion (representing ‘good’ calls). Our initial data exploration revealed significant differences between these two categories of conversation.
We used the data provided to explore various ‘features’ that, applied to a conversation, might indicate which category it was in: number of words in a turn, duration of a turn, inter-speaker gap length, inter-word gap length, speaking rate (words/second), turn-taking rate (turns/minute), interruptive behaviour, and mimicry. Any of these features could be relevant when assessing real NHS call data.
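As a rough illustration of how several of these features could be computed, here is a minimal sketch in Python. It assumes each turn is available with a speaker label, start and end times in seconds, and the transcribed text. The `Turn` class, the function name, the feature names and the sample turns are all hypothetical, and treating a negative inter-speaker gap as an interruption is a simplification chosen for this example, not necessarily the definition used at the study group.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    """One uninterrupted stretch of speech by a single speaker (hypothetical format)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def conversation_features(turns):
    """Summarise a conversation (a time-ordered list of Turn objects) as a dict of features."""
    durations = [t.end - t.start for t in turns]
    word_counts = [len(t.text.split()) for t in turns]
    # Gap between the end of one speaker's turn and the start of the next speaker's turn.
    # A negative gap means the next speaker started before the previous one finished.
    gaps = [b.start - a.end for a, b in zip(turns, turns[1:]) if a.speaker != b.speaker]
    total_time = turns[-1].end - turns[0].start
    return {
        "mean_words_per_turn": mean(word_counts),
        "mean_turn_duration_s": mean(durations),
        "mean_inter_speaker_gap_s": mean(gaps) if gaps else 0.0,
        "speaking_rate_wps": sum(word_counts) / sum(durations),
        "turn_rate_per_min": 60.0 * len(turns) / total_time,
        "interruptions": sum(1 for g in gaps if g < 0),
    }


# Made-up example turns, for illustration only.
example_turns = [
    Turn("caller", 0.0, 4.0, "hello I need some help please"),
    Turn("handler", 4.5, 6.5, "of course what happened"),
    Turn("caller", 6.0, 10.0, "I fell over this morning"),
]
features = conversation_features(example_turns)
```

Inter-word gap length and mimicry are left out here because they would need word-level timings and a definition of mimicry that this post does not give.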
We were able to classify the conversations as ‘good’ or ‘bad’ using a number of approaches, each trained on one subset of the data and tested on a separate subset: a Bayesian updating scheme; a hidden Markov model; and a classifier based on the Kolmogorov-Smirnov (K-S) statistical test. In addition, we had success with a scorecard method that characterises the trajectory of a dialogue as a sequence of coordinates in the feature space. All the methods were generally able to distinguish between the ‘good’ and ‘bad’ conversations. The Bayesian method reached the correct classification noticeably faster than the hidden Markov method and the K-S test classifier. Whilst less rigorously tested, the scorecard method showed great promise, with classification achieved relatively quickly.
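To give a flavour of what a Bayesian updating scheme can look like, here is a minimal sketch in Python. It is not the study group's implementation: it assumes a single feature (for example, speaking rate per turn) that is normally distributed within each class, with class means and standard deviations estimated beforehand from training data. The function names and all numbers below are hypothetical.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def posterior_bad(observations, good_params, bad_params, prior_bad=0.5):
    """Return P(bad | observations so far) after each observation in turn.

    good_params and bad_params are (mean, standard deviation) pairs for the
    feature under each class; these are assumed to come from training data.
    """
    log_odds = math.log(prior_bad / (1.0 - prior_bad))
    history = []
    for x in observations:
        # Bayes' rule in log-odds form: add the log-likelihood ratio of the new observation.
        log_odds += gaussian_logpdf(x, *bad_params) - gaussian_logpdf(x, *good_params)
        history.append(1.0 / (1.0 + math.exp(-log_odds)))
    return history


# Made-up parameters and observations, for illustration only.
good_params = (2.0, 0.5)  # hypothetical speaking rate (words/second) in relaxed talk
bad_params = (3.0, 0.5)   # hypothetical speaking rate (words/second) in heated talk
history = posterior_bad([2.9, 3.2, 2.8], good_params, bad_params)
```

Each new turn nudges the running probability up or down, so a call could be flagged as soon as the probability crosses a chosen threshold rather than waiting for the conversation to end. That is one way a classifier of this kind could support real-time feedback.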
The methods outlined here, including the determination of useful features, now need to be repeated on real NHS call data that has been classified according to call outcome, where the differences between call types may be more subtle. These findings demonstrate great potential for the development of early interventions to influence the outcome of a dialogue.
Because this was such a popular problem to work on, our contact at NHS Digital got to meet a number of people who were able to help with other problems. For example, they are currently in discussions with one participant about potentially working on a resource planning project for NHS trusts.
Blog written by Dr Lorna Wilson, Institute for Mathematical Innovation, University of Bath.
The European Study Group was held at the University of Bath, organised by the Institute for Mathematical Innovation (IMI) in collaboration with the Engineering Mathematics Department at the Faculty of Engineering at the University of Bristol. The event was sponsored by IMI, GW4 and the Jean Golding Institute at the University of Bristol.