Development through Data Science for Somaliland

Somaliland declared independence from Somalia in 1991 and has been autonomous ever since, but it remains internationally unrecognised. It is a democracy with some of the highest levels of poverty in the world. Somaliland’s government and industry have recognised that ICT is a great opportunity for sustainable economic development. Somaliland has all the key attributes needed to develop this sector bar one: local, skilled workers. Both the government and the business community have identified this shortage as a key issue.

Prof Velthuis and Dr De Sio from the UoB particle physics group travelled to Hargeisa to start a project with the University of Hargeisa, UoB’s strategic partner Transparency Solutions and local industry partners including Telesom and NGOs like Candlelight and ADAM Academy to boost capability in data analysis and data intensive research. More information on the project can be found here.

The Transparency Solutions team with Dr De Sio and Prof Velthuis.

Data Science Course

Dr De Sio and Prof Velthuis delivered a course on Python programming and Machine Learning to students from universities, NGOs and industry. The course covered Python programming from the basics to experienced-user level and provided an introduction to Machine Learning techniques. After an initial in-person lecture at the University of Hargeisa, the course was taught in 15 online lectures and assessed by two assignments.

Dr De Sio at the University of Hargeisa, presenting her work on leaf position reconstruction in upstream radiotherapy verification using r-UNet, a deep-learning-based architecture.

Applications talks

The course lectures were accompanied by 10 talks from experts working in the field, showing applications of data science.

  • Mr Mark Gibbons – How we use data at the Centre for Sustainable Energy
  • Dr Richard Hugtenburg – Monte Carlo in medical physics
  • Dr Jaya Chakrabarti – Project VANA: AI, data and the environment
  • Dr Daniel Saunders – Data Science Applications: from Physics, to Gaming, to Crime Fighting
  • Dr Christian Thomay – Data in the Cockpit: Enhancing Pilot Training with Eye Tracking and Data-driven Feedback
  • Dr Valerio Maggio – Teaching Machines to Recognise Emotion. An interactive online system to showcase human-in-the-loop AI
  • Dr Sudan Paramesvaran – The data challenge at CMS
  • Mr Dan Laasna Reuter – Why data is sexy
  • Dr Chris Lucas – AI in Practice
  • Dr Leonor Frazão – Using AI to Fight Financial Crime

Assessment

The course was examined through two assignments. Successful completion of the first assignment was required to pass the course. This required analysis of data obtained with the HiSPARC detector network. The second, more advanced, assignment required analysis of the number of phone operators required in a call centre or analysis of travel times in a lift. Successful completion of the second assignment was required to pass the course at the advanced level.

The candidates who successfully completed the course were:

  • Abdiaziz Harun Mohamed
  • Abdillahi Sh. Ahmed Farah
  • Yasin Nour Moge
  • Salah Mohamed Osmaan
  • Khalid Ahmed

Khalid Ahmed

The candidates who successfully completed the course at the advanced level were:

  • Siraaj Jama Abdilahi
  • Abdirahman Hussein Mahad
  • Abdirizak Hussein Guled
  • Ahmed Mohamed Warsame
  • Mustafa Ahmed Muse
  • Mohamed Ibrahim Abdilahi
  • Mohamed Ali Mohamed
  • Abdalla Mohamed Yusuf
  • Abdirisaq Yusuf Adani
  • Ahmed Suleiman Bashir

Left to right: Abdirizak Hussein Guled, Mustafa Ahmed Muse, Mohamed Ibrahim Abdilahi


JGI Seed Corn Funding Project Blog 2021: Lucy Biddle

Can sharing app data facilitate communication between young people and their mental health practitioner?

Bridget Ellis, Lucy Biddle, Helen Bould, Jon Bird

Mental health problems are increasing among young people, who have the highest prevalence of mental health problems of all age groups [1]. Despite the adverse outcomes that result from this, young people access mental health services at a lower rate than other age groups [3], with barriers including communication, poor mental health literacy, embarrassment, fear of stigma and confidentiality concerns.

Research illustrates that digital peer support can help people with mental health difficulties [2] and the increased availability of mobile technologies is now being harnessed to deliver mental health support.

Our project was a collaboration with the company that created the award-winning, NHS-endorsed young person’s mental health app, ‘Tellmi’ (www.Tellmi.help). The app is a fully moderated peer support environment, where young people anonymously share ‘tweet’ style posts about their emotional and mental health difficulties. A holistic dataset builds up for each individual, which could have clinical value if shared with a healthcare practitioner. For example, the posts can be tagged for content, rated for severity, displayed longitudinally and presented in a shareable summary document.

A previous feasibility survey and interviews investigated the views of young people who used the Tellmi app, and of Child and Adolescent Mental Health Services (CAMHS) clinicians, about the acceptability and utility of sharing such a summary document during mental health consultations as a means of enhancing the clinical exchange. Our current study had two aims: i) to carry out in-depth thematic analysis of this previously collected data; and ii) to form a multidisciplinary working group and convene a one-day workshop to present and discuss our findings as preparation for a full-scale research proposal.

We conducted thematic analysis on interviews with five young people and four healthcare practitioners, and 120 survey responses from users of Tellmi.

“So I think finding the words and putting them on Tellmi makes it easier to be able to say them to someone who is in front of you”

One theme concerned communication and how a summary document could be used to facilitate it between young people and healthcare practitioners. A concern raised by young people was that the way they communicate varies depending on the audience, meaning that a summary of posts they intended to be seen by peers may contain information they would not usually present to a clinician. Young people appear to value the written communication of Tellmi and were enthusiastic about how this could help to provide a focus for, and inform, clinical sessions. Young people who struggle to communicate their levels of distress to a clinician could overcome this by sharing their experiences through their Tellmi posts. Additionally, providing a written account of how young people have been feeling may help to bridge the gap between the more honest and open information that is disclosed anonymously and the information that is disclosed face-to-face with a clinician. However, young people did raise concerns about how this written information could be misconstrued or misinterpreted.

“If I feel comfortable with them then I’ll be more likely to share but if I don’t feel comfortable then I would not share”

We found that trust would play a key role in the process of sharing. This was not only trust between a young person and their clinician but also trust between a young person and Tellmi and how sharing could change how young people engaged with the app going forward. Clinicians also raised questions about trusting the Tellmi app, in particular how successfully an algorithm can identify risk or how the data being shared may be monetised.

“Tellmi posts do tend to be quite personal and honest and open because you expect to be talking to someone who isn’t really there so you can say whatever you like and there’s no judgement”

Young people seem to really value Tellmi as a safe space. This safety appeared particularly facilitated by the anonymity it provides. Young people were concerned about how their data may be handled if it was no longer anonymous and being shared with clinicians.

We also found practicalities surrounding sharing that would need to be addressed. For example, young people required control over their data and how it is used and shared. The possibility of young people censoring the information they present to their clinician was also discussed. Additionally, the impact that revisiting old posts may have on young people was considered. Factors specific to clinicians could also affect sharing, with time being a concern for both clinicians and young people.

“I think it’s [sharing a Tellmi summary] a great idea but the young people would need to have complete control of the information that is included to avoid endangering young people”

Workshop

Our multidisciplinary working group consisted of three researchers from computer science and health sciences, two child and adolescent psychiatrists, representatives from Tellmi, and two young people with lived experience of mental health difficulties. We presented our findings from the thematic analysis then used discussion sessions and group work to consider implications for the design of future research. We discussed how data sharing is likely to be most beneficial; how acceptability can be enhanced for young people and clinicians; stakeholders’ evaluations of the dummy data summary document of Tellmi posts, including methods of data visualisation; and potential barriers to data sharing in practice.

Discussion of the design for a user study of Tellmi data-sharing in practice identified that this would involve varied stakeholders, including Tellmi users, researchers and clinicians. It was noted that recruitment could bring challenges, and discussion sought to identify the most appropriate pathways for recruiting clinicians and young people in paired groups so that both perspectives can be captured for each case of data sharing. If recruiting through clinicians, it was noted that young people may not be Tellmi users or have enough data to produce a summary document. A suggestion for overcoming this was to ask young people to engage with Tellmi while on a waiting list. However, one of the lived experience advisors highlighted a challenge: “I think another issue with recruiting people through NHS is that no matter how good an app is, if you are young person on a waiting list and a clinician says, ‘Use this app’, it’s like, ‘No, I want you to help me and why am I going to use an app?’”. Alternatively, we discussed recruiting through the Tellmi platform, with young people approaching their clinicians to get involved. However, this approach would also bring challenges, such as obtaining ethical approvals and clinical ‘buy in’ when the relevant young people could be based all over the country.

We also discussed the practicalities of sharing and how a study procedure would be designed, focusing this discussion around the implications for design highlighted in our thematic analysis. This encompassed the details of how the process of sharing would actually take place. For example, we considered whether the summary would be shared as a physical document or an electronic copy, and whether this should be given to the young person to present to their clinician or be sent directly to the clinician. When to share is also a key consideration: our data show that young people have varied views on this, and on whether sharing should be repeated and, if so, at what frequency. Additionally, methods of improving and encouraging sharing were discussed, as well as the overall design of a summary document and how this could be altered to ensure inclusivity for special educational needs.

Key to designing a research study were methods of evaluation and establishing outcome measures. Young people and clinicians flagged a range of potential outcomes. These included completing clinical tasks such as goal setting, and how successful a young person may consider a session: “something else to measure would be how the young person feels coming out of the appointment. Has it empowered them or let them take control of their healthcare”. The view of the young person was considered key in determining how outcomes would be measured: “it’s just making me think what is the actual point of sharing the data again? I guess that depends on the young person”.

The workshop provided a space for exciting discussion with input from stakeholders from different backgrounds. While we hoped it would allow elements of co-design to inform development of a data sharing document and research plans to evaluate this, challenges were raised which suggest further development work may be necessary before the process of sharing can be evaluated. The ideas and issues raised at our workshop will be explored through our continued collaboration with Tellmi.

“The workshop was incredibly insightful. It provided us the opportunity to discuss the findings of the study with a diverse group of experts including academics, clinicians and young people with lived experiences of poor mental health. It has helped us to completely rethink how to approach the problem and we look forward to continuing to work with the Bristol team.” Kerstyn Comley, Tellmi Co-CEO

JGI Seed Corn Funding Project Blog 2021: Dr Josh Hoole

Exploiting Data to Support UK Search and Rescue

Dr Josh Hoole, Dr Oliver Andrews, Dr Steve Bullock

Introduction

Various UK organisations provide 24/7 Search and Rescue (SAR) capability year-round across land, sea and air. Data analytics provides a key route to supporting SAR operations and aerospace system design in the future.

Aims of the Project

The aim of this project was to explore what data is available to capture the variability present in SAR operations (including mission characteristics and weather) to help support the future design of aerial systems to support SAR. This aim was to be achieved using the following objectives:

Engagement with search and rescue organisations to establish:

  • Availability of data for characterising SAR mission profiles
  • Perceptions on developing Unmanned Aerial Vehicles (UAVs or ‘drones’) to support SAR

Data fusion across asset tracking data to characterise SAR mission profiles:

  • Exploitation of aircraft and vessel trajectories
  • Combining mission profiles with meteorological data

This project therefore lay at the exciting and valuable intersection between data science, aerospace systems, weather and climate analysis and SAR.

Results

To date, the following project activities have been carried out with the support of the Seed Corn Funding:

Data Workshop with the Royal National Lifeboat Institution (RNLI)

A one-day workshop was held with the RNLI Data team at the RNLI College in Poole. Within this workshop, areas of interest and ideas were shared spanning the exploitation of data for mission analysis, future planning and the use of computer vision to support lifesaving activities. The University of Bristol team were simply amazed at the large amount of data-driven work performed by the RNLI and look forward to establishing stronger links between the RNLI and research institutions in the future (see contact details below).

There was also a tour of the RNLI’s training and lifeboat manufacturing facilities as part of the workshop to provide context to the RNLI’s activities. The Bristol team were overwhelmed by the vast and diverse capabilities present in a single location and thoroughly recommend a tour of the RNLI College and All-weather Lifeboat Centre.

RNLI Workshop Participants at the RNLI memorial

 

RNLI All-weather Lifeboat Centre for Lifeboat Manufacture and Maintenance

Initial Assessment of Vessel Tracking Data

Maritime vessels are equipped with real-time tracking capability via Automatic Identification System (AIS) installations. Historic AIS data provides vessel trajectories which can be post-processed to characterise the mission performed. Building on prior work in the literature, an initial investigation into processing the AIS trajectories of RNLI lifeboats has been performed using data sourced from MarineTraffic. Using simple algorithms, AIS trajectories can be processed to identify the occurrence of lifeboat search manoeuvres and generate characteristics regarding the search operation (e.g. search time, search area, etc.). It is intended that such characteristics can be used in the future to support the post-mission reporting performed by the RNLI.
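The post does not say which "simple algorithms" were used, so the following is only a minimal sketch of one possible heuristic, under stated assumptions: the track has already been projected to local planar coordinates in metres, a search is flagged wherever a run of consecutive fixes accumulates a lot of turning, and the search area is approximated by the convex hull of the flagged fixes. The function names, the window size and the 300 degree threshold are all illustrative choices, not values from the project.

```python
import math

def turn_angles(track):
    """Absolute heading change (radians, 0..pi) at each interior fix."""
    legs = [math.atan2(y2 - y1, x2 - x1)
            for (_, x1, y1), (_, x2, y2) in zip(track, track[1:])]
    turns = []
    for a, b in zip(legs, legs[1:]):
        d = abs(b - a) % (2 * math.pi)
        turns.append(min(d, 2 * math.pi - d))
    return turns

def find_search(track, window=4, min_total_turn=math.radians(300)):
    """Return (first, last) indices of the turning fixes, or None.

    A window of `window` consecutive turns counts as searching when the
    summed turn angle reaches `min_total_turn` (assumed threshold).
    """
    turns = turn_angles(track)
    hits = [i for i in range(len(turns) - window + 1)
            if sum(turns[i:i + window]) >= min_total_turn]
    if not hits:
        return None
    # Turn i happens at fix i + 1.
    return hits[0] + 1, hits[-1] + window

def hull_area(points):
    """Area of the convex hull (monotone chain plus shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    hull = half(pts) + half(reversed(pts))
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2

def characterise(track):
    """Search time (s) and search area (m^2) for a list of (t, x, y) fixes."""
    found = find_search(track)
    if found is None:
        return None
    first, last = found
    fixes = track[first:last + 1]
    return {
        "search_time_s": fixes[-1][0] - fixes[0][0],
        "search_area_m2": hull_area([(x, y) for _, x, y in fixes]),
    }

# Made-up track: a straight transit followed by a ladder-style pattern.
waypoints = [(0, 0), (1000, 0), (2000, 0), (3000, 0), (4000, 0),
             (4000, 500), (3000, 500), (3000, 1000), (4000, 1000),
             (4000, 1500), (3000, 1500)]
track = [(60 * i, x, y) for i, (x, y) in enumerate(waypoints)]
result = characterise(track)
```

Real AIS fixes arrive as latitude and longitude at irregular intervals, so a practical version would need a map projection and some resampling before a turn-based rule like this one could be applied.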

Identification and characterisation of search areas within lifeboat trajectories (data source: MarineTraffic)

Data Fusion to Enhance SAR Helicopter Tracking Data

A large number of aerospace vehicles are also equipped with real-time tracking capability via Automatic Dependent Surveillance-Broadcast (ADS-B) equipment. However, as a line-of-sight system, ADS-B derived trajectories are often lacking in the regions where SAR operations take place, such as at low altitude, close to obstructions or out at sea. SAR helicopters are also equipped with AIS equipment, permitting ADS-B and AIS data sources to be fused to greatly increase the coverage of SAR helicopter trajectories. The ADS-B/AIS fused trajectories can then be further processed to generate mission characteristics as for the maritime vessel trajectories.
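How the two sources were fused is not described in the post. As a rough sketch of the idea, assuming each source provides a time-sorted list of (time in seconds, latitude, longitude) fixes for the same helicopter, the two tracks can be interleaved by timestamp and thinned so that near-simultaneous fixes are not duplicated. The function names, the 5 second separation and the sample coordinates are hypothetical.

```python
import heapq

def fuse(adsb, ais, min_separation_s=5.0):
    """Merge two time-sorted tracks into one list of (t, source, lat, lon).

    A fix is dropped when it follows the previously kept fix by less than
    `min_separation_s`; on identical timestamps ADS-B is kept first.
    """
    tagged = heapq.merge(
        ((t, 0, "ADS-B", lat, lon) for t, lat, lon in adsb),
        ((t, 1, "AIS", lat, lon) for t, lat, lon in ais),
    )
    fused = []
    for t, _, source, lat, lon in tagged:
        if fused and t - fused[-1][0] < min_separation_s:
            continue
        fused.append((t, source, lat, lon))
    return fused

def largest_gap(track):
    """Longest time interval (s) between consecutive fixes."""
    times = [fix[0] for fix in track]
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)

# Made-up fixes: ADS-B drops out between t=20 and t=200, AIS fills the gap.
adsb = [(0, 50.10, -5.00), (10, 50.11, -5.02), (20, 50.12, -5.04),
        (200, 50.30, -5.40)]
ais = [(21, 50.12, -5.04), (60, 50.16, -5.12), (120, 50.22, -5.24),
       (180, 50.28, -5.36)]

fused = fuse(adsb, ais)
gap_before = largest_gap(adsb)
gap_after = largest_gap(fused)
```

Comparing the largest gap before and after fusion gives a simple measure of how much coverage the second data source adds.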

Fusion of ADS-B and AIS trajectories for SAR Helicopters (data sources: Opensky Network, MarineTraffic)

Future Plans

Exploitation of Meteorological Data Products

Following completion of the SAR mission characterisation via AIS and ADS-B data sources, the project intends to couple the trajectories to meteorological data products to fully characterise the SAR operational environment. This level of data fusion could support automated post-mission reporting, draw correlations between the search characteristics and operating environment, and support future planning with respect to the impacts of climate change on UK SAR operations.

Engage further with Inland SAR Organisations (PhD projects)

So far, the project has focused on maritime SAR. Future work will engage with inland SAR organisations to a greater extent, and initial links have been formed with the relevant organisations. Dr Steve Bullock has successfully secured funding for two PhD students in the area of SAR planning for UAVs, and these projects will aim to leverage the expertise from the SAR connections made during this project.

Future SAR Data Research Partnerships

The workshop with the RNLI highlighted a significant number of data-centric avenues that could be pursued within future research projects, including aspects of machine learning, computer vision, weather and climate, along with mission analysis. A future workshop is planned, and researchers from across the data community at the University of Bristol are encouraged to participate, so please get in touch via the contact details below. The University of Bristol team are also very keen to explore collaborative partnerships within this area with other research institutions (GW4 and beyond) and SAR organisations. Please send any expressions of interest regarding future opportunities to the contact details below.

Contact Details

Dr Josh Hoole, Department of Aerospace Engineering, University of Bristol, josh.hoole@bristol.ac.uk

Medfluencers: how medical experts influence discourse about practices against COVID-19 on Twitter

Introduction

This project aims to investigate the role of medical experts on Twitter in influencing public health discourse and practice in the fight against COVID-19.

Aims of the Project

The project focuses on medical experts as the driving force of Networks of Practice (NoPs) on social media and investigates the influence of these networks on public health discourse and practice in the fight against COVID-19. NoPs are networks of people who share practices and mobilise knowledge on a large scale thanks to digital technologies. We chose Twitter as the focus of our analysis because it is an open platform widely used by UK and international medical experts to reach out to the public. A key methodological challenge that this project seeks to address is extending existing text analytics and visual network analysis methods to identify latent topics that are representative of practices, and to construct multimodal networks that include topics/practices and actors as nodes and Twitter affordances (e.g. retweets, @mentions) as edges. To address this challenge, the aims of this project are:

  1. Build a machine learning classifier of tweets that mention relevant practices in the fight against COVID-19.
  2. Build a machine learning classifier of authors of tweets, which can distinguish between medical experts and other key actors (e.g. public health organisations, journalists).
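To make the idea of a multimodal network more concrete, here is a minimal sketch of how such edges might be assembled once tweets have been classified. It assumes each tweet is a dictionary with an author, a text, an optional retweeted account and an optional topic label. The field names, the handles and the actor-to-topic edge type are illustrative assumptions, not the project's actual data model.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def build_edges(tweets):
    """Count weighted edges between actor nodes and topic nodes.

    Nodes are ("actor", handle) or ("topic", label); each edge key is
    (source node, target node, edge type).
    """
    edges = Counter()
    for tw in tweets:
        author = ("actor", tw["author"])
        if tw.get("retweet_of"):
            # A retweet is recorded once, not also as a mention.
            edges[(author, ("actor", tw["retweet_of"]), "retweet")] += 1
        else:
            for handle in MENTION.findall(tw["text"]):
                edges[(author, ("actor", handle), "mention")] += 1
        if tw.get("topic") is not None:
            edges[(author, ("topic", tw["topic"]), "tweets_about")] += 1
    return edges

# Hypothetical tweets with made-up handles and topic labels.
tweets = [
    {"author": "dr_a", "text": "Masks work, thanks @dr_b and @org_example",
     "topic": "masks"},
    {"author": "dr_b", "text": "RT @dr_a: Masks work, thanks",
     "retweet_of": "dr_a", "topic": "masks"},
    {"author": "dr_a", "text": "Get your booster", "topic": "vaccines"},
]

edges = build_edges(tweets)
for (source, target, kind), weight in sorted(edges.items()):
    print(source, kind, target, weight)
```

The resulting edge list could be loaded into any network analysis tool. Keeping the node type in each node key is what lets actors and topics coexist in the same graph.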

Results

1. Data Collection

We used a report from Onalytica to identify the top 100 influential medical experts on Twitter. After receiving academic access to the Twitter API, we used the R package academictwitteR to collect a total of 424,629 tweets posted by the official accounts of these medical experts between 01 December 2020 and 02 February 2022.

2. Build a machine learning classifier for relevant practices

After cleaning the data set, we randomly selected a sample of 1,200 tweets, which two independent coders manually coded as either “relevant” or “non-relevant” and which was then used to train the Machine Learning classifier. By relevant we mean representative of relevant practices in the fight against COVID-19 (e.g., wearing a mask, getting a vaccine). After training and testing a series of algorithms (support vector classifier, random forest, logistic regression and naïve Bayes), a support vector classifier (SVC) gave the best classification results, with 0.907 accuracy. To create the inputs to the classifier, we used a sentence transformer to convert each tweet to a feature vector (a sentence embedding) that aims to represent the semantics of the tweet (https://www.sbert.net/). We compared this to a feature vector representing the number of occurrences of individual words in the tweet, which achieved a lower performance of 0.873 with a random forest classifier. For reference, the baseline accuracy when labelling every tweet as relevant is 0.57, showing that the classifier can learn a lot even from simple word features. The performance of SVC, random forest and logistic regression was similar throughout our experiments, suggesting that the choice of classifier itself is less important than choosing suitable features to represent each tweet. We applied the chosen SVC + sentence embeddings classifier to the remaining sample of tweets, resulting in 235,320 tweets that were classified as representative of relevant practices.
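Two of the simpler ingredients mentioned above, the word-count feature vector and the "label everything as relevant" baseline, can be illustrated with the Python standard library alone. This is a toy sketch rather than the project's pipeline: the tokeniser, the function names and the example tweets are made up, and the sentence-embedding and SVC steps are not reproduced here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for text in texts for tok in tokenize(text)})

def count_vector(text, vocabulary):
    """Number of occurrences of each vocabulary word in one tweet."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def accuracy(predicted, actual):
    """Fraction of predictions that match the manual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    majority, _ = Counter(labels).most_common(1)[0]
    return accuracy([majority] * len(labels), labels)

# Made-up example tweets and labels.
texts = ["Wear a mask indoors",
         "Get your vaccine, then your booster",
         "Lovely weather today"]
labels = ["relevant", "relevant", "non-relevant"]

vocabulary = build_vocabulary(texts)
features = [count_vector(text, vocabulary) for text in texts]
baseline = majority_baseline_accuracy(labels)

# If 57 of every 100 coded tweets were relevant, always answering
# "relevant" would score 0.57.
illustration = majority_baseline_accuracy(["relevant"] * 57 + ["non-relevant"] * 43)
```

Any classifier is only useful to the extent that it beats this baseline, which is why accuracies of 0.873 and 0.907 are read against 0.57 rather than against zero.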

3. Topic modelling

We employed a topic modelling analysis to gain a better understanding of the types of practices that were discussed by the medical experts. After testing a number of indexes, we found that 20 latent topics were present in our data. We therefore ran an LDA (Latent Dirichlet Allocation) topic model analysis with 20 topics (Figure 1).

Figure 1. Output of Topic Modelling Analysis with 20 Topics

We selected 9 topics related to significant practices linked to the fight against Covid-19:

  • Topic 1 is about vaccines
  • Topics 2 and 16 are about global health policy/practices
  • Topic 6 is about prevention of long covid in children
  • Topic 9 is about immunity (either natural or vaccine-induced) against variants, hence related to COVID-19 public health measures or practices
  • Topic 13 is about reporting of COVID cases and therefore linked to effectiveness of public health measures or practices
  • Topic 18 is about masks (in schools)
  • Topic 19 is about testing
  • Topic 20 is not about a “public health” practice but a scientific practice about sequencing

4. Build a machine learning classifier of authors of tweets

One hundred Twitter bios of medical experts were not enough to build the machine learning classifier. Therefore, we enlarged our sample by including the bio descriptions of accounts that the medical experts followed on Twitter. This strategy allowed us to include bios of users who were not medical experts and therefore differentiate between medical and non-medical “experts” or influencers. We collected the “following” accounts with the R package “twitteR”, resulting in a total of 315,589 bios. We randomly selected a smaller sample of 2,000 bios for manual coding. Following an inductive approach, two independent coders manually coded the bios into labels that classified individuals by their occupation or profession and organizations by their sector or mission. The label “non classifiable” was used for bios that could not be placed in any professional or organizational category. This resulted in a total of 188 labels, which were then aggregated into higher-level categories to give a final list of 49 labels.

Future plans for the project

We will use the coded sample of Twitter bios to train a Machine Learning classifier of authors of tweets. We will apply for further funding to improve our methodology and extend the scope of our project, for example, by including more medical conditions, non-English speaking countries, and other platforms in addition to Twitter. We will improve the methodology by first sampling collected tweets on topics that are representative of relevant practices, and then sampling and classifying the bios of the authors of those tweets in order to identify experts. This will give us a more representative sample of the individuals and organizations that are active in the public health discourse related to COVID-19 and other medical conditions on Twitter. We will classify practices and then map the classified practices and authors onto a network, in order to conduct a network analysis of how medical influencers affect discourse about public health practices on Twitter.

Contact details

For further information or to collaborate on this project, please contact Dr Roberta Bernardi (email: roberta.bernardi@bristol.ac.uk)

Mapping the linguistic topography of Sophocles’ plays: what Natural Language Processing can teach us about Sophoclean drama

Benjamin Folit-Weinberg, A.G. Leventis Postdoctoral Research Fellow (Institute for Greece, Rome, and the Classical Tradition & Department of Classics & Ancient History, University of Bristol) and Justus Schollmeyer, Data Scientist & Programmer

Scholars have long recognized that Sophocles, the great 5th Century B.C.E. tragedian, repeats thematically important words in his plays and that studying these repetitions can offer fundamental insights into his work. At present, however, identifying these repetitions is time-consuming and unsystematic, and the significance of specific repetitions is not always clear. Our project applies Natural Language Processing (NLP) and data visualization techniques to help scholars of Sophocles both identify linguistic patterns more efficiently and rigorously and interpret the significance of these patterns more insightfully.

Seed Corn funding provided by the Jean Golding Institute allowed us to create a feasibility prototype for an NLP and data visualization tool with several functions. The first function is heuristic and identifies the words or word families that appear most frequently in each of the seven fully extant plays of Sophocles. The second function is analytical and calculates how frequently a given word or word family is used in a specific play by Sophocles compared to the remaining six plays. The third function is hermeneutic and depicts the distribution of selected words within a specific play (see diagram below); the chart will ultimately include various overlays that demarcate units of the play and articulate relationships between uses of key words.
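The three functions can be sketched in a few lines of Python. This is an illustration of the idea only, not the prototype's code: it assumes each play is already a list of transliterated tokens, it approximates a word family by a shared prefix (a real tool would more plausibly rely on lemmatisation), and the token sequences below are invented placeholders rather than text from Sophocles.

```python
from collections import Counter

def most_frequent(tokens, n=3):
    """Heuristic function: the n most common words in one play."""
    return Counter(tokens).most_common(n)

def family_rate(tokens, stem):
    """Share of tokens in the word family (approximated by a prefix)."""
    if not tokens:
        return 0.0
    return sum(tok.startswith(stem) for tok in tokens) / len(tokens)

def relative_frequency(plays, title, stem):
    """Analytical function: rate in one play divided by the rate in the rest.

    Returns None when the family never occurs in the other plays.
    """
    rest = [tok for name, toks in plays.items() if name != title
            for tok in toks]
    rest_rate = family_rate(rest, stem)
    if rest_rate == 0:
        return None
    return family_rate(plays[title], stem) / rest_rate

def positions(tokens, stem):
    """Hermeneutic function: where the family occurs (0 = start, 1 = end)."""
    last = max(len(tokens) - 1, 1)
    return [i / last for i, tok in enumerate(tokens) if tok.startswith(stem)]

# Invented token sequences standing in for two plays.
plays = {
    "Play A": "telos logos telein polis logos telos".split(),
    "Play B": "logos polis polis telos logos polis logos polis".split(),
}

top_b = most_frequent(plays["Play B"], n=1)
ratio = relative_frequency(plays, "Play A", "tel")
where = positions(plays["Play A"], "tel")
```

Plotting the values returned by positions along a horizontal axis gives a simple distribution chart for a word family within a single play.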

The successful development of this feasibility prototype has enabled us to apply for further funding to develop our tool; our goal is to make this available as a common good to anyone with an internet connection, regardless of their institutional affiliation or programming literacy. We are also exploring the possibility of scaling up our tool to address the entire 5th Century Athenian dramatic corpus and other corpora of texts from Greco-Roman antiquity.

For further information, please contact b.folit-weinberg@bristol.ac.uk

Prototype map of use of selected words in the tel- word family in Sophocles’ Oedipus at Colonus