Widening Participation (WP) Research Summer Internships

The Widening Participation (WP) Research Summer Internships provide undergraduates with hands-on experience of research during the summer holidays, with the aim of encouraging a career in research. Interns gain professional experience and knowledge through a funded placement in their chosen subject. This also supports applications for postgraduate study and other research jobs.

This year, the JGI was very pleased to support four internships through the WP scheme. Each of the interns has provided valuable support to an array of diverse and interesting projects related to their fields of interest. We are delighted by the feedback that we have received from their project supervisors and look forward to watching their future progress. Read on for more information on their projects and their experience.

Frihah Farooq 

Research poster on ‘Automating the Linkage of Open Access Data for Health Sciences’ by Frihah Farooq

My name is Frihah, and I’m a third year undergraduate studying Mathematics here at the University of Bristol. My academic interests centre around applied data science and machine learning, and this summer I worked on a project involving the General Practice Workforce dataset published by NHS Digital. My focus was on building tools that could bring accessibility to data that is often scattered and difficult to navigate. 

The aim of the project was to automate the downloading and linkage of open-access datasets, specifically in the context of healthcare services. Many of these records are stored in files with inconsistent formats and structures, often requiring manual effort to piece together a consistent narrative. I developed a codebase in R that could search for the appropriate files, extract the relevant information, and construct a complete dataset that can be used for longitudinal analysis without the need for repeated intervention. While the code was built around the workforce dataset, the methodology generalises well to other datasets published by NHS Digital. 
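As a rough illustration of the linkage step described above, here is a minimal Python sketch (the project itself was written in R). It maps inconsistent column headers onto canonical names and stacks several releases into one long table. The header variants, column names and release contents are all invented for this example and are not taken from the NHS Digital files.

```python
import csv
import io

# Hypothetical header variants mapped to one canonical name (assumed, not from the project).
COLUMN_ALIASES = {
    "practice_code": "practice_code",
    "prac_code": "practice_code",
    "total_gp_fte": "gp_fte",
    "gp_fte": "gp_fte",
}


def read_release(text, period):
    """Parse one CSV release and return rows with canonical column names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"period": period}
        for header, value in raw.items():
            canonical = COLUMN_ALIASES.get(header.strip().lower())
            if canonical is not None:  # ignore columns we do not track
                row[canonical] = value.strip()
        rows.append(row)
    return rows


def link_releases(releases):
    """Stack all releases into one long table, sorted by practice and period."""
    linked = []
    for period, text in releases.items():
        linked.extend(read_release(text, period))
    return sorted(linked, key=lambda r: (r["practice_code"], r["period"]))


# Two made-up releases whose headers differ between years.
releases = {
    "2022-09": "PRAC_CODE,TOTAL_GP_FTE\nA001,3.5\nB002,1.0\n",
    "2023-09": "Practice_Code,GP_FTE,Extra\nA001,4.0,x\nB002,1.2,y\n",
}
linked = link_releases(releases)
for row in linked:
    print(row)
```

Keeping the alias table separate from the parsing code means a new header variant only needs one extra dictionary entry, which is one plausible way to avoid repeated manual intervention.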

One observation from the final merged dataset was the trend of decreasing row counts, likely due to restructuring, alongside an increase in the number of recorded variables, a sign that data collection has grown more sophisticated in recent years. This experience strengthened my foundation in data automation and my ability to work with evolving and imperfect data, skills I know will benefit me as I move further into research. 

If you’d like to get in touch, you can reach me at cc22019@bristol.ac.uk 

Grace Gilman 

Hello, my name is Grace Gilman and I am starting my third year studying Computer Science with Artificial Intelligence at the University of Bath. I am hoping to go into academia in the future and pursue computing research specifically with medical applications. You can contact me at gcag20@bath.ac.uk

Over the past six weeks I have been taking part in a research internship here at the University of Bristol, supported by the Jean Golding Institute. I have been working on a data science project called ‘Using AI to Study Gender in Children’s Books’ for the Fair Tales team, supervised by Chris McWilliams. During my internship I experimented with image analysis using ChatGPT and Vertex AI, for future integration into the Data Entry app that Fair Tales is producing to semi-automate character and transcript input. I have also been contributing to the database architecture and to the search and filtering options that let users interact with the database. Some of my work involved analysing the corpus of children’s books using SQL. One pattern I found was that the gap between mother and father characters (1:0.75) is even more pronounced for grandmothers and grandfathers (1:0.5). 
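To show the kind of SQL query that can produce ratios like these, here is a small sketch using Python's built-in `sqlite3` module. The `characters` table, its columns and its rows are hypothetical toy data invented for illustration, not the Fair Tales schema or corpus, and the sketch assumes that 1:0.75 is read as 0.75 male characters for every female character.

```python
import sqlite3

# Toy in-memory table (hypothetical schema and rows, invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (book_id INTEGER, role TEXT)")
conn.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [
        (1, "mother"), (1, "father"),
        (2, "mother"), (2, "grandmother"),
        (3, "mother"), (3, "father"), (3, "grandmother"),
        (4, "mother"), (4, "father"), (4, "grandfather"),
    ],
)


def role_ratio(conn, female_role, male_role):
    """Return the number of male characters per one female character."""
    query = """
        SELECT
            SUM(CASE WHEN role = ? THEN 1 ELSE 0 END) AS n_female,
            SUM(CASE WHEN role = ? THEN 1 ELSE 0 END) AS n_male
        FROM characters
    """
    n_female, n_male = conn.execute(query, (female_role, male_role)).fetchone()
    if not n_female:  # empty table or no female characters of this role
        return None
    return n_male / n_female


print("fathers per mother:", role_ratio(conn, "mother", "father"))
print("grandfathers per grandmother:", role_ratio(conn, "grandmother", "grandfather"))
```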

During my time at this internship, I have become much more confident in my ability to work on a project and to write code that will be used in a research setting. I have learnt more about how research is conducted and what skills it requires, and I have become more sure of an academic future. 

Imogen Joseph 

I am currently studying a Neuroscience MSci with a Year in Industry at the University of Bristol. I’m going into my final year, having just completed a placement year in Southampton General Hospital undertaking clinical research in neonatal respiratory physiology. I’m particularly interested in a career in academia and more specifically looking at molecular mechanisms behind disease for drug discovery. 

This summer, under the supervision of Elinor Curnow, I helped develop an R package, ‘midoc’ (Multiple Imputation DOCtor, found on CRAN), designed to guide researchers through analysis with missing data. I created several functions that display a summary table of missing data, alongside optional graphs to visualise the distributions of the missing data. This allows the user to explore what is actually missing, and additionally to make inferences on whether missingness is random or related to particular variables. 
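For readers unfamiliar with this kind of summary, the idea can be sketched in a few lines of Python. This is not the midoc API (which is an R package); the function names, variables and records below are made up purely to illustrate a missing-data table and a first check of whether missingness varies with another variable.

```python
def missing_summary(records, variables):
    """Count and percentage of missing (None) values for each variable."""
    n = len(records)
    summary = []
    for var in variables:
        n_missing = sum(1 for rec in records if rec.get(var) is None)
        summary.append({
            "variable": var,
            "n_missing": n_missing,
            "pct_missing": round(100 * n_missing / n, 1) if n else 0.0,
        })
    return summary


def missing_by_group(records, target, group):
    """Proportion of `target` values missing within each level of `group`."""
    counts = {}
    for rec in records:
        level = rec.get(group)
        total, missing = counts.get(level, (0, 0))
        counts[level] = (total + 1, missing + (rec.get(target) is None))
    return {level: missing / total for level, (total, missing) in counts.items()}


# Made-up records: None marks a missing value.
records = [
    {"age": 34, "smoker": "yes", "bmi": None},
    {"age": 51, "smoker": "no", "bmi": 27.1},
    {"age": None, "smoker": "yes", "bmi": None},
    {"age": 46, "smoker": "no", "bmi": 24.3},
]
for line in missing_summary(records, ["age", "smoker", "bmi"]):
    print(line)
print(missing_by_group(records, "bmi", "smoker"))
```

In this toy data `bmi` is missing only for smokers, which is the sort of pattern that would suggest missingness is related to another variable rather than random.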

Before coming into this internship, my R ability was limited to self-teaching via YouTube videos. Ample training was provided in this project but, more than anything, throwing myself in and actually writing code has been so beneficial to my learning. This knowledge is extremely useful for a career in research – I was even able to apply my acquired skills to the work carried out in my placement, and used R to analyse the data I gathered. 

I am very grateful for this opportunity given to me under the JGI and will take what I’ve learnt with me into whatever I do next! 

You can contact Imogen at imogenjoseph26@gmail.com 

Sindenyi Bukachi 

Using Big Data to Rethink Children’s Rights (bsindenyi@gmail.com) 

MSci Psychology and Neuroscience, University of Bristol (Year 3) 

Sindenyi Bukachi holding their research poster on 'Investigating attitudes towards children's rights (in education)'

Initially, the project was quite open – the only brief was to explore attitudes towards children’s rights using big data. My early research into Reddit threads, news stories and real-world discourse helped narrow our focus to something more urgent and measurable: children’s right to participation, specifically in educational settings as both my supervisors are based in the School of Education. This became the foundation for the rest of the project, and my supervisors later decided to take it forward as a grant proposal. 

Over the first few weeks, I learned how to do structured literature reviews using academic databases like ERIC, build Boolean search strings, and track findings across a spreadsheet. I explored how participation is talked about and measured, and the themes I identified – like tokenism, power struggles between adults, and the emotional toll of being “heard” but not actually listened to – became central to our research direction. 

In the second half, I moved from qualitative sources to dataset analysis. I used R and RStudio to explore datasets from the UK Data Service. I learned to work with tricky file types (.SAV, .TAB), use new packages, extract variables, visualise trends, and test relationships between predictors — all while thinking critically about how these datasets (often not made for this topic) could reflect participation and children’s agency. 

I’ve gained confidence in data science, research strategy, and independent problem-solving – all skills I’ll take forward into my dissertations and future career. I’m so grateful to Dr Katherin Barg, Professor Claire Fox, and the JGI for the support and trust throughout. 

MagMap – Accurate Magnetic Characteristic Mapping Using Machine Learning

PGR JGI Seed Corn Funding Project Blog 2023/24: Binyu Cui

Introduction:

Magnetic components, such as inductors, play a crucial role in nearly all power electronics applications and are typically known to be the least efficient components, significantly affecting overall system performance and efficiency. Despite extensive research and analysis on the characteristics of magnetic components, a satisfactory first-principle model for their characterization remains elusive due to the nonlinear mechanisms and complex factors such as geometries and fabrication methods. My current research focuses on the characterization and modelling of magnetic core loss, which is essential for power electronics design. This research has practical applications in areas such as the fast charging of electric vehicles and the design of electric motors.

Traditional modelling methods have relied on empirical equations, such as the Steinmetz equation and the Jiles-Atherton hysteresis model, which require parameters to be curve-fitted in advance. Although these methods have been refined over generations (e.g., MSE and iGSE), they still face practical limitations. In contrast, data-driven techniques, such as machine learning with neural networks, have demonstrated advantages in addressing multivariable nonlinear regression problems.
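To make the "curve-fitted in advance" step concrete, here is a small Python sketch of fitting the classical Steinmetz equation, P_v = k · f^α · B̂^β, by least squares on its logarithm. The sample points are synthetic, generated from made-up parameters rather than measured data, and this is the textbook Steinmetz form only, not the MSE or iGSE variants or the project's own code.

```python
import math


def steinmetz_loss(k, alpha, beta, freq_hz, b_peak_t):
    """Classical Steinmetz equation: P_v = k * f^alpha * B_peak^beta."""
    return k * freq_hz ** alpha * b_peak_t ** beta


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_steinmetz(samples):
    """Fit log P = log k + alpha * log f + beta * log B by least squares."""
    rows = [[1.0, math.log(f), math.log(b)] for f, b, _ in samples]
    targets = [math.log(p) for _, _, p in samples]
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    log_k, alpha, beta = solve_linear(xtx, xty)
    return math.exp(log_k), alpha, beta


# Synthetic "measurements" generated from made-up parameters (not real core data).
TRUE_PARAMS = (2.0, 1.5, 2.5)
samples = [
    (f, b, steinmetz_loss(*TRUE_PARAMS, f, b))
    for f in (50e3, 100e3, 200e3)
    for b in (0.05, 0.1, 0.2)
]
k_fit, alpha_fit, beta_fit = fit_steinmetz(samples)
print(k_fit, alpha_fit, beta_fit)
```

The fitted parameters only describe the excitation conditions they were fitted on, which is one of the practical limitations that motivates data-driven models.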

Thanks to the funding and support from the JGI, the interdisciplinary project “MagMap” has been initiated. This project encompasses testing platform modifications, database setup, and neural network development, advancing the characterization and modelling of magnetic core loss.

Outcome

Previously, a large-signal automated testing platform was built to evaluate magnetic characteristics under various conditions. Fig. 1 shows the layout of the hardware section of the testing platform and Fig. 2 shows the user interface of the software currently used for testing. With the help of the JGI, I have updated the platform’s automated procedure, including the point-to-point testing workflow and the large-signal inductance characterization. This testing platform is crucial for generating the practical database needed for the subsequent machine learning work, as its automation has greatly increased testing efficiency (approximately 6–8 s per data point).

Labelled electrical components in an automated testing platform
Fig. 1. Layout of the automated testing platform.
Code instructions for the interface of the automated testing platform
Fig. 2. User interface of the automated testing platform.

Utilizing the current database, a Long Short-Term Memory (LSTM) model has been developed to predict core loss directly from the input voltage. The model performs better at deducing the core loss than traditional empirical models such as the improved generalized Steinmetz equation. A screenshot of the code outcome is shown in Fig. 3 and an example result of the model for one material is shown in Fig. 4. A feedforward neural network was also tried as a scalar-to-scalar model to deduce the core loss directly from a series of input scalars, including the magnetic flux density amplitude, frequency and duty cycle. Despite its accuracy in training, it is limited in the input waveform types it can handle. Convolutional neural networks were also tested as a sequence-to-scalar model before the LSTM was adopted. However, the model size was significantly larger than that of the LSTM, with hardly any improvement in accuracy.

Code for the demo outcome of the LSTM
Fig. 3. Demo outcome of the LSTM.
Bar chart showing ratio of data points against relative error of core loss (%)
Fig. 4. Model performance against the ratio of validation sets used in the training.

Future Plan:

Core loss measurement and modelling is a key issue in industrial applications. The underlying difficulty is the non-linear relationship between the magnetic flux density and the magnetic field strength, which is described by the permeability of the magnetic material. The permeability of ferromagnetic materials is very sensitive to a range of external parameters, including temperature, induced current, frequency and input waveform type. With an accurate fit of the relationship between magnetic flux density and field strength, not only can the core loss be calculated precisely, but the modelling methods currently used in Ansys and COMSOL can also be improved.

Acknowledgement:

I would like to extend my gratitude to the JGI for funding this research and for their unwavering support throughout the project. I am also deeply thankful to Dr. Jun Wang for his continuous support. Finally, I would like to express my appreciation to Mr. Yuming Huo for his invaluable advice and assistance with the neural network coding process.

Unveiling Hidden Musical Semantics: Compositionality in Music Ngram Embeddings 

PGR JGI Seed Corn Funding Project Blog 2023/24: Zhijin Guo 

Introduction

The overall aim of this project is to analyse music scores using machine learning. These are of course different from sound recordings of music, since they are symbolic representations of what musicians play. But with encoded versions of these scores (in which the graphical symbols used by musicians are rendered as categorical data) we have the chance to turn these instructions into various sequences of pitches, harmonies, rhythms, and so on. 

What were the aims of the seed corn project? 

The CRIM Project (Citations: The Renaissance Imitation Mass) concerns a special genre of works from sixteenth-century Europe in which a composer took some pre-existing piece and adapted its various melodies and harmonies to create a new but related composition. More specifically, the CRIM Project is concerned with polyphonic music, in which several independent lines are combined in contrapuntal combinations. As in any given style of music, the patterns that composers create follow certain rules: they write using stereotypical melodic and rhythmic patterns, and they combine these tunes (‘soggetti’, from the Italian word for ‘subject’ or ‘theme’) in stereotypical ways. So, we have the dimensions of melody (line), rhythm (time), and harmony (what we’d get if we sliced through the music at each instant). 

A network of musical notations
Figure 1. An illustration of a music graph: nodes are music ngrams and edges are the different relations between them. Image generated by DALL·E.

We might thus ask the following kinds of questions about music: 

  • Starting from a given composition, what would be its nearest neighbour, based on any given set of patterns we might choose to represent?  A machine would of course not know anything about the composer, genre, or borrowing involved in those pieces, but it would be revealing to compare what a machine might tell us about such ‘neighbours’ in light of what a human might know about them. 
  • What communities of pieces can we identify in a given corpus?  That is, if we attempt to classify groups of works in some way based on shared features, what kinds of communities emerge?  Are these communities related to Style? Genre? Composer? Borrowing? 
  • In contrast, if we take the various kinds of soggetti (or other basic ‘words’) as our starting point, what can we learn about their context?  What soggetti happen before and after them?  At the same time as them?  What soggetti are most closely related to them? And through this what can we say about the ways each kind of pattern is used? 

Interval as Vectors (Music Ngrams) 

How can we model these soggetti?  Of course they are just sequences of pitches and durations.  But since musicians move these melodies around, it will not work simply to look for strings of pitches (since as listeners we can recognize that G-A-B sounds exactly the same as C-D-E).  What we need to do instead is to model these as distances between notes.  Musicians call these ‘intervals’ and you could think of them like musical vectors. They have direction (up/down) and they have some length (X steps along the scale). 

Here is an example of how we can use our CRIM Intervals tools (a Python/Pandas library) to harvest this kind of information from XML encodings of our scores.  There is more to it than this, but the basic points are clear:  the distances in the score are translated into a series of distances in a table.  Each column represents the motions in one voice.  Each row represents successive time intervals in the piece (1.0 = one quarter note). 

An ngram for a section of music
Figure 2. An example of ngram: [-3, 3, 2, -2], interval as vectors. 
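The idea of intervals as vectors can be sketched in plain Python. This is not the CRIM Intervals library; the helper names are invented for illustration, accidentals are ignored, and the sketch assumes the musician's convention in which a step up is written 2, a third down is written -3, and a repeated note is 1 (unison).

```python
STEPS = "CDEFGAB"


def diatonic_index(note):
    """Map a note name such as 'G4' to a position on the diatonic staff."""
    letter, octave = note[0], int(note[1:])
    return octave * 7 + STEPS.index(letter)


def melodic_intervals(notes):
    """Signed diatonic intervals between successive notes.

    A step up is 2, a third down is -3, and a repeated note is 1.
    """
    intervals = []
    for prev, curr in zip(notes, notes[1:]):
        diff = diatonic_index(curr) - diatonic_index(prev)
        intervals.append(diff + 1 if diff >= 0 else diff - 1)
    return intervals


def ngrams(sequence, n):
    """All windows of n consecutive items, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Transposed melodies give the same intervals even though the pitches differ.
print(melodic_intervals(["G4", "A4", "B4"]))  # [2, 2]
print(melodic_intervals(["C4", "D4", "E4"]))  # [2, 2]

# A made-up five-note melody whose intervals form the ngram shown in Figure 2.
melody = ["G4", "E4", "G4", "A4", "G4"]
print(ngrams(melodic_intervals(melody), 4))   # [(-3, 3, 2, -2)]
```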

Link Prediction 

We are interested in predicting unobserved or missing relations between pairs of ngrams in our musical graph. Given two ngrams (nodes in the graph), the goal is to ascertain the type and likelihood of a potential relationship (edge) between them, be it sequential, vertical, or based on thematic similarity. 

  • Sequential relations link tuples that occur near each other in time.  This is the kind of ‘context’ that a Large Language Model computes, and from it the model draws out the semantic information that is latent in the data. 
  • Vertical relations link tuples that happen at the same time.  This is another kind of context. 
  • Thematic is based on some measure of similarity.   
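The three relation types can be illustrated by building the edges of a small graph by hand. The sketch below covers only graph construction, not the link-prediction model itself. The voice names, onsets and ngrams are invented, and the thematic rule (two ngrams agreeing in at least three positions) is just one possible similarity measure chosen for this example.

```python
from itertools import combinations


def build_edges(voices, min_shared=3):
    """Return a set of (ngram_a, ngram_b, relation) edges.

    `voices` maps a voice name to a list of (onset, ngram) pairs in time order.
    """
    edges = set()
    # Sequential: consecutive ngrams within one voice.
    for events in voices.values():
        for (_, a), (_, b) in zip(events, events[1:]):
            edges.add((a, b, "sequential"))
    # Vertical: ngrams starting at the same onset in different voices.
    for (_, ev1), (_, ev2) in combinations(voices.items(), 2):
        onsets = dict(ev2)
        for onset, a in ev1:
            if onset in onsets:
                edges.add((a, onsets[onset], "vertical"))
    # Thematic (assumed rule): distinct ngrams agreeing in >= min_shared positions.
    nodes = sorted({ng for events in voices.values() for _, ng in events})
    for a, b in combinations(nodes, 2):
        if sum(x == y for x, y in zip(a, b)) >= min_shared:
            edges.add((a, b, "thematic"))
    return edges


# Made-up two-voice excerpt: (onset in quarter notes, interval ngram).
voices = {
    "superius": [(0.0, (2, 2, -3, 2)), (4.0, (-2, -2, 3, 2))],
    "tenor": [(0.0, (2, 2, -3, -2)), (4.0, (3, -2, -2, 2))],
}
edges = build_edges(voices)
for edge in sorted(edges, key=str):
    print(edge)
```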

Upon training, the model’s performance is evaluated on a held-out test set, providing metrics such as precision, recall, and F1-score for each type of relationship. The model achieved a prediction accuracy of 78%. 
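For reference, per-relation precision, recall and F1 can be computed from true and predicted labels as in the sketch below. The labels are invented to show the arithmetic and are unrelated to the model's actual predictions or its reported accuracy.

```python
def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for each label, treating it as the positive class."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up held-out labels and predictions for six candidate edges.
y_true = ["sequential", "sequential", "vertical", "vertical", "thematic", "thematic"]
y_pred = ["sequential", "vertical", "vertical", "vertical", "thematic", "sequential"]
for label, scores in per_class_metrics(y_true, y_pred).items():
    print(label, scores)
```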

Beyond its predictive capabilities, the model also generates embeddings for each ngram. These embeddings, which are high-dimensional vectors encapsulating the essence of each ngram in the context of the entire graph, can serve as invaluable tools for further musical analysis. 

From aerosol particles to network visualisation: Data science support enhancing research at the University of Bristol

AskJGI Example Queries from Faculty of Science and Engineering

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website.

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Science and Engineering here at the University of Bristol. 

Aerosol particles

A researcher approached us with Python code they’d written for simulating radioactive aerosol particle dynamics in a laminar flow. For particles smaller than 10 nanometers, they observed unexplained error “spikes” when comparing numerical to analytical results, suggesting that numerical precision errors were accumulating due to certain forces being orders of magnitude smaller than others for the tiny particles.

We provided documentation and advice for implementing higher-precision arithmetic using Python’s ‘mpmath’ library so that the researcher could use their domain knowledge to increase precision in critical calculation areas, balancing computational cost with simulation accuracy. We also wrote code to normalise the magnitude of different forces to similar scales to prevent smaller values from being lost in the calculation.
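The precision problem is easy to reproduce in a few lines. The sketch below uses Python's standard-library `decimal` module as a stand-in for `mpmath` (where the working precision is set through `mp.dps`), and the two "force" magnitudes are made-up numbers chosen only to show a small term vanishing next to a large one in double precision.

```python
from decimal import Decimal, localcontext

large_force = 1.0e16   # made-up dominant term
small_force = 1.0e-4   # made-up term twenty orders of magnitude smaller

# Double precision: the small term is lost when added to the large one.
float_residual = (large_force + small_force) - large_force
print(float_residual)  # 0.0

# With 50 significant digits the small term survives the same calculation.
with localcontext() as ctx:
    ctx.prec = 50
    big, small = Decimal("1e16"), Decimal("1e-4")
    decimal_residual = (big + small) - big
print(decimal_residual)  # 0.0001
```

Higher precision is slower, which is why it makes sense to apply it only in the parts of a calculation where terms of very different magnitude meet.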

This was a great query to work on. Although Ask-JGI didn’t have the same domain knowledge for understanding the physics of the simulation, the researcher worked closely with us to help find a solution. They provided clear and well documented code, understood the likely cause of their problem and identified the solutions that we explored. This work highlights how computational limitations can impact the simulation of physical systems, and demonstrates the value of collaborative problem-solving between domain specialists and data scientists.

Diagram A shows straight arrow lines and B shows curvy arrow lines
Laminar flow (a) in a closed pipe, Turbulent flow (b) in a closed pipe. Image credit: SimScale

Training/course development

The JGI offers training in programming, machine learning and software engineering. We have some standard training courses that we offer, as well as upcoming courses in development and shorter “lunch and learn” sessions on various topics.

Queries have come in to both Ask-JGI and the JGI training mailbox (jgi-training@bristol.ac.uk) asking follow-up questions from training courses that people have attended. Additionally, requests have come through for further training to be developed in specific areas (e.g. natural language processing, advanced data visualisation or LLM usage). The JGI training mailbox is the place to go, but Ask-JGI will happily redirect you!

People sitting at tables in a computer lab looking at a large computer screen at the end of the table
Introduction to Python training session for Bristol Data Week 2025.

Network visualisation

Recently Ask-JGI received a query from a PhD researcher in the School of Geographical Sciences. The Ask-JGI team offered support in exploring visualisation options for the data provided, and produced example network visualisations of the similarity between the geographical distributions of UK industries. A documented code solution was also provided so that the graphs can be customised and extended further. At Ask-JGI, we are happy to help researchers who already have substantive domain knowledge and coding skills to complete small modules of their research output pipeline.
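As a rough idea of how a similarity network like this can be derived, the sketch below turns each industry's regional distribution into a vector and keeps an edge between two industries when their vectors are similar enough. The industries, regional shares, the choice of cosine similarity and the 0.9 threshold are all assumptions made for illustration, not the researcher's data or method.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(shares, threshold=0.9):
    """Edges (industry_a, industry_b, similarity) for pairs above the threshold."""
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(shares.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            edges.append((name_a, name_b, round(sim, 3)))
    return edges


# Made-up share of each industry's employment across four regions.
shares = {
    "finance": [0.60, 0.20, 0.10, 0.10],
    "legal": [0.55, 0.25, 0.10, 0.10],
    "fishing": [0.05, 0.05, 0.30, 0.60],
}
edges = similarity_edges(shares)
print(edges)
```

The resulting edge list can then be passed to any graph-drawing tool, with node colour used to distinguish industries.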

Network made up with lines and dots. Each colour represents a different UK industry
Network visualisation of similarity of UK industry geographical distribution.

Spin Network Optimisation

The aim of this query was to use parallel processing to accelerate the optimization of a spin network: a network of nodes coupled together with certain strengths, which transfers information (spin) from one node to another. The workflow involved a genetic algorithm (written in Fortran and executed via a bash script) and a Python-based gradient ascent algorithm.

Initial efforts focused on parallelizing the gradient ascent step. However, significant challenges arose due to the interaction between the parallelized Python code and the sequential execution of the Fortran-based Spinnet script.
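For context, the general pattern of running independent gradient ascent searches side by side looks something like the sketch below. The objective is a made-up one-dimensional function with a single peak, standing in for the real spin-network fidelity, and none of the names come from the researcher's code. Threads are used to keep the example simple; for CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor

TARGET = 3.0  # made-up optimum of the toy objective


def fidelity(coupling):
    """Toy objective with a single peak at TARGET (stand-in for the real one)."""
    return -(coupling - TARGET) ** 2


def gradient(coupling):
    """Analytical derivative of the toy objective."""
    return -2.0 * (coupling - TARGET)


def gradient_ascent(start, rate=0.1, steps=200):
    """Climb the objective from one starting point; return (position, value)."""
    x = start
    for _ in range(steps):
        x += rate * gradient(x)
    return x, fidelity(x)


def best_of_parallel_runs(starts, workers=4):
    """Run one independent ascent per starting point and keep the best result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(gradient_ascent, starts))
    return max(results, key=lambda result: result[1])


best_x, best_value = best_of_parallel_runs([-5.0, 0.0, 2.0, 10.0])
print(best_x, best_value)
```

This pattern only works cleanly when each run is independent, which is why a sequential external step in the workflow makes parallelization harder.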

Code refactoring was undertaken to improve readability and introduce minor speed enhancements by splitting the Python script into multiple files and grouping similar function calls.

Given the complexity and time investment associated with these code modifications, it was strongly recommended to explore the use of High-Performance Computing (HPC) facilities. Running the current code on an HPC system went on to provide the desired speed improvements without requiring any code changes, as HPC is designed for computationally intensive tasks like this.

Grant development

The Ask-JGI helpdesk is the main place for researchers to contact the JGI for help with grant applications. The JGI can support grant idea development, provide letters of support for applications, and cost in JGI data scientists or research software engineers to support the workload of potential projects. You can read more about how the JGI team can support grant development on the JGI website!

Ask-JGI is recruiting PhD students! 

We are recruiting a new team of PhD students for the Ask-JGI helpdesk to work from October 2025 until September 2026! 

The Jean Golding Institute (JGI) for data science and AI offers a consultancy service to researchers via its Ask-JGI helpdesk. We offer one day of free support to all staff and doctoral students at the University of Bristol, for queries relating to data science, AI, and software engineering. The helpdesk is run by PhD students and supported by the JGI’s own team of data scientists and research software engineers. 

What we’re looking for 

New recruits will be part of a team with overlapping and complementary skills, who will work together to support researchers in a range of ways. 

It is not expected that you will start with all the skills/experience that we are looking for the team to cover, however you should be enthusiastic about continuous learning and working outside your subject area. 

Typical queries (and skills/experience you may want to highlight in your application) include:  

  • Troubleshooting – Collaborating with researchers from different disciplines and of varying expertise, to find out what they need to do to solve their problem. 
  • Study design and planning – Providing statistical advice on experimental design. Identifying potential data hazards and ethical issues. 
  • Data cleaning and management – Helping to develop pipelines to make raw data ready for analysis. Advice on data management plans and data governance. 
  • Data analysis – Recommending or providing support with tools and methods for modelling, AI/machine learning and statistics. This might involve multilevel modelling, bioinformatics, GIS, NLP, random forests, deep learning, use of LLMs, or mixed/qualitative methods. 
  • Programming – Technical support and coding in (primarily) Python or R. But this could include other tools like SQL, MATLAB, SPSS, STATA, NVivo, Excel, C, Rust, Bash scripts etc. Code review and code optimisation. Deployment to HPC. 
  • Best practices – Giving advice on best practices for writing reproducible research code and creating packages. Support with tools like Git, GitHub, virtual environments and Conda, Docker. 
  • Data communication – Help with data visualisation. Providing advice with dashboards or websites. 

Applicants will need to be current full-time PhD students at the University of Bristol and will need to obtain approval from their primary supervisor. It is expected that applicants can commit on average 5-10 hours per month for 12 months. The team rotates responsibilities every fortnight and there are periods with a higher/lower volume of queries, so time commitments can vary throughout the year. 

Expected start date is the week commencing Monday 29 September 2025, working ad-hoc approximately 5-10 hours per month for 12 months. 

What’s in it for you? 

You will gain experience/skills which will be useful for your future research or career outside academia: 

  • Technical skills – learning from one another and developing best-practice skills in data science, AI and research software engineering. 
  • Project management – managing and prioritising multiple queries and allocating them to fellow team members. 
  • Team working – chairing team meetings, minute-taking, and collaborating with other team members on queries. 
  • Communication – sharing your expertise with researchers (of all levels) from different disciplines. 
  • Adaptability – developing and applying your skills to new and difficult problems, outside your immediate subject.  

This is a paid opportunity at Graduate Teacher – Level 1 for PhD students. 

How to apply 

Complete an online application form 

The deadline to apply is Thursday 31 July 2025. We will assess applications at the start of August and hope to communicate a decision in mid-August. 

The JGI aims to make data science, statistics and software engineering expertise accessible to all. We value diversity in our teams and so applicants from communities traditionally under-represented in data science, AI or research software engineering are strongly encouraged to apply. 

If you have any questions about the role, email jgi-reseng@bristol.ac.uk with the subject “Ask-JGI recruitment”. 

Testimonials from Ask-JGI team members 

Headshot of Yujie Dai

“Over the past year, I had the pleasure of working with the Ask-JGI team, and it was a truly enjoyable experience. The team was welcoming and supportive, and I had the opportunity to engage with researchers from a wide range of departments across the university, which broadened my perspective on different fields of study and enhanced my personal skills. I highly recommend joining this team!” – Yujie Dai, Digital Health CDT 

“What I enjoy most about working at the Ask-JGI helpdesk is the chance to connect with and assist researchers from all kinds of academic backgrounds. I may not always have the immediate answer to queries, but what really counts is doing my best to help and being willing to keep learning along the way.” – Yueying Li, PhD student in Genetic Epidemiology 

Headshot of Yueying Li
Headshot of Fahd Abdelazim

“Working with the Ask-JGI service has been incredibly rewarding. I genuinely enjoy contributing directly to researchers’ projects, witnessing the tangible impact of our support. The variety of challenges, from diving into complex data analysis to helping visualize findings, keeps every day engaging and fulfilling.” –  Fahd Abdelazim, PhD student in Interactive AI, specializing in model understanding for Vision-Language models

“Being part of the Ask-JGI team is an excellent opportunity to improve communication skills over statistics/ data science tasks. As PGR students, most of us are accustomed to working within specialized areas of research, it is easy to overlook efforts and skills necessary for collaborating outside of those narrow fields of expertise. I have benefitted from working on the team to improve those skills.” – Mirah Zhang, PhD student in Geographic Data Science 

Headshot of Mirah Zhang
Headshot of Dan Collins

“Working as an Ask-JGI data scientist has been a hugely rewarding experience. Each query involves supporting researchers from diverse specialisms across the University. It’s a great way to expose yourself to different technical challenges and research areas, and to explore new technologies that you haven’t worked with before.” – Daniel Collins, PhD student in Interactive AI focussed on multi agent AI systems