Widening Participation (WP) Research Summer Internships

The Widening Participation (WP) Research Summer Internships provide undergraduates with hands-on experience of research during the summer holidays, with the aim of encouraging a career in research. Interns gain professional experience and knowledge through a funded placement in their chosen subject. The placement also supports applications for postgraduate study and other research jobs.

This year, the JGI was very pleased to support four internships through the WP scheme. Each of the interns has provided valuable support to an array of diverse and interesting projects related to their fields of interest. We are delighted by the feedback that we have received from their project supervisors and look forward to watching their future progress. Read on for more information on their projects and their experience.

Frihah Farooq 

Research poster on ‘Automating the Linkage of Open Access Data for Health Sciences’ by Frihah Farooq

My name is Frihah, and I’m a third-year undergraduate studying Mathematics here at the University of Bristol. My academic interests centre around applied data science and machine learning, and this summer I worked on a project involving the General Practice Workforce dataset published by NHS Digital. My focus was on building tools that could bring accessibility to data that is often scattered and difficult to navigate.

The aim of the project was to automate the downloading and linkage of open-access datasets, specifically in the context of healthcare services. Many of these records are stored in files with inconsistent formats and structures, often requiring manual effort to piece together a consistent narrative. I developed a codebase in R that could search for the appropriate files, extract the relevant information, and construct a complete dataset that can be used for longitudinal analysis without the need for repeated intervention. While the code was built around the workforce dataset, the methodology generalises well to other datasets published by NHS Digital. 
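The linkage step can be illustrated with a short sketch. The project itself was written in R; the Python below is not that codebase, only a minimal illustration of the idea, and the column names, aliases and file contents are all hypothetical. It maps inconsistent column headers onto one canonical name, tags every row with its release period, and stacks the releases into a single table suitable for longitudinal analysis.

```python
import csv
import io

# Hypothetical header variants mapped to one canonical name (assumed for illustration).
COLUMN_ALIASES = {
    "practice_code": "practice_code",
    "prac_code": "practice_code",
    "total_gp_fte": "gp_fte",
    "gp_fte": "gp_fte",
}


def harmonise(raw_csv: str, period: str) -> list[dict]:
    """Read one release, keep and rename known columns, and tag rows with the period."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        clean = {"period": period}
        for name, value in record.items():
            canonical = COLUMN_ALIASES.get(name.strip().lower())
            if canonical is not None:  # unknown columns are dropped
                clean[canonical] = value
        rows.append(clean)
    return rows


def link_releases(releases: dict[str, str]) -> list[dict]:
    """Stack harmonised releases into one long table, ordered by period."""
    linked = []
    for period in sorted(releases):
        linked.extend(harmonise(releases[period], period))
    return linked


# Two made-up releases whose headers differ in case, naming and extra fields.
releases = {
    "2023-03": "PRAC_CODE,TOTAL_GP_FTE\nA001,3.5\nA002,1.0\n",
    "2024-03": "practice_code,gp_fte,extra_field\nA001,4.0,x\n",
}
linked = link_releases(releases)
for row in linked:
    print(row)
```

Keeping the alias table separate from the reading logic means a new header variant only needs one extra entry, which is one way to avoid repeated manual intervention as file formats change.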

One observation from the final merged dataset was a trend of decreasing row counts, likely due to restructuring, alongside an increase in the number of recorded variables, a sign that data collection has grown more sophisticated in recent years. This experience strengthened my foundation in data automation and my ability to work with evolving and imperfect data, skills I know will benefit me as I move further into research.

If you’d like to get in touch, you can reach me at cc22019@bristol.ac.uk 

Grace Gilman 

Hello, my name is Grace Gilman and I am starting my third year studying Computer Science with Artificial Intelligence at the University of Bath. I am hoping to go into academia in the future and pursue computing research specifically with medical applications. You can contact me at gcag20@bath.ac.uk

Over the past six weeks I have been taking part in a research internship here at the University of Bristol, supported by the Jean Golding Institute. I have been working on a data science project called ‘Using AI to Study Gender in Children’s Books’ for the Fair Tales team, supervised by Chris McWilliams. During my internship I experimented with image analysis using ChatGPT and Vertex AI, for future integration into the Data Entry app that Fair Tales is producing to semi-automate character and transcript input. I have also been contributing to the database architecture and to the search and filtering options that let users interact with the database. Some of my work involved analysing the corpus of children’s books using SQL. One pattern I found was that the difference between mother and father characters (1:0.75) is even more pronounced for grandmothers and grandfathers (1:0.5).

During my time at this internship, I have become much more confident in my ability to work on a project and to write code that will be used in a research setting. I have learnt more about how research is conducted and what skills it requires, and have become more sure of an academic future.

Imogen Joseph 

I am currently studying a Neuroscience MSci with a Year in Industry at the University of Bristol. I’m going into my final year, having just completed a placement year in Southampton General Hospital undertaking clinical research in neonatal respiratory physiology. I’m particularly interested in a career in academia and more specifically looking at molecular mechanisms behind disease for drug discovery. 

This summer, under the supervision of Elinor Curnow, I helped develop an R package, ‘midoc’ (Multiple Imputation DOCtor, available on CRAN), designed to guide researchers through analysis with missing data. I created several functions that display a summary table of missing data, alongside optional graphs to visualise the distributions of the missing data. This allows the user to explore what is actually missing, and additionally make inferences on whether missingness is random or related to particular variables.
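To give a flavour of what a missing-data summary table contains, here is a minimal Python sketch. It is not the ‘midoc’ API (that package is written in R); the function name `summarise_missing` and the example records are hypothetical and chosen purely for illustration.

```python
def summarise_missing(rows, missing_values=(None, "")):
    """Return {variable: (n_missing, percent_missing)} for a list of dict records."""
    # Collect variable names in first-seen order.
    variables = []
    for row in rows:
        for name in row:
            if name not in variables:
                variables.append(name)

    total = len(rows)
    summary = {}
    for name in variables:
        n_missing = sum(1 for row in rows if row.get(name) in missing_values)
        percent = round(100 * n_missing / total, 1) if total else 0.0
        summary[name] = (n_missing, percent)
    return summary


# Made-up records: None and empty strings are treated as missing.
records = [
    {"age": 34, "bmi": 22.1, "smoker": "no"},
    {"age": None, "bmi": 27.4, "smoker": ""},
    {"age": 51, "bmi": None, "smoker": "yes"},
    {"age": 29, "bmi": None, "smoker": "no"},
]
summary = summarise_missing(records)
for variable, (count, percent) in summary.items():
    print(f"{variable:<8}{count:>3}{percent:>7}%")
```

A table like this answers "what is missing"; judging whether missingness is related to other variables needs further steps, such as comparing the distribution of observed variables between rows with and without a missing value.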

Before coming into this internship, my R ability was limited to self-teaching via YouTube videos. Ample training was provided in this project but, more than anything, throwing myself in and actually writing code has been so beneficial to my learning. This knowledge is extremely useful for a career in research – I was even able to apply my acquired skills to the work carried out in my placement, and used R to analyse the data I gathered.

I am very grateful for this opportunity given to me under the JGI and will take what I’ve learnt with me into whatever I do next! 

You can contact Imogen at imogenjoseph26@gmail.com 

Sindenyi Bukachi 

Using Big Data to Rethink Children’s Rights (bsindenyi@gmail.com) 

MSci Psychology and Neuroscience, University of Bristol (Year 3) 

Sindenyi Bukachi holding their research poster on 'Investigating attitudes towards children's rights (in education)'

Initially, the project was quite open – the only brief was to explore attitudes towards children’s rights using big data. My early research into Reddit threads, news stories and real-world discourse helped narrow our focus to something more urgent and measurable: children’s right to participation, specifically in educational settings, as both my supervisors are based in the School of Education. This became the foundation for the rest of the project, and my supervisors later decided to take it forward as a grant proposal.

Over the first few weeks, I learned how to do structured literature reviews using academic databases like ERIC, build Boolean search strings, and track findings across a spreadsheet. I explored how participation is talked about and measured, and the themes I identified – like tokenism, power struggles between adults, and the emotional toll of being “heard” but not actually listened to – became central to our research direction. 

In the second half, I moved from qualitative sources to dataset analysis. I used R and RStudio to explore datasets from the UK Data Service. I learned to work with tricky file types (.SAV, .TAB), use new packages, extract variables, visualise trends, and test relationships between predictors — all while thinking critically about how these datasets (often not made for this topic) could reflect participation and children’s agency. 

I’ve gained confidence in data science, research strategy, and independent problem-solving – all skills I’ll take forward into my dissertations and future career. I’m so grateful to Dr Katherin Barg, Professor Claire Fox, and the JGI for the support and trust throughout. 

How to make data science skills stick? Learnings from the OCSEAN project

Written by Catherine Upex and Rachel Wood

Visiting researchers from the OCSEAN project (from left to right: Sena Darmasetiyawan (Udayana University); John Calorio (Davao Medical School Foundation); Komang Sumaryana (Udayana University); Chris Kinipi (University of Papua New Guinea); Wahyu Widiatmika (Udayana University); Dendi Wijaya (Jakarta University))

Introduction

Earlier this summer, the University of Bristol and the JGI welcomed a group of visiting researchers from the “Oceanic and Southeast Asian Navigators” (OCSEAN) project. OCSEAN is a worldwide interdisciplinary consortium researching the demographic history of ancient seafarers across Oceania and Southeast Asia. The visiting humanities researchers from Indonesia and the Philippines arrived in Bristol with the aim of learning more about quantitative methods and how to apply them to their research, and of taking these skills home to help their research communities do the same.

When asked, most said they had little to no knowledge or experience in coding. The task therefore was to design a training approach to help them feel confident independently using Python for research – all in the space of a few weeks.

Our Approach

The training style followed a traditional workshop format, but importantly with two instructors. This allowed one to talk through the course content, and the other to provide one-to-one help to individuals. Initially, the sessions consisted of lecture-style teaching, but as confidence grew, they transitioned to a more independent format, where small groups collaborated to solve data science problems directly related to their research interests.

As most participants had no prior coding experience, it was important not to assume any knowledge of technical terms. Over eight two-hour sessions spanning three weeks, the training slowly built up coding knowledge, covering the following topics:

  • Introduction to Python (e.g. variables, data types, operators, lists, dictionaries)
  • Intermediate concepts (e.g. using/writing functions, loops, conditional statements)
  • How to use chatbots for coding (e.g. how to write good prompts, refine responses, when/when not to use, error handling, and sanity checking)
  • Data analysis (e.g. loading/cleaning data, plotting using seaborn and matplotlib, summarising data)

The training also coincided with Bristol Data Week 2025, so the OCSEAN researchers had the opportunity to cement their knowledge by revisiting concepts in similar training sessions from the event.

Comparing training styles

The approach differed from a recent pilot training scheme run by JGI Research Data Science Advocates. The aim of the pilot was to run training on data analysis in Python in a low-stress environment, via a self-led approach. Participants were supplied with materials to work through independently, with optional contact time with facilitators.

Both training styles were designed for researchers with no prior coding experience. It was interesting to see how the hands-on and hands-off approaches compared in order to understand how to most effectively encourage engagement with data science.

Feedback from OCSEAN researchers

By the end of our training period, all the OCSEAN researchers said that they found the training very beneficial for their research. Many acknowledged that they found learning Python challenging. However, the format of the sessions, especially the opportunity to draw upon help from not only facilitators but also ChatGPT, and importantly each other, allowed them to get to grips with new concepts. Intensive, successive training sessions with a clear syllabus were seen as more beneficial than one-off unconnected sessions.

The importance of structured training was echoed by feedback from the self-led pilot training. Here, participants highlighted that despite a self-led approach being easier to fit into a working week, they would have benefitted from group discussions and the opportunity to compare their results with others. Additionally, while most of the self-led participants agreed that the pilot scheme facilitated their learning outcomes and expressed a desire to apply what they learnt to their work, some commented that they lacked the basic understanding of Python needed to apply these skills independently.

Importantly, OCSEAN researchers commented on how it wasn’t just the training structure that facilitated learning. Aspects such as the use of a small meeting room and the inclusion of regular breaks further encouraged collaboration between participants and drove better understanding. Additionally, the use of datasets adapted to participants’ research fields made coding seem much more accessible and engaging. This highlighted how important a supportive and personalised teaching environment is for fully grasping complex new concepts.

Training attendees with their course completion certificates; featured with training facilitators from the University of Bristol: Dr Dan Lawson (Associate Professor of Data Science and member of OCSEAN project; School of Mathematics), Rachel Wood (PhD student; School of Mathematics); Catherine Upex (PhD student; Bristol Medical School)

Reflections and moving forward

This training was facilitated by two PhD students developing their own teaching skills, and the experience taught the team a lot about what makes effective data science training. To feel confident in independently using data science, intensive face-to-face training is needed to make sure basic coding skills are cemented. This can be difficult for many to fit in, but a weekly commitment, combined with a hands-on collaborative atmosphere, can effectively drive key concepts home.

Additionally, to drive engagement, particularly from disciplines with little data science background, it is important to tailor training to specific research questions in that field, for example by using relevant datasets. This way, participants can see how data science can help them in their own research and be more inspired to try it for themselves.

So, what’s next? The aim of this training was to provide OCSEAN researchers with data science skills to apply to their own research. It’s been brilliant to see that some have already taken this leap. Using their coding skills and connections made in Bristol, many are developing new projects, applying for PhD positions and forming future collaborations. In the autumn, the team plan to travel to Bali to help OCSEAN researchers share coding skills with their research communities, as well as to develop more research collaborations.



Learn more about the OCSEAN project here or contact Daniel Lawson (Dan.Lawson@bristol.ac.uk) or Monika Karmin (monika.karmin@ut.ee) for more information.

From aerosol particles to network visualisation: Data science support enhancing research at the University of Bristol

Ask-JGI Example Queries from Faculty of Science and Engineering

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website.

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Science and Engineering here at the University of Bristol.

Aerosol particles

A researcher approached us with Python code they’d written for simulating radioactive aerosol particle dynamics in a laminar flow. For particles smaller than 10 nanometers, they observed unexplained error “spikes” when comparing numerical to analytical results, suggesting that numerical precision errors were accumulating due to certain forces being orders of magnitude smaller than others for the tiny particles.

We provided documentation and advice for implementing higher-precision arithmetic using Python’s ‘mpmath’ library so that the researcher could use their domain knowledge to increase precision in critical calculation areas, balancing computational cost with simulation accuracy. We also wrote code to normalise the magnitude of different forces to similar scales to prevent smaller values from being lost in the calculation.
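The underlying effect is easy to reproduce. The sketch below is illustrative only: it uses Python’s built-in `decimal` module as a stand-in for ‘mpmath’ (so that it runs without extra installs), and the two force values are hypothetical numbers, not taken from the researcher’s simulation. It shows a small term being absorbed by a much larger one in ordinary double precision, and surviving once more significant digits are carried.

```python
from decimal import Decimal, getcontext

# Hypothetical magnitudes: one force is 17 orders of magnitude smaller than the other.
drag = 1.0
diffusion_kick = 1e-17

# In 64-bit floats the small term is lost: 1.0 + 1e-17 rounds back to exactly 1.0.
float_residual = (drag + diffusion_kick) - drag

# Carrying 50 significant digits, the same calculation keeps the small term.
getcontext().prec = 50
decimal_residual = (Decimal("1.0") + Decimal("1e-17")) - Decimal("1.0")

print(float_residual)    # 0.0
print(decimal_residual)  # 1E-17
```

Higher precision is slower than native floats, which is why it makes sense to apply it only in the critical parts of a calculation, or to rescale quantities so that they sit at similar magnitudes in the first place.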

This was a great query to work on. Although Ask-JGI didn’t have the same domain knowledge for understanding the physics of the simulation, the researcher worked closely with us to help find a solution. They provided clear and well documented code, understood the likely cause of their problem and identified the solutions that we explored. This work highlights how computational limitations can impact the simulation of physical systems, and demonstrates the value of collaborative problem-solving between domain specialists and data scientists.

Diagram A shows straight arrow lines and B shows curvy arrow lines
Laminar flow (a) in a closed pipe, Turbulent flow (b) in a closed pipe. Image credit: SimScale

Training/course development

The JGI offers training in programming, machine learning and software engineering. We have some standard training courses that we offer, as well as upcoming courses in development and shorter “lunch and learn” sessions on various topics.

Queries have come in to both Ask-JGI and the JGI training mailbox (jgi-training@bristol.ac.uk) asking follow-up questions from training courses which people have attended. Additionally, requests have come through for further training to be developed in specific areas (e.g. natural language processing, advanced data visualisation or LLM usage). The JGI training mailbox is the place to go, but Ask-JGI will happily redirect you!

People sitting at tables in a computer lab looking at a large computer screen at the end of the table
Introduction to Python training session for Bristol Data Week 2025.

Network visualisation

Recently Ask-JGI received a query from a PhD researcher in the School of Geographical Sciences. The Ask-JGI team offered support on exploring visualisation options for the data provided, and produced example network visualisations of the similarity of UK industries’ geographical distributions. A documented code solution was also provided so that the graphs can be customised and extended further. At Ask-JGI, we are happy to help researchers who are already equipped with substantive domain knowledge and coding skills to complete small modules of their research output pipeline.
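As a rough sketch of how a similarity network like this can be built, the Python below turns each industry’s regional distribution into a vector, scores every pair of industries, and keeps the pairs above a threshold as weighted edges. The similarity measure (cosine similarity), the threshold and the data are all assumptions made for illustration; the blog does not say which measure or tooling was used for the researcher’s query.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_edges(distributions, threshold):
    """Return (industry_a, industry_b, weight) for every pair at or above the threshold."""
    edges = []
    for left, right in combinations(sorted(distributions), 2):
        weight = cosine_similarity(distributions[left], distributions[right])
        if weight >= threshold:
            edges.append((left, right, round(weight, 3)))
    return edges


# Made-up shares of each industry's activity across three regions.
industry_by_region = {
    "finance": [0.7, 0.2, 0.1],
    "legal": [0.6, 0.3, 0.1],
    "fishing": [0.0, 0.1, 0.9],
}
edges = similarity_edges(industry_by_region, threshold=0.9)
print(edges)
```

The resulting edge list can be passed to whichever graph-drawing library is preferred, with the weight controlling line thickness or node proximity.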

Network made up with lines and dots. Each colour represents a different UK industry
Network visualisation of similarity of UK industry geographical distribution.

Spin Network Optimisation

The aim of this query was to use parallel processing to accelerate the optimisation of a spin network: a network of nodes coupled together with a certain strength, which transfers information (spin) from one node to another. The workflow involved a genetic algorithm (written in Fortran and executed via a bash script) and a Python-based gradient ascent algorithm.

Initial efforts focused on parallelizing the gradient ascent step. However, significant challenges arose due to the interaction between the parallelized Python code and the sequential execution of the Fortran-based Spinnet script.
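To show where parallelism can enter a gradient ascent step, here is a minimal, self-contained sketch. It is not the researcher’s code: the objective `fidelity` is a hypothetical stand-in for the spin network’s figure of merit, and the gradient is estimated by finite differences so that each partial derivative becomes an independent task. A thread pool is used only to keep the example simple; for CPU-bound Python code, real speed-ups would need processes or separate HPC jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def fidelity(couplings):
    """Hypothetical objective with its maximum at couplings == [1.0, 2.0]."""
    return -((couplings[0] - 1.0) ** 2 + (couplings[1] - 2.0) ** 2)


def partial_derivative(objective, point, index, step=1e-6):
    """Central finite-difference estimate of d(objective)/d(point[index])."""
    up = list(point)
    up[index] += step
    down = list(point)
    down[index] -= step
    return (objective(up) - objective(down)) / (2 * step)


def gradient_ascent(objective, start, rate=0.1, iterations=200):
    """Climb the objective, evaluating the partial derivatives concurrently."""
    point = list(start)
    with ThreadPoolExecutor() as pool:
        for _ in range(iterations):
            # Each coordinate's derivative is independent of the others.
            gradient = list(
                pool.map(
                    lambda i: partial_derivative(objective, point, i),
                    range(len(point)),
                )
            )
            point = [p + rate * g for p, g in zip(point, gradient)]
    return point


best = gradient_ascent(fidelity, [0.0, 0.0])
print([round(value, 4) for value in best])
```

Each iteration still depends on the previous one, so only the work inside a step can be spread out. That is one reason a sequential external step, like the Fortran script here, can limit how much parallelising the Python side helps.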

Code refactoring was undertaken to improve readability and introduce minor speed enhancements by splitting the Python script into multiple files and grouping similar function calls.

Given the complexity and time investment associated with these code modifications, we strongly recommended exploring the use of High-Performance Computing (HPC) facilities. Running the existing code on an HPC system went on to provide the desired speed improvements without requiring any code changes, as HPC is designed for computationally intensive tasks like this.

Grant development

The Ask-JGI helpdesk is the main way for researchers to contact the JGI for help with grant applications. The JGI can support grant idea development, provide letters of support for applications, and cost in JGI data scientists or research software engineers to support the workload of potential projects. You can read more about how the JGI team can support grant development on the JGI website!

Using ‘The Cloud’ to enhance UoB laboratory data security, storage, sharing, and management

JGI Seed Corn Funding Project Blog 2023/24: Peter Martin, Chris Jones & Duncan Baldwin

Introduction

As a world-leading research-intensive institution, the University of Bristol houses a multi-million-pound array of cutting-edge analytical equipment of all types, ages, function, and sensitivity – distributed across its Schools, Faculties, Research Centres and Groups, as well as in dozens of individual labs. However, as more and more data are captured – how can it be appropriately managed to comply with the needs of both researchers and funders alike?  

What were the aims of the seed corn project? 

When an instrument is purchased, the associated computing, data storage/resilience, and post-capture analysis are seldom, if ever, considered beyond the standard Data Management Plans.

Before this project, there existed no centralised or officially endorsed mechanism at UoB, supported by IT Services, to manage long-term instrument data storage and internal/external access to this resource – every group, lab, and facility individually managed its own data retention, access, archiving, and security policies. This is not just a UoB challenge, but one that is endemic to the entire research sector. As the value of data is now becoming universally realised, not just in academia but across society, the challenge is more pressing than ever: an institution-wide solution to the entire data challenge is critically required, one that would be readily exportable to other universities and research organisations. At its core, this Seed Corn project sought to develop a ‘pipeline’ through which research data could be: (1) securely stored within a unified online environment/data centre in perpetuity, and (2) accessed via an intuitive, streamlined and equally secure online ‘front-end’ such as Globus, akin to how OneDrive and Google Drive seamlessly facilitate document sharing.

What was achieved? 

The Interface Analysis Centre (IAC), a University Research Centre in the School of Physics, currently operates a large and ever-growing suite of surface and materials science equipment with considerable numbers of both internal (university-wide) and external (industry and commercial) users. Over the past six months, working with leading solution architects, network specialists, and security experts at Amazon Web Services (AWS), the IAC/IT Services team have successfully developed a scalable data warehousing system that has been deployed within an autonomous segment of the UoB’s network. This removes the need to keep single-copy data stored locally (at significant risk) and to move it around on portable HDDs or by email across the network. In addition to efficiently “getting the data out” from within the UoB network, using native credential management within Microsoft Azure/AWS, the team have developed a web-based front-end akin to Google Drive/OneDrive where specific experimental folders can be securely shared with specific users – compliant with industry and InfoSec standards. The proof of the pudding has been the positive feedback received from external users visiting the IAC, all of whom have been able to access their experiment data immediately following the conclusion of their work without the need to copy GBs or TBs of data onto external hard drives!

Future plans for the project 

The success of the project has not only highlighted how researchers and various strands within UoB IT Services can together develop bespoke systems utilising both internal and external capabilities, but also how even a small amount of Seed Corn funding such as this can deliver the start of something powerful and exciting. Following the delivery of a robust ‘beta’ solution between the Interface Analysis Centre (IAC) labs and AWS servers, it is currently envisaged that the roll-out and expansion of this externally-facing research storage gateway facility will continue, with the support of IT Services, to other centres and instruments. Given the large amount of commercial and external work performed across the UoB, such a platform will hopefully enable and underpin data management across the University going forwards – adopting a scalable and proven cloud-based approach.


Contact details and links

Dr Peter Martin & Dr Chris Jones (Physics) peter.martin@bristol.ac.uk and cj0810@bristol.ac.uk 

Dr Duncan Baldwin (IT Services) d.j.baldwin@bristol.ac.uk  

Ask-JGI Example Queries from Faculty of Health and Life Sciences 

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website (https://www.bristol.ac.uk/golding/supporting-your-research/data-science-support/). 

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Health and Life Sciences here at the University of Bristol. 

AI prediction on video data 

Example of AI video prediction using video data taken from the EPIC-KITCHENS-100 study. The image shows qualitative results of action detection. Predictions with confidence > 0.5 are shown with colour-coded class labels.

One particularly interesting query came from a PhD researcher with no prior experience in programming or AI. She was exploring the idea of using AI to predict how long doctors at different skill levels would need to train on medical simulators to reach advanced proficiency. Drawing inspiration from aviation cockpit simulators, her project involved analysing simulation videos to make these predictions. We provided guidance on the feasibility of using AI for this task, suggesting approaches that would depend on the availability of annotated data and introducing her to relevant computer vision techniques. We also recommended Python as a starting point, along with resources to help her build foundational skills. It was exciting to help someone new to AI navigate the early stages of their project and explore how AI could contribute to improving medical training. 

Species Classification with ML 

Bemisia tabaci (MED) (silverleaf whitefly); two adults on a watermelon leaf. Image by Stephen Ausmus.

Another engaging query came from a researcher in biological sciences aiming to classify different species of plant pest insects (Bemisia tabaci and two others) based on flight data. Her goal was not only to build machine learning classifiers but also to understand how different features contributed to species differentiation across various methods.

She approached the Ask-JGI data science support for guidance on refining her code and ensuring the accuracy of her analysis. We helped restructure the code to make it more modular and reusable, while also addressing bugs and improving its reliability. Additionally, we worked with her to create visualizations that provided clearer insights into model performance and feature importance. This collaboration was a great example of how machine learning can be applied to advancing research in ecological data analysis.  

Providing guidance for HPC, RDSF, and statistical software users 

High performance computing (HPC) and the Research Data Storage Facility (RDSF) have been used by an increasing number of people at our university. We also recommend them to students and staff when these tools align with their projects’ needs. However, getting started can be challenging—each system has its own frameworks, rules, and workflows. Researchers often find themselves overwhelmed by extensive training materials or stuck on specific technical issues that aren’t easily addressed.  

We provide tailored guidance to make these tools more accessible and practical for our clients, which includes troubleshooting, script modifications, and directing researchers to relevant university services. 

Additionally, this year’s Ask-JGI Helpdesk has brought together experienced users of SPSS, Stata, R, and Python. For researchers transitioning to new statistical software or adapting their workflows, we’ve helped them navigate the subtle differences in syntax across platforms and achieve their analysis goals. 

Handling Group-Level Variability in Quantitative Effects: A Multilevel Modelling Perspective

A visualisation of a multilevel model, original figure produced by JGI Data Scientist, Dr Leo Gorman.

We had a client who was researching differences in fluorescence intensity. These differences may be due to factors such as antibody lot variation, differences in handling between researchers, or biological heterogeneity. This raises the question: how should such data be represented to ensure meaningful interpretation without misrepresenting the underlying biological processes? One of the key solutions that we recommend is multilevel modelling.

Modelling fluorescence intensity at one or multiple levels (e.g., individual, batch, researcher) can help distinguish biological effects from biases. For example, by applying mixed effects, we can account for between-individual variation in baseline fluorescence levels (random intercept), as well as differential responses to experimental conditions (random slope). However, multilevel modelling can be limited by the group-level sample size. If this is the case, as we discussed with the client, there is no need to go as far as fitting multilevel models. With so few groups, alternative strategies such as correcting standard errors and introducing dummy variables can control for group-level variation and achieve similar performance.
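As a sketch of the random-intercept and random-slope model described above, it can be written out in symbols. The notation is an assumption introduced here for illustration, not taken from the client’s analysis: $y_{ij}$ is the fluorescence intensity of measurement $i$ from individual $j$, and $x_{ij}$ is the experimental condition.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\beta_0$ and $\beta_1$ are the average baseline and average response, $u_{0j}$ is individual $j$’s deviation in baseline (the random intercept) and $u_{1j}$ is its deviation in response to the condition (the random slope). The dummy-variable alternative replaces $u_{0j}$ with a separate fixed coefficient for each group, which avoids estimating $\Sigma$ from only a handful of groups.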