Using ‘The Cloud’ to enhance UoB laboratory data security, storage, sharing, and management

JGI Seed Corn Funding Project Blog 2023/24: Peter Martin, Chris Jones & Duncan Baldwin

Introduction

As a world-leading research-intensive institution, the University of Bristol houses a multi-million-pound array of cutting-edge analytical equipment of all types, ages, functions, and sensitivities, distributed across its Schools, Faculties, Research Centres and Groups, as well as in dozens of individual labs. However, as more and more data are captured, how can they be appropriately managed to meet the needs of researchers and funders alike?

What were the aims of the seed corn project? 

When an instrument is purchased, the associated computing, data storage/resilience, and post-capture analysis are seldom, if ever, considered beyond the standard Data Management Plans.

Before this project, UoB had no centralised or officially endorsed mechanism, supported by IT Services, to manage long-term instrument data storage and internal/external access to it. Every group, lab, and facility individually managed its own data retention, access, archiving, and security policies. This is not just a UoB challenge, but one that is endemic to the entire research sector. As the value of data is now being recognised not just in academia but across society, the challenge is more pressing than ever. An institution-wide solution is critically required, ideally one that is readily exportable to other universities and research organisations. At its core, this Seed Corn project sought to develop a ‘pipeline’ through which research data could be: (1) securely stored within a unified online environment/data centre in perpetuity, and (2) accessed via an intuitive, streamlined and equally secure online ‘front-end’ such as Globus, akin to how OneDrive and Google Drive seamlessly facilitate document sharing.

What was achieved? 

The Interface Analysis Centre (IAC), a University Research Centre in the School of Physics, currently operates a large and ever-growing suite of surface and materials science equipment with considerable numbers of both internal (university-wide) and external (industry and commercial) users. Over the past six months, working with leading solution architects, network specialists, and security experts at Amazon Web Services (AWS), the IAC/IT Services team have developed a scalable data warehousing system that has been deployed within an autonomous segment of the UoB network. As a result, data no longer need to be held as a single copy stored locally (at significant risk), carried on portable hard drives, or emailed across the network. In addition to efficiently “getting the data out” from within the UoB network, the team have used native credential management within Microsoft Azure/AWS to develop a web-based front-end, akin to Google Drive/OneDrive, where specific experimental folders can be securely shared with specific users in a way that is compliant with industry and InfoSec standards. The proof of the pudding has been the positive feedback received from external users visiting the IAC, all of whom have been able to access their experiment data immediately after finishing their work, without needing to copy gigabytes or terabytes of data onto external hard drives!

Future plans for the project 

The success of the project has highlighted not only how researchers and various strands within UoB IT Services can together develop bespoke systems utilising both internal and external capabilities, but also how even a small amount of Seed Corn funding such as this can deliver the start of something powerful and exciting. Following the delivery of a robust ‘beta’ solution between the Interface Analysis Centre (IAC) labs and AWS servers, it is currently envisaged that the roll-out and expansion of this externally facing research storage gateway will continue, with the support of IT Services, to other centres and instruments. Given the large amount of commercial and external work performed across UoB, such a platform will hopefully enable and underpin data management across the University going forwards, adopting a scalable and proven cloud-based approach.


Contact details and links

Dr Peter Martin & Dr Chris Jones (Physics) peter.martin@bristol.ac.uk and cj0810@bristol.ac.uk 

Dr Duncan Baldwin (IT Services) d.j.baldwin@bristol.ac.uk  

Ask-JGI Example Queries from Faculty of Health and Life Sciences 

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website (https://www.bristol.ac.uk/golding/supporting-your-research/data-science-support/). 

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Health and Life Sciences here at the University of Bristol. 

AI prediction on video data 

Example of AI video prediction using video data taken from the EPIC-KITCHENS-100 study. The image shows qualitative results of action detection. Predictions with confidence > 0.5 are shown with colour-coded class labels.

One particularly interesting query came from a PhD researcher with no prior experience in programming or AI. She was exploring the idea of using AI to predict how long doctors at different skill levels would need to train on medical simulators to reach advanced proficiency. Drawing inspiration from aviation cockpit simulators, her project involved analysing simulation videos to make these predictions. We provided guidance on the feasibility of using AI for this task, suggesting approaches that would depend on the availability of annotated data and introducing her to relevant computer vision techniques. We also recommended Python as a starting point, along with resources to help her build foundational skills. It was exciting to help someone new to AI navigate the early stages of their project and explore how AI could contribute to improving medical training. 

Species Classification with ML 

Bemisia tabaci (MED) (silverleaf whitefly); two adults on a watermelon leaf. Image by Stephen Ausmus.

Another engaging query came from a researcher in biological sciences aiming to classify different species of plant pest insects (Bemisia tabaci and two others) based on flight data. Her goal was not only to build machine learning classifiers but also to understand how different features contributed to species differentiation across various methods.

She approached the Ask-JGI data science support for guidance on refining her code and ensuring the accuracy of her analysis. We helped restructure the code to make it more modular and reusable, while also addressing bugs and improving its reliability. Additionally, we worked with her to create visualizations that provided clearer insights into model performance and feature importance. This collaboration was a great example of how machine learning can be applied to advancing research in ecological data analysis.  

Providing guidance for HPC, RDSF, and statistical software users 

High performance computing (HPC) and the Research Data Storage Facility (RDSF) have been used by an increasing number of people at our university. We also recommend them to students and staff when these tools align with their projects’ needs. However, getting started can be challenging—each system has its own frameworks, rules, and workflows. Researchers often find themselves overwhelmed by extensive training materials or stuck on specific technical issues that aren’t easily addressed.  

We provide tailored guidance to make these tools more accessible and practical for our clients, which includes troubleshooting, script modifications, and directing researchers to relevant university services. 

Additionally, this year’s Ask-JGI Helpdesk has brought together experienced users of SPSS, Stata, R, and Python. For researchers transitioning to new statistical software or adapting their workflows, we’ve helped them navigate the subtle differences in syntax across platforms and achieve their analysis goals. 

Handling Group-Level Variability in Quantitative Effects: A Multilevel Modelling Perspective

A visualisation of a multilevel model, original figure produced by JGI Data Scientist, Dr Leo Gorman.

We had a client who was researching differences in fluorescence intensity. Such differences may be due to factors such as antibody lot variation, differences in handling between researchers, or biological heterogeneity. This raises the question: how should such data be represented to ensure meaningful interpretation without misrepresenting the underlying biological processes? One of the key solutions that we recommend is multilevel modelling.

Modelling fluorescence intensity at one or multiple levels (e.g., individual, batch, researcher) can help distinguish biological effects from biases. For example, by applying mixed effects, we can account for between-individual variation in baseline fluorescence levels (random intercept), as well as differential responses to experimental conditions (random slope). However, multilevel modelling can be limited by the group-level sample size, that is, by having only a few groups. If this is the case, as we discussed with the client, we don’t need to go as far as fitting multilevel models. To control for group-level variation with so few groups, we can use alternative strategies, such as correcting standard errors and introducing dummy variables, to achieve similar performance.
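As a rough illustration of the dummy-variable alternative, the sketch below compares a pooled regression slope with a within-group slope. Subtracting each group's mean from the predictor and the outcome gives the same slope as least squares with one dummy variable per group. The batch labels, dose values, and intensities are made-up numbers for illustration only, and the function names are hypothetical; this is not the analysis carried out for the client.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """Ordinary least-squares slope of y on x, ignoring any grouping."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def within_group_slope(groups, x, y):
    """Slope of y on x after removing each group's own baseline.

    Demeaning x and y within each group yields the same slope as
    least squares with one dummy variable per group.
    """
    # Accumulate per-group sums of x, sums of y, and counts.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1

    sxy = 0.0
    sxx = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, n = totals[g]
        dx = xi - sum_x / n  # deviation from the group's mean x
        dy = yi - sum_y / n  # deviation from the group's mean y
        sxy += dx * dy
        sxx += dx * dx
    if sxx == 0:
        raise ValueError("x does not vary within any group")
    return sxy / sxx


# Hypothetical data: three batches with baselines 100, 140 and 180,
# and a built-in effect of +5 intensity units per unit of dose.
batches = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
dose = [0, 1, 2, 1, 2, 3, 2, 3, 4]
intensity = [100, 105, 110, 145, 150, 155, 190, 195, 200]

pooled = pooled_slope(dose, intensity)
within = within_group_slope(batches, dose, intensity)
print(f"pooled slope: {pooled:.1f}")        # pooled slope: 25.0
print(f"within-batch slope: {within:.1f}")  # within-batch slope: 5.0
```

In this toy data, batches with higher baselines also received higher doses, so the pooled slope mixes the batch effect into the dose effect, while the within-batch slope recovers the per-unit change built into the data.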

The Turing Seminars 2024-2025

From November 2024 to April 2025, we (the Turing Liaison Team at Bristol) ran a fruitful Turing Seminar Series. The series featured academics connected to the Turing Institute, speaking about their cutting-edge research in data science and AI.

From Machine Learning, Large Language Models and Digital Twins, to early prediction of dementia, disambiguation in historical texts and evolutionary biology, the range of speaker specialisms reflected the breadth of research at Bristol in this space, reaching academics and early career researchers across the institution.

Below is a list of the talks and speakers:

Wednesday 6 November:

  • Title: Machine Learning and Dynamical Systems meet in Reproducing Kernel Hilbert Spaces
  • Speaker: Boumediene Hamzi, Marie Curie Fellow, Imperial College London.

Wednesday 20 November:

  • Title: Trustworthy Digital Twins: designing, developing, and deploying open and reproducible pipelines
  • Speaker: Chris Burr, Head of the Innovation and Impact Hub, Turing Research and Innovation Cluster for Digital Twins, Alan Turing Institute

Wednesday 4 December:

  • Title: What can your shopping basket say about your health?
  • Speaker: Anya Skatova, Senior Research Fellow, Bristol Medical School (PHS)

Wednesday 15 January:

  • Title: AI-guided tools for early prediction of brain and mental health disorders
  • Speaker: Zoe Kourtzi, Professor of Computational Cognitive Neuroscience, University of Cambridge

Wednesday 12 February:

  • Title: Temporal models for Word Sense Disambiguation in historical texts
  • Speaker: Barbara McGillivray, Lecturer in Digital Humanities and Cultural Computation, King’s College London

Wednesday 26 February:

  • Title: “If you can’t tell, does it matter?” What should the law say about humanlike AI?
  • Speaker: Colin Gavaghan, Professor of Digital Futures, Bristol Digital Futures Institute, University of Bristol

Wednesday 12 March:

  • Title: “Cognition-first evolution”
  • Speaker: Richard Watson, Professor (evolutionary biology and computer science), University of Southampton

Wednesday 26 March:

  • Title: “Big data as propeller for dynamic and time-sensitive service industries: a tourism sector perspective.”
  • Speaker: Nikolaos Stylos, Associate Professor in Marketing and Digital Innovation, Business School, University of Bristol

Wednesday 9 April:

  • Title: Can large language models reason about qualitative spatial information?
  • Speaker: Robert Blackwell, Senior Research Associate, Alan Turing Institute

These seminars have connected external researchers with relevant academics and departments at Bristol, and we have already seen these connections turn into longer-term collaborations. After the talk by Chris Burr, Alan Turing Institute, we organised a workshop between the Alan Turing Institute and the Bristol Digital Futures Institute (BDFI). This workshop provided an insight into digital twin projects run by both institutes, as well as facilitating connections. We took visitors on a tour of BDFI to show the incredible facilities, namely the Reality Emulator – the world’s first large-scale digital twin facility. Staff then went into roundtable discussions delving into shared areas of interest and what a longer-term collaboration could look like.

Over the series, we had 184 internal and external attendees, with 80% feeding back that they found the information / content provided during the event helpful. We are planning on running another series in the 2025-2026 academic year, building on our momentum and further increasing our external and internal networks.

If you have any suggestions of who you would like to see speak as part of next year’s series, please contact Isabelle Halton, Turing Liaison Manager – uob-turing@bristol.ac.uk

You can find out more about Turing events and opportunities at Bristol, including the previous Turing Seminar talks and slides on the Turing web pages.

Dr Leon Danon appointed as Director of the Jean Golding Institute

Dr Leon Danon

Dr Leon Danon has been appointed as the new Director of the Jean Golding Institute. Leon is an Associate Professor in Infectious Disease Modelling and Data Analytics in the School of Engineering Mathematics and Technology. He has a PhD in Statistical Physics of Complex Networks but has been working on epidemiology of infectious diseases since 2004. His work combines mathematics, data science and AI with an understanding of behavioural and biological drivers of disease spread to solve pressing problems in public health. 

He is Director of Modelling and Data at the Bristol Vaccine Centre, working with clinicians, immunologists, and statisticians on the epidemiology of vaccine-preventable infections.

During COVID-19 he served on SPI-M-O, the modelling subgroup of SAGE, contributing to scientific advice to government on mitigation policies. He was part of the group awarded the Weldon Memorial Prize for this work, as well as the SPI-M-O Award for Modelling and Data Support (SAMDS). He has secured research funding totalling over £25M and continues to work at the science-policy interface.

Leon Danon said: “I’m delighted to be joining the Jean Golding Institute as Director. Having been at Bristol since 2021, I’ve had the opportunity to contribute to the University’s vibrant research environment, particularly in infectious disease epidemiology. I’m now very excited to lead the JGI, building on its existing strengths to drive interdisciplinary data science and AI initiatives across a broad range of activity within the University. The JGI is the ideal setting for deepening existing collaborations across faculties and external partners, as well as building new ones, and I’m eager to get started.”

He continued: “I look forward to working with you all to support the continued success and growth of the Institute, and to support the University’s ambitions in high-impact research and innovation, sustainability, industry and policy partnerships, and local engagement, raising our global leadership and reputation.”

Dr Leon Danon will commence his role as Director of the Jean Golding Institute on 1 May 2025.

Meet the Ask-JGI team – Adrianna, Fahd, Yujie & Huw

The new Ask-JGI helpdesk cohort started in September 2024 and have been busy answering queries from researchers across the university! We introduced half of the team in our January blog. Meet the other half of the team below:

Adrianna Jezierska (she/her) – Ask-JGI PhD Student

Adrianna Jezierska, PhD candidate in the School of Business

I’m a PhD student at the University of Bristol Business School. My project focuses on social media influencers and their vegan content on YouTube. Using language derived from video transcripts, I analyse to what extent they legitimise veganism so that it becomes popular and desirable in society. Whilst most organisation and management scholars have developed theories based on qualitative data, resulting in small datasets and case study approaches, in my work, I highlight the role of computational social sciences and big data in helping social scientists answer their research questions.

Coming from a social science background, I was initially hesitant about joining the Ask-JGI team. However, this decision has turned out to be the most rewarding and challenging experience. Being part of the team is a continuous learning journey. The questions we receive span various disciplines, often pushing us out of our comfort zones. The most exciting part of the job is the opportunity to communicate with other researchers and receive their positive feedback. We also constantly collaborate with other team members and learn from each other, which makes it a very supportive environment. I’m pleased to see more queries from social scientists and humanities researchers. The growing popularity of computational approaches and the shift towards interdisciplinary research is a trend that I find inspiring and exciting.

Fahd Abdelazim (he/him) – Ask-JGI PhD Student

Fahd Abdelazim, PhD student on the Interactive AI CDT in the School of Computer Science

I am a PhD student in the Interactive Artificial Intelligence CDT, specializing in model understanding for Vision-Language models. My research focuses on introducing improvements to Vision-Language models that allow for better linking of specific ideas or attributes to physical items, in order to help models recognize and understand the properties of objects in images.

I first heard of the Ask-JGI team through fellow PhD students, and it was recommended to me as a way to apply data science skills to real-world applications. Joining the Ask-JGI helpdesk has been a unique experience where I’ve been able to delve into various domains and learn about topics that I would otherwise not have had the chance to learn about. The team truly values cross-functional collaboration and encourages tackling new challenges and learning on the job.

Working at Ask-JGI is incredibly rewarding. I enjoy the diversity of challenges presented by each query, which gives me the chance to improve as a data scientist and gain a better understanding of how data science can help improve academic research. I really enjoy the collaborative spirit within the team. The Ask-JGI team are from many different disciplines, and interacting with them allows for interesting exchanges of ideas and problem-solving approaches. This allows me to grow not just as a data scientist but as a researcher as well.

Yujie Dai (she/her) – Ask-JGI PhD Student

Yujie Dai, PhD student in the Digital Health and Care CDT

I am a PhD student in the Digital Health and Care CDT, specializing in population health data science. My research focuses on leveraging large-scale real-world health data to address critical challenges in infectious diseases. Specifically, I utilize explainable AI (XAI) techniques to characterize and diagnose diseases, aiming to bridge the gap between data science and public health.

My journey with Ask-JGI began with a recommendation from a friend who was previously part of the team. They spoke highly of the collaborative and dynamic environment, and I was intrigued by the opportunity to apply my skills in real-world research settings. Joining Ask-JGI is an extension of my academic and research pursuits. I was drawn to the idea of supporting researchers across diverse disciplines, helping them navigate technical challenges in their projects, and learning from their different perspectives. The chance to engage with cutting-edge problems and contribute to solutions beyond the scope of my own research was exciting.

There’s so much to love about being part of Ask-JGI. I love the variety of work. Each question I encounter presents a new challenge, whether it’s developing a data analysis pipeline, troubleshooting code, or brainstorming creative solutions for a computational problem. The variety keeps me constantly learning and growing as a data scientist. I also love the collaborative atmosphere. Working closely with researchers from different fields exposes me to diverse ways of thinking and problem-solving. It’s an opportunity not only to apply my skills but also to learn more about the scientific community.

Huw Day (he/him) – Ask-JGI Lead

Huw Day, JGI Data Scientist

I am a JGI Data Scientist with a background in mathematics, working on a range of data science projects with researchers across the university using a variety of methodologies and techniques. I also help run the Data Ethics Club.

As Ask-JGI Lead, I am responsible for recruiting, training and the general managing of the Ask-JGI team. They’re a fantastic group and I consider myself really lucky to be able to work with them. I support some of the general queries and I’m also responsible for talking with researchers interested in costing out data science support in grant applications.

To me, the Ask-JGI helpdesk is based on the idea that any researcher who wants to do data science should be empowered to do so. Whilst we often do the data science for people, I think the most rewarding outcomes from our helpdesk are when we empower researchers to do data science themselves, guiding and validating their work. It’s also a wonderful opportunity for me and the rest of the helpdesk to learn about research across the university.


All University of Bristol researchers (including PhDs) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you.

If you’re a PhD student interested in joining the Ask-JGI team, we will be recruiting for the next academic year in summer 2025, so keep an eye on the JGI mailing list for our recruiting call. We recruit a new cohort every year but do not accept speculative applications outside of the recruiting call.