Using ‘The Cloud’ to enhance UoB laboratory data security, storage, sharing, and management

JGI Seed Corn Funding Project Blog 2023/24: Peter Martin, Chris Jones & Duncan Baldwin

Introduction

As a world-leading research-intensive institution, the University of Bristol houses a multi-million-pound array of cutting-edge analytical equipment of all types, ages, functions, and sensitivities – distributed across its Schools, Faculties, Research Centres and Groups, as well as in dozens of individual labs. However, as more and more data are captured, how can they be appropriately managed to meet the needs of both researchers and funders?

What were the aims of the seed corn project? 

When an instrument is purchased, the associated computing, data storage/resilience, and post-capture analysis are seldom, if ever, considered beyond the standard Data Management Plans.

Before this project, UoB had no centralised or officially endorsed mechanism, supported by IT Services, for managing long-term instrument data storage and internal/external access to it. Every group, lab, and facility individually managed its own data retention, access, archiving, and security policies. This is not just a UoB challenge, but one that is endemic to the entire research sector. As the value of data is now becoming universally realised, not just in academia but across society, the challenge is more pressing than ever: an institution-wide solution to the entire data challenge is critically required, and one that would be readily exportable to other universities and research organisations. At its core, this Seed Corn project sought to develop a ‘pipeline’ through which research data could be: (1) securely stored within a unified online environment/data centre in perpetuity, and (2) accessed via an intuitive, streamlined and equally secure online ‘front-end’, such as Globus, akin to how OneDrive and Google Drive seamlessly facilitate document sharing.

What was achieved? 

The Interface Analysis Centre (IAC), a University Research Centre in the School of Physics, currently operates a large and ever-growing suite of surface and materials science equipment with considerable numbers of both internal (university-wide) and external (industry and commercial) users. Over the past six months, working with leading solution architects, network specialists, and security experts at Amazon Web Services (AWS), the IAC/IT Services team have successfully developed a scalable data warehousing system that has been deployed within an autonomous segment of the UoB network. This removes the need to keep single-copy data stored locally (at significant risk), to carry it on portable HDDs, or to email it across the network. In addition to efficiently “getting the data out” from within the UoB network, the team have used native credential management within Microsoft Azure/AWS to develop a web-based front-end, akin to Google Drive/OneDrive, through which specific experimental folders can be securely shared with specific users, in compliance with industry and InfoSec standards. The proof of the pudding has been the positive feedback received from external users visiting the IAC, all of whom have been able to access their experiment data immediately after finishing their work, without the need to copy GBs or TBs of data onto external hard drives!

Future plans for the project 

The success of the project has highlighted not only how researchers and various strands within UoB IT Services can together develop bespoke systems utilising both internal and external capabilities, but also how even a small amount of Seed Corn funding such as this can deliver the start of something powerful and exciting. Following the delivery of a robust ‘beta’ solution between the Interface Analysis Centre (IAC) labs and AWS servers, it is currently envisaged that the roll-out and expansion of this externally-facing research storage gateway facility will continue, with the support of IT Services, to other centres and instruments. Given the large amount of commercial and external work performed across the UoB, such a platform will hopefully enable and underpin data management across the University going forwards, adopting a scalable and proven cloud-based approach.


Contact details and links

Dr Peter Martin & Dr Chris Jones (Physics) peter.martin@bristol.ac.uk and cj0810@bristol.ac.uk 

Dr Duncan Baldwin (IT Services) d.j.baldwin@bristol.ac.uk  

Ask-JGI Example Queries from Faculty of Health and Life Sciences 

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website (https://www.bristol.ac.uk/golding/supporting-your-research/data-science-support/). 

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Health and Life Sciences here at the University of Bristol. 

AI prediction on video data 

Example of AI video prediction using video data taken from the EPIC-KITCHENS-100 study. The image shows qualitative results of action detection. Predictions with confidence > 0.5 are shown with colour-coded class labels.

One particularly interesting query came from a PhD researcher with no prior experience in programming or AI. She was exploring the idea of using AI to predict how long doctors at different skill levels would need to train on medical simulators to reach advanced proficiency. Drawing inspiration from aviation cockpit simulators, her project involved analysing simulation videos to make these predictions. We provided guidance on the feasibility of using AI for this task, suggesting approaches that would depend on the availability of annotated data and introducing her to relevant computer vision techniques. We also recommended Python as a starting point, along with resources to help her build foundational skills. It was exciting to help someone new to AI navigate the early stages of their project and explore how AI could contribute to improving medical training. 

Species Classification with ML 

Bemisia tabaci (MED) (silverleaf whitefly); two adults on a watermelon leaf. Image by Stephen Ausmus.

Another engaging query came from a researcher in biological sciences aiming to classify different species of plant pest insects (Bemisia tabaci and two others) based on flight data. Her goal was not only to build machine learning classifiers but also to understand how different features contributed to species differentiation across various methods.

She approached the Ask-JGI data science support for guidance on refining her code and ensuring the accuracy of her analysis. We helped restructure the code to make it more modular and reusable, while also addressing bugs and improving its reliability. Additionally, we worked with her to create visualisations that provided clearer insights into model performance and feature importance. This collaboration was a great example of how machine learning can be applied to advance ecological research.
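To illustrate what "feature importance" can mean in a query like this, here is a minimal sketch of one common, model-agnostic approach: permutation importance. It is not the researcher's code or method. The data are synthetic, the feature names (`wingbeat_frequency`, `uninformative_feature`) and species labels are hypothetical, and a simple nearest-centroid classifier stands in for whatever models were actually used. The idea is to shuffle one feature at a time on held-out data and record how much the accuracy drops.

```python
import random

# Hypothetical feature names, for illustration only.
FEATURES = ["wingbeat_frequency", "uninformative_feature"]


def make_data(rng, n_per_class):
    """Synthetic data: feature 0 separates the two classes, feature 1 is noise."""
    X, y = [], []
    for label, centre in (("species_a", 0.0), ("species_b", 10.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label with the nearest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return correct / len(y)


def permutation_importance(centroids, X, y, rng, n_repeats=10):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_data(rng, 100)
X_test, y_test = make_data(rng, 100)

model = fit_centroids(X_train, y_train)
importances = permutation_importance(model, X_test, y_test, rng)

for name, value in zip(FEATURES, importances):
    print(f"{name}: mean accuracy drop {value:.3f}")
```

Because the importance is measured as a change in held-out accuracy, the same function can be reused across different classifiers, which makes it easier to compare how features contribute across methods.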

Providing guidance for HPC, RDSF, and statistical software users 

High performance computing (HPC) and the Research Data Storage Facility (RDSF) have been used by an increasing number of people at our university. We also recommend them to students and staff when these tools align with their projects’ needs. However, getting started can be challenging—each system has its own frameworks, rules, and workflows. Researchers often find themselves overwhelmed by extensive training materials or stuck on specific technical issues that aren’t easily addressed.  

We provide tailored guidance to make these tools more accessible and practical for our clients, which includes troubleshooting, script modifications, and directing researchers to relevant university services. 

Additionally, this year’s Ask-JGI Helpdesk has brought together experienced users of SPSS, Stata, R, and Python. For researchers transitioning to new statistical software or adapting their workflows, we’ve helped them navigate the subtle differences in syntax across platforms and achieve their analysis goals. 

Handling Group-Level Variability in Quantitative Effects: A Multilevel Modelling Perspective

A visualisation of a multilevel model, original figure produced by JGI Data Scientist, Dr Leo Gorman.

We had a client who was researching differences in fluorescence intensity. Such differences may be due to factors such as antibody lot variation, differences in handling between researchers, or biological heterogeneity. This raises the question: how should such data be represented to ensure meaningful interpretation without misrepresenting the underlying biological processes? One of the key solutions that we recommend is multilevel modelling.

Modelling fluorescence intensity at one or multiple levels (e.g., individual, batch, researcher) can help distinguish biological effects from biases. For example, by fitting a mixed-effects model, we can account for between-individual variation in baseline fluorescence levels (random intercept), as well as differential responses to experimental conditions (random slope). Sometimes, however, the use of multilevel modelling is limited by the group-level sample size. If this is the case, as we discussed with the client, we don’t need to go as far as fitting multilevel models. To control for variation across such a small number of groups, we can use alternative strategies, such as correcting standard errors and introducing dummy variables, to achieve similar performance.
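As a sketch of the two approaches described above, the models can be written out as follows. The notation is an assumption made for illustration, not taken from the client's analysis: y_ij is the fluorescence intensity of measurement i in group j (an individual, batch, or researcher), x_ij is the experimental condition, and D marks group membership.

```latex
% Multilevel model with a random intercept (u_{0j}) and random slope (u_{1j})
\[
  y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij},
  \qquad
  \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim \mathcal{N}(\mathbf{0}, \Sigma),
  \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]

% Dummy-variable alternative when the number of groups J is small:
% each group k gets its own fixed intercept \alpha_k,
% with D^{(k)}_{ij} = 1 if k = j and 0 otherwise
\[
  y_{ij} = \sum_{k=1}^{J} \alpha_k D^{(k)}_{ij} + \beta_1 x_{ij} + \varepsilon_{ij}
\]
```

In the first model the group effects are treated as draws from a distribution whose covariance must be estimated, which is hard to do reliably from only a few groups. The second model estimates one intercept per group directly, which is why it can be the more practical choice when the number of groups is small.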

Successful Seedcorn Awardees 2024-2025

The Jean Golding Institute Seedcorn Funding is a fantastic opportunity to develop multi- and interdisciplinary ideas while promoting collaboration in data science and AI. We are delighted that a new cohort of multidisciplinary researchers has been supported through this funding.

Leighan Renaud – Building a Folk Map of St Lucia

Dr. Leighan Renaud is a lecturer in Caribbean Literatures and Cultures in the Department of English. Her research interests include twenty-first century Caribbean fiction, mothering and motherhood in the Caribbean, folk and oral traditions in the Anglophone Caribbean, and creative practices of neo-archiving. 

Louise AC Millard – Using digital health data for tracking menstrual cycles

Dr. Louise Millard is a Senior Lecturer in Health Data Science in the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol. Following an undergraduate Computer Science degree and MSc in Machine Learning and Data Mining, they completed an interdisciplinary PhD at the interface of Computer Science and Epidemiology. Their research interests lie in the development and application of computational methods for population health research, including using digital health and phenotypic data, and statistical and machine learning approaches. 

Laura Fryer – Visualisation tool for Enhancing Public Engagement Using Supermarket Loyalty Card Data

Laura is a senior research associate in the Digital Footprints Lab based within the Bristol Medical School. Their aim is to use novel data to unlock insights into behavioural science for the purposes of public good. Laura is particularly passionate about broadening the public’s understanding of digital footprint data (e.g. from loyalty cards, bank transactions or wearable technology such as a smart watch) and demonstrating how vital it can be in developing our understanding of population health within the UK and beyond.  Laura’s project is focused on developing a data-visualisation tool that will support public engagement activities and provide a tangible representation of the types of data that we use – building further trust between the public and scientific researchers.  

Nicola A Wiseman – Cellular to Global Assessment of Phytoplankton Stoichiometry (C-GAPS)

Dr. Nicola Wiseman is a Research Associate in the School of Geographical Sciences. They received their PhD in Earth System Science from the University of California, Irvine, where they specialized in using ocean biogeochemical models to investigate the impacts of phytoplankton nutrient uptake flexibility on ocean carbon uptake. They are also interested in using statistical methods and machine learning to better understand the interactions between marine nutrient and carbon cycles, and the role of these interactions in regulating global climate.

Georgia Sains – Collecting & Analysing Multilingual EEG Data

Georgia Sains is a Doctoral Teaching Associate in the Neural Computation research group at the School of Computer Science. Her research is focused on the overlap between Computer Science, Neuroscience, and Linguistics. Georgia has worked on developing models to help understand how linguistic traits have evolved. More recently, she has been using Bayesian modelling to find patterns between grammar and neurological response and is now focused on using Electroencephalography experimentation to explore the relationship between linguistic upbringing and how the brain processes language.

Alex Tasker – Building a Strategic Critical Rapid Integrated Biothreat Evaluation (SCRIBE) data tool for research, policy, and practice

Dr. Tasker is a Senior Lecturer at the University of Bristol, a Research Associate at the KCL Conflict Health Research Group and Oxford Climate Change & (In)Security (CCI) project, and a recent ESRC Policy Fellow in National Security and International Relations. Dr. Tasker is an interdisciplinary researcher working across social and natural sciences to understand human-animal-environmental health in situations of conflict, criminality, and displacement using One Health approaches. Alongside this core focus, Dr. Tasker’s work also explores emerging areas of relevance to biosecurity and biothreat including engineering biology, antimicrobial resistance, subterranean spaces, and the use of new forms of evidence and expertise in a rapidly changing world for climate, security, and defense.

Ask JGI Student Experience Profiles: Rachael Laidlaw

Rachael Laidlaw (Ask-JGI Data Science Support 2023-24) 

I first came into contact with the Jean Golding Institute last year at The Alan Turing Institute’s annual AI UK conference in London, and then again in the early stages of the DataFace project in collaboration with Cheltenham Science Festival. This meant that before I officially joined the team back in October, I already knew what a lovely group of people I’d be getting involved with! Having nice colleagues, however, was not my only motivation for applying to be an Ask-JGI student. On top of that, I’d decided that whilst starting out in my ecological computer-vision PhD niche, I didn’t want to forget all of the statistical skills that I’d developed back in my MSc degree. Plus, it sounded really fun to keep myself on my toes by exercising my mind tackling a variety of data-oriented requests from across the university’s many departments. 

Rachael Laidlaw (centre), second-year PhD student in Interactive Artificial Intelligence, and other JGI staff members at the JGI stall

During the course of my academic life, I’ve taken the plunge of changing disciplines twice, moving from pure mathematics to applied statistics and then again to computer science, and I liked the idea of supporting others to potentially do the same thing as they looked to enhance their work by delving into data. Through Ask-JGI, I kept my weeks interesting by having something other than my own research to sometimes switch my focus to, and it felt very fulfilling to be able to offer useful technical advice to those who were in the same position that I myself had been in not so long ago too! I therefore got stuck in with anything and everything, from training CNNs for rainfall forecasting or performing statistical tests to compare the antibiotic resistance of different bacteria, to modelling the outcomes of university spinouts or advising on the ethical considerations and potential bias present when designing and deploying a questionnaire-based study. And, of course, by exposing myself to these problems (alongside additional outreach initiatives and showcase events), I also learned a lot along the way, both from my own exploration and from the rest of the team’s insights. 

One especially exciting query revolved around automating the process of identifying from images which particular underground printing presses had been used to produce various historical political pamphlets, based on imperfections in the script. This piqued my interest immediately as it drew parallels with my PhD project, highlighting the many uses of computer vision and how it can save us time by speeding up traditionally manual processes: from the monitoring of animal biodiversity to carrying out detective work on old written records.

All in all, this year has broadened my horizons by giving me great consultancy-style work experience through the opportunity to share my expertise and help a wide range of researchers. I would absolutely encourage other curious PhD students to apply and see what they can both give to and gain from the role! 

Ask JGI Student Experience Profiles: Emilio Romero

Emilio Romero (Ask-JGI Data Science Support 2023-24)

Emilio Romero, 2nd year PhD Student in Translational Health Sciences

Over the past year, my experience helping with the Ask-JGI service has been really rewarding. I was keen to apply as I wanted to get more exposure to the research world in Bristol, meet different researchers and explore with them different ways of working and approaching data.  

From a technical perspective, I had the opportunity to work on projects related to psychometric data, biological matrices, proteins, chemometrics and mapping. I worked mainly with R and, in some cases, SPSS, which offered different alternatives for data analysis and presentation.

One of the most challenging projects was working with chemometric concentrations of different residues of chemical compounds extracted from vessels used in human settlements in the past. This challenge allowed me to talk to specialists in the field and to work in a multidisciplinary way in developing data matrices, extracting coordinates and creating maps in R. The most rewarding part was being able to use a colour scale to represent the variation in concentration of specific compounds across settlements. This was undoubtedly a great experience and a technique that I had never had the opportunity to practise.

Ask-JGI also promoted many events, especially Bristol Data Week, which allowed many interested people to attend courses at different levels specialising in the use of data analysis software such as Python and R.

The Ask-JGI team have made this year an enjoyable experience. As a cohort, we have come together to provide interdisciplinary advice to support various projects. I would highly recommend anyone with an interest in data science and statistics to apply. It is an incredible opportunity for development and networking and allows you to immerse yourself in the wider Bristol community, as well as learning new techniques that you can use during your time at the University of Bristol.