Ask JGI Student Experience Profiles: Mike Nsubuga

Mike Nsubuga (Ask-JGI Data Science Support 2023-24) 

Embarking on a New Path 

Mike Nsubuga, first year PhD Student in Computational Biology and Bioinformatics

In the early days at Bristol, even before I began my PhD, I stumbled upon something extraordinary. AskJGI, a university initiative that provides data science support to researchers from all disciplines, caught my attention through a recruitment advert circulated by my PhD supervisor for support data scientists.

My journey started with hesitation. As a brand-new PhD student, who had just relocated to the UK, I questioned whether I was ready or suitable for such a role. Despite my reservations, my supervisor saw potential in me and encouraged me to seize this opportunity. Yielding to their encouragement, I applied, not fully realizing then how this decision would profoundly shape both my academic and professional paths. 

A World of Opportunities 

Joining AskJGI opened a door to a dynamic world brimming with ideas and innovations. My background in bioinformatics and computational biology meant that working on biomedical queries was particularly rewarding. These projects varied from analyzing protein expression data to studying infectious diseases, allowing me to use data science in meaningful ways. 

Among the initiatives I was involved in was developing models to predict protein production efficiency in cells from their genetic sequences. Our goal was clear yet impactful: to identify patterns in genetic sequences that indicate protein production efficiency. We employed advanced data analysis and machine learning techniques to achieve effective predictions. 

Additionally, I contributed to a project analyzing the severity of dengue infections by using statistical models to identify key biological markers. We pinpointed certain markers as critical for distinguishing between mild and severe cases of the infection. 

These projects showcased the transformative power of data science in understanding and potentially managing diseases, directly impacting public health strategies. 

Making Science Accessible: Community Engagement at City Hall

A highlight of my tenure with AskJGI was participating in Data Science Week at bustling Bristol City Hall. The event was not merely a showcase of data science but an opportunity to demystify complex concepts for the public. Engaging in lively discussions and simplifying intricate algorithms for curious visitors was incredibly fulfilling, especially seeing their excitement as they understood the concepts that are often discussed in our professional circles. 

Audience sitting in City Hall. Some audience members are raising their hands. There is a projector and a speaker at the front of the hall.
AI and the Future of Society event as part of Bristol Data Week 2024

Fostering Connections and Gaining Insights 

AskJGI enhanced my technical skills and broadened my understanding of the academic landscape at the University of Bristol. The connections I forged were invaluable, sparking collaborations that would have been unthinkable in the more isolated environment of my earlier academic career. Reflecting on my transformative journey with AskJGI, I am convinced more than ever of the importance of interdisciplinary collaboration and the critical role of data science in tackling complex challenges. I encourage any researcher at the University of Bristol who is uncertain about their next step to explore what AskJGI has to offer. For PhD students looking to get involved, it represents not just a learning opportunity but a chance to make a significant societal impact. 

Unlocking big web archives: a tool to learn about new economic activities over space and time

JGI Seed Corn Funding Project Blog 2022/23: Emmanouil Tranos 

Where do websites go to die? Well, fortunately they don’t always die even if their owners stop caring about them. Their ‘immortality’ can be attributed to organisations known as web archives, whose mission is to preserve online content. There are quite a few web archives today with different characteristics – e.g. focusing on specific topics vs. archiving the whole web – but the Internet Archive is the oldest one. Even if you are not familiar with it directly, you might have come across the Wayback Machine, which is a graphical user interface to access webpages archived by the Internet Archive.

Although it might be fun to check the aesthetics of a website from the internet’s early days – especially considering the current 1990s revival – one might question the utility of such archives. But some archived websites are more useful than others. Imagine accessing archived websites from businesses located in a specific neighbourhood and analysing the textual descriptions of the services and products these firms offer as they appear on their websites. Imagine being able to geolocate these websites by using information available in the text. Imagine doing this over time. And imagine doing this programmatically for a large array of websites. Well, our past research did that and, therefore, serves as a proof-of-concept for the utility of web archives in understanding the geography of economic activities. Our models successfully used a data set, well curated by the British Library and the UK Web Archive, to understand how a well-known tech cluster – Shoreditch in London – evolved over time. Importantly, we were able to do this at a much higher level of detail in terms of the descriptions of the types of economic activities than if we had used more traditional business data.
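One way to picture the geolocation step is to pull UK postcodes out of the text of a page. The short Python sketch below is only an illustration and not necessarily how the research did it: the simplified regular expression, the function name `extract_postcodes` and the sample sentence are all assumptions made for this example.

```python
import re

# Simplified UK postcode pattern. This is an assumption for illustration and
# does not cover every special case of the official postcode format.
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b", re.IGNORECASE
)


def extract_postcodes(text: str) -> list[str]:
    """Return the unique postcodes found in text, normalised, in order of appearance."""
    seen: list[str] = []
    for outward, inward in POSTCODE_RE.findall(text):
        postcode = f"{outward.upper()} {inward.upper()}"
        if postcode not in seen:
            seen.append(postcode)
    return seen


# Invented sample text standing in for an archived business webpage.
sample = "Visit our studio at 12 Example Street, London EC2A 3AY or write to ec2a3ay."
print(extract_postcodes(sample))
```

A postcode found this way could then be matched against a postcode lookup table to place the website on a map, and repeating the step for each archived snapshot would give the time dimension.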

The JGI project provided the opportunity to start looking forward. Our proof-of-concept research was useful in validating the value of such a research trajectory and revealing the evolving mechanisms of economic activities, but it only covered the 2000-2012 period. The next question is how to use this research framework in a current context.

Before I explain the challenge in doing this, let me tell you about the value of being able to do this. Our current understanding of the typologies of economic activities is based on a system called Standard Industrial Classification (SIC) codes. Briefly, businesses need to choose the SIC code that best describes what they do. Useful as they may be, SIC codes have not been updated since 2007 and, therefore, cannot capture new and evolving economic activities. In addition, there is built-in ambiguity in SIC codes as quite a few of them are defined as “… not elsewhere classified” or “… other than …”. Having a flexible system that can easily provide granular and up-to-date classifications of economic activities within a city or a region can be very useful to a wide range of organisations including local authorities, chambers of commerce and sector-specific support organisations.

The main challenge of building such a tool is data: finding, accessing, filtering and modelling relevant data. Our JGI seedcorn project, together with Rui Zhu and Giulia Occhini, allowed us to pave the path for such a research project. Thanks to the Common Crawl, another web archive which offers all its crawled data openly every two months, we have all the data we need. The problem is that we have much more data than we need, as the Common Crawl crawls and scrapes the whole web, providing a couple of hundred terabytes of data every two months. And that is in compressed format! So, just accessing these data can be challenging, let alone building a workflow which can do all the steps I mentioned above and – importantly – keep on doing these steps every few months once new data dumps become available.
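To give a flavour of the filtering problem, here is a minimal Python sketch that streams through index-style records one line at a time and keeps only those hosted under a UK domain suffix. It is not the project's actual workflow: the function name `keep_uk_records`, the `.co.uk` suffix rule and the sample records are assumptions for illustration, and the input is assumed to be JSON lines with a `url` field.

```python
import json
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def keep_uk_records(lines: Iterable[str], suffix: str = ".co.uk") -> Iterator[dict]:
    """Yield records whose URL host ends with suffix, reading one line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        host = urlsplit(record.get("url", "")).hostname or ""
        if host.endswith(suffix):
            yield record


# Invented sample records standing in for lines of a much larger index file.
sample_lines = [
    '{"url": "https://www.example.co.uk/about", "status": "200"}',
    '{"url": "https://example.com/", "status": "200"}',
    "",
    '{"url": "http://shop.example.co.uk/products", "status": "200"}',
]
kept = [record["url"] for record in keep_uk_records(sample_lines)]
print(kept)
```

Because the function is a generator, it never holds more than one record in memory, which matters when the input is far too large to load at once. The same function can be pointed at an open file handle instead of a list.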

Although we are nowhere close to completing such a big project, the JGI seedcorn funding allowed us to test some of the code and the data infrastructure needed to complete such a task. We are now developing funding proposals for such a large research programme and although a risky endeavour, we are confident that we can find the needle in the haystack and build a dynamic system of typologies of economic activities at a level of detail higher than current official and traditional data offer, which is based on open data and reproducible workflows.  


Emmanouil Tranos 

Professor of Quantitative Human Geography | Fellow at the Alan Turing Institute

e.tranos@bristol.ac.uk | @EmmanouilTranos | etranos.info | LinkedIn 

Ask JGI Student Experience Profiles: Emma Hazelwood

Emma Hazelwood (Ask-JGI Data Science Support 2023-24) 

Emma Hazelwood, final year PhD Student in Population Health Sciences

I am a final year PhD student in Population Health Sciences. I found out about the opportunity to support the JGI’s data science helpdesk through a friend who had done this job previously. I thought it sounded like a great way to do something a bit different, especially on those days when you need a bit of a break from your PhD topic.

I’ve learnt so many new skills from working within the JGI. The team are very friendly, and everyone is learning from each other. It’s also been very beneficial for me to learn some new skills, for instance Python, when considering what I want to do after my PhD. I’ve been able to see how the statistical methods that I know from my biomedical background can be used in completely different contexts, which has really changed the way I think about data.

I’ve worked on a range of topics through JGI, which have all been as interesting as they have been different. I’ve helped people with coding issues, thought about new ways to visualise data, and discussed what statistical methods would be most suitable for answering research questions. In particular, I’ve loved getting involved with a project in the Latin American studies department, where I’ve been mapping key locations from conferences throughout the early 20th century onto satellite images, bringing to life the routes that the conference attendees would have taken. 

This has been a great opportunity working with a very welcoming team, and one I’d recommend to anyone considering it!

Ask JGI Student Experience Profiles: Emilio Romero

Emilio Romero (Ask-JGI Data Science Support 2023-24)

Emilio Romero, 2nd year PhD Student in Translational Health Sciences

Over the past year, my experience helping with the Ask-JGI service has been really rewarding. I was keen to apply as I wanted to get more exposure to the research world in Bristol, meet different researchers and explore with them different ways of working and approaching data.  

From a technical perspective, I had the opportunity to work on projects related to psychometric data, biological matrices, proteins, chemometrics and mapping. I also worked mainly with R and in some cases SPSS, which offered different alternatives for data analysis and presentation. 

One of the most challenging projects was working with chemometric concentrations of different residues of chemical compounds extracted from vessels used in human settlements in the past. This challenge allowed me to talk to specialists in the field and to work in a multidisciplinary way in developing data matrices, extracting coordinates and creating maps in R. The most rewarding part was being able to use a colour scale to represent the variation in concentration of specific compounds across settlements. This was undoubtedly a great experience and a technique that I had never had the opportunity to practice. 

Ask-JGI also promoted many events, especially Bristol Data Week, which allowed many interested people to attend courses at different levels specialising in the use of data analysis software such as Python and R.

The Ask-JGI team have made this year an enjoyable experience. As a cohort, we have come together to provide interdisciplinary advice to support various projects. I would highly recommend anyone with an interest in data science and statistics to apply. It is an incredible opportunity for development and networking and allows you to immerse yourself in the wider Bristol community, as well as learning new techniques that you can use during your time at the University of Bristol. 

Ask JGI Student Experience Profiles: Daniel Collins

Daniel Collins (Ask-JGI Data Science Support 2023-24)

Daniel Collins, 2nd year PhD Student in the School of Computer Science at the University of Bristol

I applied to Ask-JGI as a 2nd year PhD student on the Interactive AI CDT. Before starting my PhD, I spent several years working in Medical Physics for the NHS. Without a formal background in data science, transitioning to an AI-focused PhD felt like a significant shift. I was looking for opportunities to gain more practical experience in areas of data science outside of my research topic, and Ask-JGI has been the perfect place to do this! 

Working with Ask-JGI has been a hugely rewarding experience, and I’ve really appreciated the variety it introduced into my day-to-day work. With a PhD, you’re often working towards a long-term goal in a very specific domain area, with projects that can span several months at a time. With Ask-JGI, each query becomes a self-contained mini-project with a much smaller scope and timeline. These short bursts of exploration and learning have been really valuable to have alongside my PhD. 

The queries involve supporting researchers from various specialisms across the University, and can involve a broad range of topics and technical skills. I’ve particularly enjoyed queries that have involved writing demo code e.g. for data processing, visualisation or modelling. One of the highlights has been my work with GenROC, visualising the number of children with different rare genetic conditions recruited to the study. To try to make it more engaging for the children and families involved, we developed a pipeline for creating 3D bubble plots with a space theme using the Blender Python API. This was great because I got to spend time learning a new software tool while also learning more about the important work the GenROC researchers are doing at the University!

Example of the Blender API bubble plots made for GenROC, with anonymised and randomised data

I wholeheartedly recommend joining the team if you have experience in any area of data science and you’re looking to develop your skills. The JGI team have created an incredibly friendly and supportive environment for learning and collaboration. It’s an excellent opportunity to learn from others, and gain exposure to the different ways data science can be applied in academic research!