JGI Seed Corn Funding Project Blog 2022-2023: Jin Zheng

Developing an Integrated and Intelligent Algo-trading System

Introduction:

The financial trading landscape is constantly evolving, driven by advancements in technology and the need for faster, more efficient decision-making. Traditional algo-trading strategies have become central features of modern financial markets due to their speed, accuracy, and cost-effectiveness. However, these strategies often rely solely on the analysis of present and past quantitative data, neglecting the importance of incorporating qualitative data in the decision-making process. To address this limitation, we aim to develop an integrated and intelligent algo-trading system that combines advanced technology, data integration, and intelligent decision-making.

Data Integration:

In financial trading, traders gather diverse information from many sources, including market data, past performance, financial reports, public opinion, and news. Our integrated system seeks to leverage data integration by combining market-based quantitative data with qualitative data sources. By integrating these different data types, we can gain a more comprehensive understanding of the financial landscape and make better-informed trading decisions.

Advanced Technology:

The integrated system will harness advanced technologies such as artificial intelligence (AI), cloud computing, and machine learning algorithms. These technologies enable the analysis and processing of vast amounts of data in real time. AI algorithms can identify patterns, trends, and correlations that may not be immediately apparent to human traders. Cloud computing provides the scalability and computing power necessary to handle large volumes of data and perform complex calculations. By leveraging these technologies, our system can enhance decision-making and improve trading performance.

Intelligent Decision-Making:

The core objective of our system is to enable intelligent decision-making in the trading process. While traditional algo-trading strategies focus on quantitative analysis, our integrated approach incorporates qualitative data, allowing traders to better assess potential risks and identify market trends. By factoring in qualitative data, traders can make more informed decisions and adjust their strategies accordingly. Intelligent decision-making is achieved through the application of AI and machine learning algorithms, which can analyze vast amounts of data and provide valuable insights to traders.

With support from the Jean Golding Institute, we successfully ran a hybrid workshop on machine learning and data science in finance. The workshop aimed to bring together experts and enthusiasts in the field to exchange knowledge, share insights, and explore the intersection of machine learning and finance. We were fortunate to have a line-up of esteemed speakers, including four external experts and one internal expert, all renowned in their respective areas of expertise. Their diverse backgrounds and experiences enriched the workshop and provided valuable perspectives on the application of machine learning in finance.

We have finished developing a robust data pipeline, including a unified API that efficiently retrieves data from various sources. We have implemented effective data cleaning techniques and measures to filter out spam. Additionally, we have used Graph Neural Networks (GNNs) to determine the influence rate of each account and to calculate a daily sentiment rate for several stocks with large market capitalization. We have also incorporated predictive models into the system.
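The post does not give the formula for the daily sentiment rate. As an illustration only, the sketch below assumes each post already carries a sentiment score in [-1, 1] and each account has a non-negative influence weight (for example, one produced upstream by a GNN), and computes an influence-weighted average per day. The function name, input format and example data are hypothetical.

```python
from collections import defaultdict


def daily_sentiment(posts, influence):
    """Influence-weighted mean sentiment per day (illustrative sketch).

    posts: iterable of (day, account_id, score) tuples, score in [-1, 1].
    influence: dict mapping account_id -> non-negative weight; we simply
    assume these weights were computed upstream (e.g. by a GNN).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for day, account, score in posts:
        w = influence.get(account, 0.0)  # accounts without a weight are ignored
        weighted_sum[day] += w * score
        weight_total[day] += w
    # Skip days where every post came from zero-weight accounts.
    return {
        day: weighted_sum[day] / weight_total[day]
        for day in weight_total
        if weight_total[day] > 0
    }


posts = [
    ("2023-01-02", "acct_a", 1.0),
    ("2023-01-02", "acct_b", -1.0),
    ("2023-01-03", "acct_a", 0.5),
]
influence = {"acct_a": 3.0, "acct_b": 1.0}
print(daily_sentiment(posts, influence))  # {'2023-01-02': 0.5, '2023-01-03': 0.5}
```

Weighting by influence lets one highly influential account move the daily rate more than several low-influence accounts; a plain mean would treat every account equally.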

Moving forward, our next objective is to create a cloud-based web service that empowers users to build their own trading robots, develop unique trading strategies, and design customized trading algorithms. To enhance the user experience, we will incorporate advanced data visualization techniques, allowing traders to effortlessly interpret and analyze the vast array of information available. Moreover, we aim to enhance the system’s capabilities by integrating machine learning algorithms for improved decision-making and risk management. Our ultimate goal is to create a user-friendly and versatile platform that caters to the needs of researchers, individual traders, and students alike. Through this platform, users will be able to gain practical experience, enhance their financial knowledge, and utilize cutting-edge technologies in the field of algorithmic trading.

JGI Seed Corn Funding Project Blog 2022-2023: Neo Poon

Seeking ground truth in data: prevalence of pain and its impacts on well-being and workplace productivity

Chronic pain is a major health issue across the globe. Researchers have estimated that at least 10% of the population in the United Kingdom suffer from pain conditions. Worldwide, some estimates suggest that over 20% of the population have chronic pain, resulting in more than 20 million ‘pain days’ per year. Naturally, it is important to examine how pain conditions affect people’s well-being and their productivity in the workplace.

Our research team (Digital Footprints Lab at the Bristol Medical School, led by Dr Anya Skatova) specialises in using Big Data to investigate human behaviours and social issues. In our previous work, we established a link between the purchase of pain medicines and the proportion of people working part-time across geographical regions of the United Kingdom, which suggests an economic cost of chronic pain and an impact on national productivity.

With the funds provided by the Jean Golding Institute (JGI), we decided to investigate the ‘ground truth’ directly. That is, instead of examining pain at the level of geographical regions, we designed a survey asking individuals about their pain conditions, well-being, physical health, and employment status. Importantly, and relevant to JGI’s focus on data science, the survey also asks individuals to share their shopping history data with us. Under the General Data Protection Regulation (GDPR), residents of the United Kingdom have the right to data portability, which means people can choose to share the data that companies hold about them with external organisations, such as a university or a research team. In our design, participants are asked to donate to us the loyalty card data from their shopping at a major supermarket. This allows us to ask important questions, such as how the frequency and types of pain relief purchases relate to the different types of pain conditions reported by participants. We also ask how pain conditions affect people’s life satisfaction and their ability to work, which might collectively affect their shopping patterns beyond purchases of pain relief products.

The JGI funding has facilitated the data collection process, which is being finalised at the time of writing. Moving forward, this study will allow us to define chronic pain using shopping patterns alone, which can drive future research: by connecting the frequency and types of pain medicine purchases with the self-reported pain conditions from this study, we can define a metric and compute the prevalence of chronic pain more accurately from transaction data alone. Our research team has ongoing partnerships with other supermarket and pharmacy chains, which give us access to commercial data for research purposes. When we conduct similar research using these external data and cannot involve participants directly through surveys, we can then apply our metric to estimate the proportion of people suffering from chronic pain. Furthermore, our study includes questions about menstrual pain, an important but seldom-studied aspect of pain experience, which opens up a further avenue for research: potentially, we can examine how menstrual pain affects quality of life and workplace productivity. Finally, our study also controls for Covid-19 history, which might have a long-term effect on pain conditions and subjective well-being, paving the way for research into the longitudinal effects of Covid-19.

JGI Seed Corn Funding Project Blog 2022-2023: Jiao Wang and Ahmed Mohamed

Large-sample evaluation of deep learning and optical flow algorithms in nowcasting radar rainfall-flooding events

1. Aim

This seed corn project aimed to identify a large sample of rainfall events that result in flooding in Great Britain and to evaluate deep learning and optical flow algorithms for radar rainfall nowcasting at the event and catchment scales.

2. Data collection

During the project, we collected ten years of hourly observed rainfall and flow time series across 458 catchments in Great Britain (Figure 1). We classified these catchments using various criteria, such as area, location, land cover type, and human activity. We also collected the UK Met Office’s radar rainfall mosaic data covering Great Britain, generated by the Met Office Nimrod system at high spatial (1 km) and temporal (5-minute) resolution.

Figure 1. Location map of 458 catchments in Great Britain.

3. Rainfall-flooding events identification

We applied DMCA-ESR, a recently developed objective method, to separate rainfall-flow events for each catchment. This process yielded a total of 18,360 events spanning a wide range of magnitudes and durations. The peak-flow threshold for each catchment was set based on flooding information derived from local government reports and previous studies, and overlapping events were removed using predefined criteria for event occurrence and termination. As a result, 442 rainfall events that contributed to flooding were identified. Radar data were then extracted for each event based on its start and end times.
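The DMCA-ESR separation itself is not reproduced here. The sketch below only illustrates the two filtering steps that follow it, under stated assumptions: each separated event is a hypothetical (start, end, peak_flow) tuple, an event counts as flooding when its peak reaches the catchment threshold, and, when two qualifying events overlap, the earlier one is kept. The project's actual overlap criteria are not described in the post.

```python
from datetime import datetime


def select_flood_events(events, peak_threshold):
    """Keep non-overlapping events whose peak flow reaches a threshold.

    events: list of (start, end, peak_flow) tuples (datetime, datetime, float).
    peak_threshold: catchment-specific flood threshold for peak flow.
    Overlap rule (an assumption): when two qualifying events overlap,
    keep the one that starts first.
    """
    selected = []
    for start, end, peak in sorted(events):
        if peak < peak_threshold:
            continue  # peak flow too low to count as flooding
        if selected and start < selected[-1][1]:
            continue  # overlaps the last kept event
        selected.append((start, end, peak))
    return selected


events = [
    (datetime(2013, 5, 23, 20), datetime(2013, 5, 30, 5), 50.0),
    (datetime(2013, 5, 25, 0), datetime(2013, 5, 26, 0), 60.0),  # overlaps the first
    (datetime(2013, 6, 1, 0), datetime(2013, 6, 2, 0), 10.0),    # below threshold
]
for start, end, peak in select_flood_events(events, peak_threshold=40.0):
    print(start, end, peak)
```

The retained (start, end) pairs are then the windows used to slice the radar archive for each event.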

4. Deep learning and optical flow algorithms

We employed UNet, a convolutional neural network (CNN) architecture, for rainfall nowcasting. The model was trained, evaluated, and validated using the radar data: two years of radar data, from 2018 to 2019, were split into 80% for training and 20% for evaluation. The UNet takes the previous 12 radar rainfall frames as input to forecast the subsequent 12 frames (with 5-minute frames, one hour in each direction).
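A minimal sketch of how 12-in/12-out training samples could be cut from a radar sequence and split 80/20. The post does not say whether the split was chronological or random; this sketch assumes a chronological split, and the function names and random stand-in data are hypothetical.

```python
import numpy as np


def make_windows(frames, n_in=12, n_out=12):
    """Build (input, target) samples from a (time, height, width) array.

    Each sample uses n_in consecutive frames as input and the next n_out
    frames as the target. At a 5-minute radar interval, 12 frames cover
    one hour.
    """
    n_samples = frames.shape[0] - n_in - n_out + 1
    if n_samples < 1:
        raise ValueError("need at least n_in + n_out frames")
    X = np.stack([frames[i:i + n_in] for i in range(n_samples)])
    Y = np.stack([frames[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, Y


def chronological_split(X, Y, train_frac=0.8):
    """Split samples in time order: the first train_frac for training."""
    cut = int(len(X) * train_frac)
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])


frames = np.random.rand(30, 64, 64).astype(np.float32)  # stand-in for radar frames
X, Y = make_windows(frames)
(train_X, train_Y), (eval_X, eval_Y) = chronological_split(X, Y)
print(X.shape, train_X.shape, eval_X.shape)  # (7, 12, 64, 64) (5, 12, 64, 64) (2, 12, 64, 64)
```

Because consecutive windows overlap, samples just either side of the cut share frames; dropping the first n_in + n_out - 1 evaluation samples would remove that leakage. For two years of national radar data, a generator that yields windows lazily would avoid materialising every window in memory.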

Additionally, we used three optical flow methods for rainfall nowcasting: SparseSD, Dense, and DenseRotation. These methods use motion estimation to predict the movement of rain patterns in radar images. Eulerian persistence, which assumes that the current rainfall distribution will remain unchanged in the near future, was used as a standard baseline.
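To make the contrast between the baseline and motion-based methods concrete, here is a toy sketch. Eulerian persistence simply repeats the last frame. The second function is a deliberate simplification of the optical flow idea: it assumes a single, already-known motion vector and shifts the whole field by it at each step. It is not SparseSD, Dense or DenseRotation, which estimate motion from recent frames; the function names are hypothetical.

```python
import numpy as np


def eulerian_persistence(last_frame, n_leadtimes=12):
    """Baseline: the latest frame is repeated unchanged for every lead time."""
    return np.repeat(last_frame[np.newaxis], n_leadtimes, axis=0)


def constant_shift_extrapolation(last_frame, motion, n_leadtimes=12):
    """Toy Lagrangian extrapolation: move the field by a fixed (dy, dx)
    pixel displacement per time step. np.roll wraps around the edges,
    which real nowcasting schemes do not do.
    """
    dy, dx = motion
    return np.stack([
        np.roll(last_frame, shift=(dy * k, dx * k), axis=(0, 1))
        for k in range(1, n_leadtimes + 1)
    ])


frame = np.zeros((4, 4))
frame[0, 0] = 5.0  # a single rain cell, in mm/h
persist = eulerian_persistence(frame, n_leadtimes=2)
advected = constant_shift_extrapolation(frame, motion=(1, 0), n_leadtimes=2)
print(persist[1][0, 0], advected[0][1, 0], advected[1][2, 0])  # 5.0 5.0 5.0
```

Persistence keeps the rain cell where it is, whereas extrapolation moves it one pixel per step, which is why motion-based methods usually beat persistence once rain is moving.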

5. Evaluating rainfall nowcasting performance

We evaluated the performance of the deep learning model and optical flow algorithms for nowcasting all 442 events in Great Britain. The accuracy of the nowcasts was assessed using two metrics: Mean Absolute Error (MAE) and the critical success index (CSI). Figure 2 illustrates the average metric results for a specific rainfall event.

Figure 2. Verification results of five rainfall nowcasting models in terms of two indicators, MAE and CSI (at a rain intensity threshold of 10 mm/h) for a rainfall event in Great Britain, which occurred from 23/05/2013 8:00 PM to 30/05/2013 5:00 AM.

Figure 2 shows a general decline in the performance of all models as lead time increases, with the Eulerian persistence baseline performing worst. In terms of MAE, the UNet initially performs worse than the optical flow-based algorithms at early lead times (t+5, t+10, and t+15). As lead time increases, however, the UNet improves relative to the other models and outperforms them at longer lead times (after t+25). SparseSD, Dense, and DenseRotation perform similarly. In terms of CSI at a rainfall intensity threshold of 10 mm/h, the UNet outperforms the other models except at lead times t+20 and t+25, where DenseRotation performs slightly better. Among the optical flow-based models, DenseRotation shows the best overall performance.
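For reference, the two verification metrics used above can be sketched as follows. MAE is the mean absolute difference between forecast and observed rain rates; CSI counts hits, misses and false alarms after thresholding both fields. Whether a pixel at exactly 10 mm/h counts as rain (>= versus >) is an assumption here, and the example arrays are made up.

```python
import numpy as np


def mae(obs, fcst):
    """Mean absolute error between observed and forecast rain rates."""
    return float(np.mean(np.abs(np.asarray(fcst) - np.asarray(obs))))


def csi(obs, fcst, threshold=10.0):
    """Critical success index = hits / (hits + misses + false alarms).

    A pixel counts as 'rain' when its rate is >= threshold (mm/h).
    Returns NaN when neither field reaches the threshold anywhere.
    """
    obs_rain = np.asarray(obs) >= threshold
    fcst_rain = np.asarray(fcst) >= threshold
    hits = np.sum(obs_rain & fcst_rain)
    misses = np.sum(obs_rain & ~fcst_rain)
    false_alarms = np.sum(~obs_rain & fcst_rain)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else float("nan")


obs = np.array([12.0, 0.0, 15.0, 3.0])
fcst = np.array([11.0, 12.0, 5.0, 2.0])
print(mae(obs, fcst))  # 6.0
print(csi(obs, fcst))  # 1 hit, 1 miss, 1 false alarm -> 1/3
```

To obtain curves like those in Figure 2, each metric would be computed separately for every lead time (t+5 to t+60) and then averaged over the forecasts issued during the event.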

6. Next steps

For our upcoming steps, we have outlined the following objectives:

  1. Evaluate the five rainfall nowcasting models at the catchment scale.
  2. Compare the performance of the algorithms using information theory criteria.
  3. Provide a comprehensive summary that highlights any patterns or relationships between the catchment characteristics and the nowcasting model performance.
  4. Utilize the rainfall nowcasts for hydrological modelling and evaluation.

JGI Seed Corn Funding Project Blog 2022-2023: Holly Fraser

Machine learning based online discourse analysis of mental health and medication use

Introduction

Extracting information from free-text sources is a complex and exciting data science challenge. Human-generated text is rich and complex; its meaning is often contextual and packed with nuance. This project analysed data from Reddit, an online social news website and discussion forum community, using Natural Language Processing (NLP), a set of data science techniques for extracting information from textual data. A particular aim of this project was to explore ways to model the types of discussions Reddit users were having about antidepressants (AD), a common intervention for managing symptoms of depression and anxiety.

Model exploration

With support from the Jean Golding Institute (JGI), I was able to explore the sentiment, emotion, and topics discussed on various subreddits using a range of data-driven techniques. For example, I used a sentiment analysis package[1] to extract the sentiment (fig 1) and emotion (fig 2) of a large data corpus (n=24183) extracted from the r/antidepressants subreddit. Figure 3 depicts a schematic of the workflow used to analyse the sentiment of the comments on this subreddit. I then used topic modelling to extract and cluster the topics in the data corpus, using a clustering-based technique built on a class-based TF-IDF procedure[2] (fig 4).

Figure 1: Sentiment analysis of data from r/antidepressants (n=24183)

Figure 2: Emotion analysis of data from r/antidepressants (n=24183)

Figure 3: Schematic of sentiment analysis workflow

Figure 4: Example of clustered topic extractions
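Summaries like Figures 1 and 2 come down to counting per-comment labels. A minimal sketch of that tally, assuming labels have already been predicted by a classifier (pysentimiento’s English sentiment model, for instance, labels text as POS, NEG or NEU). The labels below are made up and do not reflect the project’s results.

```python
from collections import Counter


def label_proportions(labels):
    """Return each label's share of the corpus, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Made-up labels for illustration only.
predicted = ["NEG", "NEG", "NEU", "POS", "NEG"]
print(label_proportions(predicted))  # {'NEG': 0.6, 'NEU': 0.2, 'POS': 0.2}
```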

It was really valuable to be able to use these techniques to explore questions related to the lived experience of managing mental health challenges. My PhD research involves using population health data to explore questions related to depression and medication use, so using free-text data to explore similar questions in a data-driven, exploratory way was thought-provoking. For example, how do you extract specific information relevant to mental health from a real-life, unstructured dataset? How could we use data analysis like this for impactful mental health research?

Interpretation of results

One of the biggest challenges of the project has been interpreting the model results. The topic modelling output was particularly difficult to interpret, because many of the extracted topics contained words that had little meaning out of context, despite the strategies used to remove these ‘noisy’ words.
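The post does not list the noise-removal strategies used. One common approach is a custom stop-word list applied before topic modelling; the sketch below shows that idea in plain Python with a hypothetical word list. (BERTopic also accepts a scikit-learn CountVectorizer through its vectorizer_model parameter, which is one place such a list can be supplied.)

```python
import re

# Hypothetical noise list: common function words plus forum filler.
NOISE_WORDS = {"the", "and", "im", "its", "just", "like", "really", "get", "got"}


def clean_comment(text, noise_words=NOISE_WORDS, min_len=3):
    """Lower-case, tokenise, and drop noise words and very short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [t.replace("'", "") for t in tokens]  # "i'm" -> "im"
    return [t for t in tokens if len(t) >= min_len and t not in noise_words]


print(clean_comment("I'm just really tired on sertraline"))  # ['tired', 'sertraline']
```

A fixed list removes words that are uninformative everywhere, but it cannot catch words that only lose their meaning once separated from their sentence, which is the harder problem described above.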

The results of the sentiment and emotion analysis, however, are relatively easy to describe and interpret. For example, the sentiment model classified the majority of comments as having a negative sentiment (fig 1). The emotion analysis output is also relatively easy to interpret, although it is worth noting that the model struggled to classify the data into discrete emotion categories, with the ‘others’ column being the most densely populated (fig 2). This is not surprising when looking at the raw data from a human perspective: many of the comments are long and complex and contain multiple stances. For example, the model struggled to correctly predict the emotion of comments that said things like ‘I was doing badly on X, but I’m doing much better now on Y’ (paraphrased). Therefore, more work evaluating how well the model classifies things like sentiment and emotion would be valuable.

Knowing which types of text data the model struggled to classify gives an interesting insight into the challenges of NLP in this particular context, where the text data are often complex and composed of different clauses containing multiple emotions.

Conclusion and next steps

A valuable next step to this work would be to more formally assess the ability of the models to classify sentiment, emotion, and extract topics on a smaller data set by using human interpretation. This would give an insight into how well the models I used perform on Reddit data, by comparing the model output to human judgement in a structured way. Extracting important information related to health (e.g., patient experience of a healthcare intervention) from unstructured text data is a significant NLP challenge; having better insight into the complexity of this challenge has been one of the most valuable outcomes of this project.
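One structured way to compare model output with human judgement, offered here as a suggestion rather than the project’s stated plan, is Cohen’s kappa: the observed agreement p_o between two raters, corrected for the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The example labels are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same label.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if p_expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_observed - p_expected) / (1 - p_expected)


human = ["NEG", "NEG", "POS", "NEU"]
model = ["NEG", "POS", "POS", "NEU"]
print(round(cohens_kappa(human, model), 3))  # 0.636
```

Here raw agreement is 0.75, but after allowing for chance agreement (0.3125) kappa is about 0.64, a more honest picture of how closely the model tracks human judgement.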

If anyone is interested in hearing more about this work or my other projects, you can find me on Twitter @hollyalexfraser or email holly.fraser@bristol.ac.uk.

Note: Work carried out for academic purposes only.

References

[1] J. M. Pérez, J. C. Giudici, and F. Luque, “pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks.” arXiv, Jun. 17, 2021. Accessed: Jun. 17, 2023. [Online]. Available: http://arxiv.org/abs/2106.09462

[2] M. Grootendorst, “BERTopic: Neural topic modeling with a class-based TF-IDF procedure.” arXiv, Mar. 11, 2022. Accessed: Jun. 17, 2023. [Online]. Available: http://arxiv.org/abs/2203.05794

JGI’s Widening Participation Summer Internship Experience: Emily Anderton & Senyi Luo


Interns Momi, Senyi and Emily with the JGI team

We completed a six-week internship working with Dr Hen Wilkinson and the JGI team as part of the University of Bristol’s Widening Participation Research Summer Internship scheme. Our internship project was in the data science field and centered around the topic of PeaceTech.

Our experience:

Throughout the internship, I gained technical skills in using Tableau software and Python to create data visualisations, and I further developed my critical thinking skills when producing a scoping review of PeaceTech-related literature and researching stop and search statistics.

I felt it was valuable experience to be included in meetings with the wider JGI team, and the friendly nature of the team helped me feel confident when presenting updates on our project. It was also fascinating to learn of the other JGI data science projects currently underway during these meetings, especially since I am interested in pursuing a career in this field.

Although the internship was outside the field of my undergraduate degree (Accounting and Management), it helped me to develop transferable academic skills, including writing literature reviews and reports, which will prove useful as I enter the third year of my degree programme. The internship also gave me insight into the way research is conducted at the University of Bristol, which will be of great use to me when considering postgraduate study.

Overall, I felt that the internship was well structured, and the daily check-ins helped to keep everyone on the right track, which enabled me to learn so much in only six weeks. The JGI team were a pleasure to work with and I would definitely recommend an internship with them.

– Emily Anderton

During the internship at the JGI, I gained a deeper understanding of what PeaceTech is and learned more about data science. I learned how to collect data, clean it, and finally visualize it. Alongside this, I learned how to write a literature review, which I had not done before and which will help my future studies. It was a fascinating introduction to the world of research at the University: you get an excellent insight into how research is conducted and how the University works.

The real-world project also gives you valuable hands-on experience of learning new things, solving problems, and, most importantly, working as a team. The JGI team is very friendly and welcoming and gave us a lot of support, and the atmosphere here is great. The Widening Participation programme is well organised, and there is a meet-up every week where you can share your project experience with interns from other departments, which gives you more points of view.

– Senyi Luo

We would like to thank the JGI and the University of Bristol for this incredible opportunity.

JGI Seed Corn Funding Project Blog 2022-2023: Cheryl McQuire

Addressing the fetal alcohol spectrum disorder (FASD) ‘data gap’ – Cheryl McQuire

Cheryl McQuire on behalf of the study team: Amy Dillon, University of Bristol; Prof Raja Mukherjee, Surrey and Borders Partnership NHS Foundation Trust; Prof Penny Cook, University of Salford; Sandra Butcher, National Organisation for FASD; Andy Boyd, Director, UK Longitudinal Linkage Collaboration; Beverley Samways, University of Bristol; Dr Sarah Harding, University of Bristol

Twitter: @cheryl_mcquire

What’s the problem?

Landmark UK guidance has called for urgent action to increase identification, understanding, and support for those affected by fetal alcohol spectrum disorder (FASD); but a paucity of national data undermines the feasibility of achieving this.

Tell me more…

Fetal alcohol spectrum disorder (FASD) is caused by exposure to alcohol in pregnancy. It is the most common non-genetic cause of lifelong disability worldwide. FASD is associated with problems with learning and behaviour and an increased risk of physical, mental health, substance misuse, and social problems. Prevention, early diagnosis, and support for people living with FASD can improve outcomes and lead to societal cost savings.

In the UK, FASD is thought to be particularly common. A study in Manchester schools found that 2% of children had confirmed FASD, and 4% had possible FASD. UK health organisations have recommended urgent action to improve FASD prevention, diagnosis, and support. Publication of the National Institute for Health and Care Excellence (NICE) Quality Standard for FASD in 2022 sets the strongest precedent yet for improved prevention, assessment, and support for FASD.

In parallel, the UK government has called for a transformation in the way people’s information (data) is used to improve health. However, reliable and accessible data on FASD is not available. This makes it difficult to achieve important FASD research, policy, and healthcare goals.

A potential solution?

We believe that an important step towards addressing the FASD ‘data gap’ will be to produce the first UK National Linked Database for FASD. This would bring together de-identified FASD assessment records from NHS and private health settings that have not previously been available for research. These records would be stored in a trusted research environment, enabling researchers to use the data in a way that protects people’s privacy. FASD records could then be linked to other population records, including health, education, employment, crime, and social care. The database would provide new insights into the characteristics and needs of people living with FASD and the impacts and costs of FASD in the UK, and help identify opportunities for improving outcomes.

What were the aims of this seed corn project?

This seed corn funding allowed us to take the first steps towards making a UK National Database for FASD a reality. We used it to establish the feasibility, acceptability, key purposes, and data structure of the first linked national research database for fetal alcohol spectrum disorder.

What did we do?

We spoke to over 100 stakeholders including clinicians, data specialists, researchers, policy makers, charities, and people living with FASD to find out:

  1. What they want from a FASD database
  2. How this database could be used to advance policy, research and practice
  3. What UK data sources are currently available, and are due to become available, for FASD
  4. What data are commonly collected by FASD clinics
  5. What opportunities there are for standardisation/harmonisation of FASD data
  6. What should be considered in relation to ethical and data governance frameworks, data collation, transfer, storage, linkage, onward sharing and sustainability

To maximise engagement, we took a flexible and tailored approach, speaking to people by email and video conferencing and holding one in-person workshop to coincide with the UK Conference for FASD 2023 (Salford, March 2023).

What did we find?

There was strong support for a national FASD database. Charities and those living with FASD spoke of the benefits of increased awareness, understanding and support for FASD. Clinicians reported that the detailed clinical information held in a national database could improve diagnosis and make assessment more efficient, potentially reducing long waiting lists. Researchers expressed enthusiasm for using it to better understand long-term outcomes, costs and opportunities for improved support. Policy makers identified clear alignment with current FASD and data transformation policy. The most common concern was around privacy and data sharing. The study team has been developing a data pipeline model to ensure that these concerns are appropriately addressed.

What’s next?

We are developing a ‘data pipeline’ model, in collaboration with representatives from FASD clinics and people working in secure data environments to take the initial steps in making the national database for FASD a reality.

We have clear plans for follow-on funding, maintaining the strong, widespread collaborations that we have developed and strengthened through this seed corn work.
We are presenting a summary of this public engagement work at the ADR-UK conference in November, and this work has been accepted in the International Journal of Population Data Science.

Overall, this project has been invaluable in paving the way for progress in FASD in the UK. We hope to finally address this crucial FASD ‘data gap’ that has been stalling progress in prevention, understanding and appropriate support for too long.

JGI Seed Corn Funding Project Blog 2022-2023: Sydney Charitos & Lauren Thompson

An exploration of how primary school children want to view their health data: a co-design study

Introduction

Due to the growth of self-tracking health devices, greater attention has been paid to how individuals view their health data. However, young children are often not the focus of these investigations. Instead, tools designed for adults are applied and validated with children, rather than designs being developed from the children’s own perspectives.

Aims

This project aimed to explore children’s views around visualising health data and to co-produce a set of designs/prototypes illustrating how health data can be better visualised for children. To do this, we are running a series of creative workshops with a class of 10–11-year-olds from a local school. We had both research aims and social aims throughout the project. These included:

Workshop 1:

Research Aim: Generate a range of data visualisations based on personas with invisible disabilities who represent ‘clients’. These personas represented children with different health conditions, with different motivations and requirements for tracking.

Social Aim: Educate the children about invisible disabilities, foster empathy towards the personas, give healthy children a broader understanding of the experiences of children living with health conditions, and introduce the concept of co-design by involving them in the research process.

Workshop 2:

Social Aim: Teach the children to use BBC Micro:bits, small electronic devices created to help children develop electronics and computer science skills.

Workshop 3:

Research aim: Enable children to provide feedback and compare designs created by artists and academics interested in data visualisation. Subsequently, the children would create visualisations guided by designs based on their workshop 1 displays.

Social aim: Through comparison, children evidence and develop their skills of analysis and evaluation. These are higher-order processing skills which reflect a thorough understanding of the topic and are key steps on the way to intentional creation. Intentional creation is the ultimate goal as it evidences the full understanding of the context and application in education.

Workshop 4:

Research aim: Create a poster synthesising the children’s ideas into one display per persona.

Social aim: Children will show a full understanding of health data visualisation by comprehending and responding to the three personas and adapting their ideas to those needs and preferences. This will demonstrate their ability to generate and adapt ideas when visualising health data, and will be confirmed by their creation of a poster on which they also give a rationale for their choices, showing their consideration and conceptualisation. Children will also work collaboratively, evidencing teamwork, to combine all of their ideas for each ‘client’ into a single large display. By synthesising ideas, they will simultaneously evaluate and create, reflecting their full understanding of both health data visualisation and how to adapt it to requirements.

Results

Since the project is still ongoing, we do not currently have results. However, we can share some displays that highlight interesting concepts. For more details, please reach out to us.

Figure 1: Child’s drawing from workshop 1 based on the personas.

Future plans

After we complete all of the intended workshops, we will create a display to be placed on a wall at the school we have been working with. The children will then be able to show their work to their teachers, friends and parents. We intend to write up these results into a paper focusing on the methodology of the study as well as interesting and unexpected findings.

Contact details and links

Contact Sydney Charitos (sydney.charitos@bristol.ac.uk) or Lauren Thompson (lauren.thompson@bristol.ac.uk) for more information about the project.

Ask JGI Student Experience Profiles: Ben Anson

JGI Student Experience Profiles: Ben Anson (Ask-JGI Data Science Support 2022-23)
Ben Anson, 1st year PhD Student in the School of Mathematics at the University of Bristol


Ask-JGI was advertised to my CDT by one of the staff members in the School of Mathematics. It sounded like a fun way to make the most of my MSc in statistics and get some ‘real world’ practice with statistics, all whilst doing my PhD! So, I applied, and it was one of the easiest application forms I have ever filled in. I was offered the job almost instantly and started two or three weeks later.

It has been really beneficial for me to chat with people from other disciplines, e.g. biological sciences, psychology, education. It makes it much easier to understand the jargon from field to field (and hopefully it was helpful for them to understand more about data science), and gives interesting insights into what other researchers are up to. A lot of the statistics I’ve studied has been within a fairly theoretical framework, so it has been challenging and rewarding to see how this theory applies in practice. Queries are also quite varied: some people know exactly what model they want to use but want advice on how to perform inference, while other queries are about selecting the right method or model, or even about framing their problem in a sensible way.

The work has spanned many areas. I’ve worked on several queries from psychologists about how to fit models and/or test hypotheses for experiments they have run, and helped them explore ways of dealing with problematic datasets (e.g. missing data). I’ve also helped analyse survey data from taught courses at the university, advised on how to process football sentiment text data, and discussed the best way to visualize results from non-destructive testing methods!

My experience of Ask-JGI has been mostly with statistics queries, but there are also queries about data visualization, data management, ML applications, etc., which are also super interesting. I’d recommend that anyone with experience in any of these areas apply, as the role is very varied and fits around PhD work.

Ask JGI Student Experience Profiles: Vanessa Hanschke

Ask JGI Student: Vanessa Hanschke
Vanessa Hanschke, 3rd year PhD student in the School of Computer Science at the University of Bristol

JGI Student Experience Profiles: Vanessa Hanschke (Ask-JGI Data Science Support 2022-23)

I initially wanted to join Ask-JGI because I thought it would be a great opportunity to keep my coding skills alive. I studied computer science and had worked in the data and AI industry for three years before starting my PhD in Interactive Artificial Intelligence. My PhD looks at supporting data science teams to reflect on the social impact of their applications through roleplay, and although my technical background is helpful, I don’t actually write any code on a day-to-day basis.

The JGI experience definitely gave me opportunities to practise some coding, but it also gave me so much more. I was able to explore all the interesting research work happening around the university, whether it was fish genetics, appetite psychology, analysing racist discourse or history video games. It’s inspiring to see the many different things researchers can do with data and how different their data sets look: qualitative data in the form of survey responses, hand-curated Excel sheets manually extracted from historical archives, or long lists of numbers collected with environmental sensors. My biggest takeaway is that, because academia can be such a competitive environment, having a place that will give you constructive feedback and support is invaluable. It was very rewarding to facilitate connections between researchers who could collaborate, or to provide a piece of advice or that little snippet of code that helped researchers get unstuck.

Another big highlight of being part of the JGI was participating in public outreach events such as AI UK in London and Bristol Data Week. It is such an exciting space, and it is so much fun to speak to people who are just learning about data concepts and are curious to know whether data will benefit their lives or whether the perceived harms will manifest. The media buzz around data and AI means there is a lot of important work to be done, both to demystify the hype and to open up opportunities for people to be creative with the possibilities that data science can provide.

Ask JGI Student Experience Profiles: Matt Chandler

Ask JGI Student: Matt Chandler
Matt Chandler, 3rd year PhD student in the Department of Mechanical Engineering

JGI Student Experience Profiles: Matt Chandler (Ask-JGI Data Science Support 2022-23)

Over the past year I’ve found my experience working with the Ask-JGI service really rewarding. I was keen to apply as I was looking for exposure to the wider world of research being done at Bristol, which is something I have definitely achieved along the way. What surprised me most was how relevant a lot of my previous experience in data analysis was to topics very far removed from my own area of research. Whether it be statistics, coding advice, data ethics or visualization, data is data regardless of where it came from. And when things came up that I had not encountered before, having a team with a range of different backgrounds made it a lot easier to get up to speed.

The part of the job I’ve enjoyed most was helping out with a range of events throughout the year. The highlights include assisting in the delivery of one of UKRN’s Train-the-Trainer workshops, working a stall at AI UK 2023 on sensing air quality, and the launch of the Ask-JGI Roadshow this year, in which the team visited departments across the university for a more informal opportunity to engage with researchers about their data. A few of these conversations then led to more in-depth assistance and advice. If you see an upcoming Roadshow in your department and have a question you would like answered (or even if you don’t!), I would definitely recommend going along.

The Ask-JGI team has made this year a really enjoyable experience. As a cohort, we’ve come together to deliver much better advice than any individual would be able to, and we’ve been able to rely on one another when our individual research projects took up more time. I would strongly recommend applying to anyone with even a vague interest in data science. It’s an amazing opportunity for development and networking, and it allows you to immerse yourself in the wider community at Bristol.