Ask-JGI Example Queries from the Faculty of Arts, Law and Social Sciences

All University of Bristol researchers (from PhD student and up) are entitled to a day of free data science support from the Ask-JGI helpdesk. Just email ask-jgi@bristol.ac.uk with your query and one of our team will get back to you to see how we can support you. You can see more about how the JGI can support data science projects for University of Bristol based researchers on our website.

We support queries from researchers across all faculties and in this blog we’ll tell you about some of the researchers we’ve supported from the Faculty of Arts, Law and Social Sciences here at the University of Bristol.

YouTube Comment Scraping

One researcher got in touch for advice about scraping data from the YouTube comment section. They were interested in collecting all the comments for a set of videos so that they could analyse sentiment and engagement with the videos’ content. While this wasn’t something we’d done before, we spent some time reading about the subject and found that the official YouTube Data API (https://developers.google.com/youtube/v3) was suitable for this work (no third-party tools needed!). We discussed this with the researcher and, based on their needs, suggested using the official Python client as a simple and flexible way to interact with this data source.

Although the researcher was relatively new to Python, they were keen to learn it for the project. While we wrote the code and documentation for the comment scraping pipeline, the researcher worked through some of the Python courses that the JGI offers (https://bristol-training.github.io/). This meant that when we met again a few weeks later, we could go through the code together and make sure everything was understood and in a usable state.
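For readers curious what such a pipeline involves, here is a minimal sketch of the comment-collection step. It is not the code we wrote for the researcher. It assumes the google-api-python-client package and an API key, and the function names, the output fields and the `max_pages` option are our own choices for illustration.

```python
def make_client(api_key):
    """Build a YouTube Data API v3 client (assumes google-api-python-client is installed)."""
    from googleapiclient.discovery import build  # imported lazily so the rest runs without it
    return build("youtube", "v3", developerKey=api_key)


def fetch_comments(youtube, video_id, max_pages=None):
    """Return the top-level comments for one video as a list of dicts.

    `youtube` is a client from make_client(); `max_pages` is an optional
    cap on the number of result pages (a hypothetical convenience).
    """
    rows = []
    page_token = None
    pages = 0
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,          # largest page size the endpoint accepts
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()

        for item in response.get("items", []):
            top = item["snippet"]["topLevelComment"]["snippet"]
            rows.append({
                "video_id": video_id,
                "author": top.get("authorDisplayName"),
                "like_count": top.get("likeCount", 0),
                "text": top.get("textDisplay"),
                "published_at": top.get("publishedAt"),
            })

        pages += 1
        page_token = response.get("nextPageToken")
        # Stop when there are no more pages or the optional cap is reached.
        if not page_token or (max_pages is not None and pages >= max_pages):
            break
    return rows
```

Calling `fetch_comments(make_client("YOUR_API_KEY"), "VIDEO_ID")` for each video and writing the rows to a CSV file would give a table along the lines of the one shown here. Note that this sketch collects top-level comments only, not replies, and that the API enforces a daily quota.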

Table of 'Scraped YouTube Comments'. The table shows the channel name, title, author, like count and text. — *Example of the kind of YouTube comment data accessible via the official API.*

Cross-platform comparison of social media posts quoting a Greek poet

One query we supported revolved around a cross-platform comparison of social media posts quoting a specific Greek poet. The study aimed to collect posts from TikTok, Tumblr, and Pinterest to identify the most popular poem quotes and analyse how frequently they were misattributed. While researchers working with platforms like X, Facebook, or YouTube can often find established data collection methods, niche platforms pose unique challenges. A key difficulty was determining the right data sample size across platforms. The three platforms each form a distinct social network with its own engagement metrics, making it unclear how many posts would be sufficient for a meaningful analysis. Working with the researcher, we clarified the research question and adapted aspects of the research design accordingly. We also explored alternative analysis approaches, including network analysis, to better understand how posts spread on these platforms and to assess the reach of these quotations.

Code review for cross-sectional survey on food insecurity

A PhD student working in anthropology and social policy attended some of the free coding courses the JGI offers (https://bristol-training.github.io/). Since this initial encounter with R, they have been using it for their data analysis. As their supervisors do not work with R, the student found themselves in need of additional feedback on their R-based project. Specifically, they wanted to make sure that their approach to, and interpretation of, Principal Component Analysis was on the right track. So the student contacted Ask-JGI for a second opinion on their analysis and asked to have their R code reviewed to make sure it was all working correctly. We were happy to offer the support they needed and to confirm that they were on the right track!

People sat at computers looking at code on a projector screen — *R training session led by JGI Data Scientists*.

Fuzzy Matching for Job Postings Analysis

We assisted researchers from the Business School with the data collection process for their job postings analysis. This involved extracting and analysing job postings data to understand how companies invest in specific skill sets, especially those related to cutting-edge technologies like AI.

One of the initial hurdles we faced was matching company names from the provided list with those found in job postings. Even though this might sound straightforward, company names can vary significantly. We encountered abbreviations and slight variations in spelling. A simple exact match would not be sufficient. That’s where fuzzy matching came into play. We used algorithms that can identify similar strings, even with minor differences. This allowed us to accurately link our company list to job postings, even when the names weren’t perfectly aligned. This was crucial for capturing the broadest possible range of relevant data.
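To give a flavour of the idea, here is a minimal sketch of fuzzy name matching using only Python's standard library (`difflib`). It is an illustration, not the algorithm used in the project: the suffix list, the 0.85 similarity cutoff, the function names and the example company names are all assumptions made for this example.

```python
import difflib
import re

# Legal suffixes dropped before comparing (an illustrative, incomplete list).
SUFFIXES = {"ltd", "limited", "inc", "incorporated", "plc", "llc",
            "corp", "corporation", "co", "company"}


def normalise(name):
    """Lower-case a name, strip punctuation and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(posting_name, company_list, cutoff=0.85):
    """Return the closest entry in company_list, or None if nothing is close enough."""
    lookup = {normalise(company): company for company in company_list}
    hits = difflib.get_close_matches(
        normalise(posting_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Made-up company names, for illustration only.
companies = ["Acme Robotics Ltd", "Globex Corporation", "Initech"]
print(best_match("ACME Robotics Limited", companies))  # abbreviation difference
print(best_match("Globx Corp.", companies))            # spelling variation
print(best_match("Umbrella Holdings", companies))      # no sufficiently close match
```

The cutoff controls the trade-off between missed matches and false matches, so in practice it is worth checking a sample of matches by hand before settling on a value.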

The sheer volume of job posting data presented another significant challenge. We were dealing with potentially millions of records, and processing this much data requires substantial computational resources. To tackle this, we used High-Performance Computing (HPC). HPC allows us to distribute the workload across multiple processors, significantly accelerating the data processing and analysis. This was essential for handling the massive datasets and complex algorithms involved in fuzzy matching.
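The general pattern of distributing such work can be sketched on a single machine with Python's standard library: split the records into chunks and hand each chunk to a separate worker process. This is a simplified illustration under our own assumptions (the function names, chunk size and cutoff are hypothetical), not the HPC setup used in the project. On a cluster, each chunk would more typically run as its own scheduled job.

```python
import difflib
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def chunked(items, size):
    """Yield successive slices of `items` containing at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def match_chunk(task):
    """Fuzzy-match one chunk of names against the (lower-cased) company list."""
    names, companies, cutoff = task
    results = []
    for name in names:
        hits = difflib.get_close_matches(name.lower(), companies, n=1, cutoff=cutoff)
        results.append((name, hits[0] if hits else None))
    return results


def match_all(names, companies, chunk_size=1000, cutoff=0.85,
              executor_cls=ProcessPoolExecutor, workers=None):
    """Match every name, spreading the chunks across a pool of workers.

    Returns (name, matched lower-cased company or None) pairs in input order.
    """
    lowered = [company.lower() for company in companies]
    tasks = [(chunk, lowered, cutoff) for chunk in chunked(names, chunk_size)]
    results = []
    with executor_cls(max_workers=workers) as pool:
        # pool.map preserves the order of the tasks it is given.
        for part in pool.map(match_chunk, tasks):
            results.extend(part)
    return results


if __name__ == "__main__":
    # Made-up names, for illustration only.
    print(match_all(["Globex", "Globx", "Unknown Co"], ["Globex", "Initech"], chunk_size=2))
```

Passing `executor_cls=ThreadPoolExecutor` is handy for quick local tests, though threads will not speed up this kind of CPU-bound work.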

Visualising historical networks of Chinese and Eurasian elites in the British Empire

We are working with a PhD researcher in the History department. In this case, the Ask-JGI team is offering assistance in exploring the use of network visualisation and analysis tools. These tools can be hard for researchers to access when the methods fall outside the usual toolkit of their home discipline, and Ask-JGI helps to bridge that gap. The PhD project involves mapping the network of powerful individuals in the British Empire across the late 19th and early 20th centuries. This network is complex, as individuals are connected with one another through different types of ties, such as family relations, alumni networks, business partnerships, and political organisations. Visualising these ties as a network of heterogeneous nodes and edges helps the researcher to communicate the subject of the research effectively. Through our conversations, we have brought clarity to the concrete next steps in the analysis of the dataset. We have also offered learning resources and advice on alternative analytical methods that can be applied to distil insights into how interpersonal connections and social capital might have translated to power in the historical context.
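As a small illustration of what a network with several types of ties can look like as data, the sketch below stores typed ties between people and counts each person's connections by tie type. The people and ties are placeholders invented for this example, not data from the project, and a dedicated network library would normally be used for real analysis and visualisation.

```python
from collections import Counter, defaultdict

# Placeholder ties of the form (person, person, tie type); not real data.
ties = [
    ("Person A", "Person B", "family"),
    ("Person A", "Person C", "business"),
    ("Person B", "Person C", "alumni"),
    ("Person A", "Person D", "political"),
    ("Person A", "Person B", "business"),  # the same pair can share several ties
]


def build_network(ties):
    """Map each person to their neighbours and the set of tie types linking them."""
    network = defaultdict(lambda: defaultdict(set))
    for source, target, tie_type in ties:
        network[source][target].add(tie_type)
        network[target][source].add(tie_type)  # ties are treated as undirected
    return network


def degree_by_type(network):
    """Count, for each person, how many neighbours they have per tie type."""
    degrees = {}
    for person, neighbours in network.items():
        counts = Counter()
        for tie_types in neighbours.values():
            counts.update(tie_types)
        degrees[person] = dict(counts)
    return degrees


network = build_network(ties)
for person, counts in sorted(degree_by_type(network).items()):
    print(person, counts)
```

Keeping the tie type on every edge makes it possible to ask, for example, whether the best-connected individuals are linked mainly through family or through business.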