Tracing Voices: A Visual Journey through Latin American Debates about Race  

JGI Seed Corn Funding Project Blog 2023/24: Jo Crow

I’m a historian who is keen to learn how digital tools can strengthen our analysis of the material we find in the archives. I research histories of race, racism and anti-racism in Latin America. I’m particularly interested in how ideas about race travelled across borders in the twentieth century, and how these cross-border conversations impacted on nation-state policies in the region.  

The book I am currently writing investigates four international congresses that took place between the 1920s and 1950s: the First Latin American Communist Conference in Buenos Aires, Argentina (1929); the XXVII International Congress of Americanists in Lima, Peru (1939); the First Inter-American Conference on Social Security in Santiago, Chile (1942); and the Third Inter-American Indigenista Congress in La Paz, Bolivia (1954). These were very different kinds of international meetings, but they all dedicated a significant amount of time to debating the problem of racial inequality, especially the ongoing marginalisation of indigenous peoples.

Who was at these congresses? Who spoke to whom, and what conversations did they have? Where did the conversations take place? What did the rooms look like? How were they set up? And what about the spaces outside the formal discussion sessions – the drinks receptions that delegates attended, the archaeological sites and museums they visited, the film screenings and book exhibitions they were invited to, the restaurants they frequented, the hotels they stayed in? Luckily, I have found a great variety of source materials – conference proceedings, newspaper reports, personal and institutional correspondence, memoirs of participating delegates – that help me begin to answer these questions.

Black and white photos from a newspaper of men seated in a room for the XXVII International Congress of Americanists in Lima
Photographs of the XXVII International Congress of Americanists in Lima. Published in El Comercio newspaper, 11 September 1939.
Black and white photo of three delegates at the III Inter-American Indigenista Congress in La Paz.
Photograph of three delegates at the III Inter-American Indigenista Congress in La Paz. Included in an International Labour Organization report of 1954. 

As part of my JGI seed-corn project, I’ve been able to work with two brilliant researchers: Emma Hazelwood and Roy Youdale. Emma helped me to explore the uses of digital mapping for visualising the “who” and “where” of these congresses, and Roy helped me to experiment with machine-reading. In this blog, I share a few of the things we achieved and learnt.   

Digital Mapping

Emma started by inputting the data I had on the people who attended these congresses – their names, nationalities, where they travelled from – into Excel spreadsheets. She then found the coordinates of their origins using an online resource, and displayed them on a map using a coding language called Python. Below are a few of the results for Lima, 1939. The global map (Map 1) shows very clearly that this was a forum bringing together delegates from North, Central, and South America, and several countries in Europe too. We can zoom in to look more closely at the regional spread of delegates (Map 2), and further still to see what parts of Peru the Peruvian delegates came from (Map 3). For those delegates that were based in Lima – because we have their addresses – we can map precisely where in the city they or their institutions were based (Map 4).

Global map with red dots to show delegate locations and a green dot to highlight Peru
Map 1. The global map shows very clearly that this was a forum bringing together delegates from North, Central, and South America, and several countries in Europe.
Map of South America on the left and a zoomed in version on the right with red dots to show delegate locations and a green dot to highlight Peru
Map 2 (left) shows a zoomed in version of the global map to see the regional spread of delegates. Map 3 (right) shows what parts of Peru the Peruvian delegates came from.
Satellite image of Lima with different colour dots to symbolise different institute locations
Map 4. For delegates in Lima, the satellite image maps where in the city they or their institutions were based. 
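The mapping workflow described above (a table of delegates, a coordinate lookup, then a plot) can be sketched in a few lines of Python. This is only an illustration, not Emma's actual code: the delegate labels are placeholders, the coordinates are rough city-centre values, and the function name `delegates_to_geojson` is made up for the example. It aggregates delegates by place of origin and writes the result as GeoJSON, a format most mapping tools can display.

```python
import json
from collections import Counter

# Hypothetical records: (delegate label, city of origin). Not real congress data.
delegates = [
    ("Delegate A", "Lima"),
    ("Delegate B", "Lima"),
    ("Delegate C", "Buenos Aires"),
    ("Delegate D", "Paris"),
]

# Approximate (latitude, longitude) per city. In practice these would come
# from a gazetteer or an online geocoding resource.
coordinates = {
    "Lima": (-12.05, -77.04),
    "Buenos Aires": (-34.60, -58.38),
    "Paris": (48.86, 2.35),
}


def delegates_to_geojson(records, coords):
    """Aggregate delegates by origin and return a GeoJSON FeatureCollection."""
    counts = Counter(city for _, city in records)
    features = []
    for city, n in sorted(counts.items()):
        lat, lon = coords[city]
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"city": city, "delegates": n},
        })
    return {"type": "FeatureCollection", "features": features}


collection = delegates_to_geojson(delegates, coordinates)
print(json.dumps(collection, indent=2))
```

Storing the delegate count as a property of each point means the same file can drive a global view, a regional view, or a city view, with dot size or colour scaled to the count.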

In some ways, these visualisations don’t tell me anything I didn’t already know. From the list of conference attendees I compiled, for instance, I already had a sense of the spread of the countries represented in Lima in 1939. What the maps do achieve, however, is to tell the story of the international nature of the conference much more clearly and speedily than a list or table can. With the city map, showing where Lima-based delegates lived and worked, we do learn something new. By plotting the addresses, I can envisage the contours of the space they occupied. I couldn’t do that in my head with just a list of the addresses, especially without knowing road names.

The digital maps also help with comparative analysis. If we look at the global map (like Map 1) of all four congresses together we get a clear view of their very similar reach; most delegates to all of them were from South America. We are also able to swiftly detect the differences – for example, that the Lima conference attracted more delegates from Europe than the other meetings, or that there were no delegates from Europe at the 1954 congress in La Paz. We can then think about the reasons why.  

Satellite image of Lima with an old map layered on top with different colour dots to symbolise different locations
Map 5. Shows the main venues for the XXVII International Congress of Americanists.

Map 5 above takes us back to Lima. It shows the main venues for the XXVII International Congress of Americanists. It visualizes a circuit for us. I don’t think we can perceive this so clearly from a list of venues, especially if we are not very familiar with the city. Here we can see that most of the conference venues and the hotels where delegates stayed were clustered quite closely together, in Lima’s historic centre. Delegates could easily walk between them. There are a few outliers, though: one of the archaeological sites that delegates visited, the museum that threw a reception for delegates, and a couple of restaurants too. This prompts further questions and encourages us to imagine the delegates moving through the city.  
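The observation that most venues were within walking distance, with a few outliers, can also be checked numerically once the venues have coordinates. The sketch below uses the haversine formula for great-circle distance. The venue names and coordinates are hypothetical placeholders, not the real Lima venues, and the 1.5 km "walkable" threshold is an arbitrary assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical venue coordinates (latitude, longitude), for illustration only.
venues = {
    "Venue A": (-12.046, -77.030),
    "Venue B": (-12.050, -77.034),
    "Venue C": (-12.120, -77.030),
}


def outliers(points, hub, max_km):
    """Return the names of points lying further than max_km from the hub."""
    hub_lat, hub_lon = points[hub]
    return sorted(
        name
        for name, (lat, lon) in points.items()
        if haversine_km(hub_lat, hub_lon, lat, lon) > max_km
    )


print(outliers(venues, hub="Venue A", max_km=1.5))
```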

Machine Reading

As well as digital mapping, I’ve been keen to explore what machine or distant reading can add to our analysis of debates about race in early twentieth-century Latin America. It’s widely known, for example, that, in the context of the Second World War, many academic and government institutions rejected the scientific validity of the term race (“raza” in Spanish). A machine reading of the proceedings of these four congresses gives us concrete, empirical evidence of how the word race was, in practice, used less and less from 1929, to 1939, to 1942, to 1954. Text analysis software like Sketch Engine, which Roy introduced me to, also enables us to scrutinise how the term was used when it was used. For instance, in the case of the 1929 conference in Buenos Aires, Sketch Engine processes 300+ pages of conference discussions in milliseconds and shows us in a systematic way which so-called “races” were being talked about, the fact that “race” was articulated as an object and a subject of the verb, and how delegates associated the term race with hostile relations, nationhood, indigenous communities, exploitation, and cultural tradition (see below). In short, it provides a really useful, methodical snapshot of the many different languages of race being spoken in Buenos Aires. It is then up to me to reflect on the significance of the detail, and to go back to specific moments in the text, for example the statement of one delegate about converting the “race factor” into a “revolutionary factor”.

Results from a text analysis in Sketch Engine
Results from a text analysis in Sketch Engine for the 1929 conference in Buenos Aires. The result shows us in a systematic way which so-called “races” were being talked about.
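The two measurements discussed above, how often a term appears and which words appear around it, can be approximated with very little code. The sketch below is a much simpler stand-in for what Sketch Engine does (it uses a plain word window rather than grammatical analysis), and the two short texts are invented placeholders, not quotations from the proceedings.

```python
import re
from collections import Counter

# Invented placeholder texts keyed by congress year. NOT real quotations.
proceedings = {
    1929: "la raza y la nación la raza y la tierra",
    1954: "la comunidad y la tierra la población y la cultura",
}


def tokenize(text):
    """Lower-case the text and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def rate_per_10k(text, term):
    """Occurrences of term per 10,000 tokens, so texts of different length compare."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 10_000 * tokens.count(term) / len(tokens)


def collocates(text, term, window=2):
    """Count words appearing within `window` tokens either side of term."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == term:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


rates = {year: rate_per_10k(text, "raza") for year, text in proceedings.items()}
print(rates)
print(collocates(proceedings[1929], "raza").most_common(3))
```

Normalising by text length matters here because the four sets of proceedings are unlikely to be the same size. In real use, common function words such as "la" and "y" would normally be filtered out before reading the collocate list.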

In all, I’ve learnt how digital tools and methodologies can productively change how we’re able to look at things, in this case “race-talk” and who was speaking it. By looking differently we see differently too. What I’d like to do now is to trace where the conversations went from these congresses, and see how much they shifted and transformed in the process of travel.  


Jo Crow, Professor of Latin American Studies, School of Modern Languages

Using ‘The Cloud’ to enhance UoB laboratory data security, storage, sharing, and management

JGI Seed Corn Funding Project Blog 2023/24: Peter Martin, Chris Jones & Duncan Baldwin

Introduction

As a world-leading research-intensive institution, the University of Bristol houses a multi-million-pound array of cutting-edge analytical equipment of all types, ages, functions, and sensitivities, distributed across its Schools, Faculties, Research Centres and Groups, as well as in dozens of individual labs. However, as more and more data are captured, how can they be appropriately managed to meet the needs of researchers and funders alike?

What were the aims of the seed corn project? 

When an instrument is purchased, the associated computing, data storage/resilience, and post-capture analysis is seldom, if ever, considered beyond the standard Data Management Plans. 

Before this project, there existed no centralised or officially endorsed mechanism at UoB supported by IT Services to manage long-term instrument data storage and internal/external access to this resource, with every group, lab, and facility individually managing their own data retention, access, archiving, and security policies. This is not just a UoB challenge, but one that is endemic to the entire research sector. As the value of data is now becoming universally realised, not just in academia but across society, the challenge is more pressing than ever: an institution-wide solution to the entire data challenge is critically required, and it should be readily exportable to other universities and research organisations. At its core, this Seed Corn project sought to develop a ‘pipeline’ through which research data could be: (1) securely stored within a unified online environment/data centre in perpetuity, and (2) accessed via an intuitive, streamlined and equally secure online ‘front-end’ such as Globus, akin to how OneDrive and Google Drive seamlessly facilitate document sharing.

What was achieved? 

The Interface Analysis Centre (IAC), a University Research Centre in the School of Physics, currently operates a large and ever-growing suite of surface and materials science equipment with considerable numbers of both internal (university-wide) and external (industry and commercial) users. Over the past six months, working with leading solution architects, network specialists, and security experts at Amazon Web Services (AWS), the IAC/IT Services team have successfully developed a scalable data warehousing system that has been deployed within an autonomous segment of the UoB’s network. This removes the significant risk of holding single-copy data locally, along with the need to move data on portable hard drives or email it across the network. In addition to efficiently “getting the data out” from within the UoB network, using native credential management within Microsoft Azure/AWS, the team have developed a web-based front-end akin to Google Drive/OneDrive where specific experimental folders can be securely shared with specific users, compliant with industry and InfoSec standards. The proof of the pudding has been the positive feedback received from external users visiting the IAC, all of whom have been able to access their experiment data immediately following the conclusion of their work without the need to copy gigabytes or terabytes of data onto external hard drives!
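To make the idea of sharing "specific experimental folders" with "specific users" more concrete, here is a minimal sketch of a per-folder, read-only access rule. It assumes, purely for illustration, that the data sit in S3-style object storage and that access is expressed as an AWS IAM policy document. The post does not say which AWS storage service or access mechanism the team actually used, and the bucket name, folder path, and function name below are hypothetical.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM-style policy allowing read-only access to one folder (prefix)."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the user's own folder.
                "Sid": "ListOwnFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow downloading objects in that folder; no write or delete.
                "Sid": "ReadOwnFolder",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and folder names, for illustration only.
policy = read_only_prefix_policy("example-instrument-data", "/experiments/user-123/")
print(json.dumps(policy, indent=2))
```

Scoping each rule to a folder prefix is one way an external user could retrieve their own results without being able to see, change, or delete anyone else's.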

Future plans for the project 

The success of the project has not only highlighted how researchers and various strands within UoB IT Services can together develop bespoke systems utilising both internal and external capabilities, but also how even a small amount of Seed Corn funding such as this can deliver the start of something powerful and exciting. Following the delivery of a robust ‘beta’ solution between the Interface Analysis Centre (IAC) labs and AWS servers, it is currently envisaged that the roll-out and expansion of this externally-facing research storage gateway facility will continue with the support of IT Services to other centres and instruments. Given the large amount of commercial and external work performed across the UoB, such a platform will hopefully enable and underpin data management across the University going forwards, adopting a scalable and proven cloud-based approach.


Contact details and links

Dr Peter Martin & Dr Chris Jones (Physics) peter.martin@bristol.ac.uk and cj0810@bristol.ac.uk 

Dr Duncan Baldwin (IT Services) d.j.baldwin@bristol.ac.uk  

Successful Seedcorn Awardees 2024-2025

The Jean Golding Institute Seedcorn Funding is a fantastic opportunity to develop multi- and interdisciplinary ideas while promoting collaboration in data science and AI. We are delighted that a new cohort of multidisciplinary researchers has been supported through this funding.

Leighan Renaud – Building a Folk Map of St Lucia

Leighan Renaud

Dr. Leighan Renaud is a lecturer in Caribbean Literatures and Cultures in the Department of English. Her research interests include twenty-first century Caribbean fiction, mothering and motherhood in the Caribbean, folk and oral traditions in the Anglophone Caribbean, and creative practices of neo-archiving. 

Louise AC Millard – Using digital health data for tracking menstrual cycles

Dr. Louise Millard is a Senior Lecturer in Health Data Science in the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol. Following an undergraduate Computer Science degree and MSc in Machine Learning and Data Mining, they completed an interdisciplinary PhD at the interface of Computer Science and Epidemiology. Their research interests lie in the development and application of computational methods for population health research, including using digital health and phenotypic data, and statistical and machine learning approaches. 

Photo of Louise AC Millard on the right

Laura Fryer – Visualisation tool for Enhancing Public Engagement Using Supermarket Loyalty Card Data

Photo of Laura Fryer on the left

Laura is a senior research associate in the Digital Footprints Lab based within the Bristol Medical School. Their aim is to use novel data to unlock insights into behavioural science for the purposes of public good. Laura is particularly passionate about broadening the public’s understanding of digital footprint data (e.g. from loyalty cards, bank transactions or wearable technology such as a smart watch) and demonstrating how vital it can be in developing our understanding of population health within the UK and beyond.  Laura’s project is focused on developing a data-visualisation tool that will support public engagement activities and provide a tangible representation of the types of data that we use – building further trust between the public and scientific researchers.  

Nicola A Wiseman – Cellular to Global Assessment of Phytoplankton Stoichiometry (C-GAPS)

Dr. Nicola Wiseman is a Research Associate in the School of Geographical Sciences. They received their PhD in Earth System Science from the University of California, Irvine, where they specialized in using ocean biogeochemical models to investigate the impacts of phytoplankton nutrient uptake flexibility on ocean carbon uptake. They also are interested in using statistical methods and machine learning to better understand the interactions between marine nutrient and carbon cycles, and the role of these interactions in regulating global climate. 

Photo of Nicola A Wiseman on the right

Georgia Sains – Collecting & Analysing Multilingual EEG Data

Georgia Sains is a Doctoral Teaching Associate in the Neural Computation research group at the School of Computer Science. Her research is focused on the overlap between Computer Science, Neuroscience, and Linguistics. Georgia has worked on developing models to help understand how linguistic traits have evolved. More recently, she has been using Bayesian modelling to find patterns between grammar and neurological response, and is now focused on using Electroencephalography experimentation to explore the relationship between linguistic upbringing and how the brain processes language.

Alex Tasker – Building a Strategic Critical Rapid Integrated Biothreat Evaluation (SCRIBE) data tool for research, policy, and practice

Dr. Tasker is a Senior Lecturer at the University of Bristol, a Research Associate at the KCL Conflict Health Research Group and Oxford Climate Change & (In)Security (CCI) project, and a recent ESRC Policy Fellow in National Security and International Relations. Dr. Tasker is an interdisciplinary researcher working across social and natural sciences to understand human-animal-environmental health in situations of conflict, criminality, and displacement using One Health approaches. Alongside this core focus, Dr. Tasker’s work also explores emerging areas of relevance to biosecurity and biothreat including engineering biology, antimicrobial resistance, subterranean spaces, and the use of new forms of evidence and expertise in a rapidly changing world for climate, security, and defense.

Photo of Alex Tasker on the right

Exploring the Impact of Medical Influencers on Health Discourse Through Socio-Semantic Network Analysis

JGI Seed Corn Funding Project Blog 2023/24: Roberta Bernardi

Gloved hand holding a petri dish with the Twitter bird logo on the dish
This Photo by Unknown Author is licensed under CC BY-NC-ND

Project Background

Medical influencers on social media shape attitudes towards medical interventions but may also spread misinformation. Understanding their influence is crucial amidst growing mistrust in health authorities. We used a Twitter dataset of the top 100 medical influencers during Covid-19 to construct a socio-semantic network, mapping both medical influencers’ identities and key topics. Medical influencers’ identities and the topics they use to represent an opinion serve as vital indicators of their influence on public health discourse. We developed a classifier to identify influencers and their network of actors, used BERTopic to identify influencers’ topics, and mapped their identities and topics into a network.

Key Results

Identity classification

Most Twitter bios include job titles and organization types, which often have similar characteristics. So, we used a machine learning tool to see how accurately we could predict someone’s job based on their Twitter bio. Our main question is: How well can we guess occupations from Twitter bios using the latest techniques in Natural Language Processing (NLP), like few-shot classification and pre-trained sentence embeddings? We manually coded a training set of 2000 randomly selected bios from the top 100 medical influencers and their followers. Table 1 shows a sample of 10 users with (multi-)labels.

Table of users and their multi-labels
Table 1. Users and their multi-labels

We used six prompts to classify the identities of medical influencers and other actors in their social network. The ensemble method, which combines all prompts, demonstrated superior performance, achieving the highest precision (0.700), recall (0.752), F1 score (0.700), and accuracy (0.513) (Table 2).

Table of prompts and their identities classification
Table 2. Comparison of different prompts for the identities classification
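For readers less familiar with multi-label evaluation, the sketch below shows one simple way an ensemble over several prompts could be formed (majority voting) and how precision, recall, F1 and accuracy can be computed when each user may carry more than one label. The post does not say how the six prompts were combined or how accuracy was defined, so both are assumptions here, and the labels and predictions are toy values. If accuracy is counted as an exact match on the whole label set, it is a stricter measure than F1, which gives partial credit for partly correct label sets.

```python
from collections import Counter


def majority_vote(label_sets, min_votes):
    """Keep every label proposed by at least min_votes of the prompts."""
    votes = Counter(label for labels in label_sets for label in labels)
    return {label for label, n in votes.items() if n >= min_votes}


def micro_scores(gold, predicted):
    """Micro-averaged precision, recall, F1, plus exact-match accuracy."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correct labels
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # spurious labels
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    exact = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": exact}


# Toy example: three prompts label one bio; a label needs two votes to survive.
ensemble = majority_vote(
    [{"clinician"}, {"clinician", "researcher"}, {"clinician"}], min_votes=2
)

# Toy gold labels and predictions for three users (hypothetical label names).
gold = [{"clinician"}, {"researcher", "clinician"}, {"journalist"}]
pred = [{"clinician"}, {"researcher"}, {"journalist", "researcher"}]
scores = micro_scores(gold, pred)
print(ensemble, scores)
```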

Topic Modelling

We used BERTopic to identify topics from a corpus of 424,629 tweets posted by the medical influencers between December 2021 and February 2022 (Figure 1).

Coloured scatter graph of medical influencer topics
Figure 1. Map of medical influencers’ topics

In total, 665 topics were identified. The most prevalent topic is related to vaccine hesitancy (8919 tweets). The second most significant topic focuses on equitable vaccine distribution (6860 tweets). Figures 2a and 2b illustrate a comparison between the top topics identified by Latent Dirichlet Allocation (LDA) and those by BERTopic.

Word map of LDA top 5th topics on the left and bar charts of BERTopic top 8th topics on the right
Figure 2. Comparisons of LDA topics and BERTopic topics

The topics derived from LDA appear more general and lack specific meaning, whereas the topics from BERTopic are notably more specific and carry clearer semantic significance. For example, the BERTopic model shows either the “Hesitancy” or the “Equity” of the vaccine (topic 0, 1), while the LDA model only provides general topic information (topic 0).

Table 3 shows the three different topic representations generated from the same clusters by three different methods: Bag-of-Words with c-TF-IDF, KeyBERTInspired and ChatGPT.

Table of comparison of three different topic representations methods of BERTopic
Table 3. Comparison of three different topic representation methods of BERTopic

The Keyword Lists from Bag-of-Words with c-TF-IDF and KeyBERTInspired provide quick information about the content of the topic, while the narrative Summaries from ChatGPT offer a human-readable summary but may sacrifice some specific details that the keyword lists will provide. BERTopic captures deeper text meanings, essential for understanding conversation context and providing clear topics, especially in short texts like social media posts.
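The keyword lists mentioned above come from class-based TF-IDF (c-TF-IDF), which treats all the tweets in one topic cluster as a single document and scores each word by how distinctive it is for that cluster. The sketch below is a plain-Python illustration of the weighting as described in the BERTopic documentation (term frequency within the cluster, multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the word's frequency across all clusters). It is not the library's implementation, and the two tiny clusters are invented.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """Score each term per cluster; class_tokens maps cluster id -> list of tokens."""
    tf = {c: Counter(tokens) for c, tokens in class_tokens.items()}
    total = Counter()  # frequency of each term across all clusters
    for counts in tf.values():
        total.update(counts)
    avg_words = sum(len(t) for t in class_tokens.values()) / len(class_tokens)
    scores = {}
    for c, counts in tf.items():
        n = sum(counts.values())
        scores[c] = {
            term: (freq / n) * math.log(1 + avg_words / total[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, cluster, k=3):
    """Return the k highest-scoring terms for one cluster."""
    ranked = sorted(scores[cluster].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy clusters: each list stands for all tweets in one topic, tokenised.
clusters = {
    0: ["vaccine", "hesitancy", "hesitancy", "trust"],
    1: ["vaccine", "equity", "equity", "access"],
}
weights = c_tf_idf(clusters)
print(top_terms(weights, 0), top_terms(weights, 1))
```

In this toy case "vaccine" appears in both clusters and so ranks last in each, while the words that separate the clusters rise to the top.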

Mapping Identities and Topics in Networks

We mapped actors’ identities and the most prevalent topics from their tweets into a network (Figure 3).

Figure 3. Network representation of actors’ identities and topics

Each user node features an attribute detailing their identities, which defines the influence of medical influencers within their network and how their messages resonate across various user communities. This visualization reveals their influence and how they adapt discourse for different audiences based on group affiliations. It aids in exploring how the perspectives of medical influencers on health issues proliferate across social media communities.
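A network of this kind can be represented very simply as weighted links between identity labels and topics. The sketch below is an illustrative assumption about the data structure, not the project's code: the users, identity labels and topic names are hypothetical, and the edge weight is just the number of users with a given identity who posted on a given topic.

```python
from collections import Counter

# Hypothetical users with classifier-assigned identities and their main topics.
users = {
    "user_1": {"identities": ["clinician"], "topics": ["vaccine hesitancy", "vaccine equity"]},
    "user_2": {"identities": ["clinician", "researcher"], "topics": ["vaccine hesitancy"]},
    "user_3": {"identities": ["journalist"], "topics": ["vaccine equity"]},
}


def identity_topic_edges(profiles):
    """Count, for each (identity, topic) pair, how many users link the two."""
    edges = Counter()
    for profile in profiles.values():
        for identity in profile["identities"]:
            for topic in profile["topics"]:
                edges[(identity, topic)] += 1
    return edges


edges = identity_topic_edges(users)
for (identity, topic), weight in sorted(edges.items()):
    print(f"{identity} -> {topic}: {weight}")
```

An edge list like this can be loaded into most network analysis or visualisation tools, where heavier edges show which identity groups engage most with which topics.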

Conclusion

Our work shows how to identify who medical influencers are and what topics they talk about. Our network representation of medical influencers’ identities and their topics provides insights into how these influencers change their messages to connect with different audiences. First, we used machine learning to categorize user identities. Then, we used BERTopic to find common topics among these influencers. We created a network map showing the connections between identities, social interactions, and the main topics. This innovative method helps us understand how the identities of medical influencers affect their position in the network and how well their messages connect with different user groups.


Contact details and links

For further information or to collaborate on this project, please contact Dr Roberta Bernardi (email: roberta.bernardi@bristol.ac.uk)

Acknowledgement

This blog post’s content is based on the work published in Guo, Z., Simpson, E., Bernardi, R. (2024). ‘Medfluencer: A Network Representation of Medical Influencers’ Identities and Discourse on Social Media,’ presented at epiDAMIK ’24, August 26, 2024, Barcelona, Spain

Foodscapes: visualizing dietary practices on the Roman frontiers 

JGI Seed Corn Funding Project Blog 2023/24: Lucy Cramp, Simon Hammann & Martin Pitts

Table laid out with Roman pottery from Vindolanda ready for sampling for organic residue analysis as part of our ‘Roman Melting Pots’ AHRC-DFG funded project 

The extraction and molecular analysis of ancient food residues from pottery enable us to reconstruct the actual uses of vessels in the past. This means we can start to build up pictures of dietary patterns in the past, including foodways at culturally diverse communities such as the Roman frontiers. However, there remains a challenge in how we can interpret these complex residues, and both visualise and interrogate these datasets to explore use of resources in the past. 

Nowadays, it is commonplace to extract organic residues from many tens, if not hundreds, of potsherds; within each residue, and especially using cutting-edge high-resolution mass spectrometric (HRMS) techniques, there might be several hundred compounds present, including some at very low abundance. Using an existing dataset of gas chromatography-high resolution mass spectrometric data from the Roman fort and associated settlement at Vindolanda, this project aimed to explore methods through which this dietary information could be spatially analysed across an archaeological site, with a view to developing methods that could be applied on a range of scales, from intra-site through to regional and even global. It was hoped that it would be possible to display the presence of different compounds in potsherds recovered from different parts of a site that are diagnostic of particular foodstuffs, in order to spatially analyse the distribution of particular resources within and beyond sites.
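The first step towards displaying compound presence by location is a simple table: for each part of the site, what share of the sampled sherds contain a given diagnostic compound? The sketch below shows that aggregation with invented records. The sherd identifiers, area names and marker names are placeholders and do not correspond to real samples or biomarkers.

```python
from collections import defaultdict

# Invented sherd records: where each sherd was found and which markers it contains.
sherds = [
    {"id": "S1", "area": "area_A", "compounds": {"marker_1", "marker_2"}},
    {"id": "S2", "area": "area_A", "compounds": {"marker_1"}},
    {"id": "S3", "area": "area_B", "compounds": {"marker_2"}},
    {"id": "S4", "area": "area_B", "compounds": set()},
]


def presence_by_area(records, marker):
    """Return, per area, the fraction of sherds in which the marker was detected."""
    found = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["area"]] += 1
        if marker in record["compounds"]:
            found[record["area"]] += 1
    return {area: found[area] / total[area] for area in total}


print(presence_by_area(sherds, "marker_1"))
print(presence_by_area(sherds, "marker_2"))
```

Reporting a fraction rather than a raw count keeps areas with many sampled sherds from dominating the picture.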

A fragment from a Roman jar that was sampled from Vindolanda
A fragment from a Roman jar that was sampled from Vindolanda for organic residue analysis as part of our ‘Roman Melting Pots’ AHRC-DFG funded project 

The project started by processing a pilot dataset of GC-HRMS data from the site of Vindolanda, following a previously-published workflow (Korf et al. 2020). These pottery sherds came from different locations at the fort, occupied by peoples of different origins and social standings. This included the praetorium (commanding officer’s house), schola (‘officers’ mess’), infantry barracks (occupied by Tungrians, soldiers from modern-day Belgium and the Netherlands), and the non-military ‘vicus’ outside of the fort walls, likely occupied by locals, traders and families. Complex data, often containing several hundred compounds per residue, were re-integrated using the open-source mass spectrometry data processing software MZmine, supported by our collaborator from mzio GmbH, Dr Ansgar Korf. This produced a ‘feature list’ of compounds and their intensities across the sample dataset. This feature list was then presented to Emilio Romero, a PhD student in Translational Health Sciences, who worked as part of the Ask-JGI helpdesk to support academic researchers on projects such as these. Emilio developed data matrices and performed statistical analyses to identify significant compounds of interest that were driving differences between the composition of organic residues from different parts of the settlement. This revealed, for example, that biomarkers of plant origin appear to be more strongly associated with pottery recovered from inside the fort compared with the vicus outside the fort walls. He was then able to start exploring ways to spatially visualize these data, with input from Léo Gorman, a data scientist from the JGI, and Levi Wolf from the School of Geographical Sciences. Emilio says:

‘Over the past year, my experience helping with the Ask-JGI service has been truly rewarding. I was very excited to apply as I wanted to gain more exposure to the world of research in Bristol, meet different researchers and explore with them different ways of working and approaching data. 

One of the most challenging projects was working with chemometric concentrations of different chemical compound residues extracted from vessels used in ancient human settlements. This challenge allowed me to engage in dialogue with specialists in the field and work in a multidisciplinary way in developing data matrices, extracting coordinates and creating maps in R. The most rewarding part was being able to use a colour scale to represent the variation in concentration of specific compounds in settlements through the development of a Shiny application in R. It was certainly an invaluable experience and a technique I had never had the opportunity to practice before.’ 
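The colour-scale idea Emilio describes is straightforward to illustrate: each location's value for a compound is placed on a scale between the lowest and highest values, and that position picks a colour between two end colours. The sketch below is written in Python rather than R purely for illustration and is not the Shiny application itself. The location names, values and end colours are all made up.

```python
def scale_colour(value, vmin, vmax, low=(255, 255, 204), high=(128, 0, 38)):
    """Map a value to a hex colour by linear interpolation between two RGB colours."""
    if vmax == vmin:
        t = 0.0  # avoid division by zero when all values are equal
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    rgb = tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Made-up relative intensities of one compound at three hypothetical locations.
intensities = {"location_1": 0.0, "location_2": 40.0, "location_3": 100.0}

vmin, vmax = min(intensities.values()), max(intensities.values())
colours = {name: scale_colour(v, vmin, vmax) for name, v in intensities.items()}
print(colours)
```

Each hex colour can then be attached to the matching point or polygon on a site plan, so that darker shades mark where a compound is most concentrated.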

This work is still in progress, but we have planned a final workshop that will take place in mid-November. Joining us will be our project partners from the Vindolanda Trust, as well as colleagues from across the Roman Melting Pots project, the JGI and the University of Bristol. A funding application to develop this exploratory spatial analysis has been submitted to the AHRC.  


Contact details and links

You can find out more about our AHRC-DFG funded project ‘Roman Melting Pots’ and news from this season’s excavations at Vindolanda and its sister site, Magna