Using topic modelling to study lived experiences in extreme weather conditions

In 2025, Huw Day (formerly JGI Data Scientist, now Research Associate in Digital Health at the VIVO Hub for Enhanced Independent Living) completed a project working with Eunice Lo and Joanne Godwin, helping them with a study of lived experiences in extreme climate conditions in the South West of England.

This project has now been published in Weather and Climate Extremes, and in this blog, Huw explains his role in the project and the work he did.


What was your role?

The study included a survey asking participants about adaptations they made during extreme weather events, such as heatwaves, storms, floods and cold snaps. My role was to extract discussion themes from the large amount of free-text responses. This analysis forms part of the recently published work!

Adaptation actions by survey respondents, in response to (a) cold weather and warnings and (b) stormy weather and warnings, clustered into major themes (middle column) and minor themes (right column). Figure 6 from Lo et al. (December 2025), CC BY 4.0.

In both panels, the number in each theme indicates the total number of unique mentions of actions within the theme (e.g. repeated mentions of an action in response to the same weather hazard by the same respondent count only once). The vertical length of each theme is proportional to the number of actions within it, so themes with few actions may be difficult to read. Examples include “Other: 52” in the orange box at the bottom of panel (a), and “Check in on people: 35” in the yellow box within “Precaution: 360” in panel (b).

What is topic modelling?

Topic modelling is a natural language processing (NLP) task: taking in a large collection of documents (social media posts, paper titles/abstracts or survey responses, for example) and sorting them into groups. Given those groups of documents, we then seek to extract some sort of meaning from them. For example, are there words that are common within a particular group of documents but rare outside it?

When is it useful?

Qualitative data tells stories but quantitative data puts them in context, so topic modelling is a nice way to get an idea of what topics are being mentioned and how frequently. Understanding the lived experiences of individuals is important, but knowing how many individuals share an experience can help frame it further. Conversely, if you only know the number of people affected but not the specifics of their experience, it can be hard to know how they might need to be supported.

If you have a huge amount of text data then it’s costly and time consuming to do manual qualitative analysis on it all. Topic modelling (among other NLP tools) can help you focus your manual efforts and put manual qualitative analysis into wider context.

What tools did you use?

I made use of BERTopic for topic modelling, which works by converting sentences of text into high-dimensional vectors in an embedding space, projecting those vectors down into a lower-dimensional space whilst preserving local and global structure, and then clustering the projected vectors.
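As a rough illustration of that embed–reduce–cluster pipeline, here is a simplified stand-in built with scikit-learn rather than BERTopic itself (BERTopic uses sentence-transformer embeddings, UMAP for reduction and HDBSCAN for clustering by default; the toy sentences and parameters below are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Hypothetical free-text responses, not the actual survey data
docs = [
    "stayed indoors during the heatwave",
    "stayed cool indoors and drank water in the heatwave",
    "checked flood warnings before the storm",
    "checked the flood warnings and moved the car before the storm",
]

# 1. Represent each document as a high-dimensional vector
#    (TF-IDF here stands in for neural sentence embeddings)
X = TfidfVectorizer().fit_transform(docs)

# 2. Project the vectors down into a lower-dimensional space
#    (TruncatedSVD stands in for UMAP)
X_low = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# 3. Cluster the projected vectors (KMeans stands in for HDBSCAN)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_low)
print(labels)  # two heat-related docs in one cluster, two storm-related in the other
```

The same three-stage shape carries over to the real BERTopic pipeline; only the components at each stage differ.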

Each cluster contains a collection of sentences, so you use something called a bag-of-words approach, where you count how often words appear in each cluster. If a word appears a lot across many different clusters, it is probably just a common word in your documents and wouldn’t be a good candidate to represent any particular cluster. If a word appears frequently in your favourite cluster but seldom in the other clusters, then it’s a good candidate for a label to describe the contents of your favourite cluster.
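This scoring idea can be sketched in a few lines of plain Python. The snippet below is a simplified, hypothetical version of the class-based TF-IDF weighting BERTopic uses, run on made-up toy sentences rather than the survey data:

```python
from collections import Counter
import math

# Toy clusters of sentences (hypothetical examples, not the survey responses)
clusters = {
    "heat": ["closed the curtains to keep the heat out",
             "stayed out of the sun and drank water"],
    "storm": ["secured the garden furniture before the storm",
              "stayed indoors and checked the storm warnings"],
}

# Term frequency per cluster: the bag-of-words step
tf = {name: Counter(word for doc in docs for word in doc.split())
      for name, docs in clusters.items()}

def score(word, cluster):
    """Frequency in this cluster, discounted by how many clusters use the word."""
    n_clusters_with_word = sum(word in counts for counts in tf.values())
    idf = math.log(len(tf) / n_clusters_with_word)
    return tf[cluster][word] * idf

# "the" appears in every cluster, so it scores 0 and is a poor label;
# "storm" is frequent in one cluster only, so it scores highly there.
print(score("the", "storm"))   # 0.0
print(score("storm", "storm") > score("water", "storm"))  # True
```

Words with the highest scores in a cluster become candidate labels for it, which is how a pile of anonymous clusters turns into nameable themes.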

Where can people find out more?

The paper Sociodemographic vulnerability to cold and stormy weather in relation to health and life is published in Weather and Climate Extremes (December 2025).

There is a public GitHub repo for this project which includes:

  • Code for data loading (not the actual data though), including splitting longer phrases for more representative embeddings whilst keeping track of who said what for visualisations.
  • Code for running BERTopic for topic modelling, commented to explain how to fine-tune relevant parameters depending on the size and topic diversity of your documents.
  • A slide deck for a talk I gave on topic modelling to the climate dynamics group at Bristol. The slide deck notes include lots of links to nice explainers on relevant topics from some of my favourite content creators, like StatQuest and 3Blue1Brown.

There is also a recording of my talk A Crash Course in Topic Modelling with BERTopic on the JGI YouTube channel.