Exploring the Impact of Medical Influencers on Health Discourse Through Socio-Semantic Network Analysis

JGI Seed Corn Funding Project Blog 2023/24: Roberta Bernardi

Project Background

Medical influencers on social media shape attitudes towards medical interventions but may also spread misinformation. Understanding their influence is crucial amid growing mistrust in health authorities. We used a Twitter dataset of the top 100 medical influencers during Covid-19 to construct a socio-semantic network that maps both the influencers’ identities and their key topics. Who influencers are and which topics they use to express an opinion are vital indicators of their influence on public health discourse. We developed a classifier to identify the identities of influencers and of the actors in their networks, used BERTopic to identify the influencers’ topics, and mapped identities and topics into a network.

Key Results

Identity classification

Most Twitter bios include job titles and organization types, which often have similar characteristics. We therefore used machine learning to test how accurately a person’s occupation can be predicted from their Twitter bio. Our main question was: how well can we infer occupations from Twitter bios using recent Natural Language Processing (NLP) techniques, such as few-shot classification and pre-trained sentence embeddings? We manually coded a training set of 2000 randomly selected bios from the top 100 medical influencers and their followers. Table 1 shows a sample of 10 users with (multi-)labels.

Table 1. Users and their multi-labels
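
As a rough illustration of few-shot, multi-label classification of bios, the sketch below builds one prototype per label from a handful of labelled bios and assigns every label whose prototype is similar enough to a new bio. It is not the project's classifier: the example bios, the label names, the 0.3 threshold and the `toy_embed` function (a bag-of-words stand-in for a pre-trained sentence encoder) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical few-shot examples: (bio text, list of identity labels).
FEW_SHOT = [
    ("ICU nurse and patient advocate", ["healthcare professional"]),
    ("Emergency physician, hospital doctor", ["healthcare professional"]),
    ("Professor of epidemiology, vaccine researcher", ["academic"]),
    ("Physician and professor of medicine", ["healthcare professional", "academic"]),
    ("Health journalist and science writer", ["media"]),
]

def toy_embed(text):
    """Toy stand-in for a sentence encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prototypes(examples, embed=toy_embed):
    """Sum the embeddings of each label's examples (cosine ignores the scale)."""
    prototypes = {}
    for bio, labels in examples:
        vec = embed(bio)
        for label in labels:
            prototypes.setdefault(label, Counter()).update(vec)
    return prototypes

def classify(bio, prototypes, threshold=0.3, embed=toy_embed):
    """Return every label whose prototype similarity reaches the threshold."""
    vec = embed(bio)
    return sorted(
        label for label, proto in prototypes.items()
        if cosine(vec, proto) >= threshold
    )

prototypes = build_prototypes(FEW_SHOT)
print(classify("Hospital physician and professor", prototypes))
```

In practice `toy_embed` would be replaced by a real sentence-embedding model, and the threshold would be tuned on the manually coded bios.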

We used six prompts to classify the identities of medical influencers and other actors in their social network. The ensemble method, which combines all prompts, demonstrated superior performance, achieving the highest precision (0.700), recall (0.752), F1 score (0.700), and accuracy (0.513) (Table 2).

Table 2. Comparison of different prompts for the identities classification
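
The post does not say how the ensemble combines the prompts or how the scores are averaged, so the following is only a sketch under two assumptions: the ensemble keeps a label when more than half of the prompts predict it, and precision, recall and F1 are averaged per user while accuracy counts exact matches of the full label set. The function names and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_prompt):
    """Keep each label predicted by more than half of the prompts (assumed rule)."""
    votes = Counter(
        label for labels in predictions_per_prompt for label in set(labels)
    )
    needed = len(predictions_per_prompt) / 2
    return {label for label, count in votes.items() if count > needed}

def multilabel_scores(gold, predicted):
    """Per-user averaged precision, recall and F1, plus exact-match accuracy."""
    precisions, recalls, f1s, exact = [], [], [], 0
    for g, p in zip(gold, predicted):
        true_pos = len(g & p)
        precision = true_pos / len(p) if p else 0.0
        recall = true_pos / len(g) if g else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
        exact += int(g == p)  # counts only when the whole label set matches
    n = len(gold)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": exact / n,
    }

# Six hypothetical prompt outputs for one user.
six_prompts = [
    ["clinician", "academic"], ["clinician"], ["clinician", "media"],
    ["academic"], ["academic", "media"], ["clinician"],
]
print(majority_vote(six_prompts))

scores = multilabel_scores(
    gold=[{"clinician", "academic"}, {"media"}],
    predicted=[{"clinician"}, {"media"}],
)
print(scores)
```

Under an exact-match definition, a prediction that misses a single label counts as wrong, so accuracy can sit well below F1 for multi-label data.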

Topic Modelling

We used BERTopic to identify topics from a corpus of 424,629 tweets posted by the medical influencers between December 2021 and February 2022 (Figure 1).

Figure 1. Map of medical influencers’ topics (coloured scatter plot)

In total, 665 topics were identified. The most prevalent topic relates to vaccine hesitancy (8919 tweets). The second most prevalent topic focuses on equitable vaccine distribution (6860 tweets). Figures 2a and 2b compare the top topics identified by Latent Dirichlet Allocation (LDA) with those identified by BERTopic.

Figure 2. Comparison of LDA topics (left: word map of the top five topics) and BERTopic topics (right: bar charts of the top eight topics)

The topics derived from LDA appear more general and lack specific meaning, whereas the topics from BERTopic are notably more specific and carry clearer semantic significance. For example, the BERTopic model separates vaccine “Hesitancy” and vaccine “Equity” into distinct topics (topics 0 and 1), while the LDA model provides only general topic information (topic 0).

Table 3 shows the three different topic representations generated from the same clusters by three different methods: Bag-of-Words with c-TF-IDF, KeyBERTInspired and ChatGPT.

Table 3. Comparison of three different topic representation methods of BERTopic

The keyword lists from Bag-of-Words with c-TF-IDF and KeyBERTInspired give quick information about the content of a topic, while the narrative summaries from ChatGPT are more human-readable but may sacrifice some of the specific details that the keyword lists provide. BERTopic captures deeper text meanings, which is essential for understanding conversation context and producing clear topics, especially in short texts like social media posts.
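
To show where the c-TF-IDF keyword lists come from, here is a small pure-Python sketch of the class-based weighting described in the BERTopic documentation: all tweets in a cluster are treated as one document, and a term's weight is its normalized frequency in that cluster multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the term's frequency across all clusters. The tokenizer, the function names and the four example tweets are assumptions for illustration; BERTopic itself handles this step internally.

```python
import math
import re
from collections import Counter

def c_tf_idf(clusters):
    """clusters maps topic id -> list of texts; returns topic id -> {term: weight}."""
    class_counts = {
        topic: Counter(
            token for doc in docs for token in re.findall(r"[a-z]+", doc.lower())
        )
        for topic, docs in clusters.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(class_counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (tf / n_words) * math.log(1 + avg_words / total[term])
            for term, tf in counts.items()
        }
    return weights

def top_terms(weights, topic, k=3):
    """Highest-weighted terms for a topic, ties broken alphabetically."""
    ranked = sorted(weights[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical tweets grouped into two clusters.
example_clusters = {
    0: ["vaccine hesitancy among parents", "hesitancy about the vaccine"],
    1: ["vaccine equity for poorer countries", "equity in vaccine distribution"],
}
weights = c_tf_idf(example_clusters)
print(top_terms(weights, 0, k=2), top_terms(weights, 1, k=2))
```

Because "vaccine" occurs in both clusters, its weight is reduced, and the term that distinguishes each cluster ranks first.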

Mapping Identities and Topics in Networks

We mapped actors’ identities and the most prevalent topics from their tweets into a network (Figure 3).

Figure 3. Network representation of actors’ identities and topics

Each user node carries an attribute describing the user’s identities. These identities help characterize the influence of medical influencers within their network and how their messages resonate across different user communities. The visualization reveals that influence and shows how influencers adapt their discourse to different audiences based on group affiliations. It also helps us explore how medical influencers’ perspectives on health issues spread across social media communities.
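
The structure described above can be sketched as a two-mode network in plain Python: user nodes carry an identities attribute, topic nodes stand for topics, and an edge weight counts how many of a user's tweets fall in a topic. This is a simplified illustration only. It leaves out social interactions between users, and the function names, user ids, identities and topic labels are hypothetical.

```python
from collections import Counter, defaultdict

def build_network(user_identities, tweet_topics):
    """Return (nodes, edges) for a user-topic network.

    user_identities: dict of user id -> list of identity labels
    tweet_topics: iterable of (user id, topic label), one pair per tweet
    """
    nodes = {
        ("user", user): {"identities": sorted(identities)}
        for user, identities in user_identities.items()
    }
    edges = Counter()
    for user, topic in tweet_topics:
        nodes.setdefault(("user", user), {"identities": []})
        nodes.setdefault(("topic", topic), {})
        edges[(("user", user), ("topic", topic))] += 1  # weight = tweet count
    return nodes, edges

def topics_by_identity(nodes, edges):
    """Aggregate edge weights per identity group to see which topics each uses."""
    totals = defaultdict(Counter)
    for (user_node, topic_node), weight in edges.items():
        for identity in nodes[user_node]["identities"]:
            totals[identity][topic_node[1]] += weight
    return totals

# Hypothetical users and topic assignments.
users = {"u1": ["clinician"], "u2": ["academic", "clinician"]}
tweets = [
    ("u1", "hesitancy"), ("u1", "hesitancy"),
    ("u2", "equity"), ("u2", "hesitancy"),
]
nodes, edges = build_network(users, tweets)
totals = topics_by_identity(nodes, edges)
print(dict(totals["clinician"]), dict(totals["academic"]))
```

The same node and edge dictionaries could be loaded into a graph library for layout and visualization.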

Conclusion

Our work shows how to identify who medical influencers are and what topics they talk about. Our network representation of medical influencers’ identities and their topics provides insights into how these influencers change their messages to connect with different audiences. First, we used machine learning to categorize user identities. Then, we used BERTopic to find common topics among these influencers. We created a network map showing the connections between identities, social interactions, and the main topics. This innovative method helps us understand how the identities of medical influencers affect their position in the network and how well their messages connect with different user groups.


Contact details and links

For further information or to collaborate on this project, please contact Dr Roberta Bernardi (email: roberta.bernardi@bristol.ac.uk).

Acknowledgement

This blog post’s content is based on the work published in Guo, Z., Simpson, E., and Bernardi, R. (2024). ‘Medfluencer: A Network Representation of Medical Influencers’ Identities and Discourse on Social Media,’ presented at epiDAMIK ’24, August 26, 2024, Barcelona, Spain.