Photo (copyright D. Kingstone): This image of Catherine and Ibi in conversation represents the varied layers of clarity and blurriness that were a constituent part of the anonymising process. Decisions about the removal of potentially identifying data were talked through by both researchers until they reached clarity about the best way forward.

A conversation between Dr Catherine Dodds and Dr Ibidun Fakoya

The project was funded by the annual Jean Golding Institute seed corn funding scheme.

Qualitative data re-use and open archiving

This project aimed to demonstrate the considerable value of qualitative data re-use and open archiving. Our team undertook in-depth anonymisation of two existing qualitative HIV datasets by applying and refining an anonymisation protocol. The two qualitative datasets anonymised were:

Relative Safety: contexts of risk for gay men with HIV (2008/09) 42 transcripts, Sigma Research
Plus One: HIV serodiscordant relationships among Black African people in England (2010/11) 60 transcripts, Sigma Research

The key aim of the project was to ready these materials through the removal of personally identifying information, so that the de-identified data could then be deposited with the UK Data Archive. We provided metadata (including participant recruitment materials, data collection templates, related research outputs, thematic lists used in coding and details of the anonymisation) for each submission.

In this blog post, Dr Catherine Dodds and Dr Ibidun Fakoya from the School of Public Policy discuss the ethical and practical considerations of archiving qualitative data. Catherine was involved in the original data collection and was the PI for this project, and Ibidun undertook the bulk of the anonymising work.

Ibidun: What motivated you to deposit these datasets with the UK Data Archive?

Catherine: A few years back I was awarded funding from the Wellcome Trust to examine the feasibility of re-using and archiving multiple qualitative datasets relating to HIV in the UK. Working with a coalition of other UK researchers on that project, we learned that while the UK Data Archive makes the process of deposit incredibly straightforward, what takes much more time is the decision-making, data transfer and preparation of data for deposit. We were looking at depositing projects that go quite far back in time, before the notion of Open Data was a widespread concept, so there was a lot to be considered in terms of readying these transcripts for deposit in a way that is useful, ethical and responsible. A key outcome of that work was the development of an anonymisation protocol to assist with the practical and ethical decision-making that is involved when readying such data for sharing in an archive.

Ibidun: How did you decide which datasets to deposit?

Catherine: During my 16-year career with Sigma Research (latterly at LSHTM), I was involved with and led a considerable array of qualitative studies. I selected the data from Plus One and Relative Safety II because they were both undertaken just over ten years ago, at a time when it became clearer that HIV pharmaceuticals were being positioned as HIV prevention technologies. Because this is an area of particular interest for me, I wanted to personally revisit these two studies first in order to re-use them, while also anonymising them in readiness for archiving.

Ibidun: Do you think the ethical considerations for depositing data are different for qualitative and quantitative data? If so, how?

Catherine: They are absolutely different, because qualitative data tend to focus on the experiences, perspectives and human stories of participants in ways that are rich and detailed. This is one of the real strengths of this type of data. This means we need to anonymise in a way that goes beyond just identifying and removing personal names and names of organisations or places. Instead, we need to consider whether the overall narrative a person offers (as a collection of life experiences) could itself identify an individual who should be allowed to remain anonymous. This requires a highly skilled approach to anonymisation. If we were anonymising quantitative datasets, the risk of potential identification would probably be much lower, and anonymisation might just involve removing a few fields from a database.
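As a rough illustration of the quantitative case Catherine describes, removing a few fields from a record could look something like the minimal Python sketch below. The field names and the sample record are invented for illustration and are not taken from either study; a real dataset would need its own assessment of which fields are identifying.

```python
# Hypothetical field names; a real dataset would define its own identifying fields.
IDENTIFYING_FIELDS = {"name", "postcode", "date_of_birth"}


def drop_identifying_fields(record, identifying_fields=IDENTIFYING_FIELDS):
    """Return a copy of the record without the fields listed as identifying."""
    return {
        key: value
        for key, value in record.items()
        if key not in identifying_fields
    }


# Invented example record, not real participant data.
record = {
    "name": "A. Example",
    "postcode": "XX1 1XX",
    "date_of_birth": "1970-01-01",
    "age_band": "40-49",
    "response": 3,
}

deidentified = drop_identifying_fields(record)
print(deidentified)
```

Nothing this mechanical works for interview transcripts, where the identifying detail sits in the narrative itself rather than in a named field.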

Catherine: You weren’t involved in the original data collection. What contextual detail helped you to get started when anonymising these materials in readiness for archiving?

Ibidun: It helped that I am familiar with HIV research in the UK. I started out working in HIV and sexual health back in 2001, so I was aware of the findings of these studies before I started the anonymisation. Nevertheless, it was useful to read through the original study materials such as the research protocols, topic guides, interview schedules and fieldnotes. Speaking to the original investigators also provided insights into the research landscape at that time. By understanding the original aims, objectives and findings from the studies, I was able to focus on just the anonymisation rather than become distracted by the themes that were emerging from the data.

Catherine: How did you tackle the task of reading through so much text and remaining alert to the requirements of anonymisation?

Ibidun: Anonymisation takes a lot of concentration. You need to remain focussed on the transcripts and read every word to ensure that you do not miss any identifiers. I knew that I would struggle to remain alert if I tried to read the transcripts on my computer because I am used to skim reading articles on screen. Initially, I had thought about printing out all the transcripts, but I am conscious of wasting paper. Instead, I made use of the “Learning Tools” in Microsoft Word. I followed these steps to improve my focus and comprehension and ensure I read every word:

Go to View > Learning Tools.
Select Column Width to shorten the line length and make the page narrower.
Select Read Aloud to hear the document as each word is highlighted.
Increase the Read Aloud speed so you are reading approximately 300 words per minute.

It takes a little while to get used to the Read Aloud function, particularly at speed, but ultimately, I found this method to be the most efficient way to remain focused and quickly read through the large volume of text.

Catherine: How did you find the anonymisation protocol that was devised as a support tool?

Ibidun: The protocol was useful for getting started with the task, particularly for straightforward guidance on how to deal with direct identifiers and geographic locations. For more complicated anonymisation (e.g. “Situations, activities and contexts”) the guiding principles set out in the protocol provided only a starting point, meaning we needed to identify cases for team discussion where the potential for identification was high.
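For the straightforward end of the task (direct identifiers and geographic locations), the mechanical part of a redaction pass could be sketched as below. This is a minimal Python sketch under stated assumptions, not the protocol itself: the lookup table, the placeholder wording and the sample sentence are all invented for illustration, and the list of terms would have to be built up by a person reading each transcript.

```python
import re

# Hypothetical lookup of direct identifiers and their placeholders.
# In practice this list would be compiled by the person reading the transcript.
REPLACEMENTS = {
    "Alex": "[partner name]",
    "Leeds": "[city in northern England]",
}


def redact(text, replacements):
    """Replace each known identifier with its placeholder and log what changed."""
    log = []
    for term, placeholder in replacements.items():
        # Match whole words only, so short terms do not hit longer words.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        text, count = pattern.subn(placeholder, text)
        if count:
            log.append((term, placeholder, count))
    return text, log


# Invented sentence, not taken from any transcript.
sample = "I met Alex in Leeds, and Alex moved in later."
redacted, log = redact(sample, REPLACEMENTS)
print(redacted)
print(log)
```

Keeping a log of each replacement gives the team something concrete to review together. A lookup like this does nothing for the harder category of situations, activities and contexts, which still depends on careful reading and team discussion.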

Catherine: What advice about anonymising qualitative datasets would you give others who want to archive similar materials for re-use?

Ibidun: My top three tips are:

Keep a research diary like you would do for any other study. Keep note of your reflections and ideas as these may come in handy later.
Work in a team of at least two so you can discuss any ambiguous decisions about de-identification.
Your first duty is to protect the anonymity of the interviewee. If you cannot do that without destroying the integrity of the data (because you have to redact too much material) then err on the side of caution and keep the transcript out of the archive. When in doubt, do not deposit.

Catherine: Is there anything that surprised you in undertaking this work?

Ibidun: I was surprised by the emotional impact of reading the transcripts. Many of the interviewees recounted traumatic events or spoke about painful personal relationships. At times I found myself angry about injustices interviewees had faced, especially those who had been subject to criminal investigations for the reckless transmission of HIV. Anonymising such sensitive information therefore carries the same ethical considerations for researchers as undertaking original qualitative data collection. Researchers undertaking anonymisation also need to pause and reflect on the effects of engaging with emotionally charged narratives and be able to discuss these with colleagues.

Catherine: What value is there to making these materials available to other researchers through the UK Data Archive?

Ibidun: I hope researchers from outside the field of sexual health and HIV research can use these narratives in novel ways. It is possible that themes unrelated to the original research might emerge from the data for other researchers. For example, a linguist might want to examine changes in speech patterns among gay men in the UK. A sociologist might examine the datasets for evidence of the impact of unemployment on Black African migrants in England. There’s a lot of potential in re-using these datasets, perhaps in combination with other data from the UK Data Archive.

Ibidun: I can throw that question back at you: how useful are qualitative datasets for other researchers?

Catherine: I suspect that will be up to them to decide. We have had interest from PhD students and other colleagues who want to use these data to interrogate specific historical aspects of HIV in the UK. It is a shame that we have not been able to undertake anonymisation and deposit more swiftly, but our learning is that to do this work retrospectively takes a great deal of re-familiarisation and case-by-case decision making. Archivists at the UKDA have been very excited about the prospect of having a themed set of qualitative data on social aspects of HIV in their collection and are convinced that this will be of use for researchers focussed on HIV. Social historians, LGBT and queer studies specialists, anthropologists and others might also have use for the data.

Ibidun: What are the methodological considerations when re-using qualitative data?

Catherine: I have just written a methods article on this subject, but in brief:

It is essential that the person re-using the data becomes as familiar as possible with the original context and goals of the project from which the data emerges. Hopefully there will be metadata available to support them in this, and they also should seek to discuss the project with those originally involved (where possible). In my own case, even though I was one of the original data collectors, I was amazed by just how much I had already forgotten (or re-structured in my own mind’s eye).
It is instructive to attend to Hammersley’s (2010) reflections on re-use, which encourage us to think about the given and constructed nature of the data we encounter in these endeavours. For instance, my colleague Peter Keogh and I have approached re-use purposively; we were interested in the theme of biomedicalization and so were selective about which particular projects and transcripts we chose to analyse. We wanted to capture both the mundane and the challenging aspects of life in close proximity to HIV. At the same time, some of the given elements of these data emerged from the shadows in ways that took us off guard and reminded us of what it was to work in different moments and places of an unfolding epidemic. Furthermore, as Irwin et al. (2012) have also argued, bringing data and researchers together across datasets can afford an opportunity for listening out for silences, which can open up new interpretive avenues.

The Jean Golding Institute seed corn funding scheme

The JGI offer funding to a handful of small pilot projects every year in our seed corn funding scheme – our next round of funding will be launched in the Autumn of 2019. Find out more about the funding and the projects we have supported.