Get to know your data
Now past the age when many retire, Professor Jean Golding OBE still works full-time to draw conclusions from the massive data sets she has shepherded into being. The epidemiologist, for whom the new University of Bristol data science institute is named, is best known for founding the uniquely rich study ‘Children of the 90s’, or ALSPAC, which collected a wealth of biological, behavioural and social data from babies born in Bristol and their families.
At the start of Professor Golding’s career, punchcards were the height of technology. Now, with three large-scale birth cohort studies and an illustrious career behind her, she reflects on the changes, and the constants, in answering questions with data, in cross-disciplinary collaboration, and at the University of Bristol.
After reading Mathematics at St Anne’s College, Oxford, Professor Golding took a variety of jobs to support her family before happening upon her first role analysing medical research data. She answered an advert for a statistician, and began by calculating simple percentages for a write-up of a large epidemiological study. While this work was mathematically straightforward, she learnt from other group members about the medical background of the study and the relevant biology, and discovered her own love of finding stories in data. It was here that her career in epidemiological research began. She remembers:
That was hands on – not using computers at all. The information that I was interested in was abstracted from questionnaires, written on cards with holes around the edges and could be sorted with a ‘knitting needle’. You got to know the data very well, so you got to know that certain things were associated with one another, which if you had not been looking you might not have noticed.
Physical computing methods such as punchcards certainly let practitioners get to know their information, but they became unwieldy when thousands of entries had to be compared, and impossible once the numbers ran into the hundreds of thousands. However, as data sets have expanded and modern computational methods have come online, Professor Golding feels the fundamentals have not changed.
It’s still important to get to know the data, but you do it in a different way – through doing a variety of different cross-tabulations (a method of comparing how often different characteristics occur together). I think it’s very important, in looking at the data, to see what is missing from your variables – what’s not there will tell you something important. Keep looking at the data, is my message!
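As a rough illustration of the cross-tabulation described above, the Python sketch below counts how often pairs of answers occur together and keeps missing answers as their own category, so that the gaps stay visible. The records, the variable names (`smoker`, `low_birthweight`) and the function `cross_tabulate` are all invented for illustration; none of them come from ALSPAC or any other real study.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
# None marks an answer that is missing from the questionnaire.
records = [
    {"smoker": "yes", "low_birthweight": "yes"},
    {"smoker": "yes", "low_birthweight": "no"},
    {"smoker": "no", "low_birthweight": "no"},
    {"smoker": "no", "low_birthweight": "no"},
    {"smoker": None, "low_birthweight": "yes"},
    {"smoker": "no", "low_birthweight": None},
]


def cross_tabulate(rows, row_var, col_var, missing_label="(missing)"):
    """Count how often each pair of values occurs together.

    Missing answers are counted under their own label rather than
    dropped, so the table also shows what is not there.
    """
    counts = Counter()
    for row in rows:
        r = row.get(row_var)
        c = row.get(col_var)
        r = missing_label if r is None else r
        c = missing_label if c is None else c
        counts[(r, c)] += 1
    return counts


table = cross_tabulate(records, "smoker", "low_birthweight")
for (r, c), n in sorted(table.items()):
    print(f"smoker={r:<10} low_birthweight={c:<10} n={n}")
```

Giving missing answers their own row and column is one simple way to follow the advice to look at what is absent; a real analysis would more likely use a statistics package, but the idea is the same.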
Following on from this experience, Jean was involved in three large-scale UK birth cohort studies: those of 1958 and 1970, and the 1990s study she founded, ALSPAC. (Read more about ALSPAC in the second part of this interview.) While the number of variables collected in ALSPAC is far greater than in the others, Jean is neutral about the possibility of further expansion in future studies; she points out that the cost of gathering so much data from each participant is the limiting factor. Asked about the current trend for talking about ‘data science’ and ‘big data’, Jean expresses no particular affinity for the terms; for her, it is all about the data, call it what you may, whether one is a data scientist in 2017 or a statistician in 1958.
Given that she performs statistical analyses in an epidemiological context, I asked which discipline she identifies with. “I’ve never called myself a mathematician”, she says. “Just an epidemiologist, although nobody ever told me I could.”
She observes that epidemiologists tend to drift in from various backgrounds, not having considered the field originally. It’s not that different from zoology, apparently: “you observe groups of ants or bison, and deduce a lot to do with behaviour and mortality and migration. All of that is relevant – a few others come in from psychology, and now through genetics. It doesn’t set out to be cross-disciplinary, but what one studies is necessarily cross-disciplinary.”
She highlights that it’s only recently that cross-disciplinarity has been seen as a benefit by funders – that thirty years ago it was very hard to get a grant that crossed boundaries. But, she believes, it is an inherently beneficial approach, and this seems to be borne out by her success. I asked how she made such collaborations successful, and she replies modestly that it can sometimes be hard. Of a close relationship with a distinguished psychologist in America, she says: “He doesn’t understand what I’m doing, and I don’t understand what he’s saying, but somehow we get published!” The key to making it work, apparently, is that “he has ideas, I have ideas, and we’re both tolerant.” Sounds like an approach that would work in many situations.
For more than 30 years Professor Golding has made her academic home at the University of Bristol, although she maintains strong links to other institutions such as UCL, home of her long-time collaborator Professor Pembrey. She cites her collaborations and colleagues as among the highlights of the institution for her:
I think the university is really vibrant and has been changing so much for the better recently. Of course I think the Institutes are an amazingly beneficial way of getting cross-disciplinary research going. It has certainly worked well with the Elizabeth Blackwell Institute, and I’m sure it’s going to with the JGI.
Although she now writes papers rather than crunching data, Jean remains research-active and has many questions that she wishes to answer. “Now I’m retired, I can work on what I’m interested in”, Jean explains, and it is clear that her interests lie where they always have: in getting to know the data, taking good ideas from everyone, and using them to find out what is going on – aims that are reflected in the institute that now bears her name.
Read more about the ALSPAC study, what makes it unique and the findings Professor Golding has extracted from it in the follow-up blog post.
Written by Kate Oliver, PhD student at the University of Bristol and freelance science writer. With thanks to Professor Jean Golding for a fascinating conversation!