Computer Experiments

Blog written by Jonathan Rougier, Professor of Statistical Science, University of Bristol

In a computer experiment we run our experiments in silico, in situations where it would be expensive or illegal to run them for real.

Computer code which is used as an analogue for the underlying system of interest is termed a simulator; often we have more than one simulator for a specified system. I have promoted the use of ‘simulator’ over the also-common ‘model’, because the word ‘model’ is very overloaded, especially in Statistics (see Second-order exchangeability analysis for multimodel ensembles).

Parameters and Calibration

The basic question in a computer experiment is how to relate the simulator(s) and the underlying system. We need to do this in order to calibrate the simulator’s parameters to system observations, and to make predictions about system behaviour based on runs of the simulator.

Parameters are values in the simulator which are adjustable. In principle every numerical value in the code of the simulator is adjustable, but we would usually leave physically-based values like the gravitational constant alone. It is common to find parameters in chunks of code which are standing in for processes which are not understood, or which are being approximated at a lower resolution. In ocean simulators, for example, we distinguish between ‘molecular viscosity’, which is a measurable value, and ‘eddy viscosity’, which is the parameter used in the code.

The process of adjusting parameters to system observations is a statistical one, requiring specification of the ‘gap’ between the simulator and the system, termed the discrepancy, and the measurement errors in the observations. In a Bayesian analysis this process tends to be called calibration. When people refer to calibration as an inverse problem it is usually because they have (maybe implicitly) assumed that the simulator is perfect and the measurement error is Normal with a simple variance. These assumptions imply that the Maximum Likelihood value for the parameters is the value which minimizes the sum of squared deviations. But we do not have to make these assumptions in a statistical analysis, and often we can use additional insights to do much better, including quantifying uncertainty.
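To make that point concrete, here is a minimal sketch of why those two assumptions turn calibration into least squares. The one-parameter `simulator` function below is a made-up toy stand-in (not any of the simulators discussed here): under a perfect simulator and iid Normal measurement error, the likelihood is maximised exactly where the sum of squared deviations is minimised.

```python
import numpy as np

# Hypothetical toy 'simulator': one adjustable parameter theta,
# producing outputs at three observation points. A cheap stand-in
# for an expensive code.
def simulator(theta):
    x = np.array([0.0, 0.5, 1.0])
    return np.exp(-theta * x)

# Synthetic system observations: simulator at the 'best' parameter
# value plus iid Normal measurement error.
rng = np.random.default_rng(1)
theta_true = 1.3
y_obs = simulator(theta_true) + rng.normal(0.0, 0.01, size=3)

# Under 'perfect simulator + iid Normal error', the Maximum
# Likelihood value of theta minimises the sum of squared deviations.
grid = np.linspace(0.0, 3.0, 301)
sum_sq = np.array([np.sum((y_obs - simulator(t)) ** 2) for t in grid])
theta_ml = grid[np.argmin(sum_sq)]
```

A fuller statistical treatment would add a discrepancy term for the simulator–system gap, which is precisely what this least-squares shortcut leaves out.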

The dominant statistical model for relating the simulator and the system is the best input model, which asserts that there is a best value for the parameters, although we do not know what it is. Crucially, the best value does not make the simulator a perfect analogue of the system: there is still a gap. I helped to formalize this model, working with Michael Goldstein and the group at Durham University (e.g. Probabilistic formulations for transferring inferences from mathematical models to physical systems and Probabilistic inference for future climate using an ensemble of climate model evaluations). Michael Goldstein and I then proposed a more satisfactory reified model which was better suited to situations where there was (or could be) more than one simulator (Reified Bayesian modelling and inference for physical systems). The paper has been well-cited but the idea has not (yet) caught on.

In a Bayesian analysis, calibration and prediction tend to be quite closely related, particularly because the same model of the gap between the simulator and the system has to be used for both calibration (using historical system behaviour) and prediction (future system behaviour).

There are some applications where quite simplistic models have been widely used, such as ‘anomaly correction’ in paleoclimate reconstruction and climate prediction (see Climate simulators and climate projections).


Calibration and prediction are fairly standard statistical operations when the simulator is cheap enough to run that it can be embedded ‘in the loop’ of a statistical calculation. But many simulators are expensive to run; for example, climate simulators on super-computers run at about 100 simulated years per month. In this case, each run has to be carefully chosen to be as informative as possible. The crucial tool here is an emulator, which is a statistical model of the simulator.

In a nutshell, carefully-chosen (expensive) runs of the simulator are used to build the emulator, and (cheap) runs of the emulator are used ‘in the loop’ of the statistical calculation. Of course, there is also a gap between the emulator and the simulator.

Choosing where to run the simulator is a topic of experimental design.

Early in the process, a space-filling design like a Latin Hypercube is popular. As the calculation progresses, it is tempting to include system observations in the experimental design. This is possible and can be very advantageous, but the book-keeping in a fully-statistical approach can get quite baroque, because of the need to keep track of double-counting – see Bayes linear calibrated prediction for complex systems. For pragmatic reasons, it is quite common to split the statistical calculation into learning about the simulator on the one hand, and using the emulator to learn about the system on the other (Comment on article by Sanso et al).
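A Latin Hypercube design is easy to sketch. The toy implementation below (plain NumPy, my own minimal version rather than any published package) stratifies each parameter's range into one interval per run and permutes the strata independently per dimension, so every one-dimensional projection of the design is space-filling:

```python
import numpy as np

def latin_hypercube(n_runs, n_params, rng):
    # One stratum per run in each dimension, randomly permuted and
    # jittered within the stratum: each 1-D projection of the design
    # places exactly one point in each of the n_runs equal intervals.
    u = rng.uniform(size=(n_runs, n_params))
    perms = np.array([rng.permutation(n_runs) for _ in range(n_params)]).T
    return (perms + u) / n_runs

rng = np.random.default_rng(0)
design = latin_hypercube(10, 3, rng)  # 10 runs of 3 parameters in [0, 1)
```

Each row of `design` is one (scaled) parameter setting at which to run the simulator; in practice the unit cube would be mapped onto the physically plausible ranges of the parameters.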

Sometimes the emulator will be referred to as the surrogate simulator, particularly in Engineering. Often the surrogate is a flexible fitter with a restricted statistical provenance (e.g. ‘polynomial chaos’). This makes it difficult to use surrogates for statistical calculations, because a well-specified uncertainty about the simulator is a crucial output from an emulator. Statistics and Machine Learning have widely adopted the Gaussian process as a statistical model for an emulator.
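For readers who have not met them, the essence of a Gaussian process emulator fits in a few lines of NumPy. This is a deliberately minimal sketch: a cheap stand-in function plays the ‘expensive simulator’, and the squared-exponential covariance and its parameter values are illustrative choices, not a recommendation.

```python
import numpy as np

def sq_exp(a, b, length=0.3):
    # Squared-exponential covariance (prior variance 1) between inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical 'expensive' simulator of one input (cheap stand-in here).
def simulator(x):
    return np.sin(4.0 * x) + x

# A handful of design runs of the simulator ...
x_design = np.linspace(0.0, 1.0, 8)
y_design = simulator(x_design)

# ... condition the Gaussian process on them ...
K = sq_exp(x_design, x_design) + 1e-8 * np.eye(8)  # jitter for stability
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_design))  # = K^{-1} y

# ... and the emulator gives cheap predictions, with uncertainty.
x_new = np.linspace(0.0, 1.0, 200)
k_star = sq_exp(x_new, x_design)
mean = k_star @ alpha
v = np.linalg.solve(L, k_star.T)
var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
```

The key output is `var`: a well-specified uncertainty about the simulator, which collapses to (numerically) zero at the design runs and grows between them.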

Gaussian processes can be expensive to compute with, especially when the simulator output is high-dimensional, like a field of values (Efficient emulators for multivariate deterministic functions). The recent approach of inducing points looks promising (On sparse variational methods and the Kullback-Leibler divergence between stochastic processes).

Emulators have also been used in optimization problems. Here the challenge is to approximately maximize an expensive function of the parameters; I will continue to refer to this function as the ‘simulator’. Choosing the parameter values at which to run the simulator is another experimental design problem. In the early stages of the maximization the simulator runs are performed mainly to learn about the gross features of the simulator’s shape, which means they tend to be widely-scattered in the input space. But as the shape becomes better known (i.e., the emulator’s uncertainty reduces), the emphasis shifts to homing-in on the location of the maximum, and the simulator runs tend to concentrate in one region. There are some very effective statistical criteria for managing this transition from explore to exploit. This topic tends to be known as ‘Bayesian optimization’ in Machine Learning, see Michael Osborne’s page for some more details.
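The explore-to-exploit transition can be managed with an acquisition function such as expected improvement. The sketch below is assumption-laden throughout: a hypothetical one-parameter ‘simulator’ to be maximised, an illustrative squared-exponential Gaussian process, and a simple grid of candidate inputs. Early runs are widely scattered; later runs concentrate where the emulator expects the maximum.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard Normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):  # standard Normal pdf
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

# Hypothetical expensive function of one parameter, to be maximised.
def expensive(x):
    return -((x - 0.65) ** 2)

def gp_posterior(x_new, x_run, y_run, length=0.2):
    # Noiseless GP conditioning, squared-exponential covariance,
    # prior mean set to the sample mean of the runs so far.
    K = np.exp(-0.5 * ((x_run[:, None] - x_run[None, :]) / length) ** 2)
    K += 1e-8 * np.eye(len(x_run))
    k = np.exp(-0.5 * ((x_new[:, None] - x_run[None, :]) / length) ** 2)
    mu = y_run.mean()
    mean = mu + k @ np.linalg.solve(K, y_run - mu)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

# Explore: a few widely-scattered runs to learn the gross shape ...
x_run = list(np.linspace(0.1, 0.9, 3))
y_run = [expensive(x) for x in x_run]
cand = np.linspace(0.0, 1.0, 401)

# ... exploit: expected improvement steers later runs to the maximum.
for _ in range(10):
    mean, var = gp_posterior(cand, np.array(x_run), np.array(y_run))
    best = max(y_run)
    sd = np.sqrt(var)
    z = (mean - best) / sd
    ei = (mean - best) * np.array([Phi(v) for v in z]) \
         + sd * np.array([phi(v) for v in z])
    x_next = float(cand[np.argmax(ei)])
    x_run.append(x_next)
    y_run.append(expensive(x_next))
```

Expected improvement is large either where the emulator's mean is high (exploit) or where its uncertainty is high (explore), so the trade-off is handled automatically as the uncertainty shrinks.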



EPIC Lab: Generating a first-person (egocentric) vision dataset for practical chemistry – data analysis and educational opportunities

Blog written by Chris Adams, Teaching Fellow, School of Chemistry, University of Bristol

This project was funded by the annual Jean Golding Institute seed corn funding scheme.

Our project was a collaboration between the Schools of Computer Science and Chemistry. The computer science side stems from the Epic Kitchens project, which used head-mounted GoPro cameras to capture video footage from the user’s perspective of people performing kitchen tasks. They then used the resulting dataset to set challenges to the computer vision community: can a computer learn to recognise tasks that are being done slightly differently by different people? And if it can, can the computer learn to recognise whether the procedure is being done well? Conceptually this is not so far from what we as educators do in an undergraduate student laboratory; we watch the students doing their practicals, and then make judgements about their competency. Since chemistry is essentially just cooking, we joined forces to record some video footage of undergraduates doing chemistry experiments. Ultimately, one can imagine the end point being a computer trained to recognise whether an experiment is being done well and to provide live feedback to the student, like a surgeon wearing a camera that can guide them through an operation. This is way in the future though….

There were some technical aspects that we were interested in exploring – for example, chemistry often involves colourless liquids in transparent vessels. The human eye generally copes with this situation without any problem, but it’s much more of a challenge for computers. There were also some educational aspects to think about – we use videos a lot in the guidance that we give students, but these are not first person, and are professionally filmed. How would footage of real students doing real experiments compare? It was also interesting to have recordings of what the students actually do (as opposed to what they’re told to do) so we can see at which points they deviate from the instructions.

We used the funding to purchase a couple of GoPros to augment those we already had, and to fund two students to help with the project. Over the course of a month, we collected film of about 30 different students undertaking the same first-year chemistry experiment, each film being about 16 GB of data (thanks to the RDSF for looking after this for us). It was interesting to see how the mere fact of wearing the camera affected the students’ behaviour; several of them commented that they made mistakes which they wouldn’t normally have made, simply because they were being watched. As somebody who has sat music exams in the recent past I can testify that this is true….

One of the research students then sat down and watched the films, creating a list of which activities were being carried out at what times, and we’re now in the process of feeding that information to the computer and training it to recognise what’s going on. This analysis is still ongoing, so watch this space….

The Jean Golding Institute seed corn funding scheme

The JGI offer funding to a handful of small pilot projects every year in our seed corn funding scheme – our next round of funding will be launched in the Autumn of 2019. Find out more about the funding and the projects we have supported in our projects page.

Can machines understand emotion? Curiosity Challenge winners announced

Photo courtesy of Alex Smye-Rumsby

We are pleased to announce the winners of the Curiosity Challenge are Oliver Davis and his team here at the University of Bristol: Zoe Reed, Nina Di Cara, Chris Moreno-Stokoe, Helena Davies, Valerio Maggio, Alastair Tanner and Benjamin Woolf.

The team will be collaborating with We The Curious on a prototype, which is due to be ready in October and will be going live to audiences when the new exhibition at We The Curious opens next year. Oliver Davis is an Associate Professor and Turing Fellow at Bristol Medical School and the Medical Research Council Integrative Epidemiology Unit (MRC IEU), where he leads a research team in social and genetic data science. Together his team and We The Curious will develop a ‘Curiosity Toolkit’ for a public audience called ‘Can machines understand emotion?’ The team’s toolkit will invite audiences to:

  • Share the idea that humans can produce data that help to classify emotions
  • Recognise that humans produce lots of data every day that expresses how they feel, and that researchers can use these data to teach machines to interpret those feelings
  • Experience a live example of how the type of data they produce can contribute to an Artificial Intelligence (AI) solution to a problem
  • Understand why researchers need computers to help them to analyse huge volumes of data
  • Contribute to and influence current research being undertaken by the team
  • Appreciate how these data can be used on a large scale to understand population health.

Oliver Davis says “Our toolkit will guide participants through the process of teaching machines how to recognise human emotion, using a series of five activities. This is directly relevant to our current research using social media data in population cohorts to better understand both mental health difficulties and positive emotions such as happiness and gratitude.”

Helen Della Nave at We The Curious said “The enthusiastic response from researchers to work with our public audiences was fantastic. Working with Oliver’s team will give audiences the opportunity to influence development of a new database of emotions and support future research. We are very excited about our audiences having the opportunity to get actively involved in this project.”

Through this competition, We The Curious have also offered Rox Middleton, a Post-doctoral Research Fellow at the University of Bristol, a research residency for her project ‘How to use light’.

For further details of the competition requirements and background, see Curiosity Challenge.

The Jean Golding Institute data competitions

We run a number of competitions throughout the year – to find out more take a look at Data competitions.

Visualising group energy

Blog written by Hen Wilkinson, School for Policy Studies at the University of Bristol.

The project was funded by the annual Jean Golding Institute seed corn funding scheme. It emerged from Hen’s ESRC funded PhD research, supported by the SWDTC and School for Policy Studies.

Collaborative working is central to tackling the world’s complex problems but is not easy to sustain

Power dynamics and inequalities play out in all directions, in the relationships between individuals just as much as between organisations. By making ‘hot spots’ visible in group interactions it becomes easier to acknowledge and work with points of conflict that will inevitably arise and to deal with them in a creative and sustainable manner.

While researching ‘the space between’ individuals and organisations, qualitative researcher Hen Wilkinson and data scientist Bobby Stuijfzand developed a new methodology using computer software to visualize energy shifts in group interactions. Listening to audio recordings of groups working together on a task, they were struck by the impact of nonverbal elements in the group interaction, with dynamics between participants influenced just as much by the nonverbal content of laughs, silences, sighs, asides and interruptions as by the words spoken.

Visualizing shifts of energy – a new approach in qualitative research

Following this observation, the ambition to visualize these tangible shifts of ‘energy’ in the groups took hold. To date, little attention has been paid to generating computer visuals in qualitative research, so creating a rigorous, systematic visualization of energy shifts was lengthy, challenging and exciting. For more detail on the rationale and methodology we developed over the course of two years and to view the final interactive versions of the design, see Visualizing energy shifts in group interactions. Among the many challenges we faced were finding and adapting an instrument to use with small and interactive qualitative datasets; establishing interrater reliability; identifying what was meant by ‘energy’; deciding which nonverbal elements to visualize; and how to present the resulting data.

On the website we present four 5-minute visualized extracts of group interaction, each drawn from a different group discussion, two of which were held in the UK and two in the Netherlands. Each extract of data is five minutes long, made up of 2.5 minutes of interaction either side of a central mid-point clash or strong challenge in the group. The five minutes of data were then scored by a team of raters listening independently to audio clips of the extract divided into meaning units, which are shown as ‘topic shifts’ on the visualizations. In this way, the qualitative data was converted into numerical values for three main variables – levels of mood and engagement as they shifted over a set period of time.

The support of seed corn funding from the Jean Golding Institute allowed us to work on the presentation of the visualizations, from building an interactive website that shows how the numerical data was derived, to refining the aesthetics of the design to encourage maximum engagement with the graphs and clarity of understanding in the viewer. Initial images were generated using ggplot2, a data visualization package for the statistical programming language ‘R’ – see Initial visualizations.

Initial visualizations

Following the generation of these first images, we explored the significance of data presentation through extensive design research, working with designer Derek Edwards. This drew on multiple sources in a visual exploration of accessibility, of the impact of colour, of multi-layered research and into the use of pattern, texture, animation and shape in displaying qualitative data. Slides from the design research show some of the various considerations we were reflecting on:

Design considerations

The initial images generated with ‘R’ were then refined using D3.js, a powerful and well-regarded software library used extensively to create interactive data visualizations on the web. Refining the aesthetics of the design was important to the project, both in terms of encouraging maximum engagement with the graphs and in terms of data clarity. Each graph contains multiple layers of information, from group participant engagement levels to the overall mood of the group, points of topic shift in the group discussions and dropdown text boxes of the verbal interactions between participants at any topic shift point.

The example below – visualizing a strikingly bad-tempered interaction – uses the final design we settled on (see Visualizing energy shifts in group interactions) once all considerations had been taken into account. The ‘energy line’ running through the centre of the graph is a composite of engagement and mood results and is cut across by a second nonverbal indicator of group dynamic – incidents of laughter, illustrating both use and function. As outlined in the methodology sketch, we developed a categorisation for types of laughter heard in this study, ranging from cohesive (green) through self-focused (yellow) to divisive (red). In this group, laughter can be seen to anticipate the shifts in mood from positive (green) to negative (red) and back again.

This project has sparked considerable interest, both in terms of its early-day implications for qualitative and mixed methods research and in terms of its potential as an applied tool for teams, organisations and collaborations to use. Further funding in 2019 through an Impact Award has enabled the interdisciplinary team working on the project to embark on further developments and connections.

We are fully aware of the work-in-progress nature of this approach and are very interested to receive feedback, comments, ideas for future applications from anyone out there! If you would like more information on this visualization project or have a comment to share, please contact the lead researcher, Hen Wilkinson, via

The Jean Golding Institute seed corn funding scheme

The JGI offer funding to a handful of small pilot projects every year in our seed corn funding scheme – our next round of funding will be launched in the Autumn of 2019. Find out more about the funding and the projects we have supported.


Reusing qualitative datasets to understand shifts in HIV prevention 1997-2013

Photo (copyright D. Kingstone): This image of Catherine and Ibi in conversation represents varied layers of clarity/blurriness that were a constituent part of the anonymising process. Decisions about the removal of potentially identifying data got talked through by both researchers until they achieved clarity about the best way forward.

A conversation between Dr Catherine Dodds and Dr Ibidun Fakoya

The project was funded by the annual Jean Golding Institute seed corn funding scheme.

Qualitative data re-use and open archiving

This project aimed to demonstrate the considerable value of qualitative data re-use and open archiving. Our team undertook in-depth anonymisation of two existing qualitative HIV datasets by applying and refining an anonymisation protocol. The two qualitative datasets anonymised were:

  • Relative Safety: contexts of risk for gay men with HIV (2008/09) 42 transcripts, Sigma Research
  • Plus One: HIV serodiscordant relationships among Black African people in England (2010/11) 60 transcripts, Sigma Research

The key aim of the project was to ready these materials by removing personally identifying information, so that the deidentified data could then be deposited with the UK Data Archive. We provided metadata (including participant recruitment materials, data collection templates, related research outputs, thematic lists used in coding and details of the anonymisation) for each submission.

In this blog post, Dr Catherine Dodds and Dr Ibidun Fakoya from the School of Public Policy discuss the ethical and practical considerations of archiving qualitative data. Catherine was involved in the original data collection and was the PI for this project, and Ibidun undertook the bulk of the anonymising work.

Ibidun: What motivated you to deposit these datasets to the UK Data Archive?

Catherine: A few years back I was awarded funding from the Wellcome Trust to examine the feasibility of re-using and archiving multiple qualitative datasets relating to HIV in the UK. Working with a coalition of other UK researchers on that project, we learned that while the UK Data Archive makes the process of deposit incredibly straightforward, what takes much more time is the decision-making, data transfer and preparation of data for deposit. We were looking at depositing projects that go quite far back in time, before the notion of Open Data was a widespread concept, so there was a lot to be considered in terms of readying these transcripts for deposit in a way that is useful, ethical and responsible. A key outcome of that work was the development of an anonymisation protocol to assist with the practical and ethical decision-making that is involved when readying such data for sharing in an archive.

Ibidun: How did you decide which datasets to deposit?

Catherine: During my 16-year career with Sigma Research (latterly at LSHTM), I was involved with and led on a considerable array of qualitative studies. I selected the data from Plus One and Relative Safety II because they were both undertaken just over ten years ago, at a time when it became clearer that HIV pharmaceuticals were being positioned as HIV prevention technologies. Because this is an area of particular interest for me, I wanted to personally revisit these two studies first in order to re-use them, while also anonymising them in readiness for archiving.

Ibidun: Do you think the ethical considerations for depositing data are different for qualitative and quantitative data? If so, how?

Catherine: They are absolutely different, because qualitative data tend to focus on the experiences, perspectives and human stories of participants in ways that are rich and detailed. This is one of the real strengths of this type of data. This means we need to anonymise in a way that goes beyond just identifying and removing personal names and names of organisations or places. Instead, we need to consider whether the overall narrative a person offers (as a collection of life experiences) could itself identify an individual who should be allowed to remain anonymous. This requires a highly skilled approach to anonymisation. If we were anonymising quantitative datasets, the risk of potential identification would probably be much lower, and anonymisation might just involve removing a few fields from a database.

Catherine: You weren’t involved in the original data collection. What contextual detail helped you to get started when anonymising these materials in readiness for archiving?

Ibidun: It helped that I am familiar with HIV research in the UK. I started out working in HIV and sexual health back in 2001, so I was aware of the findings of these studies before I started the anonymisation. Nevertheless, it was useful to read through the original study materials such as the research protocols, topic guides, interview schedules and fieldnotes. Speaking to the original investigators also provided insights into the research landscape at that time. By understanding the original aims, objectives and findings from the studies, I was able to focus on just the anonymisation rather than become distracted by the themes emerging from the data.

Catherine: How did you tackle the task of reading through so much text and remaining alert to the requirements of anonymisation?

Ibidun: Anonymisation takes a lot of concentration. You need to remain focussed on the transcripts and read every word to ensure that you do not miss any identifiers. I knew that I would struggle to remain alert if I tried to read the transcripts on my computer because I am used to skim reading articles on screen. Initially, I had thought about printing out all the transcripts, but I am conscious of wasting paper. Instead, I made use of the “Learning Tools” in Microsoft Word. I followed these steps to improve my focus and comprehension and ensure I read every word:

  1. Go to View > Learning Tools.
  2. Select Column Width to shorten the line length and make the page narrower.
  3. Select Read Aloud to hear the document as each word is highlighted.
  4. Increase the Read Aloud speed so you are reading approximately 300 words per minute.

It takes a little while to get used to the Read Aloud function, particularly at speed, but ultimately, I found this method to be the most efficient way to remain focused and quickly read through the large volume of text.

Catherine: How did you find the anonymisation protocol that was devised as a support tool?

Ibidun: The protocol was useful for getting started with the task, particularly for straightforward guidance on how to deal with direct identifiers and geographic locations. For more complicated anonymisation (e.g. “Situations, activities and contexts”) the guiding principles set out in the protocol provided only a starting point, meaning we needed to identify cases for team discussion where the potential for identification was high.

Catherine: What advice about anonymising qualitative datasets would you give others who want to archive similar materials for re-use?

Ibidun: My top three tips are:

  1. Keep a research diary like you would do for any other study. Keep note of your reflections and ideas as these may come in handy later.
  2. Work in a team of at least two so you can discuss any ambiguous decisions about de-identification.
  3. Your first duty is to protect the anonymity of the interviewee. If you cannot do that without destroying the integrity of the data (because you have to redact too much material) then err on the side of caution and keep the transcript out of the archive. When in doubt, do not deposit.

Catherine: Is there anything that surprised you in undertaking this work?

Ibidun: I was surprised by the emotional impact of reading the transcripts. Many of the interviewees recounted traumatic events or spoke about painful personal relationships. At times I found myself angry about injustices interviewees had faced, especially those who had been subject to criminal investigations for the reckless transmission of HIV. Anonymising such sensitive information therefore carries the same ethical considerations for researchers as undertaking original qualitative data collection. Researchers undertaking anonymisation also need to pause and reflect on the effects of engaging with emotionally charged narratives and be able to discuss these with colleagues.

Catherine: What value is there to making these materials available to other researchers through the UK Data Archive?

Ibidun: I hope researchers from outside the field of sexual health and HIV research can use these narratives in new and novel ways. It is possible that themes unrelated to the original research might emerge from the data for other researchers. For example, a linguist might want to examine changes in speech patterns among gay men in the UK. A sociologist might examine the datasets about the impact of unemployment on black African migrants in England. There’s a lot of potential in re-using these datasets, perhaps in combination with other data from the UK Data Archive.

Ibidun: I can throw that question back at you: how useful are qualitative datasets for other researchers?

Catherine: I suspect that will be up to them to decide. We have had interest from PhD students and other colleagues who want to use these data to interrogate specific historical aspects of HIV in the UK. It is a shame that we have not been able to undertake anonymisation and deposit more swiftly, but our learning is that to do this work retrospectively takes a great deal of re-familiarisation and case-by-case decision making. Archivists at the UKDA have been very excited about the prospect of having a themed set of qualitative data on social aspects of HIV in their collection and are convinced that this will be of use for researchers focussed on HIV. Social historians, LGBT and queer studies specialists, anthropologists and others might also have use for the data.

Ibidun: What are the methodological considerations when re-using qualitative data?

Catherine: I have just written a methods article on this subject, but in brief:

  1. It is essential that the person re-using the data becomes as familiar as possible with the original context and goals of the project from which the data emerges. Hopefully there will be metadata available to support them in this, and they also should seek to discuss the project with those originally involved (where possible). In my own case, even though I was one of the original data collectors, I was amazed by just how much I had already forgotten (or re-structured in my own mind’s eye).
  2. It is instructive to attend to Hammersley’s (2010) reflections on re-use, which encourage us to think about the given and constructed nature of the data we encounter in these endeavours. For instance, my colleague Peter Keogh and I have approached re-use purposively; we were interested in the theme of biomedicalization and so were selective about which particular projects and transcripts we chose to analyse. We wanted to capture both the mundane and the challenging aspects of life in close proximity to HIV. At the same time, some of the given elements of these data emerged from the shadows in ways that caught us off guard and reminded us of what it was to work in different moments and places of an unfolding epidemic. And furthermore, as Irwin et al. (2012) have also argued, bringing data and researchers together across datasets can afford an opportunity for listening out for silences, which enables us to open up new interpretive avenues.

The Jean Golding Institute seed corn funding scheme

The JGI offer funding to a handful of small pilot projects every year in our seed corn funding scheme – our next round of funding will be launched in the Autumn of 2019. Find out more about the funding and the projects we have supported.