#### Blog written by Jonathan Rougier, Professor of Statistical Science, University of Bristol

In a computer experiments we run our experiment *in silico*, in situations where it would be expensive or illegal to run them for real.

Computer code which is used as an analogue for the underlying system of interest is termed a *simulator*; often we have more than one simulator for a specified system. I have promoted the use of ‘simulator’ over the also-common ‘model’, because the word ‘model’ is very overloaded, especially in Statistics (see Second-order exchangeability analysis for multimodel ensembles).

# Parameters and Calibration

The basic question in a computer experiment is how to relate the simulator(s) and the underlying system. We need to do this in order to calibrate the simulator’s *parameters* to system observations, and to make predictions about system behaviour based on runs of the simulator.

Parameters are values in the simulator which are adjustable. In principle every numerical value in the code of the simulator is adjustable, but we would usually leave physically-based values like the gravitational constant alone. It is common to find parameters in chunks of code which are standing-in for processes which are not understood, or which are being approximated at a lower resolution. In ocean simulators, for example, we distinguish between ‘molecular viscosity’, which is a measurable value, and ‘eddy viscosity‘, which is the parameter used in the code.

The process of adjusting parameters to system observations is a statistical one, requiring specification of the ‘gap’ between the simulator and the system, termed the *discrepancy*, and the measurement errors in the observations. In a Bayesian analysis this process tends to be called *calibration*. When people refer to calibration as an *inverse problem* it is usually because they have (maybe implicitly) assumed that the simulator is perfect and the measurement error is Normal with a simple variance. These assumptions imply that the Maximum Likelihood value for the parameters is the value which minimizes the sum of squared deviations. But we do not have to make these assumptions in a statistical analysis, and often we can use additional insights to do much better, including quantifying uncertainty.

The dominant statistical model for relating the simulator and the system is the *best input* model, which asserts that there is a best value for the parameters, although we do not what it is. Crucially, the best value does not make the simulator a perfect analogue of the system: there is still a gap. I helped to formalize this model, working with Michael Goldstein and the group at Durham University (e.g. Probabilistic formulations for transferring inferences from mathematical models to physical systems and Probabilistic inference for future climate using an ensemble of climate model evaluations). Michael Goldstein and I then proposed a more satisfactory *reified model* which was better-suited to situations where there was (or could be) more than one simulator (Reified Bayesian modelling and inference for physical systems). The paper has been well-cited but the idea has not (yet) caught on.

In a Bayesian analysis, calibration and prediction tend to be quite closely related, particularly because the same model of the gap between the simulator and the system has to be used for both calibration (using historical system behaviour) and prediction (future system behaviour). There are some applications where quite simplistic models have been widely used, such as ‘anomaly correction’ in paleoclimate reconstruction and climate prediction (See Climate simulators and climate projections).

# Emulators

Calibration and prediction are fairly standard statistical operations when the simulator is cheap enough to run that it can be embedded ‘in the loop’ of a statistical calculation. But many simulators are expensive to run; for example, climate simulators on super-computers run at about 100 simulated years per month. In this case, each run has to be carefully chosen to be as informative as possible. The crucial tool here is an *emulator*, which is a statistical model of the simulator.

In a nutshell, carefully-chosen (expensive) runs of the simulator are used to build the emulator, and (cheap) runs of the emulator are used ‘in the loop’ of the statistical calculation. Of course, there is also a gap between the emulator and the simulator.

Choosing where to run the simulator is a topic of *experimental design*.

Early in the process, a space-filling design like a *Latin Hypercube* is popular. As the calculation progresses, it is tempting to include system observations in the experimental design. This is possible and can be very advantageous, but the book-keeping in a fully-statistical approach can get quite baroque, because of keeping track of double-counting – see Bayes linear calibrated prediction for complex systems. It is quite common in a statistical calculation to split learning about the simulator on the one hand, and using the emulator to learn about the system on the other, for pragmatic reasons (Comment on article by Sanso et al (PDF)).

Sometimes the emulator will be referred to as the *surrogate* simulator, particularly in Engineering. Often the surrogate is a flexible fitter with a restricted statistical provenance (e.g.’polynomial chaos (PDF)‘). This makes it difficult to use surrogates for statistical calculations, because a well-specified uncertainty about the simulator is a crucial output from an emulator. Statistics and Machine Learning have widely adopted the *Gaussian process* as a statistical model for an emulator.

Gaussian processes can be expensive to compute with, especially when the simulator output is high-dimensional, like a field of values (Efficient emulators for multivariate deterministic functions). The recent approach of *inducing points* looks promising (On sparse variational methods and the Kullback-Leibler divergence between stochastic processes (PDF)).

Emulators have also been used in optimization problems. Here the challenge is to approximately maximize an expensive function of the parameters; I will continue to refer to this function as the ‘simulator’. Choosing the parameter values at which to run the simulator is another experimental design problem. In the early stages of the maximization the simulator runs are performed mainly to learn about the gross features of the simulator’s shape, which means they tend to be widely-scattered in the input space. But as the shape becomes better known (i.e., the emulator’s uncertainty reduces), the emphasis shifts to homing-in on the location of the maximum, and the simulator runs tend to concentrate in one region. There are some very effective statistical criteria for managing this transition from *explore* to *exploit*. This topic tends to be known as ‘Bayesian optimization’ in Machine Learning, see Michael Osborne’s page for some more details.