Brunel’s Network project – next stage

Figure 1: Three views of Brunel’s Network running on an iPhone X. Transitioning a public engagement project from being based on a large touch-screen display to using phones and tablets forced a significant redesign and rethink of the project. The result is a more personal and intuitive app for exploring the network of human interactions that drove the construction of Brunel’s three great ships

A blog written by James Boyd (Brunel Institute) and Christopher Woods (University of Bristol).

Following many months of lockdown-based work, the Brunel’s Network project, conducted by The Brunel Institute (the collaboration between the SS Great Britain Trust and the University of Bristol), is now available online.

The project aims to find, record, assess and weight the influence of all the individuals with whom Isambard Kingdom Brunel collaborated on major maritime engineering projects, and to display that network of individuals as a visual interactive. It began as an idea by James Boyd, Research Fellow in the Brunel Institute, in 2018, and received a major boost in 2019 when Christopher Woods, EPSRC Research Software Engineering Fellow, joined the project to help turn static network imagery into an interactive experience. The project accelerated further in January 2020, when a grant from the Jean Golding Institute funded an additional project member, Gareth Jones.

The original aim of the project was to build an interactive digital exhibit that would be put on public display as part of this summer’s 50th Anniversary celebrations of Brunel’s second ship, the Great Britain, returning to Bristol. Members of the public would have interacted with the network via a large touch screen display. Gareth built a usable platform for Brunel’s Network over the Spring and early Summer. He realised the vision and, by the end of June, had turned the early prototype into a full production application.

However, as 2020 unfolded, the 50th Anniversary celebrations were postponed and opportunities to launch Brunel’s Network as an interactive public exhibit disappeared. More critically, it became clear that it would be challenging to make a large interactive touch screen safe for public use. In addition, socially-distanced working introduced significant obstacles. Before lockdown, much of the historical research needed for the Great Britain and Great Western networks was complete, but there was still a significant amount to do for the Great Eastern. Lockdown coupled with travel restrictions meant that, from March, archival work effectively came to a standstill, and so research into many historical figures represented in the network was slowed. Thanks to the work of UoB Special Collections archivist Emma Howgill, however, we were able to progress with a crucial element of the historical research. Emma has, in 2019 and 2020, done incredible work digitizing and transcribing the Great Eastern Letter Books – the company records behind Brunel’s third and most ambitious ship, the Great Eastern. With these sources, and regular online discussion to help build the digital interactive, the project has taken shape.

A key, yet difficult, decision was to pivot the project. The large touch screen was dropped, and instead we retargeted Brunel’s Network to run on personal touch-screen devices, such as phones and tablets. This required a huge amount of work to reformat the application to support a range of much smaller display sizes, and to adapt it from an application run in a supervised way in an exhibit into one that an individual would explore at home. The changes ranged from adding code that automatically switches from writing individuals’ full names on the graph to writing their initials when the screen is small, to adding significantly more help and more intuitive controls and feedback. Another major, yet subtle, change was to adapt the controls to better match the way users interact with phones or tablets, compared to traditional desktop interfaces. With these changes, we had an online application that was ready for launch.
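The name-to-initials switch itself is simple to express. Here is a minimal sketch of the idea, written in Python purely for illustration (Brunel’s Network itself runs in the browser, so this is not the app’s actual code):

```python
def node_label(full_name, small_screen):
    """Label to draw for a node: initials on small screens, the full name otherwise.

    A sketch of the idea only, not the code used in Brunel's Network.
    """
    if not small_screen:
        return full_name
    return "".join(word[0].upper() for word in full_name.split())

node_label("Isambard Kingdom Brunel", small_screen=True)   # -> "IKB"
```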

Figure 2: Brunel’s Network running on an iPad. The application has a fully responsive design. Full names are displayed for smaller networks (left), and only initials for larger networks (middle). Full names are used, together with highlighted edges, when users select a node (right).

 

In addition to forcing changes of interface, the move to smaller and more variable display sizes exacerbated the principal challenge of visualising networks, namely that they become increasingly unreadable as the number of nodes and edges increases. While we had implemented filters to allow the public to reduce the number of visible nodes, initial user testing showed that the network was still too confusing, and we were falling into the trap of generating “spaghetti charts”.

To overcome this challenge, we had to come up with some new ways of presenting network data. The network is distinguished from a standard social network diagram in that node gravity is determined by the influence of an individual in the group toward a given project, rather than by their connectivity. This was achieved by putting information from the historical sources into a data model that counted their contributions toward the project. Whilst their social connectivity is also highlighted (and can be used to centre the network), the goal was to examine the extent of investment, expertise and contribution of individuals within a group to a given historical outcome. With nodes sized by project influence, early users naturally tried to rank the nodes in the network into a visual hierarchy. To display this clearly, we needed to find a way of organising the nodes so that they were spaced out, thus avoiding “spaghetti plots”, while still keeping the nodes of individuals who made large contributions near the centre and pushing the nodes of those who contributed less outwards. Organising the nodes as a spiral, with the largest centred and the others spiralling outward, proved to be the solution. An algorithm was crafted, drawing inspiration from the spirals of a nautilus shell, that placed the nodes into a uniform spiral: it originated in the centre with the node representing the individual who made the largest contribution to each ship, and subsequent nodes were placed along the spiral in order of contribution. With a little tweaking based on user feedback, we reached a design that clearly shows the importance of contributions from individuals in Brunel’s Network, while still showing the connections between individuals.
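A minimal sketch of the idea, again in Python purely for illustration (the production app computes its layout in the browser, and the spacing and contribution figures below are assumptions): sort the nodes by contribution and place them along a spiral, so the largest contributor sits at the origin and the rest wind outwards in order.

```python
import math

def spiral_layout(contributions, spacing=1.0):
    """Place nodes on a spiral, largest contribution at the centre.

    contributions: dict of node name -> numeric contribution score.
    Returns a dict of node name -> (x, y) position.
    Illustrative sketch only, not the project's actual algorithm.
    """
    golden_angle = math.pi * (3 - math.sqrt(5))  # ~137.5 degrees between successive nodes
    ordered = sorted(contributions, key=contributions.get, reverse=True)
    positions = {}
    for i, node in enumerate(ordered):
        # Fermat ("sunflower") spiral: the radius grows with sqrt(i), which keeps
        # neighbouring nodes roughly evenly spaced rather than bunched at the centre.
        radius = spacing * math.sqrt(i)
        angle = i * golden_angle
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

# The largest contributor lands at the origin; smaller contributors spiral outwards.
layout = spiral_layout({"Brunel": 120, "Guppy": 60, "Patterson": 45, "Claxton": 30})
```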

Figure 3: Evolution of Brunel’s Network. The left panel shows the version of Brunel’s Network from June running on an iPad. The smaller screen leads to issues with rendering, plus a “spaghetti-graph” view. The second panel shows how we had evolved to use a spiral view by July, with the third panel showing the adoption of a responsive layout in the August version that adapted the controls based on screen dimensions.

 

With the initial launch of the project complete, we can now move on to the next stage. The goal is to make historical communities of innovation comprehensible and easy to explore in a visual format. Exploration of this data will give deep insight into the networks and relationships that drive success and failure in large historical projects. The project remains live, with much research still to be added and further updates to usability and modes of data visualisation planned. The project team hope that a webinar programme and a series of lectures in early 2021 will give interested users a platform to explore it in depth. In the meantime, the project is open for historians, social network analysis (SNA) enthusiasts and, of course, Brunel enthusiasts to explore:

Explore Brunel’s Network

To keep up to date with this and other projects, news, events, funding and other opportunities please Join the JGI Mailing list.

 

Dr Anya Skatova, Bristol Turing Fellow, has received a prestigious UKRI Future Leaders Fellowship

Dr Anya Skatova

Dr Skatova’s programme will focus on developing methods to analyse shopping data to improve population health. Digital technology opens up a new era in the understanding of human behaviour and lifestyle choices, with people’s daily activities and habits leaving ‘footprints’ in their digital records. For example, when we buy goods in supermarkets and use loyalty cards to obtain benefits (e.g., future discounts), the supermarket records our purchases and creates a representation of our habits and preferences.

Until now, the use of ‘digital footprint’ data has mostly been limited to private companies, which use it to track sales of their products and to target marketing and promotions. Changes in Data Protection law in the UK mean the public can now access and donate their data for academic research. Shopping history data are an extremely rich source of information for population health research, as they provide granular, objective data on real-world choices and behaviours. When shopping history data are used in a privacy-preserving and ethical manner, they can be utilised for public good, benefiting health research and helping us to understand how everyday behaviours and lifestyle choices impact health and social outcomes.

Dr Skatova, based in the Population Health Sciences Department at Bristol Medical School, received a Turing Fellowship and project funding that laid the basis for her £1.4m UKRI Future Leaders Fellowship, which will link transaction data to other environmental and health records collected by the Avon Longitudinal Study of Parents and Children.

The ultimate goal of the study is to put large commercial datasets, such as shopping history data, at the service of public healthcare by contributing to the early detection of diseases, developing and testing targeted interventions, and supporting evidence-based healthcare and health research.

What is data? Exploring data from an anthropological perspective

TDWI 2019

Blog written by Josie Price, University of Bristol graduate

Introduction

Humans are constantly living through and creating data, yet the uses of data have also become a source of economic, political, psychological and social power. So, what really is data?

My final-year thesis for my Anthropology with Innovation degree aimed to investigate this question through an ethnography with data scientists, combined with the theory of ontology, in order to better understand the multiplicity of data and its relationship to humans in contemporary western societies.

Aims of the project

  • Use my ethnography with data scientists to answer the question: What is Data?
  • Investigate the role of data in contemporary societies, where data can be human experience as well as an economy, commodity and political tool.
  • Better understand how data transitions into these multiple forms.
  • Combine the study of data with the theory of ontology to understand data from a social anthropological perspective.
  • Better understand the relationship between humans and data.

Results

What is data?

To investigate what data is, I conducted one-to-one interviews with data scientists, who work to translate data into significant, meaningful results. The most significant theme was that data scientists understand data to be a model of reality: data are multidimensional, but are condensed into a ‘picture’ that gives them meaning and structure. This condensation is a practical step towards comprehending what the data mean, but when situated in ontological theory the process has parallels with Viveiros de Castro’s (1998) theory of Perspectivism, evident in Amerindian ontologies.

Data is a model of reality

This ontology of data as a “picture” of the world can therefore help to explain the multiplicity of data, because data is an abstraction of reality. Data can thus manifest in multiple forms as models of reality, whether to monitor human behaviour, inform a political strategy or create an economic marketplace, changing depending on the context and purpose of the data. The ontology of data as a model of reality reveals parallels with the Ontological Turn in anthropology. The Ontological Turn argues that different worlds are experienced simultaneously, thereby denying the existence of a ‘singular truth’ and revealing the presence of dominant models that pervade society (Holbraad and Pedersen, 2017; Escobar, 1995). Likewise, data scientists’ ontology of data as a model of reality helps us to understand that there is not a singular truth of what data is; rather, data can be expressed in multiple forms depending on context and purpose.

It is important to note that this model is not reality; it is a “picture” of reality in which many dimensions have been condensed and distorted by human effort. This analysis helps to relocate the human in the phenomenon, because these models are shaped by humans. Therefore, for data scientists, data is also something to be critical of. This ontology of data reveals the importance of a critical community, of favouring error over truth, and of immersion in specific domain knowledge, all vital components in constructing models that are closer to reality.

Humans and data

This analysis of data as a model of reality therefore helps to relocate the human in the phenomenon, because humans create these models of reality to provide meaning to the data. In this sense, an ontology in which data carries the influence of humans could indicate a convergence of humans and non-humans, suggesting a shift away from ‘The Great Divides’ prominent in western ontology (Latour, 1991). The influence of humans on data further supports the view that data is something to be critical of, although whether this critical ontology of data is shared with the wider public is not known and is a topic for further research. Nevertheless, from the ontology of data amongst data scientists, we can learn how reality needs to constrain a model for it to be meaningful. This can help data scientists use data to create models that are closer to reality and so provide richer insight into questions about the world.

Future plans

To continue the trajectory of data as models of reality, further plans for this project could be to investigate how these models of reality affect the structures of society. For example, the relationship between data and gender, and the subsequent sexism in digital technologies and data analysis, could be researched further, as explored by Caroline Criado Perez in her book ‘Invisible Women’. A question to be explored could therefore be: ‘How do data, and digital technologies such as AI and Machine Learning, reinforce dominant structures through technology?’ This could reveal further insight into how data is understood and into the relationship between humans and data.

Contact

pricejosie10@gmail.com

 

 

 

Food hazards from around the world Data Competition

We are excited to announce that the winner of the 2020 Food hazards from around the world data competition is Robert Eyre, with his visualisation project ‘FSA related alert tracker’.

The Jean Golding Institute recently teamed up with the Food Standards Agency (FSA) for a data visualisation competition.

The competition

Every day the Food Standards Agency receives alerts from around the world about food products that have been found to be hazardous to human health, from salmonella in chicken to undeclared peanuts to plastic in pickles. Sometimes these products make it to our shelves in the UK and have to be recalled or withdrawn. But with so much data on food hazards at our fingertips, we want to be proactive in identifying potential hazards to UK consumers, before anyone buys a hazardous product.  

The FSA made a dataset of food alerts available and we asked for data visualisations that could help to understand how the dataset might alert us to food risks.

The winning project

The winner was Robert Eyre, a PhD student in the Department of Engineering Mathematics, with his visualisation ‘FSA related alert tracker’.

The visualisation is a dashboard that allows the FSA to identify threats that are related. Once an article about a threat has been chosen, you can see where on the map, and when in time, related threats happened.

The idea behind the visualisation is to show the threats that have been reported in the United Kingdom and, given a selected threat, the other threats related to it. Once a threat has been selected from the left panel, the right panel updates automatically, showing the data source, a link to the data source, and information about the incident, such as when the article was published and what the incident is about. The map then highlights the source of the threat and the country that reported it.

To show the related threats, there is a series of buttons under the left panel for deciding what is classed as a related event. Once one of these buttons is selected, the map is updated to show the locations of the related threats (with the size of the new circles roughly indicating how many threats there are). This should show the FSA where specific threats related to the United Kingdom are most common. Additionally, a time series is shown for the related events highlighted. Here the FSA could identify any peaks or dips, which they could then investigate further to see what events may have happened.
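As a very rough sketch of this kind of filtering (not Robert’s actual code; the table, the column names and the notion of ‘related’ used here are all assumptions), something similar can be expressed in Python with pandas:

```python
import pandas as pd

# Hypothetical alerts table; the real FSA dataset has different columns.
alerts = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-05", "2019-02-11", "2019-02-20", "2019-05-02"]),
    "hazard": ["salmonella", "salmonella", "undeclared peanuts", "salmonella"],
    "origin_country": ["Poland", "Poland", "USA", "Brazil"],
})

# Treat alerts with the same hazard type as the selected alert as "related".
selected = alerts.iloc[0]
related = alerts[alerts["hazard"] == selected["hazard"]]

# Where do the related threats come from, and how do they spread out over time?
print(related["origin_country"].value_counts())
monthly_counts = related.set_index("date").resample("MS").size()
print(monthly_counts)
```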

 

Image from the winning visualisation

 

The winner received £1,000 in prize money.

The runners up

The two runners-up, each receiving £250, are Marina Vabistsevits & Oliver Lloyd, and Angharad Stell.

Marina and Oliver were runners-up for their visualisation ‘Too much tooty in the fruity: Keeping food safe in a post-Brexit Britain’, a brief exploration of the UK’s reliance on the EU for food safety and the related challenges that Brexit may bring.

Angharad was a runner-up for the visualisation ‘From a data space to knowledge discovery’, an interactive plotting app that allows exploration and visualisation of the dataset.

The Jean Golding Institute data competitions 

We run a number of competitions throughout the year – to find out more take a look at Data competitions. 

Storing your data in a spreadsheet

 

Photo via Unsplash by Glenn Carstens-Peters

Blog written by Jonty Rougier, Lisa Müller and Soraya Safazadeh, Centre for Thriving Places (the new name for Happy City)

What makes a good spreadsheet layout?

We were recently trying to extract some data from the All tab of the ONS spreadsheet Work Geography Table 7.12 Gender pay gap 2018.

This gave us the opportunity to reflect on what makes a good spreadsheet layout, if you want to make your data easily available to others. The key thing to remember is that the data will be extracted by software using simple and standardised rules, either from the spreadsheet itself, or from a CSV file saved from the spreadsheet. Unless you recognise this, much of your well-intentioned structure and ‘cool stuff’ will actively impede extraction. Here are some tips for a good spreadsheet:

Names

Each of your data columns is a ‘variable’, and starts with a name, giving a row of variable names in your spreadsheet. Don’t use long names, especially phrases, because someone is going to have to type these later. Try to use a simple descriptor, avoiding spaces or commas; if you need a space or some other punctuation, use an underscore instead (see below). You can put detailed information about the variable in a separate tab. This detailed information might include a definition, units, and allowable values.

In our example spreadsheet we have

Current column name        Our description        Better column name
Description                Region names           Region_name
Code                       Region identifiers     Region_ID
Gender pay gap median      Numerical values       GPG_median
Gender pay gap mean        Numerical values       GPG_mean
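Once the sheet has been read in, the renaming itself is a single step. A small pandas sketch (the rows below are made up, standing in for the real data after it has been loaded):

```python
import pandas as pd

# Made-up rows standing in for the ONS sheet after it has been read in.
df = pd.DataFrame({
    "Description": ["North East", "North West"],
    "Code": ["E12000001", "E12000002"],
    "Gender pay gap median": [9.8, 8.7],
    "Gender pay gap mean": [11.2, 10.5],
})

# Map the long headers onto short, extraction-friendly variable names.
df = df.rename(columns={
    "Description": "Region_name",
    "Code": "Region_ID",
    "Gender pay gap median": "GPG_median",
    "Gender pay gap mean": "GPG_mean",
})
```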

There is a mild convention in Statistics to use a capital letter to start a variable name, and then small letters for the levels, if they are not numerical. For example, the variable ‘Sex’ might have levels ‘male’, ‘female’, ‘pref_not’, and ‘NA’, where ‘pref_not’ is ‘prefer not to say’, and NA is ‘not available’.

  1. Use an IDENTICAL name for the same variable if it appears in two or more tabs. It’s amazing how often this is violated: identical means identical, so ‘Region_Name’, ‘region_Name’, and ‘region_name’ are NOT the same as ‘Region_name’.
  2. There are two different conventions for compound variable names, like ‘Region name’. One is to replace spaces with underscores, to give ‘Region_name’. The other is to remove spaces and use capitals at the start of each word, to give ‘RegionName’, known as camel case. Both are fine, but it is better not to mix them: this can cause some old-skool programmers to become enraged.

Settle on a small set of consistently-used codes for common levels

NA for ‘not available’ is very common; in a spreadsheet, you can expect a blank cell to be read as NA. ‘Prefer not to say’ comes up regularly, so settle on something specific, like ‘pref_not’, to be used for all variables. The same is true for ‘not sure’ (eg ‘not_sure’).

At all costs, avoid coding an exception as an illegal or unlikely value, like 9, 99, 999, 0, -1, -99, -999; we have seen all of these, and others besides (from the same client!). If you want to use more exceptions than just NA in a variable with numerical values, then use NA for all exceptions in the values column, and add a second column with labels for the exceptions.

In our example spreadsheet, if you look hard enough you will see some ‘x’ in the numerical values columns. We initially guessed these mean ‘NA’, but in fact they do not! In the key, ‘x = Estimates are considered unreliable for practical purposes or are unavailable’. But surely ‘unreliable’ and ‘unavailable’ are two different things? Ideally only the second of these would be NA in the GPG_median numerical column. A new GPG_median_exception column would be mostly blank, except for ‘unreliable’ where required to qualify a numerical value.

Generally, we prefer a single column of exceptions, possibly with several levels. In another application the exception codes included ‘unreliable’, ‘digitised’, ‘estimate’, and ‘interpolated’.
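Expressed in Python with pandas, the recoding described above might look like the following sketch. The values are made up, and the single ‘unreliable_or_unavailable’ label reflects the fact that the spreadsheet’s key conflates the two meanings:

```python
import pandas as pd

# Toy column mimicking the ONS sheet, where "x" marks values that are
# either unreliable or unavailable (the key does not distinguish the two).
raw = pd.Series(["12.1", "x", "9.8", "x"], name="GPG_median")

# Numeric values column: anything that is not a number becomes NA.
gpg_median = pd.to_numeric(raw, errors="coerce")

# Companion exception column: blank for ordinary values, a label otherwise.
gpg_median_exception = raw.map(lambda v: "unreliable_or_unavailable" if v == "x" else "")

tidy = pd.DataFrame({"GPG_median": gpg_median,
                     "GPG_median_exception": gpg_median_exception})
```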

Put all meta-data ABOVE the rows which store the data

This is because extraction software will have a ‘skip = n’ argument, to skip the first n rows. So everything which is not data should go up here, to be skipped.

  1. DO NOT use the columns to the right of your data: the extraction software will not understand them, and will try to extract them as additional columns.
  2. DO NOT use the rows underneath your data, for the same reason. Your variables will be contaminated, usually with character values which stop the columns being interpreted by the extraction software as numerical values.

In our example spreadsheet, there is a ‘Key to Quality’ to the right of the columns. Clearly the author of this spreadsheet was trying to be helpful, but this information is already in the Notes tab, and the result is distinctly unhelpful.

In our example spreadsheet we also have three rows of footnotes immediately underneath the data. The correct place for these is in the Notes tab, or above the data.
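With a reader such as pandas, the same ideas look like the sketch below. The filename, the number of skipped rows, the column range and the column name are illustrative guesses rather than the actual layout of Table 7.12:

```python
import pandas as pd

df = pd.read_excel(
    "work_geography_table_7_12.xls",  # hypothetical filename for the downloaded workbook
    sheet_name="All",
    skiprows=4,        # skip the metadata rows above the header (count is illustrative)
    usecols="A:D",     # read only the data columns, ignoring the 'Key to Quality' to the right
    na_values=["x"],   # turn the 'x' markers into NA on the way in
)

# Footnotes under the data arrive as extra, mostly empty rows of text;
# dropping rows with no numeric value in a key column removes them.
df = df.dropna(subset=["Gender pay gap median"])
```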

Do not embed information in the formatting of the cells

This is an unusual one, but our example spreadsheet has done exactly that. Instead of an additional column Quality, the author has decided to give each numerical value cell one of four colours, from white (good quality) to deep red (either unreliable or unavailable). This is useful information but it is totally inaccessible: cell format like colour is not read by extraction software.

Don’t have any blank rows between the row of variable names and the last row of data

This is not crucial because extraction software can be instructed to skip blank rows, but it is better to be safe.

Our example spreadsheet has no blank rows – result!

More information

For more information about the Centre for Thriving Places, check out their website.