A night at the data science café

It’s a Thursday in the Greenbank, a bohemian pub in Easton, Bristol. Upstairs, people have gathered to discuss ‘the rise of data science’ at one of the monthly Science Cafés run by the local branch of the British Science Association – places where, after a short talk from a scientific expert, the floor is given to attendees to discuss the issues arising when science meets society. Today, the guest is Dr Bobby Stuijfzand, data scientist at the Jean Golding Institute, and issues arising range from the ownership of health records to the existence of free will.

How data changed the last decade

Bobby begins by explaining his role: working with people who need things done with their data, and handling other people’s problems. This broadens out neatly into a working definition of data science – ‘methods of learning from data and finding patterns’. But why is it relevant now?

He shows a photo of a 2017 concert and asks us to reflect on the changes since 2007. Diverse answers reveal some of the different motivations of audience members for coming: some comment that you have to sign in to everything these days, others that it would have been unthinkable for a president to be constantly posting on social media. I add that in 2007, “I didn’t need to be on Facebook to get invited to parties”.

The photo prompts Bobby to observe that we previously didn’t have such a sea of screens at events; nor was it weird to leave the house without a smartphone. Fitbits were yet to come. Streaming services such as Netflix have shifted the habits of viewers. In 2007 BitTorrent, the peer-to-peer sharing network used for transferring large files, many of which are pirated media, took up 25% of US internet traffic. Now that legal, easy alternatives are available on demand, that’s down to 4%. While the audience responses were obviously shaped in this direction by the topic of the event, it can certainly be argued that many of the changes in the last decade come from data – they are either enabled by it, like the Fitbit, or generate it, like the indexing and recommendations provided by Netflix.

Seeing patterns in data

Bobby used his own research on eye tracking as an illustration of how data science proceeds. To begin with, you gather raw observations, such as where people’s eyes focus and for how long while completing different reading tasks. Then you come up with some ways to describe overall characteristics of this data, such as the average length of time fixating on a single point or the distance that the gaze travels before fixating again. Computer scientists might call these features, while statisticians and psychologists think of these as variables. By comparing these variables for different large groups of raw observations (data sets), you can work out which characteristics are different between groups – perhaps what people are reading or what task they are doing. Fitting mathematical descriptions to these characteristics lets you predict how they will vary if you were to collect more and more data. Finally, with these models, computers are trained to make predictions about which group new data is likely to be in based on these findings.
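
For readers who like to see such steps written down, here is a minimal sketch in Python of the pipeline described above. It is not Bobby’s actual analysis: the fixation durations are simulated, the two task names (“skimming” and “careful reading”) are made up for illustration, and the “model” is a very simple nearest-average rule standing in for whatever method a real study would use.

```python
import random
from statistics import mean

random.seed(0)  # make the simulated data repeatable


def simulate_fixations(mean_ms, n=200):
    """Return n made-up fixation durations in milliseconds (an assumption, not real data)."""
    return [max(50.0, random.gauss(mean_ms, 40.0)) for _ in range(n)]


# Step 1: raw observations, grouped by a hypothetical reading task.
recordings = {
    "skimming": [simulate_fixations(180) for _ in range(20)],
    "careful reading": [simulate_fixations(260) for _ in range(20)],
}


# Step 2: describe each recording with a feature (or variable):
# here, the average length of time fixating on a single point.
def mean_fixation(recording):
    return mean(recording)


# Step 3: compare the feature between groups by averaging it per task.
group_averages = {
    task: mean([mean_fixation(r) for r in recs])
    for task, recs in recordings.items()
}


# Steps 4 and 5: a very simple model that assigns new data to the
# group whose average feature value is closest.
def predict_task(recording):
    feature = mean_fixation(recording)
    return min(group_averages, key=lambda task: abs(group_averages[task] - feature))


new_recording = simulate_fixations(250)
print(group_averages)
print(predict_task(new_recording))
```

A real project would use many features at once and check the predictions against data the model had not seen, but the shape of the process is the same.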

By taking specific observations, measuring features, looking for group differences, polishing those into mathematical predictions, and applying them with computers, a lot of complicated systems can be built. This was what Bobby handed over to the audience, with a challenge: knowing the basic ideas of data science, and what has changed since 2007, what do you think 2027 will be like? What will the future look like, and how will data science have changed it?

From annoyances to the singularity: the next decade in data

Much discussion followed, ably assisted by the selection of drinks available. Some people foresaw widespread drone surveillance and blackmail, particularly until the law caught up; some thought that supply and demand for shops and retailers would be synced up much more closely, so there would hardly ever be a run on products. Others foresaw that Artificial Intelligence (AI) would take over a lot of menial tasks.

In more extreme cases, one participant was anticipating the Singularity (an event where an AI attains sufficient intelligence that it can make itself better and better, eventually gaining the ability to do almost anything and bend the world to its will – jokingly referred to as ‘the nerd rapture’) with ‘terror and excitement’, depending on which corporation makes it happen. With the ability to model everything, someone else asked, was there any free will left?

After a brief segue into the non-deterministic nature of the quantum world and the limitations of theoretical sets of mathematical axioms, things got a little more macroscopic and tangible.

“I hope that things will become less annoying”, said a man in the middle of the room. He went on: “Adverts track what you’re looking at online – but if you’ve just bought a sofa, why would they show you an advert for a sofa? It’s in its infancy. As we learn to better use it, it’s going to become more targeted. Things are still being learned and they will become more sensible and more focused and more elegant.”

There were concerns over the resources needed to generate exponentially increasing processing power, including raw materials and rare metals. But again there was some hope. Another participant anticipated fewer social problems: for example, less famine as large systems figure out why crops fail. In the case of psychological issues, he cited work looking at the emotions of people on social networks, and scanning for potential terrorists, as reasons to be hopeful.

Not everyone shared this optimism though, as a woman at the front opined:

“They know so much about you. If you do something different to what whoever’s controlling society wants you to do, they can shut you down at source.” Clearly there are some concerns about the power of data and governments.

What do we want our data to do for us?

Bobby noted that with many of these observations, the outcome depends on who is controlling the data. “It’s a technology that can be used for good or ill. We give our data out – what do we get from it?”

The same technology that enables Gmail to show you targeted ads based on the content of your email is what allows it to filter out spam, a direct benefit for users. Recently, Google DeepMind was given access to 1.6 million NHS patient records, and five years of historical data, with the aim of predicting acute kidney failure – is this enough of a benefit to justify handing over this data? Actually, the UK Information Commissioner’s Office (ICO) ruled that the hospital did not do enough to protect the privacy of patients.

Google Maps, Spotify and Open Data Bristol are all examples of initiatives that offer access to data, but they need to be accessed using an API (Application Programming Interface). So there is a way to get things back from your data, but only if you can program. So, Bobby asked the room, if we could make a platform to use our own data, what would you want?

A key theme was access to patient records: specifically the ability to access your own information, be informed about what is going on and join it up across services. But while people wanted their information shared to give the best care available, they still wanted their privacy protected. There were also concerns about who would and wouldn’t have the ability to access their own records, the bias created by people opting out of sharing, and the political use that could be made of this.

One contributor raised the issue of legislation, suggesting that it would provide some guidance. “The legal frameworks that sit around this don’t really exist and that’s a problem. It doesn’t exist because it happened comparatively quickly, so there aren’t laws that are up to date with current collection and distribution methods. That will address a lot of concerns that people have about misuse of their data and it falling into the wrong hands.”

Bobby spoke from his own experience: “We used to have statisticians at the ONS [Office for National Statistics] who had a background in social sciences. Now our data scientists [the favoured term at the moment] tend to have more of a computer scientist/engineering background and have less of a basis in ethical and legal considerations…”

“Speaking as an engineer, I agree we often lack an ethical framework”, I interrupted.

And the concern about the potential downsides of data didn’t end at other people having information about you, as the idea of finding out something you might not want to know was raised. Sometimes having too much knowledge is a problem.

As the event wound to a close, the night was rounded off by the very British concern that:

“If all the data on ancestry was available there would be no plots for Midsomer Murders.”

A terrifying future indeed.

There were many questions about data science, not limited to the topic of data processing but also spilling out into the many areas it touches, such as legislation, ethics, ease of access to records, what is possible with internet-enabled technology, and the role of government and corporations. Discussions about such topics are therefore going to be sprawling – it seems there’s plenty of material for follow-up conversations on all these aspects and more.

_________________

Science Cafés are run regularly by the Bristol and Bath branch of the British Science Association, and we are grateful to Alina Udall and Bob Foster for their work organising them. More information on the concept, upcoming Science Cafés and other events run by the organisation can be found here: https://bristolbathsci.org.uk/

The Jean Golding Institute organises public and research events to support research collaborations. Visit our events page for more information and follow us on Twitter @JGIBristol.

Blog post by Kate Oliver, PhD student at the University of Bristol and freelance science writer.