Building capacity for big data management for Ghana’s developing economy

A Science and Technology Facilities Council Impact Acceleration Account (STFC IAA) award, won by a team from the Physics department in collaboration with the start-up iDAM and facilitated by the JGI, will provide hardware and software facilities for high-volume data storage and archiving, processing, visualisation, and algorithm development and testing for research in academia and industry in Ghana. This will contribute to the development of data science and digital innovation capability in the country.

Emmanuel Bempong-Manful, Henning Flaecher, Johannes Allotey, Kate Robson Brown

The team (Dr Henning Flaecher, Prof Kate Robson Brown and Prof Mark Birkinshaw) will develop a collaboration with local partners, the Ghana Radio Astronomy Observatory (GRAO), the Development in Africa with Radio Astronomy (DARA) project, and iDAM, a local start-up founded by two Bristol PhD students, Johannes Allotey and Emmanuel Bempong-Manful.

The Government of Ghana is embarking on the digitisation of several areas of the economy, including the passport office, ports and harbours and the energy sector, with the aim of improving services and revenue collection. These developing digital services, together with those still to be implemented (e.g., the digitisation of national IDs, health records, and the birth and death registry), will produce an enormous volume of sensitive data that requires efficient storage and management. However, despite the looming data volumes and recent advances in statistical and machine learning techniques for inference and predictive analysis, these techniques are still under-utilised in Ghana.

As the economy grows and evolves through digitisation, and as data volumes increase, these data science solutions will become increasingly useful for quick, efficient and reliable extraction and evaluation of information from the datasets and to support evidence-based predictions. As a result, there is an urgent need to develop facilities and a skilled workforce in data analytics that will be capable of manipulating big datasets to make meaningful contributions to the Ghanaian economy. However, these goals can only be achieved if modern computing infrastructure, hardware and software solutions are available.

This STFC-funded project will lay the foundation for a modern computing facility, which will be hosted by GRAO, with iDAM providing technical support and capacity-building activities. iDAM has long-term plans to establish a one-stop data management hub to tackle data challenges in Ghana and is currently working with the Ghana Space Science and Technology Institute (GSSTI) and the DARA project to deliver data curation services.

Kakum Park, Nkrumah Museum, GRAO Observatory

This project will address a major societal challenge in the area of big data management in Ghana. It aims to contribute to the Sustainable Development Goals (SDGs) through skills development programmes in data management and data science, boosting new careers and economic growth and delivering quality data management services to the people of Ghana. The project will share regular updates via the JGI blog. If you would like to know more about this project, and would like to collaborate, please contact us via

Can machines understand emotion? Curiosity Challenge winners announced

Photo courtesy of Alex Smye-Rumsby

We are pleased to announce the winners of the Curiosity Challenge are Oliver Davis and his team here at the University of Bristol: Zoe Reed, Nina Di Cara, Chris Moreno-Stokoe, Helena Davies, Valerio Maggio, Alastair Tanner and Benjamin Woolf.

The team will be collaborating with We The Curious on a prototype, which is due to be ready in October and will be going live to audiences when the new exhibition at We The Curious opens next year. Oliver Davis is an Associate Professor and Turing Fellow at Bristol Medical School and the Medical Research Council Integrative Epidemiology Unit (MRC IEU), where he leads a research team in social and genetic data science. Together his team and We The Curious will develop a ‘Curiosity Toolkit’ for a public audience called ‘Can machines understand emotion?’ The team’s toolkit will invite audiences to:

  • Share the idea that humans can produce data that help to classify emotions
  • Recognise that humans produce lots of data every day that expresses how they feel, and that researchers can use these data to teach machines to interpret those feelings
  • Experience a live example of how the type of data they produce can contribute to an Artificial Intelligence (AI) solution to a problem
  • Understand why researchers need computers to help them to analyse huge volumes of data
  • Contribute to and influence current research being undertaken by the team
  • Appreciate how these data can be used on a large scale to understand population health.

Oliver Davis says “Our toolkit will guide participants through the process of teaching machines how to recognise human emotion, using a series of five activities. This is directly relevant to our current research using social media data in population cohorts to better understand both mental health difficulties and positive emotions such as happiness and gratitude.”

Helen Della Nave at We The Curious said “The enthusiastic response from researchers to work with our public audiences was fantastic. Working with Oliver’s team will give audiences the opportunity to influence development of a new database of emotions and support future research. We are very excited about our audiences having the opportunity to get actively involved in this project.”

Through this competition, We The Curious have also offered Rox Middleton, a Post-doctoral Research Fellow at the University of Bristol, a research residency for her project ‘How to use light’.

For further details of the competition requirements and background, see Curiosity Challenge.

The Jean Golding Institute data competitions

We run a number of competitions throughout the year – to find out more take a look at Data competitions.

Cog X 2019: The Festival of AI and Emerging Technology – 10-12 June 2019

Blog written by Patty Holley, Jean Golding Institute Manager.

Mayor of London Sadiq Khan

CogX 2019 took place during the first half of London Tech Week in the vibrant Knowledge Quarter of King’s Cross in London. The conference started only three years ago, yet this year it hosted 15,000 attendees and 500 speakers, making it the largest in Europe. CogX 2019 was also supporting 2030 Vision in their ambitions to deliver the Sustainable Global Goals. Mayor of London Sadiq Khan opened the conference with a call for companies to be more inclusive by opening up opportunities for women and the BAME communities, helping London and other cities to find solutions for societal problems.

Here are some highlights:

The State of Today – Professor Stuart Russell

Professor Stuart Russell, University of California, Berkeley

The first keynote was delivered by Professor Stuart Russell from the University of California, Berkeley, who described the global status of data science and AI. There has been major investment across the world in the development of these technologies, and academic interest has also increased over time. For example, there was a significant increase from 2010 to 2015 in the recognition rate on ImageNet, a dataset of labelled images taken from the web. Learning algorithms are improving constantly but there is a long way to go to reach human cognition. Professor Russell had a cautionary message, particularly about autonomous technology, as the predicted progress may not be achieved as expected.

Professor Russell also suggested that probabilistic programming and the mathematical theory of uncertainty can really make an impact. As an example, he talked about global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty. Evidence data is compared with the model daily, and the algorithm detected the North Korean test in 2013.

What is coming? Robots, personal assistants, web-scale information extraction and question answering, and a global vision system via satellite imagery. However, Professor Russell believes that human-level AI has a long way to go. Major problems, such as real understanding of language, integration of learning with knowledge, long-range thinking at multiple levels of abstraction, and cumulative discovery of concepts and theories, have not yet been resolved.

Finally, Professor Russell added that data science and AI will drive an enormous increase in the capabilities of civilization. However, there are a number of risks, including democracy failures, war and attacks on humanity, so regulation and governance are key.

Gender and AI

Clemi Collett and Sarah Dillon, University of Cambridge

The talks took place in several venues across King’s Cross, and on the ‘Ethics’ stage, Sarah Dillon and Clemi Collett from the University of Cambridge highlighted the problems with dataset bias. The issue of algorithm bias has been highlighted previously, but not the bias that may come from the actual data. Current guidelines are not content or context specific. The speakers suggested that several things are needed: gender-specific guidance, guidance on data collection and data handling, and a theoretical definition of fairness, based on current and historic research, that takes societal drivers into account, for example by investigating why some parts of society don’t contribute to data collection.

Importantly, the speakers also talked about the diversity of the workforce working in these technologies. Currently, only 17% are female, which has a real impact on technology design and development. Diversification of the workforce is vital as it brings discussion within teams and companies. If this issue is not challenged, then existing inequalities will be aggravated. The speakers reiterated the need to investigate the psychological factors that affect diversity in this labour market through qualitative and quantitative research.

A panel followed the talk, which included Carly Kind, Director of the Ada Lovelace Institute; Gina Neff, University of Oxford; and Kanta Dihal, Centre for the Future of Intelligence, University of Cambridge. Carly Kind pointed out that diversity (or the lack of it) will shape which technologies are developed and used. Gina Neff highlighted that most jobs at risk of automation are those associated with women, and therefore gender equality in the workforce generating new tech is a necessity. Interdisciplinary exchanges between gender theorists and AI practitioners should be encouraged, as should novel incentives for women to get involved in tech. Women need to be part of the decision-making process, and those who can become role models need support to build profiles that will inspire other women.

The Future of Human Machine Interaction

Mark Sagar, Soul Machines

The ‘Cutting Edge’ stage hosted those working on the future of some of the cutting-edge technologies. On human-machine interaction, the conference invited three companies to talk about their current work and future ideas. Mark Sagar from Soul Machines previously worked on technology to bring digital characters to life in movies like Avatar. Mark suggested that the mind needs an entire body to learn and interact. To cooperate better with new technologies, humans will need better face-to-face interaction, because human reactions are created by a feedback loop of verbal and non-verbal communication. Soul Machines therefore aims to build digital brains in digital bodies. The model learns through lived experiences, in real time. Mark demonstrated a new type of avatar, a toddler, to show how digital humans are able to learn new skills. This technology aims to create digital systems and personal assistants that will interact with humans and learn from those interactions.

Sarah Jarvis, an engineer, explained how her company’s platform uses AI to enable complex decision making using probabilistic models. Probability theory is at the core of the technology, which is currently being used in finance, logistics and transport, smart cities, energy markets and autonomous systems.

Eric Daimler, CEO of Spinglass Ventures, observed that the constraint is on data availability and quality rather than data science technology. A major problem is the lack of large verifiable datasets. This challenge will increase due to concerns about privacy and security; for example, social media has moved to request more regulation. There are also limitations on data integration, and a gap between theory and practice. Finally, Eric suggested that the future brings a new era in which category theory could replace calculus.

Edge Computing

Next on the ‘Cutting Edge’ stage we had speakers providing views on edge computing. Ingmar Posner, Professor of Applied AI at the Oxford Robotics Institute, talked about combining edge computing (moving computation closer to where it is being used) and deep learning. Ingmar is interested in challenges such as machine introspection in perception and decision making, data-efficient learning from demonstration, transfer learning and the learning of complex tasks via a set of less complex ones. Ingmar explained how these technologies may be effectively combined in driverless technologies. Using a very simple method, a sat nav app could be integrated with the training data to control driverless cars. In addition, the system uses simulated data to train the models, which can translate into better responses in real-world scenarios.

Joe Baguley from VMware, providers of networking solutions, described the current idea of taking existing technologies and putting them together to solve novel challenges, e.g. driverless cars. Automation is no longer an optimization but a design requirement, and new developments in technologies mean that AI and ML can be used to manage the use of applications across platforms and networks. AI can also be used to optimise those models, making them more energy efficient, for example by making sure that only the necessary data is kept and not data that can be considered wasteful.

How Technology is Changing our Healthcare

Mary Lou Jepsen from Openwater, a startup working on fMRI-type imaging of the body using holographic, infrared techniques, described how her discovery will offer affordable imaging technology. Samsung’s Cuong Do, who directs the Global Strategy Group, described their work developing a 24/7 AI care advisor. The aim of the technology is to support medical efforts and provide an efficient alternative that can relieve pressure on the healthcare system. A game-changing use of AI will open the possibility of using biomedical data to personalize and improve the efficacy of medicine. Joanna Shields from BenevolentAI is applying technologies to transform the way medicines are discovered, developed, tested and brought to market. Meanwhile, Sunil Wadhwani, the CEO of the multimillion dollar company IGATE Corp, is helping not-for-profit organizations to scale technologies in healthcare, leading innovation in primary health service providers in India and applying AI to benefit those that need it the most.

The panel discussed how there is an increasing gap between life span and health span, with financial position being the main driver for this gap. Technology may be able to help close the gap by helping to train the next generation of health practitioners, optimizing drug creation and delivery, and developing cost-effective healthcare for the poorest in society. In the era of data, this can provide an advantage, as personalised data includes not only DNA but also where people live, dietary information and environmental data, which brings new opportunities to develop solutions for chronic conditions. Joanna added that “the healthcare of humans is the best and most complicated machine learning and AI challenge”.

Research Frontiers

The Alan Turing Institute hosted a stage at CogX this year. More information about speakers and content is available on the Turing website.

Back again at the ‘Cutting Edge’ stage, Robert Stojnic recommended Papers with Code, a curated site for checking state-of-the-art developments in ML.

Jane Wang’s presentation

Jane Wang, from DeepMind, explained why causality is important for real-world scenarios. Jane talked about how causal reasoning develops in humans and when it shows up. Four- to five-year-olds can make informative interventions based on causal knowledge, sometimes better than adults, as adults have prior knowledge (bias). Jane discussed the possibility of meta-learning (“learning to learn”) by learning these priors as well as task specifics. This approach may enable AI to learn causality.

The next speaker was Peadar Coyle from Aflorithmic Labs, who is contributing to the online course on probabilistic programming. He talked about the modern Bayesian workflow, and suggested that lots of problems are ‘small’ or ‘heterogeneous’ data problems where traditional ML methods may not work. He is part of the community supporting the development of probabilistic programming in Python.

Ethics of AI

Joanna Bryson, University of Bath

Increasingly there has been a worrying trend to use data science technologies to perpetuate discrimination, increase power imbalance, and support cyber corruption, and a key aspect of the conference was the commitment to incorporate ethical considerations into technology development. On the ‘Ethics’ stage, Professor Joanna Bryson from the University of Bath, one of the leading figures in the ethics of AI, talked about the advances in the field. Recently, the OECD published its principles on AI, to promote artificial intelligence that is innovative and trustworthy and that respects human rights and democratic values. There is a pressing need for ethics in sustainable AI, for example by looking at bias in the data collection process, not just the algorithms. One way to achieve this is by changing incentives; for example, GitHub could grant stars to those projects that integrate ethics very clearly in their pipeline. Most of the research in the field has been done in silos, sometimes without addressing the impact, whereas ethical guidelines recommend closely linking research and impact. One very important aspect of this topic is the issue of diversity, as people’s backgrounds will affect the outputs in the field. Another important aspect of fairness in this area has been the drive to support open source software. However, the community now faces the challenge of developing strategies for sustainability.

Data Trusts

Exploring Data Trust

A significantly different approach to data rights was addressed in the discussion ‘Data Trusts’ by Neil Lawrence, Chair in Neuro and Computer Science, University of Sheffield, and Sylvie Delacroix, Professor of Law and Ethics, University of Birmingham and Turing Fellow. With GDPR, we as data providers have rights, but it’s not easy to keep track of who has our data and what they use it for – we often click “yes” just to access a website. The speakers suggested the need for a new type of entity that operates as a Trust. With this mechanism, data owners choose to entrust their data to data trustees, who are compelled to manage the data according to the aspirations of the Trust’s members. As every individual is different, society needs an ecosystem of trusts, which people could choose and move between. This could provide meaningful choices to data providers, ensuring that everybody has to make a choice regarding how to use their data (e.g. economic value, societal value), and contributing to the growing debate around data ownership.

It was a fascinating couple of days at CogX hearing about the great advances in technology. A key message was that these developments need to be guided by the critical need for equality and the environmental challenges we face. Listening to Gail Bradbrook, co-founder of Extinction Rebellion, was a real inspiration to continuously strive to use data science and AI for social good.

More information is available on their video channel.

Attendance at Cog X was funded by the Alan Turing Institute.


A night at the data science café

It’s a Thursday in the Greenbank, a bohemian pub in Easton, Bristol. Upstairs, people have gathered to discuss ‘the rise of data science’ at one of the monthly Science Cafés run by the local branch of the British Science Association – places where, after a short talk from a scientific expert, the floor is given to attendees to discuss the issues arising when science meets society. Today, the guest is Dr Bobby Stuijfzand, data scientist at the Jean Golding Institute, and issues arising range from the ownership of health records to the existence of free will.

How data changed the last decade

Bobby begins by explaining his role: working with people who need things done with their data, and handling other people’s problems. This broadens out neatly into a working definition of data science – ‘methods of learning from data and finding patterns’. But why is it relevant now?

He shows a photo of a 2017 concert and asks us to reflect on the changes since 2007. Diverse answers reveal some of the different motivations of audience members for coming: some comment that you have to sign in to everything these days, others that it would have been unthinkable for a president to be constantly posting on social media. I add that in 2007, “I didn’t need to be on Facebook to get invited to parties”.

The photo prompts Bobby to observe that we previously didn’t have such a sea of screens at events; nor was it weird to leave the house without a smartphone. Fitbits were yet to come. Streaming services such as Netflix have shifted the habits of viewers. In 2007 BitTorrent, the peer-to-peer sharing network used for transferring large files, many of which are pirated media, took up 25% of US internet traffic. Now that legal, easy alternatives are available on demand, that’s down to 4%. While the audience responses were obviously shaped in this direction by the topic of the event, it can certainly be argued that many of the changes in the last decade come from data – they are either enabled by it, like the Fitbit, or generate it, like the indexing and recommendations provided by Netflix.

Seeing patterns in data

Bobby used his own research on eye tracking as an illustration of how data science proceeds. To begin with, you gather raw observations, such as where people’s eyes focus and for how long while completing different reading tasks. Then you come up with some ways to describe overall characteristics of this data, such as the average length of time fixating on a single point or the distance that the gaze travels before fixating again. Computer scientists might call these features, while statisticians and psychologists think of these as variables. By comparing these variables for different large groups of raw observations (data sets), you can work out which characteristics are different between groups – perhaps what people are reading or what task they are doing. Fitting mathematical descriptions to these characteristics lets you predict how they will vary if you were to collect more and more data. Finally, with these models, computers are trained to make predictions about which group new data is likely to be in based on these findings.
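The steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not Bobby's actual analysis: the eye-tracking numbers, the group names `skimming` and `careful_reading`, and the helper functions are all invented for the example, and a simple nearest-centroid rule stands in for whatever model a real study would fit.

```python
from statistics import mean

# Hypothetical raw observations: each recording is a list of
# (fixation_duration_ms, saccade_distance_px) pairs. All numbers are invented.
recordings = {
    "skimming": [
        [(180, 220), (160, 260), (170, 240)],
        [(150, 250), (175, 230), (165, 270)],
    ],
    "careful_reading": [
        [(260, 110), (280, 90), (250, 120)],
        [(270, 100), (240, 130), (290, 95)],
    ],
}


def extract_features(recording):
    """Summarise one recording as (mean fixation duration, mean saccade distance)."""
    durations = [duration for duration, _ in recording]
    distances = [distance for _, distance in recording]
    return (mean(durations), mean(distances))


def fit_centroids(recordings_by_group):
    """'Train' a model by averaging each feature over all recordings in a group."""
    centroids = {}
    for group, group_recordings in recordings_by_group.items():
        features = [extract_features(r) for r in group_recordings]
        centroids[group] = tuple(mean(column) for column in zip(*features))
    return centroids


def predict(centroids, recording):
    """Assign a new recording to the group whose centroid is closest."""
    features = extract_features(recording)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda group: squared_distance(centroids[group]))


centroids = fit_centroids(recordings)
new_recording = [(255, 105), (265, 115), (275, 100)]
predicted_group = predict(centroids, new_recording)
print(predicted_group)
```

Here the long fixations and short gaze jumps of the new recording place it nearer the `careful_reading` group. A real analysis would use far more data, put the features on comparable scales, and check the predictions against recordings the model has not seen.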

By taking specific observations, measuring features, looking for group differences, polishing those into mathematical predictions, and applying them with computers, a lot of complicated systems can be built. This was what Bobby handed over to the audience, with a challenge: knowing the basic ideas of data science, and what has changed since 2007, what do you think 2027 will be like? What will the future look like, and how will data science have changed it?

From annoyances to the singularity: the next decade in data

Much discussion followed, ably assisted by the selection of drinks available. Some people foresaw widespread drone surveillance and blackmail, particularly until the law caught up; some thought that supply and demand for shops and retailers would be synced up much more closely, so there would hardly ever be a run on products. Others foresaw that Artificial Intelligence (AI) would take over a lot of menial tasks.

In more extreme cases, one participant was anticipating the Singularity (an event where an AI attains sufficient intelligence that it can make itself better and better, eventually gaining the ability to do almost anything and bend the world to its will – jokingly referred to as ‘the nerd rapture’) with ‘terror and excitement’, depending on which corporation makes it happen. With the ability to model everything, someone else asked, was there any free will left?

After a brief segue into the non-deterministic nature of the quantum world and the limitations of theoretical sets of mathematical axioms, things got a little more macroscopic and tangible.

“I hope that things will become less annoying”, said a man in the middle of the room. He went on: “Adverts track what you’re looking at online – but if you’ve just bought a sofa, why would they show you an advert for a sofa? It’s in its infancy. As we learn to better use it, it’s going to become more targeted. Things are still being learned and they will become more sensible and more focused and more elegant.”

There were concerns over the resources needed to generate exponentially increasing processing power, including raw materials and rare metals. But again, there was some hope. Another participant anticipated fewer social problems: for example, less famine as large systems figure out why crops fail. In the case of psychological issues, he cited work looking at the emotions of people on social networks, and scanning for potential terrorists, as reasons to be hopeful.

Not everyone shared this optimism though, as a woman at the front opined:

“They know so much about you. If you do something different to what whoever’s controlling society wants you to do, they can shut you down at source.” Clearly there are some concerns about the power of data and governments.

What do we want our data to do for us?

Bobby noted that with many of these observations, the outcome depends on who is controlling the data. “It’s a technology that can be used for good or ill. We give our data out – what do we get from it?”

The same technology that enables Gmail to show you targeted ads based on the content of your email is what allows them to filter out spam, a direct benefit for users. Recently, Google DeepMind was given access to 1.6 million NHS patient records, and five years of historical data, with the aim of predicting acute kidney failure – is this enough of a benefit to justify handing over this data? In fact, the UK Information Commissioner’s Office (ICO) ruled that the hospital did not do enough to protect the privacy of patients.

Google Maps, Spotify and Open Data Bristol are all examples of initiatives that offer access to data, but they need to be accessed using an API (Application Programming Interface). So there is a way to get things back from your data, but only if you can program. So, Bobby asked the room, if we could make a platform to use our own data, what would you want?

A key theme was access to patient records: specifically the ability to access your own information, be informed about what is going on and join it up across services. But while people wanted their information shared to give the best care available, they still wanted their privacy protected. There were also concerns about who would and wouldn’t have the ability to access their own records, the bias created by people opting out of sharing, and the political use that could be made of this.

One contributor raised the issue of legislation and that this would provide some guidance. “The legal frameworks that sit around this don’t really exist and that’s a problem. It doesn’t exist because it happened comparatively quickly, so there aren’t laws that are up to date with current collection and distribution methods. That will address a lot of concerns that people have about misuse of their data and it falling into the wrong hands.”

Bobby spoke from his own experience: “We used to have statisticians at the ONS [Office for National Statistics] who had a background in social sciences. Now our data scientists [the favoured term at the moment] tend to have more of a computer scientist/engineering background and have less of a basis in ethical and legal considerations…”

“Speaking as an engineer, I agree we often lack an ethical framework”, I interrupted.

And the concern about the potential downsides of data didn’t end at other people having information about you, as the idea of finding out something you might not want to know was raised. Sometimes having too much knowledge is a problem.

As the event wound to a close, the night was rounded off by the very British concern that:

“If all the data on ancestry was available there would be no plots for Midsomer Murders.”

A terrifying future indeed.

There were many questions about data science, not limited to the topic of data processing but also spilling out into the many areas it touches, such as legislation, ethics, ease of access of records, what is possible with internet-enabled technology, and the role of government and corporations. Discussions about such topics are therefore going to be sprawling – it seems there’s plenty of material for follow up conversations on all these aspects and more.


Science Cafés are run regularly by the Bristol and Bath branch of the British Science Association, and we are grateful to Alina Udall and Bob Foster for their work organising. More information on the concept, upcoming science cafes and other events run by the organisation can be found here:

The Jean Golding Institute organises public and research events to support research collaborations. Visit our events page for more information and follow us on Twitter @JGIBristol

Blog post by Kate Oliver, PhD student at the University of Bristol and freelance science writer.