Developing an Integrated and Intelligent Algo-trading System

JGI Seed Corn Funding Project Blog 2022-2023: Jin Zheng

Introduction:

The financial trading landscape is constantly evolving, driven by advancements in technology and the need for faster, more efficient decision-making. Traditional algo-trading strategies have become central features of modern financial markets due to their speed, accuracy, and cost-effectiveness. However, these strategies often rely solely on the analysis of present and past quantitative data, neglecting the importance of incorporating qualitative data in the decision-making process. To address this limitation, we aim to develop an integrated and intelligent algo-trading system that combines advanced technology, data integration, and intelligent decision-making.

Data Integration:

In financial trading, individuals collect diverse information from various sources, including market-based data, past performance, financial reports, public opinion, and news. Our integrated system seeks to leverage the power of data integration by combining market-based quantitative data with qualitative data sources. By integrating these different data types, we can gain a more comprehensive understanding of the financial landscape and make informed trading decisions.
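
To make this concrete, here is a minimal sketch of what one such integration step could look like: daily price bars joined with a daily sentiment score so that a strategy can consume both. The ticker, values, and column names are purely illustrative, not our actual schema.

```python
import pandas as pd

# Hypothetical daily market bars and daily sentiment scores for one ticker
prices = pd.DataFrame({
    "date": pd.date_range("2023-05-01", periods=5, freq="B"),
    "ticker": "ABC",
    "close": [100.0, 101.5, 99.8, 102.2, 103.0],
})
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2023-05-03", "2023-05-04"]),
    "ticker": "ABC",
    "sentiment": [0.6, -0.4, 0.2],  # e.g. aggregated news/social-media tone
})

# Left-join keeps every trading day even when no qualitative signal exists
merged = prices.merge(sentiment, on=["date", "ticker"], how="left")
merged["sentiment"] = merged["sentiment"].fillna(0.0)  # neutral when absent

# One possible combined feature: daily return scaled by sentiment
merged["signal"] = merged["close"].pct_change() * merged["sentiment"]
print(merged)
```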

Advanced Technology:

The integrated system will harness advanced technologies such as artificial intelligence (AI), cloud computing, and machine learning algorithms. These technologies enable the analysis and processing of vast amounts of data in real-time. AI algorithms can identify patterns, trends, and correlations that may not be immediately apparent to human traders. Cloud computing provides the scalability and computing power necessary to handle large volumes of data and perform complex calculations. By leveraging these advanced technologies, our system can enhance decision-making and improve trading performance.

Intelligent Decision-Making:

The core objective of our system is to enable intelligent decision-making in the trading process. While traditional algo-trading strategies focus on quantitative analysis, our integrated approach incorporates qualitative data, allowing traders to better assess potential risks and identify market trends. By factoring in qualitative data, traders can make more informed decisions and adjust their strategies accordingly. Intelligent decision-making is achieved through the application of AI and machine learning algorithms, which can analyze vast amounts of data and provide valuable insights to traders.

With support from the Jean Golding Institute, we successfully ran a hybrid workshop on machine learning and data science in finance. The workshop aimed to bring together experts and enthusiasts in the field to exchange knowledge, share insights, and explore the intersection of machine learning and finance. We were fortunate to have a line-up of esteemed speakers, four external experts and one internal expert, each renowned in their respective areas of expertise. Their diverse backgrounds and experiences enriched the workshop and provided valuable perspectives on the application of machine learning in finance.

We have completed the development of a robust data pipeline and created a unified API that efficiently retrieves data from various sources. We have implemented effective data-cleaning techniques and measures to filter out spam. Additionally, we have used Graph Neural Networks (GNNs) to determine the influence rate of each account and to calculate the daily sentiment rate for several stocks with significant market capitalization. Furthermore, we have incorporated predictive models into our system.
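
As an illustration of the aggregation step, the sketch below combines per-post sentiment with per-account influence scores into a daily sentiment rate per stock. In our system the influence scores come from the GNN; here they are simply hard-coded, and all names and values are illustrative.

```python
import pandas as pd

# Per-post sentiment from the cleaned feed (illustrative values)
posts = pd.DataFrame({
    "date":      ["2023-05-01"] * 3 + ["2023-05-02"] * 2,
    "ticker":    ["AAPL"] * 5,
    "account":   ["a1", "a2", "a3", "a1", "a3"],
    "sentiment": [0.8, -0.2, 0.5, 0.1, -0.6],  # per-post sentiment in [-1, 1]
})
# Per-account influence scores; in the project these come from the GNN
influence = {"a1": 0.7, "a2": 0.1, "a3": 0.2}

posts["w"] = posts["account"].map(influence)
daily = (posts.assign(weighted=posts["sentiment"] * posts["w"])
              .groupby(["date", "ticker"])[["weighted", "w"]].sum())
daily["sentiment_rate"] = daily["weighted"] / daily["w"]
print(daily["sentiment_rate"])  # one influence-weighted value per stock-day
```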

Moving forward, our next objective is to create a cloud-based web service that empowers users to build their own trading robots, develop unique trading strategies, and design customized trading algorithms. To enhance the user experience, we will incorporate advanced data visualization techniques, allowing traders to effortlessly interpret and analyze the vast array of information available. Moreover, we aim to enhance the system’s capabilities by integrating machine learning algorithms for improved decision-making and risk management. Our ultimate goal is to create a user-friendly and versatile platform that caters to the needs of researchers, individual traders, and students alike. Through this platform, users will be able to gain practical experience, enhance their financial knowledge, and utilize cutting-edge technologies in the field of algorithmic trading.
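
To give a flavour of what a user-defined trading robot might look like on such a platform, here is a hypothetical sketch of a strategy plug-in. The interface is an assumption made for illustration, not the platform's actual API.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    close: float
    sentiment: float  # integrated qualitative signal, e.g. daily sentiment

class MeanReversionWithSentiment:
    """Toy strategy: buy dips unless sentiment is strongly negative."""

    def __init__(self, lookback=20):
        self.lookback = lookback
        self.history = []  # recent closing prices

    def on_bar(self, bar):
        self.history.append(bar.close)
        window = self.history[-self.lookback:]
        mean = sum(window) / len(window)
        if bar.close < 0.97 * mean and bar.sentiment > -0.3:
            return "BUY"
        if bar.close > 1.03 * mean:
            return "SELL"
        return "HOLD"

strategy = MeanReversionWithSentiment()
for close, sent in [(100, 0.2), (96, 0.1), (95, -0.5), (104, 0.0)]:
    print(strategy.on_bar(Bar(close, sent)))
```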

Seeking ground truth in data: prevalence of pain and its impacts on well-being and workplace productivity

JGI Seed Corn Funding Project Blog 2022-2023: Neo Poon

Chronic pain is a major health issue across the globe. Researchers estimate that at least 10% of the population in the United Kingdom suffers from pain conditions. Considering the entire world, some estimates suggest that over 20% of the population has chronic pain, resulting in more than 20 million 'pain days' per year. Naturally, it is important to examine how pain conditions affect people's well-being and their productivity in the workplace.

Our research team (the Digital Footprints Lab at Bristol Medical School, led by Dr Anya Skatova) specialises in using big data to investigate human behaviours and social issues. In our previous work, we established a link between the purchase of pain medicines and the proportion of people working part-time across geographical regions of the United Kingdom, which suggests an economic cost of chronic pain and an impact on national productivity.

With the funds provided by the Jean Golding Institute (JGI), we decided to directly investigate the 'ground truth'. That is, instead of examining pain at the geographical level, we designed a survey that asks individuals about their pain conditions, well-being, physical health, and employment status. Importantly, and relevant to JGI's focus on data science, the survey also asks individuals to share their shopping history data with us. Under the General Data Protection Regulation (GDPR), residents of the United Kingdom have the right to data portability, which means people can choose to share data held by companies with external organisations, such as a university or a research team. In our design, participants are asked to donate the loyalty card data associated with their shopping at a major supermarket. This study allows us to ask important questions, such as how the frequency and types of pain relief purchases are related to the different types of pain conditions reported by participants. We further ask how pain conditions affect people's life satisfaction and their ability to work, which might collectively have an impact on their shopping patterns beyond just the purchase of pain relief products.

The JGI funding facilitates the data collection process, which is being finalised at the time of writing. Moving forward, this study will allow us to define chronic pain from shopping patterns alone, which can drive future research: by connecting the frequency and types of pain medicines with the self-reported pain conditions from this study, we can define a metric and more accurately compute the prevalence of chronic pain from transaction data itself. Our research team has ongoing partnerships with other supermarket and pharmacy chains, which provide us access to commercial data for research purposes. When we conduct similar research using these external data and it is not possible to directly involve participants through surveys, we can employ our metric to estimate the proportion of people suffering from chronic pain. Furthermore, our study includes questions about menstrual pain, an important but seldom studied aspect of the pain experience, which opens up further avenues for research: potentially, we can examine how menstrual pain affects quality of life and workplace productivity. Finally, our study controls for Covid-19 history, which might have a long-term effect on pain conditions and subjective well-being, paving the way for research on the longitudinal effects of Covid-19.
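
To illustrate the idea of such a transaction-based metric, the sketch below applies a toy rule ("pain-relief purchases in three or more distinct months") to hypothetical loyalty-card data. The real cut-off and categories would be calibrated against the self-reported pain conditions from our survey; nothing here reflects actual findings.

```python
import pandas as pd

# Hypothetical loyalty-card transactions: shopper, date, product category
tx = pd.DataFrame({
    "shopper_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-03-02",
                            "2023-01-20", "2023-06-15", "2023-04-01"]),
    "category": ["pain_relief", "pain_relief", "pain_relief",
                 "pain_relief", "groceries", "pain_relief"],
})

pain = tx[tx["category"] == "pain_relief"]
# Count the distinct months in which each shopper bought pain relief
active_months = (pain.groupby("shopper_id")["date"]
                     .apply(lambda d: d.dt.to_period("M").nunique()))

# Toy rule: "likely chronic" if purchases appear in 3+ distinct months
likely_chronic = active_months[active_months >= 3].index
prevalence = len(likely_chronic) / tx["shopper_id"].nunique()
print(f"Estimated prevalence: {prevalence:.0%}")
```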

Large-sample evaluation of deep learning and optical flow algorithms in nowcasting radar rainfall-flooding events

JGI Seed Corn Funding Project Blog 2022-2023: Jiao Wang and Ahmed Mohamed

1. Aim

This seed corn project aimed to identify a large sample of rainfall events that result in flooding in Great Britain and to evaluate deep learning and optical flow algorithms for radar rainfall nowcasting at the event and catchment scales.

2. Data collection

During the project, we collected hourly time series of observed rainfall and flow data over ten years across 458 catchments in Great Britain (Figure 1). We classified these catchments based on various criteria, such as area, location, land cover type, and human activity. We also collected the UK Met Office's radar rainfall mosaic data covering Great Britain, generated by the Nimrod system at high resolution in space (1 km) and time (5 minutes).

Figure 1. Location map of 458 catchments in Great Britain.

3. Rainfall-flooding events identification

We applied a recently developed objective methodology, the DMCA-ESR method, to separate rainfall-flow events for each catchment. This process yielded a total of 18,360 events, encompassing a wide range of magnitudes and durations. The threshold of peak flow for each catchment was set based on flooding information derived from local government reports and previous studies. We also removed overlapping events based on predefined criteria for event occurrence and termination. Consequently, 442 rainfall events that contributed to flooding were identified. Radar data were then extracted for each event based on its start and end times.
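
The extraction step is conceptually simple: slice the 5-minute radar archive to the window of each identified event. The sketch below shows the idea with a synthetic stand-in radar array and an illustrative event table.

```python
import numpy as np
import pandas as pd

# Stand-in radar archive: (time, height, width) fields at 5-minute intervals
radar = np.random.gamma(0.5, 2.0, size=(288, 64, 64))  # one synthetic day
times = pd.date_range("2013-05-23 00:00", periods=288, freq="5min")

# Identified events with start and end times (illustrative)
events = pd.DataFrame({
    "start": pd.to_datetime(["2013-05-23 06:00"]),
    "end":   pd.to_datetime(["2013-05-23 14:00"]),
})

event_frames = {}
for ev in events.itertuples():
    window = (times >= ev.start) & (times <= ev.end)
    event_frames[ev.Index] = radar[window]  # radar frames for this event
print({k: v.shape for k, v in event_frames.items()})
```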

4. Deep learning and optical flow algorithms

We employed the UNet, a convolutional neural network (CNN), for rainfall nowcasting. The model was trained, evaluated, and validated using the radar data. The two-year radar data from 2018 to 2019 were split into 80% for training and 20% for evaluation. The UNet model takes the previous 12 radar rainfall frames as input to forecast the subsequent 12 frames.
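
For readers unfamiliar with the architecture, the sketch below is a toy two-level UNet with the same input/output convention, where each of the 12 past and 12 future frames is treated as a channel. It is a minimal stand-in under that assumption, not the exact network or training configuration used in the project.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Two 3x3 convolutions with ReLU, the basic UNet building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_frames=12, out_frames=12):
        super().__init__()
        self.enc1 = block(in_frames, 32)
        self.enc2 = block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up   = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)             # 32 upsampled + 32 skip channels
        self.head = nn.Conv2d(32, out_frames, 1)

    def forward(self, x):                     # x: (B, 12, H, W) past frames
        s1 = self.enc1(x)
        s2 = self.enc2(self.pool(s1))
        d1 = self.dec1(torch.cat([self.up(s2), s1], dim=1))
        return self.head(d1)                  # (B, 12, H, W) future frames

model = TinyUNet()
y = model(torch.randn(1, 12, 64, 64))         # e.g. a 64x64 radar crop
print(y.shape)                                # torch.Size([1, 12, 64, 64])
```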

Additionally, we used three optical flow methods in rainfall nowcasting, namely, SparseSD, Dense, and DenseRotation. These methods utilize the concept of motion estimation to predict the movement of rain patterns in radar images. The Eulerian persistence, which assumes that the current rainfall distribution will remain unchanged in the near future, was used as a standard baseline.
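
The sketch below illustrates the underlying idea using OpenCV's Farneback dense optical flow: estimate a motion field from the two most recent frames, then repeatedly advect the latest frame forward under a frozen-motion assumption. The three methods named above are more elaborate variants of this scheme; this code is a conceptual illustration only, not their implementation.

```python
import cv2
import numpy as np

def to_uint8(field, vmax=None):
    """Scale a rain field to 0-255 for the OpenCV flow estimator."""
    vmax = vmax if vmax is not None else max(float(field.max()), 1e-6)
    return np.clip(field / vmax * 255.0, 0, 255).astype(np.uint8)

def nowcast(prev_frame, curr_frame, lead_steps=12):
    """Advect the latest frame forward under a frozen motion field."""
    flow = cv2.calcOpticalFlowFarneback(
        to_uint8(prev_frame), to_uint8(curr_frame), None,
        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_frame.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward mapping: each pixel samples from where the flow says it came
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    frames, frame = [], curr_frame.astype(np.float32)
    for _ in range(lead_steps):
        frame = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
        frames.append(frame)
    return np.stack(frames)   # (lead_steps, H, W), i.e. 12 five-minute steps

prev_f, curr_f = np.random.rand(2, 128, 128)  # stand-in radar frames
print(nowcast(prev_f, curr_f).shape)          # (12, 128, 128)
```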

5. Evaluating rainfall nowcasting performance

We evaluated the performance of the deep learning model and the optical flow algorithms for nowcasting all 442 events in Great Britain. The accuracy of the nowcasts was assessed using two metrics: the Mean Absolute Error (MAE) and the Critical Success Index (CSI). Figure 2 illustrates the average metric results for a specific rainfall event.
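
Both scores are standard in nowcast verification. For reference, here is a sketch of how they are conventionally computed on a pair of observed and forecast rain fields, using stand-in data:

```python
import numpy as np

def mae(obs, fct):
    """Mean absolute error between observed and forecast rain fields."""
    return np.mean(np.abs(obs - fct))

def csi(obs, fct, threshold=10.0):
    """Critical success index: hits / (hits + misses + false alarms)."""
    o, f = obs >= threshold, fct >= threshold
    hits         = np.sum(o & f)
    misses       = np.sum(o & ~f)
    false_alarms = np.sum(~o & f)
    denom = hits + misses + false_alarms
    return hits / denom if denom else np.nan

obs = np.random.gamma(2.0, 3.0, size=(128, 128))  # stand-in field (mm/h)
fct = obs + np.random.normal(0, 1, size=obs.shape)
print(mae(obs, fct), csi(obs, fct, threshold=10.0))
```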

Figure 2. Verification results of five rainfall nowcasting models in terms of two indicators, MAE and CSI (at a rain intensity threshold of 10 mm/h) for a rainfall event in Great Britain, which occurred from 23/05/2013 8:00 PM to 30/05/2013 5:00 AM.

Based on Figure 2, we observe a general decline in the performance of all models as the lead time increases. The Eulerian persistence baseline exhibits the lowest performance overall. In terms of MAE, the UNet underperforms the optical flow-based algorithms at early lead times (t+5, t+10, and t+15), but overtakes them at longer lead times (after t+25); SparseSD, Dense, and DenseRotation perform similarly to one another. In terms of CSI at a rainfall intensity threshold of 10 mm/h, the UNet outperforms the other models, except at lead times of t+20 and t+25, where DenseRotation slightly outperforms it. Among the optical flow-based models, DenseRotation demonstrates the best overall performance.

6. Next steps

For our upcoming steps, we have outlined the following objectives:

  1. Evaluate the five rainfall nowcasting models at the catchment scale.
  2. Compare the performance of the algorithms using information theory criteria.
  3. Provide a comprehensive summary that highlights any patterns or relationships between the catchment characteristics and the nowcasting model performance.
  4. Utilize the rainfall nowcasts for hydrological modelling and evaluation.