MagMap – Accurate Magnetic Characteristic Mapping Using Machine Learning

PGR JGI Seed Corn Funding Project Blog 2023/24: Binyu Cui

Introduction:

Magnetic components, such as inductors, play a crucial role in nearly all power electronics applications and are typically known to be the least efficient components, significantly affecting overall system performance and efficiency. Despite extensive research and analysis on the characteristics of magnetic components, a satisfactory first-principle model for their characterization remains elusive due to the nonlinear mechanisms and complex factors such as geometries and fabrication methods. My current research focuses on the characterization and modelling of magnetic core loss, which is essential for power electronics design. This research has practical applications in areas such as the fast charging of electric vehicles and the design of electric motors.

Traditional modelling methods have relied on empirical equations, such as the Steinmetz equation and the Jiles-Atherton hysteresis model, which require parameters to be curve-fitted in advance. Although these methods have been refined over successive generations (e.g., the modified Steinmetz equation (MSE) and the improved generalized Steinmetz equation (iGSE)), they still face practical limitations. In contrast, data-driven techniques, such as machine learning with neural networks, have demonstrated advantages in addressing multivariable nonlinear regression problems.
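To make the contrast concrete, here is a minimal Python sketch (illustrative only, not the project's code) of the original Steinmetz equation and the iGSE. The parameters k, alpha and beta are placeholders that would normally be curve-fitted to measured loss data for a specific material and frequency range.

import numpy as np

# Hypothetical curve-fitted Steinmetz parameters for an example ferrite material
k, alpha, beta = 0.05, 1.6, 2.5

def steinmetz(f, B_pk):
    """Classic Steinmetz equation Pv = k * f**alpha * B_pk**beta (sinusoidal flux only)."""
    return k * f**alpha * B_pk**beta

def igse(t, B):
    """Improved generalized Steinmetz equation (iGSE) for one period of an
    arbitrary flux-density waveform B(t), sampled uniformly at times t."""
    dB_dt = np.gradient(B, t)
    delta_B = B.max() - B.min()                        # peak-to-peak flux swing
    theta = np.linspace(0.0, 2.0 * np.pi, 10001)
    # k_i = k / ((2*pi)**(alpha-1) * integral over one cycle of |cos(theta)|**alpha * 2**(beta-alpha))
    k_i = k / ((2 * np.pi)**(alpha - 1) *
               np.mean(np.abs(np.cos(theta))**alpha * 2**(beta - alpha)) * 2 * np.pi)
    integrand = k_i * np.abs(dB_dt)**alpha * delta_B**(beta - alpha)
    return np.mean(integrand)                          # time-averaged loss density over the period

# Sanity check: for sinusoidal flux the two models should agree closely.
f, B_pk = 100e3, 0.1
t = np.linspace(0.0, 1.0 / f, 4001)
print(steinmetz(f, B_pk), igse(t, B_pk * np.sin(2 * np.pi * f * t)))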

Thanks to the funding and support from the Jean Golding Institute (JGI), the interdisciplinary project “MagMap” has been initiated. This project encompasses testing platform modifications, database setup, and neural network development, advancing the characterization and modelling of magnetic core loss.

Outcome

Previously, a large-signal automated testing platform was developed to evaluate magnetic characteristics under various operating conditions. Fig. 1 shows the layout of the hardware section of the testing platform and Fig. 2 shows the user interface of the software currently used for testing. With the help of the JGI, I have updated the automated procedures of the platform, including the point-to-point testing workflow and large-signal inductance characterization. This testing platform is crucial for generating the practical database for the subsequent machine learning work, as its automation has greatly increased testing efficiency (approximately 6-8 s per data point).

Labelled electrical components in an automated testing platform
Fig. 1. Layout of the automated testing platform.
Code instructions for the interface of the automated testing platform
Fig. 2. User interface of the automated testing platform.

Utilizing the current database, a Long Short-Term Memory (LSTM) model has been developed to predict core loss directly from the input voltage waveform. The model deduces the core loss more accurately than traditional empirical models such as the improved generalized Steinmetz equation. A screenshot of the code outcome is shown in Fig. 3 and an example result of the model for one material is shown in Fig. 4. A feedforward neural network was also evaluated as a scalar-to-scalar model that deduces the core loss directly from a set of input scalars, including the magnetic flux density amplitude, frequency and duty cycle. Despite its accuracy during training, it is limited in the input waveform types it can handle. Convolutional neural networks were also tested as a sequence-to-scalar model before settling on the LSTM; however, their model size is significantly larger than the LSTM with hardly any improvement in accuracy.
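For readers curious what a sequence-to-scalar LSTM of this kind looks like, a minimal PyTorch sketch is shown below. It is illustrative only and not the trained MagMap model: the layer sizes, sequence length and the final linear head are placeholder assumptions.

import torch
import torch.nn as nn

class CoreLossLSTM(nn.Module):
    """Sequence-to-scalar model: one sampled excitation waveform in, one core-loss value out."""
    def __init__(self, hidden_size=32, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):              # x: (batch, seq_len, 1) voltage or flux samples
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])      # (batch, 1) predicted core loss

model = CoreLossLSTM()
waveforms = torch.randn(8, 1024, 1)    # eight dummy single-period waveforms
print(model(waveforms).shape)          # torch.Size([8, 1])

In practice such a network would be trained with a regression loss (e.g. mean squared error on the logarithm of the measured core loss) over the database generated by the testing platform.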

Code for the demo outcome of the LSTM
Fig. 3. Demo outcome of the LSTM.
Bar chart showing ratio of data points against relative error in core loss (%)
Fig. 4. Model performance against the ratio of validation sets used in the training.

Future Plan:

Although core loss measurement and modelling is a key issue in industrial applications, the underlying difficulty is the non-linear relationship between the magnetic flux density and the magnetic field strength, which defines the permeability of the magnetic material. The permeability of ferromagnetic materials is very sensitive to a range of external parameters, including temperature, induced current, frequency and input waveform type. With an accurate fit of the relationship between magnetic flux density and field strength, not only can the core loss be precisely calculated, but the modelling methods currently used in tools such as Ansys and COMSOL can also be improved.

Acknowledgement:

I would like to extend my gratitude to JGI for funding this research and for their unwavering support throughout the project. I am also deeply thankful to Dr. Jun Wang for his continuous support. Additionally, I would also like to express my appreciation to Mr. Yuming Huo for his invaluable advice and assistance with the neural network coding process.

Unveiling Hidden Musical Semantics: Compositionality in Music Ngram Embeddings 

PGR JGI Seed Corn Funding Project Blog 2023/24: Zhijin Guo 

Introduction

The overall aim of this project is to analyse music scores using machine learning.  These are of course different from sound recordings of music, since they are symbolic representations of what musicians play.  But with encoded versions of these scores (in which the graphical symbols used by musicians are rendered as categorical data) we have the chance to turn these instructions into various sequences of pitches, harmonies, rhythms, and so on. 

What were the aims of the seed corn project? 

CRIM (Citations: The Renaissance Imitation Mass) concerns a special genre of works from sixteenth-century Europe in which a composer took some pre-existing piece and adapted the various melodies and harmonies in it to create a new but related composition. More specifically, the CRIM Project is concerned with polyphonic music, in which several independent lines are combined in contrapuntal combinations. As in the case of any given style of music, the patterns that composers create follow certain rules: they write using stereotypical melodic and rhythmic patterns. And they combine these tunes (‘soggetti’, from the Italian word for ‘subject’ or ‘theme’) in stereotypical ways. So we have the dimensions of melody (line), rhythm (time), and harmony (what we’d get if we sliced through the music at each instant). 

A network of musical notations
Figure 1. An illustration of a music graph: nodes are music ngrams and edges are different relations between them. Image generated by DALL·E.

We might thus ask the following kinds of questions about music: 

  • Starting from a given composition, what would be its nearest neighbour, based on any given set of patterns we might choose to represent?  A machine would of course not know anything about the composer, genre, or borrowing involved in those pieces, but it would be revealing to compare what a machine might tell us about such ‘neighbours’ in light of what a human might know about them. 
  • What communities of pieces can we identify in a given corpus?  That is, if we attempt to classify or group works in some way based on shared features, what kinds of communities emerge?  Are these communities related to Style? Genre? Composer? Borrowing? 
  • In contrast, if we take the various kinds of soggetti (or other basic ‘words’) as our starting point, what can we learn about their context?  What soggetti happen before and after them?  At the same time as them?  What soggetti are most closely related to them? And through this what can we say about the ways each kind of pattern is used? 

Interval as Vectors (Music Ngrams) 

How can we model these soggetti?  Of course they are just sequences of pitches and durations.  But since musicians move these melodies around, it will not work simply to look for strings of pitches (since as listeners we can recognize that G-A-B sounds exactly the same as C-D-E).  What we need to do instead is to model these as distances between notes.  Musicians call these ‘intervals’ and you could think of them like musical vectors. They have direction (up/down) and they have some length (X steps along the scale). 

Here is an example of how we can use our CRIM Intervals tools (a Python/Pandas library) to harvest this kind of information from XML encodings of our scores.  There is more to it than this, but the basic points are clear:  the distances in the score are translated into a series of distances in a table.  Each column represents the motions in one voice.  Each row represents successive time intervals in the piece (1.0 = one quarter note). 
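The CRIM Intervals library handles this directly from the encoded scores; purely to illustrate the idea (this is not the library's API), the toy sketch below computes signed diatonic intervals for two short melodies and lays them out as a table with one column per voice, showing that G-A-B-G-A and C-D-E-C-D yield exactly the same interval vector.

# Toy illustration (not the CRIM Intervals API): melodies at different pitches
# that move by the same scale steps produce identical interval vectors.
import pandas as pd

STEPS = "CDEFGAB"   # diatonic scale degrees; octaves and accidentals ignored for simplicity

def interval(n1, n2):
    """Signed diatonic interval between two note names: C->E = 3 (a third up),
    E->C = -3 (a third down), C->C = 1 (unison)."""
    d = STEPS.index(n2) - STEPS.index(n1)
    return 1 if d == 0 else (d + 1 if d > 0 else d - 1)

def melodic_intervals(melody):
    return [interval(a, b) for a, b in zip(melody, melody[1:])]

voices = {"Superius": list("GABGA"), "Tenor": list("CDECD")}
print(pd.DataFrame({name: melodic_intervals(m) for name, m in voices.items()}))
# Both columns contain the same interval vector: [2, 2, -3, 2]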

An ngram for a section of music
Figure 2. An example of an ngram, [-3, 3, 2, -2], with intervals as vectors. 

Link Prediction 

We are interested in predicting unobserved or missing relations between pairs of ngrams in our musical graph. Given two ngrams (nodes in the graph), the goal is to ascertain the type and likelihood of a potential relationship (edge) between them, be it sequential, vertical, or based on thematic similarity; a minimal sketch of such a scorer follows the list below. 

  • Sequential relations link ngrams that occur near each other in time. This is the kind of ‘context’ a large language model computes, and training on it surfaces the semantic information latent in the data. 
  • Vertical relations link ngrams that sound at the same time: another kind of context. 
  • Thematic relations are based on some measure of similarity between ngrams. 
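To make this concrete, here is a minimal, illustrative link-prediction scorer (not the project's actual model). It assumes we already have one learned embedding vector per ngram and one relation vector per edge type, in the style of a DistMult factorisation; the vectors below are random stand-ins for trained ones.

# Minimal DistMult-style link-prediction sketch with random placeholder vectors.
import numpy as np

rng = np.random.default_rng(0)
ngrams = ["[-3,3,2,-2]", "[2,2,-2,-2]", "[5,-3,-2,2]"]      # hypothetical vocabulary
relations = ["sequential", "vertical", "thematic"]

dim = 16
emb = {n: rng.normal(size=dim) for n in ngrams}              # ngram embeddings
rel = {r: rng.normal(size=dim) for r in relations}           # relation vectors

def score(head, relation, tail):
    """DistMult score: higher means the candidate edge is more plausible."""
    return float(np.sum(emb[head] * rel[relation] * emb[tail]))

def predict(head, tail):
    """Turn the per-relation scores for a candidate pair into probabilities."""
    s = np.array([score(head, r, tail) for r in relations])
    p = np.exp(s - s.max())
    return dict(zip(relations, p / p.sum()))

print(predict("[-3,3,2,-2]", "[2,2,-2,-2]"))

In a trained model the embeddings and relation vectors would be learned from the observed edges, so that held-out edges of each type can be recovered.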

Upon training, the model’s performance is evaluated on a held-out test set, providing metrics such as precision, recall, and F1-score for each type of relationship. The model achieved a prediction accuracy of 78%. 

Beyond its predictive capabilities, the model also generates embeddings for each ngram. These embeddings, which are high-dimensional vectors encapsulating the essence of each ngram in the context of the entire graph, can serve as invaluable tools for further musical analysis. 
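As one small example of how such embeddings might be used, the sketch below (again with random stand-in vectors) retrieves the nearest neighbours of a query ngram by cosine similarity, which is exactly the kind of ‘nearest neighbour’ question raised earlier.

# Illustrative only: nearest neighbours of a query ngram by cosine similarity.
import numpy as np

rng = np.random.default_rng(1)
emb = {name: rng.normal(size=16) for name in
       ["[-3,3,2,-2]", "[2,2,-2,-2]", "[5,-3,-2,2]", "[2,2,-3,2]"]}

def nearest(query, embeddings, k=2):
    q = embeddings[query] / np.linalg.norm(embeddings[query])
    sims = {n: float(v @ q / np.linalg.norm(v))
            for n, v in embeddings.items() if n != query}
    return sorted(sims.items(), key=lambda kv: -kv[1])[:k]

print(nearest("[-3,3,2,-2]", emb))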

Tracing Voices: A Visual Journey through Latin American Debates about Race  

JGI Seed Corn Funding Project Blog 2023/24: Jo Crow

I’m a historian who is keen to learn how digital tools can strengthen our analysis of the material we find in the archives. I research histories of race, racism and anti-racism in Latin America. I’m particularly interested in how ideas about race travelled across borders in the twentieth century, and how these cross-border conversations impacted on nation-state policies in the region.  

The book I am currently writing investigates four international congresses that took place between the 1920s and 1950s: the First Latin American Communist Conference in Buenos Aires, Argentina (1929); the XXVII International Congress of Americanists in Lima, Peru (1939); the First Inter-American Conference on Social Security in Santiago, Chile (1942); and the Third Inter-American Indigenista Congress in La Paz, Bolivia (1954). These were very different kinds of international meetings, but they all dedicated a significant amount of time to debating the problem of racial inequality, especially the ongoing marginalisation of indigenous peoples. 

Who was at these congresses? Who spoke to whom, and what conversations did they have? Where did the conversations take place? What did the rooms look like? How were they set up? And what about the spaces outside the formal discussion sessions – the drinks receptions that delegates attended, the archaeological sites and museums they visited, the film screenings and book exhibitions they were invited to, the restaurants they frequented, the hotels they stayed in? Luckily, I have found a great variety of source materials – conference proceedings, newspaper reports, personal and institutional correspondence, memoirs of participating delegates – that help me begin to answer these questions.

Black and white photos from a newsletter of men seated in a room for the XXVII International Congress of Americanists in Lima
Photographs of the XXVII International Congress of Americanists in Lima. Published in El Comercio newspaper, 11 September 1939.
Black and white photo of three delegates at the III Inter-American Indigenista Congress in La Paz.
Photograph of three delegates at the III Inter-American Indigenista Congress in La Paz. Included in an International Labour Organization report of 1954. 

As part of my JGI seed-corn project, I’ve been able to work with two brilliant researchers: Emma Hazelwood and Roy Youdale. Emma helped me to explore the uses of digital mapping for visualising the “who” and “where” of these congresses, and Roy helped me to experiment with machine-reading. In this blog, I share a few of the things we achieved and learnt.   

Digital Mapping

Emma started by inputting the data I had on the people who attended these congresses – their names, nationalities, where they travelled from – into Excel spreadsheets. She then found the coordinates of their origins using an online resource, and displayed them on a map using a coding language called Python. Below are a few of the results for Lima, 1939. The global map (Map 1) shows very clearly that this was a forum bringing together delegates from North, Central, and South America, and several countries in Europe too. We can zoom in to look more closely at the regional spread of delegates (Map 2), and further still to see what parts of Peru the Peruvian delegates came from (Map 3). For those delegates that were based in Lima – because we have their addresses – we can map precisely where in the city they or their institutions were based (Map 4).
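As a rough illustration of that workflow (not Emma's actual code, and with made-up file and column names), the sketch below reads a spreadsheet of delegates, geocodes each place of origin and plots the results on an interactive map, using the pandas, geopy and folium packages.

# Illustrative mapping sketch with hypothetical file and column names.
import pandas as pd
import folium
from geopy.geocoders import Nominatim

delegates = pd.read_excel("lima_1939_delegates.xlsx")      # columns: Name, Origin
geocoder = Nominatim(user_agent="congress-mapping-demo")

world_map = folium.Map(location=[0, -60], zoom_start=2)
for _, row in delegates.iterrows():
    place = geocoder.geocode(row["Origin"])
    if place is None:
        continue                                           # skip origins that cannot be resolved
    folium.CircleMarker([place.latitude, place.longitude],
                        radius=4, color="red",
                        popup=f'{row["Name"]} ({row["Origin"]})').add_to(world_map)

world_map.save("lima_1939_delegates.html")                 # open in a browser to explore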

Global map with red dots to show delegate locations and a green dot to highlight Peru
Map 1. The global map shows very clearly that this was a forum bringing together delegates from North, Central, and South America, and several countries in Europe.
Map of South America on the left and a zoomed in version on the right with red dots to show delegate locations and a green dot to highlight Peru
Map 2 (left) shows a zoomed in version of the global map to see the regional spread of delegates. Map 3 (right) shows what parts of Peru the Peruvian delegates came from.
Satellite image of Lima with different colour dots to symbolise different institute locations
Map 4. For delegates in Lima, the satellite image maps where in the city they or their institutions were based. 

In some ways, these visualisations don’t tell me anything I didn’t already know. From the list of conference attendees I compiled, for instance, I already had a sense of the spread of the countries represented in Lima in 1939. What the maps do do, however, is tell the story of the international nature of the conference much more clearly and speedily than a list or table can. With the city map, showing where Lima-based delegates lived and worked, we do learn something new. By plotting the addresses, I can envisage the contours of the space they occupied. I couldn’t do that in my head with just a list of the addresses, especially without knowing road names.   

The digital maps also help with comparative analysis. If we look at the global map (like Map 1) of all four congresses together we get a clear view of their very similar reach; most delegates to all of them were from South America. We are also able to swiftly detect the differences – for example, that the Lima conference attracted more delegates from Europe than the other meetings, or that there were no delegates from Europe at the 1954 congress in La Paz. We can then think about the reasons why.  

Satellite image of Lima with an old map layered on top with different colour dots to symbolise different locations
Map 5. Shows the main venues for the XXVII International Congress of Americanists.

Map 5 above takes us back to Lima. It shows the main venues for the XXVII International Congress of Americanists. It visualizes a circuit for us. I don’t think we can perceive this so clearly from a list of venues, especially if we are not very familiar with the city. Here we can see that most of the conference venues and the hotels where delegates stayed were clustered quite closely together, in Lima’s historic centre. Delegates could easily walk between them. There are a few outliers, though: one of the archaeological sites that delegates visited, the museum that threw a reception for delegates, and a couple of restaurants too. This prompts further questions and encourages us to imagine the delegates moving through the city.  

Machine Reading

As well as digital mapping, I’ve been keen to explore what machine or distant reading can add to our analysis of debates about race in early twentieth century Latin America. It’s widely known, for example, that, in the context of the Second World War, many academic and government institutions rejected the scientific validity of the term race (“raza” in Spanish). A machine reading of the proceedings of these four congresses gives us concrete, empirical evidence of how the word race was, in practice, used less and less from 1929, to 1939, to 1942, to 1954. Text analysis software like Sketch Engine, which Roy introduced me to, also enables us to scrutinise how the term was used when it was used. For instance, in the case of the 1929 conference in Buenos Aires, Sketch Engine processes 300+ pages of conference discussions in milliseconds and shows us in a systematic way which so-called “races” were being talked about, the fact that “race” was articulated as an object and a subject of the verb, and how delegates associated the term race with hostile relations, nationhood, indigenous communities, exploitation, and cultural tradition (see below). In short, it provides a really useful, methodical snapshot of the many different languages of race being spoken in Buenos Aires. It is then up to me to reflect on the significance of the detail, and to go back to specific moments in the text, for example the statement of one delegate about converting the “race factor” into a “revolutionary factor”.  
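Sketch Engine offers far richer analysis than this, but even a few lines of Python convey the flavour of distant reading. The sketch below (with hypothetical filenames standing in for the digitised proceedings) counts occurrences of “raza”/“razas” in each congress and prints a few hits in context.

# Minimal distant-reading sketch; filenames are hypothetical stand-ins for the digitised proceedings.
import re
from pathlib import Path

proceedings = {1929: "buenos_aires_1929.txt", 1939: "lima_1939.txt",
               1942: "santiago_1942.txt", 1954: "la_paz_1954.txt"}
pattern = re.compile(r"\brazas?\b", flags=re.IGNORECASE)   # matches 'raza' and 'razas'

for year, filename in proceedings.items():
    text = Path(filename).read_text(encoding="utf-8")
    hits = list(pattern.finditer(text))
    print(year, len(hits), "occurrences")
    for h in hits[:3]:                                     # show a few hits in context
        snippet = text[max(0, h.start() - 40):h.end() + 40].replace("\n", " ")
        print("   ...", snippet, "...")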

Results from a text analysis in Sketch Engine
Results from a text analysis in Sketch Engine for the 1929 conference in Buenos Aires. The result shows us in a systematic way which so-called “races” were being talked about.

In all, I’ve learnt how digital tools and methodologies can productively change how we’re able to look at things, in this case “race-talk” and who was speaking it. By looking differently we see differently too. What I’d like to do now is to trace where the conversations went from these congresses, and see how much they shifted and transformed in the process of travel.  


Jo Crow, Professor of Latin American Studies, School of Modern Languages 

Using ‘The Cloud’ to enhance UoB laboratory data security, storage, sharing, and management

JGI Seed Corn Funding Project Blog 2023/24: Peter Martin, Chris Jones & Duncan Baldwin

Introduction

As a world-leading research-intensive institution, the University of Bristol houses a multi-million-pound array of cutting-edge analytical equipment of all types, ages, functions, and sensitivities – distributed across its Schools, Faculties, Research Centres and Groups, as well as in dozens of individual labs. However, as more and more data are captured, how can they be appropriately managed to comply with the needs of researchers and funders alike?  

What were the aims of the seed corn project? 

When an instrument is purchased, the associated computing, data storage/resilience, and post-capture analysis are seldom, if ever, considered beyond the standard Data Management Plans. 

Before this project, there was no centralised or officially endorsed mechanism at UoB, supported by IT Services, for managing long-term instrument data storage and internal/external access to it – every group, lab, and facility individually managed its own data retention, access, archiving, and security policies. This is not just a UoB challenge, but one that is endemic to the entire research sector. As the value of data becomes universally recognised, not just in academia but across society, the challenge is more pressing than ever, and an institution-wide solution – one readily exportable to other universities and research organisations – is critically needed. At its core, this Seed Corn project sought to develop a ‘pipeline’ through which research data could be: (1) securely stored within a unified online environment/data centre in perpetuity, and (2) accessed via an intuitive, streamlined and equally secure online ‘front-end’ – such as Globus, akin to how OneDrive and Google Drive seamlessly facilitate document sharing.   

What was achieved? 

The Interface Analysis Centre (IAC), a University Research Centre in the School of Physics, currently operates a large and ever-growing suite of surface and materials science equipment with considerable numbers of both internal (university-wide) and external (industry and commercial) users. Over the past six months, working with leading solution architects, network specialists, and security experts at Amazon Web Services (AWS), the IAC/IT Services team have successfully developed a scalable data-warehousing system deployed within an autonomous segment of the UoB network, eliminating the risk of single-copy data stored locally and the need to move it via portable hard drives or email across the network. In addition to efficiently “getting the data out” from within the UoB network, the team have used native credential management within Microsoft Azure/AWS to develop a web-based front-end, akin to Google Drive/OneDrive, where specific experimental folders can be securely shared with specific users – compliant with industry and InfoSec standards. The proof of the pudding has been the positive feedback received from external users visiting the IAC, all of whom have been able to access their experimental data immediately after concluding their work, without the need to copy GBs or TBs of data onto external hard drives!  
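As a very small illustration of the kind of sharing such a pipeline enables (this is not the IAC/IT Services implementation, and the bucket and file names are hypothetical), the sketch below uses the AWS SDK for Python to upload an instrument data file to S3 and generate a time-limited link that can be sent to an external user.

# Illustrative only: push one data file into the store and share it via a presigned URL.
# Credentials are assumed to come from the environment or an IAM role.
import boto3

s3 = boto3.client("s3")
bucket = "iac-instrument-data"                  # hypothetical bucket name
key = "sem/2024-06-12/sample_042.dat"           # hypothetical experiment folder/file

s3.upload_file("sample_042.dat", bucket, key)   # upload the local file

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=7 * 24 * 3600,                    # link valid for one week
)
print(url)                                      # send this link to the external user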

Future plans for the project 

The success of the project has not only highlighted how researchers and various strands within UoB IT Services can together develop bespoke systems utilising both internal and external capabilities, but also how even a small amount of Seed Corn funding such as this can deliver the start of something powerful and exciting. Following the delivery of a robust ‘beta’ solution between the Interface Analysis Centre (IAC) labs and AWS servers, it is currently envisaged that the roll-out and expansion of this externally facing research storage gateway will continue, with the support of IT Services, to other centres and instruments. Given the large amount of commercial and external work performed across the UoB, such a platform will hopefully enable and underpin data management across the University going forwards, adopting a scalable and proven cloud-based approach.  


Contact details and links

Dr Peter Martin & Dr Chris Jones (Physics) peter.martin@bristol.ac.uk and cj0810@bristol.ac.uk 

Dr Duncan Baldwin (IT Services) d.j.baldwin@bristol.ac.uk  

Successful Seedcorn Awardees 2024-2025

The Jean Golding Institute Seedcorn Funding is a fantastic opportunity to develop multi- and interdisciplinary ideas while promoting collaboration in data science and AI. We are delighted that a new cohort of multidisciplinary researchers has been supported through this funding.

Leighan Renaud – Building a Folk Map of St Lucia

Leighan Renaud

Dr. Leighan Renaud is a lecturer in Caribbean Literatures and Cultures in the Department of English. Her research interests include twenty-first century Caribbean fiction, mothering and motherhood in the Caribbean, folk and oral traditions in the Anglophone Caribbean, and creative practices of neo-archiving. 

Louise AC Millard – Using digital health data for tracking menstrual cycles

Dr. Louise Millard is a Senior Lecturer in Health Data Science in the MRC Integrative Epidemiology Unit (IEU) at the University of Bristol. Following an undergraduate Computer Science degree and MSc in Machine Learning and Data Mining, they completed an interdisciplinary PhD at the interface of Computer Science and Epidemiology. Their research interests lie in the development and application of computational methods for population health research, including using digital health and phenotypic data, and statistical and machine learning approaches. 

Photo of Louise AC Millard on the right

Laura Fryer – Visualisation tool for Enhancing Public Engagement Using Supermarket Loyalty Card Data

Photo of Laura Fryer on the left

Laura is a senior research associate in the Digital Footprints Lab based within the Bristol Medical School. Their aim is to use novel data to unlock insights into behavioural science for the purposes of public good. Laura is particularly passionate about broadening the public’s understanding of digital footprint data (e.g. from loyalty cards, bank transactions or wearable technology such as a smart watch) and demonstrating how vital it can be in developing our understanding of population health within the UK and beyond.  Laura’s project is focused on developing a data-visualisation tool that will support public engagement activities and provide a tangible representation of the types of data that we use – building further trust between the public and scientific researchers.  

Nicola A Wiseman – Cellular to Global Assessment of Phytoplankton Stoichiometry (C-GAPS)

Dr. Nicola Wiseman is a Research Associate in the School of Geographical Sciences. They received their PhD in Earth System Science from the University of California, Irvine, where they specialized in using ocean biogeochemical models to investigate the impacts of phytoplankton nutrient uptake flexibility on ocean carbon uptake. They also are interested in using statistical methods and machine learning to better understand the interactions between marine nutrient and carbon cycles, and the role of these interactions in regulating global climate. 

Photo of Nicola A Wiseman on the right

Georgia Sains – Collecting & Analysing Multilingual EEG Data

Georgia Sains is a Doctoral Teaching Associate in the Neural Computation research group at the School of Computer Science. Her research is focused on the overlap between Computer Science, Neuroscience, and Linguistics. Georgia has worked on developing models to help understand how linguistic traits have evolved. More recently, she has been using Bayesian modelling to find patterns between grammar and neurological response and is now focused on using electroencephalography (EEG) experimentation to explore the relationship between linguistic upbringing and how the brain processes language. 

Alex Tasker – Building a Strategic Critical Rapid Integrated Biothreat Evaluation (SCRIBE) data tool for research, policy, and practice

Dr. Tasker is a Senior Lecturer at the University of Bristol, a Research Associate at the KCL Conflict Health Research Group and Oxford Climate Change & (In)Security (CCI) project, and a recent ESRC Policy Fellow in National Security and International Relations. Dr. Tasker is an interdisciplinary researcher working across social and natural sciences to understand human-animal-environmental health in situations of conflict, criminality, and displacement using One Health approaches. Alongside this core focus, Dr. Tasker’s work also explores emerging areas of relevance to biosecurity and biothreat including engineering biology, antimicrobial resistance, subterranean spaces, and the use of new forms of evidence and expertise in a rapidly changing world for climate, security, and defense.

Photo of Alex Tasker on the right