
Storing your data in a spreadsheet

 

Photo via Unsplash by Glenn Carstens-Peters

Blog written by Jonty Rougier, Lisa Müller, Soraya Safazadeh, Centre for Thriving Places (the new name for Happy City)

What makes a good spreadsheet layout?

We were recently trying to extract some data from the ‘All’ tab of the ONS spreadsheet Work Geography Table 7.12 Gender pay gap 2018.

This gave us the opportunity to reflect on what makes a good spreadsheet layout, if you want to make your data easily available to others. The key thing to remember is that the data will be extracted by software using simple and standardised rules, either from the spreadsheet itself, or from a CSV file saved from the spreadsheet. Unless you recognise this, much of your well-intentioned structure and ‘cool stuff’ will actively impede extraction. Here are some tips for a good spreadsheet:

Names

Each of your data columns is a ‘variable’, and starts with a name, giving a row of variable names in your spreadsheet. Don’t use long names, especially phrases, because someone is going to have to type these later. Try to use a simple descriptor, avoiding spaces or commas; if you need a space or some other punctuation, use an underscore instead (see below). You can put detailed information about the variable in a separate tab. This detailed information might include a definition, units, and allowable values.

In our example spreadsheet we have

Current column name   | Our description     | Better column name
Description           | Region names        | Region_name
Code                  | Region identifiers  | Region_ID
Gender pay gap median | Numerical values    | GPG_median
Gender pay gap mean   | Numerical values    | GPG_mean

There is a mild convention in Statistics to use a capital letter to start a variable name, and then small letters for the levels, if they are not numerical. For example, the variable ‘Sex’ might have levels ‘male’, ‘female’, ‘pref_not’, and ‘NA’, where ‘pref_not’ is ‘prefer not to say’, and NA is ‘not available’.

  1. Use an IDENTICAL name for the same variable if it appears in two or more tabs. It’s amazing how often this is violated: identical means identical, so ‘Region_Name’, ‘region_Name’, and ‘region_name’ are NOT the same as ‘Region_name’.
  2. There are two different conventions for compound variable names, like ‘Region name’. One is to replace spaces with underscores, to give ‘Region_name’. The other is to remove spaces and use capitals at the start of each word, to give ‘RegionName’, known as camel case. Both are fine, but it is better not to mix them: this can cause some old-skool programmers to become enraged.

Settle on a small set of consistently-used codes for common levels

NA for ‘not available’ is very common; in a spreadsheet, you can expect a blank cell to be read as NA. ‘Prefer not to say’ comes up regularly, so settle on something specific, like ‘pref_not’, to be used for all variables. The same is true for ‘not sure’ (eg ‘not_sure’).

At all costs, avoid coding an exception as an illegal or unlikely value, like 9, 99, 999, 0, -1, -99, -999; we have seen all of these, and others besides (from the same client!). If you want to use more exceptions than just NA in a variable with numerical values, then use NA for all exceptions in the values column, and add a second column with labels for the exceptions.

In our example spreadsheet, if you look hard enough you will see some ‘x’ in the numerical values columns. We initially guessed these mean ‘NA’, but in fact they do not! In the key, ‘x = Estimates are considered unreliable for practical purposes or are unavailable’. But surely ‘unreliable’ and ‘unavailable’ are two different things? Ideally only the second of these would be NA in the GPG_median numerical column. A new GPG_median_exception column would be mostly blank, except for ‘unreliable’ where required to qualify a numerical value.

Generally, we prefer a single column of exceptions, possibly with several levels. In another application the exception codes included ‘unreliable’, ‘digitised’, ‘estimate’, and ‘interpolated’.
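As a sketch of how this layout pays off for whoever extracts the data (hypothetical file name, using the GPG_median and GPG_median_exception columns suggested above), pandas then needs no special-casing:

```python
import pandas as pd

# Hypothetical file name; blank cells and explicit 'NA' both become missing.
df = pd.read_csv("gender_pay_gap.csv", na_values=["NA"])

# Numerical analysis uses the values column; the exception column qualifies
# individual values without polluting the numbers.
usable = df[df["GPG_median_exception"].isna()]
print(usable["GPG_median"].describe())
```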

Put all meta-data ABOVE the rows which store the data

This is because extraction software will have a ‘skip = n’ argument, to skip the first n rows. So everything which is not data should go up here, to be skipped.

  1. DO NOT use the columns to the right of your data: the extraction software will not understand them, and will try to extract them as additional columns.
  2. DO NOT use the rows underneath your data, for the same reason. Your variables will be contaminated, usually with character values which stop the columns being interpreted by the extraction software as numerical values.

In our example spreadsheet, there is a ‘Key to Quality’ to the right of the columns. Clearly the author of this spreadsheet was trying to be helpful, but this information is already in the Notes tab, and the result is distinctly unhelpful.

In our example spreadsheet we also have three rows of footnotes immediately underneath the data. The correct place for these is in the Notes tab, or above the data.
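For example, the ‘skip = n’ idea mentioned above looks like this in pandas (a sketch, assuming a hypothetical CSV export whose metadata occupies the first five rows):

```python
import pandas as pd

# Hypothetical CSV export of the data tab; here the metadata is assumed to
# occupy the first five rows, so the header row is row six.
df = pd.read_csv("table_7_12_all.csv", skiprows=5)

# Footnotes left underneath the data would be read as extra rows of text,
# turning numeric columns into strings - which is why they belong above the
# data or in a separate Notes tab.
print(df.dtypes)
```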

Do not embed information in the formatting of the cells

This is an unusual one, but our example spreadsheet has done exactly that. Instead of an additional column Quality, the author has decided to give each numerical value cell one of four colours, from white (good quality) to deep red (either unreliable or unavailable). This is useful information but it is totally inaccessible: cell format like colour is not read by extraction software.

Don’t have any blank rows between the row of variable names and the last row of data

This is not crucial because extraction software can be instructed to skip blank rows, but it is better to be safe.

Our example spreadsheet has no blank rows – result!

More information

For more information about the Centre for Thriving Places, check out their website.


Optimization of ultra-thin radiation resistant composites structures for space applications

Composites have been used for space applications due to their high-performance properties. However, the environmental conditions experienced during space exposure can lead to severe structural damage.

Blog by Mayra Rivera Lopez, PhD Researcher, Bristol Composites Institute (ACCIS) Advanced Composites Collaboration for Innovation and Science, Department of Aerospace Engineering, University of Bristol

This project was awarded seed corn funding from the new Jean Golding Institute Post Graduate Researchers seed corn award scheme 2020.

Project aims

  • Select current ultra-thin carbon fibre reinforced polymer (CFRP) composites based on thermoset resins used to manufacture deployable space structures. The presence of nanofillers in these resins will be considered for some samples.
  • Place the selected CFRP samples in a plasma generator chamber to simulate space exposure at a low Earth orbit for up to twelve months.
  • Perform 3D surface topography analysis using the Alicona Infinite Focus microscope on each sample. These tests will identify any surface imperfections on the composites that could lead to a deterioration of their mechanical properties.
  • Analyse the overall void volume and identify significant resin modifications which could act as focal points for crack propagation or radiation damage.
  • Collect infrared analysis data from these composites before and after being exposed to plasma conditions (available from previous experiments at the University of Bristol).
  • Create a correlation between the chemical structure of the resins obtained from the infrared data and the presence of voids.
  • Create a database to predetermine the best thermoset overall performance for space structures applications and establish how to optimise it.

Results

Materials Selection and Atomic Oxygen Exposure Details

Two different epoxy composites were selected for analysis, based on the CY 184, Aradur 2954 and MTM44-1 materials. EP0409 Glycidyl POSS nanofillers at different contents (0, 5, 10, 15 and 20 wt%) were added to these. The naming convention 1x2y3z was adopted, where the subscripts x, y and z give the content of each component. Moreover, two curing techniques were applied: autoclave curing and compression plates. Table 1 displays the composition and manufacturing details of the laminates.

Table 1: Composition of the composite resins

A JLS Designs Plasmatherm 550–570 radio frequency plasma generator was used to expose the samples to Atomic Oxygen (AO). A power of 150 W, a constant pressure of 100 Pa and a constant O2 flow of 0.3 NL/min were applied to simulate space conditions for an equivalent of twelve months.

Surface Roughness Analysis

The 3D surface topography images were obtained from scanned composite areas of 2.5 mm × 2.5 mm after AO exposure. An Alicona Infinite Focus instrument with a 5× objective was used to characterise the samples. The overall void volume was calculated in Matlab using the 3D dataset files obtained for each sample and an average-depth 2D threshold for each sample. In Figure 1, depressions in the composites can be seen, caused by resin degradation after exposure. Table 2 presents the overall void content for each sample. The laminates containing POSS(3) have a smoother pattern (except 180280320) compared to the laminates with only MTM44-1 (including 4 kN, 6 kN and 8 kN), as a silica-based coating was created, protecting the fibres from being exposed.

Table 2: Average voidage volume of composites after twelve months of AO exposure
(a) Composites with POSS content
(b) Composites with MTM44-1 only

Figure 1: 3D voidage comparison of the cured ultra-thin composites and KaptonTM H after twelve months of AO exposure
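The void-volume calculation described above was done in Matlab; a rough Python equivalent, assuming a hypothetical depth-map array exported from the Alicona dataset, would look like this:

```python
import numpy as np

# Hypothetical depth map (mm) on the 2.5 mm x 2.5 mm scanned area, exported
# from the 3D topography dataset; replace with the real exported array.
depth = np.load("sample_depth_map.npy")           # shape (rows, cols)

pixel_area_mm2 = (2.5 / depth.shape[0]) * (2.5 / depth.shape[1])
threshold = depth.mean()                          # average-depth 2D threshold

# Treat regions deeper than the threshold as voids and integrate the excess
# depth over the pixel area to get an overall void volume.
excess = np.clip(depth - threshold, 0.0, None)
void_volume_mm3 = float(excess.sum() * pixel_area_mm2)
print(f"Overall void volume: {void_volume_mm3:.4f} mm^3")
```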

Fourier Transform Infrared (FTIR) Spectroscopy

A Perkin Elmer Fourier-transform infrared (FTIR) spectrometer was used to analyse the surface of the samples in Attenuated Total Reflectance (ATR) mode. Fifteen scans were performed per sample over a spectral range of 650–4000 cm−1. Figure 2a displays the absorption bands of the composites with POSS nanofiller. Significant peaks in the aliphatic amine N-H stretching region (2800–3000 cm−1), as well as an intense band around 1100 cm−1, were observed. This is attributed to the epoxy rings opening after AO exposure, through reaction with the POSS and diamine molecules, leading to the formation of hydroxyl groups. Figure 2b presents the laminates without nanofillers, including KaptonTM H, which show a significant change in the oxirane ring due to C-H stretching vibrations.

(a) Composites with POSS content
(b) Composites with MTM44-1 only

Figure 2: Infrared spectra of the cured epoxy ultra-thin composites and KaptonTM H after twelve months of AO exposure

Correlation

A standard Pearson correlation analysis, run in JupyterLab, was applied to the dataset obtained from each sample, with samples exposed at two-month intervals until a total of twelve months was reached. Figure 3 shows the relationship between each selected infrared band presented in Fig 2 and the void content of each composite. The overall voidage for laminates with nanofillers (Fig. 3a) depends most on the change in the POSS band (1100 cm−1), due to coating creation, and is less dependent on the C-H stretching vibrations (2850 cm−1). For the MTM44-1 composites (Fig. 3b), the C-H stretch band showed the strongest positive correlation with the overall voidage, and the strongest negative correlation was with the C-H bend region (800 cm−1). It was also observed that almost all the bands were directly related to each other after each exposure.

(a) Composites with POSS content
(b) Composites with MTM44-1 only

Figure 3: Correlation between selected infrared bands and overall voidage content
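The correlation step itself is short in pandas; a sketch, with hypothetical file and column names for the selected bands and the measured voidage:

```python
import pandas as pd

# Hypothetical table: one row per exposure interval, columns for the selected
# infrared band intensities and the measured overall voidage.
df = pd.read_csv("bands_vs_voidage.csv")   # e.g. band_1100, band_2850, band_800, voidage

corr = df.corr(method="pearson")
print(corr["voidage"].sort_values(ascending=False))
```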

Future work

For future work, the dataset established during this project could be enlarged by selecting a wider range of thermosets, or other types of materials used for space structures. This would allow the material to be selected according to the performance required of the structure. Moreover, further tests, such as Thermogravimetric Analysis (TGA), Differential Scanning Calorimetry (DSC) and Nuclear Magnetic Resonance (NMR), could be applied to achieve a more detailed characterisation of the resins. The mechanical properties of the samples could also be analysed using three-point bending and tensile testing to observe stiffness, strength and toughness, and to identify the advantages or disadvantages of the different nanofiller percentages in the structures. Finally, this analysis could also be applied to other areas, such as the aeronautical, maritime and civil sectors, as well as medical engineering, to select the best resin composite for the application required.

Contact details

mayra.riveralopez@bristol.ac.uk

Bristol Composites Institute (ACCIS), Advanced Composites Collaboration for Innovation and Science, Department of Aerospace Engineering, University of Bristol BS8 1TR

Jean Golding Institute Seed Corn Funding Scheme

The Jean Golding Institute run an annual seed corn funding scheme and have supported many interdisciplinary projects. Our next round of funding will be in Autumn 2020. Find out more about our Funding opportunities

Predicting cause of death and the presence of bad healthcare practice from free-text summaries

We used machine learning to predict cause of death from free-text summaries of those diagnosed with prostate cancer, and the presence of bad practice in health and social care for those who have died with learning difficulties.

Aims

Knowledge of underlying cause of death (UCoD) is a key health outcome in research and service improvement but is not always accurately recorded. Likewise, the identification of bad practice is also vital to understanding health outcomes in the population and improving healthcare provision.

We have applied machine learning (ML) classifiers to over 4,000 death reviews (free-text summaries) from the Cancer Research UK Cluster randomised trial of PSA testing for prostate cancer (CAP) and the Learning Disability Mortality Review programme (LeDeR). Each review was assigned a label, either prostate cancer death or poor healthcare practice, by independent experts. This expert assignment was used to train the ML techniques to:

  1. Identify the key elements (words and phrases) that are good predictors of:
  • Prostate cancer death, or
  • Poor health or social-care practice.

  2. Add user confidence by explaining how the ML techniques work, rather than solely relying on the prediction probability that is output by the classifier. In this sense we add transparency to the ML.

We developed the methodology using data from the CAP project, and subsequently applied it to the LeDeR data to test how well it could generalise.  

Results

The first step was to build a tool to predict prostate cancer death from the free-text summaries. Using a random forest (RF) classifier with a bag-of-words feature set, we found that we could predict prostate cancer death with >90% accuracy. We then investigated how the RF was classifying the free-text summaries by looking at which elements in them the RF used to assign prostate cancer death. To do this we investigated a variety of potential visualisation techniques, described below.
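As a concrete sketch of that first classification step (scikit-learn, with placeholder texts standing in for the confidential reviews; not the project’s exact code):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder reviews and labels; the real (confidential) data is ~4,000
# free-text death reviews labelled by independent experts.
texts = [
    "rising psa, bone metastases, died of progressive prostate cancer ...",
    "died of myocardial infarction, prostate cancer stable at last review ...",
]
labels = [1, 0]   # 1 = prostate cancer death, 0 = other cause

X_train, X_test, y_train, y_test = train_test_split(texts, labels, random_state=0)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), min_df=1),      # bag of words (and bigrams)
    RandomForestClassifier(n_estimators=500, random_state=0),
)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```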

Word clouds

Word clouds provide a visual representation of the words (or group of words) that are most predictive of prostate cancer deaths across the dataset. The word clouds show that clinically important signs of progressing prostate cancer, are key to identifying prostate cancer deaths.  

Figure 1: A word cloud showing the most important features for classification of prostate cancer death (using a random forest). The size of the word indicates the ‘importance’ of the feature.
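A word cloud like this can be built directly from the forest’s feature importances. Continuing from the hypothetical pipeline sketched earlier, and using the wordcloud package (a sketch, not the project’s code):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Map each bag-of-words feature to its random forest importance
# (model is the fitted pipeline from the earlier sketch).
vectorizer = model.named_steps["countvectorizer"]
forest = model.named_steps["randomforestclassifier"]
importances = {
    word: score
    for word, score in zip(vectorizer.get_feature_names_out(), forest.feature_importances_)
    if score > 0
}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(importances)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```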

Interpretable text-outputs

We also used both tree-interpreter and LIME to identify which words contributed most to any given classification. We then visualised these contributions by formatting the original free-text summary to indicate which text elements contribute to the classification, so that the classifier’s decision can be understood by a human reader. 

Figure 2: Snapshot of human-interpretable classifier output. Text elements in the cause of death review are formatted to show their contribution to the classification. The size of the font indicates the importance of the word or phrase and the colour illustrates whether the word indicates prostate cancer death (blue) or not (red).
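With LIME, for example, per-word contributions for a single review can be obtained roughly as follows (continuing from the same hypothetical pipeline; tree-interpreter works analogously on the forest itself):

```python
from lime.lime_text import LimeTextExplainer

# model is the fitted pipeline from the earlier sketch; review_text is any
# single free-text summary (here, the first held-out review).
review_text = X_test[0]

explainer = LimeTextExplainer(class_names=["other", "prostate cancer death"])
explanation = explainer.explain_instance(
    review_text,
    model.predict_proba,   # the pipeline maps raw text to class probabilities
    num_features=10,
)

# Positive weights push towards 'prostate cancer death', negative away from it.
for word, weight in explanation.as_list():
    print(f"{word:20s} {weight:+.3f}")
```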

Writing style

Working with free text depends on the quality of the text, which can influence the performance of ML techniques.

These visualisations use the t-SNE algorithm to show clear clusters of free-text summaries that contain similar elements. Here we can identify three main clusters, in which the summaries share commonalities in style and which represent three different authors, each shown in a specific colour. This analysis even brought the authorship of some reviews into question (note the two separate pink clusters, which appear to be quite divergent in this data representation).

Figure 3: T-distributed stochastic neighbour embedding (t-SNE) of the feature space. Each point is a cause of death review and their proximity in the space indicates their similarity in terms of the language they contain. Colours represents authors of the review. The blue line is a boundary that separates the two distinct clusters of reviews written by the pink author.
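The embedding itself is a few lines of scikit-learn. The sketch below uses simulated word counts and author labels in place of the real (confidential) review features:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Simulated word counts and author labels standing in for the real
# bag-of-words vectors and known authors of each review.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(300, 200)).astype(float)    # 300 reviews x 200 terms
authors = rng.choice(["author_1", "author_2", "author_3"], size=300)

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

for author in np.unique(authors):
    mask = authors == author
    plt.scatter(embedding[mask, 0], embedding[mask, 1], s=10, label=author)
plt.legend()
plt.title("t-SNE of the review feature space")
plt.show()
```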

Hard vs Easy cases

Cases where there was disagreement between the panel of experts about the cause of death are potentially the same cases where the ML classifier is less certain of its prediction. Figure 4 illustrates that the cases the expert panel found “hard” to assign a cause of death (purple and black points in the right-hand panel) often sit where prostate cancer deaths (red) and non-prostate cancer deaths (blue) meet in the left-hand panel. This is a region where the ML techniques are less certain when predicting cause of death. This is confirmed by Figure 5, which shows that the classifier performs much worse on the ‘hard’ cases than the ‘easy’ cases.

Figure 4: The two panels show the same embedding of the feature space with points coloured according to: cause of death (left panel) and ‘difficulty’ (right panel). Difficulty of a case is assessed by the ’cause of death route’ which provides a range of how hard the human experts found it to determine the actual cause of death.
Figure 5: Receiver operating characteristic (ROC) curves showing the classifier performance for ‘hard’ and ‘easy’ cases.

The classification of ‘hard’ cases appears to be a more difficult task and we have yet to produce a classifier with good performance for this purpose. It could be that we need to engineer additional features from the reviews to successfully predict hard cases. Also, natural language elements such as negation or hedging terms are notoriously difficult to detect but may improve performance at this task.

Generalising to LeDeR

Our methodology was successfully applied (see figures 6 and 7) to the LeDeR dataset, which contains more verbose reviews and a much larger number of authors. The ability of the method to generalise to this challenging dataset validates our approach and encourages us to further develop the approach for application across new domains in the future.  

Figure 6: Receiver operating characteristic (ROC) curves showing the classifier performance when trained to predict ‘poor practice’ on the LeDeR dataset. Note that this classifier is currently overfitted and additional work is required to produce a classifier that generalises better.
Figure 7: Similar to figure 1, this word cloud shows the most important features for classification of poor healthcare practice in the LeDeR dataset.

Future plans for the project

This exploratory project has identified several avenues for further development. Notably, we have developed Python code that can predict prostate cancer deaths from free-text summaries, and demonstrated its application to a second dataset (LeDeR). This documented code will be shared on GitHub to allow others to apply our methodology to their data.  

Future work will focus on developing use cases for applying this methodology. For instance, gaining understanding of the textual elements that are key to decision making could be used to: 

  1. Provide decision support for identifying prostate cancer deaths or poor healthcare practice, reducing the need for clinical experts to review free-text summaries.
  2. Identify the important elements of the free-text summary for the authors, to target the key sections of the medical history, which would speed up the data collection.

The ability to accurately predict hard cases would help to allocate reviewer resource more efficiently and so is a strong candidate for further development. We are also keen to produce a dashboard that would allow users to interactively explore their own datasets using the methods we have developed. We are exploring external funding opportunities to continue this project and are writing up our results for publication in a special edition of Frontiers in Digital Health.   

Contact details and links

Dr Emma Turner, Population Health Sciences 

Ms Eleanor Walsh, Population Health Sciences 

Dr Raul Santos-Rodriguez, Engineering Mathematics 

Dr Chris McWilliams, Engineering Mathematics 

Dr Avon Huxor, School of Policy Studies 

Methods derived in part from: FAT-Forensics toolbox 

Jean Golding Institute Seed Corn Funding Scheme

The Jean Golding Institute run an annual seed corn funding scheme and have supported many interdisciplinary projects. Our next round of funding will be in Autumn 2020. Find out more about our Funding opportunities

 

Digital Humanities meets Medieval Financial Records

Blog by Mike Jones, Research Software Engineer in Research IT, University of Bristol

The purpose of this project was to explore the use of ‘Digital Humanities methodologies’ in analysing an English-language translation of a medieval Latin document. We used data analysis tools and techniques to extract financial data and entities (e.g. people, places and communities) from the translation. This process then enabled the creation of example visualisations, to better interpret and understand the data and prompt further research questions. 

 Primary source 

The focus of the project was a single Irish Exchequer receipt roll from the latter years of King Edward I’s reign (1301–2). A receipt roll holds information on the day-to-day financial dealings of the Crown. It provides a rich source of material on not only the machinery of government but also the communities and people that, for various reasons, owed money to the king. An English-language calendar published in the late nineteenth century exists but was found to be deficient. A full English-language edition of the roll was edited by Prof Brendan Smith (Co-I, History) and Dr Paul Dryburgh (The National Archives) and published in the Handbook and Select Calendar of Sources for Medieval Ireland in the National Archives of the United Kingdom (Dublin: Four Courts Press, 2005). The original document is in The National Archives (TNA), London, with the document reference E 101/233/16.

Transcript to tabular data 

The starting point was the text published in the Handbook and Select Calendar of Sources for Medieval Ireland. A Python script was used to trawl the text, looking for details of interest, namely dates and payments, and write them to a CSV (tabular data) file. Using the Natural Language Toolkit we attempted to extract entities, such as people and places, using an out-of-the-box Parts of Speech (POS) tagger. The results were not perfect, with some places identified as people, but it was an encouraging starting point.
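The entity-extraction step looked roughly like this (a sketch using NLTK’s out-of-the-box tagger and chunker on a single sentence in the style of the roll; the real script ran over the full published transcript):

```python
import nltk

# One-off downloads for the out-of-the-box tagger and chunker.
for resource in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(resource, quiet=True)

sentence = ("William de Cauntone, sheriff of Cork, returned one hundred pounds "
            "at Dublin in forfeited property of felons and fugitives.")
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)        # Parts of Speech tags
tree = nltk.ne_chunk(tagged)         # named-entity chunks (PERSON, GPE, ...)

for subtree in tree.subtrees(lambda t: t.label() in ("PERSON", "GPE", "ORGANIZATION")):
    print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))
```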

In the tabular data, each row recorded a payment, including the financial term, the date, the geographic location or other entity under which it was categorised, and the value owed to the Irish Exchequer. Payments, recorded in pounds, shillings, pence or marks, were converted to their value in pennies for more straightforward computation. We also checked our computed totals against those calculated by the medieval clerks of the Irish Exchequer; these were one penny out, the clerks having missed some fractions of a penny on 24 May 1302!
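The conversion is simple arithmetic: £1 = 20s = 240d, 1s = 12d, and one mark = 13s 4d = 160d. A minimal sketch (the function name is ours):

```python
# Pre-decimal money: 1 pound = 20 shillings, 1 shilling = 12 pence,
# and 1 mark = 13s 4d = 160 pence.
PENCE_PER_POUND = 240
PENCE_PER_SHILLING = 12
PENCE_PER_MARK = 160

def to_pence(pounds=0, shillings=0, pence=0, marks=0):
    """Convert a payment in pounds/shillings/pence/marks to pennies."""
    return (pounds * PENCE_PER_POUND
            + shillings * PENCE_PER_SHILLING
            + pence
            + marks * PENCE_PER_MARK)

# The year's total of £6159 18s. 5d., expressed in pennies:
print(to_pence(pounds=6159, shillings=18, pence=5))   # 1478381
```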

Data analysis and visualisations 

With the data in a tabular format, it could be queried with the pandas data analysis library and visualised with the Matplotlib and Seaborn visualisation libraries. Querying the data, we were able to create several visualisations, ranging from summary statistics for the financial year down to monthly, weekly and daily activity. We were also able to visualise the days the Exchequer sat, compared to the days it did not sit due to holidays and feast days.

For example, the total value of the receipts for the financial year was £6159.18s.5d. In the following plot we break down the payments into the four financial terms: Michaelmas (September–December), Hilary (January–March), Easter (April–June) and Trinity (June–August).
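A sketch of that term-level breakdown with pandas (hypothetical file and column names matching the tabular data described above):

```python
import pandas as pd

# Hypothetical file and column names: term, date, entity, value_pence.
df = pd.read_csv("receipt_roll_1301_02.csv", parse_dates=["date"])

term_totals = df.groupby("term")["value_pence"].agg(["sum", "count"])
term_totals["pounds"] = term_totals["sum"] / 240   # 240 pence to the pound

print(term_totals.sort_values("sum", ascending=False))
```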

Other plots highlighted the variability of income, the amount of business (number of transactions), and the number of days the Irish Exchequer sat each term. This is illustrated in the following radar plots, where we plot all three variables – total revenue, length of the term and amount of business – with each variable represented as a percentage of the total value for the year. 

What is immediately striking in these plots is that the Hilary term is relatively long but has the least business and income. In contrast, the Easter term is quite short but provides the most income. These plots confirm what the historians expected – the sheriffs made their proffers to the Exchequer in the Michaelmas and Easter terms and thus were anticipated to be busier. 

Reception and response 

While working on the project, findings and visualisations were shared on social media. This prompted interest and questions from other historians. For example, Dr Richard Cassidy asked (https://twitter.com/rjcassidy/status/1240944622186217472): was income concentrated in the first week or two of the Michaelmas and Easter terms, from the sheriffs’ adventus, as in the English receipt rolls from the 1250s and 60s? We were able to generate plots that showed that in the Irish Exchequer the bulk of the income came in the fourth week and not the second.

Note: in the tenth week of Michaelmas, the spike in payments against a lower number of transactions is accounted for by Roger Bagot, sheriff of Limerick, returning £76.6s.8d. for the ‘debts of divers persons’; and £100 being returned by William de Cauntone, sheriff of Cork, in forfeited property of felons and fugitives. 

Limitations and research questions 

Clearly there are limits to the analysis, since the project only examined one financial year. It would thus be interesting to analyse trends over time. How does the 1301/2 financial year compare to others in Edward I’s reign? What trends can be seen over the years, decades and centuries? How was the income from Ireland affected by war, rebellion, famine and plague? Are there trends to be gleaned from the different administrations under varying chancellors? Also, does income reflect the changeable effectiveness of English royal authority in Ireland? Can we confirm the ‘decline of lordship’ narratives in the historiography of fourteenth and fifteenth century Ireland?  

Future work 

It is our intention to build on this initial work with the support of external funding. An application has already been made under the AHRC/IRC scheme ‘UK-Ireland Collaboration in the Digital Humanities’ to support a Network to investigate suitable DH approaches to the entire series of Irish receipt rolls, covering the years 1280–1420. Despite being unsuccessful, our application was highly rated and we intend to apply for a major research grant worth up to £1m under the same scheme when details are announced. Furthermore, we are committed to collaborating with Beyond 2022, an ambitious project to create a Virtual Record Treasury of Irish history. Beyond 2022 have commissioned the digitisation of a large number of documents held at The National Archives, London, including the Irish Exchequer receipt rolls. Plans include creating English-language translations of the Irish receipt rolls in TEI/XML, the de facto standard for encoding texts. It will then be possible to construct a pipeline, building on this seed corn funded work, that allows researchers to explore and formulate research questions around English colonial rule in Ireland and how the Irish interacted with the English machinery of government.

Further Details 

More detailed information about the project can be found in a series of blog posts, and the source code and data are available on GitHub

Jean Golding Institute Seed Corn Funding Scheme

The Jean Golding Institute run an annual seed corn funding scheme and have supported many interdisciplinary projects. Our next round of funding will be in Autumn 2020. Find out more about our Funding opportunities

Decoding pain: real-time data visualisation of human pain nerve activity

Blog post by Manuel Martinez, Research Software Engineer and Dr Jim Dunham, Clinical Lecturer, from the School of Physiology, Pharmacology and Neuroscience at the University of Bristol

We are developing new tools to analyse human pain nerve activity in real time. This will aid diagnosis in chronic pain and enable individualised, targeted treatments.

Some patients with chronic pain have abnormally increased activity in their “pain detecting” nerves (nociceptors). We do not know which patients have this problem and which do not. If we could determine which individuals suffer with these ‘sensitised’ nociceptors, we could treat them more effectively, by giving medicines to ‘quieten’ their nerves.

We record from human nociceptors using a technique called microneurography. Sadly, this technique is only used in research as it is too time consuming and unreliable to use clinically. To bring microneurography closer to the clinic we sought to:

Improve Real-time Data Visualisation

  • Improve the way real-time neural data is displayed by replacing a legacy oscilloscope-like trace with a 4D ‘smart’ visualiser.

Close the Loop

  • Develop and implement automated real-time robust spike detection algorithms.
  • Develop and implement closed-loop dynamic thresholding algorithms to automatically control the electrical stimulus energy.

These developments have the potential to significantly increase experimental efficiency and data yields.

Figure 1 Conceptual set-up for a closed-loop experiment. An electrical stimulus of a predefined intensity is applied to the skin (A). If the stimulus intensity is large enough, the nerve will fire and send a “spike” of activity towards the brain. The electrical activity of the nerve is recorded “upstream” at some distance away from the stimulation site (B). These spikes are digitised and processed in a computer (C) so that they can be visualised in real time to aid in electrode placement. The resulting recordings can be exported for further analysis in third-party software tools (D).

Real-time Data Visualisation

Microneurography allows for nerve activity to be recorded by means of a fine electrode inserted through the skin into the nerve. After insertion into the nerve, the skin supplied by that nerve is electrically stimulated to cause activity in the nociceptors (Figure 1). Recording this activity is difficult; it requires careful positioning of the electrode and is further complicated by the small amplitude of the nerve signal in comparison to noise.

Figure 2A shows the legacy oscilloscope-like visualiser commonly used in microneurography. The signal trace represents the voltage measured at the recording electrode as a function of time. The evoked neural spikes are indicated by green arrows. The large spikes (indicated by the red lightning symbol) correspond to a signal artefact caused by the electrical stimulation system.

“Pain” nerves conduct slowly and therefore have characteristically long latencies. These latencies show good correlation between successive firings. Therefore, accurate electrode placement can be verified by the presence of consecutive spikes of similar latency after the stimulus event.

Figure 2B shows our novel 4D visualiser. Here, the signal amplitude is encoded via colour, with lighter colours representing high amplitudes. This colour scaling can be adjusted in real time by the user. The vertical axis corresponds to latency after the stimulus event and the horizontal axis to a series of stimulus events. Therefore, a constant latency spike manifests itself as a line in this visualiser.

This is a significant improvement over the legacy visualiser as the subtle changes in colour and the alignment between two consecutive spikes can be readily identified by eye in real time. This greatly increases the clinician’s situational awareness and contributes to maximising experimental yield.

Figure 2 Microneurography data recording from the superficial peroneal nerve as seen in the legacy oscilloscope-like visualiser (A) and the novel 4D latency visualiser (B-C). Two units of similar latency can be readily identified and have been indicated with green arrows. A possible third unit at a longer latency has been indicated with a dotted arrow. This third unit is only noticeable in the 4D visualiser as it is below the noise level in the oscilloscope trace.
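In matplotlib terms, the core of such a visualiser is essentially an image plot of a latency-by-sweep matrix with amplitude mapped to colour. The toy sketch below uses simulated data and is not the real-time implementation:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated recording: 200 stimulus sweeps, 500 latency samples per sweep.
rng = np.random.default_rng(0)
sweeps, samples, fs = 200, 500, 1000            # fs: samples per second
data = rng.normal(0.0, 1.0, size=(sweeps, samples))

# A unit with constant latency (~250 ms) shows up as a horizontal line of
# colour across sweeps; a second unit with slowly drifting latency wobbles.
data[:, 250] += 4.0
for i in range(sweeps):
    data[i, 320 + int(3 * np.sin(i / 10))] += 3.0

plt.imshow(
    np.abs(data).T,                              # colour encodes |amplitude|
    aspect="auto",
    origin="lower",
    extent=[0, sweeps, 0, samples / fs * 1000],  # x: sweep number, y: latency (ms)
)
plt.xlabel("Stimulus number")
plt.ylabel("Latency after stimulus (ms)")
plt.colorbar(label="|amplitude| (a.u.)")
plt.show()
```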

Closed-loop stimulation control

The electrical energies required to evoke nociceptor activity are not constant. These changes in electrical ‘threshold’ may be useful in understanding why patients’ nerves are abnormally excitable. Unfortunately, balancing signal detection against stimulation energy in the context of real time analysis of small amplitude signals is difficult and primed for failure.

To improve reliability and reproducibility, we have developed a dynamic thresholding algorithm that automatically controls stimulation energy once a unit has been manually identified (i.e. a line can be seen in the visualiser). This is conceptually simple: decrease the stimulation energy until the unit ceases to fire, then increase it until it starts firing again.

In practice, the robust detection of spikes is challenging as existing approaches are only successful in environments with high signal-to-noise ratios (SNRs). To address this, our proof-of-concept algorithm first takes a set of candidate spikes (obtained using a simple threshold crossing method – green points in Figure 2C). Then, these candidate spikes are temporally (latency) filtered so that only those around a small region of interest near the detected track remain. This detection algorithm, despite its simplicity, has shown promising performance on pre-recorded and simulated data and is now ready for testing in microneurography.
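A minimal sketch of the closed-loop idea described above, with illustrative thresholds and step sizes (the real controller also applies the latency filtering and safety limits discussed in the text):

```python
import numpy as np

def unit_fired(sweep, roi, noise_sd, k=4.0):
    """Candidate-spike test: does any sample in the latency region of
    interest cross a simple amplitude threshold of k noise standard
    deviations? (The real algorithm also filters candidates by latency.)"""
    return float(np.max(np.abs(sweep[roi]))) > k * noise_sd

def next_stimulus_energy(energy, fired, step=0.02, e_min=0.1, e_max=10.0):
    """Dynamic thresholding: back the stimulation energy off while the unit
    is still firing, and raise it again as soon as the unit stops firing."""
    energy = energy * (1 - step) if fired else energy * (1 + step)
    return float(np.clip(energy, e_min, e_max))

# On each stimulus sweep, something like:
#   fired = unit_fired(latest_sweep, roi=slice(240, 260), noise_sd=estimated_sd)
#   energy = next_stimulus_energy(energy, fired)
```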

Revolutionising human microneurography

We seek to revolutionise human microneurography: bringing it into the clinic as a diagnostic tool; informing treatment decisions and demonstrating ‘on target’ efficacy of new analgesics.

The novel 4D visualiser and automated closed-loop experimental tools developed here will be validated in microneurography experiments in healthy volunteers and then made publicly available in the spirit of open-source research. Additionally, we will integrate more advanced methods of ‘spike’ detection into the algorithm to maximise sensitivity and specificity.

We anticipate our first patient trials of these novel tools within the next 12 months. Our visualiser will enable rapid identification of abnormal activity in nociceptors, paving the way towards data-driven, personalised treatments for patients living with chronic pain.

Contacts and Links

Mr Manuel Martinez Perez (Research Software Engineer, School of Physiology, Pharmacology & Neuroscience)

Dr Jim Dunham (Clinical Lecturer, School of Physiology, Pharmacology & Neuroscience)

Dr Gethin Williams (Research Computing Manager, IT Services)

Dr Anna Sales (Research Associate, School of Physiology, Pharmacology & Neuroscience)

Mr Aidan Nickerson (PhD student, School of Physiology, Pharmacology & Neuroscience)

Prof Nathan Lepora (Professor of Robotics and AI, Department of Engineering Mathematics)

Prof Tony Pickering (Professor of Neuroscience and Anaesthesia, School of Physiology, Pharmacology & Neuroscience)

Jean Golding Institute Seed Corn Funding Scheme

The Jean Golding Institute run an annual seed corn funding scheme and have supported many interdisciplinary projects. Our next round of funding will be in Autumn 2020. Find out more about our Funding opportunities