Interpreting Brain Data Is Hard. Here’s How Machine Learning Could Help
With the right equipment, collecting brain data is relatively easy: you simply attach sensors to the scalp to measure brain activity. Interpreting that data, however, is not so easy. In fact, it’s one of the biggest challenges brain researchers face today. But a recent breakthrough involving Muse data and machine learning models could help change that.
The Brain Data Dump
The challenge with brain data isn’t that there’s too little, but that there’s too much, and not enough people to review it. An electroencephalogram (EEG) is the test that records the brain’s electrical activity. Labeling EEG data is time-consuming and requires the expertise of a small group of people in high demand: neurologists and sleep specialists. For that reason, it’s also expensive.
An EEG spectrogram showing brainwave activity by frequency (Hz) over time (s). Orange to red colors indicate higher power at those frequencies.
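To illustrate what a spectrogram like this represents, the sketch below splits a signal into overlapping windows and computes the power at each frequency within each window (a short-time Fourier transform). It is a minimal, hypothetical example: it uses a plain-Python DFT for clarity rather than speed (in practice a library routine such as SciPy's `scipy.signal.spectrogram` would do this), and the synthetic 10 Hz sine wave, which falls in the alpha band, stands in for real EEG.

```python
import cmath
import math

def spectrogram(signal, fs, win_len, hop):
    """Return (freqs, times, power) for a real-valued signal sampled at fs Hz.

    power[k][f] is the squared magnitude of the DFT of the k-th Hann-windowed
    segment at frequency freqs[f]; only non-negative frequencies are kept.
    """
    # Periodic Hann window to reduce spectral leakage at segment edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    n_bins = win_len // 2 + 1
    freqs = [f * fs / win_len for f in range(n_bins)]  # bin spacing is fs / win_len Hz
    times, power = [], []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(win_len)]
        row = []
        for f in range(n_bins):
            # Naive DFT of one segment at frequency bin f.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * f * n / win_len)
                      for n in range(win_len))
            row.append(abs(acc) ** 2)
        power.append(row)
        times.append((start + win_len / 2) / fs)  # segment centre, in seconds
    return freqs, times, power

# Usage: 4 seconds of a 10 Hz sine sampled at 128 Hz, 1 s windows, 50% overlap.
fs = 128
sig = [math.sin(2 * math.pi * 10 * n / fs) for n in range(4 * fs)]
freqs, times, power = spectrogram(sig, fs, win_len=128, hop=64)
```

Each row of `power` is one vertical slice of the image; plotting the rows over `times` with a color scale gives the familiar spectrogram, and for this test signal every slice peaks at 10 Hz.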
Imagine, for a moment, a lab with only two neurologists. They’re hunched over their computers, trying to analyze hundreds of thousands of EEG charts like the spectrogram shown above. Given the sheer volume of data, it’s simply not possible to review, or label, all of it by hand. As a result, a lot of EEG data sits on computers in labs around the world, unlabeled and largely unused. So to scale up EEG interpretation and make it accessible for uses that aren’t medical emergencies, such as sleep tracking and meditation, we need computers, rather than humans, to do the reviewing.
A Discovery In The Lab
Hubert Banville is a researcher who spends much of his time in labs working on brain research for InteraXon (aka Muse), as well as at Inria and Université Paris-Saclay. He develops algorithms that learn from these piles of unlabeled data. In machine learning, a computer is trained to recognize patterns: an algorithm converts the data into a representation the computer can work with, learns from it, and then makes decisions based on what it has learned. Recently, Banville and his team made a discovery that could make sifting through these piles of data much easier.
They found that they could extract useful information from the unlabeled data using several self-supervised machine learning techniques (1). In self-supervised learning, a model is trained on unlabeled data to predict something that can be derived from the data itself; the features it learns along the way can then be reused for the real prediction task. Their self-supervised approaches picked up on elements of EEGs that could help predict sleep stages or physiological disorders, and that correlated with age and other physiological phenomena. Banville shares one of their findings.
“For example, one of our approaches learned to identify whether EEG data were recorded close in time, or came from different parts of a recording. It turns out that if you’re good at solving this simple task, you’ll do pretty well when it comes to identifying which sleep stages someone is in.” – Hubert Banville
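The task Banville describes is often called relative positioning. The sketch below shows one way its training pairs could be generated from a single unlabeled recording: two windows whose starts are close in time get label 1, windows far apart get label 0, and no human labeling is needed. It is a hypothetical illustration, not the study’s code; the function name `sample_rp_pairs`, the thresholds `tau_pos` and `tau_neg`, and measuring everything in samples are assumptions. A neural network (not shown) would then be trained to predict the label from the two EEG windows.

```python
import random

def sample_rp_pairs(n_samples, win_len, tau_pos, tau_neg, n_pairs, seed=0):
    """Return a balanced list of (start_a, start_b, label) tuples.

    label 1: the two windows start within tau_pos samples of each other.
    label 0: the two windows start more than tau_neg samples apart.
    """
    max_start = n_samples - win_len  # last valid window start index
    if not 0 <= tau_pos < tau_neg < max_start:
        raise ValueError("need 0 <= tau_pos < tau_neg < n_samples - win_len")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a = rng.randint(0, max_start)
        if len(pairs) % 2 == 0:
            # Positive pair: second window starts within tau_pos of the first.
            b = rng.randint(max(0, a - tau_pos), min(max_start, a + tau_pos))
            pairs.append((a, b, 1))
        else:
            # Negative pair: keep only if the second window is far enough away.
            b = rng.randint(0, max_start)
            if abs(a - b) > tau_neg:
                pairs.append((a, b, 0))
    return pairs

# Usage: a recording of 10,000 samples, 300-sample windows.
pairs = sample_rp_pairs(n_samples=10_000, win_len=300,
                        tau_pos=600, tau_neg=3_000, n_pairs=100)
```

Gaps between `tau_pos` and `tau_neg` are never produced, which keeps the “close” and “far” classes clearly separated and makes the pretext task well defined.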
A visualization of features learned with a self-supervised method (temporal shuffling) on a dataset used in a previous sleep staging study, plotted as a scatterplot of five sleep stages. A clear structure related to sleep stages emerges, even though no labels were available during training. The features not only group according to the labeled sleep stages but are also arranged in sequence, moving from wakefulness (W) through light sleep (N1, N2) to deep sleep (N3).
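Temporal shuffling is a related pretext task: as it is commonly formulated, the model sees three EEG windows and has to tell whether they are presented in their original temporal order. The sketch below is a simplified, hypothetical sampler for such triplets, not the study’s code; the way the study actually draws and shuffles windows may differ, and `sample_ts_triplets`, `tau_pos` and the window sizes are illustrative assumptions.

```python
import random

def sample_ts_triplets(n_samples, win_len, tau_pos, n_triplets, seed=0):
    """Return a balanced list of ((start_1, start_2, start_3), label) items.

    label 1: the three windows are presented in temporal order.
    label 0: the first two windows are swapped, so the order is broken.
    """
    max_start = n_samples - win_len  # last valid window start index
    if not 2 <= tau_pos <= max_start:
        raise ValueError("need 2 <= tau_pos <= n_samples - win_len")
    rng = random.Random(seed)
    triplets = []
    for i in range(n_triplets):
        # Pick a local context of tau_pos + 1 start positions, then three
        # distinct window starts inside it, sorted in time.
        lo = rng.randint(0, max_start - tau_pos)
        a, b, c = sorted(rng.sample(range(lo, lo + tau_pos + 1), 3))
        if i % 2 == 0:
            triplets.append(((a, b, c), 1))  # in temporal order
        else:
            triplets.append(((b, a, c), 0))  # shuffled: a later window comes first
    return triplets

# Usage: 50 triplets from a 10,000-sample recording with 300-sample windows.
trips = sample_ts_triplets(n_samples=10_000, win_len=300, tau_pos=600, n_triplets=50)
```

Solving this task well requires noticing how EEG changes slowly over time, which is one plausible reason features learned this way line up with the gradual progression of sleep stages.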
They also found that learning from large amounts of unlabeled data let the algorithms pick up patterns that led to better predictions than training on limited amounts of labeled data alone. In fact, with unlabeled data from thousands of EEG recordings, accuracy was up to 20% higher than when using smaller amounts of labeled data alone. This held for two different EEG problems: identifying sleep stages in overnight recordings and detecting EEG pathologies (1).
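To make the “limited labeled data” step concrete: once a self-supervised model has turned each EEG window into a feature vector, a small labeled set can be enough to fit a simple classifier on top of those features. The sketch below uses a nearest-centroid classifier as a stand-in (the study’s downstream model may differ), and the 2-D feature vectors are made-up values for illustration; real features would come from the trained network.

```python
import math

def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical learned features for a handful of labeled windows.
train_x = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
train_y = ["W", "W", "N3", "N3"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.1, 0.2]))  # W
```

The design choice here is deliberate: if the self-supervised features already separate the classes, even a very simple classifier trained on a few labels can do well, which is why large amounts of unlabeled data can substitute for much of the expensive labeling.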
What This Means For You (and Brain Research)
This discovery could improve everything from consumer sleep and wellness tools like Muse S to the diagnosis of neurological disorders. EEG datasets generated with Muse technology, some of the largest in the world, have made it possible to apply this new machine learning approach. The result could be more reliable automation, which could help lower costs and widen access to EEG insights for Muse as well as the global neuroscience community.
Hubert Banville is a Ph.D. student in Computer Science at Inria, Université Paris-Saclay, and a researcher at InteraXon Inc. With a background in biomedical engineering (Polytechnique Montréal), he previously conducted research on functional neuroimaging and hybrid brain-computer interfaces at the MuSAE Lab (INRS, Université du Québec). His current research focuses on learning representations from EEG and other biosignals using self-supervised learning, with an emphasis on consumer neurotechnology applications.
References: