The value of clean data

In my line of research we have fancy algorithms to remove outside contamination from the data we collect. The problem with collecting electrophysiological data (electrical recordings from a person) is that there is so much damned noise everywhere. The problem is magnified when you collect data that have a low signal-to-noise ratio (meaning lots of noise, not a lot of signal). Signal, in this case, is the thing we’re interested in measuring, and while we have dozens of algorithms to filter (remove) the noise, there’s still no substitute for well-collected data.
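To put a number on “low signal-to-noise ratio,” here is a minimal Python sketch. It assumes SNR is defined as the ratio of root-mean-square (RMS) amplitudes expressed in decibels, which is one common convention among several. The helper names (rms, snr_db), the 1000 Hz sampling rate, and the 1 µV and 50 µV amplitudes are made-up illustrative values, not measurements from a real recording.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples (hypothetical helper)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes (hypothetical helper)."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

fs = 1000  # samples per second (illustrative assumption)
t = [i / fs for i in range(fs)]  # one second of time stamps
signal = [1.0 * math.sin(2 * math.pi * 10 * ti) for ti in t]  # 1 µV "brain" signal at 10 Hz
noise = [50.0 * math.sin(2 * math.pi * 60 * ti) for ti in t]  # 50 µV line noise at 60 Hz

print(f"SNR: {snr_db(signal, noise):.1f} dB")
```

A negative result means the noise is larger than the signal; with these illustrative numbers it comes out to about -34 dB, since the noise amplitude is 50 times the signal amplitude.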
When we record EEG, or non-invasive recordings of brain activity (more here) (even more), we are working in the microvolt range, or 0.000001 volts, and we can work with fractions of a microvolt, so even smaller than that. That means our equipment needs to be sensitive. It’s so sensitive, in fact, that it picks up the electric fields generated by the wires in the walls, which we call line noise because it’s literally noise from the power lines. Line noise can be orders of magnitude larger than the signal, so we typically just filter the ~60 Hz (USA) frequency out of the data completely because we can’t really use it.
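Filtering out the ~60 Hz band is usually done with a notch filter, which removes a narrow band around one frequency and passes everything else. Below is a minimal pure-Python sketch, assuming a second-order (biquad) notch with coefficients from Robert Bristow-Johnson's Audio EQ Cookbook. The function names, the 1000 Hz sampling rate, the quality factor q = 10, and the synthetic signals are illustrative assumptions, not the settings of any real recording pipeline. In practice a library routine such as SciPy's scipy.signal.iirnotch would typically be used instead of hand-rolled coefficients.

```python
import math

def notch_coefficients(f0, fs, q):
    """Second-order notch (biquad) coefficients, Audio EQ Cookbook form, normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs          # notch frequency in radians per sample
    alpha = math.sin(w0) / (2.0 * q)      # sets the notch width; higher q means a narrower notch
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0]   # zeros sit exactly on the notch frequency
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(b, a, x):
    """Filter samples one at a time using the direct form I difference equation."""
    y = []
    x1 = x2 = y1 = y2 = 0.0               # previous inputs and outputs; filter starts at rest
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

fs = 1000.0                               # sampling rate in Hz (illustrative)
n = 2 * int(fs)                           # two seconds of data
t = [i / fs for i in range(n)]
eeg = [1.0 * math.sin(2 * math.pi * 10 * ti) for ti in t]    # 1 µV "brain" signal at 10 Hz
line = [50.0 * math.sin(2 * math.pi * 60 * ti) for ti in t]  # 60 Hz line noise, 50x larger
recorded = [e + l for e, l in zip(eeg, line)]

b, a = notch_coefficients(60.0, fs, q=10.0)
cleaned = apply_biquad(b, a, recorded)

# Compare against the clean signal only in the second half, after the start-up transient.
half = n // 2
err_before = max(abs(r - e) for r, e in zip(recorded[half:], eeg[half:]))
err_after = max(abs(c - e) for c, e in zip(cleaned[half:], eeg[half:]))
print(f"worst-case error before filtering: {err_before:.2f} µV, after: {err_after:.3f} µV")
```

Because the notch only removes a narrow band around 60 Hz, the 10 Hz signal passes through almost untouched, which is why discarding that band costs so little. The comparison skips the first second because the filter needs a moment to settle after it starts. In places where the mains frequency is 50 Hz, f0 would change accordingly.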
Now, I’ve talked about all this before, even in the two posts I just linked to, but I haven’t really touched on why collecting good data matters if we can just throw a bunch of filtering tools at the data and clean all that noise out. The answer is simply that there is no substitute for low-noise data. But when working with equipment that sensitive, how do we reduce or eliminate sources of noise?
That is something I’ve spent the past four years trying to figure out. Most of the cases we encounter where noise >> signal (noise far exceeds signal) come from the way we set up the equipment. When we record EEG, we use a sensor that has weight and a cable and is not firmly affixed to the head, so there are several sources of noise from this alone. Pulling on the cable will cause the electrode to shift, causing noise in the data. Moving, jumping, running, etc., will also cause the sensor to shift relative to the skin and, surprise, cause noise.
Then there’s the gel. The best equipment we can use is what’s referred to as “wet” electrodes, in contrast to “dry” electrodes; wet electrodes get their nickname from the conductive gel they use. The gel makes contact through the hair, linking the scalp to the sensor, so think of it as a liquid cable. It’s easy to wash out and, funny enough, makes hair look very nice afterwards. I don’t recommend replacing your shampoo or hair care regimen with it, though, and the only real problem with wet electrodes is the cleanup associated with them.
Under-gelling (making a bad connection) and over-gelling (bridging multiple sensors together) both lead to noise, surprise, surprise. So the setup is important and requires you to take your time. Thankfully, we can check how good the connection is. When we have a good connection, we know that we’ve done our due diligence and can worry about the other problems.
So you’ve arranged the cables so they don’t pull, you’ve gelled just the right amount, and you’ve made sure the cap the sensors go into (which looks like a swim cap and is unfortunately not the most comfortable of headwear) fits correctly. The rest really comes down to how you designed the experiment. Making sure the sensors don’t move while the person does whatever task you’re studying is a good first step, and eliminating other sources of noise, like electrical equipment in the room, helps too where possible.
Writing all this out makes it sound easier than it is; gelling alone can take hours. The trick is to spend time before setup making sure you know what the heck to do to get the best result. There is a lot of prep and pre-experiment work that goes into making sure the dataset you collect is as good as it can possibly be.
While filtering (removing) the noise is helpful, the best way to get a good result is to minimize noise in the first place. This isn’t always possible. For example, I was doing a study using electrical stimulation, which caused a lot of artifacts that could not be filtered, removed, or reduced in any way, shape, or form. Instead, I had to work around the problem as best I could, which caused its own headaches, and for years (multiple) I struggled to make sense of it all. That would be the “last paper,” but we made it through.
Yesterday I discussed another way we can reduce noise, though only tangentially. I’m custom-making cables that are shielded, or really double-shielded, to help reduce the effect of outside noise on the data. It turns out experiments aren’t the only place we want low electrical interference. In applications like high-fidelity music you also want to keep electrical interference to a minimum, and I’ve been doing extensive reading on how other fields deal with this sort of problem.
At the end of the day, the best way to deal with noise is to anticipate the sources and do your best to eliminate or minimize their contribution. In my case, that means I’ve spent weeks making these cables, and we’re not even sure they will work, in the noise-reduction sense. However, if I don’t do my best to make them, we’ll never know for sure.
It seems like the solution to almost every problem in life is just a little bit of planning. That doesn’t mean we can get a perfect outcome, but it does improve our odds, and I’ll take all the help I can get! And to anyone else collecting data, especially hard-to-get data, good luck!

But enough about us, what about you?