Data rush

Okay, well, I don’t know how successful our experiment was… yet. I know we have data, which was the first hurdle, but does that data tell us anything? Will we unlock the deepest secrets of the human body (almost certainly not, but we can dream), or will we just end up with a whole lot of nothing? That’s the problem with trying something for the first time: you could have something, or you could have nothing, and it’s a lot of work before you find out the answer.
So for those unaware, I’m picking up from yesterday’s post (this one). We collected our first dataset using the experiment I imagined for the “big idea” I had. Shortly after, I ran some pre-processing checks to make sure we actually had data. The problem has been dealing with all sorts of noise from the room we’re working in. The data are there and the noise can be removed, but online (as in, while the recording is happening) we can’t quite tell whether we’re capturing data or absolutely nothing.
Now the long, hard struggle to make sense of the data begins. Before we can even “look” at the data, we need to clean it! That can take anywhere from a few hours to a few days. Cleaning data is a lot of work, and sometimes you go through the whole process just to start over and do it again to see if you can get a better result. Sometimes I can spend a whole work week (40+ hours) cleaning data sets, depending on how hard they are to work with.
There are a lot of steps to data pre-processing. First we need to format the data into a usable (read: organized) form. Then we run a whole bunch of cleaning algorithms: things that remove line noise (the 60 Hz electrical signal here in the US), pops (sensor tugs that cause jumps in the data), movement artifacts (the sensor moving relative to the skin), eye blinks, eye movements, EMG (muscle) activity, and other artifacts.
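To make that list a little more concrete: I won’t walk through our actual pipeline here, but a first pass in MNE-Python (a popular open-source EEG toolkit) might look roughly like the sketch below. The file name, montage, and thresholds are all hypothetical stand-ins, not our real settings.

```python
# A rough sketch of the early cleaning steps using MNE-Python.
# File name, montage, and thresholds are hypothetical, not our actual pipeline.
import mne

# Step 1: get the data into a usable, organized form.
raw = mne.io.read_raw_brainvision("session01.vhdr", preload=True)
raw.set_montage("standard_1020")  # map channels onto scalp locations

# Step 2: remove line noise (the 60 Hz mains hum, plus its harmonics).
raw.notch_filter(freqs=[60, 120, 180])

# Step 3: band-pass to the frequencies of interest; this also tames slow
# drifts from the sensor shifting relative to the skin.
raw.filter(l_freq=1.0, h_freq=40.0)

# Step 4: flag obvious "pops" (brief, huge jumps) as bad segments so that
# later steps ignore them.
annotations, bads = mne.preprocessing.annotate_amplitude(
    raw, peak=dict(eeg=200e-6)  # peak-to-peak beyond ~200 µV is suspect
)
raw.set_annotations(annotations)
```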
Each step (or at least most of them) has parameters we can adjust to get the ideal amount of noise removed, and the right setting is roughly the same for each data set, but not quite. There’s some room to adjust, so you can repeat the same step multiple times to see if you get a better result or if you start removing real signal. How do you know if you’re removing signal? Well, you just have to look and see whether the algorithm is taking away stuff that doesn’t look like noise.
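If you want a feel for what that knob-turning looks like in practice, here’s a hypothetical example in the same MNE-Python vein as above: sweep a peak-to-peak rejection threshold over epoched data and watch how much each setting throws away. The thresholds and epoch window are made up for illustration.

```python
# Sketch: sweep a rejection threshold and see how much data each one removes.
# Thresholds and epoch window are made-up illustrations, not our settings.
import mne

events = mne.find_events(raw)  # assumes the recording has a trigger channel
for thresh in (300e-6, 150e-6, 75e-6):  # peak-to-peak cutoffs, in volts
    epochs = mne.Epochs(
        raw, events, tmin=-0.2, tmax=0.8,
        reject=dict(eeg=thresh), preload=True,
    )
    print(f"{thresh * 1e6:.0f} µV cutoff: "
          f"kept {len(epochs)}/{len(epochs.drop_log)} epochs")

# Too strict and you throw away real signal; too loose and the artifacts
# stay in. Eyeball what got dropped before committing:
# epochs.plot_drop_log()
```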
Pops, line noise, eye blinks, etc. all have a “look” to them that you can (mostly) identify visually. So making sure those bits get removed, and not anything that might actually be neurological signal, is key. It’s an art, really, and it comes with practice; here’s a fun “game” for ICA cleaning I point people to, for example. All this takes time, and running the algorithms takes time, plus there’s a lot of code writing involved to make sure you get the data the way you want it. That, as you may have guessed, also takes time.
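For the ICA pass specifically, the workflow looks roughly like this in MNE-Python. It’s a generic sketch, not our exact recipe, and the components you end up excluding come from exactly the kind of visual inspection that game trains. I’m also assuming the recording includes an eye channel named “EOG”, which won’t be true for every setup.

```python
# Sketch of an ICA cleaning pass (generic, not our exact recipe).
from mne.preprocessing import ICA

ica = ICA(n_components=20, random_state=97)
ica.fit(raw)  # decompose the recording into independent components

# Each component has a scalp map and time course with a characteristic
# "look": blinks are frontal blobs, line noise buzzes at 60 Hz, EMG is
# high-frequency fuzz. Inspect them and decide what to throw out.
ica.plot_components()

# With a dedicated eye channel (assumed name "EOG"), blink components
# can also be flagged automatically.
eog_idx, eog_scores = ica.find_bads_eog(raw, ch_name="EOG")
ica.exclude = eog_idx  # plus whatever the visual pass flagged

# Reconstruct the data without the excluded components.
raw_clean = ica.apply(raw.copy())
```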
In the end, I spent 10 or so hours yesterday fooling around with the data, and I’m still not finished cleaning it. I’m close, but I decided to call it a night because it was getting to be a bit much. Today I hope to tackle the remaining steps, and assuming I don’t go back and start from scratch like I did near the end of yesterday, I should be able to take a rough glimpse at the data to see if we have anything good.
It’s a rush job, and I hope to put together actually good-looking ways to showcase what we found later, but we’re dying to see if we got anything useful. I’m confident we did, but you don’t know until you make the plot. I’m excited, but also a bit anxious to know if this will pay off.
If it works, it’s going to be big, really big!
But enough about us, what about you?