The work continues
Yesterday I mentioned all the work that was ahead of me. And work there is! While I have one dataset (mostly) done, I still need to process the rest of the data to make my deadlines. Not all of it needs to be processed; most would be nice, all would be great, and right now I have exactly zero new datasets processed, so I'm not sure how much I'll be able to finish today. Sometimes that's just how these things go though.
For those unfamiliar with how we handle EEG data, a bit of background. We don't use "raw" signals. I mean we can, but with technology the way it is, even when we want to use a signal in "real-time" we can still process the data before using it. Real-time is in quotes because it's really pseudo-real-time: there's a small delay, just not enough to notice. In any case, when we're doing things that aren't real-time, meaning we can process the data offline and make it as nice as possible, a lot of trial and error goes into the pre-processing.
Pre-processing is the step where we "clean" the data. Most of us in the field use some variation of what's called Makoto's pipeline, which is named (as these things go) after the person who created it. The pipeline we use in the lab isn't exactly Makoto's pipeline, but it's close, so I often call it a modified Makoto's pipeline; our lab has added several steps that aren't in the pipeline Makoto suggests. The problem with any pipeline isn't that it's difficult to implement, it's that things need to be adjusted to make it as good as it can be.
Every step has weights or parameters that can be tuned to best do what you want with the data, and using the wrong values can either overclean (remove useful signal) or underclean (leave artifacts and non-brain signal in the data). So for me, pre-processing consists of three steps: first, write out the code for the pipeline; second, validate it and make sure it works with the first dataset you want to process (not always the first dataset you collected); last, tune all the parameters to make sure the pipeline does what you expect it to do.
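To make the tuning step concrete, here's a minimal sketch of what per-dataset parameter tuning can look like in code. The parameter names, values, and dataset labels are all made up for illustration; they are not the actual settings from our pipeline.

```python
# Hypothetical tunable pre-processing parameters; names and values are
# illustrative, not real lab pipeline settings.
DEFAULTS = {
    "highpass_hz": 1.0,      # high-pass filter cutoff
    "line_noise_hz": 60.0,   # mains frequency to notch out
    "bad_channel_z": 5.0,    # z-score threshold for rejecting channels
    "ica_components": 32,    # number of ICA components to estimate
}

# Per-dataset overrides: tweak only what a given recording needs.
OVERRIDES = {
    "subject_03": {"bad_channel_z": 4.0},  # noisier cap fit
    "subject_07": {"highpass_hz": 0.5},    # keep more slow activity
}

def params_for(dataset: str) -> dict:
    """Merge dataset-specific overrides on top of the defaults."""
    merged = dict(DEFAULTS)
    merged.update(OVERRIDES.get(dataset, {}))
    return merged

print(params_for("subject_03")["bad_channel_z"])  # 4.0 (overridden)
print(params_for("subject_01")["highpass_hz"])    # 1.0 (default)
```

The point of the pattern is that the pipeline itself never changes; only a small override table does, which keeps the per-dataset adjustments visible and easy to compare.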
I always try to drive this point home because it was hard-learned: data cleaning (pre-processing) is a very hands-on process. You can automate it, but if you don't have to, you shouldn't. Each dataset will require a different level of cleaning, because everyone has different hair, or maybe less gel was used so the connection wasn't as good (meaning an even lower signal-to-noise ratio than we normally deal with). Whatever the case, you can't treat every dataset the same, because each one is unique.
That doesn't mean you change the pipeline you use, because you don't. It just means you adjust the parameters slightly between datasets. There's one step in particular people tend to try to automate: independent component analysis, a time-consuming process of looking at estimated source signals and choosing which to keep and which to discard. I've written about this process pretty in-depth in the past (here), and we can (and do) automate it, but to me it's one of the more important steps because the data can be complex enough that it sometimes requires going by "feel," and by feel I mean looking at all the ways we can visualize a component before deciding whether to keep it or get rid of it.
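One middle ground between full automation and full manual review is to auto-decide only the clear-cut components and flag the ambiguous ones for a human. Here's a toy sketch of that idea; the scores stand in for whatever artifact metric you compute (say, correlation with an eye-movement channel), and the thresholds are illustrative, not values from our pipeline.

```python
# Toy semi-automated ICA component screening. Scores and thresholds
# are hypothetical stand-ins for a real artifact metric.
def screen_components(artifact_scores, reject_above=0.8, review_above=0.5):
    """Split components into auto-keep, auto-reject, and manual-review bins."""
    keep, reject, review = [], [], []
    for idx, score in enumerate(artifact_scores):
        if score >= reject_above:
            reject.append(idx)   # clearly artifact: drop automatically
        elif score >= review_above:
            review.append(idx)   # ambiguous: a human should look at it
        else:
            keep.append(idx)     # clearly brain signal: keep
    return keep, reject, review

keep, reject, review = screen_components([0.1, 0.9, 0.6, 0.2])
print(keep, reject, review)  # [0, 3] [1] [2]
```

This way the automation only handles the easy calls, and the "going by feel" part is reserved for the components that actually need it.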
Pre-processing can take hours for a single dataset, and since my data is so complex, I expect it to take even longer on average. The last step in particular is aligning my EMG and EEG data, since they were recorded on different systems. We can do it fairly easily, but there are things that make it hard as well. I'm hoping to mostly automate this, but you can't completely automate it no matter what you do. There are a lot of reasons for that, most of which I won't go into, but the short version is that you add flags into the data to align against, and on occasion you get double flags, or a flag turns out to be a false start you don't want to use, so you need to remove it before aligning the data.
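The flag cleanup and alignment described above can be sketched roughly like this. All the timestamps, the minimum-gap rule, and the false-start handling are illustrative assumptions, not our actual procedure:

```python
# Sketch of aligning EMG events to EEG events via shared flags.
# Timestamps are in seconds; all numbers here are made up.
def clean_flags(times, min_gap=0.05, false_starts=()):
    """Drop double flags (closer than min_gap) and known false starts."""
    cleaned = []
    for t in sorted(t for t in times if t not in false_starts):
        if not cleaned or t - cleaned[-1] >= min_gap:
            cleaned.append(t)
    return cleaned

def align(eeg_flags, emg_flags):
    """Estimate the clock offset as the mean flag-time difference."""
    assert len(eeg_flags) == len(emg_flags), "flag counts must match after cleanup"
    diffs = [e - m for e, m in zip(eeg_flags, emg_flags)]
    return sum(diffs) / len(diffs)

eeg = clean_flags([10.0, 10.01, 25.0, 40.0])                      # 10.01 is a double flag
emg = clean_flags([8.5, 23.5, 31.0, 38.5], false_starts=(31.0,))  # 31.0 was a false start
offset = align(eeg, emg)
print(round(offset, 3))  # 1.5: the EMG clock runs 1.5 s behind in this toy example
```

The assertion is the part that keeps this from being fully automatic: when the flag counts don't match after cleanup, someone has to look at the recording and decide which flag was the double or the false start.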
Now that we've covered all that, I can explain that despite trying to get everything done on a semi-fast schedule, I've gotten zero new datasets fully processed. I'm on the last step (aligning EMG/EEG) for one dataset, but with the exception of the single dataset that's already been through the pipeline, I've still got quite a bit of work ahead of me.
Most of yesterday was spent tuning everything, and while the values will change slightly between datasets, they should be close enough that I won't have to play with them for hours (like I did yesterday). That means today should go somewhat smoother, and with a little luck (a lot of luck) I can get multiple datasets processed. I don't think I'll get all nine remaining done, but I should be able to make a large dent in it, and maybe find some time during the week to finish the rest.
I really need to finish this, so if I can’t find time to do it, I’ll have to make time. So now that this is done, it’s off to work I go!
But enough about us, what about you?