Dissertation data update 2

Yep, we’re still being lazy and just numbering the updates. Maybe I’ll come up with fancy titles, but for now, this works. Plus, this makes it easier for me to search back later. So now that I’ve justified my laziness, let’s talk about where we are. We are here. Okay, seriously, currently I’m still in the pre-processing phase, but there are good reasons for that. Today I may actually finish pre-processing, and I’m hopeful I can hammer out at least a rough idea of what we have… maybe.
The deadline for my DARPA submission is about a month away and I still haven’t done much with the data I finished collecting at the beginning of this month. So far, I’ve met with my helpers twice, once last week and once this week, to go over what I’m doing and how I’m doing it. Frustratingly for me, they’re new, but that was expected. If step 1 is loading the file and step 2 is the full-on pre-processing of the data, we’re somewhere around step 1.25: not quite pre-processing, but I think I’ve got them loading the file at least.
It reminds me of how clueless I was when I first started, but it’s also a fun little reminder that I’ve learned a lot in the past four years, even though you don’t ever really notice it. It’s like when I got my BSME: I thought for sure I had somehow slipped through the cracks because I didn’t feel like I had learned anything new, but then I realized I could do all sorts of “fun” math, control systems, etc. that I wouldn’t have been able to do going in. Sometimes learning is sneaky like that. The problem is I can’t afford to be slow right now; we need to hit the turbo button.
After taking the day to rest yesterday (here), I’m feeling pretty good today and I’m somewhat anxious to get through the pre-processing step of my data. Pre-processing (which, I promise, once my helpers learn, I will share here… eventually) is just everything we do to the data to make it ready for analysis. We can theoretically do this “online,” as in pseudo real-time, and we do, but in my case we want the best result possible, so we’re doing what’s called “offline” analysis, meaning not real-time. The reason is that no matter how good your online data cleaning is, it’s no substitute for manually processing and checking the data and adjusting the filters and things.
Since I’ve been hard at work on some of my other projects (i.e., big idea), I ended up writing a lot of code that I knew I would be able to use on this project. The goal is to get to the step where I can plug in the code and get something useful, but to do that I need to finish the pre-processing steps. Part of that is taking the data I collected, which was “multimodal,” meaning we recorded different things (in my case both EEG and EMG), and aligning the data. Each mode of recording comes from a separate system (in this case, two systems), and we need to align our data so that we can look at it and see what happens.
Since each system has a different sample rate, meaning a different time resolution, we need to either upsample or downsample our data so the number of samples match. Sample rates can be super low (< 1 Hz) or super high (> 20 kHz) depending on what you’re using and what you want to get out of your data. For reference, let’s say we sample at 1000 Hz: that’s 1000 samples (measurements) a second, or one measurement every 1 ms. That’s a lot of data when an experiment can go for hours (hours, with an s).
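To make the resampling bookkeeping concrete, here’s a minimal sketch in Python. The rates (EEG at 1000 Hz, EMG at 2000 Hz) and the sine-wave “signal” are made up for illustration, and this uses plain linear interpolation; a real pipeline would low-pass filter before downsampling to avoid aliasing.

```python
import numpy as np

# Hypothetical rates, assumed purely for illustration.
fs_eeg, fs_emg = 1000, 2000
duration = 2.0  # seconds of recording

t_eeg = np.arange(0, duration, 1 / fs_eeg)  # 2000 time points
t_emg = np.arange(0, duration, 1 / fs_emg)  # 4000 time points
emg = np.sin(2 * np.pi * 5 * t_emg)         # stand-in EMG signal

# Resample EMG onto the EEG time base by linear interpolation so
# the sample counts match. (Real data: anti-alias filter first.)
emg_on_eeg = np.interp(t_eeg, t_emg, emg)

print(emg_on_eeg.shape[0])  # 2000, same as the EEG stream
```

The same call works in the other direction (upsampling the slower stream onto the faster time base); which direction you pick depends on whether you can afford to throw away temporal resolution.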
Importantly, we can’t start both systems at exactly the same time, or at least we can’t do it easily. Instead, we send common “flags” or markers to both datastreams; since the markers get sent to both systems at the exact same time, we can use them to align the data. This gives us very high temporal (time) resolution, and that’s important when data that’s even 1 ms off can be detrimental to your results and your understanding of the data.
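The marker trick boils down to estimating one clock offset between the two systems. A minimal sketch, with made-up marker timestamps (these are not real values from my recordings):

```python
import numpy as np

# Hypothetical timestamps (seconds) of the SAME events as each
# system recorded them. Each system's clock started at a different
# moment, so the two streams disagree by a constant offset.
eeg_markers = np.array([2.50, 7.50, 12.50, 17.50])
emg_markers = np.array([0.30, 5.30, 10.30, 15.30])

# Estimate the clock offset as the mean difference between matched
# markers; averaging over all markers smooths out per-event jitter.
offset = np.mean(eeg_markers - emg_markers)

# Shift the EMG time base into EEG time so both streams share one clock.
emg_markers_aligned = emg_markers + offset

print(round(float(offset), 3))  # about 2.2 s in this made-up example
```

With more than a couple of markers you can also sanity-check the alignment: after shifting, the residual differences between matched markers should be down near the sample period, which is one easy way to verify the streams really line up.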
Since this is not new to me, I’ve got a few ideas about how to speed up the alignment and segmenting (cutting up the data into the interesting chunks) process. Today I am hoping to get at least most (if not all) of the pre-processing done and start testing some of those ideas for alignment. It will be easy to verify that the data are aligned, so if it works, great; if not, I have other ways to align the data using code I’ve relied on in the past. That just means modifying it for my new dataset, which isn’t hard, just time-consuming, and as we’ve noted, time is not on my side right now.
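Segmenting is mostly index arithmetic once the data are aligned: pick a window around each event and stack the chunks. A small sketch with invented numbers (the sample rate, event positions, and window sizes are all placeholders, not my actual parameters):

```python
import numpy as np

fs = 1000                          # assumed sample rate, Hz
data = np.random.randn(60 * fs)    # one minute of fake single-channel data

# Hypothetical event onsets, as sample indices on the shared time base.
events = [5000, 20000, 35000]

# Cut a window around each event: 0.5 s before to 1.5 s after onset.
pre, post = int(0.5 * fs), int(1.5 * fs)
epochs = np.stack([data[e - pre : e + post] for e in events])

print(epochs.shape)  # (3, 2000): 3 segments of 2 s each at 1000 Hz
```

Keeping the pre-event samples around is handy later, since that baseline window is what you typically compare the post-event activity against.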
With that update out of the way, I hope the next time I post (update 3) I have some better news. Technically for next month, I only need a few of the datasets ready to go and that’s really what I’m aiming for, but I won’t complain if I can get it all done. We’ll just have to see how creative I can get with my code.
But enough about us, what about you?