We're a little crazy, about science!

Data mountain

Well, the week is over, and it’s been so exhausting I’ve barely been able to drag myself out of bed. Unfortunately, we’re doing it all again next week, so it’s not over yet. But the first of two test experiments is done. I say test because these don’t actually count toward the original set of experiments we’re doing later this summer. Meaning there’s going to be a lot of work coming my way, fun times.

It feels like it’s been a while, but it’s only been two days since I’ve had a serious post around here (okay, none of my posts are serious). After everything that’s been going on I’m surprised I made it, but here we are, and I’m happy to say the first experiment went better than expected! We got a lot of data, so much data. Meaning I’ve got quite a bit of work to do moving forward, but that’s the fun of it, I guess.

The experiment itself “only” lasted about four hours, but with setup and teardown it took closer to seven. Not that we actually did a “teardown”; we left everything set up for next week. Don’t ask me how they are going to use that room the rest of the week with all sorts of cables and carts hanging around, but we left it all there (mostly anyway). First, I am exhausted. I don’t know how experiments can be so physically demanding when it feels like you’re just sitting there the whole time (okay, standing), but they are. We did have a lot of disconnecting, reconnecting, placing cables, etc., so it’s not like I was just watching the computer the entire time; there was strangely little of that. It’s not like I was competing in a triathlon or something, but tell that to my body.

Now that the …hard? part is done, I have so much data. We’ve never collected this much data before; in fact, there’s so much that I’m not sure I could do a complete analysis of this first dataset even if I had a full month to work with it. It’s a lot, and this is coming from someone who works with high-density EEG data all the time (64+ channels of just EEG, then we add in EMG, etc., and it adds up quickly). This is almost double what we normally collect, and we spread it across several different systems to make sure we could get all of it. That last bit, that’s the problem.

In a perfect world, every system would sample the data at the same rate. The problem is digital data. When we take a measurement from a person it’s a “continuous measurement,” or an analog signal. Your heart doesn’t beat in intervals; it’s doing something all day every day until it doesn’t anymore. But computers can’t deal with that kind of data easily. We live in a digital world, so we convert it to a digital signal. Digital signals are not continuous measures, they are discrete measures. We can have any value (to a certain degree), but we can’t stream the signal; we have to chop it up into samples. Our sample rate determines how fine of a “chop” we have. At 100 Hz we take 100 measurements a second, 1000 Hz gives us 10x that, and 10,000 Hz gives us 100x that, but the time doesn’t change. It’s always 1 second, and that’s where the problem starts.
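To put some numbers on that “chop,” here is a minimal Python sketch (my own illustration, not anything we ran in the lab). The function name `sample_signal` and the 5 Hz sine wave standing in for an analog signal are assumptions made up for the example.

```python
import math

def sample_signal(rate_hz, duration_s=1.0, signal_hz=5.0):
    """Sample a sine wave (a stand-in for an analog signal) at rate_hz.

    Hypothetical helper: returns the sample times and the sampled values.
    """
    n_samples = round(rate_hz * duration_s)
    times = [i / rate_hz for i in range(n_samples)]
    values = [math.sin(2 * math.pi * signal_hz * t) for t in times]
    return times, values

# Same one second of signal, chopped three different ways.
for rate in (100, 1000, 10_000):
    times, values = sample_signal(rate)
    spacing_ms = 1000 / rate  # time between neighboring samples
    print(f"{rate} Hz: {len(values)} samples, one every {spacing_ms} ms")
```

The second of signal is identical each time; only the number of samples describing it changes, which is why recordings made at different rates don’t line up sample for sample.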

We used three different pieces of equipment. In an ideal world we were going to sample everything at the same rate: 10,000 Hz, or 10,000 samples per second. That way everything would play nicely and I would be able to sync the data across all the systems. Thursday I discovered our first problem. We have two of the three systems, and the highest sample rate one of them goes to is 5000 Hz. Not a problem, I could change the other system’s sample rate to match, or so I thought. The other system lets us sample at 2000, 4000, or 10,000 Hz. Nothing in between, and as close as 4000 is to 5000, it’s not exact. So I checked the first system again, and sure enough it only offered 100, 500, 1000, or 5000 Hz. Nothing matched, so I will have to resample the data to match everything.

We can either upsample or downsample. As the name suggests, upsampling resamples the data at a higher rate: we “interpolate” the values between our samples, so we can take a 5000 Hz signal and make it 10,000 Hz by (oversimplified here) drawing a line between two neighboring points and taking the value that line crosses halfway between them (on our time axis). This can (and does) cause problems, so we often downsample instead (remove data points to match the lower sample rate), which also can (and does) cause problems. Still, if we’re aware of the limitations we can address them; it’s just more work and more steps for us to go from what we have to what we need.
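The oversimplified version of both directions looks something like this in Python. It’s only a sketch of the idea described above; `upsample_2x` and `downsample_2x` are hypothetical names, and the four-sample “recording” is made-up data.

```python
def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between each
    pair of neighboring samples (plain linear interpolation)."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2)  # where the line crosses the halfway mark
    out.append(samples[-1])
    return out

def downsample_2x(samples):
    """Halve the sample rate by keeping every other sample.
    Note: no filtering happens here, which is part of the 'problems'."""
    return samples[::2]

recorded = [0.0, 2.0, 4.0, 2.0]  # made-up example data
print(upsample_2x(recorded))     # [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
print(downsample_2x(recorded))   # [0.0, 4.0]
```

Notice the upsampled values in between are guesses, not measurements, and the downsampled version simply throws information away. Dropping samples without low-pass filtering first can also cause aliasing, so real resampling tools generally filter the signal as part of the process.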

If that was the only issue then, well, so be it. But it turns out the third system, the one we were borrowing, had its own sample rates and, wouldn’t you know it, none of them matched any of the rates on our two systems. It went up to an impressive 58,000 samples a second (58,000 Hz), but had nothing in common with the others. So we settled on 5000 Hz, 10,000 Hz, and 12,000 Hz. That was the best we could do, so now we’ve got to do some fancy resampling to make it all match so we can do our analysis with the entire dataset.
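To see why the resampling has to be “fancy,” here is a small Python sketch that works out the ratio needed to bring each of the three rates to a common one. Picking the lowest rate (5000 Hz) as the target is purely an assumption for the example, not a decision we’ve made, and `resample_ratio` is a hypothetical helper name.

```python
from fractions import Fraction

recorded_rates_hz = [5000, 10_000, 12_000]

def resample_ratio(source_hz, target_hz):
    """Return target/source as a reduced fraction.
    The numerator is the upsampling factor, the denominator the
    downsampling factor."""
    return Fraction(target_hz, source_hz)

# Assumption for illustration: bring everything down to the lowest rate.
target_hz = min(recorded_rates_hz)

for rate in recorded_rates_hz:
    ratio = resample_ratio(rate, target_hz)
    print(f"{rate} Hz -> {target_hz} Hz: "
          f"up by {ratio.numerator}, down by {ratio.denominator}")
```

Going from 10,000 Hz to 5000 Hz is a clean factor of 2, but 12,000 Hz to 5000 Hz works out to 5/12, so you can’t just keep every nth sample. You have to upsample by 5 and then downsample by 12 (with filtering along the way), which is the extra work.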

Three different systems also means three different data formats. Luckily, I know how to export two of them to MATLAB, my preferred data processing software. The other lab, which does work similar to the stuff I’m used to doing, just in a completely different way (if that makes sense), also uses MATLAB as their preferred software, so all in all we should be able to clear that hurdle pretty easily… with luck anyway.

The only problem now is the mountain of data we collected. I wish I could say I was joking, but I am slightly worried about being able to store it all on my computer with the size it is. I can make room for it for sure, but oftentimes I save copies of steps along the way, and that will eat up hard drive space quickly, so I may need to resort to storing copies of the data on the hospital server while I’m processing it all. We’ll have to wait and see.

This is one of two tests, meaning I have one more experiment next week, then four more in the coming month(ish?). We’re not sure when those will start, but they will be sooner rather than later, I believe. Meaning I’m going to have to make sense of all the data fast if we ever plan on publishing… anything… ever.

2 responses

  1. Well that’s annoying. I would ask what you needed three different systems for, but maybe that’s something you can’t tell us yet.

    I hope you’re holding up okay, what with being so tired. You’re not even doing your dissertation yet.

    Liked by 1 person

    June 18, 2022 at 7:24 pm

    • Oh well technically we didn’t need three systems, that’s the funny part. We just didn’t have enough channels on any one system to do all the recording we needed so we had to combine our efforts so to speak. If we had ~$75k or so we could’ve bought something, but I’m working on a budget of less than $2k right now so work with what you’ve got I guess.

      UGH DON’T REMIND ME! haha I am not so patiently waiting for the new equipment to arrive (in roughly 3-4 weeks… wahh!!). Once that happens it will be an exhausting couple of weeks to get all the data needed, but I’m hopeful that once I have the dataset the rest I can do at a more reasonable pace. I’m behind schedule on an already tight schedule, but I gave myself roughly 2x more time than needed to process the data so I’m hopeful that I can still make my self-imposed graduation deadline.

      Liked by 1 person

      June 19, 2022 at 11:18 am
