Challenges in data collection

It had to come sooner or later: I needed to finally take a look at some of the data I collected for my dissertation. The problems were many, and the biggest is that I’d never worked with the software we used to collect the data, so there were issues… lots of issues. So today let’s recap the problems I’ve faced, and hopefully the next time around this won’t be so damned painful.

This should be the easy part. Collecting data for my dissertation project should be about as painless as it comes. I’ve done data collection in the past without any issues and I even collected all the data for “last paper” by myself, which is very unusual in the work I do. Normally you have a team of people helping, at least one or two extras to lend a hand when needed. So going into my data collection phase I felt fairly confident that it would be a walk in the park. Turns out that’s not the case.

The problem is the hardware, as usual. Not even the stuff we normally use, but some of the EMG sensors we’re using. It’s a pain because, while I’ve used the sensors themselves in the past (and frankly I currently use them in my work at the hospital), I haven’t used the software that comes with them. There are probably good reasons for that, and one of those reasons may be the problems I had when I finally went to collect my data.

Every so often, the computer would freeze or the software would just straight up stop saving the data. This was problematic because you need to watch the screen to catch the problem, and that screen was on a second laptop about 10 feet away from where I was collecting my EEG data. So we (I) lost data as I went, especially the first time it happened. I’m not even sure how much of the first few blocks of experiments I actually captured and how much is lost to the data gods.

Normally we have one file for EEG and one for EMG. They are very large, very long streams of data. EEG is typically collected in our lab using 64 channels, so that’s 64 different streams of data at 4000 Hz (4000 samples per second) for hours’ worth of recording. I’ll let you do the math there, but the datasets are huge and hard to work with. The same goes for EMG, but luckily I’m only collecting 10 sensors’ worth of data, so roughly 1/6th the size of the EEG data.
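
For anyone who would rather not do that math by hand, here is a rough back-of-the-envelope sketch. The two-hour session length and the 4 bytes per sample are assumptions made purely for illustration, as is treating the EMG as sampled at the same 4000 Hz as the EEG; the real file sizes depend on the recording software’s format.

```python
def stream_size_bytes(channels, sample_rate_hz, hours, bytes_per_sample=4):
    """Uncompressed size of a multichannel recording, in bytes.

    bytes_per_sample=4 is an assumption (e.g. 32-bit samples), not a
    property of any particular EEG or EMG system.
    """
    samples_per_channel = sample_rate_hz * 3600 * hours
    return int(channels * samples_per_channel * bytes_per_sample)


HOURS = 2  # assumed session length, for illustration only

eeg_bytes = stream_size_bytes(channels=64, sample_rate_hz=4000, hours=HOURS)
emg_bytes = stream_size_bytes(channels=10, sample_rate_hz=4000, hours=HOURS)

print(f"EEG: {eeg_bytes / 1e9:.2f} GB")
print(f"EMG: {emg_bytes / 1e9:.2f} GB")
print(f"EMG/EEG ratio: {emg_bytes / eeg_bytes:.3f}")
```

Under those assumptions the EEG stream comes out to roughly 7.4 GB and the EMG to a bit over 1 GB, a ratio of 10/64, which is where the "roughly 1/6th" comes from.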

The two streams are synced using what we call a trigger box, which places a marker in both streams of data. A good practice is to place one between every block (trial) you perform and align the blocks accordingly afterwards, so you don’t lose a sample here or there (which can and does happen). This ensures high fidelity in the time domain, which is important because we’re comparing the brain’s reaction to something the person is doing, so keeping the two aligned is something we take very seriously. If you don’t, your data are garbage and you’ve wasted time.
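
To make the idea concrete, here is a minimal sketch of block-wise alignment on toy data. It assumes both streams share one sample rate and that the trigger positions have already been extracted as sample indices; the function names are hypothetical, and trimming each pair to the shorter block is a simplification for illustration, not a description of any lab’s actual pipeline.

```python
def split_blocks(signal, trigger_indices):
    """Cut a 1-D signal into blocks bounded by consecutive trigger samples."""
    return [signal[start:stop]
            for start, stop in zip(trigger_indices[:-1], trigger_indices[1:])]


def align_blocks(eeg, eeg_triggers, emg, emg_triggers):
    """Pair EEG and EMG blocks trigger by trigger.

    Each pair is trimmed to the shorter of the two blocks, so a dropped
    sample only affects the block it occurred in instead of shifting
    everything that follows.
    """
    if len(eeg_triggers) != len(emg_triggers):
        raise ValueError("streams must contain the same number of triggers")
    pairs = []
    for eeg_block, emg_block in zip(split_blocks(eeg, eeg_triggers),
                                    split_blocks(emg, emg_triggers)):
        n = min(len(eeg_block), len(emg_block))
        pairs.append((eeg_block[:n], emg_block[:n]))
    return pairs


# Toy data: the EMG recording starts 3 samples late and drops one sample
# somewhere in the second block.
eeg = list(range(100))
emg = list(range(200, 296))
eeg_triggers = [10, 40, 70]
emg_triggers = [7, 37, 66]

for i, (eeg_block, emg_block) in enumerate(
        align_blocks(eeg, eeg_triggers, emg, emg_triggers)):
    print(f"block {i}: {len(eeg_block)} aligned samples")
```

Because each block is re-anchored at its own trigger, the lost sample in the second block costs one sample there and nothing more.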

Which means I’ve probably wasted my time.

Instead of having a nice clean two-file setup, one with EEG and one with EMG, I have one EEG file and 23 EMG files, all because the software kept stopping and forcing me to restart the data collection. It was painful, incredibly painful, but I got the data and now I need to figure out how to align spotty EMG data to my EEG data.

Following my own best practices, I put triggers at the start and stop of each block, and I know that for at least the latter part of the experiment I was good about getting full blocks collected before the software stopped. It got to the point where I would stop and start the software again, but it turns out that didn’t do much to resolve whatever problem I was having. So I mostly have blocks’ worth of EMG files. Mostly. Some of the blocks didn’t get completed because the software would freeze mid-trial.

Now here’s the really screwed up part. I have 23 EMG files and no way to tell which order those files go in. The naming isn’t sequential. I had to name each file I saved and, being a professional idiot, I kept naming them variations of “EMG data” instead of, I don’t know, being smart and numbering them.

So my files are grossly out of order, some definitely don’t have an end trigger to sync with, and there are large gaps on the EMG side. It’s a mess and I’m not sure I can make sense of it. However, today we’re going to put our best efforts forward and try to do just that, because I really need to get the data for my dissertation. If I can’t make sense of this data, then I’m back at having zero participants, which is going to be problematic.
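
One thing that may help with the ordering is the filesystem’s last-modified timestamp on each file. What follows is a minimal sketch, assuming those timestamps survived the copy between computers and that each file was last written when its recording stopped (neither is verified here). The function name and the "." folder are placeholders.

```python
from datetime import datetime
from pathlib import Path


def files_by_modified_time(folder, pattern="*"):
    """Return files in `folder` matching `pattern`, oldest modification first."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder holding the EMG files.
    for path in files_by_modified_time("."):
        stamp = datetime.fromtimestamp(path.stat().st_mtime)
        print(f"{stamp:%Y-%m-%d %H:%M:%S}  {path.name}")
```

If the timestamps turn out to be unreliable, the resulting order is only a starting guess that still needs to be checked against the triggers.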

The only good news is I have funding for my project, so we ordered the same EMG system for our lab and specifically for my project. I know for a fact that the order was placed and the company we’re getting our system from has acknowledged it, so now we just have to wait. Will that solve my problems with data collection? Maybe, maybe not.

I still don’t know if I can use the software I use at work to collect the data. I SHOULD be able to, but for whatever reason it didn’t work with the system that the lab we borrowed from had. I’m assuming it’s because their system was very old; that’s the hope. If not, at least we’ll have our own system so I can work out the details and try to troubleshoot the software. I’m also assuming there was a problem with the way the template they loaded for me worked. That’s probably a good assumption, but who knows? I trust the company, and while I’d never used their software before, I’m assuming it works because researchers use it all the time. The only problem I have is figuring out what went wrong.

For now though we’re going to try to make sense of this data and see what I can do with what I have. It may not be much, but it’s a start.

On current events:

A quick aside. Normally I write something about events like what’s been occurring with more frequency (yet again) here in the US. Specifically the fact that a bunch of kids just got killed and the cops did absolutely less than nothing to stop it. Murder sprees that happen because the people in power think everyone should be able to buy military grade weapons without any sort of hurdle make me incredibly angry. I hate that we have to live like this in the US and that the people in power are okay with it because they know they are safe and they don’t care about the rest of us. I’ll probably write on it eventually, but with my mental health the way it is, I can’t do the topic justice right now. Just know that I see it too and I’m angry, in case you were wondering.

This entry was posted on May 26, 2022 by The Lunatic. It was filed under 365 Days of Academia - Year three, Dissertation Work and was tagged with academia, college, data science, dissertation, Education, learning, neurology, PhD, research, school, science, student.

6 responses

writerofminds

This is one of those questions that might be impertinent, but I’ll ask just in case: do your EMG files have “created at” and “modified at” timestamps that you could use to put them in order, even if the filenames don’t provide any clues?

I haven’t been getting involved in the current events discourse either, but your feelings and your position do have my sympathy. We can talk more about it if you do a blog on it.

May 26, 2022 at 12:27 pm

- Michael Faragher
  
  This was my first thought as well. Virtually every file system has a time/date stamp on file creation and modification. Right-clicking on the file in Windows should show the properties.
  
  On a DOS-compatible command line (like Windows), you can use “dir /t:c” (not case sensitive) to list the directory and creation time.
  
  On POSIX (Unix and Linux, possibly Mac), “ls -ltc” (case sensitive) sorts by the time of the last status change rather than creation (because why would life be easy). A better solution, at least with the GNU version of stat found on most Linux systems, is “stat * | grep -E 'File|Birth'”, which will return something like:
  
  File: app.config
  Birth: 2022-05-01 22:47:44.631108100 -0500
  File: app.manifest
  Birth: 2022-04-19 21:09:26.463542000 -0500
  …
  
  for every file in the current directory.
  
  May 26, 2022 at 5:24 pm
  
  - The Lunatic
    
    Oh nice! Thank you. I’m worried because the files were copied from the computer I was using to my personal computer, so the time stamp may have changed. I’m not sure if that’s the case here, but I do recall seeing that happen before. Of course I could be remembering wrong, but we’ll have to see once I dig more into the data.
    
    May 27, 2022 at 5:14 pm
    
- The Lunatic
  
  Yeah, that was my thinking too, but I’m not sure it will be enough. Since the data were copied afterwards, those “created at” and “modified at” times may have changed. I’m going to try to use them and hope for the best, but we’ll have to see. I’m hopeful though!
  
  May 27, 2022 at 5:13 pm
  
  - writerofminds
    
    On Windows at least, it looks like copying changes the “Created” stamp but not the “Last Modified” stamp. So I think it’ll be okay.
    
    May 27, 2022 at 6:09 pm
    
    - The Lunatic
      
      Oh perfect! Well now I’m extra hopeful and it would be nice to be able to use the data!
      
      May 27, 2022 at 6:10 pm
      

Lunatic Laboratories We're a little crazy, about science!