The art of coding
Well, I did it. I got all the code written for the next two (of three) steps for processing my data. I can’t do the third until the second step is done for at least one dataset, but I’m almost there, and the last bit of code is to visualize the data, which is substantially easier. So let’s talk about what I did and why I won’t have to do it again.
Let’s just get this out of the way now: code should be modular. You write code in such a way that it is versatile and doesn’t rely on particular properties of the input dataset. Don’t worry, I’ll explain that further. In the end, if you write some code for one dataset and later need to run the same process on another, the code should cover that dataset too.
I have a dataset that has eight repeated trials and eight conditions. For my second piece of code, I could have written it quickly and just used a counter to step through my repeated trials without “knowing” which condition I am processing. I could just separate them by order: trials 1-8 are condition 1, 9-16 are condition 2, and so on. That would make my code very rigid, though. What if I want to reuse the code and have 9 repeated trials next time, or 7? Then I would need to change every instance where I relied on this property of the first dataset, which takes time, and I could make a mistake somewhere without knowing it.
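To make the pitfall concrete, here’s a minimal sketch of that rigid approach. My actual scripts are in MATLAB; this Python version and all its names are invented purely for illustration:

```python
# Hypothetical sketch of the rigid approach (not my real MATLAB code).
# The trial count per condition is baked in as a constant, so the split
# silently produces the wrong groups if that count ever changes.
TRIALS_PER_CONDITION = 8  # assumption hardcoded into the script

def split_rigid(trials):
    """Split a flat list of trials into conditions by position alone."""
    return [trials[i:i + TRIALS_PER_CONDITION]
            for i in range(0, len(trials), TRIALS_PER_CONDITION)]

# Works for the original dataset: 16 trials -> 2 conditions of 8 each.
conditions = split_rigid(list(range(16)))

# But feed it a dataset with 9 trials per condition (18 trials, still
# 2 conditions) and nothing errors -- it just returns 3 mislabeled bins.
wrong = split_rigid(list(range(18)))
```

Notice that the failure is silent: the code runs fine, the bins are simply wrong, which is exactly the kind of mistake you don’t catch until much later.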
To avoid this, I wrote code that doesn’t use that property. In my full unprocessed dataset I have certain “flags” to label certain data. Let’s say one of those flags is “rest” and one is “move.” It doesn’t matter what “move” means; let’s just say those are the two conditions I am working with. The first code I wrote pulls out the “chunks” I’m interested in, my second code processes the data, and the third (unwritten) code will visualize that data (make a pretty plot). In my first code I also pulled out those labels, so now each condition has its label attached. This gives me a way to keep track of which set is which instead of relying on trial order alone.
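As a rough sketch of what that first step does, shown in Python for illustration rather than my actual MATLAB: only the “rest” and “move” flags come from the real setup; the function and everything else here are made up.

```python
# Hypothetical sketch of chunk extraction. flags[i] is the condition
# label for samples[i], or None for data we don't care about.
def extract_chunks(samples, flags):
    """Pull out contiguous labeled chunks, keeping the label attached."""
    chunks = []
    current_label, current = None, []
    for sample, flag in zip(samples, flags):
        if flag != current_label:
            # A label change closes out the previous chunk, if any.
            if current_label is not None:
                chunks.append((current_label, current))
            current_label, current = flag, []
        if flag is not None:
            current.append(sample)
    if current_label is not None:
        chunks.append((current_label, current))
    return chunks

# Each chunk comes out with its condition label attached, so later
# steps never have to guess from position.
chunks = extract_chunks([1, 2, 3, 4, 5],
                        ["rest", "rest", None, "move", "move"])
```

The point is that the label travels with the data from here on, which is what frees the later steps from counting trials.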
Since I thought out the steps before I wrote my code, the second script looks for these labels and uses them to separate my data into the correct bins once they are processed, so all the repeated trials with the same condition get grouped together no matter how many trials I have. Now if I ever have to reuse this code I can just hit run without needing to double-check that it’s correct. I know it’s correct because I’ve written it in a way that makes it easy to check, since everything is labeled.
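The label-based binning in that second step can be sketched the same way (again in Python for illustration, with invented names; the “rest” and “move” labels are the only part taken from the real setup):

```python
# Hypothetical sketch of binning by label instead of by position.
from collections import defaultdict

def group_by_label(labeled_trials):
    """Group (label, data) pairs into bins keyed by condition label,
    regardless of how many repeated trials each condition has."""
    bins = defaultdict(list)
    for label, data in labeled_trials:
        bins[label].append(data)
    return dict(bins)

# Five trials, eight, or nine -- the grouping logic never changes,
# because it reads the label rather than assuming a trial count.
grouped = group_by_label([("rest", 1.0), ("move", 2.0),
                          ("rest", 1.1), ("move", 2.1),
                          ("rest", 0.9)])
```

Here `grouped["rest"]` ends up with three trials and `grouped["move"]` with two, and nothing would need to change if next time there were eight of each.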
There’s a term for code that isn’t reusable or is poorly written: technical debt. If I write bad code and have to reuse it, I need to take the time to fix it first. It takes twice as long to edit bad code because you don’t remember exactly how you wrote it and what it does. The more bad code you have that only works with one dataset, the more technical debt you accumulate.
That isn’t to say your coding doesn’t improve as time goes on. When I started writing MATLAB scripts they were awful, and now they are… slightly less awful. Yeah, they aren’t great (in my opinion anyway), but I took the time to write them as best as I could. Sure, it takes a bit of time upfront, but it means that when I need to reuse them, I won’t have to rewrite them. They will do what I need them to do with little or no modification.
Needless to say, despite the lack of motivation I’ve had, I got most of the work done that I needed to do. I don’t think I’ll ever really use this analysis again; it was a one-off for the experiment I had to run. However, if I ever need to do it again, I’ve got code to do it. If others need the code, I can give them a good foundation to build on. Just something to keep in mind next time you need to write a piece of code.