First project deadline!

I didn’t have anything particularly lined up for today to add to the “in statistics” posts I’ve been doing, so I thought it might be better to outline what I’m doing in the statistics course I’m taking. At the very least it may help me get it done, because as usual around here, the project is due… today. Yep, it’s yet another mad dash to the finish. Will I make it? Will I ever figure out what statistics is? Will I learn to stop asking questions in this format? Find out all this and more!
One more time from the top. I’m a third year PhD candidate in neuroengineering. When I’m not making prosthetics, 3D printing a copy of my spine (seriously), or trying to get my degree, I’m here telling my story. I’ve been blogging every day for the past two years so people can get a full picture of what the PhD process (or at least my PhD process) looks like. For the past week or so I’ve been going over stuff I’m learning in my last required class for my degree. It’s a statistics class, because I hate myself. However, I’ve got my first project due today, so instead of devoting a ton of time to trying to simplify the why of statistics, I’m going to talk about the first project a bit.
In short, the first project is awful. I enrolled in this course to learn more about how to do the statistics I need for my research. It’s a good course, don’t get me wrong, but I think the instructor has a different idea about what we should be learning. We’ve covered a lot over the past few months and we’ve been doing it using the “R” software (free to download if you want to torture yourself too!).
If this class were taught in MATLAB I wouldn’t have any problem. I’m pretty good with MATLAB these days and it’s my preferred software for everything I do, because everything I do is built around the software. Funny how that works. Unfortunately, a lot of stats people use R, which is why I am using R. R we getting it yet? (See what I did there?) Anywho, the point is this course has been a lot of learning the frustrating syntax differences between what I do in MATLAB and what I want to do in R.
There are several things that annoy me about this software, from the way the variables are stored, to the way you save them. It’s a whole thing and I’ll stop complaining about the stupid, horrible, no good, very bad, software. The main point here is we’re using this software for the course because it’s the software of choice for people in the statistical field. Okay, so I can buy that logic, I accept that’s what we’re doing and all that, but then we have this project…
The task isn’t hard: we’re supposed to replicate several plots from a manuscript, given the dataset used in the manuscript. That would be nothing if it weren’t for the fact that I need to do this in R. For the past month my downtime has been a crash course in how to use R, with lots of googling how to do things. Most of that time went to figuring out workarounds and hacking solutions together to get the plots looking exactly like the ones in the manuscript. Here are a few interesting highlights from the experience.
The first is legends are a pain. I put off creating my legend until I had all three plots, since it’s one legend to rule them all. I thought this would be smart because how hard could making a legend be? It turns out the answer is very. It was very hard to make the legend the way I needed it. So hard in fact, that I had to go back and reformat the data into a whole different organization to make it all work.
Then I wrote a bit of code that gave me evenly spaced numbers. Finally, something easy, and you know what? It worked… for the first few numbers. I wanted 0 to 1 in increments of 0.2; what it spit out was 0, “”, 0.2, “”, 0.4, “”, “”, “”, 0.8, “”. The “” is an empty entry/column/row/whatever you want to call it. So apparently my copy of R doesn’t think 0.6 and 1 are divisible by 0.2, which was odd, and I had to go the less flexible route of creating the numbers I wanted by hand. I spent nearly 30 minutes on that alone before I gave up and wrote them out manually. I could do it in MATLAB in seconds.
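For anyone hitting the same wall, the likely culprit is floating point rather than a broken copy of R: 0.2 has no exact binary representation, so testing a decimal remainder for exact equality with zero can fail for values like 0.6 and 1. The original code isn’t shown here, so what follows is only a minimal sketch under assumptions: a tick every 0.1 from 0 to 1, labelled at every 0.2. The names idx, positions, labels, labels_tol and the helper is_multiple are made up for illustration.

```r
# Assumption: a tick every 0.1 from 0 to 1, with a label only at every 0.2.

# Option 1: do the "every other one" test on integers, which are exact,
# and only divide at the end.
idx <- 0:10                       # integer index
positions <- idx / 10             # 0.0, 0.1, ..., 1.0
labels <- ifelse(idx %% 2 == 0, sprintf("%.1f", positions), "")

# Option 2: keep the decimal values, but compare with a tolerance
# instead of asking whether the remainder is exactly zero.
# is_multiple() is a hypothetical helper, not a base R function.
is_multiple <- function(x, step, tol = 1e-8) {
  ratio <- x / step
  abs(ratio - round(ratio)) < tol
}
labels_tol <- ifelse(is_multiple(positions, 0.2),
                     sprintf("%.1f", positions), "")
```

Either vector can then be handed to the plotting call, for example axis(1, at = positions, labels = labels) in base graphics. The integer version is the sturdier of the two, since it never compares decimals at all.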
Oh, and my favorite bit of hacking things together: all of this needs to run when you “knit” the file together. Yeah, I can’t use variables in my workspace to create (in this case) my PDF file with all this stuff loaded into it. So the workaround I figured out was to save all the data to the folder I’m working in and knit it together that way. Which is how I discovered that, as far as I could tell, there was no simple built-in command to clear every variable except the ones I wanted, so I ended up downloading a package and using some weird syntax to do it. For those interested or having a similar frustration, the library is gdata and the command is keep(Whatever, You, Want, To, Keep, Here, sure = TRUE). Yeah, you type out that whole command and you still need to confirm that you are sure you want to keep only that data. I don’t know why that makes me so angry, but here we are.
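For the curious, the same cleanup and the save-then-knit workaround can be sketched in base R without an extra package, using rm() with setdiff() and the saveRDS()/readRDS() pair. This is a rough sketch, not the code used for the project: keep_only is a hypothetical helper, plot_data and scratch are made-up objects, and a temporary file stands in for a file in the project folder.

```r
# Hypothetical helper (not part of base R or gdata): remove everything
# from an environment except the objects named in `keep`.
keep_only <- function(keep, envir = globalenv()) {
  drop <- setdiff(ls(envir = envir), keep)
  rm(list = drop, envir = envir)
  invisible(drop)
}

# Demonstrate on a scratch environment rather than the real workspace.
ws <- new.env()
assign("plot_data", data.frame(x = 1:3, y = c(0.2, 0.4, 0.6)), envir = ws)
assign("scratch", 42, envir = ws)
keep_only("plot_data", envir = ws)
remaining <- ls(envir = ws)       # names of whatever survived the cleanup

# Save the object to disk so the knitted document can read it back.
# tempfile() is a stand-in for a path inside the working folder.
path <- tempfile(fileext = ".rds")
saveRDS(get("plot_data", envir = ws), path)
reloaded <- readRDS(path)         # this call would live in a chunk of the .Rmd
```

Called with the default envir, keep_only("plot_data") would clear the actual workspace, which is roughly what gdata’s keep(plot_data, sure = TRUE) does. Knitting runs in a fresh session by default, which is why the data has to come from a file rather than from whatever happens to be loaded.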
The grading for this class is such that this three-part project is worth 40% of my grade, but it’s all basically a test of how good I am at googling R commands. I didn’t want to learn how to make pretty graphs in R; I wanted to learn about why statistics are the way they are and not something completely different (if that makes any sense). So in that respect the class has been somewhat of a letdown. I’ve got a perfect score so far on all the homework and whatnot, but even then I could completely fail the course over the stupid project. That won’t happen, but I hate that so much of my grade depends on my ability to use R.
Anyway, that’s today’s rant. I promise we’ll get back to doing some stats here soon, or maybe I’ll discuss my PhD work, since there have been a few changes since the last time I talked about it.
But enough about us, what about you?