## Fun with RStudio

Okay, not really. Having to use R is a pain. I’m not a fan, and the structure it uses is very confusing to me as someone who uses MATLAB on a regular basis. I understand matrices, and I regularly make and successfully work with higher dimensional matrices (more than three dimensions; it hurts your brain to think about a 20+ dimensional matrix, but hey, whatever gets the job done). R, on the other hand, feels foreign and the commands feel clunky.

Quick recap! I’m a third year PhD candidate in neuroengineering and I want to end paralysis. Okay, that probably won’t happen, but I want to significantly advance the field to enable new treatments for paralysis and restore function. My BS and MS are in mechanical engineering and I was one class shy of a minor in biology (long story). This blog is my daily journey to my PhD and this term is my final class! No, I’m still not close to graduating, but I am finishing the last of my required courses so I can switch to research full time. Ha, joke’s on me, I already do research full time! This final class is brutal though: it’s a statistics course. I believe the actual name of the course is statistical methods for research or something along those lines, and one of the more frustrating parts of the class is needing to use R to do the assignments. I hate R.

If you’ve never programmed before, learning a new programming language is a lot like learning a new spoken language. Some things feel familiar, like organization and structure; other things feel completely foreign, like the words themselves. Sure, there is overlap and sometimes my MATLAB and R commands are eerily similar, but for the most part it feels like I’m trying to fly a plane when I only know how to drive a car. There’s been a lot of googling for solutions to odd problems I’ve backed myself into, so today I figured I would introduce one of my latest issues and how I worked around it.

As I mentioned, I’m used to working with higher dimensional matrices. They make life easier because I can apply functions along certain dimensions, like taking the mean across one dimension of a four dimensional matrix, which returns a three dimensional matrix. The first time I did it I was shocked it worked. It still hurts thinking about taking the mean across anything higher than three dimensions. I mean, we live a three dimensional life, we literally cannot think in four or more dimensions. That being said, I’ve come to a somewhat tense understanding of how to use MATLAB to apply functions across higher dimensions.
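For comparison, here is a minimal sketch of the same "mean across one dimension" idea in base R. The array and its values are made up for illustration, and this is not what I ended up using for the assignment; it only shows that R's `array` and `apply` can do the dimension trick too.

```r
# A 2 x 3 x 2 x 2 array filled with 1..24 (made-up values).
arr <- array(1:24, dim = c(2, 3, 2, 2))

# Keep dimensions 1 to 3 and average over the 4th,
# so a four dimensional array goes in and a three dimensional one comes out.
mean_over_4th <- apply(arr, c(1, 2, 3), mean)

dim(mean_over_4th)  # 2 3 2
```

The second argument to `apply` lists the dimensions to keep, which is the reverse of how I tend to think about it in MATLAB, where I name the dimension to collapse.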

So when my latest assignment was applying the same functions over columns of data in different two dimensional matrices, or rather data frames (that’s a variable structure in R), my brain wanted to make a 3D matrix and then get a nice tidy output of a two dimensional matrix with the values from my operations. Simple, right? Well, I tried combining my data frames and ended up getting something I had never seen before. RStudio called it a “large list”, so instead of making a 3D data frame I ended up with a list of data frames, which I guess is R’s equivalent of a MATLAB cell array?

A cell array lets me put whatever data I want in each cell. It doesn’t have to be the same dimensions or length, and it doesn’t even have to be numbers; it could be strings of words. I use them a lot to keep track of data when I save it for later because I can add a little note reminding me what data I have in each cell. The problem is I know how to work with a cell array in MATLAB, but I had no clue how to work with a “large list” of data frames in RStudio.
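To make the cell array comparison concrete, here is a small sketch of an R list holding data frames of different sizes plus a text note. The data frame names and values are hypothetical, chosen only for illustration.

```r
# Two small data frames with made-up values (illustrative only).
df_a <- data.frame(Year = 2001:2003, Var1 = c(1, 2, 3))
df_b <- data.frame(Year = 2001:2002, Var1 = c(10, 20))

# A list can hold elements of different sizes and types, much like a cell array.
InputList <- list(first = df_a, second = df_b, note = "second has only two years")

# Double brackets pull out one element; single brackets return a shorter list.
second_df <- InputList[["second"]]
sub_list  <- InputList["second"]

class(second_df)  # "data.frame"
class(sub_list)   # "list"
```

The double versus single bracket difference is roughly the curly brace versus parenthesis difference for MATLAB cell arrays, which is the part that tripped me up.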

Thankfully Google came to my rescue and I found the commands I needed after a whole day of hunting down bits of code. If you find yourself working with a list of data frames (I’m so sorry), I’ll put the commands here just for you! This was frustrating, and I’m sure I am not the only one to back himself into a corner trying to work with R for the first time. The main command I used was:

Output <- lapply(InputList, function(x) { mutate(x, …)})

The function lapply is apparently “list apply”, and it lets me run my own function on every element of the list. It’s not as nice as what I would’ve come up with in MATLAB, but it works with some caveats. First, the second input has to be a function. The function(x) part is not the name of a function; it’s how we create a new, unnamed function on the spot. The input to that function is x, and lapply fills in x for us with one element of InputList at a time, so inside the function x is a single data frame and not the whole list. For my function I am using mutate(x, …), which comes from the dplyr package, and the … is where I add the commands I want. mutate lets me create, modify, or delete columns. In my case I was multiplying a few columns together, so I used mutate to create a new set of columns holding the products of the columns I wanted.
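Here is a minimal base R sketch of what lapply does with that function(x) piece. The list and its contents are made up for illustration; the point is only that x is one data frame per call.

```r
# A hypothetical list of two data frames with different numbers of rows.
InputList <- list(
  a = data.frame(Var1 = 1:3, Var2 = 4:6),
  b = data.frame(Var1 = 1:5, Var2 = 6:10)
)

# function(x) defines a function with no name; lapply calls it once per element,
# so x is one data frame at a time, never the whole list.
row_counts <- lapply(InputList, function(x) nrow(x))

# The same call with a named function, to show the two forms are equivalent.
count_rows <- function(x) nrow(x)
row_counts_named <- lapply(InputList, count_rows)
```

Both calls return a list with one entry per data frame (3 for `a`, 5 for `b`), which is why the output of lapply is always another list.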

For example say all my data frames in my list of data frames had the same column names and those column names were:

Year Var1 Var2 Var3 Var4

I could use the lapply to apply different functions across all the data frames in my list, but for my case I wanted to multiply columns together and have them appended to the end of the columns such that each of my data frames in my list would look like this:

Year Var1 Var2 Var3 Var4 NewVar1 NewVar2

And let’s say NewVar1 = Var1*Var2 and NewVar2 = Var2*Var3, well to use lapply to do that I would just say:

Output <- lapply(InputList, function(x) { mutate(x, NewVar1 = Var1*Var2, NewVar2 = Var2*Var3)})
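If you want to try this step on toy data, here is a runnable sketch. Everything in it is an assumption for illustration: the helper `make_df`, the list names `site_a` and `site_b`, and the numbers are all made up. It also swaps in base R's `transform()` for `mutate()` so it runs without installing anything; with dplyr loaded, the `mutate()` call above takes the same arguments.

```r
# Hypothetical helper that builds a data frame with the column names from the post.
make_df <- function(offset) {
  data.frame(
    Year = 2001:2003,
    Var1 = 1:3 + offset,
    Var2 = c(2, 2, 2),
    Var3 = c(1, 0, 1),
    Var4 = c(5, 5, 5)
  )
}
InputList <- list(site_a = make_df(0), site_b = make_df(10))

# transform() is the base R stand-in for dplyr::mutate() in this sketch.
# It appends NewVar1 and NewVar2 to the end of each data frame.
Output <- lapply(InputList, function(x) {
  transform(x, NewVar1 = Var1 * Var2, NewVar2 = Var2 * Var3)
})

names(Output$site_a)
```

The last line lists the column names of the first data frame, which is a quick way to check that the two new columns landed at the end.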

Then my output “Output” would be a new list of data frames with the two new variables added to the end. Of course, then I wanted to pull those out, apply a second function to them, and get rid of the stupid list. I just wanted a data frame with the result, so first I had to get rid of Year through Var4, which I did like so:

NewOutput <- lapply(Output, "[", -c(1:5))

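This removes columns one through five from each data frame. Here "[" is R’s subsetting operator being used as a function, and -c(1:5) is the extra argument it gets, so each data frame x turns into x[-c(1:5)]. Note the quotes need to be plain straight quotes, not the curly ones a word processor likes to insert. You could remove the first x columns by using -c(1:x), or remove any middle columns from column x to column y using -c(x:y). In my case I wanted to remove Year through Var4, which are the first through fifth columns, so the command above did that just fine. As before, lapply applies this across all the data frames in my list, so I am left with every data frame in the list having just NewVar1 and NewVar2.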
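A small sketch of the column dropping step, using a hypothetical Output list with made-up numbers. It also shows a by-name version, which is my own addition and not something from the original solution, for cases where counting column positions feels fragile.

```r
# Hypothetical list shaped like Output from the previous step (values made up).
Output <- list(
  site_a = data.frame(Year = 2001:2002, Var1 = 1:2, Var2 = 3:4, Var3 = 5:6,
                      Var4 = 7:8, NewVar1 = c(3, 8), NewVar2 = c(15, 24)),
  site_b = data.frame(Year = 2001:2002, Var1 = 2:3, Var2 = 4:5, Var3 = 6:7,
                      Var4 = 8:9, NewVar1 = c(8, 15), NewVar2 = c(24, 35))
)

# "[" is the subsetting operator used as a function; -c(1:5) drops columns 1 to 5.
NewOutput <- lapply(Output, "[", -c(1:5))

# Alternative sketch: keep columns by name instead of dropping by position.
keep <- c("NewVar1", "NewVar2")
NewOutputByName <- lapply(Output, function(x) x[keep])
```

Both versions leave each data frame with only NewVar1 and NewVar2; the by-name one keeps working if the number of original columns changes.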

Lastly, I wanted to sum my columns and then smash the list back into a data frame, since working with a list was turning into a huge pain and I hate it so, so, very much! To do that I used lapply one more time to sum my columns. You could sum rows too, but since my columns had the data I wanted I went that route.

FinallyaDataFrame <- data.frame(lapply(NewOutput , function(x) colSums(x)))

For summing the rows, it’s literally rowSums(x). The data.frame on the outside of the lapply call takes the list and turns it back into a data frame, with each element of the list becoming one column. So column 1 holds the column sums of my first data frame and column n holds the column sums of my nth data frame, with one row for NewVar1 and one row for NewVar2. If I had summed over the rows instead, column n would still be the nth data frame, just with one entry per original row, which only works if every data frame has the same number of rows.
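Here is a sketch of the summing step on a made-up NewOutput list, so you can check which way the result is laid out. The `do.call(rbind, ...)` line is an extra I am adding for anyone who would rather have one row per data frame; it is not part of the solution I used.

```r
# Hypothetical list shaped like NewOutput (values made up).
NewOutput <- list(
  site_a = data.frame(NewVar1 = c(1, 2, 3), NewVar2 = c(10, 20, 30)),
  site_b = data.frame(NewVar1 = c(4, 5, 6), NewVar2 = c(40, 50, 60))
)

# One named vector of column sums per data frame.
sums <- lapply(NewOutput, colSums)

# data.frame() turns each list element into a column: one column per data frame.
by_column <- data.frame(sums)

# Alternative sketch: stack the vectors as rows, giving one row per data frame.
by_row <- as.data.frame(do.call(rbind, sums))
```

Passing `colSums` directly works the same as writing `function(x) colSums(x)`; the wrapper is only needed when you want to add extra steps.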

There is also sapply, which works like lapply but tries to simplify the resulting list into a vector or a matrix (not a data frame) when it can. I didn’t use it, and by the time I found out it even existed I had already found the solution I used. If it isn’t broke, don’t fix it! I’m most likely never using R again after this class, so I’m not trying to be super proficient, I’m just trying to get the work done as best as I can.
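For the curious, a minimal sketch of what sapply would have returned on the same kind of list. The data are made up and I did not use this in my assignment, so treat it as an illustration only.

```r
# Hypothetical list shaped like NewOutput (values made up).
NewOutput <- list(
  site_a = data.frame(NewVar1 = c(1, 2, 3), NewVar2 = c(10, 20, 30)),
  site_b = data.frame(NewVar1 = c(4, 5, 6), NewVar2 = c(40, 50, 60))
)

# sapply simplifies the per-data-frame column sums into a matrix:
# one column per list element, one row per summed variable.
m <- sapply(NewOutput, colSums)

# Transposing gives one row per data frame; as.data.frame converts the matrix.
df <- as.data.frame(t(m))
```

So sapply saves the outer data.frame() call, at the cost of getting a matrix that you convert yourself if you really need a data frame.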

So there you have it, I found a way out of the corner I backed myself into. Is this a very niche situation? Maybe, but like I said, I spent a whole day trying to find a solution to my problems, and hopefully my blog is more visible than the bits and pieces of the solution I had to force together to get this to work. If this helps someone, then I am happy to have committed it to the pages of my PhD journey.

Until next time!
