Intro to MATLAB – Part 3
To debug or not to debug. Just kidding, you’re always going to need to debug. My class has officially ended, so this weekend we’re posting the last two parts of the four-part Intro to MATLAB series. In this lecture I taught my class how to use MATLAB’s debugger to track down the problems they run into and make sense of the errors they get. Unfortunately this means there is no code associated with this class, but we can still go into detail. The best part about being able to debug is that it makes you look like a coding god, so it’s a skill worth learning.
I’m honestly surprised that using the debugger isn’t normally taught in programming classes. At least in my experience and the experience of the students I worked with, none of them had learned how to debug in any of their programming classes, and every one of them had taken at least one. Debugging is easy once you learn how to do it; so easy, in fact, that we covered it in just a few hours, and by the end the students were able to do it on their own.
The example I gave in class used a bit of code I had trouble with myself. It was a good example because I didn’t write the function I was trying to use, the documentation was frustratingly unclear (sorry if the person who wrote this code sees this, but it’s true), and I couldn’t for the life of me figure it out. The function threw error after error until I figured out what the problem was.
The story goes something like this: raincloud plots are a new(er) type of plot that shows the raw data as a scatter plot with a half violin plot next to it. The end result is a violin “cloud” with raw data point “rain drops.” Below is an example made with sample data (not mine; it’s an example from the creator of the function I was demoing).
To me, this is an ugly plot. We all have our aesthetic preferences, and in our lab we have very specific ways we like our data presented; this is not it. I’ve since heavily edited their plotting code so it accepts extra inputs for things like dot size and the mean line. Editing the code was the second part of this class (you’ll see why shortly).
The first part was figuring out how to input data into this function. The hint in the comments about how to feed the data into the function was this (taken directly from the code):
Where ‘data’ is an M x N cell array of N data series and M measurements
And ‘colours’ is an N x 3 array defining the colour to plot each data series
For anyone asking, this is British code, hence the British spelling “colour.” Anyway, long story short, I knew that my data needed to be in a cell array, but the series and measurements were confusing. I couldn’t tell whether each data point needed its own cell, or which way the cells should be arranged. From the description I correctly guessed that the rows were measurements and the columns were data “series,” a new term for me, but it turns out that is your number of subjects.
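For anyone in the same boat, here’s a minimal sketch of the reshaping a helper like my RaincloudPlotter has to do. It isn’t my actual RaincloudPlotter code: it assumes your data starts as a numeric matrix with one column per data series and only one measurement (so the result has a single row), and the function name matrix_to_raincloud_cells is made up for this example. Save it as matrix_to_raincloud_cells.m.

```matlab
function data = matrix_to_raincloud_cells(values)
% MATRIX_TO_RAINCLOUD_CELLS  Hypothetical helper (not part of rm_raincloud).
% Turns a numeric matrix of samples x series into a 1 x N cell array:
% one row (a single measurement) and one column per data series, where
% each cell holds that series' samples as a column vector.
    n_series = size(values, 2);          % columns = data series
    data = cell(1, n_series);            % preallocate the 1 x N cell array
    for j = 1:n_series
        data{1, j} = values(:, j);       % every sample for series j
    end
end
```

The built-in num2cell(values, 1) produces the same 1 x N cell array in one line. Either way, the colours input then needs one row per column of data, for example colours = lines(size(data, 2)); which gives an N x 3 array of default plot colours. If you have more measurements, each one would be another row of cells.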
Yet every single time I tried to run this code it returned the same frustrating error.
RaincloudPlotter is the name of the code I wrote to format my data (which was in humble matrix form) into the cell array this function needs. It works now, but for a few hours I was furiously banging my head against my keyboard trying to figure out why it wasn’t working. Since I couldn’t just wing it, I had to go through the debugging process. First, let’s talk about where my search took me and what these error messages mean.
When MATLAB kicks out an error like this, the top entry is where the error actually happened: the last thing MATLAB was running before it quit. Each entry below it is the code that called the one above it. This means that if I write some code that calls a function, and that function throws an error, MATLAB will show me the error from that function first, then the line in my code that called that function. This can get strung along so you may see 10 entries; the top is where it failed and the bottom is whatever you ran to kick things off. The stuff in between is the “hidden” chain of function calls that got you from your code to the function that failed.
In my case there were only two entries, so this was as simple as it gets. The first (top) is the issue inside the raincloud code itself; the second is just the line in my code that calls this function. MATLAB formats its errors in a way that’s important to understand. It will say something like the example above, which I’m writing out here for anyone using a screen reader:
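If you want to see this ordering for yourself, here’s a small sketch (the function names are made up for the demo; save it as error_stack_demo.m). The innermost function throws an error, we catch it, and we print the stack stored in the MException object. The first entry is the function that actually failed, and the entries after it are its callers, in the same top-to-bottom order MATLAB uses when it prints an error. Depending on your MATLAB version, local function names may show up with the file name in front (like error_stack_demo>inner_step).

```matlab
function names = error_stack_demo()
% ERROR_STACK_DEMO  Hypothetical demo: which function shows up first in an error?
    names = {};
    try
        outer_step();
    catch ME
        % ME.stack lists where the error happened first, then each caller.
        names = {ME.stack.name};
        fprintf('%s\n', ME.message);
        for k = 1:numel(ME.stack)
            fprintf('  %d. %s (line %d)\n', k, ME.stack(k).name, ME.stack(k).line);
        end
    end
end

function outer_step()
    inner_step();   % the calling line: listed below inner_step in the stack
end

function inner_step()
    error('demo:innerFailure', 'something went wrong in inner_step');
end
```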
Error using rm_raincloud (line 25)
number of colours does not match number of plot series
This tells me that the function rm_raincloud (the function I downloaded) stopped because of something on line 25. The error also includes a second line, “number of colours does not match number of plot series,” which is a custom error message the people who wrote the code included to help users figure out what went wrong. The problem was that I had no clue why that error was happening! In hindsight the issue is clear, but at the time I had no idea what I was doing wrong. The short answer: I fed my data into the function in the wrong orientation, because that’s how I normally arrange my data.
Anyway, back to our class discussion. The handy thing about MATLAB is that if my code calls a function (and MATLAB can find that function to run it), I can right-click the function name in my code and open that function directly without having to search for it, like so:
To get the pop-up, right-click the function name and select Open “rm_raincloud” (or whatever the name of the function you’re trying to debug is). According to MATLAB the shortcut for this is Ctrl+D, but I just right-click and open since it seems faster/easier/less work. Once we have the rm_raincloud function open, we simply scroll down to the line our error was on, which was line 25.
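If you’d rather type than click, the built-in which and edit functions get you to the same place. This is a small sketch that assumes rm_raincloud.m is the file you want and that it’s on your MATLAB path; swap in whatever function you’re debugging.

```matlab
fn = 'rm_raincloud';        % the function you want to open (assumed on the path)
location = which(fn);       % full path MATLAB would run, or '' if it can't find it
if isempty(location)
    fprintf('%s is not on the MATLAB path.\n', fn);
else
    fprintf('Found %s at %s\n', fn, location);
    edit(fn);               % open the file in the Editor
end
```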
Notice that line 25 will return that custom error message “number of colours does not match number of plot series” if size(colours) does not equal [n_series 3]. For those who don’t recall, in lecture 2 I talked at length about how to get the dimensions of our variables using either the length or size functions, and why length is not your friend. It’s almost like I planned this class…
So size returns both dimensions of our 2D matrix, and the code compares all of them at once (hence the all(size(… in the code): the size of our colours variable is checked against this mysterious n_series variable for the rows and 3 for the columns. I already know our colours variable should be N x 3, so the 3 isn’t so mysterious. They are hard coding the 3 to make sure each colour has exactly three columns (one per RGB value), which is smart. I probably would’ve checked only the number of rows and not the number of columns (remember it’s rows x columns, so the 3 means 3 columns), and then a user could pass in a colour input with something other than 3 columns.
From this information alone I knew that something was off between the number of colours I was giving the code and this n_series number, but I wasn’t sure what the code “saw” that gave me this error, since at the time I thought everything I was inputting was the right size. Nothing I tried fixed it, so I had to go to step 2 of the debugging process.
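Here’s a quick sketch of what that check does, using made-up values for two series. It also shows why checking only the rows isn’t enough: a colour array with a stray fourth column passes a rows-only check but fails the full size check.

```matlab
% Made-up example: two data series, so we expect two RGB rows.
n_series = 2;
colours = [0.00 0.45 0.74;      % series 1
           0.85 0.33 0.10];     % series 2

sz = size(colours);                  % [rows columns] -> [2 3]
matches = (sz == [n_series 3]);      % element-wise comparison -> [true true]
ok = all(matches);                   % true only if BOTH dimensions match

% A colour array with a stray 4th column (2 x 4)...
bad_colours = [colours, [1; 1]];
rows_only_ok = size(bad_colours, 1) == n_series;     % true: the rows match
full_ok = all(size(bad_colours) == [n_series 3]);    % false: 4 columns, not 3
```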
When we run a function, a new workspace is created. Remember, our workspace holds all the variables we’re using. If every function I ran left the variables it creates in my workspace, it would turn into a huge mess really fast. So MATLAB creates a workspace specifically for the function and destroys it when the function finishes. Thus, the n_series variable only exists while this function is running. At this point you may think all hope is lost and there is no way to debug this, but fear not! We can actually break into this workspace by pausing the code. There are a few ways to do this, but this is my method.
First I open the drop-down menu under the Run button in the Editor (where you write your code) and turn on Pause on Errors. That step is optional, because the next step should make it redundant, but I do it because sometimes the next step won’t go as planned.
The second step is to add a stop sign to our code. Okay, MATLAB actually calls it a breakpoint, but it’s a red dot that stops our code, so I associate it with a stop sign. In my rm_raincloud function I am going to add a pause before the code gets to that error check (line 25). To do that I just click the – on the left side of the Editor where the line numbers are listed, and it adds a red dot (shown below). You can only do this on lines that have actual code on them: comments are never executed, so by definition the program could never pause there. You’ll notice that between lines 22 and 25 there is no code, just comments, so there are no – to add a stop (technically it’s a pause, but work with my brain).
Now we have a fancy little red dot on line 22 that tells MATLAB to pause once it reaches that line. This works most of the time, but on occasion you’ll put a dot somewhere that never runs or gets skipped over (inside an if statement, maybe). That’s one reason we also turn on Pause on Errors: it catches that case. The other is that when the code is paused I can “step” through it line by line, and if I accidentally run the error line, Pause on Errors (mostly) keeps me from getting kicked out of the function workspace.
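If you prefer the Command Window, both toggles have command equivalents: dbstop if error matches Pause on Errors, and dbstop in rm_raincloud at 22 places the red dot. This sketch assumes rm_raincloud.m is on your path and that line 22 is the line you want to stop on, as in my example; the exist check just skips that breakpoint if the file isn’t there.

```matlab
% Same as ticking "Pause on Errors" under the Run drop-down:
% MATLAB pauses at the line that throws an error.
dbstop if error

% Same as clicking the dash next to line 22: a breakpoint (the red dot).
% Assumes rm_raincloud.m is on the path and line 22 contains code.
if exist('rm_raincloud', 'file') == 2
    dbstop in rm_raincloud at 22
end

% List the breakpoints and error conditions that are currently set.
dbstatus

% ...now run the code that calls rm_raincloud; it should pause at line 22...

% When you're finished, remove all breakpoints, including "if error".
dbclear all
```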
So first, let’s take a look at my workspace now to remind everyone what that looks like and to show you I have a lot of “stuff” in my workspace at the moment.
This is my “base” workspace, or the bottom workspace (the one I work out of). It exists for as long as I have MATLAB open, and it’s the one we see all the time. I have several variables in this workspace, and notice the panel at the top is simply called Workspace. I point this out because if I run the code I have now (with my pause), the workspace will change and become this:
Notice the name of our workspace changed to Workspace – rm_raincloud. That’s because we’re now in the rm_raincloud workspace and not our base workspace. Also notice that the only variables that exist are the variables the function takes as inputs or variables it has created.
Here is what the function looks like: we have a red dot and then a green arrow. I “stepped” it to the next line because a breakpoint pauses before running its own line, and I wanted n_series to exist so I could show it. I gave this function only the data, colours, and plot_top_to_bottom inputs, not the optional density_type or bandwidth inputs, so in the workspace image above you see only those variables. Since I stepped my code to the next line, we also have an n_plots_per_series variable and an n_series variable.
Before we continue, though, I should explain how to “step” your code. I use the term step because that’s the name of the button you press. In MATLAB, when your code is paused, the Editor gives you the debugger options as seen below. This is the same window that has my code (technically MATLAB opened my function and this is that code window). Several of the buttons have changed, but it looks pretty similar to the buttons shown on the step where we selected Pause on Errors, 5 images above.
Continue runs the code until it hits another pause, an error occurs, OR the code finishes successfully. The Step button is next to that; Step runs the current line and pauses on the next one. Next to Step are the Step In, Step Out, and Run to Cursor buttons. Step In moves into a function that the current line calls, Step Out runs the rest of the current function and pauses back in the code that called it, and Run to Cursor does just that, which is useful if you want to run to a certain point. I hardly use those three, though; Step and Continue are my main buttons.
Another note: you can add more pauses. You could pause at every line if you wanted (which would be annoying, but however you want to spend a slow weekend, I’m not here to judge). If I hit Continue, it will run until the next pause. So oftentimes, instead of using Run to Cursor, I simply add a pause further down in my code and continue to it. I don’t have specific examples of when you would do this, but you’ll know when you’re debugging because it will make sense to do that.
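To see that a function workspace only holds its own inputs and whatever it creates, here’s a small sketch. The function name function_workspace_demo is made up; its input names just mirror the rm_raincloud inputs I mentioned, and I’m assuming n_plots_per_series and n_series come from a single size(data) call, which fits the header comment describing data as an M x N cell array. Calling who inside the function lists the variables in that function’s workspace, and inputs you don’t pass (like density_type and bandwidth here) don’t exist at all.

```matlab
function names = function_workspace_demo(data, colours, plot_top_to_bottom, density_type, bandwidth)
% FUNCTION_WORKSPACE_DEMO  Hypothetical demo, not part of rm_raincloud.
% Returns the names of the variables that exist in THIS function's workspace.
    % Variables created inside the function live only in this workspace.
    [n_plots_per_series, n_series] = size(data);
    % who lists the variables in the current workspace. Inputs that were not
    % passed (e.g. density_type, bandwidth) are not variables here at all.
    names = who;
end
```

Calling it with three inputs, for example names = function_workspace_demo({[1; 2], [3; 4]}, lines(2), true), returns data, colours, plot_top_to_bottom, n_plots_per_series, and n_series, and none of the variables sitting in your base workspace.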
Last thing I want to point out before we continue. ABANDON ALL HOPE YE WHO ENTER HERE! You can, if you feel so inclined, edit the variables in the function workspace. There are two catches to doing this. The first is that it will not change the base workspace variable (if one exists there), because the variable got copied into your function workspace; even if the name is the same, the variable is different. The second is that doing so could cause serious issues with whatever you’re working on, and if you find that editing a variable fixes your issue, it’s hard to implement the fix after the workspace is destroyed. I’ve only come across one instance where this helped me, just one, and I don’t recommend doing it. In fact, I’m only telling you this to warn you never to do it. It is a horrible idea and will only end in ruin. So figure out why your code isn’t working and change the function/input/base variable/etc. rather than the function workspace. You can (and should) edit the code itself while debugging, and those edits will stick; the function workspace is off limits, though. BE WARNED!!!!!!!!!!!
Okay, now that we’ve thoroughly covered the pitfalls, let’s get back to the issue at hand. I need this line to be true:
assert(all(size(colours) == [n_series 3]), 'number of colours does not match number of plot series')
Now that I’m in my function workspace, I can check whether this holds true, or at least see what is going on. In the function workspace I can see that my n_series variable equals 1 (from the function workspace image above).
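Here’s a small sketch of the first catch (the function names are made up for the demo; save it as copy_semantics_demo.m). The helper changes its colours input, but the caller’s colours variable is untouched, because the function only ever had its own copy. This is how ordinary arrays and cell arrays behave; handle objects such as figures and axes are the exception, since a copy of a handle still points at the same object.

```matlab
function [caller_colours, inside_colours] = copy_semantics_demo()
% COPY_SEMANTICS_DEMO  Hypothetical demo of function workspaces holding copies.
    colours = [1 0 0; 0 0 1];             % the caller's variable
    inside_colours = recolour(colours);   % the helper edits its own copy
    caller_colours = colours;             % still [1 0 0; 0 0 1]
end

function colours = recolour(colours)
% Same variable name, different workspace: this only changes the local copy.
    colours(1, :) = [0 1 0];
end
```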
Now I see the problem: I have 2 colours and only one series, so either my colours array is too long or I’m not organizing my data correctly and my n_series is wrong. As I mentioned before, using this information I correctly deduced that the way I was organizing my data was not correct. Each cell should contain the samples for that subject and condition (row x column, or technically dependent and independent variable). Now that I had that information, I could (and did) fix my code!
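Here’s a standalone sketch that reproduces my mistake and the fix without calling rm_raincloud. The sample values and colours are made up, and I’m assuming, based on the header comment that data is an M x N cell array of N data series, that the function takes n_series from the number of columns of data. Stacking the two series down the rows gives n_series = 1, so the size check fails; transposing the cell array puts one series per column and the check passes.

```matlab
% Two made-up series (e.g. two subjects), each with a few samples.
series_a = [1.2; 1.5; 1.1];
series_b = [2.3; 2.0; 2.6];
colours = [0.00 0.45 0.74;     % series 1
           0.85 0.33 0.10];    % series 2, so colours is 2 x 3

% My mistake: series stacked down the rows, giving a 2 x 1 cell array.
data_wrong = {series_a; series_b};
[~, n_series_wrong] = size(data_wrong);              % columns = 1
ok_wrong = all(size(colours) == [n_series_wrong 3]); % [2 3] vs [1 3] -> false

% The fix: one column per series, giving a 1 x 2 cell array.
data_right = data_wrong.';                           % transpose the cell array
[n_plots_per_series, n_series] = size(data_right);   % 1 row, 2 columns
ok_right = all(size(colours) == [n_series 3]);       % [2 3] vs [2 3] -> true

fprintf('wrong orientation passes check: %d\n', ok_wrong);
fprintf('right orientation passes check: %d\n', ok_right);
```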
Fun story: when I taught this class, I thought I was being smart by showing how parts of the code ran before running it all together. Well, it then threw several errors, and I had to debug my code on the fly with a full class of students watching me. After a mild panic attack, I debugged the code using the steps I’ve outlined above. It was a nice reminder that no matter how much preplanning you do, things can go wrong. It was also good for the students, because they got to see me debug in real time and follow my thought process as I solved the issue.
Tomorrow we will dive into the final topic for my class! Unfortunately, part of tomorrow’s class won’t translate exactly to an online post because I gave out some EEG data for the students to practice with, but don’t worry: we’re going to get into how to modify code and some best practices for doing it. I can’t hand out data to work with, but you can create your own example data, and I’ll even give some examples of how to do that.
If you want to play around with the raincloud plot function, you can download it here. They have a handy tutorial, but it didn’t work for my case because their examples were different enough from my data that I couldn’t translate what I was doing to what they were doing.
Until next time, don’t stop learning!