Intro to MATLAB – Part 5
Well we’re doing it. We’re adding to the MATLAB course I taught. Today we’re going to dive into functions: why you may want to write your own function, when you probably don’t need to, and how to tell the difference. This of course was inspired by the stuff I had to do the other day, when I realized that once I stuffed everything into a function, life became a lot less complicated. Don’t worry, functions are your friends!
For those of you just joining, I taught a MATLAB class over the summer and like all the things I teach, I want to share the knowledge. If you want a refresher, or you’re just starting, you can find the full Intro to MATLAB class in the Intro to MATLAB category. This will follow along with the style of my other classes, where I don’t explicitly show you HOW to do something, but I go in-depth into why you would want to do something while covering some of the basics of how. More often than not the why is going to be more helpful. The how comes with practice, and as you progress in programming the how becomes a little clearer.
So first let’s talk about what a function does. The simplest way of explaining a function is that a function is a piece of code that gets used often. Descriptive I know, but stay with me. Say I want to do the same set of things over and over again to some data. A function could be written to do that thing for you, so all you need to do is call the function instead of writing the script repeatedly or putting everything in an ever growing for loop (I cover for loops here).
There are already a lot of functions out there and people write new and interesting ones all the time. In MATLAB everything we do is basically just calling another function to perform an action for us. If I want the mean of a vector for example I would do something like this:
v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
mean(v)
>> 5.5
where the >> is what MATLAB returns. Notice I used something called mean(v). Well, v is our data, but the mean( ) part is a function we’re calling to find the mean of the data, and if we open it by using a command like
edit mean
we find a function that is 271 lines of code. It’s well written (as most professional functions are), a little light on the comments in my opinion, but they don’t try to cram everything into one line like some coders try to do. The point being I can use a single command mean( ) to run this function which saves me from having to write out all the code myself every time I want to take the mean value of some data.
Here’s the trick, I can take the mean of a vector, but I can also take the mean of an n-dimensional matrix, so if I have a three dimensional matrix, I can use mean(v,3) to take the mean across the third dimension. That’s where the power of functions comes into play. See I could easily write a code to compute the mean of the vector above, that’s not that hard we’re just adding and dividing by the number of elements so what the hell, I’ll do it now without any prior attempt (seriously, I’m writing the code for the first time here)
S = 0;                  % running total
for i = 1:length(v)
    S = S + v(i);       % add each element to the total
end
M = S/length(v);        % divide by the number of elements
A few notes: first, length itself is another function that I’m calling, and while for is technically a keyword rather than a function, the loop is really just standing in for one. I did it this way to avoid using sum, which is another function that could return the sum of the vector. Basically I wanted to use no functions to do this and I still ended up leaning on two. I could do sum(v)/length(v) but that’s still two functions. Frankly, that’s the point. Functions are useful things and can be incredibly powerful if written the right way. Let’s go back to the mean function for a look at what I mean (see what I did there?).
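To make that concrete, here is a quick sketch comparing the loop, the sum(v)/length(v) shortcut, and the built-in mean on the vector from above. The variable names Mloop, Mshort, and Mbuiltin are just ones I made up for the comparison.

```matlab
v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

% Version 1: the hand-written loop
S = 0;
for i = 1:length(v)
    S = S + v(i);
end
Mloop = S/length(v);

% Version 2: let sum do the adding
Mshort = sum(v)/length(v);

% Version 3: let mean do everything
Mbuiltin = mean(v);

% All three should agree for a plain vector
disp([Mloop, Mshort, Mbuiltin])
```

Same answer three times, with less and less typing on my end.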
We could (and just did) write a simple piece of code to calculate the mean of a vector. However, if I were to use the above code to calculate the mean using a matrix as an input I would get the wrong answer (trust me, I double checked to see how MATLAB handled the case, but feel free to give it a shot yourself). That is part of why the function is a few hundred lines of code and not five or six. But how does the function actually do its thing?
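If you want to give it a shot yourself, here is a small sketch of how the loop goes wrong on a matrix. The matrix A and the names Mbad, Mcols, and Mall are my own picks for illustration, and mean(A, 'all') assumes a reasonably recent MATLAB release.

```matlab
A = [1, 2, 3; 4, 5, 6];   % 2 rows, 3 columns

S = 0;
for i = 1:length(A)       % length(A) is 3, the size of the largest dimension
    S = S + A(i);         % A(i) walks down the columns, so this adds 1, 4, 2
end
Mbad = S/length(A);       % 7/3, which is not the mean of anything useful

Mcols = mean(A);          % mean of each column: 2.5 3.5 4.5
Mall  = mean(A, 'all');   % mean of every element: 3.5
```

The loop only ever touches three of the six elements, so no amount of squinting turns Mbad into the right answer.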
Well it relies on several functions itself, but we can specify which dimension to take the mean across. So I could have a five dimensional dataset and take the mean across the fifth dimension by writing mean(FiveD, 5), which would return a four dimensional dataset. The second input is telling the function what I want it to do. Now I could also enter mean(FiveD) with no second input and it would still average over one dimension (leaving that dimension with a size of 1), but it picks the dimension for me. Per the comments in the function, “for N-D arrays, S is the mean value of the elements along the first array dimension whose size does not equal 1.” In short, there is a default behavior that the function follows if you don’t specify something.
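Here is a short sketch of that default behavior using random data. The sizes are arbitrary, and FiveD, M5, and M1 are just names for the example.

```matlab
FiveD = rand(2, 3, 4, 5, 6);   % a five dimensional dataset

% Tell mean which dimension to collapse
M5 = mean(FiveD, 5);
size(M5)                       % 2 3 4 5 (the trailing dimension of size 1 is dropped)

% Say nothing and mean uses the first dimension whose size is not 1
M1 = mean(FiveD);
size(M1)                       % 1 3 4 5 6 (the averaged dimension stays, with size 1)
```

Either way one dimension gets averaged away, the only question is whether you picked it or the function did.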
This is one of the reasons I like the size function and not the length function. The length function is a simple function that returns the length of the largest dimension of an N-dimensional matrix, so if I had a 3-D matrix that was twice as long in the third dimension as it was in the rows or columns I would get the length of that dimension only. If I wanted to know how many columns or rows there were, I could not get that information using the length function. The size function lets me specify what I want by entering size(FiveD, 5), which would give me the length of the fifth dimension specifically. This is important when working with data that could change dimensions while you’re doing a task, because length may return a different dimension than the one you actually want as the program you are writing runs.
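A quick sketch of the difference, using a made-up 4 by 5 by 10 matrix (the variable names are only for illustration):

```matlab
A = zeros(4, 5, 10);      % 4 rows, 5 columns, 10 "pages"

L      = length(A);       % 10, the largest dimension, whichever one that is
nRows  = size(A, 1);      % 4
nCols  = size(A, 2);      % 5
nPages = size(A, 3);      % 10

% Now shrink the third dimension, like data might while a program runs
B  = A(:, :, 1:2);        % 4 by 5 by 2
LB = length(B);           % 5, length is now quietly reporting the columns
```

Notice length went from describing the third dimension to describing the columns without any warning, while size(B, 3) would still tell you exactly what you asked for.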
Now there are several ways we can write a function in MATLAB. The easiest is to fix the inputs, so for example in the mean function, input 1 is always your data and input 2 is always the dimension you want to take the mean across. Inside the function you can check how many inputs the user actually gave you using (yet another function called) nargin, which is short for number of arguments in. So for the mean function the first input is your data, and if nargin says there is a second one, the function figures out what you’re doing with it based on what “type” of argument it is. That matters because we can take the mean across all dimensions using the “all” option, but that is not a numerical input, so you need to account for that in your function.
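As a rough sketch of that idea (this is NOT MATLAB’s actual mean code, just a tiny stand-in I’m calling MeanSketch), here is a function with fixed inputs that uses nargin for the default and checks the type of the second input. It would be saved as MeanSketch.m.

```matlab
function out = MeanSketch(data, dim)
% MeanSketch  Hypothetical, simplified stand-in for mean (illustration only).
%   out = MeanSketch(data)         averages along the first dimension whose size is not 1
%   out = MeanSketch(data, dim)    averages along dimension dim
%   out = MeanSketch(data, 'all')  averages every element

if nargin < 2
    % Only the data was supplied, so fall back to the default dimension
    dim = find(size(data) ~= 1, 1);
    if isempty(dim)
        dim = 1;   % scalar input, any dimension gives the same answer
    end
end

if ischar(dim) || isstring(dim)
    % Text input: the only option this sketch understands is 'all'
    if ~strcmp(dim, 'all')
        error('MeanSketch:badOption', 'Unknown option "%s".', dim);
    end
    out = sum(data(:))/numel(data);
else
    % Numeric input: treat it as the dimension to average across
    out = sum(data, dim)/size(data, dim);
end
end
```

The real thing handles far more cases, which is where the other couple hundred lines go.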
That’s the thing about writing functions, you need to anticipate user error and account for that, so usually we have a catch-all statement somewhere in the code or we verify that the input types are the expected types. The function I just wrote for example accepts 3-D and 4-D matrices, but if length(size(data)) < 3 or length(size(data)) > 4 it will give you an error. size(data) will return the size of all the dimensions, so the output would be something like 5 10 3, and by taking the length of that output we can verify that our matrix has the right number of dimensions.
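Here is a small sketch of that check. The data is made up, and isOkay is just a helper name I invented for the example. MATLAB also has a built-in ndims function that returns the same number as length(size(data)).

```matlab
data = zeros(5, 10, 3);

sz = size(data);          % 5 10 3
nd = length(sz);          % 3, the number of dimensions (same as ndims(data))

% Hypothetical helper: true only for 3-D or 4-D input
isOkay = @(d) ndims(d) >= 3 && ndims(d) <= 4;

ok3 = isOkay(data);                  % true, 3-D is allowed
ok2 = isOkay(zeros(5, 10));          % false, only 2-D
ok5 = isOkay(zeros(2, 2, 2, 2, 2));  % false, 5-D is too many
```

In a real function you would follow a failed check with an error so the user knows why nothing happened.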
Writing a function means you can avoid rewriting code bits you will end up reusing, so writing them to be flexible is important. But you can always go back and edit a function to add more functionality later (like you could do with any code you write), so you’re not forced to keep a function the way it is. In fact, I’ve edited several custom functions that someone else wrote because I wanted to add to them or change the way things worked.
I mean we could write a whole book (and people do) on how to write functions and why they are important, but the main things I want to pass along are: 1) functions save you from repeating yourself and 2) functions can be incredibly flexible tools if well written. Writing one is no different than writing a piece of code, the main difference is how you start the code. When writing a function your code starts like this
function Output = FunctionName(Input1,Input2,etc…)
Then you would save your function with the function name. In this case I picked the boring FunctionName as my function name, so I would save the file as FunctionName.m. Input1 is what I’m calling my first input, so I would refer to it in the code that way, and Input2 would be my second input, so I could call it that way in my code no matter what the name of the input data is. Taking my super simple, very bad mean code that I just wrote, we could turn it into a function by doing this:
function Out = BadMean(data)
S = 0;                     % running total
for i = 1:length(data)
    S = S + data(i);
end
Out = S/length(data);      % Out is what the function returns
end
Then in whatever code I was writing I would call this function by writing BadMean(v) and it would return the mean of my vector v. Now in the function I have Out = BadMean(data). Out is just my output and I can name it whatever I want. The user won’t ever see that name, but it is what the function will return when called. If I had multiple outputs, say BadMean gave not only the mean but also the sum (really a bad code), then instead of Out I would write [Out, S] = BadMean(data) and BadMean would give both the value for Out and S as the output. This can be expanded as needed, and the user can avoid getting the S output by just writing MeanValue = BadMean(v). If I just wanted the sum (again, bad code) I could get that back from my function by writing [~,SumValue] = BadMean(v), where the ~ tells MATLAB to skip over that output argument. Confusingly, inputs can be skipped too, but not by using ~. You need to use empty brackets [] or it will give you an error.
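Put together, a two-output version might look something like the sketch below. I’m calling it BadMeanSum (a made-up name, saved as BadMeanSum.m) so it doesn’t get confused with the one-output BadMean.

```matlab
function [Out, S] = BadMeanSum(data)
% BadMeanSum  Hypothetical two-output version of BadMean (illustration only).
%   MeanValue = BadMeanSum(v)               returns only the mean
%   [MeanValue, SumValue] = BadMeanSum(v)   returns the mean and the sum
%   [~, SumValue] = BadMeanSum(v)           skips the mean and keeps the sum

S = 0;                     % running total, also returned as the second output
for i = 1:length(data)
    S = S + data(i);
end
Out = S/length(data);      % first output: the mean
end
```

For skipping an input, a built-in example is max(A, [], 2), where the empty brackets fill the second input slot so that the 2 lands in the third slot as the dimension.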
Now there are no “checks” on my BadMean function, so I could give it an N-dimensional matrix and if N ~= 1 (~= is does not equal in MATLAB code) then it would literally give me a bad mean value, as in the incorrect value. So be very careful when you write your own function to limit the types of input. In fact, let’s limit BadMean to a vector only. To do that I could simply write something like this:
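function Out = BadMean(data)
% MATLAB stores a vector as 1-by-N or N-by-1, so length(size(data)) is
% never less than 2. isvector is the reliable way to test for a vector.
if ~isvector(data)
    error('You fed me bad data!!!!')
end
S = 0;
for i = 1:length(data)
    S = S + data(i);
end
Out = S/length(data);
end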
and if I gave this code anything that isn’t a vector it would give the user an error that says “You fed me bad data!!!!” which is ironically a bad error message because it doesn’t tell the user what they did wrong, but you get the idea. The error function stops the function on the spot, so nothing after it runs. If you just want to leave a function early without raising an error, return hands control back to whatever called it (and break is the one that gets you out of a loop).
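For what it’s worth, here is a sketch of what a more helpful message could look like. The identifier BadMean:notVector and the wording are my own inventions, and I wrapped the check in try/catch only so the example keeps running after the error fires.

```matlab
data = ones(3, 4);   % a matrix, so not a valid input for BadMean

msg = '';
try
    if ~isvector(data)
        % Say what was expected AND what was actually received
        error('BadMean:notVector', ...
              'BadMean expects a vector, but data has size %s.', ...
              mat2str(size(data)));
    end
catch err
    msg = err.message;   % the text the user would have seen
    disp(msg)
end
```

Now the user knows what the function wanted and what they handed it instead.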
Before we end, there are a few different ways that you can set your inputs in a function. My favorite way to do it is by letting the user explicitly state what the input is from a list of different acceptable inputs. That looks something like this (not my code, but a good example)
The first column is a list of inputs the user can specify, the second is what the type of input should be, the third gives acceptable values for inputs (if applicable, which is why most are []) and the last column is the default value if the user does not specify anything. So if this was part of my BadMean code, I could specify frequency (I don’t know what I am doing using frequency in a mean code, but work with me here) by writing:
BadMean(data, 'freq', 1:100)
Then my code would automatically assign the numbers 1-100 (1:100 returns a vector of numbers from 1 to 100) to my freq variable, and I can work with it in the code. I included a couple extra lines in the screenshot to show that there is another function, finputcheck, that checks to make sure the inputs are the correct “type” (which is listed in the second column) and then assigns all the inputs to a variable g. If there is an error, it returns the error instead, and error(g) reports the reason for the error, which is typically that the input doesn’t match the expected type.
In my code g is a structure full of all the inputs the user specified (or my defaults), so I would call frequency by g.freq and not freq since that variable is stored inside this structure g. This is useful when you’re debugging because you can group all the variables together and with a single line of code you can verify that you have the correct types of inputs. It even lets the user specify certain strings as on, off, etc. which is helpful because it’s much more intuitive than having to specify a number to turn something on or off (by using a 1 or 0 for example).
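If you don’t have finputcheck handy, the same idea can be sketched with MATLAB’s built-in inputParser instead. To be clear, this is a swapped-in technique and not the code from the example above. The function name BadMeanOptions and the 'verbose' option are made up for illustration, and it would be saved as BadMeanOptions.m.

```matlab
function g = BadMeanOptions(data, varargin)
% BadMeanOptions  Hypothetical sketch of name-value input handling.
%   g = BadMeanOptions(data, 'freq', 1:100) returns a structure g holding
%   every input, filled in with defaults for anything not specified.

p = inputParser;
p.FunctionName = 'BadMeanOptions';   % shows up in error messages

% Required input: must be numeric
addRequired(p, 'data', @isnumeric);

% Optional name-value inputs: name, default value, validation check
addParameter(p, 'freq', [], @isnumeric);
addParameter(p, 'verbose', 'off', @(s) any(strcmp(s, {'on', 'off'})));

% parse throws a descriptive error if an input fails its check
parse(p, data, varargin{:});

g = p.Results;   % structure with fields data, freq, and verbose
end
```

After calling g = BadMeanOptions(v, 'freq', 1:100), the frequencies live in g.freq and g.verbose falls back to 'off', the same pattern as the structure g described above.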
Okay so that covers the basics of what a function does, why you may want to write one, and I even went (probably way too much) in-depth into how to have inputs and outputs for the function. The stuff in the middle is just the things you want the function to do, and while it may be a struggle at first, once you get better at writing code the middle stuff gets easier. Unfortunately, that comes with practice and time. It’s not super helpful to teach via me giving examples, but I will probably add a few more posts to this series to discuss how to think about structuring code… eventually.
In the meantime I hope this was a helpful intro into the wonderful world of functions.