Data visualization dilemma
Data visualization is an important topic to think about. How do you best convey what the data are telling you? It’s something I struggle with because I take it so seriously. Most things can be done simply: the old standbys (the line graph, box plot, or even scatter plot) all work well enough. But more often than not you want to tell a story, and sometimes the obvious plot isn’t the best choice.
Sure, a line graph is easy to read, but what happens when you start stacking data? Two lines on a line graph probably aren’t too hard to follow, but what about three, four, five, or twenty? After enough data, even the best-looking line graph will start to look like a jumbled mess. So picking the best chart type to display your data is a lot like picking a new home or a car. You have a lot of factors to consider, and even after committing you may find issues you didn’t think about.
I love to walk, so a home that provides walkability is important to me. Sometimes the things we really want are obvious like that. It’s the same with data: sometimes you know that a scatter plot will do the job just fine, but there are other, bigger-picture things to consider. I like walking, but I hate walking in the rain. My data may look good as a scatter plot, but if I want to compare multiple data series, a scatter plot may not do the trick.
Basically, the point I’m trying to make is that we can all sit down, casually glance at some data, and figure out a decent way to plot it, but when you’re telling a story, sometimes you need more than decent. With the type of data I deal with, there are a lot of different factors to consider when looking for the perfect plot style.
More often than not I’m comparing several different conditions. The simplest choice for me is almost always going to be the box plot. It’s quick, it displays the data we want to show, and it’s very easy to read. It’s a solid choice, and sometimes simple is the way to go. However, for my last paper I opted for what’s called a chord diagram (my paper). Those plots made it easier to compare the average across a lot of different variables. We included the box plots to go with them, but we needed, I believe, six times more plots in the supplementary material to show the same data, and the box plots (in my opinion, anyway) were not easy to compare across conditions.
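To make that tradeoff a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the condition and variable names are hypothetical, the numbers are made up, and it is not the code or data from the paper. It only shows the idea that box plots need one five-number summary per condition/variable pair, while an average-based plot such as a chord diagram can start from a single table of means.

```python
from statistics import mean, quantiles

# Hypothetical measurements: condition -> variable -> samples (made-up numbers).
data = {
    "condition_A": {"var_1": [1.0, 2.0, 3.0, 4.0], "var_2": [2.0, 2.0, 4.0, 4.0]},
    "condition_B": {"var_1": [2.0, 4.0, 6.0, 8.0], "var_2": [1.0, 1.0, 3.0, 3.0]},
}


def box_summary(samples):
    """Return the five numbers a box plot draws: min, Q1, median, Q3, max."""
    q1, med, q3 = quantiles(samples, n=4, method="inclusive")
    return min(samples), q1, med, q3, max(samples)


def mean_matrix(measurements):
    """Return one mean per (condition, variable) pair as a nested dict."""
    return {
        cond: {var: mean(vals) for var, vals in variables.items()}
        for cond, variables in measurements.items()
    }


# Box plots: one summary (one box) for every condition/variable pair.
boxes = {
    (cond, var): box_summary(vals)
    for cond, variables in data.items()
    for var, vals in variables.items()
}

# Average-based plot: a single table holding every pair at once.
means = mean_matrix(data)

print(len(boxes), "boxes to draw")
print(means)
```

The number of boxes grows with every condition and variable you add, which is roughly why the supplementary figures pile up, while the table of means stays a single object you can hand to one plot.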
Once again I’m at a crossroads, trying to figure out the best way to display some data. On one hand, I could do something fancy; I personally like fancy or unexpected. The tradeoff is that reading the “fancy” plot isn’t always as straightforward. I knew that when I committed to the chord diagrams in that last paper, and the feedback from the reviewers confirmed my suspicion that, without knowing how to read them, they can be hard to follow.
On the other hand, using that last paper as an example, the chord diagrams not only helped me tell the story I wanted to get across, but they did so in a way that made sense. Once you understand how to read them, they are very informative, and they give the reader the chance to compare between and across conditions without having to go back and forth. All the data are right there waiting to be understood. Once hospital-PI got over the initial fear of them, he really liked the style and my approach to displaying the data.
My latest paper (here) is currently in review, and I took a chance on some unique data visualizations there as well. Hopefully it will be published soon, because I’m really excited to showcase them. They were a lot of work, but I think they came out particularly cool.
My main point here is simply that picking a style is hard. Sometimes you need to go back and forth a lot before you settle on one. That’s frankly the difference between buying a house or a car and creating a plot: with a plot, the only thing you invest is time, so as long as you have more of it, you can keep trying different styles.
Unfortunately, in my latest case there is not a lot of time. Instead I need to really look at some examples of plots and figure out how to turn one of those into exactly what I’m looking for, basically on the first attempt. Now that doesn’t mean that I won’t be able to change it later, but I need to be confident going into the decision.
For me, seeing example plots works best, and I have just the place to browse for different styles of plot. More often than not I make my plots in MATLAB; it’s just easier for me, since that’s the language I’ve learned to work in best. I have, on occasion, used R for plot creation (via RStudio, which is surprisingly easy to use). No matter what software I use, though, I like to visit The R Graph Gallery to get ideas (here). They list just about every plot you can think of. It may not be exhaustive, but it’s pretty close and covers a lot of the bases. Since I can’t give any particular tips, because this is so person/situation/data dependent, I think the website is probably the best way to go.
I particularly like that they give example code and variations of plots to help get you started. Sometimes you need examples of the things you can change when you’re looking for a good plot. Some things are easier to modify than others, and changing just one thing can make the plot look completely different. When I was making my chord diagrams, the website convinced me to give RStudio a shot instead of using MATLAB. I think it was a good choice in the end, even though it was my first time making a chord diagram and I’m not as fluent in R.
This time around I’m not sure what software I’ll use, but first things first: I need to select a type of plot…