A dissertation dilemma
Okay, since I’m still waiting on my new and fancy toys to arrive for my dissertation, I’ve been debating about going through the whole thing once again using the equipment I already have access to. Since there’s a limited amount of time, I may just have to power through and go ahead with data collection using the old equipment, but it’s still anyone’s guess if I’ll do that because frankly I’m not sure it will be worth it. Since it’s up in the air, I figure I can write out my thought process and hopefully figure it out one way or the other.
I really want to graduate. I mean I love school, don’t get me wrong, but I’m ready to be finished with my PhD. It’s been a struggle and with the whole working full time thing now, it’s just a lot to do all at once. The average time to complete a PhD is roughly five years, and finishing in five is exactly what I want to do. I want to be finished, I’m ready, really ready. All I need to do is finish my dissertation. Easier said than done, though, and that’s the problem!
I collected my first dataset just a few weeks ago and I’ve barely touched it. It’s a mess and, as I explained previously (here), the two streams of data I have aren’t the two continuous streams we would ideally have. Instead I have one stream of data that is continuous (thankfully!) and a second stream that’s broken into roughly 20 files. Each of those files has at least a beginning marker that aligns it to the first stream of data, so I should (in theory anyway) be able to pop that data into the correct place and be done with it. Theoretically. I still haven’t tried because it’s a lot of work and I figure if I ignore the problem long enough it will go away or I’ll stop being so lazy and I’ll just do it, whichever comes first.
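For what it’s worth, the “pop that data into the correct place” step could look something like the sketch below. It assumes a lot that isn’t spelled out here: that both streams share one sampling rate, that each file’s beginning marker can be turned into a sample index in the continuous stream, and that it’s fine to leave the gaps between files as NaN. The function name `place_segments` and the toy numbers are made up for illustration; this is not the actual pipeline or the real data.

```python
import math


def place_segments(stream_length, segments):
    """Lay broken-up segments onto a timeline as long as the continuous stream.

    segments: list of (start_index, samples) pairs, where start_index is the
    (assumed) sample index of that file's beginning marker in the continuous
    stream. Positions not covered by any segment stay NaN.
    """
    aligned = [math.nan] * stream_length
    # Process in marker order; if two segments overlap, the later one wins.
    for start, samples in sorted(segments, key=lambda seg: seg[0]):
        if start < 0 or start >= stream_length:
            raise ValueError(f"marker {start} falls outside the continuous stream")
        # Clip a segment that would run past the end of the continuous stream.
        end = min(start + len(samples), stream_length)
        aligned[start:end] = samples[: end - start]
    return aligned


# Toy example: a 10-sample timeline and two out-of-order segments.
demo = place_segments(10, [(6, [7.0, 8.0]), (1, [1.0, 2.0, 3.0])])
```

Leaving the gaps as NaN rather than closing them up keeps the second stream the same length as the first, so the two can still be compared sample by sample, and the missing stretches stay visible instead of being silently hidden.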
Now my plan was to have all my data collected at this point. Or rather most of my data collected at this point, part one of two really. I have a single dataset, which puts me at about 10% of the data I need (assuming I even keep it). Until I get my equipment, however, I’m stuck finding time to work around another lab’s schedule to borrow the equipment I need. Equipment that is old, semi-broken, and doesn’t sync with my preferred software for this type of work!
The options are these. I can wait for my new equipment to show up, so I can do all the work in-house. Things will go much quicker and smoother when this happens, but it’s still a little bit off, maybe a lot of bit off depending on how quickly the company can send us the equipment. The other option is to attempt to use the equipment from the second lab (schedule permitting, since mine is already packed with work-related stuff). The problem is that I don’t know why the equipment kept freezing and/or stopping, so there is no promise that it will work differently if I reattempt it.
On one hand, the sooner I get the data the better. On the other, having good data to work with makes my job easier. The problem is figuring out which is the faster option of the two. Do I wait and have a smooth(er) data collection process, or do I just go for it and deal with the messy data?
If I wait for the new equipment things should be better. I SHOULD, in theory anyway, be able to collect the data the way I want without issue, making the processing and analysis much simpler. On both the backend and frontend of this, I will save time and energy. Both of those are great things, but it means that once the equipment arrives I need to get all the data as quickly as possible so I can meet my proposed graduation time.
Now if I collect some data (probably just a couple experiments), I will most likely run into the same issues if I don’t have someone helping, which I probably won’t, and even if I do there’s no guarantee of success. So while this means I won’t have that big rush that I would have if I wait, I pay for it in the long run because now I am dealing with non-continuous data. Meaning even in the best-case scenario I have dozens of files that represent a single experiment. The worst case is that I lose several different trials and the experiment runs far longer than I planned.
Now to be honest, after writing this all out I really think waiting may be the better option. Not only will that make the processing job easier, it will also make the collection smoother, meaning people won’t be waiting forever for me to finish troubleshooting mid-experiment. It also means I will have complete datasets for certain (or mostly certain, I guess I could run into problems no matter what). So maybe I’ll just wait and reach out to the company to see if we can speed up getting the equipment, or maybe they could give me a better idea of when the equipment should be expected.
Yeah, I think that’s the best option here. Good talk everyone.
Glad to help! 😉 I usually end up explaining a process to a cat or a rubber duck or something. The Internet works just as well, apparently.
From a risk analysis side, I’d create little plans for each task. What does each entail and how long will it take? From there prerequisites, parallelization options, and failure rates. From there planning (and comparison of plans) is just a Gantt chart away! 😀
But in this case, I agree with you off the cuff. If you get poor data or it takes you longer to process, it’s probably a bad idea. Further, if you’re getting data through two different methods, you need to account for that in the analysis. Waiting is likely the best choice.
Of course, you can make spreadsheets, models, procedure sheets, etc. all while you’re waiting, so either you use less time later or you increase efficiency in the future. I don’t know your project well enough to provide specifics, but I’ve always, always loved dry runs. Saved me plenty of times showing up for meetings when I’d do the drive the day before and make sure I knew exactly where the room was.
May 31, 2022 at 4:07 pm
Haha I talk to my cats a lot like that. Writing it out helps a lot for me though, which is partly why I took on the blogging project I made.
Yeah, two different data collection methods would make twice the work when it comes time to write everything down. Plus, if the data turn out to be garbage, then I have basically dealt with all the headaches for nothing!
Going through some dry runs and planning a bit more would be a good use of time, I agree. Plus it would probably save me time on the other end when I’m writing my dissertation to have a good map of everything I’ve done. Even if it’s not “formal” enough for the dissertation, it would be useful when I’m writing.
June 1, 2022 at 2:53 pm
For what it’s worth, I think I agree with this decision too. Since you don’t know for sure that the data is even usable yet, doing further experiments under the same conditions is a large risk. If you had to repeat them, you might even be unable to persuade one of your subjects to return for a second round. (I’m not sure how much of a problem this would be, since I assume all your near-term data collection is from the group with the larger participant pool, but …)
And Michael has a good point, I wasn’t even thinking about the effect on your writeup. At minimum, it’ll be more effort to document the two different kinds of data set and analysis process. That kind of thing can also make readers ask awkward questions.
And hopefully less total work (even if it gets spread out over a longer time) is better for your psyche. Even if you can’t do other prep while waiting, at least you can rest, and that is also worth something.
May 31, 2022 at 6:48 pm
Thank you, it makes me feel better about my choice since you both agree. I think it will be better for me to get a bit of a break from everything. I tend to prefer to do a sprint approach to data collection, so all in all, this wasn’t a bad outcome.
June 1, 2022 at 2:49 pm