The mysterious data
I sometimes miss the days when the answers were in the back of the book. At this point I would take even just having answers that are semi-related to the questions I’m trying to work out. Being a researcher is a double-edged sword. On one side, for a brief moment in history you will know something that no other person in the world knows. On the other, how do you know that it’s correct? Questioning your results is an important part of research, and right now there are more questions than answers.
You learn as you go through school, it’s true, but some of the things you learn aren’t exactly the things you’re taught. I was taught that you do an experiment, you get a result, and you add to the body of knowledge that exists. The idea is that others then try to do what you did, which verifies that you got the correct (or close to the correct) answer. So we build a consensus in the direction of the correct answer while never actually hitting it exactly.
Sure, there are certain things we can say with overwhelming certainty: man-made climate change is real, smoking dramatically increases your risk of cancer, black licorice is disgusting. Other things are less settled. For example, do sugar substitutes harm the body? I tend to think no, but only because there’s no clear consensus one way or the other. That’s partly because of confounding factors and partly because the effect size is probably incredibly small, which is why I don’t worry about it. After all, if the effect size is that small, then there are other health issues I should be worrying about!
That was what I was taught, but what I’ve learned is to question my results. I think that’s important to talk about, and we don’t do it enough. “Big idea,” for example, will need a lot of checking and rechecking before we’re sure of our result. That’s why we replicate the experiment with a large number of people (for our work anywhere from 10 to 20 people, but for things like fMRI it’s even higher). We’re building a mini-consensus within our little lab, but that doesn’t preclude systematic errors, code errors, or just general processing errors. We’re all only human after all, so mistakes happen even when we don’t want them to. I’m sure there are dozens of spelling errors on my blog, for example, despite the built-in spell check in my browser.
So having healthy skepticism about your result is important. Over the years of doing research I’ve found I would rather be on the “too skeptical” side, which compared to someone like hospital-PI is still not skeptical enough, but compared to school-PI is far too strict. It really depends on the person and their aversion to risk. I certainly wouldn’t want to put big idea out into the world only to find out after the fact that it was a big flop. That doesn’t mean “little ideas” don’t deserve the same reverence and caution, but it’s important to acknowledge that the stakes scale with the potential outcomes.
For the past few days I’ve been working with some of the data we’ve collected for “big idea,” and yesterday hospital-PI let me know that he wants to show off our result at the beginning of next week. If you missed yesterday’s post, I’m working under the assumption that there are potential sources of funding for the project involved, or we wouldn’t be showing results this preliminary. Hospital-PI is even more cautious than I am with claims, so I assume he basically wants to demonstrate feasibility, since that’s the level we’re at right now.
For the past two days I’ve been processing the data. Then reprocessing the data. Then reprocessing the data. You get the idea. Since the recordings are incredibly valuable (see: rare and hard to get), we need to be very thorough about how we process and work with the data. With the rise of computing power we can use super complex algorithms to process the data and remove noise. For instance, I had a ton of line noise: 60 Hz (in the US) electrical noise from the power grid that gets picked up by the sensors. It’s a common issue, we see it all the time, and there are dozens of ways to remove it. For this project I’ve found a better way to do it than the way my school lab does it, so I’m going to be passing that info along soon so they can add it to the processing pipeline we use.
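For readers who haven’t dealt with line noise: one textbook way to remove it is a narrow IIR notch filter centered on 60 Hz. The sketch below is not the improved method mentioned above (that isn’t described here); it’s a minimal, generic example in Python rather than MATLAB, using the second-order notch from Robert Bristow-Johnson’s Audio EQ Cookbook. The function names `notch_coefficients`, `apply_biquad`, and `amplitude_at` are hypothetical, and the sampling rate and quality factor `q` are assumptions you would tune to your own recordings.

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Second-order IIR notch (Audio EQ Cookbook form), normalized so a[0] == 1.

    f0: frequency to remove in Hz (60 for US mains), fs: sampling rate in Hz,
    q: quality factor; the notch is roughly f0 / q Hz wide (assumed default of 30).
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, x):
    """Filter the samples in x with one second-order section (direct form I, zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def amplitude_at(x, fs, f):
    """Amplitude of the f-Hz sinusoid in x via a single DFT bin (exact when x spans whole cycles)."""
    n = len(x)
    re = sum(v * math.cos(2.0 * math.pi * f * k / fs) for k, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * f * k / fs) for k, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n
```

As a quick check, feeding in a synthetic 10 Hz sine plus a 60 Hz sine sampled at 1 kHz and comparing `amplitude_at` on the last couple of seconds of input and output (after the filter’s start-up transient has died away) shows the 60 Hz component essentially gone and the 10 Hz component almost untouched. In practice you’d more likely reach for `scipy.signal.iirnotch` and apply it with `scipy.signal.filtfilt`, which runs the filter forward and backward so it doesn’t shift the phase of your signal.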
Most of these algorithms have variables that need to be tuned to get the best result. That isn’t an automated process, and as far as I’m aware it can’t be, so there’s a lot of data processing followed by manual inspection of the data. In our lab, before you run a step in MATLAB (our preferred software) you make a copy of your data in your workspace, so you can compare the filtered data to the previous step. You end up with about a dozen copies of the same data at various steps of your processing pipeline, but it makes life easier because you can go back one step, multiple steps, or all the way to the beginning without having to reload the data.
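The keep-a-copy-at-every-step habit isn’t specific to MATLAB. Here is a minimal sketch of the same idea in Python; `ProcessingHistory` is a hypothetical helper, not part of any library, and the step functions you pass in stand for whatever filters your pipeline actually uses.

```python
import copy


class ProcessingHistory:
    """Keeps a labelled copy of the data after every processing step (hypothetical helper)."""

    def __init__(self, raw):
        # The raw data is always the first snapshot and can never be undone.
        self._steps = [("raw", copy.deepcopy(raw))]

    def apply(self, label, func):
        """Run func on a copy of the latest data, store the result under label, and return it."""
        result = func(copy.deepcopy(self._steps[-1][1]))
        self._steps.append((label, result))
        return result

    def current(self):
        return self._steps[-1][1]

    def get(self, label):
        """Return the snapshot stored under label (e.g. "raw") for side-by-side comparison."""
        for name, data in self._steps:
            if name == label:
                return data
        raise KeyError(label)

    def undo(self, n=1):
        """Discard the last n steps and return the data that is now the latest snapshot."""
        if n >= len(self._steps):
            raise ValueError("cannot undo past the raw data")
        del self._steps[len(self._steps) - n:]
        return self.current()

    def labels(self):
        return [name for name, _ in self._steps]
```

For example, create `h = ProcessingHistory(raw)`, then call `h.apply("notch", my_notch)` and `h.apply("bandpass", my_bandpass)` (where `my_notch` and `my_bandpass` are your own step functions); if the band-pass result looks wrong, `h.undo()` puts you back at the notch-filtered data without reloading anything. The trade-off is the same as in a MATLAB workspace: every snapshot is a full copy, so a dozen steps on a long recording cost a dozen times the memory.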
Normally our pre-processing pipeline is 90% fixed. We have step 1, then step 2, then step 3, etc., and those steps don’t change. We do it that way because hundreds (if not thousands) of people have come before us and found the best way to clean the kind of data we’re working with. While the pipeline is fixed, we still have to adjust those variables, but there are fairly tight tolerances and “best” ranges that we know. For example, there’s one filter in our lab’s pipeline whose values I barely ever have to adjust, because the artifacts it removes are (typically) the same across subjects.
With this new dataset, I not only have all those variables to adjust, but I also need to see what happens when I mix up the pipeline, in case that gives a better result. This morning I realized I had assumed the pipeline we use is the best one for this new type of data. As someone who hates making assumptions, I decided to swap two steps, and sure enough it yielded a better result. I also dropped two of the steps, and that solved some issues I was having as well (fun fact: sometimes filters can be unstable and actually add noise to your data).
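On the fun fact about unstable filters: a recursive (IIR) filter is only stable if every pole of its transfer function lies strictly inside the unit circle; otherwise its output can keep growing even after the input stops. Here is a minimal Python sketch of that check for a single second-order section, plus its impulse response so you can watch a stable and an unstable section diverge. The helper names are hypothetical.

```python
import cmath


def biquad_poles(a1, a2):
    """Poles of a section with denominator 1 + a1*z^-1 + a2*z^-2, i.e. the roots of z^2 + a1*z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def biquad_is_stable(a1, a2):
    """True if both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in biquad_poles(a1, a2))


def impulse_response(a1, a2, n):
    """First n outputs of y[k] = x[k] - a1*y[k-1] - a2*y[k-2] driven by a unit impulse."""
    y1 = y2 = 0.0
    out = []
    for k in range(n):
        xk = 1.0 if k == 0 else 0.0
        yk = xk - a1 * y1 - a2 * y2
        y2, y1 = y1, yk
        out.append(yk)
    return out
```

With `a1 = -1.8, a2 = 0.9` the poles have magnitude about 0.95 and the impulse response dies away; nudging `a2` to 1.01 pushes them just outside the unit circle, and the response grows with every sample. Higher-order filters are usually implemented as a cascade of these second-order sections (for example `output='sos'` in `scipy.signal.butter`), partly because a single high-order (b, a) polynomial is prone to numerical error.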
This is why it pays to be skeptical. If I hadn’t been, I would’ve gone through blindly, filtered my data until it looked correct, and moved on. But when I looked at the power spectral density of the data, it did not look correct. That was the first hint that I had a problem. Now the data look somewhat better. Mostly better… still weird, but better, if that makes sense. They’re closer to what I would’ve expected to see, which is a relief, but now I’m wondering whether the data look weird simply because that is how they look (as in, they’re correct and I’m just seeing something new and confusing to me).
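For anyone who hasn’t used one, a power spectral density (PSD) shows how a signal’s power is distributed across frequency, which makes problems like leftover line noise easy to spot. A common way to estimate it is Welch’s method: split the signal into overlapping windowed segments, compute a periodogram of each, and average them. The sketch below is a minimal pure-Python version (Hann window, 50% overlap, naive DFT); `welch_psd` is a hypothetical name, and for real recordings `scipy.signal.welch` does the same job far faster.

```python
import math


def welch_psd(x, fs, nperseg=256):
    """One-sided PSD estimate using Welch's method.

    Splits x into Hann-windowed segments with 50% overlap, removes each segment's mean,
    and averages the periodograms. Uses a naive DFT so it needs no third-party packages:
    fine for a sketch, far too slow for real recordings.
    Returns (freqs, psd) with freqs in Hz and psd in units^2/Hz.
    """
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / nperseg) for k in range(nperseg)]
    scale = fs * sum(w * w for w in window)  # density scaling
    step = nperseg // 2                      # 50% overlap between segments
    n_bins = nperseg // 2 + 1
    psd = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(v - mean) * w for v, w in zip(seg, window)]
        for m in range(n_bins):
            re = sum(v * math.cos(2.0 * math.pi * m * k / nperseg) for k, v in enumerate(seg))
            im = sum(v * math.sin(2.0 * math.pi * m * k / nperseg) for k, v in enumerate(seg))
            p = (re * re + im * im) / scale
            if 0 < m < nperseg / 2:  # fold in negative frequencies (not DC or Nyquist)
                p *= 2.0
            psd[m] += p
        n_segments += 1
    freqs = [m * fs / nperseg for m in range(n_bins)]
    return freqs, [p / n_segments for p in psd]
```

On a 10 Hz sine plus a 60 Hz sine sampled at 1 kHz, with `nperseg=200` (so bins are 5 Hz apart), the estimate shows peaks at the 10 Hz and 60 Hz bins, and essentially nothing at 60 Hz once the 60 Hz component is removed. In real data, a sharp spike at 60 Hz (or its harmonics) after filtering suggests line noise survived, while a dip around 60 Hz can mean the notch is wider than it needs to be.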
There’s still a long road ahead for this project, and I won’t be publishing my results anytime soon. Even if I get more data, there are still too many checks to do and things to look at. That’s the problem with working in a new space: none of the questions have been answered yet, so every question you can think of is still open. So the question becomes, which question do you start with? I have a select few I would love to start out with that I think will make for an interesting splash into the field, but there are a lot of other questions we could answer too, and I think that’s part of the fun of it. While I’m still a bit skeptical about what we have, I know we have something, and that’s already a really big hurdle to overcome.
One thing’s for sure though, it’s been one hell of a week and the week hasn’t even started yet!
You’re funny. I think a lot of popular foodstuffs on the bitter end of the spectrum (coffee, alcohol, green bell peppers) are disgusting, but black licorice gets the seal of approval.
Self-skepticism is something a fair number of people around the AI forums could stand to learn a little more of. My direct experience is mostly with naive new hobbyists showing up and acting like their personal theories will produce human-level intelligence in a few months or years. But even some professionals get accused of over-hyping their results, claiming they show a greater degree of “understanding” or “thinking” than they really do – or implying that they’re ready to take over some task from humans when really they don’t have the needed level of accuracy. And I tend to be on the skeptical side of these debates … I’m not sure some of the state-of-the-art algorithms are as good as they’re made out to be. Though they’re also not using my approach, so I could just be prejudiced.
I hope you figure out what your data means!
June 5, 2022 at 9:06 pm
Wait, coffee? You’re an engineer, we live on coffee and sugar! I knew the black licorice thing was going to come back to bite me. Haha, it’s just so bad to me! I know taste is subjective, but that is the perfect example of it.
I think I’ve run into a lot of the same issues in my field; people overhype things a lot. Thankfully, with a lot of the work we do in the hospital it’s pretty obvious whether it’s working or not, but the stuff I do in my school-PI’s lab, for example, is more subtle, so it can be easy to overblow what you’re doing or what can be done. That’s why I’m so careful about “super secret technique” and about not making claims before I show it even does anything.
Thanks! I did manage to find some things that make me feel a bit better about the outcome, but there’s still a lot of work ahead. Hospital-PI wants something by Wednesday afternoon and I’m not sure I can pull that off, but we’ll see.
June 6, 2022 at 6:12 pm
“Engineers live by coffee” is an accurate stereotype I think; I’ve been hearing it since undergrad. And I’ve been the exception since undergrad. The most caffeine I usually get is a few cups of green tea per week. I figure my sugar consumption is on the low side too.
The advantage is that if I ever want caffeine for an emergency, it really works. The few times I took a large dose (e.g. three espresso shots) I practically lit up like a rocket.
My personal theory – which may or may not be scientifically accurate, I’m not an expert – is that taking stimulants all the time doesn’t get you anywhere. Because it’s not like they can magically create more energy for you, or reduce the amount of sleep maintenance your brain needs. All they’re actually doing is rebalancing your body’s budget, forcing it to tap its reserves. When the reserves run out, your body adjusts by trying to slow down, trying to recover them … and then you have to keep taking the stimulant just to be normal. All those engineers with a caffeine habit probably don’t get any more done on an average day than I do.
For someone like you who’s pathologically tired to begin with, the calculus might be different. Maybe you need to fool your body into using more energy because it’s so resistant to using any. For me, though, I have a hard time believing there’d be any advantage in taking up coffee.
Okay, rant over 🙂
June 6, 2022 at 7:03 pm
I think you’re pretty spot on with caffeine. Last time I quit it was horrific and terrible and eventually fine. After about two weeks I felt pretty good. Sadly, my strength was far lower, and I eventually had to get back on it just to do my job. I don’t know about the mental effects, they’re all over the place, but I think there’s some benefit from vasodilation, and I’m not sure whether that effect sticks around or you build a tolerance to it. I am one data point with a very short study.
But keep doing what you’re doing! It’s almost always true that if you don’t need a chemical in your body, you probably shouldn’t put it in there.
June 6, 2022 at 7:24 pm
Having done what could charitably be called AI work (just because it’s a neural network doesn’t make it AI, IMO), I’ve thought about this.
My little minds were about as complicated as a roundworm. No roundworm has ever read a newspaper, but mine didn’t have to regulate homeostasis. It was a useful black-box tool, but not intelligent by any means. When I was toying with writing a story in the field, I likened it to dogs. I love dogs, and we work with them so well, a symbiotic little state where . . . I’m going off on a tangent again. But we can train them to fetch birds, sniff bombs, whatever. We interface with them and they do their thing based on black-box training. Strangely, I often refer to the trained reflexes of the midbrain as my “dog brain.”
AI is fundamentally alien to us. Every single living thing on this planet comes from the same mold, and yet communication with dogs and even cats is the exception. We have no innate communication with other primates, even down to a smile carrying the exact opposite meaning.
People are so wound up inside themselves that those who can’t intuit others’ emotions, and must instead follow a set of rules of conduct not unlike an AI, are feared like a bogeyman or demon. Each of these little clusters of nerves in our brain does a tremendous amount of pre-processing before our consciousness even touches it. Vision is an amazing example: only a small area of our visual field actually has detailed vision, and the rest is stitched together into a representation literally colored by our memories of what things looked like before we looked away.
We all live in a world where this preprocessing creates a generally stable baseline experience, but the colorblind, aphantasic, deaf, autistic, or psychopathic, just to name a few, have a slightly skewed baseline, which creates a different reality, one that many people do not appreciate.
So, I doubt adding local memory to an ANN will produce human level intelligence over a weekend. 😉
I adore your work because it gives a toehold into a common communication method, and you’re the first person I’ve ever seen add “screw you I want to live!” as a core concept. I mean, AI could operate at speeds far above or below our notice, and has no need to be social, eat, sleep, reproduce, or react to any stimuli. We act like viruses and fungi are these strange, unknowable things, but true AI is beyond alien. I’m convinced the first true AI won’t even be recognized as intelligence.
June 6, 2022 at 7:17 pm
You adore it? Thank you so much.
The alien element is one reason why I’m somewhat more fascinated by abstract AIs that don’t have bodies in the traditional sense. An AI operating a humanoid robot is embodied in the same environment as us, so it’s compelled to have some similar senses; its mode of “understanding” would be at least a little like ours. An AI that is just a program living in a “world” of data, not even aware of the computer tower the whole thing runs on, is comparatively novel.
And yes, human (or animal) intelligence and its peripherals are vastly complicated. The deeper I get into my work, the more this is borne in on me. I compare it to manually opening a composite flower, say a dahlia. Every time I get a petal up, I can see three more underneath it.
These comments are going all over the place. I hope Alex doesn’t mind.
June 7, 2022 at 1:47 am
I believe the term is appropriate. Ignoring the feat of implementing natural language processing, you’re attempting to teach Acuitas semantics. While the truism “people think in words” is mostly true, it’s literally true in this case, and I firmly believe our memories work in a semantic network.
I loved my early warning system (I named it Laika), but I’m under no delusions that it was anything more than a keyword counter with extra features. You’re trying something, dare I say, novel, and putting a tremendous amount of work into interfacing with an AI. It’s currently doing very basic analysis, and it certainly doesn’t “understand” terribly much, but ivy climbs with simple stimuli, and this is a link in a chain I don’t think people appreciate. You can’t make a microprocessor with a stone axe.
Plus it produces surprising behavior; at least I’m surprised. I always assume that’s step one. Less so with mechanical designs. 😀
June 7, 2022 at 5:16 am
Доверяй, но проверяй (Trust but Verify)
I . . . uh . . . I went on a journey. Even for me. So I retooled it and put it somewhere that wouldn’t just chew on all the scenery of your blog.
Warning: it is a journey.
Short version: be sure to check those base assumptions on the filters. As you’ve seen, the situations they’re based on may not be analogous to yours. You have a good gut feeling. Trust your gut, it’s your operational antenna.
June 5, 2022 at 11:45 pm
Did Alex nerd snipe you? 😀
I like the result though.
June 6, 2022 at 10:39 am
Haha, I like it! It was well written, better than a lot of my writing for sure. I agree base assumptions are important to double-check. I’m honestly probably being overly cautious, but I’m in uncharted territory and I don’t want to miss something.
June 6, 2022 at 6:17 pm
Foundational work is the best time to double-check. Just yesterday I was flying solo on some work for the first time, and I ran through every single step, since it was time sensitive and failure meant losing the entire project. I was all sorts of detail-oriented and belt-and-suspenders, but I got halfway through reassembly and realized the bolts had no release agent. So I’m supporting three things, one toxic and one messy, while I coat and reassemble the bolts without gloves on.
I just hope I won’t make the same mistake again.
June 6, 2022 at 6:59 pm
I get nerd sniped so easily. I once got stuck in traffic behind a dump truck and was fascinated by watching the air brakes work.
So yes. 😉
June 6, 2022 at 6:13 pm