How did Arizona get so close to upsetting Arnon Mishkin's “t-stat of 4” statement defending his decision team's call?
On November 3rd, 2020, Arnon Mishkin, a statistician and director of the Fox News decision team, called Arizona for Biden with ~25% of votes still uncounted. When called upon to defend his decision, he explained on air to the Fox News desk that Trump would not be able to overcome Biden's 7-point lead. He said,
"It has been in the category that we call 'knowable but not callable' for about an hour. Um, we finally called it right now, ... [...]"
and a little later added,
"This is a call that's sitting in our statistical models at sort of, at a t-stat of 4 or more in all the different ways [...] That means we're 4 standard deviations from being wrong [...]"
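For reference, here is a minimal sketch of what "4 standard deviations from being wrong" would mean as a tail probability, assuming (my assumption, not something Mishkin stated) that the model's error is approximately standard normal. The function name `normal_upper_tail` is just illustrative.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumption: the "t-stat of 4" behaves like a z-score of 4.
p_one_sided = normal_upper_tail(4.0)   # chance of being wrong in one direction
p_two_sided = 2 * p_one_sided          # chance of a 4-sigma miss in either direction

print(f"one-sided tail: {p_one_sided:.2e}")
print(f"two-sided tail: {p_two_sided:.2e}")
```

Under that normal assumption the one-sided tail is on the order of 3 in 100,000. A real t-distribution with few degrees of freedom would have fatter tails, so this is probably a lower bound on what the statement implies.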
I had forgotten about it since, but coming back to check Arizona's results, I was pretty surprised to see how much of a comeback Trump did make (per the New York Times' election tracker).
So I'm not sure if...
I'm misinterpreting the statement or statistics, i.e. it's not surprising at all, even mostly expected that this is how it would go, or
some unprecedented factor came into view regarding the last ~25% of uncounted votes (at the time of the call), or
this truly was a freak occurrence: the well-under-0.01% tail of universes actually happened during one of the most controversial elections of modern times, and Trump regained nearly all of Biden's 7-point lead, even though Mishkin was confident enough to call Arizona for Biden with ~25% of votes still uncounted, before any other network, and on Fox at that.
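To put the size of the required comeback in perspective, here is a rough back-of-the-envelope sketch. It assumes (my simplifications) a two-candidate race, that the 7-point lead is measured among the votes counted so far, and that exactly 75% of the eventual total had been counted. The helper `required_margin_in_remaining` is hypothetical.

```python
def required_margin_in_remaining(lead_points: float, counted_share: float) -> float:
    """Margin (in percentage points) the trailing candidate must win the
    uncounted votes by in order to tie, given the leader's margin among
    counted votes and the share of the total vote already counted."""
    remaining_share = 1.0 - counted_share
    # The lead, expressed in points of the *total* vote, must be offset
    # by the margin earned within the remaining share.
    return lead_points * counted_share / remaining_share

# Assumed inputs taken from the numbers quoted above: 7-point lead, ~25% uncounted.
margin_needed = required_margin_in_remaining(lead_points=7.0, counted_share=0.75)

# In a strict two-way race, a margin of m points means a vote share of 50 + m/2.
share_needed = 50.0 + margin_needed / 2.0

print(f"margin needed in remaining votes: {margin_needed:.1f} points")
print(f"two-way share needed in remaining votes: {share_needed:.1f}%")
```

So under these assumptions, erasing the lead would require winning the outstanding ballots by about 21 points, which is why a near-complete comeback looks surprising next to the stated confidence.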
I'm not at all disputing the election results by the way. It's because I am following experts that I want to know what I'm not getting.
This is an edited question, so post-emptively responding to the "doesn't matter, it's within range" line of argument: yes, I do understand that Mishkin wasn't saying it was impossible. But if I were on the test flight of a wholly new aircraft and it nearly crashed, and the pilot told me it was "within range" of their models, I'd still ask, "Hey um, guys? Is there something you wanna tell me?" Because at that point you have to wonder what's more likely: that on the maiden voyage of this totally new thing you just had a stroke of bad luck, about 4 standard deviations off-center, or that something unexpected happened? (I'm appealing to a Bayesian mode of thought here.) (Also, in an interview the next day, Mishkin told a story of how they came up with a brand-new model using phone calls and Internet surveys, particularly targeting early voters, to try to get a better estimate for mail-in ballots.)
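The Bayesian intuition can be sketched roughly as follows. Every number here is a made-up placeholder chosen only to illustrate the shape of the argument, not an estimate of anything about the actual Fox News model, and `posterior_model_off` is a hypothetical helper.

```python
def posterior_model_off(prior_off: float,
                        p_near_miss_if_off: float,
                        p_near_miss_if_ok: float) -> float:
    """Bayes' rule: probability the model was miscalibrated, given that a
    near-miss was observed."""
    evidence = (prior_off * p_near_miss_if_off
                + (1.0 - prior_off) * p_near_miss_if_ok)
    return prior_off * p_near_miss_if_off / evidence

# Hypothetical inputs (assumptions, purely illustrative):
prior_off = 0.05            # prior belief that a brand-new model is miscalibrated
p_near_miss_if_off = 0.20   # chance of a near-miss if it is miscalibrated
p_near_miss_if_ok = 0.001   # chance of a near-miss if it is well calibrated

posterior = posterior_model_off(prior_off, p_near_miss_if_off, p_near_miss_if_ok)
print(f"posterior probability the model was off: {posterior:.2f}")
```

With placeholders like these, even a small prior doubt about a new model ends up dominating once a supposedly very rare event is observed, which is the point of the aircraft analogy. Different inputs would of course give a different answer.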