A Good Model and a Rubber Stamp Produce the Same Override Rate

Ola Kolade · May 19, 2026

At the end of the quarter, a compliance team at a consumer lending company pulls its workflow metrics. The credit AI flagged 4,200 loan applications; underwriters upheld 4,074 of them and overrode the other 126. That's a 3% override rate. To the team, the number reads as reassurance: the system works, and the underwriters are engaged enough to push back when they need to. A regulator can look at the identical figure and see the reverse. The same 97% agreement, sustained across thousands of credit decisions, is also what a stack of unread applications produces.

Both readings fit the data equally well.

California's human involvement standard asks the reviewer to understand the model's output, weigh it against other information about the applicant, and keep the authority to decide differently. A 3% override rate breaks none of those requirements on its own. An underwriter really can read the income documentation, run the debt-to-income ratio against current criteria, agree with a well-built model on all but a handful of files, and be doing the job as intended. The catch is that an underwriter who reads nothing and confirms nearly everything posts the same 3%. The rule was written to catch that second underwriter, the one rubber-stamping at scale, and the override rate can't pick them out of the lineup.

That would be a small problem if the override rate were a small metric. It isn't. Together with dwell time, it's one of the few numbers a lending compliance team can point to when it wants to argue its reviews are substantive. The weakness sits right under the number a team leans on most, and no dashboard change fixes it.

The override rate also drifts the wrong way as the model gets better. Every improvement on the modeling side (better training data, tighter calibration, fewer false positives) nudges reviewers toward agreement and pushes the override rate down. A lender that does everything right on the model watches its compliance signal erode as a direct consequence. There's no sweet spot to aim for, no rate low enough to prove the model is good and high enough to prove the humans are independent. Model accuracy isn't a compliance defense, and a number that falls every time the model sharpens was never tracking human involvement to begin with.
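The drift can be sketched with a deliberately simple toy model. The assumptions are mine, for illustration only: the reviewer never overrides a correct recommendation and catches a fixed share of the wrong ones (the hypothetical `catch_rate`). Under those assumptions the expected override rate is just the model's error rate times that share, so it shrinks as the model improves even though the reviewer's diligence never changes.

```python
def expected_override_rate(model_error_rate: float, catch_rate: float = 1.0) -> float:
    """Expected override rate for a reviewer who only overrides model errors.

    Toy assumption: the reviewer never overrides a correct recommendation
    and catches a fixed share (catch_rate) of the incorrect ones.
    """
    for name, value in (("model_error_rate", model_error_rate),
                        ("catch_rate", catch_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return model_error_rate * catch_rate


# The same diligent reviewer (hypothetically catching 90% of model errors)
# evaluated against progressively better model versions.
for error_rate in (0.10, 0.05, 0.03, 0.01):
    rate = expected_override_rate(error_rate, catch_rate=0.9)
    print(f"model error {error_rate:.0%} -> expected override rate {rate:.1%}")
```

Nothing about the reviewer changes between iterations of the loop; only the model does, and the rate falls anyway.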

Some teams try to get underneath the frequency of disagreement by measuring its depth instead: not how often reviewers override, but how much they examined before agreeing. The instinct is correct. An underwriter who pulls bank statements the model never ingested, weighs them, records a reason, and still upholds the denial is doing something real even when the result matches the recommendation. But that kind of work happens in places the system doesn't record. The review screen doesn't log which documents were opened; nothing captures whether the officer studied the employment history or just saw the score and moved on. Controls such as write isolation can prove that a human touched the file, but not that the human decided anything.

Section 7001(e)(1) isn't looking for a number in the first place. It sets no acceptable override rate and grants no safe harbor at 5% or 10% or any other line. It defines a standard for the quality of human involvement and treats the override rate as a side effect of that quality. An examiner reading the file isn't checking whether the rate lands in an approved band. The question is whether the rate, read alongside timing, training records, and written rationale, adds up to a person who actually made the call.

So a lender that leans on the override rate as its headline compliance metric is standing on evidence that can indict it about as readily as defend it. Picture two of them, both reporting 3%. The first also has median review times north of ninety seconds, training records covering the model's limitations, written rationales on a sample of files, and a handful of overrides with reasons attached. The second has eight-second reviews and nothing around them. The dashboards match. Everything the dashboards leave out is what separates the two.
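The two-lender comparison can be made concrete with a small sketch. Everything in it is hypothetical: the `Review` record, its field names, and the synthetic logs are illustrative assumptions, not any lender's actual schema. It reuses the opening example's volume (4,200 flagged files, of which 4,200 minus 4,074 = 126 were overridden) for both lenders, and assumes the first lender records a rationale on every tenth file.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Review:
    """One underwriter decision on a model-flagged file (hypothetical schema)."""
    overridden: bool       # True if the underwriter reversed the model
    dwell_seconds: float   # time the file was open on the review screen
    has_rationale: bool    # True if a written reason was recorded


def override_rate(reviews):
    """Share of reviews in which the underwriter reversed the model."""
    return sum(r.overridden for r in reviews) / len(reviews)


def median_dwell(reviews):
    """Median time spent per review, in seconds."""
    return median(r.dwell_seconds for r in reviews)


def rationale_share(reviews):
    """Share of reviews that carry a written rationale."""
    return sum(r.has_rationale for r in reviews) / len(reviews)


def synthetic_log(total, overrides, dwell_seconds, rationale_every):
    """Build a synthetic log with `overrides` reversals out of `total` reviews.

    A rationale is attached to every `rationale_every`-th file;
    pass 0 to attach none.
    """
    return [
        Review(
            overridden=i < overrides,
            dwell_seconds=dwell_seconds,
            has_rationale=rationale_every > 0 and i % rationale_every == 0,
        )
        for i in range(total)
    ]


# Same volume and same 126 overrides for both lenders (assumed, see above).
lender_a = synthetic_log(4200, 126, dwell_seconds=95.0, rationale_every=10)
lender_b = synthetic_log(4200, 126, dwell_seconds=8.0, rationale_every=0)

for name, log in (("Lender A", lender_a), ("Lender B", lender_b)):
    print(
        f"{name}: override rate {override_rate(log):.1%}, "
        f"median dwell {median_dwell(log):.0f}s, "
        f"rationale on {rationale_share(log):.0%} of files"
    )
```

The first column of the output is identical for both lenders; only the fields a rate-only dashboard never shows tell them apart.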

The rate still earns its place, just not in the direction teams want it to. A true zero across thousands of decisions is hard to explain innocently, so an unusually low rate is worth a second look. A low-single-digit rate, though, clears a company of nothing; it just pushes the question down a level: what the reviewer saw, how long they spent, what they checked, and whether any of it can be reconstructed now. Most lenders never built that layer, having assumed the override rate was it.

Proof of Review is meant to be that layer: not a record of whether the reviewer agreed with the model, but of what they examined before they did. The Prenuvo case is what its absence looks like once a claim arrives.
