Most ADME ML models are confidently wrong more often than people admit.
The failure mode is predictable. You train on a big “global” dataset, report flattering offline metrics, and then deploy the model on molecules that sit outside its applicability domain. The result is point predictions delivered with the emotional confidence of a weather app.
Then the program hits reality: distribution shift is the rule, not the exception.
The hidden tax: false certainty erodes trust
Wet-lab teams don’t hate models because they’re imperfect. They hate models because they sound certain when they’re guessing.
In drug discovery, trust is cumulative. Every overconfident miss burns political capital. Chemists stop listening, project teams stop budgeting time for modeling, and the model becomes a checkbox instead of a decision tool.
Why “global ADME models” break in practice
Small-molecule datasets are sparse, biased, and scaffold-skewed.
A solubility or clearance model trained on legacy chemotypes does not generalize to a novel scaffold just because you used a GNN and a million datapoints.
The real question is always: How similar is this molecule to what the model actually learned?
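One way to make that question concrete — a toy illustration, not Blaise’s actual method, with made-up fingerprints and an arbitrary threshold — is a nearest-neighbor applicability-domain check: compare a query molecule’s fingerprint bits against the training set and only trust the model when its closest training examples are genuinely similar.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def in_domain(query: frozenset, train_fps: list[frozenset],
              threshold: float = 0.4, k: int = 3) -> bool:
    """Flag a molecule as in-domain only if its mean similarity to the
    k most similar training fingerprints clears the threshold.
    Both threshold and k are illustrative, not recommended defaults."""
    sims = sorted((tanimoto(query, fp) for fp in train_fps), reverse=True)
    return sum(sims[:k]) / k >= threshold

# A scaffold close to the training chemotypes passes; a novel one does not.
train = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({8, 9})]
in_domain(frozenset({1, 2, 3}), train)   # near the training data -> True
in_domain(frozenset({20, 21}), train)    # novel scaffold -> False
```

Real systems would use proper molecular fingerprints (e.g. Morgan/ECFP bits) and calibrated cutoffs, but the decision shape is the same: similarity to the training data gates whether the prediction is even admissible.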
What Blaise does differently: uncertainty as a first-class output
At Blaise AI, we surface the context that actually drives decisions in the room: what data a model was trained on, how similar your molecule is to that data, and whether a prediction is actionable. Sometimes the right next step isn’t more modeling—it’s wet-lab work.
This is the difference between “an ML model exists” and “a model is usable in a live program.”
Why this matters even more for agentic design
We’re building a Small Molecule Agent: something that proposes designs, weighs evidence, and recommends actions.
An agent cannot be fed overconfident signals. If the property model lies with confidence, the agent optimizes the wrong objective. It converges on brittle chemotypes and recommends work that looks rational on paper but fails at the bench.
An agent needs calibrated opinions. It needs to distinguish “this is informative” from “this is guessing.”
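A minimal sketch of that distinction — assuming, purely for illustration, an ensemble-disagreement proxy for uncertainty and an arbitrary cutoff — is to collapse an ensemble’s predictions into a value plus a verdict the agent can act on:

```python
import statistics

def ensemble_opinion(preds: list[float], max_std: float = 0.3):
    """Turn raw ensemble predictions into (mean, verdict).
    High disagreement between members is treated as 'guessing';
    max_std is an illustrative cutoff, not a recommended default."""
    mean = statistics.fmean(preds)
    spread = statistics.stdev(preds)
    verdict = "informative" if spread <= max_std else "guessing"
    return mean, verdict

ensemble_opinion([0.9, 1.0, 1.1])  # members agree -> (1.0, "informative")
ensemble_opinion([0.2, 1.0, 1.8])  # members disagree -> (1.0, "guessing")
```

Note that both calls return the same point prediction. That is exactly the problem with surfacing only the number: the agent needs the verdict to know which 1.0 it can optimize against.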
The product stance: no incentive to oversell point predictions
Blaise doesn’t exist to impress users with a single predicted number. We exist to reason correctly about molecules.
That means down-weighting weak models, flagging uncertainty, and saying “we don’t know yet” before the bench forces the issue.
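As a sketch of what that triage looks like in practice — with hypothetical inputs and thresholds, not Blaise’s production logic — the two signals above combine into a single recommendation:

```python
def recommend(similarity: float, ensemble_std: float,
              min_sim: float = 0.4, max_std: float = 0.3) -> str:
    """Toy triage rule: act on a prediction only when the molecule is
    near the training data AND the model agrees with itself.
    Thresholds are illustrative placeholders."""
    if similarity >= min_sim and ensemble_std <= max_std:
        return "use prediction"
    return "run the assay"

recommend(similarity=0.7, ensemble_std=0.1)  # -> "use prediction"
recommend(similarity=0.2, ensemble_std=0.1)  # out of domain -> "run the assay"
```

The point is not the thresholds; it is that “run the assay” is a first-class output, stated before the bench discovers it the hard way.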
AI helps drug discovery only when it can admit uncertainty faster than the lab can expose it.