Drug Discovery Does Not Have an Intelligence Problem

By Blaise AI Team

The easiest way to misunderstand AI in drug discovery is to import intuitions from software.

In software, generation is impressive because verification is cheap. A coding model writes code, the tests run, the compiler complains, the linter shouts, the program crashes, the model tries again. The loop is fast enough that better generation turns into better outcomes almost immediately.

Small-molecule discovery does not work like that.

You can generate molecules all day. You can score them, cluster them, dock them, decorate them, and ask a language model to explain why they are clever. None of that tells you whether the idea survives contact with synthesis, assay noise, ADME, tolerability, or the rest of organism-level biology.

That is not an intelligence bottleneck.

It is a verification bottleneck.

Generation is already cheap enough to outrun the bench

This is the strange thing about a lot of current AI discussion in the field. People still talk as though the scarce resource is molecular imagination.

It is not.

There is no shortage of molecules. There is no shortage of plausible design suggestions. There is no shortage of candidate structures that can be made to look attractive inside a scoring system.

The scarce resource is finding out which of those suggestions matter before the project burns a month on the wrong chemistry.

That scarcity shows up everywhere. A route takes longer than expected. A purification becomes a mini-project. A compound looks clean in a model and then falls apart in a real assay system. A tidy potency prediction arrives just in time to be irrelevant because the team already learned the same lesson experimentally.

The problem is not that the model could not think of enough ideas.

The problem is that the world is expensive to ask.

Verification cost changes what useful AI should optimise for

Once you see the field through that lens, a lot of standard modelling behaviour starts to look misaligned.

If verification is slow, expensive, and capacity-constrained, then the useful model is not simply the one that proposes the most interesting structures. It is the one that helps spend verification budget where it buys the most learning.

That means tractable chemistry matters. Clear hypotheses matter. Assay ordering matters. Calibration matters. It matters whether the next compound will settle an uncertainty or merely satisfy a scoring function. It matters whether the idea can be falsified in a week instead of admired for a month.
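One way to make "spend verification budget where it buys the most learning" concrete is a toy ranking of candidate experiments by expected learning per day of bench time. The sketch below is illustrative only and rests on loud assumptions: each experiment is treated as a decisive yes/no test, "learning" is approximated by the binary entropy of the model's predicted probability, and the `Experiment` fields, the greedy `plan` function, and all the numbers are hypothetical, not a description of any real project or system.

```python
import math
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str          # hypothetical label for a compound/assay pair
    p_success: float   # assumed model probability that the hypothesis holds
    cost_days: float   # assumed total cost: synthesis plus assay time


def binary_entropy(p: float) -> float:
    """Bits of uncertainty in a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def learning_per_day(e: Experiment) -> float:
    """Assumed proxy: uncertainty resolved per day of verification cost."""
    return binary_entropy(e.p_success) / e.cost_days


def plan(experiments, budget_days):
    """Greedy choice: highest bits-per-day first, skipping what no longer fits."""
    chosen, remaining = [], budget_days
    for e in sorted(experiments, key=learning_per_day, reverse=True):
        if e.cost_days <= remaining:
            chosen.append(e)
            remaining -= e.cost_days
    return chosen


# Made-up candidates, purely for illustration.
candidates = [
    Experiment("high-score, hard route", p_success=0.9, cost_days=30),
    Experiment("uncertain, easy analogue", p_success=0.5, cost_days=5),
    Experiment("near-certain, easy analogue", p_success=0.98, cost_days=5),
]

picked = plan(candidates, budget_days=12)
for e in picked:
    print(f"{e.name}: {learning_per_day(e):.3f} bits/day")
```

Under these assumptions the compound with the best score is not the first thing to make. The cheap, genuinely uncertain analogue ranks highest because it settles the most per day spent, which is the point of the paragraph above: the ranking criterion is learning per unit of verification cost, not attractiveness inside a scoring function.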

A lot of molecular ML still acts as though better generation is the main event and verification is a downstream nuisance.

That gets the economics backwards.

In this field, easy-to-make compounds are partly a verification strategy. Cheap assays are partly a verification strategy. Human chemist objections are partly a verification strategy. Internal project memory is partly a verification strategy. All of them are ways of learning faster inside a world where truth is expensive.

This is why software analogies break down so fast

People love the comparison because it flatters the technology. AI wrote code. AI passed tests. Therefore AI will design drugs.

But code has unusually friendly verification infrastructure. The feedback loop is fast. The environment is instrumented. Failure is cheap and local. The model can learn from rejection in minutes.

Drug discovery sits at the opposite end of that spectrum.

Verification is delayed, partial, noisy, and often ambiguous. A compound can fail because the biology is wrong, because the route was too slow, because the assay was misleading, because the wrong endpoint was measured, because local project context changed, or because the model solved yesterday’s problem after the team had already moved on.

The field does not need one more grand claim that a bigger foundation model will think its way around that.

It needs systems designed for the fact that reality pushes back slowly and expensively.

Verification should shape the whole stack

If verification is the bottleneck, then we should build around it directly.

The best next molecule is often the one that verifies a hypothesis cheaply. The best assay is often the one that reduces uncertainty before expensive work begins. The best model update is often the one triggered by messy local feedback, not a pristine benchmark label. The best compound suggestion may be the one that saves three bad syntheses by revealing that a direction is dead.

This also changes how we should think about model architecture. A system that learns quickly from sparse local data, retrieves relevant precedents, understands synthesis burden, and expresses calibrated uncertainty may be far more valuable than a system with broader prior knowledge but weaker ties to the verification loop.

That is not a lesser ambition.

It is a more serious one.

The field still spends too much time flattering intelligence

There is a certain glamour to talking about chemistry models as if the main question is whether they are smart enough.

That frame is comfortable because it makes progress sound like a scale problem. More pretraining, better representation, richer modality fusion, stronger reasoning.

Some of that will help. None of it removes the basic fact that molecules are judged by experiments that are slow, expensive, and bottlenecked by physical work.

If you ignore that, you end up rewarding systems that look impressive at idea generation while doing little to improve project throughput.

The strongest AI system in this space may not be the one that appears most intelligent in the abstract. It may be the one that is most ruthless about where to spend scarce truth.

The hard part is not getting answers. It is getting the right questions tested

Drug discovery does not mainly lack hypotheses.

It lacks cheap, fast, decisive ways to kill or validate them.

That is why verification sits at the centre of the problem. Better models matter because they can shape what gets verified, in what order, at what cost, under what uncertainty. They matter when they shorten the path from suggestion to credible decision.

The benchmark should not be how many attractive molecules the system can emit before lunch.

It should be how much expensive reality the system helps the team avoid wasting.

That is the real bottleneck. And it is the reason small-molecule AI will be won by systems that understand verification, not just systems that imitate intelligence.