Scaffold Splits Are Not the Gold Standard
Scaffold splits test one kind of claim: that a model generalizes to chemotypes it has never seen. Real project value comes from rapid local adaptation, not zero-shot heroism on unseen chemistry.
LLMs have flooded the world with hypotheses.
That is not the hard part anymore.
Coding models work because they live inside fast verification loops. They can write code, run tests, get rejected, and try again in seconds.
Drug discovery does not have that luxury.
The real verifier is still biology. In vitro first. In vivo ultimately. Slow, expensive, capacity-limited.
So the challenge for a real small-molecule agent is not just generation. It is finding intermediate sources of truth that let you check whether its judgment is actually any good.
Patent data stands out here.
Not as a substitute for prospective work. As a verifier of medicinal chemistry reasoning.
Patents are full of real series, real assay data, real trade-offs, and real project decisions. They are also badly structured and hard to extract, which means many foundation models probably have not absorbed them properly.
That makes them valuable: a model is less likely to pass the test by reciting what it has already memorized.
The question is not whether an AI can produce plausible chemistry language.
The question is whether, faced with a real patent, it can recover and recapitulate the logic of the program: what mattered, what improved, what broke, what was worth advancing, and what was noise.
As hypothesis generation gets cheaper, that kind of verification matters more and more.
At that point, the frontier is not who can generate the most molecules.
It is who can tell which ideas deserve to survive.
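One way to make the patent check concrete is the minimal sketch below. It assumes you have already extracted, from a single patent, which compounds the program chose to advance, and that the agent has scored every compound in the series without seeing that label. The function name, the compound IDs, and the numbers are all hypothetical and invented for illustration; they are not data from any real patent. The score is the fraction of (advanced, not advanced) pairs the agent orders correctly, with ties counting half.

```python
from itertools import product


def pairwise_agreement(agent_scores, advanced):
    """Fraction of (advanced, not-advanced) pairs the agent ranks correctly.

    agent_scores: dict mapping compound ID -> agent's score (higher = better).
    advanced: set of compound IDs the patent's program actually advanced.
    Ties count as half a correct pair.
    """
    pos = [c for c in agent_scores if c in advanced]
    neg = [c for c in agent_scores if c not in advanced]
    if not pos or not neg:
        raise ValueError("need at least one advanced and one non-advanced compound")

    correct = 0.0
    for p, n in product(pos, neg):
        if agent_scores[p] > agent_scores[n]:
            correct += 1.0
        elif agent_scores[p] == agent_scores[n]:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: hypothetical IDs and scores, invented for illustration only.
agent_scores = {"cmpd_1": 0.9, "cmpd_2": 0.2, "cmpd_3": 0.7, "cmpd_4": 0.4}
advanced = {"cmpd_1", "cmpd_4"}

score = pairwise_agreement(agent_scores, advanced)
print(f"pairwise agreement with the program's decisions: {score:.2f}")
```

A score near 0.5 means the agent's ranking is no better than chance at recovering the program's decisions. This only covers the "what was worth advancing" part of the question. Recovering what broke, or why a trade-off was made, would need richer labels than a single advanced-or-not flag.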