Stop Collapsing Conformation Space
Single-pose reasoning is a convenience. Chemistry lives on conformational manifolds — and your tooling should treat ensembles as first-class objects.
Most molecular ML is still framed as if the job were passive.
Take a fixed dataset. Fit a model. Hold out a test set. Report the score.
That is fine if the world arrives fully measured and your job is only to interpolate within it.
Drug discovery does not arrive that way.
The real job is choosing what to measure next.
A live programme is not handed a complete table of compounds and labels. It is constantly deciding what to make, what to assay, what to ignore, and which uncertainty is worth buying down now versus later.
That makes discovery a sequential experimental design problem before it becomes a prediction problem.
The model does not just need to say what it thinks. It needs to help decide what the team should learn next.
That might mean proposing a compound. It might mean choosing between two assays. It might mean recommending a cheap measurement on a borderline compound because the information generalises across the series. It might mean telling the team not to spend money on the standard next test because the result would not change the decision.
A passive framing misses all of that.
Once a dataset has been cleaned, split, and benchmarked, it is easy to forget what generated it.
In practice every label has a price, a latency, a bias, and an opportunity cost. Some assays are fast but noisy. Some are expensive but decisive. Some compounds are easy to make and therefore cheap to learn from. Some are difficult enough that the act of asking the question distorts the whole programme.
A model trained as though labels simply appear cannot tell you how to spend experimental budget well. It can only score the compounds that already happened to be measured.
That is why better prediction often underdelivers in deployment. The project did not need a cleaner fit to history. It needed help deciding what history to generate next.
Good project scientists already think this way, even if they do not call it active learning.
They ask which compound will tell us whether the vector is open. They ask which assay will separate two competing explanations. They ask whether one quick permeability measurement will save two weeks of chemistry. They ask whether a deliberately ugly compound is worth making because it clarifies the boundary of the series.
That is not static prediction.
It is controlled information gathering under budget and time constraints.
The role of the model should be to improve that control, not merely to decorate it with more accurate point estimates.
A top-scoring compound is not always the best next experiment. Sometimes it is too expensive to make, too ambiguous if it fails, or too similar to what is already known.
An actively chosen experiment may look modest in comparison. It may be selected because it is easy to synthesise, sits near a decision boundary, resolves a major uncertainty, or broadens the local training signal in a region where the model is still blind.
That often produces better project trajectories than greedy ranking.
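To make the contrast concrete, here is a minimal sketch of greedy ranking next to a cost-aware choice. Everything in it is an assumption for illustration: the Candidate fields, the threshold, and the scoring heuristic (uncertainty times novelty times closeness to a decision boundary, divided by cost) are hypothetical and are not a standard acquisition function.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted: float       # model point estimate, higher is better (hypothetical units)
    uncertainty: float     # predictive standard deviation
    synthesis_cost: float  # relative cost to make and test
    novelty: float         # 0 = near-duplicate of measured compounds, 1 = unexplored region


def greedy_pick(candidates):
    """Pick the compound with the highest predicted value."""
    return max(candidates, key=lambda c: c.predicted)


def information_per_cost(c, threshold):
    """Illustrative heuristic: reward uncertain, novel compounds that sit
    near the decision threshold, and penalise expensive ones."""
    if c.uncertainty <= 0:
        return 0.0
    z = (c.predicted - threshold) / c.uncertainty
    boundary_weight = math.exp(-0.5 * z * z)  # close to 1 near the threshold
    return c.uncertainty * c.novelty * boundary_weight / c.synthesis_cost


def cost_aware_pick(candidates, threshold):
    """Pick the compound with the best heuristic information per unit cost."""
    return max(candidates, key=lambda c: information_per_cost(c, threshold))


candidates = [
    Candidate("A", predicted=8.5, uncertainty=0.2, synthesis_cost=10.0, novelty=0.1),
    Candidate("B", predicted=7.8, uncertainty=0.5, synthesis_cost=4.0, novelty=0.4),
    Candidate("C", predicted=7.1, uncertainty=0.9, synthesis_cost=1.0, novelty=0.8),
]

print(greedy_pick(candidates).name)           # top-scoring compound
print(cost_aware_pick(candidates, 7.0).name)  # cheap, uncertain, near the boundary
```

With these made-up numbers the greedy rule picks the expensive, well-understood compound A, while the heuristic prefers the cheap borderline compound C. The exact weighting matters less than the fact that cost and uncertainty enter the choice at all.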
The field talks a lot about sample efficiency. This is where it actually lives: not only in fitting more from fewer labels, but in choosing labels that make the next round disproportionately more informative.
If the real task is sequential experimental design, then evaluation should stop pretending a single held-out score is enough.
We should want to know how well the system chooses the next compounds to make, how quickly uncertainty shrinks after each measured batch, whether the model spends assay budget in a way that reduces regret, and whether the next ten experiments are better because the previous ten were selected intelligently.
That kind of benchmark is more work. It is also much closer to what a live project experiences.
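A closed-loop benchmark can be sketched in a few lines. The version below is a toy under stated assumptions: the pool is a list of integers standing in for compounds, the oracle is a made-up function standing in for the assay, and both policies are hypothetical placeholders. It tracks simple regret (best possible value minus best value measured so far) after each batch instead of a single held-out score.

```python
import random


def run_closed_loop(pool, oracle, policy, n_rounds, batch_size, seed=0):
    """Run a selection policy for several rounds and return the
    simple-regret curve, one value per round."""
    rng = random.Random(seed)
    best_possible = max(oracle(x) for x in pool)
    measured = {}            # compound -> measured value
    unmeasured = list(pool)
    regret_curve = []
    for _ in range(n_rounds):
        batch = policy(measured, unmeasured, batch_size, rng)
        for x in batch:
            measured[x] = oracle(x)  # "run the assay"
            unmeasured.remove(x)
        regret_curve.append(best_possible - max(measured.values()))
    return regret_curve


def random_policy(measured, unmeasured, batch_size, rng):
    """Baseline: choose the next batch uniformly at random."""
    return rng.sample(unmeasured, batch_size)


def near_best_policy(measured, unmeasured, batch_size, rng):
    """Toy exploit policy: after a random first batch, choose the
    unmeasured compounds closest to the best one measured so far."""
    if not measured:
        return rng.sample(unmeasured, batch_size)
    best_x = max(measured, key=measured.get)
    return sorted(unmeasured, key=lambda x: abs(x - best_x))[:batch_size]


pool = list(range(50))                   # hypothetical compound identifiers


def toy_oracle(x):
    """Made-up assay response with a single optimum at x = 32."""
    return -(x - 32) ** 2


random_curve = run_closed_loop(pool, toy_oracle, random_policy, n_rounds=5, batch_size=4)
exploit_curve = run_closed_loop(pool, toy_oracle, near_best_policy, n_rounds=5, batch_size=4)
print("random  :", random_curve)
print("exploit :", exploit_curve)
```

The point of the harness is the shape of the comparison: two policies, the same budget, and a curve per policy. A real version would swap in a trained model, a retrospective dataset as the oracle, and assay costs in place of a fixed batch size.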
A model that looks ordinary in a passive prediction setup may be very strong in a closed loop if it chooses experiments well. A model that dominates static datasets may still be poor at asking useful next questions.
Those are not edge cases. They are different kinds of intelligence.
One useful way to think about this is that every assay is a purchase.
You are buying a piece of information with money, time, compound, and queue capacity. The right purchase depends on what the project currently knows and what decision is blocked.
Some of the most valuable purchases are cheap. Some expensive purchases are still bad because they answer a question nobody urgently needs answered. Some measurements matter less for the compound in front of you than for the model you want to have next week.
This is a much better frame than the old idea that the pipeline simply advances compounds through a fixed ladder.
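The purchase framing can be written down as a small value-of-information calculation. The sketch below assumes a deliberately simple setting that is not in the original text: a binary advance-or-drop decision, a compound that is either good or bad, an assay described by a sensitivity and a specificity, and payoffs in arbitrary units. All names and numbers are illustrative.

```python
def best_action_value(p_good, gain, loss):
    """Expected value of the better action: advancing pays `gain` if the
    compound is good and costs `loss` if it is bad; dropping pays 0."""
    return max(p_good * gain - (1 - p_good) * loss, 0.0)


def net_value_of_assay(p_good, gain, loss, sensitivity, specificity, cost):
    """Expected improvement in the decision from running the assay first,
    minus what the assay costs."""
    p_pos = p_good * sensitivity + (1 - p_good) * (1 - specificity)
    p_neg = 1 - p_pos
    # Posterior probability of "good" after each possible result (Bayes' rule).
    post_pos = p_good * sensitivity / p_pos if p_pos > 0 else 0.0
    post_neg = p_good * (1 - sensitivity) / p_neg if p_neg > 0 else 0.0
    value_with = (p_pos * best_action_value(post_pos, gain, loss)
                  + p_neg * best_action_value(post_neg, gain, loss))
    value_without = best_action_value(p_good, gain, loss)
    return value_with - value_without - cost


# A genuinely uncertain compound: the result changes what the team does.
worth_buying = net_value_of_assay(p_good=0.5, gain=10, loss=10,
                                  sensitivity=0.9, specificity=0.9, cost=1)

# A near-certain compound with a weak assay: the team advances either way,
# so the assay only subtracts its own cost.
not_worth_buying = net_value_of_assay(p_good=0.95, gain=10, loss=1,
                                      sensitivity=0.7, specificity=0.7, cost=1)

print(round(worth_buying, 3), round(not_worth_buying, 3))
```

In the second case both possible results lead to the same action, so the information is worth nothing to the decision and the net value is just the negative of the assay cost. That is the numerical form of telling the team not to run a test whose result would not change the decision.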
None of this is an argument against predictive modelling.
You still want the strongest possible prior. You still want calibrated ranking inside the current assay regime. You still want fast local adaptation.
But if the system never graduates from passive scoring, it will remain strangely detached from the real work of a project.
Real teams do not just need to know what the model believes.
They need to know what to do tomorrow.
That is why active learning matters so much in this field. It is not a nice-to-have extra on top of prediction. It is closer to the core task than prediction ever was.