The Project Is the Unit of Optimisation
A lot of molecular ML still behaves as though the core problem is obvious.
Take a set of molecules. Score them. Rank them. Pick the one at the top.
That framing is neat. It is also too small.
The real object being optimised in a live programme is not the molecule. It is the project.
That sounds like semantics until you watch how teams actually work.
The same molecule can be a good idea or a bad one depending on when you see it
Imagine the exact same compound appearing twice in the same project.
In the first case, the team has not yet established whether a particular vector is open. Synthesis from the current intermediate is straightforward, and one clean analogue would settle an important uncertainty. The compound is valuable.
In the second case, the team already knows that vector is open, the same perturbation has been sampled twice, and the real bottleneck has shifted to permeability. Now the compound is mostly noise.
Nothing about the structure changed.
What changed was the state of the project around it.
This is the point that molecule-first optimisation misses. A compound does not have one fixed value in isolation. It has value as a move, given the current position.
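The two cases above can be written down as a toy calculation. This is a minimal sketch, not a real scoring scheme: `ProjectState`, `Candidate`, `move_value`, the field names, and the formula are all hypothetical, chosen only to show one unchanged compound receiving two different values.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    """Hypothetical snapshot of what the team knows right now."""
    open_questions: set          # questions not yet answered
    bottleneck: str              # property currently limiting progress
    times_sampled: dict = field(default_factory=dict)  # perturbation -> count


@dataclass
class Candidate:
    name: str
    perturbation: str            # the structural change this compound tests
    answers: set                 # questions its result would help settle
    addresses: str               # property the compound is aimed at


def move_value(candidate: Candidate, state: ProjectState) -> float:
    """Illustrative value of making this compound from this position."""
    new_answers = len(candidate.answers & state.open_questions)
    on_bottleneck = 1.0 if candidate.addresses == state.bottleneck else 0.0
    repeats = state.times_sampled.get(candidate.perturbation, 0)
    # Repeating an already-sampled perturbation discounts the value.
    return (new_answers + on_bottleneck) / (1 + repeats)


# One compound, unchanged between the two cases.
compound = Candidate("analogue A", "methyl at vector V",
                     {"is vector V open?"}, "potency")

# Case 1: the vector question is open and potency is the bottleneck.
early = ProjectState({"is vector V open?"}, "potency")

# Case 2: question settled, perturbation sampled twice, bottleneck moved.
late = ProjectState(set(), "permeability", {"methyl at vector V": 2})

early_value = move_value(compound, early)
late_value = move_value(compound, late)
```

The function takes the state as an argument. That is the whole design choice: remove the second argument and the two cases become indistinguishable.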
Project state is the thing that makes suggestions meaningful
What sits inside that state?
Assay history. Failed motifs. Live route constraints. Intermediates on the shelf. Building blocks that have already arrived. Known liabilities. Open mechanistic questions. The queue at the bench. The assays that can run this week. The experiments that are too expensive to waste.
All of that shapes whether the next compound is useful.
That is why discovery decisions feel so different from leaderboard tasks. The dataset is not a pile of independent rows. It is a changing position with memory, momentum, and unresolved uncertainty.
Any model that ignores that will eventually start recommending moves that look strong in isolation and weak in context.
Scoring molecules one by one strips out the thing that matters most
A scalar compound score is attractive because it is easy to benchmark. You can regress against pIC50. You can rank by MPO. You can collapse multiple terms into a single objective and pretend the output is decision-ready.
But a project does not need a universal number. It needs the next move that best improves the state of the programme.
That may mean making a slightly worse-looking molecule because it resolves a key question faster. It may mean preferring a tractable series-expanding analogue over a high-scoring bespoke compound. It may mean sequencing an experiment that makes the following round sharper, even if the immediate molecule is not the headline winner.
Once you define the objective at the project level, these choices stop looking conservative.
They start looking rational.
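A small sketch makes the contrast concrete. Everything here is an assumption for illustration: the `Option` fields, the numbers, and the value-per-day formula are invented, and a real programme would weigh these terms differently. The only point is that the two rankings can disagree.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    predicted_score: float      # scalar compound score, e.g. an MPO value in [0, 1]
    synthesis_days: float       # estimated chemistry time to make it
    p_resolves_question: float  # chance the result settles the open question


def rank_by_score(options):
    """Molecule-level view: highest predicted score first."""
    return sorted(options, key=lambda o: o.predicted_score, reverse=True)


def rank_by_project_value(options, question_weight=1.0):
    """Project-level view: score plus learning, per day of chemistry spent."""
    def value_per_day(o):
        learning = question_weight * o.p_resolves_question
        return (o.predicted_score + learning) / o.synthesis_days
    return sorted(options, key=value_per_day, reverse=True)


# Illustrative numbers only.
options = [
    Option("bespoke high scorer", 0.82, 15.0, 0.2),
    Option("tractable analogue", 0.74, 3.0, 0.8),
]

top_by_score = rank_by_score(options)[0].name
top_by_project = rank_by_project_value(options)[0].name
```

With these made-up inputs the scalar ranking picks the bespoke compound and the project-level ranking picks the tractable analogue. Neither ranking is wrong about the molecules. They answer different questions.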
The right objective is not compound quality. It is project improvement
That sounds abstract until you make it operational.
A good next move might reduce uncertainty about a liability. It might tell the team whether a direction is dead. It might improve the odds that the next round will be smaller and better. It might preserve chemistry bandwidth. It might rule out an attractive but expensive mistake.
Those are project outcomes.
They are often more valuable than a marginal gain in predicted potency on a single structure.
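One way to put a number on "reduces uncertainty about a liability" is expected information gain. The sketch below assumes a simplified setup that is not from any particular programme: a yes/no liability with a prior probability, and an experiment whose sensitivity and specificity are known. The function names are hypothetical.

```python
import math


def entropy(p: float) -> float:
    """Shannon entropy in bits of a yes/no belief held with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(prior: float, sensitivity: float,
                              specificity: float) -> float:
    """Expected drop in entropy about the liability after one experiment."""
    p_pos = prior * sensitivity + (1 - prior) * (1 - specificity)
    p_neg = 1 - p_pos
    gain = entropy(prior)
    if p_pos > 0:
        posterior_if_pos = prior * sensitivity / p_pos
        gain -= p_pos * entropy(posterior_if_pos)
    if p_neg > 0:
        posterior_if_neg = prior * (1 - sensitivity) / p_neg
        gain -= p_neg * entropy(posterior_if_neg)
    return gain


# Same experiment, two project states: genuinely unsure vs. nearly settled.
gain_when_unsure = expected_information_gain(0.5, 0.9, 0.9)
gain_when_settled = expected_information_gain(0.95, 0.9, 0.9)
```

The experiment is identical in both calls. Only the prior differs, and the gain is larger when the team is genuinely unsure than when the answer is almost known already.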
This is why many deployed systems disappoint. They were trained to optimise the wrong object. They learned to produce molecule-level answers for a project-level problem.
Benchmarks should reflect stateful decision-making
Most current evaluations still flatten the game.
Here is a molecule. Predict its property.
Here is another molecule. Rank its activity.
But real deployment is not a series of disconnected guesses. It is a closed loop where each new result changes the value of the next action.
If we took that seriously, benchmark design would change. We would measure sequential design quality, speed of local learning, information gained per synthesis round, uninformative compounds avoided, and how quickly the model improves project state after each new assay batch.
Those are harder benchmarks because they force the system to play the same game the team is playing.
That difficulty is not a flaw. It is the point.
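A closed-loop benchmark can be sketched in a few lines. This is a toy under loud assumptions: compounds are the integers 0 to 19, `true_activity` is a made-up hidden landscape, `prior_score` is a deliberately miscalibrated global model, and both policies are hypothetical. The metric is the number of synthesis rounds needed to reach a target activity.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]  # (compound id, measured activity)


def true_activity(x: int) -> float:
    """Hidden landscape the policies cannot see; the peak is at compound 13."""
    return float(-(x - 13) ** 2)


def prior_score(x: int) -> float:
    """Global model with the wrong idea of where the peak is (compound 4)."""
    return float(-(x - 4) ** 2)


def stateless_policy(candidates: List[int], history: History) -> int:
    """Works down a fixed ranking; history is used only to skip repeats."""
    tested = {x for x, _ in history}
    return max((x for x in candidates if x not in tested), key=prior_score)


def stateful_policy(candidates: List[int], history: History) -> int:
    """Starts from the prior, then explores next to the best result so far."""
    if not history:
        return max(candidates, key=prior_score)
    tested = {x for x, _ in history}
    best_x, _ = max(history, key=lambda record: record[1])
    untested = (x for x in candidates if x not in tested)
    return min(untested, key=lambda x: abs(x - best_x))


def rounds_to_target(policy: Callable[[List[int], History], int],
                     candidates: List[int], target: float,
                     max_rounds: int) -> int:
    """Run the make-test loop and count rounds until the target is met."""
    history: History = []
    for round_no in range(1, max_rounds + 1):
        x = policy(candidates, history)
        y = true_activity(x)          # the "assay" result for this round
        history.append((x, y))        # each result changes the next decision
        if y >= target:
            return round_no
    return max_rounds + 1             # target never reached


candidates = list(range(20))
stateless_rounds = rounds_to_target(stateless_policy, candidates, 0.0, 20)
stateful_rounds = rounds_to_target(stateful_policy, candidates, 0.0, 20)
```

In this toy the policy that reads its own history reaches the target in fewer rounds than the one working down a fixed list. A real benchmark would need noisy assays, batch selection, and synthesis cost, but the shape of the harness is the same: the score belongs to the campaign, not to any single prediction.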
This is also why simple local models keep surviving
People sometimes treat it as embarrassing that project-local models remain stubbornly competitive.
It should not be embarrassing at all.
If the real task is stateful and local, then models tightly coupled to current project context should be hard to beat. They are operating on the exact board position that matters. A broader model with better average performance elsewhere may still make worse decisions if it is less sensitive to current project state.
This is not a paradox. It is what happens when the project, not the molecule, is the thing being optimised.
A better question fixes a lot of downstream confusion
People spend endless energy arguing about architectures, representations, benchmarks, and generalisation regimes. Some of that matters.
But a lot of confusion would disappear if the field started with a cleaner question.
Not: what is the best molecule?
But: what is the next move that most improves the project from where it is now?
That shift is small on paper and enormous in practice. It changes what data matter, what models should remember, what counts as success, and why certain flashy suggestions are operationally useless.
Drug discovery is not a static ranking problem pretending to be complicated.
It is a stateful project problem pretending to be simple.