The Hardest Part of Molecular ML Is Target Definition

By Blaise AI Team

A lot of molecular ML discussions start too late.

They start at representation.

Graph network or transformer. 3D or 2D. Protein sequence or structure. Fine-tuning strategy, pretraining corpus, multimodal fusion, loss function, benchmark split.

By the time the conversation gets there, the hardest part may already have gone wrong.

The hardest part is often target definition.

Plenty of models fail because they solved exactly what they were asked to solve, and what they were asked was the wrong question.

This is why so many disappointing systems look technically serious. The model may fit the stated task well. The architecture may be sound. The evaluation may be competent. The whole effort still underdelivers because the target itself was mis-specified.

Predicting a universal affinity when the real project needs within-series ranking in one assay regime is a target-definition problem. Predicting a pose when the real deployment question is which analogue to make next is a target-definition problem. Regressing a property value when the team mainly needs calibrated triage is a target-definition problem.

The model is not broken.

The aim was.
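To make the first example concrete, here is a minimal Python sketch of why the target changes the metric. It is an illustration, not a method from this article, and it assumes hypothetical records of `(series_id, measured, predicted)` with invented potency values. A pooled Spearman correlation rewards getting the offset between series right. A within-series Spearman correlation asks the question a project ranking analogues actually cares about.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def within_series_spearman(records, min_size=3):
    """Mean Spearman over series; records are (series_id, measured, predicted)."""
    groups = defaultdict(lambda: ([], []))
    for series, measured, predicted in records:
        groups[series][0].append(measured)
        groups[series][1].append(predicted)
    return mean(spearman(m, p) for m, p in groups.values() if len(m) >= min_size)


# Invented numbers: the predictions get the between-series offset right
# but reverse the order inside each series.
records = [
    ("A", 8.0, 7.9), ("A", 8.3, 7.7), ("A", 8.6, 7.5),
    ("B", 5.0, 5.6), ("B", 5.3, 5.4), ("B", 5.6, 5.2),
]
pooled = spearman([r[1] for r in records], [r[2] for r in records])
local = within_series_spearman(records)
print(f"pooled={pooled:.2f} within-series={local:.2f}")  # pooled=0.54 within-series=-1.00
```

On these made-up numbers the pooled score looks respectable (about 0.54) while every series is ranked exactly backwards (-1.0). The same predictions pass or fail depending on which target the evaluation encodes.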

Target definition determines everything downstream

Once you choose the wrong target, a lot of later choices become theatre.

The benchmark can still look rigorous. The representation can still be elegant. The training data can still be large. But the system is now optimised for a convenient abstraction rather than an operational need.

Target definition picks the unit of optimisation, the baseline, the labels, the metrics, and the deployment claim. It decides whether the system is trying to replace docking triage, local SAR ranking, assay sequencing judgement, route-aware prioritisation, or something else entirely.

If that is left vague, the rest of the pipeline becomes oddly untethered from reality.

This is why architecture debates often feel strangely unsatisfying

People can spend months arguing about molecular representation while quietly assuming the target is obvious.

It often is not.

In small-molecule discovery, the question “what should the model predict?” is inseparable from “what job is the model supposed to do inside a live project?” Those are not separate design phases. They are the same design problem.

Once the task is defined properly, a lot of supposed complexity tends to collapse. The right baseline becomes obvious. The evaluation becomes more honest. In some cases the model itself gets simpler because the job was narrower and more local than the original framing implied.

That is not a loss of ambition.

It is what happens when the problem statement stops drifting.

Bad target definitions usually smell operationally vague

You can often detect the problem early.

The language gets fuzzy. “Drug discovery model.” “Molecule design system.” “General chemistry AI.” The target sounds large enough to impress and vague enough to avoid a real incumbent. The benchmark shifts between structural reasoning, property prediction, and compound prioritisation as needed. The supposed use case gets more commercial in the discussion than it was in the original task design.

That drift is not a communication issue. It is evidence that the target was never pinned down with enough precision.

A strong target definition is brutally specific

It should be possible to say, in one sentence, what the model is meant to replace or augment and what better looks like.

This model is intended to help rank close analogues inside an active potency series after the first ten local datapoints.

This model is intended to help choose whether permeability or microsomal stability should be assayed next on a constrained batch of compounds.

This model is intended to help retrieve relevant internal precedents before a chemist commits to a new route.

Once you can write a sentence like that, the job gets much clearer. Before that, a lot of molecular ML is just polishing the wrong target.
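One way to enforce the one-sentence test is to refuse to state the target until every choice it depends on has been written down. The sketch below is hypothetical: `TargetSpec` and its fields simply mirror the choices named earlier (unit of optimisation, baseline, labels, metrics, deployment claim) plus the job being replaced or augmented. The filled-in values are illustrative, not recommendations.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TargetSpec:
    """Hypothetical record of what a target definition has to fix."""
    job: str               # what the model replaces or augments in a live workflow
    unit: str              # unit of optimisation (one series, one assay batch, ...)
    labels: str            # which measurements count as ground truth
    baseline: str          # the incumbent method the model must beat
    metric: str            # what "better" means, stated measurably
    deployment_claim: str  # the claim the team is prepared to defend

    def missing(self):
        """Names of fields that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def sentence(self):
        """One-sentence statement; refuses to render while anything is vague."""
        gaps = self.missing()
        if gaps:
            raise ValueError("target not pinned down: " + ", ".join(gaps))
        return (f"This model is intended to {self.job}; better means "
                f"{self.metric} than {self.baseline}.")


draft = TargetSpec(
    job="help rank close analogues inside an active potency series "
        "after the first ten local datapoints",
    unit="one potency series",
    labels="potency from a single assay regime",
    baseline="a nearest-neighbour ranking",
    metric="higher within-series Spearman correlation",
    deployment_claim="",
)
print(draft.missing())  # ['deployment_claim']

final = replace(draft, deployment_claim="augments the chemist's choice of next analogues")
print(final.sentence())
```

Whether such a record lives in code, a design doc, or a kickoff note matters less than the rule it encodes: no empty field, and no benchmark until the sentence can be written.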

This is also why simple models remain stubbornly competitive

When the target is operationally precise and local, simple baselines often become embarrassingly hard to beat.

That is not because the field lacks sophistication. It is because once the problem is stated correctly, the relevant signal may already be concentrated in a narrow context where local methods have a natural advantage.

People sometimes read that as a failure of modelling.

It is more often a success of problem definition.
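As a rough illustration of how little machinery a local method can need, here is a similarity-weighted nearest-neighbour baseline for predicting within one series. It is a generic k-nearest-neighbour sketch, not a method described in this article. The feature sets are made-up substituent labels standing in for real molecular fingerprints, and `knn_predict`, `k=3`, and the toy values are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of binary features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, train, k=3):
    """Similarity-weighted mean of the k most similar measured analogues.

    train is a list of (feature_set, measured_value) pairs from the same series.
    """
    scored = sorted(((tanimoto(query, feats), value) for feats, value in train),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        # No shared features at all: fall back to the series mean.
        return sum(value for _, value in train) / len(train)
    return sum(sim * value for sim, value in scored) / total


# Toy series: features are made-up substituent labels, not real fingerprints.
train = [
    (frozenset({"core", "R1=Cl", "R2=Me"}), 7.2),
    (frozenset({"core", "R1=F", "R2=Me"}), 6.8),
    (frozenset({"core", "R1=Cl", "R2=OMe"}), 7.6),
    (frozenset({"core", "R1=H", "R2=H"}), 6.0),
]
query = frozenset({"core", "R1=Cl", "R2=Et"})
print(round(knn_predict(query, train), 2))  # 7.3
```

The query shares the `R1=Cl` substituent with the two most potent measured analogues, so its prediction lands near them. Inside a tight series, that kind of local similarity may be where much of the usable signal already sits.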

The target should be stated before the benchmark, not reverse-engineered from it

Too often the field works backwards. A public dataset exists. A model fits it. A benchmark is defined. Only afterward does the discussion start stretching toward a deployment story.

That sequence should be reversed.

Start from the operational question. State what the model is meant to do in a real workflow. Then decide what labels, evaluation, and representations make sense.

Otherwise the benchmark quietly becomes the target, and the system gets rewarded for doing something clean that no project urgently needed.
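Stating the operational question first can change the evaluation split, not just the metric. If the job is the first example sentence above (ranking analogues after the first ten local datapoints), a split that mirrors it trains on the earliest compounds and tests on later ones. This sketch assumes each compound carries a hypothetical `made_on` ordinal, such as a synthesis or registration day; the function names and compound IDs are invented for illustration.

```python
import random


def prospective_split(series, n_seen=10):
    """Train on the first n_seen compounds made, test on everything made later.

    series: list of (made_on, compound_id) for one project series.
    """
    ordered = sorted(series)
    return ordered[:n_seen], ordered[n_seen:]


def random_split(series, n_seen=10, seed=0):
    """The convenient alternative: shuffle first, ignoring when compounds were made."""
    shuffled = list(series)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_seen], shuffled[n_seen:]


def trains_on_future(train, test):
    """True if any training compound was made after the earliest test compound."""
    return max(day for day, _ in train) > min(day for day, _ in test)


series = [(day, f"cmpd-{day:02d}") for day in range(1, 16)]  # hypothetical IDs
print(trains_on_future(*prospective_split(series)))  # False
print(trains_on_future(*random_split(series)))
```

The prospective split never trains on the future by construction. On a series like this, a shuffled split will almost always do so, which is one quiet way a benchmark can end up rewarding a task no project would face.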

Representation matters. Architecture matters. Training data matter.

But they all matter after the field stops being vague about what the model is actually for.

That is the harder task. It is also the one that decides whether the rest of the work will ever become useful.