Why Federated Learning for ADME Is Doomed to Fail

By Blaise AI Team

Federated learning for ADME gets sold as this magic trick:

“Keep your data private, share the gradients, and somehow everyone walks away with a better model.”

Sounds great. But if what you actually want is an explainable ADME model — the kind you can use to make synthesis decisions without crossing your fingers — federated learning is basically doomed.

Not because the math is bad.

Because the incentives are.

ADME data isn’t “data”. It’s hard-won taste

Good ADME datasets aren’t just big. They’re clean: structures registered properly (salts, tautomers, stereochem — all the boring stuff), assays run consistently under SOP discipline, metadata kept across conditions and batches and instruments, results sanity-checked and relabelled when the lab messed up.

That’s expensive. It’s also a competitive edge.

So when a federation says “please contribute high-quality, high-quantity ADME data”, what they’re really asking is:

“Please donate the thing you built that makes you better than other teams.”

Yeah… no.

Federated learning turns ADME into a classic “public goods” mess

Everyone benefits from the shared model.

But the cost of making your data useful (cleaning, standardising, annotating) is paid by you.

So the rational move is to contribute the minimum, keep your best stuff private, and take whatever improvements the federation gives you.

That’s not immoral. That’s just the equilibrium.

And it leads to the predictable outcome:

the federation model gets trained on everyone’s leftovers.
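The equilibrium above can be made concrete with a toy payoff model. Everything here is invented for illustration (the function name, the benefit and cost numbers): the shared model's benefit is split across all members, while the cost of contributing clean data is private.

```python
# Toy payoff model for the data-sharing "public goods" game.
# All numbers are made up for illustration: sharing quality data costs
# the contributor (cleaning, annotation, leakage risk) while the model
# benefit is split across every member.

def payoff(my_contribution, others_total, n_members,
           benefit_per_unit=3.0, cost_per_unit=2.0):
    """Return one member's net payoff.

    The shared model improves with *total* contributions, and everyone
    gets that improvement; the cost of contributing is private.
    """
    shared_benefit = benefit_per_unit * (my_contribution + others_total) / n_members
    private_cost = cost_per_unit * my_contribution
    return shared_benefit - private_cost

# With 10 members, each unit I contribute returns 3/10 = 0.3 of its
# benefit to me but costs me 2.0 — so contributing less is always
# individually better, even though everyone contributing leaves the
# whole group better off than nobody contributing.
print(payoff(0, 9, 10))  # free-ride on the other nine members
print(payoff(1, 9, 10))  # contribute one unit myself
```

With these (made-up) numbers, free-riding strictly beats contributing for every individual, yet universal contribution beats universal free-riding for the group. That gap is the whole problem.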

The best players don’t join

If you already have a great internal ADME platform, why would you join?

You’d be paying membership overhead, taking leakage risk (even if it’s “privacy-preserving”), and helping competitors catch up.

So who does join? Groups with sparse data, messy assay setups, inconsistent registration, weaker internal modelling.

Federations end up with a weird kind of adverse selection: the people most motivated to join are the ones whose data is hardest to learn from.

Everyone hides the juicy bits

The most valuable ADME data is the painful stuff:

  • permeability cliffs caused by one ring flip
  • P-gp/BCRP headaches that ruin a whole series
  • solubility weirdness that only shows up in one scaffold class
  • time-dependent CYP inhibition edge cases
  • metabolism switches you only learn after months of SAR

That’s the data that makes models genuinely useful.

It’s also the data that reveals what you’re working on, what failed, and what lessons you paid for.

So people don’t share it. They share the safe stuff — older datasets, generic chemical space, cleaned-down aggregates with metadata stripped, endpoints that won’t expose programme intent.

The federation model ends up generic and loses the nuance needed for hard cases.

Explainability is impossible if you can’t look at the data

Here’s the big one.

Explainable ADME modelling isn’t “SHAP says cLogP matters”.

Real explainability is being able to say:

  • “This is a protocol shift, not chemistry.”
  • “This lab’s Caco-2 is systematically higher.”
  • “This looks like a batch effect starting last summer.”
  • “These ‘different compounds’ are actually registration duplicates.”
  • “The model thinks lipophilicity matters here because it’s proxying a solubility method change.”

That requires cross-site QC and digging into rows and residuals and metadata.

Federated learning blocks exactly that by design. You’re not allowed to inspect the seams, so you can’t explain what the model learned from them.

So you get “explanations” that are basically:

a story about blended biases you’re not allowed to audit.
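The lab-offset check is a good example of QC that needs pooled row-level data rather than gradients. The sketch below is illustrative only — the lab names, log Papp values, and the ~0.7 log-unit gap are all invented — but it shows the kind of question you can only answer when you can actually see each site's rows.

```python
# Sketch of cross-site QC that federated learning forbids by design:
# detecting a systematic lab offset in Caco-2 permeability using shared
# reference compounds. Labs and values are invented for illustration.

from statistics import mean

records = [
    # (lab, log_papp) for the same reference compounds run at each site
    ("lab_A", -5.1), ("lab_A", -5.3), ("lab_A", -5.2),
    ("lab_B", -4.5), ("lab_B", -4.6), ("lab_B", -4.4),
]

by_lab = {}
for lab, value in records:
    by_lab.setdefault(lab, []).append(value)

lab_means = {lab: mean(vals) for lab, vals in by_lab.items()}
offset = lab_means["lab_B"] - lab_means["lab_A"]

# A consistent gap on shared reference compounds is a protocol shift,
# not chemistry — exactly the explanation you cannot reach if you only
# ever see averaged gradients.
print(f"lab_B runs {offset:+.2f} log units vs lab_A")
```

Trivial with pooled data; impossible when each site's rows never leave the building.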

Even “good faith” federations create weird behaviour

Even if nobody is malicious, people optimise for optics: contribute “enough” to look collaborative, drop awkward endpoints, bucket labels to hide assay messiness, omit negatives or failed series because they’re politically risky.

And if anyone is malicious, yes, gradient poisoning is a thing — but even without that, the soft incentives already push the shared dataset toward mediocrity.

The unsexy truth: the real problem is standardisation

If you wanted a shared explainable ADME model, you’d spend your time on:

  • shared endpoint definitions
  • reference compounds and control charts
  • mandatory metadata schemas
  • proficiency testing across labs
  • constant recalibration
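The “reference compounds and control charts” piece can be sketched in a few lines. This is a minimal out-of-control check under assumed names and invented values: run the same control compound in every batch and flag any batch whose measurement drifts outside k standard deviations of the historical baseline.

```python
# Minimal control-chart check for assay drift. The function name,
# baseline values, and 3-sigma threshold are illustrative assumptions,
# not a prescribed SOP.

from statistics import mean, stdev

def out_of_control(baseline, new_value, k=3.0):
    """Flag a new batch's control measurement against the historical
    baseline mean ± k * stdev."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(new_value - mu) > k * sigma

# Historical control-compound readings from earlier batches (invented)
baseline = [0.98, 1.02, 1.00, 0.99, 1.01, 1.00]

print(out_of_control(baseline, 1.15))  # → True: batch has drifted
print(out_of_control(baseline, 1.01))  # → False: normal run-to-run noise
```

Note the design choice: the new batch is compared against the baseline rather than included in it, so a single bad batch can't inflate the variance and hide itself.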

That’s the real work.

But it’s expensive and annoying and slows teams down. And again: the cost is local, the benefit leaks globally.

So it never happens properly.

So when does federated ADME make sense?

Federated learning can be fine if you’re honest about what it is: a black box that nudges average metrics up, useful for rough filtering, not something you’ll trust mechanistically, and not something you’ll use to justify “make this molecule”.

If you want a model that a med chemist can argue with in a meeting, federated learning won’t get you there.

What to do instead

If you want explainability, pick one of these:

  1. Standardise the measurement layer first. Force every site to align on endpoints, controls, and metadata standards.

  2. Use a secure data enclave. Let people share data in a controlled environment so you can actually run global QC.

  3. Keep explainability local. Train and debug locally, and share validated rules (matched pairs, “don’t do this” motifs) rather than pretending pooled gradients produce truth.
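Option 3 deserves a sketch of what a “shared rule” might even look like. The record format below is entirely invented (a real setup would derive matched-pair transformations from structures with proper MMP tooling); the point is that a rule carries its evidence count, so recipients can audit it — which a pooled gradient never lets you do.

```python
# Illustrative format for sharing validated rules instead of gradients.
# Every field name and example rule here is a made-up assumption.

from dataclasses import dataclass

@dataclass
class SharedRule:
    transformation: str  # a matched-pair edit, described as text or SMARTS
    endpoint: str        # which ADME endpoint the rule is about
    effect: str          # direction of the observed effect
    n_pairs: int         # matched pairs behind the rule: auditable evidence

    def is_trustworthy(self, min_pairs=10):
        return self.n_pairs >= min_pairs

rules = [
    SharedRule("replace phenyl with 2-pyridyl", "solubility", "improves", 34),
    SharedRule("add para-OMe on ring A", "CYP3A4 TDI", "worsens", 4),
]

# Recipients keep only rules backed by enough evidence — a filter a
# med chemist can argue with in a meeting.
trusted = [r for r in rules if r.is_trustworthy()]
print([r.transformation for r in trusted])
```

The weakly supported rule gets filtered out, and anyone can ask why a surviving rule made the cut. That transparency is the whole point of keeping explainability local.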

The punchline

Federated ADME is trying to dodge the hard part: aligning incentives and measurement standards.

You can’t gradient-average your way out of protocol heterogeneity, competitive dynamics, and selective sharing.

So the likely end state is always the same:

noisy legacy data goes in, the good stuff stays private, and you get a model that looks great in a slide deck… and doesn’t actually explain anything.
