Rosetta Energy Functions for Antibody-Antigen Docking: REF2015 vs. Talaris2014

When we started building our docking pipeline at Genolux, we made an assumption that most computational biologists make: the newer energy function is better. REF2015 replaced Talaris2014 as Rosetta's default around 2016–2017, with documented improvements in sidechain packing and solvent exposure modeling. Naturally, we initially ran everything under REF2015 and moved on.

That assumption held up reasonably well until we started seeing unexpected score gaps on a set of tight-binding anti-cytokine complexes. When we traced the issue, we found that REF2015's electrostatics treatment was assigning large score penalties to interface residues whose contacts were, structurally, clearly favorable. Switching to Talaris2014 on the same complexes improved interface recovery by about 15 percentage points.

That prompted a more systematic look. We benchmarked both energy functions against 200 antibody-antigen complex structures drawn from SAbDab, and what follows is what we found — including where each function outperforms, where they fail similarly, and the practical guidance we've built into our own scoring protocols.

What Actually Differs Between the Two Functions

Talaris2014 and REF2015 share a common lineage but diverge in several important ways. REF2015 added an orientation-dependent solvation term (lk_ball_wtd) on top of the isotropic Lazaridis-Karplus solvation term (fa_sol), revised the fa_elec electrostatics term, and reweighted the reference energies for each amino acid type. The stated goal was better performance on protein folding, loop modeling, and rotamer prediction across a broad benchmark set.

For general protein–protein interfaces, REF2015 does tend to produce tighter energy funnels during docking. But antibody–antigen interfaces are not general protein–protein interfaces. They are heavily CDR-dominated, with a disproportionate contribution from long CDR-H3 loops that often carry buried charged residues and unusual backbone conformations. REF2015's electrostatic model, which applies a distance-dependent dielectric, can over-penalize the burial of charged CDR residues in interfaces where those residues make multiple compensating contacts.

Benchmark Setup: 200 SAbDab Complexes

Our holdout set was drawn from SAbDab structures deposited before January 2024, filtered for resolution ≤ 2.5 Å, a paired Fv (heavy and light chains) plus antigen present, and no crystal contacts within 4 Å of the CDR loops. After filtering, we had 200 structures spanning a range of antigen types, including globular proteins (n=127), peptides (n=31), haptens (n=22), and two carbohydrate-containing antigens (excluded from the main analysis due to limited Rosetta carbohydrate support).
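The inclusion criteria above reduce to a simple per-entry filter. This is a minimal sketch of how they could be expressed, not a description of our pipeline code: the `SAbDabEntry` record and all of its field names are hypothetical (they are not the SAbDab summary-file schema), and it assumes the minimum lattice-contact distance to any CDR loop has already been computed from the crystal symmetry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record type: field names are illustrative, NOT the SAbDab schema.
@dataclass
class SAbDabEntry:
    pdb_id: str
    deposition_date: date
    resolution: Optional[float]        # Å; None if no resolution is reported
    has_heavy: bool
    has_light: bool
    has_antigen: bool
    min_lattice_contact_to_cdr: float  # Å; assumed precomputed from crystal symmetry


DEPOSITION_CUTOFF = date(2024, 1, 1)
MAX_RESOLUTION = 2.5        # Å
MIN_CONTACT_DISTANCE = 4.0  # Å


def passes_filters(entry: SAbDabEntry) -> bool:
    """Return True if an entry meets all benchmark inclusion criteria."""
    if entry.deposition_date >= DEPOSITION_CUTOFF:
        return False  # deposited on or after January 2024
    if entry.resolution is None or entry.resolution > MAX_RESOLUTION:
        return False
    if not (entry.has_heavy and entry.has_light and entry.has_antigen):
        return False  # need a paired Fv plus antigen
    # "No crystal contacts within 4 Å": the closest contact must lie beyond 4 Å.
    return entry.min_lattice_contact_to_cdr > MIN_CONTACT_DISTANCE


if __name__ == "__main__":
    # Placeholder IDs and values, for illustration only.
    entries = [
        SAbDabEntry("ex01", date(2020, 5, 1), 2.1, True, True, True, 6.0),
        SAbDabEntry("ex02", date(2020, 5, 1), 3.0, True, True, True, 6.0),
    ]
    print([e.pdb_id for e in entries if passes_filters(e)])
```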

For each complex, we ran local docking (perturbation ±3 Å, ±8° rotation) using RosettaDock 4.0 starting from the native structure, generating 1,000 decoys per complex. We then evaluated energy funnels by computing the fraction of near-native decoys (RMSD ≤ 1.5 Å to the native CDR-H3 position) that fell within the lowest-scoring 10% of decoys by total score, a standard interface recovery metric.
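Because the funnel metric is easy to get subtly wrong, here is a minimal sketch of it under our reading of the definition above: take the 10% of decoys with the lowest total score, then ask what fraction of all near-native decoys (RMSD ≤ 1.5 Å) fall inside that set. The function name and the tie handling (ranking by sort order and rounding the slice size) are illustrative choices, not RosettaDock behavior; per-decoy scores and RMSDs are assumed to have been parsed from your score files already.

```python
from typing import Sequence


def near_native_enrichment(
    scores: Sequence[float],
    rmsds: Sequence[float],
    rmsd_cutoff: float = 1.5,
    top_fraction: float = 0.10,
) -> float:
    """Fraction of near-native decoys that rank in the lowest-scoring top_fraction.

    Lower Rosetta total score is more favorable, so "top" means lowest score.
    Returns NaN when a complex has no near-native decoys (metric undefined).
    """
    if len(scores) != len(rmsds):
        raise ValueError("scores and rmsds must have the same length")
    near_native = [i for i, r in enumerate(rmsds) if r <= rmsd_cutoff]
    if not near_native:
        return float("nan")
    # Size of the top slice, e.g. 100 of 1,000 decoys at the default 10%.
    n_top = max(1, round(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    top = set(ranked[:n_top])
    hits = sum(1 for i in near_native if i in top)
    return hits / len(near_native)
```

Returning NaN rather than zero for complexes with no near-native decoys keeps them from silently dragging down a benchmark average; they should be counted separately as sampling failures.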

Where REF2015 Has the Edge

For globular antigen targets with hydrophobic binding epitopes (think enzyme active-site occlusion, or receptor blockade where the epitope is a structured loop), REF2015 consistently outperforms Talaris2014. On this subset (n=74 from the globular protein cohort with >60% apolar interface residues), REF2015 showed near-native decoy enrichment in the top 10% of scores in 68.4% of cases, vs. 61.2% for Talaris2014.

The improvement comes primarily from REF2015's solvation treatment (fa_sol together with the new lk_ball_wtd term), which more accurately models the desolvation cost of burying apolar surface area, the dominant energetic driver at hydrophobic interfaces. Talaris2014 tends to slightly underestimate this penalty, producing a flatter energy landscape in which near-native and non-native poses are harder to tell apart.

REF2015 also performs better on antibodies with short CDR-H3 loops (8–11 residues). These loops behave more like structured protein fragments, and the improved backbone sampling statistics in REF2015's reference energies translate to better recovery of their conformations.

Where Talaris2014 Holds Its Own — or Wins

The picture reverses on charged and mixed-character interfaces. When the binding epitope contains a high density of charged residues (Asp, Glu, Lys, Arg >30% of interface) or when CDR-H3 is long (>14 residues), Talaris2014 produces more accurate energy funnels in our benchmark. The difference is not small: on the charged epitope subset (n=38), Talaris2014 near-native enrichment was 54.7% vs. REF2015's 41.3%.

We traced this to two sources. First, REF2015's fa_elec term with its distance-dependent dielectric over-penalizes salt bridges at CDR-antigen interfaces where the local dielectric is genuinely lower than bulk solvent. The crystal structures confirm these are real, stable contacts — but REF2015 scores them as unfavorable. Second, on long CDR-H3 loops, the reference energy reweighting in REF2015 introduces systematic biases against the kink conformation that dominates CDR-H3 structures longer than 13 residues.

For peptide antigens, the results were more mixed and interface-specific, so we don't have a general recommendation for that class — we evaluate on a case-by-case basis.

Score Decomposition: What to Look At

Rather than treating the total Rosetta score as an opaque number, our scoring workflow decomposes the interface energy into constituent terms. The ones we watch most closely in antibody-antigen docking:

fa_atr / fa_rep: The attractive and repulsive components of the Lennard-Jones van der Waals term. These are well-behaved in both energy functions and are the most reliable indicator of steric complementarity. A strongly negative fa_atr with controlled fa_rep (below +10 REU for a typical 1,400 Å² interface) is necessary but not sufficient.

fa_sol: Solvation. This is where the two functions diverge most. Under REF2015, a very negative fa_sol contribution at a predicted interface almost always corresponds to a real hydrophobic contact; under Talaris2014, the same value is slightly less discriminating. If your antigen is a hydrophobic pocket binder, trust REF2015's fa_sol signal more.

fa_elec: Electrostatics. If you're seeing large negative or positive fa_elec scores driving your overall ranking, that's a flag to check manually. REF2015's electrostatics model is sensitive to charge burial in ways that don't always correspond to real energetic penalties, especially when the charged residue makes compensating H-bonds. Under Talaris2014, electrostatics contributes less aggressively to the total score, which is conservative but more stable for CDR-rich interfaces.

hbond_sc: Sidechain hydrogen bonds. Both functions score H-bonds comparably in terms of geometric criteria, but the weighting differs. Hydrogen bond decomposition at the interface is useful when you're ranking CDR mutations — an extra H-bond to the antigen is a real signal, not noise.
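The decomposition described above amounts to subtracting the separated partners from the complex, term by term, and then checking a few warning signs. The sketch below assumes you already have weighted per-term energies for the complex and for each partner in isolation (for example, parsed from score files), so that the terms sum to the total. The helper names are hypothetical, and the `elec_share_max` threshold for flagging an fa_elec-dominated interface is an illustrative assumption, not a calibrated value; the +10 REU fa_rep limit is the guideline quoted above for a typical 1,400 Å² interface.

```python
from typing import Dict, List

Terms = Dict[str, float]  # score term name -> weighted energy in REU


def interface_terms(complex_terms: Terms, antibody: Terms, antigen: Terms) -> Terms:
    """Per-term interface energy: E(complex) - E(antibody alone) - E(antigen alone).

    Assumes all three dicts hold weighted per-term energies from the same score
    function; a term missing from one dict is treated as 0.0.
    """
    names = set(complex_terms) | set(antibody) | set(antigen)
    return {
        name: complex_terms.get(name, 0.0) - antibody.get(name, 0.0) - antigen.get(name, 0.0)
        for name in sorted(names)
    }


def review_flags(dg: Terms, fa_rep_max: float = 10.0, elec_share_max: float = 0.5) -> List[str]:
    """Return human-readable warnings for an interface term breakdown.

    fa_rep_max follows the +10 REU guideline for a ~1,400 Å² interface;
    elec_share_max is a hypothetical threshold, not a calibrated value.
    """
    flags = []
    if dg.get("fa_atr", 0.0) >= 0.0:
        flags.append("fa_atr is not favorable: check steric complementarity")
    if dg.get("fa_rep", 0.0) > fa_rep_max:
        flags.append(f"fa_rep = {dg['fa_rep']:.1f} REU exceeds {fa_rep_max:.1f} REU")
    total = sum(dg.values())  # meaningful only because the terms are already weighted
    elec = dg.get("fa_elec", 0.0)
    if total != 0.0 and abs(elec) > elec_share_max * abs(total):
        flags.append("fa_elec dominates the interface score: inspect manually")
    return flags
```

Flags are meant to route a pose to manual inspection, not to reject it outright, which matches how we treat a large fa_elec contribution in practice.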

Practical Guidance: Which to Use When

We're not saying Talaris2014 is better than REF2015 in general — it isn't. Outside of docking, REF2015 is the correct choice for most Rosetta tasks. What we are saying is that the antibody-antigen docking use case is specific enough that the generalization doesn't hold cleanly.

Our current practice is interface-type dependent. We query the sequence-level charge content of the predicted epitope region (from a homology model or AlphaFold2 structure) before choosing an energy function. If the predicted epitope has >25% charged residues or if CDR-H3 is predicted to be longer than 13 residues, we run Talaris2014 as the primary scoring function with REF2015 as a cross-check. If the epitope is predominantly hydrophobic and CDR-H3 is short, REF2015 is primary.
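The triage rule can be written as a small function. This is a hypothetical sketch that assumes the predicted epitope is supplied as a one-letter sequence and the CDR-H3 length is already known. The 25% charged and 13-residue cutoffs come from the rule above; the 60% apolar and 11-residue cutoffs for the REF2015-primary branch borrow the benchmark subset definitions and are assumptions, as is the apolar residue set. Interfaces that satisfy neither branch return `None`, signaling that the choice needs a closer look.

```python
from typing import Optional, Set, Tuple

CHARGED: Set[str] = set("DEKR")
# Assumption: which residues count as apolar; adjust to your own definition.
APOLAR: Set[str] = set("AVLIMFWPC")


def residue_fraction(seq: str, alphabet: Set[str]) -> float:
    """Fraction of one-letter residues in seq that belong to alphabet."""
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty epitope sequence")
    return sum(aa in alphabet for aa in residues) / len(residues)


def choose_energy_function(epitope_seq: str, cdrh3_length: int) -> Optional[Tuple[str, str]]:
    """Return (primary, cross_check) Rosetta weights names, or None if no rule applies."""
    if residue_fraction(epitope_seq, CHARGED) > 0.25 or cdrh3_length > 13:
        return ("talaris2014", "ref2015")
    # Hypothetical reading of "predominantly hydrophobic" and "short".
    if residue_fraction(epitope_seq, APOLAR) > 0.60 and cdrh3_length <= 11:
        return ("ref2015", "talaris2014")
    return None
```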

A concrete example from our internal work: we ran a campaign on a GPCR N-terminal domain target with a highly charged extracellular region. REF2015 consistently ranked low-contact, peripheral binders above the correct near-native poses because the charged interface residues were penalized. Switching to Talaris2014 recovered the correct funneling, and the top-ranked pose was within 1.2 Å RMSD of the crystallographic structure obtained later. We didn't have the crystal structure at the time — we only confirmed retrospectively — but the energy function choice mattered for which CDR variants we advanced.

Score Cutoffs Are Not Portable

One thing that catches people out: score cutoffs developed for REF2015 don't transfer to Talaris2014, and vice versa. Raw REU values are not comparable across energy functions because the two functions use different term sets, weights, and reference energies. If you've calibrated your pipeline to filter on, say, interface score ≤ −35 REU under REF2015, that threshold needs recalibration when you switch to Talaris2014, typically by running both on a small validation set and finding the score at the equivalent percentile.
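The recalibration step can be sketched as follows, assuming both functions were run on the same validation decoys and that lower interface scores are better. The function name is hypothetical; the idea is to measure the fraction of validation decoys that pass the old REF2015 cutoff and return the Talaris2014 score that admits the same fraction.

```python
from bisect import bisect_right
from typing import Sequence


def translate_cutoff(ref_scores: Sequence[float], talaris_scores: Sequence[float],
                     ref_cutoff: float) -> float:
    """Map an interface-score cutoff from REF2015 to Talaris2014 by matching pass rates.

    Assumes lower scores are better and both lists score the same validation decoys.
    Returns the Talaris2014 score at which the same fraction of decoys would pass.
    """
    if not ref_scores or not talaris_scores:
        raise ValueError("need non-empty score lists")
    ref_sorted = sorted(ref_scores)
    tal_sorted = sorted(talaris_scores)
    # Fraction of validation decoys passing the REF2015 filter (score <= cutoff).
    pass_fraction = bisect_right(ref_sorted, ref_cutoff) / len(ref_sorted)
    k = round(pass_fraction * len(tal_sorted))
    if k == 0:
        return float("-inf")  # nothing passed under REF2015, so nothing should pass here
    return tal_sorted[k - 1]
```

With the −35 REU example from the text, you would call `translate_cutoff(ref_scores, talaris_scores, -35.0)` on the validation set's interface scores and use the returned value as the Talaris2014 filter.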

We use a percentile-based approach internally rather than absolute cutoffs, which sidesteps this problem and makes the pipeline more portable across energy function choices. The top decile of interface score, regardless of which function produced it, is a more interpretable criterion than a hard REU threshold.
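A percentile filter is simpler still, since it never touches absolute REU values. The sketch below uses hypothetical helper names: it keeps the lowest-scoring decile of decoys from a single run and, as one possible way to use REF2015 and Talaris2014 as cross-checks of each other, measures how much their top deciles overlap on the same decoy set.

```python
from typing import Dict, List, Set


def top_decile(scores: Dict[str, float], fraction: float = 0.10) -> List[str]:
    """Decoy names in the lowest-scoring `fraction` of a run (lower score = better)."""
    if not scores:
        return []
    n_keep = max(1, round(len(scores) * fraction))
    return sorted(scores, key=scores.get)[:n_keep]


def decile_overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Jaccard overlap of the top deciles from two score functions on the same decoys."""
    top_a: Set[str] = set(top_decile(a))
    top_b: Set[str] = set(top_decile(b))
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0
```

A low overlap does not say which function is right; it marks a target where the energy function choice matters and the decision rule above deserves a second look.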

The practical implication for anyone building a Rosetta-based antibody docking pipeline: don't pick an energy function once and forget it. Understand which antigen class you're working with, and have a principled reason for your choice. The 13.4 percentage point difference in near-native enrichment we observed on charged-epitope targets is large enough to meaningfully affect which CDR sequences you carry forward.