Predicting Binding Affinity Without an Experimental Structure

A co-crystal structure of your antibody bound to its target antigen is, without question, the best foundation for computational affinity prediction. You know the precise paratope-epitope contact geometry, which CDR residues are within hydrogen-bonding distance, and whether the CDR H3 loop has adopted a canonical or unusual conformation. The problem is that co-crystal structures are expensive to obtain, slow to produce, and in most discovery programs simply don't exist until well past the hit-to-lead stage.

What does exist, usually, is a sequence. Sometimes an Fv domain model from a close template. Occasionally a cryo-EM density at 3–4 Å resolution that tells you roughly where the antibody is binding but leaves CDR loop positions ambiguous. The question our team has been working through is: how much affinity prediction signal can you extract from a predicted or homology-modeled structure, and how does that vary depending on the quality of the prediction?

The short answer is: more than you'd expect in the regime of substitution ranking (which mutations make binding better or worse), and less than you'd want for absolute K_D prediction. Getting specific about that distinction is what this post is about.

The Two Regimes of Affinity Prediction

Affinity prediction tasks come in two distinct flavors, and confusing them leads to disappointment. The first is absolute binding affinity prediction: given a structure, predict K_D to within, say, a factor of 5. This is genuinely hard even with experimental structures. Free energy perturbation (FEP) methods approach this in ideal cases, but they require high-quality crystal structures, careful setup, and significant compute. Using predicted structures for this task introduces model errors that typically swamp the FEP signal. We don't attempt absolute K_D prediction from homology models.
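To put "a factor of 5" on the energy scale used in the rest of this post, the standard relation ΔG = RT ln K_D converts a fold-error in K_D into kcal/mol. The snippet below is a minimal sketch of that arithmetic; the function names and the default temperature of 298.15 K are our own choices for illustration, not part of any pipeline described here.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from K_D in mol/L, 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def fold_error_to_kcal(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy error (kcal/mol) equivalent to a multiplicative K_D error."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temp_k * math.log(fold)


if __name__ == "__main__":
    # A 5-fold K_D error is a bit under 1 kcal/mol at room temperature.
    print(f"5-fold error: {fold_error_to_kcal(5.0):.2f} kcal/mol")
    print(f"1 nM binder:  {kd_to_dg(1e-9):.2f} kcal/mol")
```

In other words, predicting K_D to within a factor of 5 means keeping the total error below roughly 1 kcal/mol, which gives a sense of how little room there is for structural modeling error in the absolute regime.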

The second regime is relative affinity prediction: given a panel of variants (e.g., CDR point mutations), rank them by predicted binding improvement over the parent. Here the modeling errors are more likely to be systematic — if the model is slightly wrong about a particular loop geometry, that error will be present in both the parent and variant calculations, and the difference (ΔΔG) can still carry useful signal. This is the regime where computational affinity prediction earns its keep in early discovery, and it's the regime where the structure quality requirements are more forgiving.
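The cancellation argument can be written out explicitly. In the sketch below, ε_par and ε_var are our own notation for the modeling error in the parent and variant calculations, and we assume the convention ΔΔG = ΔG(variant) − ΔG(parent), so that negative values mean improved binding.

```latex
\begin{aligned}
\Delta\Delta G_{\text{pred}}
  &= \left(\Delta G_{\text{var}} + \varepsilon_{\text{var}}\right)
   - \left(\Delta G_{\text{par}} + \varepsilon_{\text{par}}\right) \\
  &= \Delta\Delta G_{\text{true}}
   + \left(\varepsilon_{\text{var}} - \varepsilon_{\text{par}}\right)
\end{aligned}
```

If the structural error is shared between parent and variant (ε_var ≈ ε_par), the second term is close to zero and the predicted ΔΔG tracks the true one. The cancellation weakens when a mutation sits directly in the mis-modeled region, because the two errors are then no longer the same.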

AlphaFold2 Fv Predictions: What They Get Right and Wrong

AlphaFold2 has meaningfully changed the starting-point quality for antibody structure prediction. For Fv domains with framework sequences closely related to PDB-deposited antibodies (which is most therapeutic antibody sequences given the clustering in VH/VL germline gene usage), AlphaFold2 typically produces models with backbone RMSDs of 1.0–1.5 Å over the full Fv, and ≤ 1.0 Å over CDR L1, L2, H1, and H2.

CDR H3 is the exception. For loops ≤ 10 residues, AlphaFold2 Fv predictions are competitive with dedicated loop modeling tools. For loops 11–14 residues, accuracy degrades noticeably, with RMSD to native often reaching 2–3 Å. Beyond 14 residues, the prediction is best treated as a reasonable starting conformation for Rosetta loop modeling rather than a stand-alone structure. We've documented this more systematically in our SAbDab holdout benchmark work — the short version is that H3 loop length is the single strongest predictor of AlphaFold2 model quality for antibody Fv domains.

For binding affinity calculations, what matters most is the conformation of the loop at the paratope-epitope interface, not the loop conformation in isolation. A 2 Å RMSD on CDR H3 relative to the crystal structure translates to roughly ±1.5–2.0 Rosetta energy units (REU) of uncertainty in Rosetta interface score calculations. That's enough to misrank individual point mutations. It's not enough to completely obscure the pattern of which CDR positions are energetically accessible for mutation.

Homology Modeling as a Structure Source

For some programs, a close PDB template exists — meaning a previously crystallized antibody with ≥ 80% sequence identity to the query in the framework regions and ≥ 60% in the CDRs. In this case, homology modeling using a tool like ABodyBuilder2 or the Rosetta comparative modeling protocol, followed by CDR loop grafting and energy minimization, can produce models that are competitive with AlphaFold2 for ΔΔG calculations. The advantage of template-based modeling is that the framework geometry is anchored to an experimental observation, limiting propagation of energy function errors into the CDR base.

When no close template exists (framework identity < 70%), homology model quality degrades substantially, and we recommend running both AlphaFold2 and a template-based model, then taking the ensemble median ΔΔG across both structures as the reported value (with only two models, the median is simply their mean). The ensemble approach reduces the impact of any single model's idiosyncratic errors on the final ranking.
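The ensemble step can be sketched in a few lines. This is an illustration under our own assumptions, not the production code: the function names, the input layout (model name mapped to per-mutation ΔΔG), and the mutation labels in the example are all hypothetical, and we assume lower ΔΔG means better binding.

```python
from statistics import median


def ensemble_ddg(per_model: dict[str, dict[str, float]]) -> dict[str, float]:
    """Median ddG per mutation across structural models.

    Only mutations scored in every model are reported, so a value is never
    silently based on a single structure.
    """
    if not per_model:
        raise ValueError("need at least one model")
    shared = set.intersection(*(set(scores) for scores in per_model.values()))
    return {
        mut: median(scores[mut] for scores in per_model.values())
        for mut in sorted(shared)
    }


def rank_variants(ddg: dict[str, float]) -> list[str]:
    """Mutations ordered from most to least favorable (lowest ddG first)."""
    return sorted(ddg, key=ddg.get)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    scores = {
        "alphafold2": {"H:Y33W": -1.2, "H:S52A": 0.4, "L:N30D": 0.9},
        "homology": {"H:Y33W": -0.6, "H:S52A": 0.8},
    }
    combined = ensemble_ddg(scores)
    print(combined)
    print(rank_variants(combined))
```

Restricting the output to mutations present in every model is a deliberate choice in this sketch: a mutation that could only be scored on one structure gets no ensemble benefit and is better reported separately.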

We validated this ensemble approach on the SKEMPI 2.0 database, which contains experimentally measured ΔΔG values for 7,085 mutations across 319 protein-protein interfaces (not exclusively antibody-antigen). For the antibody-antigen subset of SKEMPI 2.0 (roughly 600 mutations), Rosetta's cartesian_ddg protocol applied to experimental structures achieves a Pearson correlation of approximately 0.52–0.58 with measured ΔΔG values. Applied to AlphaFold2 Fv models (without antigen structure), the correlation drops to 0.38–0.44: lower, but still informative for ranking purposes. The ensemble of AlphaFold2 + best homology model brings the correlation back up to approximately 0.45–0.50.
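For readers who want to run the same kind of comparison on their own predictions, here is a dependency-free sketch of the Pearson correlation quoted above. We also include a Spearman (rank) correlation, which was not part of the numbers reported here but is a natural companion when the goal is ranking. Function names are ours.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between predicted and measured values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Call either function with the predicted ΔΔG values as the first argument and the measured values, in the same mutation order, as the second.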

The Antigen Structure Problem

We've been discussing the antibody Fv structure in isolation, but predicting binding affinity requires knowing the antigen structure too. For soluble protein antigens with PDB structures, the antigen side is well-constrained and contributes less to prediction error than the antibody side. For antigens without PDB structures (membrane proteins, novel targets, engineered constructs), the antigen must also be modeled, and the error contributions compound.

In a program we worked through in late 2024 targeting a GPCR-derived epitope peptide (the antigen was a short 18-residue linear peptide displayed in a conformationally constrained context), the antigen structure was the dominant source of prediction uncertainty. We used AlphaFold2 to model the epitope in its nearest plausible conformation and ran ΔΔG calculations, but were explicit with the client that the CDR scan results should be treated as hypothesis generation only — we could tell them which CDR positions were likely important and suggest substitution types, but the hit rate from synthesis was expected to be lower than in a program with a well-defined antigen structure. We don't say this to discourage the approach; we say it because knowing which error sources dominate helps calibrate how many variants to synthesize for validation.

Practical Workflow for Structure-Free Programs

Given these constraints, here's the decision logic we use internally when a program has no experimental structure:

1. Generate AlphaFold2 Fv model and check pLDDT scores per CDR. CDRs with per-residue pLDDT < 70 are low-confidence and should be treated with caution in ΔΔG calculations.
2. Find best PDB template by sequence search (BLAST against SAbDab). If identity ≥ 70% in framework, generate a homology model using ABodyBuilder2 or equivalent.
3. For CDR H3 ≤ 10 residues: proceed with ΔΔG scanning using both models; report ensemble ΔΔG. For H3 11–14 residues: run Rosetta loop refinement on both models before ΔΔG calculations. For H3 ≥ 15 residues: treat computational results as qualitative only, recommend focused 20–30 variant panel for experimental determination.
4. Dock antibody model to antigen structure (if antigen has PDB entry). If not, build antigen model and document as a major uncertainty source.
5. Report ΔΔG predictions with explicit confidence tiers: high (experimental structure + short H3), medium (AlphaFold2 model + medium H3 + PDB antigen), low (both sides modeled + long H3).
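The H3-length routing and the confidence tiers above can be sketched as a pair of small functions. The length cutoffs come straight from the list; everything else is an assumption for illustration: the names are ours, and the three tiers as stated do not cover every combination of inputs, so the sketch fills the gaps conservatively (the filled-in cases are marked in the comments).

```python
from dataclasses import dataclass


@dataclass
class ProgramInputs:
    h3_length: int                  # CDR H3 length in residues
    has_experimental_complex: bool  # experimental antibody-antigen structure
    antigen_in_pdb: bool            # antigen has an experimental PDB entry


def h3_route(h3_length: int) -> str:
    """Map CDR H3 length to the computational step described in the workflow."""
    if h3_length < 1:
        raise ValueError("H3 length must be a positive integer")
    if h3_length <= 10:
        return "ddg_scan_on_both_models"
    if h3_length <= 14:
        return "loop_refine_then_ddg_scan"
    return "qualitative_only_recommend_20_30_variant_panel"


def confidence_tier(p: ProgramInputs) -> str:
    """Assign a reporting tier; branches marked ASSUMPTION are our gap-filling."""
    if p.has_experimental_complex:
        # Stated: experimental structure + short H3 -> high.
        # ASSUMPTION: experimental structure + longer H3 -> medium.
        return "high" if p.h3_length <= 10 else "medium"
    if not p.antigen_in_pdb or p.h3_length >= 15:
        # Stated: both sides modeled + long H3 -> low.
        # ASSUMPTION: either condition alone is enough to drop to low.
        return "low"
    # Stated: predicted Fv + medium H3 + PDB antigen -> medium.
    # ASSUMPTION: predicted Fv + short H3 + PDB antigen is also medium.
    return "medium"


if __name__ == "__main__":
    program = ProgramInputs(h3_length=12, has_experimental_complex=False,
                            antigen_in_pdb=True)
    print(h3_route(program.h3_length), confidence_tier(program))
```

Encoding the tiers as code forces the unstated combinations into the open, which is useful in itself: each ASSUMPTION branch is a decision a team should make deliberately rather than case by case.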

This workflow isn't conservative for the sake of caution — it's designed to make the best possible use of computational signal while being transparent about where that signal fades. In a therapeutic antibody program, synthesis and SPR time is expensive, and calibrated confidence intervals determine how aggressively you should synthesize based on computational output alone.
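The per-CDR pLDDT check in the first step of the workflow is straightforward to automate, because AlphaFold2 writes per-residue pLDDT into the B-factor column of its PDB output. Below is a minimal sketch that reads Cα atoms from PDB-format text and lists residues under the 70 cutoff for each CDR. The function names are ours, and the CDR residue ranges are supplied by the caller: they depend on the numbering scheme and on how the model was numbered, so the ranges in any call are assumptions. Insertion codes are ignored, which is fine for sequentially numbered models but not for Kabat- or Chothia-renumbered files.

```python
def ca_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Return {(chain, residue number): pLDDT} from the CA atoms of a model.

    Uses the fixed-column PDB layout: atom name in columns 13-16, chain in
    column 22, residue number in 23-26, B-factor (pLDDT here) in 61-66.
    """
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores


def low_confidence_residues(
    pdb_text: str,
    cdr_ranges: dict[str, tuple[str, int, int]],
    cutoff: float = 70.0,
) -> dict[str, list[tuple[int, float]]]:
    """For each CDR, list (residue number, pLDDT) pairs below the cutoff.

    cdr_ranges maps a CDR label to (chain, first residue, last residue),
    inclusive. Residues missing from the model are skipped.
    """
    scores = ca_plddt(pdb_text)
    flagged: dict[str, list[tuple[int, float]]] = {}
    for name, (chain, start, end) in cdr_ranges.items():
        flagged[name] = [
            (res, scores[(chain, res)])
            for res in range(start, end + 1)
            if (chain, res) in scores and scores[(chain, res)] < cutoff
        ]
    return flagged
```

A CDR that comes back with an empty list passed the check; any CDR with flagged residues is a candidate for the "treat with caution" handling described in the workflow.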

When to Invest in Experimental Structure

There's a practical break-even point. For hit-to-lead optimization of a single CDR loop with a short H3 and a known antigen structure, computational scanning from predicted models gives sufficient signal to run a focused experimental round. But once a program is at lead optimization stage, has 3–5 candidates being compared on multiple criteria, and needs to understand epitope contacts precisely enough to differentiate selectivity profiles, a co-crystal or cryo-EM structure becomes worth the investment. Computational methods from predicted structures can carry you through early-stage design decisions; they're not a permanent substitute for experimental structural biology.

We're not suggesting programs skip structural biology. We're saying the decision of when to invest in experimental structure should be driven by the specific information need at each stage, not by a reflexive assumption that computation is useless without crystal data. For a significant fraction of early design decisions, the predicted structure is good enough to generate a useful synthesis list — and that's the claim we're making, precisely.