Antibody Humanization: Computational Strategies for Immunogenicity Reduction

Humanization of murine antibodies has been practiced since the late 1980s, yet it remains one of the most judgment-intensive steps in therapeutic antibody development. The core problem hasn't changed: CDR loops derived from a mouse are carried on a framework region that a human immune system will see as foreign if the framework itself contains non-human residues — and even with human frameworks, the CDR sequences can present T-cell epitopes that drive anti-drug antibody responses.

The modern version of this problem is not just "how do we graft CDRs onto human frameworks" but "how do we minimize the immunogenicity of the entire Fv while preserving the affinity and stability of the original molecule." Computational approaches have been applied to both questions for decades, but the tools have genuinely matured in the past several years, and we've built our humanization workflow around what actually works — and what doesn't.

Framework Selection: Kabat, IMGT, Chothia, and Why the Numbering Scheme Matters

Before a single computational score is run, the first decision is framework selection and CDR definition scheme. This sounds administrative but has real consequences. Under Kabat numbering, CDR boundaries are defined by sequence variability patterns in the original Kabat database. Under Chothia, boundaries are defined by structural loop geometry. IMGT uses a standardized numbering scheme with slightly different CDR definitions. These schemes disagree at the edges of CDR loops — particularly CDR-H1, CDR-H2, and CDR-L2 — and the disagreements affect which residues are treated as CDR (transferred from mouse) versus framework (substituted to human).

When a position is a CDR residue under one scheme but a framework residue under another, and that residue is a critical affinity contact, you may end up humanizing away an essential position without realizing it. The canonical example is VH positions 26–30: under Kabat these are framework, under Chothia they're part of CDR-H1. If a murine antibody has an unusual residue at position 27 that packs against an antigen side chain, humanizing that position to the human germline residue will cost you affinity.

Our approach is to use IMGT numbering as the canonical reference for framework selection (it's the most widely adopted in newer databases) but to compute CDR boundary definitions under all three schemes and flag any position where the definitions disagree. Those positions get individual attention during the affinity rescue modeling step.
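A minimal sketch of the disagreement flag, assuming the CDR definitions have already been mapped onto one common numbering (Kabat positions here, ignoring insertion codes). The ranges cover CDR-H1 for a VH with no H1 insertions. The helper names are hypothetical, and an IMGT entry would need its own IMGT-to-Kabat mapping step, which this sketch does not attempt.

```python
from itertools import chain


def cdr_positions(ranges):
    """Expand inclusive (start, end) ranges into a set of integer positions."""
    return set(chain.from_iterable(range(s, e + 1) for s, e in ranges))


def disputed_positions(schemes):
    """Positions that are CDR under at least one scheme but not under all.

    `schemes` maps a scheme name to a set of CDR positions, all expressed
    in one common numbering (assumed: Kabat, without insertion codes).
    """
    sets = list(schemes.values())
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return sorted(union - shared)


def describe(position, schemes):
    """List the schemes that call a given position CDR."""
    return [name for name, pos in schemes.items() if position in pos]


# CDR-H1 in Kabat numbering for a VH with no H1 insertions:
# Kabat defines H31-H35; Chothia (expressed in Kabat positions) H26-H32.
h1 = {
    "Kabat": cdr_positions([(31, 35)]),
    "Chothia": cdr_positions([(26, 32)]),
}

flagged = disputed_positions(h1)
for p in flagged:
    print(f"H{p}: CDR under {describe(p, h1)}")
```

Positions H26 to H30 come back as Chothia-only and H33 to H35 as Kabat-only; each of those would be queued for individual attention.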

Germline Framework Matching: OAS and the Human Antibody Landscape

For framework selection, we query the human sequences in the Observed Antibody Space (OAS) database, which are annotated with germline V-gene assignments, to find the human VH and VL germline genes most similar to the murine sequence at framework positions (excluding CDR residues by IMGT definition). The goal is to select a human germline framework that minimizes the number of framework substitutions required: each substitution is a potential affinity loss risk and a potential new immunogenic motif.
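The framework-only comparison can be sketched as below, assuming the murine query and every candidate germline are already aligned to one numbering, so that index i is the same structural position in each, with a boolean mask marking IMGT CDR positions. The germline names and sequences are toy placeholders, not real IGHV or IGKV genes.

```python
def framework_mismatches(query, germline, cdr_mask):
    """Count residue differences at framework positions only.

    All inputs are assumed pre-aligned: index i is the same structural
    position in each. cdr_mask[i] is True where position i is CDR and
    should be ignored.
    """
    if not (len(query) == len(germline) == len(cdr_mask)):
        raise ValueError("sequences and mask must be aligned to the same length")
    return sum(
        1
        for q, g, is_cdr in zip(query, germline, cdr_mask)
        if not is_cdr and q != g
    )


def rank_germlines(query, germlines, cdr_mask):
    """Return (gene, mismatches) pairs, fewest framework substitutions first."""
    scored = [
        (gene, framework_mismatches(query, seq, cdr_mask))
        for gene, seq in germlines.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy, pre-aligned 12-residue fragments (hypothetical, not real germlines).
cdr_mask = [False] * 4 + [True] * 3 + [False] * 5
murine = "QVQLKESGPGLV"
germlines = {
    "germline_A": "QVQLVESGGGLV",
    "germline_B": "EVQLKQSAPGLA",
}
ranking = rank_germlines(murine, germlines, cdr_mask)
print(ranking)
```

Differences inside the masked CDR window do not count, which is why germline_A ranks first despite a CDR mismatch.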

The most similar germline gene is not always the right choice. We also score the candidate framework sequences for T-cell epitope load using a matrix-based prediction approach: each 9-mer peptide from the framework is scored against a set of MHC class II binding prediction matrices (commonly IEDB-derived position-specific scoring matrices for common HLA-DR alleles). Frameworks with lower predicted T-cell epitope scores are preferred even if they require one or two additional back-mutations relative to the closest germline match.
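A sketch of matrix-based 9-mer scoring. The matrix here is a made-up toy with a few anchor-like columns, not IEDB values, and summing above-threshold window scores into one "epitope load" is an assumed aggregation. Real predictors handle binding-core registers and allele coverage with considerably more care.

```python
def score_9mers(sequence, pssm, core=9):
    """Score every core-length window with a position-specific scoring matrix.

    `pssm` is a list of `core` dicts, one per core position, mapping
    residue -> score; residues missing from a column contribute 0.
    Returns a list of (start_index, peptide, score).
    """
    if len(pssm) != core:
        raise ValueError("PSSM must have one column per core position")
    results = []
    for start in range(len(sequence) - core + 1):
        peptide = sequence[start:start + core]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, peptide))
        results.append((start, peptide, score))
    return results


def epitope_load(sequence, matrices, threshold):
    """Sum window scores above `threshold` across several allele matrices."""
    total = 0.0
    for pssm in matrices.values():
        total += sum(s for _, _, s in score_9mers(sequence, pssm) if s > threshold)
    return total


# Hypothetical toy matrix: only core positions 1, 4 and 9 carry weight.
toy = [{"F": 2.0, "L": 1.5}, {}, {}, {"A": 1.0}, {}, {}, {}, {}, {"V": 0.5}]
windows = score_9mers("FGSAYRLEVK", toy)
load = epitope_load("FGSAYRLEVK", {"DR_toy": toy}, threshold=1.0)
print(windows, load)
```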

This tradeoff is real: the closest germline match minimizes back-mutations but may harbor a high-scoring MHC-II epitope in its framework. The second-closest match, with better immunogenicity predictions, often comes out ahead on immunogenicity risk, at the cost of additional affinity rescue work at the back-mutation stage. We evaluate this tradeoff explicitly rather than always defaulting to the closest germline.
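One way to encode the rule "prefer lower epitope load even at the cost of one or two extra substitutions" is a bounded selection: find the closest match, consider only frameworks within `extra_allowed` substitutions of it, and take the lowest predicted epitope score among those. The `Candidate` type, the tie-break, and the numbers are illustrative assumptions, not a fixed part of the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    substitutions: int     # framework substitutions relative to the murine parent
    epitope_score: float   # predicted MHC-II load; lower is better


def choose_framework(candidates, extra_allowed=2):
    """Lowest-epitope framework within `extra_allowed` substitutions of the closest match."""
    best = min(c.substitutions for c in candidates)
    eligible = [c for c in candidates if c.substitutions <= best + extra_allowed]
    # Ties on epitope score fall back to fewer substitutions.
    return min(eligible, key=lambda c: (c.epitope_score, c.substitutions))


# Hypothetical candidates for illustration only.
candidates = [
    Candidate("germline_A", substitutions=14, epitope_score=12.0),
    Candidate("germline_B", substitutions=15, epitope_score=7.5),
    Candidate("germline_C", substitutions=19, epitope_score=3.0),
]
chosen = choose_framework(candidates)
print(chosen.gene)
```

Here germline_C has the best epitope score but sits five substitutions away from the closest match, so it falls outside the allowed window.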

T-Cell Epitope Prediction: What the Scores Mean

T-cell epitope prediction tools estimate MHC class II binding affinity for peptide fragments of a protein sequence. High predicted binding affinity is a necessary but not sufficient condition for immunogenicity: the peptide still needs to be processed and presented, and naive T cells must be activated in a context that breaks tolerance. The prediction tools don't model these additional steps.

We're not saying T-cell epitope prediction scores are a definitive immunogenicity forecast. The correlation between predicted T-cell epitope load and observed anti-drug antibody rates in clinical trials is real but modest — probably an r² in the 0.3–0.4 range based on the published retrospective analyses, not a deterministic relationship. What these scores do well is identify obvious clusters of high-affinity binding potential that can be addressed in sequence design. Eliminating a strong predicted MHC-II binder from a CDR region by a conservative substitution that preserves affinity is a low-cost risk reduction measure, not a guaranteed immunogenicity fix.

When we present humanization results, we report predicted T-cell epitope scores for the murine parent, the initial grafted construct, and each back-mutation variant. The question is directional: are we increasing or decreasing the predicted immunogenic burden at each step? That trajectory matters more than any absolute score.
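Tracking the trajectory rather than the absolute score amounts to differencing consecutive constructs. A sketch, with hypothetical construct names and scores:

```python
def epitope_trajectory(steps):
    """Differences in predicted epitope score between consecutive constructs.

    `steps` is an ordered list of (construct_name, score) pairs. Each result
    is (from_name, to_name, delta); a positive delta means the predicted
    immunogenic burden went up at that step.
    """
    return [
        (prev_name, name, score - prev_score)
        for (prev_name, prev_score), (name, score) in zip(steps, steps[1:])
    ]


# Hypothetical scores for illustration only.
steps = [
    ("murine_parent", 100.0),
    ("cdr_graft", 55.0),
    ("graft_plus_backmut_1", 57.0),
]
trajectory = epitope_trajectory(steps)
increases = [t for t in trajectory if t[2] > 0]
for start, end, delta in trajectory:
    print(f"{start} -> {end}: {delta:+.1f}")
```

Any step that lands in `increases` is the one to examine: a back-mutation that pushes the predicted burden back up needs to justify itself on affinity.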

Back-Mutation Prioritization: When Human Frameworks Don't Fit Murine CDRs

After initial CDR grafting onto the selected human framework, affinity often drops. The standard rescue strategy is back-mutation: reintroducing murine residues at specific framework positions that support CDR loop geometry. The challenge is identifying which positions to back-mutate — each murine residue reintroduced reduces human character and potentially increases immunogenicity risk.

Our prioritization approach uses Rosetta ΔΔG calculations to evaluate each candidate back-mutation. For each framework position where the selected human germline residue differs from the murine parent, we model the murine residue in the grafted structure and calculate the change in Fv stability (ΔΔG_stability) and interface energy (ΔΔG_binding). Because lower Rosetta energies are more favorable, an improvement corresponds to a negative ΔΔG. Positions where the back-mutation improves either metric by more than 1.0 REU (Rosetta Energy Units) are flagged as high priority for back-mutation. Positions producing less than 0.3 REU improvement are deprioritized.
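The Rosetta modeling itself is out of scope here; the sketch below only bins precomputed values, assuming each ΔΔG is E(with murine residue) minus E(grafted) in REU, so negative numbers are improvements. Treating the 0.3 REU cutoff as applying to the better of the two metrics is our reading of the rule, and the position labels and values are hypothetical.

```python
def prioritize(ddg_table, high=1.0, low=0.3):
    """Bin candidate back-mutations by predicted energetic benefit.

    `ddg_table` maps position -> (ddg_stability, ddg_binding) in REU,
    computed as E(back-mutant) - E(grafted). Negative values are favorable
    (Rosetta convention), so the improvement is the negated ddG.
    """
    tiers = {"high": [], "medium": [], "low": []}
    for pos, (ddg_stab, ddg_bind) in ddg_table.items():
        best_gain = max(-ddg_stab, -ddg_bind)  # improvement in the better metric
        if best_gain > high:
            tiers["high"].append(pos)
        elif best_gain < low:
            tiers["low"].append(pos)
        else:
            tiers["medium"].append(pos)
    return tiers


# Hypothetical ddG values in REU: (ddg_stability, ddg_binding)
example = {
    "H71": (-1.4, -0.2),
    "H24": (0.1, -0.1),
    "L49": (-0.5, -0.6),
    "L46": (-0.2, -1.3),
}
tiers = prioritize(example)
print(tiers)
```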

The structural intuition behind this: certain framework positions pack directly against or underneath the CDR loops. These include the "Vernier zone" residues that form a platform beneath the CDRs, residues at the VH/VL interface, and positions adjacent to the residues that determine CDR canonical structures. These positions are disproportionately represented among high-ΔΔG back-mutations. Identifying them by structural analysis alone is a useful heuristic but misses some cases; the ΔΔG calculation catches positions where the structural explanation isn't obvious but the energetic impact is real.
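To see which flagged positions the structural heuristic would have missed, a set comparison is enough. The Vernier list below is an illustrative subset of the positions described by Foote and Winter (1992), not the full list, and the energy-flagged positions are hypothetical.

```python
# Illustrative subset of Vernier zone positions (Kabat numbering);
# not exhaustive, verify against the original description before use.
VERNIER_SUBSET = {
    "H2", "H27", "H28", "H29", "H30", "H71", "H73", "H78", "H94",
    "L2", "L4", "L36", "L46", "L48", "L49", "L98",
}


def compare_heuristic_vs_energy(energy_flagged, structural=VERNIER_SUBSET):
    """Split positions by whether the structural list, the ddG screen, or both flag them."""
    energy_flagged = set(energy_flagged)
    return {
        "both": sorted(energy_flagged & structural),
        "energy_only": sorted(energy_flagged - structural),
        "structure_only": sorted(structural - energy_flagged),
    }


# Hypothetical output of a ddG screen.
report = compare_heuristic_vs_energy(["H71", "H73", "L46", "H37"])
print(report["both"], report["energy_only"])
```

The `energy_only` bucket is the interesting one: positions the energy calculation caught that a Vernier-only heuristic would have skipped.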

A Worked Example: Anti-IL-17A Program

To make this concrete: we ran a humanization campaign on a murine anti-IL-17A antibody with a measured K_D of 0.4 nM (SPR, 25°C) and a CDR-H3 of 14 residues containing an unusual VH junction deletion. The deletion complicates framework selection because standard IMGT numbering assumes a full-length VH sequence — so we had to manually adjust the framework alignment before germline matching.

After framework selection (IGHV1-46 VH, IGKV1-39 VL, both with low predicted T-cell epitope burden), direct CDR grafting produced a construct with a K_D of 3.8 nM, roughly a 9.5-fold affinity loss. Back-mutation analysis flagged six framework positions with a predicted improvement of more than 1.0 REU: three in VH (positions 71, 73, and 78 by Kabat numbering) and three in VL (positions 36, 46, and 48). We introduced all six, which recovered the K_D to 0.7 nM, still about 1.75-fold weaker than the murine parent but acceptable for the program.
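The fold changes in this example follow directly from the K_D ratios. Converting them to experimental binding free energies with ΔΔG = RT ln(K_D,variant / K_D,parent) at 25 °C is our addition, shown as a sanity check; these kcal/mol values are not on the same scale as Rosetta REU and should not be compared to the 1.0 REU threshold directly.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(kd_variant_nm, kd_parent_nm):
    """Ratio of dissociation constants; >1 means the variant binds more weakly."""
    return kd_variant_nm / kd_parent_nm


def ddg_binding_kcal(kd_variant_nm, kd_parent_nm, temp_k=298.15):
    """Experimental binding ddG = RT ln(KD_variant / KD_parent), in kcal/mol."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_variant_nm / kd_parent_nm)


# K_D values from the anti-IL-17A example (nM, SPR at 25 degC).
parent_kd, grafted_kd, rescued_kd = 0.4, 3.8, 0.7

graft_loss = fold_change(grafted_kd, parent_kd)        # about 9.5-fold
residual_loss = fold_change(rescued_kd, parent_kd)     # about 1.75-fold
ddg_graft = ddg_binding_kcal(grafted_kd, parent_kd)    # about 1.33 kcal/mol
ddg_rescued = ddg_binding_kcal(rescued_kd, parent_kd)  # about 0.33 kcal/mol

# Fraction of the grafting free-energy loss recovered by the back-mutations.
recovered = 1.0 - ddg_rescued / ddg_graft

print(f"graft: {graft_loss:.2f}x, {ddg_graft:.2f} kcal/mol")
print(f"rescued: {residual_loss:.2f}x, {ddg_rescued:.2f} kcal/mol")
print(f"recovered: {recovered:.0%} of the grafting loss")
```

On a log scale, the six back-mutations recover roughly three quarters of the free energy lost on grafting.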

More importantly, the total predicted MHC-II binding score across the Fv was 42% lower for the humanized construct than for the murine parent. None of the six back-mutated positions introduced a new high-scoring epitope, which we verify explicitly as part of the workflow. By IMGT/V-QUEST alignment, the final sequence has >85% framework identity to the closest human germline.
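The explicit check on back-mutated positions can be sketched as: for every 9-mer window that covers a mutated position, flag it if it scores above threshold in the back-mutated variant but not in the grafted construct. `toy_score` is a hypothetical stand-in for a real MHC-II predictor, the sequences are toy strings assumed to be aligned (substitutions only), and the 100.0 to 58.0 score pair is made up to reproduce a 42% reduction.

```python
def percent_reduction(parent_score, variant_score):
    """Percent drop in total predicted score relative to the parent."""
    return 100.0 * (parent_score - variant_score) / parent_score


def windows_covering(pos, seq_len, core=9):
    """Start indices of all core-length windows that include 0-based position `pos`."""
    first = max(0, pos - core + 1)
    last = min(pos, seq_len - core)
    return range(first, last + 1)


def new_epitopes(grafted, variant, mutated_positions, score_fn, threshold, core=9):
    """Windows over mutated positions that cross `threshold` only in the variant."""
    hits = []
    for pos in mutated_positions:
        for start in windows_covering(pos, len(variant), core):
            v = score_fn(variant[start:start + core])
            g = score_fn(grafted[start:start + core])
            if v > threshold >= g:
                hits.append((pos, start, variant[start:start + core], v))
    return hits


def toy_score(peptide):
    """Hypothetical scorer: counts hydrophobic residues at four anchor-like slots."""
    anchors = (0, 3, 5, 8)
    return sum(1.0 for i in anchors if peptide[i] in "FILMVWY")


grafted = "FAAIAAAAVAAA"
variant = "FAAIAWAAVAAA"  # hypothetical back-mutation at index 5 (A -> W)
hits = new_epitopes(grafted, variant, [5], toy_score, threshold=3.5)
print(hits, percent_reduction(100.0, 58.0))
```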

Sequence-Level Humanness Scores: Useful Benchmarks, Not Goals

Humanness scores — T20 score, z-score relative to human antibody sequence space, percent human identity — are commonly reported as part of humanization characterization. They're useful benchmarks, but chasing a high humanness score at the cost of affinity or stability is the wrong optimization target. The goal of humanization is to reduce clinical immunogenicity risk, and humanness scores are proxies for that goal, not the goal itself.
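For reference, a simplified score in the spirit of T20 (the mean identity of a query to its top-k most similar human sequences) looks like this. It assumes pre-aligned, equal-length sequences and plain positional identity in place of the database search, such as BLAST, that a published implementation would use; the tiny database is a placeholder.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def top_k_humanness(query, human_db, k=20):
    """Mean percent identity of `query` to its k most similar sequences in `human_db`."""
    ids = sorted((percent_identity(query, s) for s in human_db), reverse=True)
    top = ids[:k]
    return sum(top) / len(top)


# Placeholder "database" of pre-aligned fragments.
human_db = ["QVQL", "QVQV", "EVQL", "AAAA"]
score = top_k_humanness("QVQL", human_db, k=2)
print(score)
```

Whatever the exact formula, the score only describes sequence similarity; it says nothing about affinity or stability, which is why it works as a benchmark rather than a target.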

A humanized antibody with 88% framework identity to human germline and well-preserved affinity is preferable to one with 92% framework identity and a 50-fold affinity loss that will require extensive affinity maturation to rescue. The added rounds of engineering introduce their own sequence changes, which may reduce human character anyway.

The most useful thing computational humanization does is produce a structured shortlist of fewer than 10 variants to synthesize and test experimentally, rather than leaving teams to choose somewhat arbitrarily among the exponentially many possible back-mutation combinations. Getting that shortlist right, meaning that the first round of synthesis hits the affinity target, is where the computational investment pays off.
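The size of that combinatorial space, and one naive way to cut it down, can be sketched as follows: enumerate every subset of candidate back-mutations (2^n of them), score each by summed predicted gain minus a flat per-residue penalty for murine content, and keep the top nine. Additivity of ΔΔG gains is a strong assumption that real back-mutations often violate, and the penalty weight and gains are hypothetical.

```python
from itertools import combinations


def shortlist(gains, murine_penalty=0.5, max_size=9):
    """Rank all back-mutation subsets by additive gain minus a per-residue penalty.

    `gains` maps position -> predicted improvement (REU, positive = better).
    Returns (top `max_size` (score, combo) pairs, total number of subsets).
    """
    positions = sorted(gains)
    scored = []
    for r in range(len(positions) + 1):
        for combo in combinations(positions, r):
            score = sum(gains[p] for p in combo) - murine_penalty * len(combo)
            scored.append((score, combo))
    # Best score first; among ties, prefer fewer murine residues.
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored[:max_size], len(scored)


# Hypothetical predicted gains for six candidate positions.
gains = {"H71": 1.4, "H73": 1.2, "H78": 1.1, "L36": 1.3, "L46": 1.5, "L48": 1.0}
top, total = shortlist(gains)
print(total)
for score, combo in top:
    print(f"{score:.2f}", combo)
```

With six candidates there are already 64 subsets; the enumeration grows as 2^n, so beyond roughly 20 positions a greedy or beam search would replace exhaustive listing.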