
The PDB in 2025: How Structural Coverage of Oncology PPI Complexes Has Expanded

Dr. Ravi Sundaram

Structural Coverage as a Foundation for Computational PPI Work

Computational PPI modeling is fundamentally dependent on structural data. Hot-spot identification, pharmacophore generation, docking, and scoring all require starting coordinates for the PPI complex. The quality, completeness, and diversity of PDB structural coverage for a target system directly determine the confidence and resolution of the computational predictions that can be made about it. Before assessing what the PDB contains for oncology PPI targets, it is worth being explicit about what "useful" structural coverage means for computational purposes.

A single crystal structure of a PPI complex, even at high resolution, is often insufficient for productive computational work. Useful coverage has four components: the apo form of at least one binding partner, to show the unbound conformation; the bound complex, to define the interface geometry; ideally several crystal structures with different bound peptides or small molecules, to characterize the flexibility envelope of the binding site; and, where conformational change is known to accompany binding, structures representing the different functional states of the protein. For well-characterized targets like MDM2-p53, this level of coverage exists. For many other oncology PPI targets, the coverage is substantially thinner.
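The coverage criteria above can be expressed as a simple checklist. The following is a minimal sketch rather than a description of any existing tool: the `StructureRecord` fields, the function names, and the default thresholds are assumptions made for illustration, and entry identifiers passed to it would be whatever labels a curator assigns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructureRecord:
    """One structure available for a target (all fields are illustrative)."""
    entry_id: str                 # curator-assigned label
    partner_bound: bool           # protein or peptide partner present at the interface
    ligand: Optional[str] = None  # bound peptide or small molecule, if any
    state: str = "unspecified"    # functional-state label chosen by the curator


def coverage_report(records, min_ligands=2, min_states=2):
    """Check a target's structures against the four coverage criteria."""
    ligands = {r.ligand for r in records if r.ligand is not None}
    states = {r.state for r in records if r.state != "unspecified"}
    return {
        "apo_form": any(not r.partner_bound and r.ligand is None for r in records),
        "bound_complex": any(r.partner_bound for r in records),
        "ligand_diversity": len(ligands) >= min_ligands,  # assumed threshold
        "state_diversity": len(states) >= min_states,     # assumed threshold
    }


def missing_criteria(records, **kwargs):
    """List the criteria a target does not yet satisfy."""
    return [name for name, ok in coverage_report(records, **kwargs).items() if not ok]
```

The state-diversity check only matters for targets where conformational change is known to accompany binding, so in practice it would be switched off (for example with `min_states=0`) for rigid interfaces.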

The expansion of the PDB over the past five years has been substantial. Total PDB entry count grew by approximately 25% between 2020 and 2024, driven by cryo-EM technical advances that have enabled structure determination of protein complexes that resisted crystallization. The growth in oncology-relevant PPI complex structures has been uneven — certain well-funded target families have accumulated rich structural datasets while others remain underrepresented.

Quantitative Coverage: The Well-Served Targets

The MDM2/MDM4-p53 interface is the best-served oncology PPI system in the PDB. The combined set of MDM2-p53 transactivation domain structures (peptide complexes and small-molecule-bound forms) runs to well over one hundred deposited entries as of 2025. The structural diversity covers: MDM2 with various p53 peptide variants, MDM2 with clinical-stage small-molecule inhibitors, MDMX (MDM4) structures for comparative analysis, and engineered variants that clarify structure-activity relationships. This structural richness is what enables high-confidence computational work on this target — the pharmacophore is consistently defined across many crystal conditions, and the structural uncertainty in hot-spot geometry is quantifiable.

The BCL-2 family is the second most structurally well-characterized PPI family in the oncology PDB. BCL-2, BCL-xL, MCL-1, BCL-W, and BFL-1/A1 are all represented by multiple structures, including BH3 domain peptide complexes and small-molecule-bound forms. The venetoclax (ABT-199)-BCL-2 crystal structure (PDB 6O0K and related entries) has been particularly valuable for the field: it established the structural basis of selective BCL-2 inhibition over BCL-xL and rationalized the selectivity SAR observed in the clinical program. MCL-1 has accumulated substantial structural coverage more recently, driven by interest in overcoming the BCL-2 inhibitor resistance that arises through MCL-1 upregulation.

BRD4 bromodomain structures, while technically a protein-ligand rather than protein-protein interaction target for BET inhibitors, have accumulated extraordinary structural coverage — over 400 PDB entries for BRD4 bromodomain structures in various states. For the BRD4-MED1/MED26 PPI interface relevant to transcriptional regulation in MYC-driven cancers, coverage is thinner, but growing as interest in direct PPI disruption at this interface has increased.

The Coverage Gaps That Matter for Computational Work

Several high-value oncology PPI targets remain structurally underserved. The gaps fall into identifiable categories.

Intrinsically disordered interaction partners. Many oncology PPI interfaces involve one partner that is intrinsically disordered outside the complex context. The MYC/MAX interaction involves the MYC basic helix-loop-helix leucine zipper domain; while the MYC-MAX heterodimer structure has been determined, the conformational landscape of the MYC dimerization surface outside the complex is not well-characterized by crystallography. This limits the ability to identify hot-spot sub-pockets in the apo state — a necessary step for designing disruptors that engage the interface before the complex forms. NMR structures and MD-derived ensembles partially compensate, but the structural uncertainty is higher than for rigid-domain interfaces.

Large, flexible multi-domain complexes. Several relevant oncology PPI targets are embedded in larger multi-protein assemblies where the biologically relevant complex is too large or too flexible for X-ray crystallography and has only partially been characterized by cryo-EM. The beta-catenin/transcription factor complex in the Wnt pathway, for instance, involves beta-catenin engaging multiple partners including TCF/LEF transcription factors through a large disordered transactivation domain. The specific hot-spot contacts within this complex are not fully resolved by available cryo-EM structures, and the computational characterization of this interface has correspondingly higher uncertainty than MDM2-p53.

KRAS effector interfaces. Despite the explosion of interest in KRAS-targeting strategies following the AMG 510 (sotorasib) approval, the structural coverage of KRAS-effector PPI interfaces in oncologically relevant states remains limited. The KRAS-RAF interface, the KRAS-PI3K interface, and the KRAS-SOS1 interface (SOS1 being a nucleotide exchange factor rather than a downstream effector) each have representative structures, but coverage of the KRAS-G12C versus KRAS-WT versus other mutant conformational states in complex with each partner is incomplete. Computational work on these interfaces requires careful attention to which mutant background is being modeled and whether the available structure is in the GDP-bound, GTP-bound, or nucleotide-free state, because the hot-spot geometry differs across these functional states.
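One way to make this incompleteness explicit is to enumerate every combination of mutant background, nucleotide state, and partner, then list the combinations with no structure. The sketch below assumes a hand-curated set of `(variant, state, partner)` tuples. The function name is hypothetical, and the example inventory is a placeholder to show the call, not a statement about what the PDB actually contains.

```python
from itertools import product


def coverage_gaps(available, variants, nucleotide_states, partners):
    """Return every (variant, state, partner) combination with no structure."""
    have = set(available)
    return [
        combo
        for combo in product(variants, nucleotide_states, partners)
        if combo not in have
    ]


# Axes taken from the discussion above.
VARIANTS = ["WT", "G12C"]
STATES = ["GDP-bound", "GTP-bound", "nucleotide-free"]
PARTNERS = ["RAF", "PI3K", "SOS1"]

# PLACEHOLDER inventory for illustration only; a real one would be curated
# from a PDB search and checked entry by entry.
example_inventory = [
    ("WT", "GTP-bound", "RAF"),
    ("WT", "nucleotide-free", "SOS1"),
]

gaps = coverage_gaps(example_inventory, VARIANTS, STATES, PARTNERS)
```

Recording the state and mutant background alongside each structure also makes it harder to dock against a complex in the wrong functional state by accident.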

Cryo-EM's Contribution to PPI Coverage

Cryo-EM has been the most significant technical development for PPI structural coverage in the past five years. The technique enables structure determination of large, flexible, and heterogeneous protein complexes that resist crystallization — precisely the class of targets that has been structurally underrepresented in the PDB. Cryo-EM structures of the SWI/SNF chromatin remodeling complex, the mediator complex, and various GPCR-G protein assemblies have provided structural views of PPI interfaces that were previously accessible only through low-resolution electron microscopy or indirect biochemical evidence.

The limitation for PPI drug discovery is resolution. While cryo-EM routinely achieves 2–3 Å resolution for well-behaved, large homogeneous particles, the interface regions between proteins within a complex are often at the flexible periphery of the particle, where local resolution drops to 4–6 Å or worse. At 4–6 Å resolution, individual residue positions and side chain orientations are not reliably resolved — sufficient to assign secondary structure and domain contacts, insufficient for the precise hot-spot residue geometry needed for pharmacophore generation.

For cryo-EM-derived PPI interface structures, our computational workflow incorporates a model quality assessment step that uses per-residue B-factors and real-space R-factors as proxies for local resolution at the interface, and flags interface regions where the data are too weak for confident hot-spot geometry prediction. In these cases, the cryo-EM structure provides the overall complex architecture and approximate contact geometry, while Rosetta-based protein-protein docking and molecular dynamics are used to generate higher-confidence interface models at the hot-spot level.
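The B-factor part of such a check can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow's actual code: the function names are hypothetical, and the 5 Å contact cutoff and the 80 Å² B-factor ceiling are arbitrary values that would need calibrating per map. It reads `ATOM` records using the fixed-column PDB layout and does not compute real-space R-factors, which require the density map.

```python
import math
from collections import defaultdict


def parse_atoms(pdb_text):
    """Parse ATOM records from PDB-format text using the fixed column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append({
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                "b": float(line[60:66]),
            })
    return atoms


def interface_residues(atoms, chain_a, chain_b, cutoff=5.0):
    """Residues with any atom within `cutoff` angstroms of the other chain."""
    side_a = [x for x in atoms if x["chain"] == chain_a]
    side_b = [x for x in atoms if x["chain"] == chain_b]
    found = set()
    for p in side_a:  # brute-force pairwise search; fine for a sketch
        for q in side_b:
            if math.dist(p["xyz"], q["xyz"]) <= cutoff:
                found.add((p["chain"], p["resseq"]))
                found.add((q["chain"], q["resseq"]))
    return found


def flag_low_confidence(atoms, interface, b_max=80.0):
    """Interface residues whose mean B-factor exceeds the assumed ceiling."""
    per_residue = defaultdict(list)
    for x in atoms:
        key = (x["chain"], x["resseq"])
        if key in interface:
            per_residue[key].append(x["b"])
    return sorted(k for k, bs in per_residue.items() if sum(bs) / len(bs) > b_max)
```

Residues returned by `flag_low_confidence` would be the ones handed to docking and MD refinement rather than used directly for hot-spot geometry. B-factors in cryo-EM models depend on the refinement program, so an absolute threshold is only meaningful within a consistently processed set of structures.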

Where AlphaFold2-Multimer Fills Coverage Gaps

For PPI targets with thin experimental structural coverage, AlphaFold2-Multimer predictions have become an important starting point for computational characterization, subject to the confidence caveats described in an earlier post. The practical contribution is most significant for PPI pairs where one partner has abundant sequence homologs (providing evolutionary information for the structure prediction) and the interaction interface involves a structured domain rather than an intrinsically disordered region. For these cases, AlphaFold2-Multimer predictions approach experimental accuracy for the overall complex fold and provide a usable starting model for interface characterization.

Predicted models are not deposited in the PDB archive itself, which accepts only experimentally determined structures. Instead, the RCSB PDB portal now surfaces computed structure models from AlphaFold DB and ModelArchive alongside experimental entries under a distinct annotation, and the searchable AlphaFold database now includes predicted complex structures for thousands of known PPI pairs. This substantially expands the "structural coverage" of PPI systems, though users must apply different confidence standards to predicted versus experimental structures. For our purposes, predicted structures are treated as higher-uncertainty starting models requiring explicit MD validation before being used as the basis for pharmacophore generation or compound screening.
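The rule that predicted structures must pass MD validation before pharmacophore work can be written as an explicit gate. The sketch below is illustrative only: `StartingModel`, `next_step`, and the returned step labels are hypothetical names. The extra pLDDT check is an added assumption rather than part of the workflow described here; it uses 70 as the floor because AlphaFold labels pLDDT of 70 and above as confident.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartingModel:
    """A candidate starting structure for interface characterization."""
    name: str                                # free-text label
    predicted: bool                          # True for AlphaFold-style predictions
    md_validated: bool = False               # set once MD validation has passed
    interface_plddt: Optional[float] = None  # mean pLDDT over interface residues


def next_step(model, plddt_floor=70.0):
    """Decide what may happen next to a starting model."""
    if not model.predicted:
        # Experimental structures go through their own quality checks elsewhere.
        return "pharmacophore"
    if model.interface_plddt is not None and model.interface_plddt < plddt_floor:
        # Assumed extra gate: a low-confidence interface is remodeled first.
        return "remodel_interface"
    if not model.md_validated:
        return "md_validation"
    return "pharmacophore"
```

Keeping provenance as an explicit field means a predicted model cannot reach pharmacophore generation without the validation flag being set deliberately.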

The current state of oncology PPI structural coverage is the best it has ever been, and the rate of improvement is accelerating. The significant coverage gaps are identifiable and addressable — either through targeted experimental structure determination for the highest-value targets, or through careful computational modeling workflows that acknowledge and propagate the structural uncertainty appropriately. The bottleneck for computational PPI modeling in 2025 is not structural coverage for the most-studied targets; it is the conversion of structural models into validated druggability predictions for the expanding set of targets where structural data is now available but pharmacological characterization has not yet followed.