Why Standard Docking Scores Fail at PPI Interfaces
Before describing what our disruption scoring function does, it is worth being precise about what it is replacing and why that replacement is necessary. The dominant scoring functions in commercial and open-source docking platforms (GlideScore, AutoDock Vina's semi-empirical potential, the ChemScore implementation in GOLD) all share a common ancestry in enzyme-ligand binding prediction. They were parameterized, tested, and refined against datasets of enzyme-inhibitor complexes where the binding geometry involves a ligand inserted into a concave protein pocket. Their energetic terms reward shape complementarity to concave surfaces, hydrogen bond formation in enclosed environments, and hydrophobic burial in well-defined cavities.
When these functions are applied to PPI interface docking, several failure modes are predictable from first principles. The interface is not a cavity — there is no enclosed volume for the compound to enter. Hydrophobic burial is partial and distributed rather than complete. Hydrogen bond geometry is less constrained by surface curvature. The shape complementarity term, calibrated on enzyme pockets, fires inconsistently on shallow interfaces. The result is that rank-ordering of compounds by standard docking scores at PPI interfaces shows poor correlation with experimental binding or displacement data — enrichment factors at cutoffs relevant to practical screening are frequently near unity.
The Genolux binding-pocket disruption score addresses this by replacing the enzyme-optimized scoring terms with a function built specifically for the energetic and geometric characteristics of PPI hot-spot sub-pockets.
The Architecture of the Disruption Score
The disruption score is a composite of four weighted terms, each targeting a different aspect of hot-spot engagement. We label the terms: hot-spot contact score (H), interface electrostatic complementarity (E), buried surface area contribution (B), and strain-adjusted binding pose penalty (S). The full score takes the form:
D_score = w₁·H + w₂·E + w₃·B − w₄·S
The weights w₁–w₄ are not fixed constants — they are interface-class-specific. Different PPI families have characteristically different hot-spot geometries: MDM2-class interfaces present deep hydrophobic sub-pockets; BH3-recognition interfaces combine hydrophobic contacts with backbone hydrogen bonds to the groove; STAT-class SH2 interfaces have a strong electrostatic component from phosphotyrosine mimicry requirements. Running the same weight set across all PPI classes is one of the recurring errors in adapted scoring approaches. Our weights are calibrated separately for each structural family in our target library using experimental ΔΔG datasets as the training reference.
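A minimal Python sketch of how the composite score and its class-specific weights fit together. The class keys, the `TermScores` container and every weight value are hypothetical placeholders. They only echo the qualitative pattern described above (H and B leading for MDM2-like pockets, E leading for SH2-like interfaces) and are not the calibrated values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TermScores:
    """The four term values for one docked pose."""
    H: float  # hot-spot contact score
    E: float  # interface electrostatic complementarity
    B: float  # buried surface area contribution
    S: float  # strain-adjusted pose penalty (enters with a minus sign)

# Hypothetical (w1, w2, w3, w4) per interface class. Placeholder numbers that only
# mimic the qualitative pattern in the text; they are NOT calibrated values.
CLASS_WEIGHTS: dict[str, tuple[float, float, float, float]] = {
    "MDM2": (0.50, 0.05, 0.30, 0.15),      # deep hydrophobic sub-pockets: H and B lead
    "BH3": (0.45, 0.15, 0.25, 0.15),       # hydrophobic contacts plus backbone H-bonds
    "STAT_SH2": (0.30, 0.45, 0.10, 0.15),  # phosphotyrosine mimicry: E leads
}

def d_score(terms: TermScores, interface_class: str) -> float:
    """D_score = w1*H + w2*E + w3*B - w4*S using the weights for this interface class."""
    if interface_class not in CLASS_WEIGHTS:
        # No silent fallback to one shared weight set across PPI classes.
        raise ValueError(f"no calibrated weights for interface class {interface_class!r}")
    w1, w2, w3, w4 = CLASS_WEIGHTS[interface_class]
    return w1 * terms.H + w2 * terms.E + w3 * terms.B - w4 * terms.S

pose = TermScores(H=2.0, E=0.1, B=1.5, S=0.4)
print(round(d_score(pose, "MDM2"), 3))  # 1.395
```

Raising an error for an uncalibrated class, rather than falling back to a default weight set, reflects the point that a single weight set applied across PPI classes is itself a source of error.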
Hot-Spot Contact Score (H)
The H term is computed by first identifying hot-spot residues through computational alanine-scanning ΔΔG prediction using Rosetta's InterfaceAnalyzer application. Residues whose in-silico alanine substitution weakens the interface score by more than 1.5 REU (Rosetta Energy Units) are classified as hot-spot positions. The H term then rewards compounds that make direct van der Waals contact with the side chains of classified hot-spot residues, weighting each contact by that residue's ΔΔG contribution.
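The two steps above (threshold the predicted alanine-scan ΔΔG values, then sum the ΔΔG weights of the hot spots the ligand touches) can be sketched as follows. The sketch assumes the ΔΔG values have already been exported from the Rosetta run into a plain dictionary, and it uses a hypothetical contact criterion: interatomic distance within the sum of van der Waals radii plus a 0.5 Å tolerance. Residue labels, radii and coordinates are toy values.

```python
import math

Atom = tuple[tuple[float, float, float], float]  # ((x, y, z) in Å, vdW radius in Å)

HOTSPOT_CUTOFF_REU = 1.5  # threshold stated in the text

def classify_hotspots(ala_ddg: dict[str, float],
                      cutoff: float = HOTSPOT_CUTOFF_REU) -> dict[str, float]:
    """Keep residues whose predicted alanine-substitution ΔΔG (REU) exceeds the cutoff."""
    return {res: ddg for res, ddg in ala_ddg.items() if ddg > cutoff}

def in_contact(a: Atom, b: Atom, tol: float) -> bool:
    """Assumed vdW-contact test: centre-to-centre distance <= r_a + r_b + tol."""
    (pa, ra), (pb, rb) = a, b
    return math.dist(pa, pb) <= ra + rb + tol

def h_term(ligand: list[Atom], side_chains: dict[str, list[Atom]],
           hotspots: dict[str, float], tol: float = 0.5) -> float:
    """Sum the ΔΔG weight of each hot spot whose side chain the ligand touches (once per residue)."""
    total = 0.0
    for res, ddg in hotspots.items():
        atoms = side_chains.get(res, [])
        if any(in_contact(la, sa, tol) for la in ligand for sa in atoms):
            total += ddg
    return total

# Toy example: two hot spots (R1, R2) and one non-hot-spot residue (R3).
ala_ddg = {"R1": 2.4, "R2": 1.9, "R3": 0.6}
side_chains = {
    "R1": [((0.0, 0.0, 0.0), 1.7)],
    "R2": [((10.0, 0.0, 0.0), 1.7)],
    "R3": [((0.0, 5.0, 0.0), 1.55)],
}
hotspots = classify_hotspots(ala_ddg)
print(h_term([((3.0, 0.0, 0.0), 1.7)], side_chains, hotspots))  # touches R1 only -> 2.4
print(h_term([((0.0, 8.0, 0.0), 1.7)], side_chains, hotspots))  # touches R3 only -> 0.0
```

The second call shows the distinction drawn in the text: a ligand that only contacts the non-hot-spot residue R3 earns nothing under H, however much surface that contact covers.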
This is mechanistically different from rewarding total contact surface area. A compound that makes extensive contact with non-hot-spot interface residues but misses the hot-spot positions will score well by total contact area and poorly by H. Empirically, the H term is the highest-weight component for most PPI targets — hot-spot engagement is the primary determinant of whether a compound can actually destabilize the interface or merely sits on the surface without displacing the native binding partner.
Interface Electrostatic Complementarity (E)
The E term uses a simplified Poisson-Boltzmann calculation to map the electrostatic potential onto the surface of the hot-spot sub-pocket. The compound's electrostatic potential, mapped the same way, is evaluated for complementarity with the pocket potential using a Pearson correlation metric on matched surface patches; because complementary surfaces carry potentials of opposite sign, a strongly negative correlation between the raw potentials indicates good complementarity. This term primarily captures cases where charged or polar functional groups are required for hot-spot engagement. The MDM2 sub-pocket that accommodates p53 residues F19, W23 and L26 is predominantly hydrophobic, so its E weight is correspondingly low, while the phosphotyrosine-mimicry requirement in some SH2-domain PPIs makes the E term dominant.
Buried Surface Area Contribution (B)
B is a conventional solvent-accessible surface area (SASA) calculation, scored as the SASA buried on binding, but evaluated only over the hot-spot sub-pocket region rather than the full interface. We use a reduced-radius probe (1.2 Å versus the standard 1.4 Å) to capture partial-burial geometries that are common at PPI sub-pockets but poorly represented in enzyme-pocket parameterizations. The B term correlates reasonably well with the thermodynamic hydrophobic burial penalty for hot-spot displacement: compounds that fail to bury substantial hot-spot surface have systematically higher Kd values in fragment displacement assays.
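A self-contained sketch of the B term, using the Shrake-Rupley point-sampling method as a stand-in for whatever SASA engine the production pipeline uses. Buried area is taken as the SASA of the sub-pocket atoms without the ligand minus their SASA with it, with the whole receptor present in both cases so that only ligand-induced burial is counted. The 1.2 Å probe matches the text; the point count and the atom representation are assumptions.

```python
import math

Atom = tuple[tuple[float, float, float], float]  # ((x, y, z) in Å, radius in Å)

def sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Roughly uniform unit-sphere points from a golden-angle spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts

def atom_sasa(i: int, atoms: list[Atom], probe: float, pts) -> float:
    """Shrake-Rupley SASA (Å^2) of atoms[i], occluded by every other atom in `atoms`."""
    (cx, cy, cz), ri = atoms[i]
    big_r = ri + probe
    exposed = 0
    for ux, uy, uz in pts:
        p = (cx + big_r * ux, cy + big_r * uy, cz + big_r * uz)
        # A surface point is exposed if it lies outside every other probe-expanded sphere.
        if all(j == i or math.dist(p, cj) >= rj + probe for j, (cj, rj) in enumerate(atoms)):
            exposed += 1
    return 4.0 * math.pi * big_r ** 2 * exposed / len(pts)

def buried_subpocket_sasa(receptor: list[Atom], pocket_idx: list[int], ligand: list[Atom],
                          probe: float = 1.2, n_points: int = 400) -> float:
    """SASA lost by the sub-pocket atoms when the ligand binds (summed over pocket_idx only)."""
    pts = sphere_points(n_points)
    holo = receptor + ligand  # receptor atoms keep their indices in the complex
    return sum(atom_sasa(i, receptor, probe, pts) - atom_sasa(i, holo, probe, pts)
               for i in pocket_idx)

# Toy geometry: atom 0 is the sub-pocket atom; atom 1 is a neighbouring receptor atom.
receptor = [((0.0, 0.0, 0.0), 1.7), ((0.0, 0.0, -3.4), 1.7)]
print(buried_subpocket_sasa(receptor, [0], [((3.2, 0.0, 0.0), 1.7)]) > 0)  # True
print(buried_subpocket_sasa(receptor, [0], [((20.0, 0.0, 0.0), 1.7)]))     # 0.0
```

Passing `probe=1.4` reproduces the conventional setting for side-by-side comparison.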
Strain-Adjusted Pose Penalty (S)
The S term is a conformational strain correction that addresses one of the practical problems with PPI-targeting compounds: they tend to be larger, more flexible, and more lipophilic than typical kinase inhibitor scaffolds. In flat binding sites, flexible compounds can adopt high-strain conformations that standard docking workflows do not adequately penalize. We compute the strain energy of the docked pose relative to the lowest-energy gas-phase conformer using an MMFF94s force field energy calculation. Poses whose strain exceeds a target-class threshold are penalized, preventing large, conformationally flexible compounds from generating artificially favorable scores through high-strain contacts.
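A minimal sketch of the S-term bookkeeping, assuming MMFF94s energies have already been computed for the docked pose and for a set of gas-phase conformers (RDKit, for instance, exposes the MMFF94s variant through `MMFFGetMoleculeProperties(mol, mmffVariant="MMFF94s")`). The hinge form of the penalty, the `slope` parameter and the threshold value in the example are assumptions; the text only states that poses above a target-class threshold are penalized.

```python
def strain_energy(pose_energy: float, conformer_energies: list[float]) -> float:
    """Strain (kcal/mol) of the docked pose relative to the lowest-energy conformer found.

    The pose itself is included in the minimum, so strain is never negative. If the
    conformer search misses the true global minimum, strain is underestimated.
    """
    if not conformer_energies:
        raise ValueError("need at least one reference conformer energy")
    return pose_energy - min(min(conformer_energies), pose_energy)

def strain_penalty(pose_energy: float, conformer_energies: list[float],
                   threshold: float, slope: float = 1.0) -> float:
    """Assumed hinge penalty: zero up to the class threshold, linear above it."""
    excess = strain_energy(pose_energy, conformer_energies) - threshold
    return slope * max(0.0, excess)

# Toy energies in kcal/mol; the 6.0 threshold is a placeholder, not a calibrated class value.
print(strain_penalty(12.0, [3.0, 5.5, 4.0], threshold=6.0))  # strain 9.0 -> penalty 3.0
print(strain_penalty(7.0, [3.0, 5.5, 4.0], threshold=6.0))   # strain 4.0 -> penalty 0.0
```

A hinge leaves moderately strained poses untouched, which matches the idea of a threshold; a smoother penalty could be substituted if ranking near the threshold turns out to be unstable.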
Calibration and Validation Approach
The weight vector for each PPI structural class is derived by minimizing the rank-order disagreement between D_score rankings and experimental data. For hot-spot ΔΔG calibration, we use curated single-point alanine scanning datasets from the literature — the SKEMPI database is one reference set, though we supplement it with interface-specific experimental compilations for targets in our focus oncology panel. For compound ranking validation, we use the limited but informative sets of experimental fragment displacement IC₅₀ values available for well-characterized PPI targets such as MDM2-p53 and BCL-2/BH3 interfaces.
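The calibration step can be sketched as a search over weight vectors that minimizes the fraction of discordant compound pairs (a Kendall-tau-style disagreement) between D_score and experimental potency. Because rank order is unchanged when all four weights are multiplied by the same positive constant, the sketch searches only a grid with w1 + w2 + w3 + w4 = 1. The grid search, the pIC50 input and the tie handling are illustrative assumptions; a finer grid or a different optimizer could be substituted.

```python
from itertools import combinations

Weights = tuple[float, float, float, float]  # (w1, w2, w3, w4)
Terms = tuple[float, float, float, float]    # (H, E, B, S) for one compound

def score(t: Terms, w: Weights) -> float:
    """D_score = w1*H + w2*E + w3*B - w4*S."""
    return w[0] * t[0] + w[1] * t[1] + w[2] * t[2] - w[3] * t[3]

def discordance(w: Weights, compounds: list[Terms], pic50: list[float]) -> float:
    """Fraction of experimentally ordered pairs that D_score ranks the other way (or ties)."""
    scores = [score(t, w) for t in compounds]
    pairs = bad = 0
    for i, j in combinations(range(len(compounds)), 2):
        if pic50[i] == pic50[j]:
            continue  # no experimental ordering to agree with
        pairs += 1
        if (scores[i] - scores[j]) * (pic50[i] - pic50[j]) <= 0:
            bad += 1  # score ties are counted as disagreement (assumed)
    if pairs == 0:
        raise ValueError("need at least two compounds with different pIC50")
    return bad / pairs

def simplex_grid(steps: int):
    """Yield every weight vector with entries k/steps (k >= 0) that sum to 1."""
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            for c in range(steps + 1 - a - b):
                yield (a / steps, b / steps, c / steps, (steps - a - b - c) / steps)

def calibrate(compounds: list[Terms], pic50: list[float], steps: int = 10) -> Weights:
    """Grid-search the weight simplex for minimum rank discordance (first minimum wins)."""
    return min(simplex_grid(steps), key=lambda w: discordance(w, compounds, pic50))

# Toy (H, E, B, S) terms and pIC50 values; not real data.
compounds = [(3.0, 0.2, 1.0, 0.5), (2.0, 0.9, 1.0, 0.1), (1.0, 0.1, 2.0, 2.0), (0.5, 0.5, 0.5, 0.0)]
pic50 = [7.5, 6.8, 5.0, 5.5]
w = calibrate(compounds, pic50)
print(discordance(w, compounds, pic50))  # 0.0: some grid weights reproduce the toy ranking
```

With only a handful of compounds per class, many weight vectors typically tie at the minimum, which is one concrete symptom of the small-data limitation the calibration faces.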
We want to be direct about the limitations of this calibration. Experimental ΔΔG datasets for PPI interfaces are substantially smaller than enzyme-ligand binding datasets — there are on the order of hundreds of well-curated data points for oncology PPI systems versus tens of thousands for kinase inhibitors. The weight optimization is therefore less statistically well-grounded than we would like, and we expect recalibration as more experimental data becomes available from high-throughput PPI disruption assays. The score is a ranking function, not a binding affinity predictor — it should be used to prioritize compounds for synthesis, not to predict Kd values.
We also benchmark enrichment factors explicitly: for a representative MDM2-p53 screening set with confirmed active and inactive compounds, D_score achieves an enrichment factor of approximately 4.5× at the top-10% cutoff versus approximately 1.2× for an unadapted standard docking score on the same set. This is not definitive validation — the test set is small and the actives/inactives are not a fully balanced experimental sample — but it demonstrates that the adapted function is performing directionally as intended.
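For reference, the enrichment factor at a fractional cutoff is the hit rate among the top-ranked compounds divided by the hit rate in the whole set, so 1.0 corresponds to random ranking. A minimal sketch follows, assuming higher scores are better and that ties are broken by input order; both choices, and the rounding of the cutoff size, are assumptions.

```python
def enrichment_factor(scores: list[float], is_active: list[bool], fraction: float = 0.10) -> float:
    """EF = (actives in top fraction / size of top fraction) / (all actives / all compounds)."""
    n = len(scores)
    if n == 0 or len(is_active) != n:
        raise ValueError("scores and labels must be non-empty and the same length")
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("the set contains no actives")
    n_top = max(1, round(fraction * n))
    # Sort indices by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n)

scores = [100.0 - i for i in range(100)]  # compound 0 is ranked first
labels = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 85
print(enrichment_factor(scores, labels))  # 5 hits in the top 10 of 100, 10 actives -> 5.0
```

The attainable maximum is min(1/fraction, N/actives), so at a 10% cutoff the ceiling is 10 only when actives make up at most 10% of the set; this bounds how an enrichment factor at that cutoff should be read.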
What the Score Does Not Do
There are important things the disruption score does not and cannot predict from docking geometry alone. It does not predict membrane permeability or metabolic stability; those require separate ADMET modeling and are a distinct problem (PPI disruptors have their own ADMET challenges that we treat separately). It does not capture induced-fit effects at the interface: some compounds induce conformational changes in the hot-spot sub-pocket that a rigid-receptor docking calculation will miss. For targets where significant induced fit has been documented crystallographically, such as certain BCL-xL variants, we flag this in the interface characterization report and recommend follow-up with flexible receptor ensemble docking.
The score also does not predict selectivity across related PPI targets within the same structural family. A compound scoring well for MDM2-p53 disruption might also engage MDMX-p53, which has a related but structurally distinct interface. Cross-reactivity at related PPIs requires explicit counter-screening, and we surface this risk in the output by flagging compounds whose docked pose geometry is consistent with multiple family members.
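A simple way to surface this kind of risk can be sketched as below. Note that it swaps in a score-based proxy for the pose-geometry criterion described in the text: a compound is flagged when it clears a per-target D_score cutoff at two or more family members. The target names are real, but the scores and cutoffs are made-up illustrations.

```python
def flag_cross_reactive(scores_by_target: dict[str, dict[str, float]],
                        cutoffs: dict[str, float], min_targets: int = 2) -> dict[str, list[str]]:
    """Map each compound to the family members whose cutoff it clears; keep multi-target hits."""
    passed: dict[str, list[str]] = {}
    for target, scores in scores_by_target.items():
        for compound, s in scores.items():
            if s >= cutoffs[target]:
                passed.setdefault(compound, []).append(target)
    return {c: sorted(ts) for c, ts in passed.items() if len(ts) >= min_targets}

scores_by_target = {
    "MDM2": {"cpd1": 2.0, "cpd2": 0.5, "cpd3": 1.2},
    "MDMX": {"cpd1": 1.8, "cpd2": 1.9, "cpd3": 0.4},
}
cutoffs = {"MDM2": 1.0, "MDMX": 1.5}  # hypothetical per-target cutoffs
print(flag_cross_reactive(scores_by_target, cutoffs))  # {'cpd1': ['MDM2', 'MDMX']}
```

Per-target cutoffs are used because D_score is a ranking function within a target and is not calibrated to be comparable across targets.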
Every scoring function embeds assumptions about which energetic terms matter most for the problem it was designed for. Those assumptions are visible and described here because a scoring function you can interrogate is more useful to a medicinal chemistry team than one that produces numbers without mechanistic grounding. The disruption score is a prioritization tool built with explicit mechanistic intent — it is designed to rank compounds by hot-spot engagement, not to simulate binding thermodynamics.