Virtual Screening vs. PPI-Focused Docking: Why Standard Libraries Fall Short

The Problem with Applying Standard Pipelines to Non-Standard Targets

Virtual screening as practiced in most discovery organizations was designed and optimized around a specific target geometry: the enzyme active site. SMILES-formatted compound libraries are filtered by Lipinski-derived rules, docked into receptor structures using grid-based potentials, scored by empirical or physics-based functions calibrated to enzyme-ligand datasets, and ranked. The workflow is fast, reproducible, and reasonably effective for enzyme targets with well-defined binding cavities.

When discovery teams apply this workflow to PPI targets without modification — and this happens more often than practitioners typically acknowledge publicly — the results are systematically worse than what the enrichment factors on enzyme targets would predict. The failure is not random; it is structured, and understanding the source of each failure mode is essential for designing a replacement workflow that is actually fit for PPI interface screening.

There are three distinct points at which the standard virtual screening pipeline breaks down for PPI targets: library design, docking geometry, and scoring function calibration. Each requires a separate fix, and addressing only one without the others produces partial improvement at best.

Library Design: The Wrong Chemical Space

Standard lead-like and drug-like virtual screening libraries are composed primarily of compounds that satisfy Lipinski's Rule of Five: molecular weight ≤500 Da, cLogP ≤5, no more than 5 hydrogen bond donors, and no more than 10 hydrogen bond acceptors. These rules were derived from analysis of orally bioavailable drugs that were predominantly enzyme inhibitors; they describe the chemical space that works for enzyme pockets.
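The filter described above can be sketched in a few lines. This is a minimal illustration, assuming the four descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` class, the function name, and the two example compounds are hypothetical and not taken from any specific pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    mw: float     # molecular weight in Da
    clogp: float  # calculated logP
    hbd: int      # hydrogen bond donor count
    hba: int      # hydrogen bond acceptor count


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """Return True if the compound has at most `max_violations` Ro5 violations."""
    violations = sum([d.mw > 500.0, d.clogp > 5.0, d.hbd > 5, d.hba > 10])
    return violations <= max_violations


# Illustrative values only: a compact kinase-inhibitor-like compound and a
# larger, more lipophilic compound in the range the text associates with PPIs.
kinase_like = Descriptors(mw=420.0, clogp=3.2, hbd=2, hba=6)
ppi_like = Descriptors(mw=610.0, clogp=5.6, hbd=3, hba=8)

print(passes_rule_of_five(kinase_like))  # True
print(passes_rule_of_five(ppi_like))     # False: MW and cLogP both exceed the limits
```

A strict filter of this kind removes the second compound before docking ever starts, which is the library-design failure the rest of this section describes.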

PPI disruptors occupy a different region of chemical space. Successful PPI-targeting small molecules tend to be larger (MW 450–700 Da), more lipophilic (cLogP 3–6), and more conformationally extended than typical kinase inhibitors. They need the molecular surface area to make contact across a broader, shallower binding surface, and they need the hydrophobicity to displace water from the hot-spot sub-pocket. The fragment-based lead discovery community recognized this earlier than the HTS community — fragments that progress against PPI targets tend to be more lipophilic and structurally more three-dimensional than typical enzyme-targeting fragments.

A standard commercial SMILES library of 1–5 million compounds screened against a PPI target will contain a systematically impoverished representation of the chemical space most likely to produce hits. The three-dimensional shape diversity (measured by principal moments of inertia, PMI, or SFP fingerprint distributions) in typical libraries is biased toward flat, aromatic compound populations that match enzyme-pocket geometry but cover poorly the extended shapes that characterize PPI-interacting compounds. Using such a library as the input for PPI screening is analogous to searching for a large molecular key in a catalog of small keys: even a perfect scoring function cannot compensate for the absence of the right molecular scaffolds in the input.
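One way to quantify the shape bias mentioned above is to place each compound on the normalized PMI triangle, whose vertices are rod (0, 1), disc (0.5, 0.5), and sphere (1, 1). The sketch below assumes the three principal moments of inertia have already been computed for a representative conformer; the function names and the example moments are illustrative assumptions.

```python
import math
from collections import Counter

# Vertices of the normalized PMI triangle as (I1/I3, I2/I3).
PMI_VERTICES = {"rod": (0.0, 1.0), "disc": (0.5, 0.5), "sphere": (1.0, 1.0)}


def normalized_pmi(i1: float, i2: float, i3: float) -> tuple:
    """Return (NPR1, NPR2) = (smallest/largest, middle/largest) moment."""
    small, mid, large = sorted((i1, i2, i3))
    if large <= 0:
        raise ValueError("principal moments must be positive")
    return small / large, mid / large


def nearest_shape(npr1: float, npr2: float) -> str:
    """Label a compound by the closest vertex of the PMI triangle."""
    return min(PMI_VERTICES,
               key=lambda name: math.dist((npr1, npr2), PMI_VERTICES[name]))


def shape_fractions(moments: list) -> dict:
    """Fraction of a library falling nearest each vertex."""
    counts = Counter(nearest_shape(*normalized_pmi(*m)) for m in moments)
    return {name: counts[name] / len(moments) for name in PMI_VERTICES}


# Hypothetical moments: two flat aromatics, one extended, one globular.
library = [(5.0, 5.0, 10.0), (4.8, 5.2, 10.0), (1.0, 10.0, 10.0), (9.0, 10.0, 10.0)]
print(shape_fractions(library))
```

Comparing these fractions between a commercial library and a set of known PPI disruptors gives a quick, if coarse, check on whether the library covers the relevant shapes.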

A PPI-biased screening library needs four things: a higher average MW than drug-like libraries, a broader cLogP distribution extending to higher values, enrichment for compounds with three-dimensional shape (PMI plots shifted toward the sphere and rod vertices rather than the disc vertex), and pharmacophore features consistent with hot-spot sub-pocket occupation. Fragment libraries designed for PPI targets additionally need to favor fragments of MW 200–350 Da with sufficient hydrophobicity to engage hydrophobic PPI sub-pockets, a different design criterion from the typical fragment-library specification of MW <300 Da and cLogP <3.
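The property windows discussed in this section can be expressed as simple range filters. In this sketch the PPI lead window reuses the MW 450–700 Da and cLogP 3–6 ranges quoted earlier, and the fragment MW range comes from the paragraph above; the cLogP bounds on the PPI fragment window are placeholders chosen for illustration, not values from the text, and all names are hypothetical.

```python
from typing import NamedTuple


class Window(NamedTuple):
    mw_min: float
    mw_max: float
    clogp_min: float
    clogp_max: float


PPI_LEAD = Window(450.0, 700.0, 3.0, 6.0)
PPI_FRAGMENT = Window(200.0, 350.0, 1.0, 4.0)              # cLogP bounds are assumed
STANDARD_FRAGMENT = Window(0.0, 300.0, float("-inf"), 3.0)  # MW/cLogP spec from the text


def in_window(mw: float, clogp: float, w: Window) -> bool:
    """True if both properties fall inside the window (bounds inclusive)."""
    return w.mw_min <= mw <= w.mw_max and w.clogp_min <= clogp <= w.clogp_max


def select(library: list, w: Window) -> list:
    """Return the names of compounds whose (mw, clogp) fall inside `w`."""
    return [name for name, mw, clogp in library if in_window(mw, clogp, w)]


# Hypothetical compounds as (name, MW, cLogP).
library = [("frag_a", 180.0, 1.0), ("frag_b", 320.0, 3.5), ("lead_c", 560.0, 4.8)]
print(select(library, STANDARD_FRAGMENT))  # ['frag_a']
print(select(library, PPI_FRAGMENT))       # ['frag_b']
print(select(library, PPI_LEAD))           # ['lead_c']
```

Shape and pharmacophore criteria would be layered on top of these windows; they are omitted here because they need 3D structures.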

Docking Geometry: Pocket Definition Failures

Standard docking protocols require defining a binding box or grid that encompasses the binding site. For enzyme pockets, this is straightforward — the pocket is concave, clearly delimited, and the docking grid can be placed inside the cavity with reasonable confidence that productive binding geometries will be sampled.

For PPI interfaces, defining the binding site is non-trivial. The interface is typically a large, relatively flat surface. Without explicit identification of the hot-spot sub-pocket, the docking box ends up covering the entire interface region, which gives the docking algorithm a very large search space in which to place compounds over any part of the interface surface. As a result, the top-scoring docked poses are not necessarily those that engage the hot-spot positions. They are those that optimize whatever the scoring function rewards, which, without PPI-specific calibration, means low-energy contact with non-hot-spot interface regions.

PPI-focused docking protocols constrain the binding box to the hot-spot sub-pocket region identified through computational alanine scanning. This is a qualitative change in the docking problem: instead of asking "where does this compound prefer to sit on this interface surface," the protocol asks "how well does this compound fit into this specific sub-pocket that has been identified as functionally critical." The docking search space is smaller, the pose discrimination is more accurate, and the resulting ranks are more meaningful.
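The effect of the constraint on search volume can be illustrated by deriving the box directly from atom coordinates. The sketch below assumes the hot-spot sub-pocket atoms have already been selected (for example from computational alanine scanning); the coordinates, the 4 Å padding, and the function names are illustrative assumptions. The resulting center and edge lengths are the kind of values a grid-based docking program takes as its search box.

```python
def box_from_atoms(coords: list, padding: float = 4.0) -> tuple:
    """Axis-aligned box (center, size) enclosing `coords` plus padding in Å."""
    if not coords:
        raise ValueError("need at least one atom coordinate")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2.0 * padding for lo, hi in zip(mins, maxs))
    return center, size


def box_volume(size: tuple) -> float:
    """Volume of the box in cubic Å."""
    return size[0] * size[1] * size[2]


# Hypothetical coordinates (Å): atoms lining the hot-spot sub-pocket, and the
# same atoms plus two peripheral interface atoms.
hotspot_atoms = [(10.0, 10.0, 10.0), (14.0, 12.0, 11.0), (12.0, 16.0, 13.0)]
interface_atoms = hotspot_atoms + [(30.0, 5.0, 8.0), (-2.0, 25.0, 20.0)]

hs_center, hs_size = box_from_atoms(hotspot_atoms)
if_center, if_size = box_from_atoms(interface_atoms)
print(hs_center, hs_size, box_volume(hs_size))
print(box_volume(if_size) / box_volume(hs_size))  # ratio of search volumes
```

With these made-up coordinates the interface-wide box is roughly twelve times larger by volume, which is the search-space reduction the constrained protocol relies on.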

A practical test of this difference: consider screening a compound set against MDM2 using an interface-wide docking box versus a hot-spot-constrained box focused on the sub-pocket that accommodates the p53 hot-spot residues F19 and W23. With the interface-wide box, compounds distribute across the entire MDM2 binding surface, and the top-scored compounds are disproportionately those that engage non-hot-spot peripheral regions where the surface topography happens to be more receptive to standard docking geometries. With the hot-spot-constrained box, the same compounds are ranked by their ability to occupy the specific sub-pocket that matters for p53 displacement. The enrichment of known MDM2 disruptors at the top of the ranked list improves substantially with the constrained approach, not because the scoring function changed, but because the binding site definition changed.
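Enrichment of known disruptors at the top of a ranked list is usually reported as an enrichment factor: the hit rate in the top fraction divided by the hit rate in the whole library. The sketch below is a minimal version of that calculation; the compound identifiers and the two rankings are synthetic and do not represent real MDM2 screening results.

```python
def enrichment_factor(ranked_ids: list, actives: set, top_fraction: float = 0.01) -> float:
    """EF = (actives in top fraction / size of top fraction) / (actives / library size)."""
    n_total = len(ranked_ids)
    if n_total == 0:
        raise ValueError("ranked list is empty")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_actives = sum(1 for cid in ranked_ids if cid in actives)
    if n_actives == 0:
        raise ValueError("no known actives in the ranked list")
    n_top = max(1, round(n_total * top_fraction))
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (hits_top / n_top) / (n_actives / n_total)


# Synthetic example: 100 compounds, 4 known actives, best score first.
actives = {"c0", "c1", "c50", "c99"}
constrained_rank = [f"c{i}" for i in range(100)]            # two actives in the top 10
interface_rank = [f"c{i}" for i in range(99, -1, -1)]       # one active in the top 10

print(enrichment_factor(constrained_rank, actives, top_fraction=0.10))
print(enrichment_factor(interface_rank, actives, top_fraction=0.10))
```

An enrichment factor of 1 corresponds to random ranking, so the comparison between box definitions only means something when the same library and the same active set are used for both runs.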

Scoring Functions: Calibration Mismatch

The third failure mode is scoring. Standard docking scores were parameterized on enzyme-ligand complexes and reward terms that are predictive for enzyme binding but inconsistently predictive for PPI binding. The hydrophobic burial term, one of the strongest contributors to standard scoring function performance on enzyme targets, behaves differently on PPI sub-pocket geometries because the burial is partial rather than complete. The shape complementarity term, which rewards snug fit into a concave cavity, generates noise rather than signal on relatively flat surfaces.

This does not mean standard scoring functions are wrong; they are correctly calibrated for the data they were trained on. The problem is applying a scoring function outside its calibration domain. The remedy is not to create a novel scoring function from scratch, but to recalibrate the weights of the existing energetic terms against PPI-specific training data. This requires experimental reference data for PPI interfaces, specifically displacement IC₅₀ values or binding affinities for known PPI disruptors. That dataset is smaller than the enzyme-ligand data, but it is sufficient when the recalibration is restricted to tuning a small number of weights for a single target class.
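Recalibrating weights rather than building a new function amounts to a small regression problem: keep the per-compound term values and refit the coefficients against measured potency. The sketch below fits weights by ordinary least squares using only the standard library. The two term columns (labeled here as hydrophobic burial and shape complementarity), the synthetic pIC₅₀ targets, and the absence of an intercept or regularization are all simplifying assumptions for illustration.

```python
def solve_linear(a: list, b: list) -> list:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: terms are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_weights(terms: list, targets: list) -> list:
    """Least-squares weights via the normal equations (X^T X) w = X^T y."""
    k = len(terms[0])
    xtx = [[sum(row[i] * row[j] for row in terms) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(terms, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic data: columns are (hydrophobic burial, shape complementarity);
# targets were generated as 2.0 * burial + 0.5 * shape.
terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
pic50 = [2.0, 0.5, 2.5, 4.5]
weights = fit_weights(terms, pic50)
print(weights)  # recovers approximately [2.0, 0.5]
```

With real PPI data the reference set is small, so the number of fitted weights should stay well below the number of compounds, and held-out compounds are needed to check that the refit generalizes.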

One specific term that requires particular attention is desolvation. Displacing water from a PPI hot-spot sub-pocket contributes favorably to binding free energy in a way that standard solvation models, calibrated on enclosed enzyme pockets, often underestimate for partially exposed PPI surface geometries. The Poisson-Boltzmann and generalized-Born solvation models used in many docking and rescoring pipelines treat the solvent as a continuum, an approximation that is poorly suited to the highly localized, partially exposed hot-spot sub-pocket geometries typical of PPI targets.

Fragment-Based Approaches as a Complement, Not a Replacement

Fragment-based lead discovery (FBLD) has a better track record at PPI targets than HTS or standard virtual screening, for reasons that are mechanistically related to the failure modes described above. Fragment screens implicitly overcome the library design problem by sampling smaller, more diverse molecular scaffolds that can explore the accessible PPI sub-pocket geometry without being constrained by the size and rigidity of lead-like compounds. Fragment binding affinities at PPI hot spots tend to be weak (Kd in the 1–10 mM range) but detectable by biophysical assay, and the fragment binding mode provides direct structural information for lead elaboration.
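To put the millimolar affinities quoted above on an energy scale, the standard relation ΔG = RT ln(Kd) can be applied, with Kd expressed relative to a 1 M standard state. The sketch below uses that relation; the 298.15 K temperature and the function name are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


for kd in (1e-3, 1e-2):  # the 1 mM and 10 mM ends of the range in the text
    print(f"Kd = {kd:g} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
# Kd = 0.001 M -> dG = -4.09 kcal/mol
# Kd = 0.01 M -> dG = -2.73 kcal/mol
```

A few kcal/mol is small in absolute terms, but for a molecule of fragment size it can still indicate a well-placed contact with the hot spot, which is why the binding mode is worth elaborating.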

The limitation of experimental fragment approaches at PPI targets is throughput: biophysical fragment screening against challenging targets requires significant resource investment, and the fragment-to-lead optimization phase is slow without a reliable computational model of the binding site. Computational fragment docking, constrained to the characterized hot-spot sub-pocket with PPI-calibrated scoring, can extend the fragment approach to larger virtual libraries and accelerate the initial prioritization step before committing to experimental synthesis.

The optimal workflow for most PPI targets combines computational pre-screening of a PPI-biased virtual library, which generates a prioritized set for synthesis, with biophysical characterization of a smaller, focused experimental compound set. This is a fundamentally different resource allocation from standard HTS: the computational screening does more of the discriminating work upfront, the experimental synthesis investment is smaller and better justified, and the resulting hit-to-lead progression is grounded in a coherent structural model of the interface.