SAR Without Synthesis: Building Structure-Activity Relationships Computationally for PPI Targets

The Synthesis Bottleneck in PPI Lead Optimization

Structure-activity relationships in traditional drug discovery are built empirically: synthesize a compound, test it, note the activity change relative to the parent, synthesize the next analog. This cycle is the backbone of lead optimization, and it works, but it is resource-intensive. For enzyme targets, the cycle has been accelerated by structure-based design (using the crystal structure to predict which substitutions will improve binding before anything is synthesized) and by QSAR modeling (using activity data from a training set to predict the activity of untested analogs). Neither accelerator works as well for PPI targets, and the reasons matter for how in-silico SAR should be approached.

Structure-based design for PPI targets is hindered by the induced-fit and ensemble issues described elsewhere in this series: the binding site is dynamic, scoring functions are less precise than for enzyme pockets, and determining the experimental structure of a small molecule bound to a PPI interface is harder than crystallizing an enzyme-inhibitor complex (PPI interfaces are less conformationally stable, and ligands bound to them often fail to yield crystals of sufficient quality). Classical QSAR modeling is limited by data availability: PPI activity datasets tend to be smaller than enzyme-inhibitor datasets, which reduces the statistical power of regression-based SAR models.

Three in-silico SAR approaches address this by asking different questions: matched molecular pair (MMP) analysis, Free-Wilson decomposition, and scaffold hopping based on 2D and 3D similarity. Instead of predicting absolute activity from structure, they characterize the activity landscape around a confirmed hit: which structural changes increase, decrease, or preserve activity? Which chemical features are obligatory for hot-spot engagement? Where is the SAR flexible enough to allow modification for ADMET improvement without losing interface-disruption activity?

Matched Molecular Pair Analysis for PPI Series

Matched molecular pair analysis (MMPA) identifies pairs of compounds that differ by a single well-defined structural transformation (for example, a chloro-to-fluoro substitution, a methyl-to-ethyl elaboration, or an aromatic CH-to-N swap of the kind made in a nitrogen scan) and measures the average activity change associated with that transformation across all such pairs in a dataset. A transformation whose effect is consistent in sign and magnitude across many pairs can be treated as a transferable SAR rule. The analysis was originally developed for ADMET optimization (identifying transformations that consistently improve solubility or metabolic stability), but it applies equally to potency SAR when the dataset is large enough.
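The aggregation step at the heart of MMPA can be sketched in a few lines of Python. This is a minimal sketch, assuming each compound has already been fragmented into a shared core and a single variable substituent (fragmentation itself is normally done with a cheminformatics toolkit and is not shown). The core labels, substituents, and pActivity values are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from statistics import mean


def mmp_transformation_stats(records, min_pairs=2):
    """Aggregate activity changes per transformation across matched pairs.

    records: iterable of (core_id, substituent, p_activity) tuples. Two
    compounds form a matched pair if they share core_id and differ only in
    substituent. Returns {"A>>B": (mean_delta, n_pairs)} for transformations
    observed in at least min_pairs pairs.
    """
    by_core = defaultdict(dict)
    for core_id, substituent, p_activity in records:
        by_core[core_id][substituent] = p_activity

    deltas = defaultdict(list)
    for substituents in by_core.values():
        # Every ordered pair of substituents on the same core is one matched pair.
        for r_from, r_to in permutations(substituents, 2):
            deltas[f"{r_from}>>{r_to}"].append(substituents[r_to] - substituents[r_from])

    return {t: (mean(d), len(d)) for t, d in deltas.items() if len(d) >= min_pairs}


# Hypothetical pIC50 data; core_C has only one compound, so it forms no pair.
records = [
    ("core_A", "Cl", 6.5), ("core_A", "F", 6.0),
    ("core_B", "Cl", 5.8), ("core_B", "F", 5.4),
    ("core_C", "Cl", 7.0),
]
stats = mmp_transformation_stats(records)
print(stats["Cl>>F"])  # mean delta of about -0.45 over 2 pairs
```

With only 20–100 compounds in a series, few transformations clear even a modest `min_pairs` cutoff, which is the data-availability constraint discussed below.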

For PPI targets, MMPA faces a data-availability constraint: most early PPI series have 20–100 confirmed compounds, too few for any given transformation to recur in enough pairs for classical MMPA statistics to be reliable. The approach we adapt to this scale is prospective MMP enumeration: for a confirmed PPI hit scaffold, computationally enumerate all single-transformation analogs within a defined transformation library (a set of medicinal-chemistry-appropriate transformations verified to be synthetically feasible in 1–3 steps from the parent), then predict the structural consequence of each transformation for hot-spot engagement using docking and disruption scoring.
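The enumeration step can be illustrated with a minimal sketch. It assumes the scaffold is represented abstractly as named attachment positions, each carrying its current substituent, and that the transformation library is a hand-curated mapping from a substituent to its synthetically feasible replacements. All position names, substituent labels, and library entries are hypothetical placeholders; a production workflow would operate on real structures with a cheminformatics toolkit.

```python
def enumerate_single_transformations(parent, library):
    """Return every analog that differs from the parent at exactly one position.

    parent: {position_name: substituent}
    library: {substituent: [replacement, ...]}, assumed pre-filtered to
             transformations feasible in 1-3 steps from the parent.
    """
    analogs = []
    for position, current in parent.items():
        for replacement in library.get(current, []):
            analog = dict(parent)  # copy, then change a single position
            analog[position] = replacement
            analogs.append((position, f"{current}>>{replacement}", analog))
    return analogs


# Hypothetical scaffold and transformation library (labels only, not real SMILES).
parent = {
    "F19_contact": "6-Cl-indole",
    "W23_contact": "cyclopentyl",
    "linker": "amide",  # no library entry, so this position is left unchanged
    "periphery": "H",
}
library = {
    "6-Cl-indole": ["5-Cl-indole", "6-F-indole", "6-Br-indole"],
    "cyclopentyl": ["cyclobutyl", "cyclohexyl"],
    "H": ["CH2-morpholine", "CH2-N-methylpiperazine"],
}
for position, transformation, _ in enumerate_single_transformations(parent, library):
    print(position, transformation)
```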

This gives a prospective SAR map: for each position on the scaffold, which transformations are predicted to maintain or improve hot-spot contact, and which are predicted to displace functional groups from the binding pharmacophore. The map is a hypothesis, not a measurement; its errors will track the limitations of the scoring function and the accuracy of the binding-mode prediction. Even so, a hypothetical SAR map generated computationally in 2 days that directs synthesis toward the 20 most promising of 200 enumerated analogs is more resource-efficient than synthesizing all 200 and measuring the SAR empirically.
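Turning per-analog predictions into the prospective SAR map and a synthesis shortlist is mostly bookkeeping. The sketch below assumes a scoring convention in which a higher predicted disruption score means stronger predicted hot-spot engagement, and uses a hypothetical tolerance (in score units) to separate tolerated from disfavored transformations. The threshold and the example scores are illustrative only.

```python
from collections import defaultdict


def prospective_sar_map(predictions, parent_score, tolerance=0.5, top_n=20):
    """Group predicted scores by scaffold position and pick a synthesis shortlist.

    predictions: list of (position, transformation, predicted_score)
    A transformation is "tolerated" if it loses no more than `tolerance`
    relative to the parent's predicted score, otherwise "disfavored".
    """
    sar_map = defaultdict(lambda: {"tolerated": [], "disfavored": []})
    for position, transformation, score in predictions:
        delta = score - parent_score
        bucket = "tolerated" if delta >= -tolerance else "disfavored"
        sar_map[position][bucket].append((transformation, delta))
    # Shortlist: highest predicted scores first, regardless of position.
    shortlist = sorted(predictions, key=lambda p: p[2], reverse=True)[:top_n]
    return dict(sar_map), shortlist


predictions = [  # hypothetical scores; the parent scores 7.0
    ("W23_contact", "cyclopentyl>>cyclobutyl", 6.8),
    ("W23_contact", "cyclopentyl>>cyclohexyl", 5.9),
    ("F19_contact", "6-Cl-indole>>6-Br-indole", 7.3),
    ("periphery", "H>>CH2-morpholine", 6.9),
]
sar_map, shortlist = prospective_sar_map(predictions, parent_score=7.0, top_n=2)
print(sar_map["W23_contact"]["disfavored"])  # cyclohexyl loses more than 0.5
print([t for _, t, _ in shortlist])
```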

An Example Application

Consider an MDM2-p53 disruptor hit with a confirmed binding mode in which the F19-contact position is occupied by a chloroindole moiety and the W23-contact position by a cyclopentyl group. Prospective MMP enumeration for this scaffold generates analogs that vary in four respects: the chloro substitution pattern on the indole ring, the ring size of the W23-contact cycloalkyl, the linker geometry between the two contact fragments, and the nature of the solubilizing group attached to the scaffold periphery.

Computational scoring of these analogs against the MDM2 hot-spot pharmacophore identifies a subset in which the W23-contact cyclopentyl can be replaced by a cyclobutyl without significant loss of predicted hot-spot contact (the sub-pocket accommodates the smaller ring without losing its key contacts), and in which the chloroindole F19 contact tolerates several ring-substitution variants. Solubilizing-group variants that maintain an acceptable disruption score while improving predicted aqueous solubility (by adding a polar ionizable group at a scaffold position predicted to be solvent-exposed) are flagged as high-priority synthesis candidates. The synthesis list shrinks from 200 enumerated analogs to approximately 25 high-priority compounds, each with an explicit structural rationale.
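The selection rule for the solubilizing-group variants combines three conditions, and writing it out makes the rationale for each pick explicit. This is a hypothetical sketch: the field names, the minimum disruption score, and the minimum predicted gain in log solubility are assumptions chosen for illustration, not values from the workflow described above.

```python
from dataclasses import dataclass


@dataclass
class Analog:
    name: str
    disruption_score: float     # assumed convention: higher = stronger predicted disruption
    delta_log_s: float          # predicted change in log aqueous solubility vs. parent
    site_solvent_exposed: bool  # modified position is solvent-exposed in the docked pose


def flag_solubility_candidates(analogs, min_score, min_delta_log_s=0.5):
    """Return (name, rationale) for analogs meeting all three criteria."""
    flagged = []
    for a in analogs:
        if (a.disruption_score >= min_score
                and a.delta_log_s >= min_delta_log_s
                and a.site_solvent_exposed):
            rationale = (f"score {a.disruption_score:.1f} >= {min_score:.1f}; "
                         f"predicted logS +{a.delta_log_s:.1f}; modified site solvent-exposed")
            flagged.append((a.name, rationale))
    return flagged


candidates = [  # hypothetical analogs
    Analog("periphery-morpholine", 6.9, 0.8, True),
    Analog("periphery-carboxylate", 6.2, 1.4, True),    # disruption score too low
    Analog("W23-hydroxycyclopentyl", 6.8, 0.6, False),  # modified site is buried
]
for name, why in flag_solubility_candidates(candidates, min_score=6.5):
    print(name, "->", why)
```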

Free-Wilson Decomposition for PPI Series

Free-Wilson analysis is a mathematical decomposition that partitions the observed activity of a compound into additive contributions from each substituent position. The Free-Wilson model assumes that substituent effects are additive (i.e., the activity contribution of substituent A at position R1 is independent of which substituent is at R2). This assumption is imperfect — there are well-documented cases of cooperativity and anti-cooperativity between substituent positions — but it is useful as a first-approximation framework for understanding which positions dominate SAR and which positions show the greatest activity sensitivity per structural change.
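In matrix form, the additive model is activity = μ + Σ a_ij X_ij, where X_ij is 1 if substituent j is present at position i and a_ij is its contribution. The sketch below fits it by ordinary least squares with NumPy. It uses reference coding (one substituent per position fixed at zero, in the spirit of the Fujita-Ban variant) rather than the symmetry constraints of the original Free-Wilson formulation; both give the same fitted activities, with coefficients expressed on different baselines. The data are synthetic and exactly additive, so the fit recovers the contributions used to generate them.

```python
import numpy as np


def free_wilson_fit(compounds, activities, reference):
    """Fit an additive substituent model by least squares.

    compounds: list of {position: substituent}
    reference: {position: substituent whose contribution is fixed at 0}
    Returns (intercept, {(position, substituent): contribution}).
    """
    # One indicator column per non-reference (position, substituent) pair.
    columns = sorted({(pos, sub) for c in compounds for pos, sub in c.items()
                      if sub != reference[pos]})
    X = np.zeros((len(compounds), len(columns) + 1))
    X[:, 0] = 1.0  # intercept = predicted activity of the all-reference compound
    for i, compound in enumerate(compounds):
        for j, (pos, sub) in enumerate(columns, start=1):
            X[i, j] = 1.0 if compound.get(pos) == sub else 0.0
    y = np.asarray(activities, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), {col: float(v) for col, v in zip(columns, coef[1:])}


# Synthetic, exactly additive data: base 5.0, Cl +1.0, F +0.5, Et -0.3.
true_contrib = {("R1", "Cl"): 1.0, ("R1", "F"): 0.5, ("R2", "Et"): -0.3}
compounds, activities = [], []
for r1 in ("H", "Cl", "F"):
    for r2 in ("Me", "Et"):
        compounds.append({"R1": r1, "R2": r2})
        activities.append(5.0 + true_contrib.get(("R1", r1), 0.0)
                          + true_contrib.get(("R2", r2), 0.0))

intercept, contrib = free_wilson_fit(compounds, activities, {"R1": "H", "R2": "Me"})
print(round(intercept, 3), {k: round(v, 3) for k, v in contrib.items()})
```

With real series, check the rank of the design matrix: a substituent that only ever appears alongside one specific partner cannot be separated from it, and `lstsq` will silently return the minimum-norm solution among many equally good fits.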

For computational Free-Wilson, the observed activity values are replaced by predicted disruption scores or predicted ΔΔG values from our computational models. The resulting Free-Wilson coefficients identify which positions in the scaffold have the largest predicted effect on PPI disruption activity — in other words, where the computational SAR is most sensitive. These are the positions that should be prioritized for experimental synthesis and measurement, because they have the highest information content for refining the SAR model.
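Once Free-Wilson coefficients have been fitted to predicted scores, ranking positions by sensitivity is a short calculation. This sketch assumes reference coding (the reference substituent at each position contributes 0) and uses the spread of contributions at a position, maximum minus minimum, as the sensitivity measure; that is one reasonable choice among several (variance or mean absolute contribution would also work). The coefficients are hypothetical.

```python
def position_sensitivity(contributions):
    """Rank scaffold positions by the spread of their fitted contributions.

    contributions: {(position, substituent): coefficient}, with each
    position's reference substituent implicitly contributing 0.0.
    """
    per_position = {}
    for (position, _substituent), value in contributions.items():
        per_position.setdefault(position, [0.0]).append(value)  # 0.0 = reference
    spread = {pos: max(vals) - min(vals) for pos, vals in per_position.items()}
    return sorted(spread.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical coefficients fitted to predicted disruption scores.
coefficients = {
    ("R1", "Cl"): 1.0, ("R1", "F"): 0.5,
    ("R2", "Et"): -0.3,
    ("R3", "OMe"): 0.1, ("R3", "CN"): -0.6,
}
for position, spread in position_sensitivity(coefficients):
    print(position, round(spread, 2))  # R1 1.0, then R3 0.7, then R2 0.3
```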

The limitation of Free-Wilson at PPI targets is the additivity assumption. PPI hot-spot contacts are often geometrically coupled: the W23-contact and F19-contact groups in an MDM2 scaffold occupy physically adjacent regions of the binding cleft, and a substitution at one position can change the accessible geometry at the other. To detect coupling, we calculate, for each substitution at one position, the variance of its predicted effect on the disruption score across the substituents present at the adjacent position; under strict additivity this variance would be zero. For positions with high variance, we flag the Free-Wilson coefficients as lower confidence and recommend pairwise synthesis to characterize the interaction term directly.
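The coupling check can be sketched for one pair of positions. Assuming predicted scores are available for a full grid of substituent combinations at two adjacent positions, the effect of each substitution at the first position is computed against every background at the second; if the positions were truly additive, those effects would be identical. The grid values and the flagging threshold are hypothetical.

```python
from statistics import pvariance


def coupling_variance(scores, ref_a, backgrounds_b):
    """Variance of each position-A substitution effect across position-B backgrounds.

    scores: {(sub_a, sub_b): predicted score} over a full grid
    ref_a: reference substituent at position A
    backgrounds_b: substituents at position B to test against
    """
    result = {}
    for sub_a in sorted({a for a, _ in scores}):
        if sub_a == ref_a:
            continue
        effects = [scores[(sub_a, b)] - scores[(ref_a, b)] for b in backgrounds_b]
        result[sub_a] = pvariance(effects)
    return result


grid = {  # hypothetical predicted scores (A = W23 contact, B = F19 contact)
    ("cyclopentyl", "6-Cl"): 7.0, ("cyclopentyl", "6-Br"): 7.3,
    ("cyclobutyl", "6-Cl"): 6.8, ("cyclobutyl", "6-Br"): 7.1,  # additive: same effect
    ("cyclohexyl", "6-Cl"): 6.5, ("cyclohexyl", "6-Br"): 5.3,  # coupled: effect changes
}
variances = coupling_variance(grid, "cyclopentyl", ["6-Cl", "6-Br"])
flagged = [sub for sub, v in variances.items() if v > 0.05]  # hypothetical threshold
print(flagged)  # ['cyclohexyl']
```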

Scaffold Hopping: Extending Computational SAR Beyond One Chemotype

Matched molecular pair analysis and Free-Wilson decomposition operate within a single scaffold series. A complementary computational SAR approach is scaffold hopping: using the pharmacophore and binding mode information from a confirmed hit series to identify structurally distinct compounds that reproduce the key hot-spot contacts with a different molecular framework.

For PPI targets, scaffold hopping is particularly valuable because PPI disruptor hits often have ADMET liabilities tied to specific scaffold-level features (a piperidinyl group contributing hERG risk, an indole causing autofluorescence interference in cellular assays, a linker conferring metabolic lability). Switching scaffolds while preserving the pharmacophore features offers a path to an improved ADMET profile without group-by-group optimization of the problematic scaffold.

Computational scaffold hopping for PPI targets uses the characterized hot-spot pharmacophore as the query for searching alternative scaffolds. 3D pharmacophore searching of virtual libraries, fragment-based scaffold growth from confirmed hot-spot contacts, and shape-based similarity searching against the docked pose of the confirmed hit are distinct but complementary ways to identify scaffold alternatives. The output is a set of structurally diverse scaffold options, each with a predicted binding mode that satisfies the pharmacophore, suitable for synthesis and biophysical testing to validate whether the scaffold hop achieves the target ADMET improvement.
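The core of the pharmacophore-query step can be illustrated with a deliberately simplified fit score. This sketch assumes candidate poses are already aligned to the query frame and that each pharmacophore feature is a (type, xyz) point; real 3D pharmacophore tools also handle conformer generation, alignment, feature directionality, and excluded volumes. Feature types, coordinates, and the 1.5 Å matching radius are hypothetical.

```python
import math


def pharmacophore_fit(query, candidate, radius=1.5):
    """Fraction of query features matched by a same-type candidate feature.

    query, candidate: lists of (feature_type, (x, y, z)) in a shared frame.
    Each candidate feature may satisfy at most one query feature; matching is
    greedy, so it is not guaranteed optimal when features crowd together.
    """
    used = set()
    matched = 0
    for q_type, q_xyz in query:
        for idx, (c_type, c_xyz) in enumerate(candidate):
            if idx not in used and c_type == q_type and math.dist(q_xyz, c_xyz) <= radius:
                used.add(idx)
                matched += 1
                break
    return matched / len(query)


# Hypothetical three-point hot-spot query (e.g. F19, W23 and L26 sub-pockets).
query = [("hydrophobic", (0.0, 0.0, 0.0)),
         ("aromatic", (4.0, 0.0, 0.0)),
         ("hydrophobic", (2.0, 3.5, 0.0))]
scaffolds = {
    "scaffold_X": [("hydrophobic", (0.3, 0.2, 0.0)), ("aromatic", (4.1, -0.4, 0.2)),
                   ("hydrophobic", (2.2, 3.1, 0.1))],
    "scaffold_Y": [("hydrophobic", (0.1, 0.0, 0.0)), ("aromatic", (6.5, 0.0, 0.0))],
}
hits = {name: pharmacophore_fit(query, feats) for name, feats in scaffolds.items()}
print(hits)  # scaffold_X matches all three features, scaffold_Y only one
```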

What In-Silico SAR Cannot Replace

We are not claiming that in-silico SAR eliminates the need for experimental SAR; it accelerates and focuses it. Computational predictions are hypotheses, and the direction of SAR exploration they recommend needs experimental validation. The value lies in reducing the number of compounds synthesized before the SAR map is clear enough to direct medicinal chemistry decisions with confidence.

The approach works best in practice as a feedback loop: early experimental data from a small, computationally prioritized compound set is used to recalibrate the computational SAR model. If the observed activities of the first 20 synthesized analogs correlate well with their predicted disruption scores (r² > 0.55 is a realistic threshold for early SAR), the model is confirmed as useful, and subsequent rounds of synthesis selection can rely more heavily on computational prioritization. If the correlation is poor, that is itself informative: it typically indicates that the binding-mode prediction is incorrect or that an experimental artifact (aggregation, limited solubility) is confounding the activity data. Either way, the early experimental feedback determines how heavily to weight the computational SAR in subsequent cycles.
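The recalibration check can be written as a small decision function. This is a sketch, assuming predicted disruption scores and measured activities (e.g. pIC50) are oriented so that higher means more active. It uses the squared Pearson correlation with the r² > 0.55 threshold quoted above and additionally requires the correlation to be positive, since a strongly anti-correlated set would otherwise pass on r² alone. The data are invented.

```python
from statistics import mean


def r_squared_signed(predicted, observed):
    """Return (r_squared, is_positive) for the Pearson correlation of two lists."""
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        return 0.0, False  # correlation undefined for constant data
    return cov * cov / (var_p * var_o), cov > 0


def recalibration_decision(predicted, observed, threshold=0.55):
    r2, positive = r_squared_signed(predicted, observed)
    if positive and r2 > threshold:
        return r2, "rely more on computational prioritization next round"
    return r2, "check binding-mode hypothesis and assay artifacts (aggregation, solubility)"


predicted = [5.1, 5.8, 6.2, 6.9, 7.4]  # hypothetical disruption scores
observed = [5.0, 5.5, 6.4, 6.6, 7.5]   # hypothetical measured pIC50
r2, action = recalibration_decision(predicted, observed)
print(round(r2, 2), action)
```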