Compressing the Hit Identification Timeline: From 18 Months to Weeks

The Traditional Hit Identification Timeline for PPI Targets

To understand what computational pre-screening changes, it is necessary to be specific about what the traditional timeline looks like for a PPI hit identification campaign. The numbers are not hypothetical — they reflect the documented resource requirements of PPI programs run at mid-size biotechs and larger discovery organizations over the past decade.

A PPI target hit identification campaign using high-throughput screening (HTS) typically begins with assay development: adapting a biochemical or biophysical assay (usually a competition binding assay for PPI) to high-throughput plate format, optimizing signal-to-noise, establishing controls, and validating with known positive and negative compounds. For PPI targets, assay development often takes longer than for enzyme targets because competition displacement assays against protein-protein interactions require more careful optimization than enzyme substrate conversion assays. Assay development: 3–6 months.

Library acquisition and quality control for a 500K–1M compound HTS set, including vendor negotiation, QC by LC-MS, and reformatting into assay-ready plates, runs 2–3 months in parallel with assay development, but integration delays typically add 1–2 months in practice. The HTS campaign itself runs 4–8 weeks for a 500K compound library in 384-well format, generating hit rates of roughly 0.1–0.5% (PPI targets are typically at the lower end of this range). Hit confirmation, including deconvolution of compound aggregators, promiscuous binders, and fluorescence artifacts, runs another 6–10 weeks. First confirmed hits: 9–12 months from project initiation.

At this point, the confirmed hit set is typically 50–200 compounds with rough SAR information from the primary screen but no structural understanding of binding mode, no knowledge of which compounds engage the hot-spot positions, and no prioritization framework for synthesis investment. Lead series identification from this hit set — ADMET filtering, initial structural characterization, preliminary SAR, selection of 2–3 series for optimization — adds another 3–6 months. Total to lead series identification from project start: 12–18 months.
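The phase durations above can be added up as a quick sanity check. The following is a minimal sketch, assuming the phases run strictly one after another and that only the 1–2 month integration delay (not the full 2–3 months of library work) adds to the critical path; the phase labels and the 52/12 weeks-per-month conversion are illustrative choices, not part of the original analysis.

```python
# Phase durations as (low, high) ranges in months, taken from the text above.
# Weeks are converted with 52/12 weeks per month (an approximation).
WEEKS_PER_MONTH = 52 / 12


def weeks(low, high):
    """Convert a (low, high) range in weeks to a range in months."""
    return (low / WEEKS_PER_MONTH, high / WEEKS_PER_MONTH)


def total(phases):
    """Sum (low, high) ranges, assuming phases run strictly in sequence."""
    low = sum(p[0] for p in phases.values())
    high = sum(p[1] for p in phases.values())
    return (low, high)


to_confirmed_hits = {
    "assay development": (3, 6),
    "library integration delay": (1, 2),  # net added time only (assumption)
    "HTS campaign": weeks(4, 8),
    "hit confirmation": weeks(6, 10),
}
to_lead_series = {**to_confirmed_hits, "lead series identification": (3, 6)}

hits_range = total(to_confirmed_hits)
lead_range = total(to_lead_series)

print("To confirmed hits: %.1f-%.1f months" % hits_range)
print("To lead series:    %.1f-%.1f months" % lead_range)
```

Under these assumptions the serial sum comes to roughly 6–12 months to confirmed hits and 9–18 months to lead series. The upper bounds match the quoted figures; the quoted lower bounds (9 and 12 months) are more conservative than a best-case sum in which every phase finishes at its minimum.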

What the Computational Pre-Screening Workflow Changes

The computational approach does not simply replace a slow step with a fast one. It reorders the sequence of decisions so that the expensive steps are concentrated on a much smaller, better-justified compound set. The key insight is that computational screening is cheap and fast, while experimental synthesis and biological characterization are expensive and slow. A workflow that maximizes computational pre-filtering before committing to synthesis gets the economics right in a way that HTS-first workflows do not.

The Genolux workflow for a new PPI target starts with interface characterization: hot-spot mapping, pocket geometry extraction, and MD validation. For a target with good structural data (multiple PDB entries, known pharmacology), this runs 2–3 weeks. For a target with only predicted structural data (AlphaFold2 model, limited experimental validation), 4–6 weeks, incorporating the additional uncertainty characterization steps. Interface characterization output: hot-spot residue map, binding pocket pharmacophore, confidence-weighted disruption scoring parameters.

Virtual library screening against the characterized interface runs next. A PPI-biased library of 50,000–200,000 compounds (from our curated fragment and lead-like sets, or from a partner company's compound library in SMILES format) is docked against the hot-spot-constrained binding site and scored with the disruption score function. For a 100K compound library, this runs approximately 3–5 days on a standard compute cluster — sub-week turnaround. Output: a ranked list of ~500–1,000 compounds in the top percentile, with per-compound structural rationale and flagged selectivity concerns.
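The final ranking step can be illustrated in a few lines. This is a minimal sketch only: the `top_fraction` helper, the compound identifiers, and the toy scores are hypothetical, and it assumes a higher disruption score means a better compound. It does not represent the actual scoring function or pipeline.

```python
import heapq


def top_fraction(scores, fraction=0.01):
    """Return the best-scoring `fraction` of compounds, best first.

    `scores` maps a compound identifier to its disruption score. Higher is
    assumed to be better, a convention chosen for this illustration.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    keep = max(1, round(len(scores) * fraction))
    return heapq.nlargest(keep, scores.items(), key=lambda item: item[1])


# Toy data: 1,000 hypothetical compounds with made-up, deterministic scores.
toy_scores = {"CMPD-%04d" % i: (i * 37 % 1000) / 1000 for i in range(1000)}
shortlist = top_fraction(toy_scores, fraction=0.01)

print(len(shortlist), "compounds kept; best:", shortlist[0])
```

`heapq.nlargest` avoids sorting the whole library when only the top slice is needed, which matters more as the library grows toward the 200K end of the range.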

The prioritized computational hit set then goes through a triage step: ADMET prediction filtering (solubility, membrane permeability, CYP liability, hERG risk), synthetic accessibility scoring, and clustering to ensure chemical diversity in the final synthesis list. This takes 1–2 days computationally. The result is a synthesis recommendation list of 50–150 compounds, each with a computational rationale for why it merits synthesis — specific hot-spot contacts predicted, estimated disruption score, ADMET risk flags, and suggested synthetic analogs for initial SAR exploration.
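The triage logic described above (filter, then enforce diversity) can be sketched as follows. Everything here is an assumption made for illustration: the `Candidate` fields, the thresholds, and the use of plain Python sets as stand-ins for molecular fingerprints. The diversity step uses simple leader-style picking on Tanimoto similarity, which is one common option and not necessarily the clustering method used in the workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    compound_id: str
    score: float        # disruption score, higher assumed better
    sa_score: float     # synthetic accessibility, lower assumed easier
    admet_flags: set = field(default_factory=set)  # e.g. {"hERG", "CYP"}
    features: frozenset = frozenset()  # stand-in for fingerprint on-bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def triage(candidates, blocking_flags, max_sa=6.0, max_similarity=0.6, limit=150):
    """Drop flagged or hard-to-make compounds, then pick a diverse subset."""
    passed = [
        c for c in candidates
        if not (c.admet_flags & blocking_flags) and c.sa_score <= max_sa
    ]
    passed.sort(key=lambda c: c.score, reverse=True)

    picked = []
    for c in passed:
        if len(picked) >= limit:
            break
        # Leader-style picking: skip anything too similar to an earlier pick,
        # so each cluster is represented by its best-scoring member.
        if all(tanimoto(c.features, p.features) < max_similarity for p in picked):
            picked.append(c)
    return picked


pool = [
    Candidate("A", 0.95, 3.1, set(), frozenset({1, 2, 3, 4})),
    Candidate("B", 0.93, 3.0, set(), frozenset({1, 2, 3, 4, 5})),  # near-duplicate of A
    Candidate("C", 0.90, 2.8, {"hERG"}, frozenset({7, 8, 9})),     # blocked by ADMET flag
    Candidate("D", 0.85, 7.5, set(), frozenset({10, 11})),         # fails the SA cutoff
    Candidate("E", 0.80, 4.0, set(), frozenset({20, 21, 22})),
]
selected = triage(pool, blocking_flags={"hERG"})
print([c.compound_id for c in selected])
```

In a real pipeline the feature sets would come from a cheminformatics toolkit and the cutoffs would be tuned per target; the point of the sketch is only the order of operations, with hard filters applied before diversity selection.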

Synthesis of 50–150 compounds, from confirmed vendor availability or in-house synthesis, runs 4–8 weeks. Biophysical confirmation (SPR, ITC, or fluorescence displacement against the PPI target) on the synthesized set runs 3–4 weeks in parallel with ongoing synthesis. First confirmed binders, with structural information from binding mode prediction: 10–14 weeks from target characterization initiation.

A Concrete Scenario

Consider an early-stage oncology biotech team evaluating a KRAS/SOS1 PPI disruption program. The team has access to published structural data on the KRAS-SOS1 interface (multiple PDB entries are in the public domain), a proprietary compound library of approximately 80,000 SMILES, and in-house synthesis capacity for 100–200 compounds per quarter.

Running the traditional HTS approach on an 80K compound library is a smaller campaign than a typical pharmaceutical HTS, but it still requires assay development, plate preparation, and the full confirmation workflow; we estimate 9–12 months to a first confirmed hit series. The computational pre-screening approach runs as follows. KRAS/SOS1 interface characterization using published structures and MD validation takes 3 weeks. Virtual screening of the 80K library takes 2 days of compute plus 1 week for ADMET triage. A synthesis recommendation list of 120 compounds is delivered at week 4. Synthesis and biophysical confirmation are completed by week 14. The first confirmed KRAS/SOS1 PPI disruptors, with binding mode predictions and initial SAR vectors, are identified at approximately 3.5 months from project initiation.

The compression, from 9–12 months to first confirmed hits under the HTS route (12–18 months to lead series identification) down to 3–4 months, is not achieved by skipping experimental confirmation; those experiments still happen. The compression comes from entering the experimental phase with 120 computationally prioritized compounds rather than screening 80,000 compounds to arrive at a comparably sized confirmed set. The experimental investment is 120 synthesized and tested compounds instead of 80,000 assay points. The hit rate in the 120-compound set will be higher (10–25% is a realistic target for a well-executed computational screen, versus 0.1–0.5% for unbiased HTS) because the compounds were selected for interface engagement rather than tested without prior selection.

What the Compression Does Not Do

The computational workflow does not eliminate the need for experimental chemistry. Confirmed computational hits still need synthesis, biophysical characterization, structural confirmation where possible, and the full lead optimization cycle. ADMET challenges specific to PPI disruptors (a topic we treat in depth separately) are not resolved computationally; they must be addressed in the chemistry. The timeline compression applies to the hit identification phase, not to the lead optimization and preclinical development phases that follow it.

There is also a compound quality caveat. Computational screening identifies structural matches to the pharmacophore, not drug candidates. The compounds that emerge from pre-screening are starting points: they have predicted hot-spot engagement and acceptable computational ADMET profiles. Some will confirm experimentally. Some will not, due to factors the scoring function cannot capture — induced-fit effects, solubility artifacts in the biophysical assay, or scaffold-specific reactivity. The false positive rate for computational PPI screening is higher than for enzyme-targeted screening, in part because the experimental ΔΔG training data is sparser and the force field terms are less precisely calibrated for PPI interface geometries.

For discovery teams evaluating whether the computational-first approach makes sense for their PPI program, the key question is which resource is the binding constraint. If the constraint is synthesis capacity and biophysical assay throughput (typical of early-stage biotechs), computational pre-screening delivers the most value by focusing limited synthesis resources on the compounds most likely to confirm. If the constraint is time-to-data and the organization has access to large-scale HTS infrastructure, the calculus is different. For most pre-clinical PPI programs in resource-constrained settings, the computational-first approach shortens hit identification enough to matter for program milestones and funding decisions.