Clinical Validation Study

Clinical Validation of CN-Suite

Machine Learning-Based Directed Network Analysis
for Drug-Resistant Epilepsy Surgery

A Retrospective, Multicenter, Observational Study
across Four U.S. Level-4 Epilepsy Centers

N = 60 patients · 4 validation sites · Pre-locked algorithm · 12-month Engel outcomes

Background

The Clinical Challenge

Drug-resistant epilepsy (DRE)[20] affects ~20 million people worldwide[1,2]. Surgical intervention is the only curative option, but success rates range from only 30-70% depending on etiology and localization accuracy[3,4]. Failures carry continued seizure burden, elevated risk of sudden unexpected death in epilepsy (SUDEP)[6], and multi-million-dollar societal costs per patient[7,8,9].

[1] World Health Organization. Epilepsy Fact Sheet. Geneva: WHO; 2024.
[2] Kwan P, Brodie MJ. Early identification of refractory epilepsy. N Engl J Med. 2000;342(5):314-319.
[3] Wiebe S et al. A randomized, controlled trial of surgery for temporal-lobe epilepsy. N Engl J Med. 2001;345(5):311-318.
[4] Tellez-Zenteno JF et al. Long-term seizure outcomes following epilepsy surgery: a systematic review and meta-analysis. Brain. 2005;128(5):1188-1198.
[6] Harden C et al. Practice guideline summary: SUDEP incidence rates and risk factors. Neurology. 2017;88(17):1674-1680.
[7] Langfitt JT et al. Health care costs decline after successful epilepsy surgery. Neurology. 2007;68(16):1290-1298.
[8] Begley CE, Durgin TL. The direct cost of epilepsy in the United States. Epilepsia. 2015;56(9):1376-1387.
[9] Choi H et al. Epilepsy surgery for pharmacoresistant temporal lobe epilepsy: a decision analysis. JAMA. 2008;300(21):2497-2505.
[20] Kwan P et al. Definition of drug resistant epilepsy: consensus proposal by the ILAE. Epilepsia. 2010;51(6):1069-1077.

30-70% surgery success range · ~65M people with epilepsy globally[1] · ⅓ drug-resistant

Core bottleneck: Surgical planning relies on subjective visual analysis of intracranial EEG (iEEG), which is unstandardized and fails to capture the complex network dynamics underlying seizure generation[10].

Background

Existing Computational Approaches

Contemporary surgical decision-making increasingly views epilepsy as a network disorder[10,21], and several computational approaches have been developed to augment the interpretation of iEEG data.

[10] Kramer MA, Cash SS. Epilepsy as a disorder of cortical network organization. Neuroscientist. 2012;18(4):360-372.
[21] Spencer SS. Neural networks in human epilepsy: evidence of and implications for treatment. Epilepsia. 2002;43(3):219-227.

Undirected Methods

High-frequency oscillations (HFOs, 80-500 Hz)[16,17] - the most extensively validated iEEG biomarker for epileptogenic zone localization. HFO resection rates correlate with seizure outcomes.

[16] Jacobs J et al. High-frequency oscillations (HFOs) in clinical epilepsy. Prog Neurobiol. 2012;98(3):302-315.
[17] Frauscher B et al. High-frequency oscillations: the state of clinical research. Epilepsia. 2017;58(8):1316-1329.

Functional connectivity metrics, such as eigenvector centrality, degree centrality[22], and phase-locking values[23], identify highly connected nodes but cannot infer directionality.

[22] Bullmore E, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci. 2009;10(3):186-198.
[23] Lachaux JP et al. Measuring phase synchrony in brain signals. Hum Brain Mapp. 1999;8(4):194-208.

Directed Methods

Granger causality[18], partial directed coherence[24], and information-theoretic approaches[19] have been used to infer causal relationships in seizure networks.

[18] Blinowska KJ et al. Granger causality and information flow in multivariate processes. Phys Rev E. 2004;70(5):050902.
[19] Wilke C et al. Graph analysis of epileptogenic networks in human partial epilepsy. Epilepsia. 2011;52(1):84-93.
[24] Baccala LA, Sameshima K. Partial directed coherence: a new concept in neural structure determination. Biol Cybern. 2001;84(6):463-474.

EZTrack[13] is the only FDA-cleared computational tool that analyzes iEEG network dynamics. It computes undirected eigenvector centrality from broadband iEEG correlation matrices, identifying highly connected network hubs without inferring causal directionality.

[13] U.S. FDA. 510(k) Premarket Notification K201910: EZTrack. Silver Spring, MD: FDA; 2021.

Gap: Undirected methods cannot distinguish primary "driver" regions from secondary "responder" regions. No existing platform has combined directed connectivity analysis with a locked, pre-specified machine learning classifier validated in multiple independent centers.

Platform Overview

How CN-Suite Works

CN-Suite is a Software as a Medical Device (SaMD) that computes quantitative "criticality scores" for each brain region sampled by intracranial electrodes. Instead of relying on visual pattern recognition, it quantifies directed information flow between neural signals to distinguish seizure "drivers" from passive "responders."

Signal Processing

Delay-adjusted wavelet-based transfer entropy (dWTE)[11,25] - a nonlinear, information-theoretic measure that captures directional causality between neural signals across time-frequency scales. It produces a pairwise directed connectivity matrix for each seizure epoch.

[11] Schreiber T. Measuring information transfer. Phys Rev Lett. 2000;85(2):461-464.
[25] Gourevitch B, Eggermont JJ. Evaluating information transfer between auditory cortical neurons. J Neurophysiol. 2007;97(3):2533-2543.

Directed · Nonlinear · Information-theoretic
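To make "directed information transfer" concrete, here is a minimal plug-in estimator of Schreiber's transfer entropy for discrete (symbolized) signals, using a target history of one sample. This is plain transfer entropy, not CN-Suite's dWTE: the wavelet decomposition, delay adjustment, and any bias correction used by the platform are not described in the source and are not modeled. The function name and the toy signals are illustrative assumptions.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target, delay=1):
    """Plug-in transfer entropy (bits) from `source` to `target`.

    Assumes discrete symbols, a target history of one sample, and a single
    source sample taken `delay` steps before the predicted target sample.
    """
    joint = Counter()  # counts of (y_next, y_now, x_past)
    for t in range(delay - 1, len(target) - 1):
        joint[(target[t + 1], target[t], source[t + 1 - delay])] += 1
    total = sum(joint.values())

    # Marginal counts needed for the two conditional probabilities.
    now_past, next_now, now = Counter(), Counter(), Counter()
    for (y_next, y_now, x_past), c in joint.items():
        now_past[(y_now, x_past)] += c
        next_now[(y_next, y_now)] += c
        now[y_now] += c

    te = 0.0
    for (y_next, y_now, x_past), c in joint.items():
        p_with_source = c / now_past[(y_now, x_past)]               # p(y_next | y_now, x_past)
        p_without_source = next_now[(y_next, y_now)] / now[y_now]  # p(y_next | y_now)
        te += (c / total) * math.log2(p_with_source / p_without_source)
    return te


# Toy signals (not study data): y copies x with a one-sample lag,
# so information flows from x to y and not the other way round.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
print(f"TE x->y: {transfer_entropy(x, y):.3f} bits")
print(f"TE y->x: {transfer_entropy(y, x):.3f} bits")
```

The estimate is close to 1 bit in one direction and close to 0 in the other. That asymmetry is what allows a directed measure to separate a "driver" from a "responder"; a symmetric measure such as correlation cannot do so by construction.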

Classification

XGBoost classifier[12] - a gradient-boosted decision-tree ensemble that maps connectivity features to a criticality score (0-1) per contact per seizure. Scores are rescaled to 0-10 for clinical use; contacts above the pre-locked threshold (0.15 on the 0-1 scale, reported as 1 on the clinical scale) are classified as causal "drivers." Model weights were locked before validation.

[12] Chen T, Guestrin C. XGBoost: A scalable tree-boosting system. ACM SIGKDD. 2016:785-794.

Pre-locked · Reproducible · Hash-verified
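The source does not describe how the hash lock is implemented. One common pattern, sketched here purely as an assumption, is to record a SHA-256 digest of the serialized model weights when the model is frozen, then refuse to run if the digest of the file loaded at validation time differs. The function names and the file name in the usage comment are hypothetical.

```python
import hashlib
import hmac


def sha256_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_locked_model(path, expected_digest):
    """True only if the file on disk matches the digest recorded at lock time.

    Hypothetical helper: the study's actual lock mechanism is not described.
    """
    return hmac.compare_digest(sha256_digest(path), expected_digest.lower())


# Hypothetical usage; "model_weights.json" and LOCKED_DIGEST are placeholders.
# if not verify_locked_model("model_weights.json", LOCKED_DIGEST):
#     raise RuntimeError("Model weights differ from the pre-locked version")
```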

Beyond surgery: Because the underlying network causality framework quantifies directed information flow rather than simply localizing seizure onset, it could extend to neuromodulation targeting for responsive neurostimulation (RNS) and deep brain stimulation (DBS), where optimal electrode placement within distributed seizure networks remains an unresolved clinical challenge.

Methods

Study Design

Retrospective, multicenter, observational, single-arm performance study using stereo-EEG (sEEG) recordings[26]. The algorithm was trained on an independent cohort (N=37) from 3 centers and hash-locked before validation on 60 patients from 4 different centers, with zero overlap between training and validation data.

[26] Isnard J et al. French guidelines on stereoelectroencephalography (SEEG). Neurophysiol Clin. 2018;48(1):5-13.

Role	Centers	N
Training	BCH, NIH, UMMC	37
Validation	HUP, JHH, UMF, TCH	60

Inclusion Criteria

- Age ≥ 3 years
- Focal/multifocal DRE[20]
- Curative-intent resection or ablation
- ≥ 3 stereotyped seizures captured during iEEG
- ≥ 12-month follow-up with Engel classification[5]
- Detailed operative notes mapping the surgical zone

[5] Engel J Jr. Outcome with respect to epileptic seizures. In: Surgical Treatment of the Epilepsies. 2nd ed. Raven Press; 1993:609-621.

Outcome Measure

Standardized effect size (Cohen's d)[28] of the patient-level interpretability ratio, compared between favorable (Engel I-II)[5] and unfavorable (Engel III-IV) outcome groups.

[28] Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.

Results - Cohort

Patient Demographics

75 records screened; 60 met all eligibility criteria. 15 excluded (5 by criteria, 10 by incomplete data). All 60 were successfully processed - zero processing failures.

60 complete cases analyzed · 42 favorable (Engel I/II) · 18 unfavorable (Engel III/IV) · 51 adult / 9 pediatric

Site	Favorable	Unfavorable	Total	%
HUP (Penn)	34	13	47	78.3%
TCH (Texas Children's)	7	2	9	15.0%
JHH (Johns Hopkins)	0	3	3	5.0%
UMF (Miami)	1	0	1	1.7%

Results - Effect Size

Effect-Size Analysis

The interpretability ratio asks whether the algorithm assigns higher criticality to the tissue the surgeon actually treated. Favorable-outcome patients showed a mean ratio of 4.64 vs. 1.83 for unfavorable-outcome patients, consistent with the algorithm distinguishing driver tissue in successful surgeries. Ratios were winsorized[29] at the 95th percentile, and Cohen's d was bootstrapped[30] with 10,000 resamples.

[29] Dixon WJ, Tukey JW. Approximate behavior of the distribution of Winsorized t. Technometrics. 1968;10(1):83-98.
[30] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall/CRC; 1993.

Bootstrapped Cohen's d = 0.74 · 95% CI 0.39-1.06 · one-sided p = 0.003

0.205

Cohen's threshold exceeded

Subgroup	Favorable mean	Unfavorable mean	Bootstrapped d	95% CI	p
All Subjects	4.64	1.83	0.74	0.39, 1.06	0.003
Adults	5.05	2.00	0.73	0.37, 1.08	0.006
Pediatrics	2.58	0.54	1.87	1.17, 3.24	< 0.0001

Key finding: In patients with favorable outcomes, the algorithm assigned higher criticality to the tissue the surgeon actually targeted than it did in patients with unfavorable outcomes (mean ratio 4.64 vs. 1.83, one-sided p = 0.003), indicating that it identifies the tissue that matters.
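For readers who want to apply the same effect-size pipeline to their own data, the sketch below computes Cohen's d with a pooled standard deviation, winsorizes ratios at the 95th percentile, and forms a percentile bootstrap confidence interval. Several details are assumptions because the source does not specify them: only the upper tail of the pooled ratios is winsorized (using the "inclusive" percentile method), each outcome group is resampled independently with replacement, and a simple percentile interval is used. The function names are hypothetical.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def bootstrap_d(fav, unfav, n_boot=10_000, pct=95, seed=0):
    """Cohen's d after upper-tail winsorizing, with a percentile 95% CI."""
    # Assumption: cap at the pooled pct-th percentile (inclusive method).
    cap = statistics.quantiles(fav + unfav, n=100, method="inclusive")[pct - 1]
    fav = [min(v, cap) for v in fav]
    unfav = [min(v, cap) for v in unfav]

    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        a = rng.choices(fav, k=len(fav))      # resample each group separately
        b = rng.choices(unfav, k=len(unfav))
        try:
            boots.append(cohens_d(a, b))
        except ZeroDivisionError:             # degenerate resample with zero variance
            continue
    boots.sort()
    lo = boots[int(0.025 * len(boots))]
    hi = boots[int(0.975 * len(boots)) - 1]
    return cohens_d(fav, unfav), (lo, hi)


# Illustrative ratios only (not study data).
fav = [3.0, 4.0, 5.0, 6.0, 4.5, 5.5, 3.5, 4.0]
unfav = [1.0, 2.0, 1.5, 2.5, 2.0, 1.0]
d, (lo, hi) = bootstrap_d(fav, unfav, n_boot=2000)
print(f"d = {d:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

The study reports a "bootstrapped" d, which may be the mean of the bootstrap distribution rather than the full-sample estimate returned here; the two are usually close but not identical.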

Results - Threshold Characterization

Sensitivity & Specificity Across Thresholds

Figure 1 sweeps the criticality threshold from 0 to 1 and shows how patient-level macro-averaged sensitivity, specificity, and Youden's J[31] change. The pre-locked operating point at 0.15 (red marker) was derived from the independent training cohort by maximizing Youden's J over seizure-free patients.

[31] Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-35.

Note that this threshold was not optimized on the validation set; it was fixed prior to any validation analysis. The validation-set optimum may differ, but using a training-derived threshold avoids the optimistic bias that tuning on the validation data would introduce.

At the 0.15 threshold: macro sensitivity ≈ 67%, macro specificity ≈ 83%, Youden's J ≈ 0.50.

Figure 1. Patient-level macro-sensitivity, specificity, and Youden's J across thresholds 0-1 (Engel I/II, n=42). Pre-locked operating point (0.15) in red.
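The sweep in Figure 1 can be sketched as follows. This assumes each patient is a list of (criticality score, treated?) pairs for contacts, treats a contact as flagged when its score is ≥ the threshold, computes sensitivity and specificity within each patient, and then macro-averages across patients, with Youden's J = sensitivity + specificity - 1. The data layout and function names are hypothetical, and how CN-Suite aggregates scores across a patient's seizures is not modeled.

```python
def patient_sens_spec(contacts, threshold):
    """Sensitivity/specificity for one patient; contacts = [(score, treated), ...]."""
    tp = sum(1 for s, t in contacts if t and s >= threshold)
    fn = sum(1 for s, t in contacts if t and s < threshold)
    tn = sum(1 for s, t in contacts if not t and s < threshold)
    fp = sum(1 for s, t in contacts if not t and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec


def macro_youden(patients, threshold):
    """Macro-averaged sensitivity, specificity and Youden's J across patients."""
    sens_vals, spec_vals = [], []
    for contacts in patients:
        sens, spec = patient_sens_spec(contacts, threshold)
        if sens is not None:
            sens_vals.append(sens)
        if spec is not None:
            spec_vals.append(spec)
    sens = sum(sens_vals) / len(sens_vals)
    spec = sum(spec_vals) / len(spec_vals)
    return sens, spec, sens + spec - 1


def best_threshold(patients, grid=None):
    """Threshold on a 0-1 grid maximizing macro Youden's J (training data only)."""
    grid = grid or [i / 100 for i in range(101)]
    return max(grid, key=lambda th: macro_youden(patients, th)[2])


# Two toy patients (not study data).
patients = [
    [(0.9, True), (0.2, True), (0.1, False), (0.05, False)],
    [(0.6, True), (0.3, False), (0.0, False)],
]
print(macro_youden(patients, 0.15))
print(best_threshold(patients))
```

In keeping with the study design, `best_threshold` would be run only on the training cohort; the validation cohort is then scored at that fixed value.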

Results - Contact-Level Metrics

Diagnostic Performance at the Contact Level

686 treated and 3,277 untreated contacts were evaluated in the Engel I/II cohort. Three estimation approaches were used to account for the clustering of contacts within patients; the tables below report the generalized estimating equation (GEE)[32] estimates.

[32] Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13-22.

Sensitivity & Specificity

Group	Sens (GEE)	Spec (GEE)
All (N=42)	64.3%	84.3%
Adults (n=35)	59.6%	86.4%
Pediatrics (n=7)	91.1%	76.0%

PPV & NPV

Group	PPV (GEE)	NPV (GEE)
All (N=42)	49.0%	88.0%
Adults (n=35)	56.4%	85.6%
Pediatrics (n=7)	10.8%	99.35%

Clinical interpretation: High specificity (~84%) and NPV (~88%) make the platform strongest at ruling out tissue. About 88% of contacts that CN-Suite does not flag lay outside the treated zone, so an unflagged contact is most likely a "responder" rather than a driver. The PPV of 49% reflects its role as a candidate-identification tool: it flags a superset of contacts for multidisciplinary deliberation.

Pediatric NPV = 99.35%: in developing brains, safely defining what not to remove is especially valuable. The low pediatric PPV (10.8%) is driven largely by base-rate prevalence (2.9% treated-contact ratio vs. 17.1% in adults) rather than discriminative failure; pediatric sensitivity (91.1%) is the highest of any subgroup.
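The base-rate argument can be checked with Bayes' rule. The sketch below converts sensitivity, specificity, and prevalence into PPV and NPV for a single pooled 2×2 table. With the pediatric GEE sensitivity and specificity and the 2.9% treated-contact ratio, it gives a PPV near 10% and an NPV above 99%, close to the reported values. It will not reproduce the GEE estimates exactly, because it ignores the clustering of contacts within patients.

```python
def ppv_npv(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


# Pediatric values reported above: sens 91.1%, spec 76.0%, 2.9% treated contacts.
ppv, npv = ppv_npv(0.911, 0.760, 0.029)
print(f"PPV {ppv:.1%}, NPV {npv:.2%}")

# Same discrimination at the adult base rate (17.1%): PPV rises several-fold.
ppv_adult_rate, _ = ppv_npv(0.911, 0.760, 0.171)
print(f"PPV at 17.1% prevalence: {ppv_adult_rate:.1%}")
```

Holding sensitivity and specificity fixed while changing only prevalence isolates the base-rate effect, which is the point the paragraph above makes.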

Results - Surgery Size

Performance Scales with Surgery Size

The strongest finding: per-patient sensitivity is inversely correlated with surgery size (Spearman ρ = -0.62, p < 0.0001). The algorithm is most informative for small focal procedures, the clinical scenario most relevant to minimally invasive laser interstitial thermal therapy (LITT)[14,34].

[14] Youngerman BE et al. Long-term outcomes of mesial temporal LITT for drug-resistant epilepsy. J Neurol Neurosurg Psychiatry. 2023;94(11):879-886.
[34] Chen J et al. MR-guided LITT for drug-resistant epilepsy: a systematic review and IPD meta-analysis. Epilepsia. 2023;64(8):1957-1974.

Figure 2. Per-patient sensitivity (blue) & specificity (orange) vs. treated-contact count. Each point = one patient. Spearman ρ and trend lines shown.

Figure 3. Mean sensitivity by surgery-size bucket (10-contact bins). Error bars = SE. Patient counts above each point. Trend excludes the single-patient ≥80 bucket.

≤ 10 treated contacts (n=21, 50% of cohort): mean sensitivity = 80%. This is the LITT-sized target range where a compact driver cluster (~9 mm spacing) fits within a single ablation trajectory (15-25 mm major axis)[14,34].
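Spearman's ρ, used here and in the score-ranking analysis, is the Pearson correlation of average ranks. A minimal dependency-free sketch is below: tied values share the mean of their ranks, and no p-value is computed. In practice `scipy.stats.spearmanr` returns the same coefficient together with a p-value. The toy data are illustrative, not study data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy example: sensitivity falls as the number of treated contacts grows.
treated = [4, 6, 9, 15, 22, 35]
sensitivity = [1.0, 0.9, 0.85, 0.6, 0.55, 0.3]
print(spearman_rho(treated, sensitivity))
```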

Results - Spatial Analysis

High-Critical Contacts Form Compact 3D Clusters

For each patient with electrode coordinates (n=34), we computed the mean nearest-neighbor (NN) distance among high-critical contacts and compared it with a null distribution from 2,000 random permutations of the electrode labels. In Figure 4, points below the diagonal indicate spatial clustering beyond chance. The ~9 mm mean spacing is compatible with a single LITT ablation trajectory[14,34].

8.96 mm observed mean NN distance · 17.21 mm expected by chance · 85% of patients significant · Wilcoxon p = 1.2 × 10⁻¹⁰

Clinical implication: The ~9 mm mean spacing is compatible with a single LITT ablation trajectory (15-25 mm major axis), suggesting that high-critical contacts form surgically targetable focal clusters rather than diffuse scatter.

Figure 4. Observed vs. expected mean nearest-neighbor distance per patient. Each point represents one patient; points below the diagonal indicate spatial clustering of high-critical contacts beyond chance (permutation test, 2,000 iterations).
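The permutation test behind Figure 4 can be sketched as follows. This assumes 3D contact coordinates in millimetres and a boolean high-critical flag per contact. Shuffling the labels across a patient's contacts is equivalent to drawing a random subset of the same size, which is what the null does here. The add-one p-value convention and the function names are assumptions.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def nn_permutation_test(coords, is_critical, n_perm=2000, seed=0):
    """Observed vs. permuted mean NN distance among high-critical contacts.

    Returns (observed, mean of null, one-sided p). A small p means the
    high-critical contacts are more tightly clustered than random labels.
    """
    critical = [c for c, flag in zip(coords, is_critical) if flag]
    k = len(critical)  # needs at least two high-critical contacts
    observed = mean_nn_distance(critical)
    rng = random.Random(seed)
    null = [mean_nn_distance(rng.sample(coords, k)) for _ in range(n_perm)]
    p = (1 + sum(d <= observed for d in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, p


# Toy check (not study data): a 4x4x4 grid at 10 mm spacing with one tight
# 2x2 cluster of contacts flagged as high-critical.
grid = [(x * 10.0, y * 10.0, z * 10.0) for x in range(4) for y in range(4) for z in range(4)]
cluster = {(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (10.0, 10.0, 0.0)}
flags = [p in cluster for p in grid]
print(nn_permutation_test(grid, flags))
```

Across patients, the paired observed and expected distances could then be compared with a Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`), matching the cohort-level test reported above.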

Results - Score Ranking

Criticality Scores Carry Rank-Order Information

Beyond the binary threshold, do continuous scores carry rank information? Yes - among the 946 high-critical contacts (≥ 0.15, Engel I/II, n=42), the proportion inside the surgical zone increases monotonically with criticality score (Spearman ρ = 0.28, p = 6.8 × 10⁻¹⁹).

26% bottom-decile PPV · 61% top-decile PPV

Surgical teams can triage flagged contacts by predicted clinical relevance rather than treating all above-threshold contacts as equivalent.

Figure 5. Surgical-zone enrichment (PPV) by criticality-score decile among 946 high-critical contacts (≥ 0.15, Engel I/II, n=42). Dashed red line: linear trend.
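Decile enrichment as in Figure 5 can be computed by sorting the above-threshold contacts by score, splitting them into ten equal-count bins, and taking the fraction inside the surgical zone in each bin. Equal-count binning and breaking ties by sort order are assumptions, since the source does not specify them. The function name is hypothetical.

```python
def decile_ppv(contacts, threshold=0.15, n_bins=10):
    """Fraction of above-threshold contacts inside the surgical zone, per score bin.

    contacts: iterable of (criticality_score, inside_surgical_zone) pairs.
    Returns one PPV per bin, ordered from lowest to highest scores.
    Requires at least `n_bins` above-threshold contacts.
    """
    flagged = sorted((c for c in contacts if c[0] >= threshold), key=lambda c: c[0])
    n = len(flagged)
    ppvs = []
    for b in range(n_bins):
        chunk = flagged[b * n // n_bins:(b + 1) * n // n_bins]
        ppvs.append(sum(inside for _, inside in chunk) / len(chunk))
    return ppvs


# Toy data (not study data): higher scores are more often inside the zone;
# the 0.05 contact falls below threshold and is ignored.
toy = [(0.15 + i / 100, i >= 50) for i in range(80)] + [(0.05, True)]
print(decile_ppv(toy))
```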

Results - Failure Mechanism

Why Surgery Fails: Untreated Driver Tissue

Each patient is plotted along two axes: the delta-mean (mean criticality inside the surgical zone minus outside, x-axis) and the fraction of high-critical contacts left outside the surgical zone (y-axis). Favorable patients (blue) cluster lower-right; unfavorable (red) cluster upper-left. Median delta-mean: 0.24 favorable vs. 0.05 unfavorable (Mann-Whitney p = 0.002).

Prospective implication: Before surgery, clinicians could apply this two-dimensional failure map to the planned surgical zone to assess coverage adequacy. A high fraction of untreated critical contacts combined with low inside-outside separation would flag incomplete resection planning.

Figure 6. Left: distribution separation by outcome. Right: failure-pattern map - delta-mean (x) vs. fraction outside SZ (y). Blue = favorable (Engel I/II), Red = unfavorable (III/IV).
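Both coordinates of the failure-pattern map are simple per-patient summaries. The sketch below assumes each contact carries a criticality score and a flag indicating whether it falls inside the surgical zone (SZ), and uses the 0.15 threshold to define high-critical contacts. The function name is hypothetical, and how the study averaged scores across seizures is not modeled.

```python
from statistics import mean


def failure_map_point(contacts, threshold=0.15):
    """Return (delta_mean, fraction_outside) for one patient.

    contacts: list of (criticality_score, inside_surgical_zone) pairs.
    delta_mean: mean score inside the SZ minus mean score outside it.
    fraction_outside: share of high-critical contacts left outside the SZ.
    """
    inside = [s for s, in_sz in contacts if in_sz]
    outside = [s for s, in_sz in contacts if not in_sz]
    delta_mean = mean(inside) - mean(outside)
    critical = [in_sz for s, in_sz in contacts if s >= threshold]
    fraction_outside = (sum(not in_sz for in_sz in critical) / len(critical)
                        if critical else float("nan"))
    return delta_mean, fraction_outside


# Toy patients (not study data).
good = [(0.8, True), (0.6, True), (0.1, False), (0.05, False), (0.02, False)]
bad = [(0.2, True), (0.7, False), (0.5, False), (0.1, True)]
print(failure_map_point(good))  # lower-right of the map
print(failure_map_point(bad))   # upper-left of the map
```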

Results - Interpretability

What Drives Criticality? Network Topology, Not Waveform

SHAP analysis (19,257 observations; 4,235 high-critical) reveals a three-tier feature hierarchy. The top four features are all network-topology measures (|SHAP| 0.27-0.39): Δ betweenness centrality (BC), Δ eigenvector centrality, Δ clustering, and time-averaged BC. Waveform features first appear at rank 5, after a drop in |SHAP| twice as large as the steps within the top tier.

A critical contact becomes a routing bridge (betweenness ↑), connects to influential hubs (eigenvector ↑), and anchors a cohesive subnetwork (clustering ↑). Although the raw measures are negatively correlated (ρ = -0.49), their SHAP contributions are nearly uncorrelated (ρ = 0.06), so the model treats each as a separate predictor of criticality.

Critical insight: topology features predict not just whether a node is critical, but how much. Δ Eigenvector correlates strongly with the criticality score (ρ = 0.80); waveform features do not.


Network transformation at seizure onset. C* bridges both clusters (betweenness), connects to hubs (eigenvector), and forms a local clique (clustering). Below: SHAP hierarchy by tier.
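Global rankings like the one above are usually computed as the mean absolute SHAP value per feature across observations. For tree models, the SHAP matrix itself can come from, for example, `shap.TreeExplainer(model).shap_values(X)`; the ranking step is only a few lines. The sketch below works on a plain list of rows and uses hypothetical feature names. The three-tier grouping is not reproduced, because the source does not state the exact gap rule.

```python
def rank_features_by_mean_abs_shap(shap_rows, feature_names):
    """Sort features by mean |SHAP| across observations (largest first)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-observation SHAP matrix, one column per feature.
features = ["delta_betweenness", "delta_eigenvector", "line_length"]
shap_rows = [
    [0.40, -0.30, 0.05],
    [-0.35, 0.25, -0.10],
    [0.30, 0.20, 0.02],
]
for name, value in rank_features_by_mean_abs_shap(shap_rows, features):
    print(f"{name}: {value:.3f}")
```

Taking absolute values before averaging matters: a feature that pushes some contacts up and others down would otherwise average toward zero, even if it strongly influences individual predictions.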

Results - Temporal Dynamics

Critical Nodes Lead Information Flow at Seizure Onset

Using directed connectivity and lag estimates from the dWTE network, we tested whether higher criticality scores correspond to an earlier position in the inferred seizure propagation sequence. The relationship was phase-dependent: pre-ictally, high-critical contacts were not earlier in flow order, but around and after seizure onset the direction reversed. Post-ictally, the median earliest-arrival delay was 10 ms for high-critical contacts vs. 16 ms for low-critical contacts (p = 2.9 × 10⁻⁴¹).

What distinguishes a driver contact is therefore not that it is chronically upstream, but that it becomes upstream at seizure onset - consistent with the SHAP finding that criticality reflects seizure-induced network reorganization rather than static electrophysiological properties.

Convergent signature: At seizure onset, a critical contact simultaneously shifts its topological role in the network - becoming a routing hub and consolidation point - and its temporal position in the propagation sequence, moving from downstream recipient to upstream source.

Median earliest-arrival delay (ms) by seizure phase for high-critical (CS ≥ 0.15) vs. low-critical contacts. Pre-ictally, critical contacts arrive later; peri- and post-ictally, they arrive earlier. All contrasts significant (Mann-Whitney p < 10⁻⁹).
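The group contrasts in this section and in the failure-mechanism analysis use the Mann-Whitney U test. A dependency-free sketch is below. It uses the large-sample normal approximation with a tie correction and no continuity correction. `scipy.stats.mannwhitneyu` is the usual library route and also offers exact p-values for small samples. The delay values are invented for illustration.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample `a` and a two-sided p-value (normal approximation)."""
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0  # sum of (t^3 - t) over tie groups, for the variance correction
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(a), len(b)
    n = n1 + n2
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u, p


# Invented earliest-arrival delays in ms (not study data).
high_critical = [8, 9, 10, 11, 12, 9, 10]
low_critical = [14, 15, 16, 17, 18, 16, 15]
print(mann_whitney_u(high_critical, low_critical))
```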

Caveats

Limitations

Retrospective design - validates correlation with outcomes but cannot measure real-time impact on surgical decision-making. Prospective integration into Epilepsy Surgery Conferences is the logical next step. Data were collected from 2015 to 2024; temporal confounds are expected to be mitigated by the algorithm's reliance on fundamental time-frequency neural dynamics.

Site imbalance - HUP contributed 78% of subjects, and JHH contributed only 3, all with unfavorable outcomes. A sensitivity analysis excluding HUP (n=13) yielded d = 0.68 (95% CI 0.12-1.24), directionally consistent but underpowered.

Missing demographic data - Race, ethnicity, and sex were unavailable for some patients due to IRB de-identification. No known biological mechanism links demographics to iEEG network connectivity, but performance across demographic subgroups could not be fully assessed.

Pediatric sample - Only 9 pediatric patients, all from a single center (TCH); the contact-level metrics rest on the 7 with favorable outcomes. The 99.35% NPV and 91.1% sensitivity are promising but preliminary and require multicenter replication.

No head-to-head comparisons - This study evaluated standalone CN-Suite performance without direct comparison to HFO detectors[16,17], directed connectivity methods[18,19], or EZTrack[13].

U.S. Level-4 centers only - Applicability to lower-volume centers, non-U.S. settings, subdural grids, or non-resective interventions (neuromodulation, RNS, DBS) requires separate validation.

Conclusion

Four Convergent Findings Define Clinical Value

Compact Drivers

High-critical contacts form spatially compact 3D clusters (~9 mm mean NN distance) compatible with minimally invasive LITT ablation volumes. Mean sensitivity reaches 80% for surgeries with ≤ 10 treated contacts.

Rank-Order Info

Criticality scores carry meaningful rank ordering - PPV climbs from 26% (bottom decile) to 61% (top decile), enabling surgical teams to prioritize among flagged contacts.

Failure Warning

When high-critical tissue is left outside the surgical zone, outcomes tend to be unfavorable. This provides both a mechanistic explanation for surgical failure and a potential prospective coverage-adequacy check.

Interpretable

SHAP analysis indicates that criticality is driven by the change in network topology from baseline to seizure onset, not by static position or waveform shape. These features predict not just whether a node is critical, but how critical it is.

FIND Neuro - FDA 510(k) Submission - Multicenter Clinical Validation