Confidence Scoring
5-factor model. 0–100 scale. Reproducible.
Every claim extracted from an expert call receives a confidence score calculated across five independently weighted factors. The same claim, from the same call, always produces the same score.
Specificity
How precise and concrete is the claim? Numerical estimates score higher than directional assertions.
Evidence Quality
Is the claim supported by direct experience, data, or inference? Direct experience scores highest.
Linguistic Certainty
Hedging language ('I think', 'probably') lowers the score. Definitive language ('We saw', 'The data shows') raises it.
Expert Seniority
C-suite and VP-level experts score higher than junior practitioners for strategic claims.
Cross-Call Corroboration
Claims corroborated by 2+ independent experts in separate calls receive a corroboration boost.
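One way to picture how the five factors combine is a weighted sum scaled to 0–100. The sketch below is illustrative only: the weight values, the `FactorScores` type, and the assumption that each factor is first reduced to a sub-score between 0 and 1 are hypothetical, since the weights themselves are not published here.

```python
from dataclasses import dataclass

# Hypothetical weights: the factors are described as independently
# weighted, but the actual values are not published. These sum to 1.0.
WEIGHTS = {
    "specificity": 0.25,
    "evidence_quality": 0.25,
    "linguistic_certainty": 0.20,
    "expert_seniority": 0.15,
    "corroboration": 0.15,
}


@dataclass(frozen=True)
class FactorScores:
    """Per-factor sub-scores, each assumed to lie in [0.0, 1.0]."""
    specificity: float
    evidence_quality: float
    linguistic_certainty: float
    expert_seniority: float
    corroboration: float


def confidence_score(factors: FactorScores) -> int:
    """Combine the five sub-scores into a single 0-100 score.

    The function is pure (no randomness, no hidden state), so the
    same inputs always produce the same score.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(factors, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return round(total * 100)
```

Keeping the function pure is what makes the score reproducible: any change in output has to come from a change in the inputs or the weights.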
Cross-Call Synthesis
How claims from multiple experts are synthesized.
Semantic Clustering
Claims with cosine similarity >0.78 across calls are grouped into semantic clusters. Each cluster represents a research theme.
Consensus Detection
When 3+ independent experts in the same cluster make directionally consistent claims, the cluster is classified as Consensus.
Contradiction Flagging
When expert claims in the same cluster are directionally inconsistent, the cluster is flagged as Contradiction and surfaced immediately in the thesis.
Unique Signal
Claims with no similar cluster-mates are classified as Unique Signal: potentially valuable outlier information that warrants follow-up.
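The four steps above can be sketched in a few dozen lines. Only the 0.78 threshold and the 3-expert consensus rule come from the text; the `Claim` fields, the single-link grouping strategy, the +1/-1 direction label, and the "Unclassified" fallback are assumptions made for illustration. The sketch also does not check that similar claims come from separate calls. It needs Python 3.9 or later.

```python
import math
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.78  # from the text: cosine similarity > 0.78
CONSENSUS_MIN_EXPERTS = 3    # from the text: 3+ independent experts


@dataclass(frozen=True)
class Claim:
    expert_id: str                # hypothetical field names
    embedding: tuple[float, ...]  # assumed output of some embedding model
    direction: int                # assumed label: +1 or -1


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_claims(claims: list[Claim]) -> list[list[Claim]]:
    """Single-link grouping (an assumption): claims connected by any
    chain of pairs above the threshold end up in the same cluster."""
    parent = list(range(len(claims)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            sim = cosine_similarity(claims[i].embedding, claims[j].embedding)
            if sim > SIMILARITY_THRESHOLD:
                parent[find(i)] = find(j)

    groups: dict[int, list[Claim]] = {}
    for i, claim in enumerate(claims):
        groups.setdefault(find(i), []).append(claim)
    return list(groups.values())


def classify_cluster(cluster: list[Claim]) -> str:
    if len(cluster) == 1:
        return "Unique Signal"
    if len({c.direction for c in cluster}) > 1:
        return "Contradiction"
    if len({c.expert_id for c in cluster}) >= CONSENSUS_MIN_EXPERTS:
        return "Consensus"
    return "Unclassified"  # assumption: the text does not name this case
```

Counting distinct `expert_id` values rather than claims keeps one talkative expert from producing a consensus alone.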
Research Gap Detection
Three types of gaps. Flagged in real time.
Research Question Gap
A specific question raised in a call that was never answered. Flagged when a question mark is detected without a subsequent expert response addressing it.
Thesis Dimension Gap
A dimension of the project's investment/research thesis that has zero claim coverage after 3+ calls. Flagged when claims don't address a defined thesis pillar.
AI-Surfaced Gap
Topics semantically adjacent to your thesis that appear in industry discourse but are absent from your call set. Surfaced by comparing your claim vocabulary to an industry corpus.
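The first two gap types lend themselves to a short sketch. Everything named here (`Turn`, `speaker_role`, the function names) is hypothetical, and the question check is deliberately naive: it treats any expert turn that follows a question as an answer, whereas the description above requires a response that actually addresses the question. Only the 3-call minimum comes from the text.

```python
from dataclasses import dataclass

MIN_CALLS_BEFORE_FLAGGING = 3  # from the text: zero coverage after 3+ calls


@dataclass(frozen=True)
class Turn:
    speaker_role: str  # hypothetical values: "interviewer" or "expert"
    text: str


def unanswered_questions(turns: list[Turn]) -> list[str]:
    """Naive Research Question Gap check over one call transcript."""
    gaps: list[str] = []
    pending = None  # most recent question still waiting for an expert reply
    for turn in turns:
        if turn.speaker_role == "expert":
            pending = None  # assumption: any expert reply counts as an answer
            continue
        if pending is not None:
            gaps.append(pending)  # interviewer moved on without a reply
        pending = turn.text if "?" in turn.text else None
    if pending is not None:
        gaps.append(pending)  # call ended with the question still open
    return gaps


def thesis_dimension_gaps(
    dimensions: list[str],
    claim_dimensions: list[str],
    calls_processed: int,
) -> list[str]:
    """Return thesis dimensions with zero claims once enough calls exist."""
    if calls_processed < MIN_CALLS_BEFORE_FLAGGING:
        return []
    covered = set(claim_dimensions)
    return [d for d in dimensions if d not in covered]
```

A production version of the question check would need some measure of whether the reply is on topic, for example a similarity test between the question and the response.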
Multilingual Benchmarks
10 languages. Published accuracy benchmarks.
| Language | WER | Claim Precision |
|---|---|---|
| English | <4% | 92% |
| Mandarin Chinese | <6% | 91% |
| Hindi | <8% | 87% |
| Bahasa Indonesia | <7% | 88% |
| Spanish | <5% | 90% |
| Portuguese | <6% | 89% |
| Arabic | <9% | 85% |
| German | <5% | 90% |
| French | <5% | 91% |
| Japanese | <7% | 86% |
WER = Word Error Rate vs. human-transcribed gold standard. Updated quarterly. Last updated: Q1 2026.
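For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the machine transcript and the reference, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the benchmark pipeline itself; the function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Splitting on whitespace only works for languages that separate words with spaces. Mandarin Chinese and Japanese text would need a tokenizer first, or a character-level variant of the same calculation.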
Rolling Thesis
A thesis that updates with every call.
The Rolling Thesis is a structured, living document that synthesizes all claims across all calls into a coherent, confidence-scored intelligence report. Unlike a static memo, it updates within seconds of each new call upload, recalculating confidence scores, surfacing new contradictions, and flagging newly resolved gaps.
The thesis is organized by user-defined dimensions (e.g., “Market Size”, “Competitive Position”, “Regulatory Risk”). Each dimension shows: coverage score (% of thesis dimensions addressed), claim count, average confidence, open gaps, and source experts.
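As a rough illustration of the per-dimension view, the data could be held in a structure like the one below. The class and field names are hypothetical, and the coverage calculation is one reading of "% of thesis dimensions addressed": the share of defined dimensions that have at least one claim. It needs Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionSummary:
    """Running totals for one user-defined thesis dimension."""
    name: str
    claim_count: int = 0
    confidence_total: int = 0
    open_gaps: list[str] = field(default_factory=list)
    source_experts: set[str] = field(default_factory=set)

    @property
    def average_confidence(self) -> float:
        if self.claim_count == 0:
            return 0.0
        return self.confidence_total / self.claim_count


class RollingThesis:
    def __init__(self, dimensions: list[str]) -> None:
        self.summaries = {name: DimensionSummary(name) for name in dimensions}

    def add_claim(self, dimension: str, confidence: int, expert_id: str) -> None:
        """Fold one scored claim into its dimension's running totals."""
        summary = self.summaries[dimension]  # KeyError if dimension is undefined
        summary.claim_count += 1
        summary.confidence_total += confidence
        summary.source_experts.add(expert_id)

    def coverage_score(self) -> float:
        """Percentage of defined dimensions with at least one claim."""
        if not self.summaries:
            return 0.0
        addressed = sum(1 for s in self.summaries.values() if s.claim_count > 0)
        return 100.0 * addressed / len(self.summaries)
```

Storing running totals instead of recomputing from every claim means each new call only touches the dimensions its claims belong to.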
Methodology questions.
Ready to see the methodology in action?
We'll walk you through confidence scoring, cross-call synthesis, and gap detection on your actual research.