How Athena Measures Credibility
This document explains every component of the Athena Index: what is measured, how it is computed, what data is used, and — critically — what Athena does not and cannot claim. It is written for investors, journalists, subjects under review, and enterprise buyers.
What the Athena Index Measures
The Athena Index is a behavioral credibility score from 0–100. It measures how a public figure behaves when making verifiable claims over time — not whether they are likeable, influential, or popular.
The score is computed from six independent pillars, each representing a distinct behavioral dimension. Prediction accuracy is one input among several — the majority of the score comes from content behavior, disclosure practices, consistency, calibration, and accountability over time.
The Six Credibility Pillars
Each pillar is computed from a weighted set of factors. Pillar scores are normalized to 0–1 before being combined into the Athena Index. Factors with insufficient data are excluded rather than penalizing subjects for missing information.
Outcome Alignment
Weight: 22%. How often an entity's explicit predictions match the actual market outcome after the stated timeframe.
Calibration
Weight: 13%. Whether expressed confidence levels match actual accuracy. A well-calibrated entity who says "80% confident" should be right ~80% of the time.
Consistency
Weight: 15%. Behavioral stability over time: does the entity hold the same position before and after prices move, and avoid unexplained position reversals?
Specificity Quality
Weight: 20%. How precise and falsifiable an entity's claims are. Vague sentiment scores lower than specific, verifiable claims with price targets and timeframes.
Transparency
Weight: 15%. Disclosure behavior: does the entity reveal paid partnerships, portfolio positions, and potential conflicts of interest?
Accountability
Weight: 15%. How the entity handles being wrong: do they acknowledge missed predictions, correct the record, or deflect and reframe?
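As a rough illustration of how the six weights above could combine normalized pillar scores into a 0–100 index, here is a minimal sketch. The function name `athena_index`, the dictionary keys, and the choice to renormalize the weights when a whole pillar has no score are assumptions made for this example; the source only states the weights, the 0–1 normalization, and that factors with insufficient data are excluded.

```python
# Pillar weights as stated above; they sum to 100%.
PILLAR_WEIGHTS = {
    "outcome_alignment": 0.22,
    "calibration": 0.13,
    "consistency": 0.15,
    "specificity_quality": 0.20,
    "transparency": 0.15,
    "accountability": 0.15,
}


def athena_index(pillar_scores):
    """Combine pillar scores in [0, 1] into a 0-100 index.

    Assumption (not from the source): a pillar whose score is None is
    skipped and the remaining weights are renormalized, so missing data
    neither helps nor hurts the subject.
    """
    available = {k: v for k, v in pillar_scores.items() if v is not None}
    if not available:
        return None  # nothing to score yet
    for name, score in available.items():
        if name not in PILLAR_WEIGHTS:
            raise KeyError(f"unknown pillar: {name}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be normalized to 0-1")
    total_weight = sum(PILLAR_WEIGHTS[k] for k in available)
    weighted = sum(PILLAR_WEIGHTS[k] * v for k, v in available.items())
    return round(100 * weighted / total_weight, 1)
```

For example, a subject scoring 0.5 on every pillar would land at 50.0 under this sketch, whichever pillars are present.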
Factor Visibility Tiers
Athena factors are distributed across three visibility tiers. This protects both the integrity of the scoring model and the ability of subjects to understand their scores.
Public Factors
Fully disclosed in profile pages and the "Why This Score?" section. Examples: prediction accuracy, disclosure rate, citation quality.
Used in: UI, dispute process, subject appeals
Internal Factors
Used in scoring but only summarized (not enumerated) in subject-facing views. Examples: sub-components of calibration, NLP confidence signals.
Used in: Scoring engine, audit trail
Hidden Factors
Never disclosed. Used exclusively for fraud and manipulation detection. Disclosure would enable gaming.
Used in: Fraud detection only
How Predictions Are Extracted
Athena identifies verifiable claims from YouTube transcripts, video titles, and captions using a structured LLM extraction pipeline. Claims are not manually labeled.
Extraction pipeline
- 1. YouTube transcript fetched via caption API where available
- 2. Deepgram audio transcription applied where captions are unavailable
- 3. Gemini extracts structured predictions from the full transcript
- 4. Deduplication removes near-identical claims within the same video
- 5. Asset symbol resolved against a known asset registry
- 6. Timeframe normalized to an end date for later evaluation
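Steps 4 and 6 of the pipeline can be sketched as follows. This is only an illustration: the `Claim` fields, the 0.9 similarity threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are assumptions, since the source does not say how near-identical claims are detected or how timeframes are represented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher


@dataclass
class Claim:
    # Hypothetical record shape for an extracted claim.
    video_id: str
    text: str
    asset: str
    made_on: date
    horizon_days: int  # stated timeframe, expressed in days


def dedupe(claims, threshold=0.9):
    """Drop claims that are near-identical to an earlier claim in the same video."""
    kept = []
    for claim in claims:
        duplicate = any(
            claim.video_id == earlier.video_id
            and SequenceMatcher(
                None, claim.text.lower(), earlier.text.lower()
            ).ratio() >= threshold
            for earlier in kept
        )
        if not duplicate:
            kept.append(claim)
    return kept


def end_date(claim):
    """Normalize the stated timeframe to a concrete evaluation date."""
    return claim.made_on + timedelta(days=claim.horizon_days)
```

Restricting the comparison to claims from the same video mirrors the wording of step 4; the same statement repeated in a different video stays on the ledger as a separate claim.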
What counts as a claim
Athena extracts multiple types of verifiable claims — not just explicit price targets. Directional calls, implied positions, conviction statements, and conditional predictions are all captured and scored against outcomes where possible.
Each claim is classified by type before evaluation. Type classification affects which scoring pillars are updated — for example, a vague sentiment call contributes to Specificity Quality but not to Outcome Alignment.
How Outcomes Are Evaluated
Predictions are evaluated against historical price data once their stated timeframe expires. Athena uses free public APIs (Coinbase, Binance) — no paid data vendor dependency.
100 pts: Direction correct AND (target hit, OR no target specified and the direction matched within the timeframe).
50 pts: Direction correct but target not met, or timeframe missed but >50% progress toward target made.
0 pts: Direction incorrect, or target clearly not reached with <50% progress, or timeframe expired with negligible movement.
Awaiting: Timeframe has not yet passed. Claim is on the ledger and will be evaluated when evidence becomes available.
- Direction accuracy: Was the stated direction (up/down/neutral) correct at evaluation time?
- Target accuracy: Was the stated price target reached? (only applies to claims with a numeric target)
- Timeframe accuracy: Did the outcome occur within the stated timeframe? (only applies to time-specific claims)
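The rubric above can be read as a small decision function. The sketch below is one possible reading, not the actual evaluator: the function name `score_claim`, its parameters, and the choice to evaluate only the price at the end of the timeframe are assumptions. Where the 50-point and 0-point rules overlap (direction correct, target missed), this sketch assumes partial credit requires more than 50% progress toward the target. The "neutral" direction is left out for brevity.

```python
from datetime import date
from typing import Optional, Tuple


def score_claim(
    direction: str,
    start_price: float,
    end_price: float,
    target: Optional[float],
    end_date: date,
    today: date,
) -> Tuple[str, Optional[int]]:
    """Return (label, points) for one claim under the assumed reading of the rubric."""
    if direction not in ("up", "down"):
        raise ValueError("this sketch handles only 'up' and 'down'")
    if today < end_date:
        return ("OPEN", None)  # timeframe has not passed yet

    if direction == "up":
        direction_ok = end_price > start_price
        target_hit = target is not None and end_price >= target
    else:
        direction_ok = end_price < start_price
        target_hit = target is not None and end_price <= target

    if not direction_ok:
        return ("MISS", 0)
    if target is None or target_hit:
        return ("HIT", 100)

    # Direction right, target missed: fraction of the distance covered.
    progress = (end_price - start_price) / (target - start_price)
    return ("PARTIAL", 50) if progress > 0.5 else ("MISS", 0)
```

With hypothetical prices, a call for a rise from 100 to a target of 120 that ends at 115 has covered 75% of the distance and earns 50 points, while the same call ending at 105 (25% progress) earns 0.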
Under the Hood
Four agents. One specimen. End-to-end provenance.
Every credibility score is produced by a pipeline of purpose-built AI agents — each with a single responsibility, each traceable to documented evidence.
- Reads transcripts and source content. Identifies verifiable claims (direction, asset, conditions, timeframe) and writes them to the ledger.
- When evidence becomes available, evaluates each claim against observable outcomes. Classifies as HIT, MISS, PARTIAL, or OPEN. Never premature.
- Tracks behavioral patterns over time: revisions, deletions, corrections, confidence calibration. Maintains an append-only history that cannot be rewritten.
- Computes 100+ behavioral factors across 6 pillars. Produces a score that is fully traceable: every number maps back to documented evidence.
Output: Athena Index, 0–100, fully traceable, methodology-versioned.
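The source describes the behavioral history as append-only but does not say how that property is enforced. One common way to make rewrites detectable is a hash chain, sketched below as an illustration only; the `Ledger` class and its `append` and `verify` methods are hypothetical names, not Athena's API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


class Ledger:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Under this design a correction is recorded as a new event rather than an edit to an old one, which matches the idea of tracking revisions and deletions instead of erasing them.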
How Confidence Language Is Factored In
Each extracted claim is assigned a confidence level based on the language used by the entity. This feeds into the Calibration pillar — an entity who expresses certainty and misses is penalized more than one who hedges. Inflated confidence language relative to actual outcomes is a key signal in Athena’s behavioral analysis.
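The source does not give the calibration formula. As a stand-in, the Brier score is a standard scoring rule with the property described here: a confident miss costs more than a hedged one. The sketch below assumes each claim's confidence language has already been mapped to a probability between 0 and 1; that mapping, and the function names, are hypothetical.

```python
def brier_score(forecasts):
    """Mean squared gap between stated confidence and outcome (0 is best, 1 is worst).

    forecasts: iterable of (confidence in [0, 1], hit as bool).
    """
    items = list(forecasts)
    if not items:
        raise ValueError("need at least one evaluated claim")
    return sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in items) / len(items)


def calibration_gap(forecasts):
    """Mean stated confidence minus observed hit rate; positive means overconfident."""
    items = list(forecasts)
    if not items:
        raise ValueError("need at least one evaluated claim")
    mean_confidence = sum(p for p, _ in items) / len(items)
    hit_rate = sum(1 for _, hit in items if hit) / len(items)
    return mean_confidence - hit_rate
```

A miss at stated certainty (confidence 1.0) scores 1.0, while a miss at 0.6 confidence scores 0.36, so the penalty grows with the strength of the language. An entity that says "80% confident" five times and is right four times has a calibration gap of about zero.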
The Dispute Process
Scores are contestable. Any subject, viewer, or researcher can file a dispute if they believe a claim was extracted incorrectly, an outcome was evaluated against the wrong data, or a factor was computed from bad inputs.
What can be disputed
- A prediction was extracted that was not actually a prediction
- The evaluation used the wrong price or wrong timeframe
- A disclosure was present but not detected
- A claim was attributed to the wrong speaker (host vs. guest)
- A factor was computed from a corrupted transcript
What cannot be disputed
- The weighting of pillars (methodology decision, not a data error)
- Scores computed from correct data, even if unfavorable
- Hidden factor computations (fraud detection is never disclosed)
- Historical evaluations where market data is unambiguous
What Athena Does Not Claim
Not financial advice
The Athena Index does not recommend buying or selling any asset. A high score does not mean an entity's current predictions are correct.
Not a measure of intent
Athena measures behavior, not intent. An entity with low transparency may simply not have known that disclosure was expected; a low score does not by itself imply bad faith.
Not a complete record
Athena scores based on indexed content. Claims in private groups, newsletters, unindexed interviews, or platforms not yet covered are not included.
Not real-time
Scores are recomputed weekly. Recent behavior changes take time to be reflected. New transcripts may take days to weeks to be processed.
Not definitive at low sample sizes
Entities with fewer than 20 evaluated claims have Low or Minimal confidence ratings. Their scores are directionally informative, not statistically significant.
Not fraud detection
A low score does not mean an entity is committing fraud. Hidden factors identify manipulation signals, but Athena reports behavioral patterns — not legal violations.
How Claims Are Scored: Examples
The following examples illustrate how Athena scores specific types of claims — correctly captured hits, correctly captured misses, and the limits of vague predictions.
"BTC will reach $90,000 by end of Q1 2025 — I'm very confident."
Outcome: BTC closed at $92,400 on March 31, 2025.
outcome alignment: +High
specificity: +High
calibration: +Good
Specific price target + clear timeframe + stated confidence. All three accuracy dimensions verified.
"ETH will hit $10,000 in 2024. This is absolutely guaranteed."
Outcome: ETH peaked at $3,983 in 2024.
outcome alignment: −Miss
specificity: +High (benefits calibration)
calibration: −Severe overconfidence penalty
High specificity is noted correctly. The overconfident language ("guaranteed") multiplies the calibration penalty.
"Crypto is going to explode soon. Just trust me."
Outcome: Market rose 12% over the next 30 days.
outcome alignment: Weak (direction only, no target)
specificity: −Low
calibration: Neutral
No price target, no timeframe. Extracted as sentiment-type only. The market rise does not count as a hit because the claim is not specific enough to evaluate.
"I love this exchange and recommend everyone use it." [4-minute paid segment, no disclosure]
Outcome: N/A — evaluated as behavior, not prediction.
transparency: −Undisclosed promotion risk
Athena detects the absence of disclosure language in identified sponsored segments. This reduces the Transparency pillar score regardless of whether the recommended product performed well.
Score Confidence Levels
Every Athena Index score is accompanied by a confidence level reflecting the size and quality of the scoring sample. Low-confidence scores are displayed with appropriate caveats.
50+ evaluated claims: Statistically robust; score is reliable
20–49 evaluated claims: Directionally reliable; watch for changes
5–19 evaluated claims: Indicative only; treat with caution
< 5 evaluated claims: Score is a prior estimate, not a measurement
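The thresholds above map directly to a lookup. In this minimal sketch the function name `confidence_tier` is hypothetical, and the returned strings reuse the descriptions from the list rather than official tier names, which the list itself does not give.

```python
def confidence_tier(evaluated_claims: int) -> str:
    """Map the number of evaluated claims to the description used above."""
    if evaluated_claims < 0:
        raise ValueError("claim count cannot be negative")
    if evaluated_claims >= 50:
        return "Statistically robust"
    if evaluated_claims >= 20:
        return "Directionally reliable"
    if evaluated_claims >= 5:
        return "Indicative only"
    return "Prior estimate"
```

Checking the largest threshold first keeps each branch to a single comparison and makes the boundary values (50, 20, 5) fall into the higher tier, matching the ranges as written.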