Athena Methodology · v1.23

How Athena Measures Credibility

This document explains every component of the Athena Index: what is measured, how it is computed, what data is used, and — critically — what Athena does not and cannot claim. It is written for investors, journalists, subjects under review, and enterprise buyers.

6 behavioral pillars · Outcome-verified · Contestable · Explainable
Example Athena Index output
72 / 100 · Good (above-average credibility)
847 behavioral signals · 6 pillars evaluated · 94 claims indexed
Overview

What the Athena Index Measures

The Athena Index is a behavioral credibility score from 0–100. It measures how a public figure behaves when making verifiable claims over time — not whether they are likeable, influential, or popular.

The score is computed from six independent pillars, each representing a distinct behavioral dimension. Prediction accuracy is one input among several — the majority of the score comes from content behavior, disclosure practices, consistency, calibration, and accountability over time.

Scoring

The Six Credibility Pillars

Each pillar is computed from a weighted set of factors. Pillar scores are normalized to 0–1 before being combined into the Athena Index. Factors with insufficient data are excluded, so subjects are not penalized for missing information.

Outcome Alignment

22%

How often an entity's explicit predictions match the actual market outcome after the stated timeframe.


Calibration

13%

Whether expressed confidence levels match actual accuracy. A well-calibrated entity who says "80% confident" should be right ~80% of the time.


Consistency

15%

Behavioral stability over time: does the entity hold the same position before and after prices move, and avoid unexplained position reversals?


Specificity Quality

20%

How precise and falsifiable an entity's claims are. Vague sentiment scores lower than specific, verifiable claims with price targets and timeframes.


Transparency

15%

Disclosure behavior: does the entity reveal paid partnerships, portfolio positions, and potential conflicts of interest?


Accountability

15%

How the entity handles being wrong: do they acknowledge missed predictions, correct the record, or deflect and reframe?
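
As a rough illustration of how the six weights above could combine into a single 0–100 number, here is a minimal Python sketch. The weights are the ones listed in this section; the function name `athena_index`, the pillar keys, and the choice to renormalize weights when a whole pillar has no data are assumptions made for illustration, not a description of the production scoring engine.

```python
from typing import Dict, Optional

# Pillar weights as listed in this document (they sum to 1.0).
PILLAR_WEIGHTS = {
    "outcome_alignment": 0.22,
    "calibration": 0.13,
    "consistency": 0.15,
    "specificity_quality": 0.20,
    "transparency": 0.15,
    "accountability": 0.15,
}


def athena_index(pillar_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Combine normalized pillar scores (each 0 to 1) into a 0 to 100 index.

    ASSUMPTION: a pillar with no usable data (None) is dropped and the
    remaining weights are renormalized. The document only states that
    factors with insufficient data are excluded, not how.
    """
    available = {k: v for k, v in pillar_scores.items() if v is not None}
    if not available:
        return None  # nothing to score
    for name, score in available.items():
        if name not in PILLAR_WEIGHTS:
            raise KeyError(f"unknown pillar: {name}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be normalized to 0..1, got {score}")
    total_weight = sum(PILLAR_WEIGHTS[k] for k in available)
    weighted = sum(PILLAR_WEIGHTS[k] * v for k, v in available.items())
    return round(100.0 * weighted / total_weight, 1)
```

With every pillar at 0.72, this sketch returns 72.0. Renormalizing is only one possible policy; a different engine might instead fall back to a prior for the missing pillar.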

Transparency

Factor Visibility Tiers

Athena factors are distributed across three visibility tiers. This protects both the integrity of the scoring model and the ability of subjects to understand their scores.

Public Factors

Fully disclosed in profile pages and the "Why This Score?" section. Examples: prediction accuracy, disclosure rate, citation quality.

Used in: UI, dispute process, subject appeals

Internal Factors

Used in scoring but only summarized (not enumerated) in subject-facing views. Examples: sub-components of calibration, NLP confidence signals.

Used in: Scoring engine, audit trail

Hidden Factors

Never disclosed. Used exclusively for fraud and manipulation detection. Disclosure would enable gaming.

Used in: Fraud detection only
Data Pipeline

How Predictions Are Extracted

Athena identifies verifiable claims from YouTube transcripts, video titles, and captions using a structured LLM extraction pipeline. Claims are not manually labeled.

YouTube (captions / audio) → Transcription (Deepgram / ASR) → Extraction (Gemini LLM) → Deduplication (near-match filter) → Resolution (asset registry) → Evaluation (price data)

Extraction pipeline

  1. YouTube transcript fetched via caption API where available
  2. Deepgram audio transcription applied where captions are unavailable
  3. Gemini extracts structured predictions from the full transcript
  4. Deduplication removes near-identical claims within the same video
  5. Asset symbol resolved against a known asset registry
  6. Timeframe normalized to an end date for later evaluation
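
Steps 4 and 6 above can be pictured with a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in near-match filter and converts a quarter-based timeframe into an end date. The function names, the 0.9 similarity threshold, and the quarter-based helper are all assumptions for illustration; the document does not specify how Athena's filter or normalizer is implemented.

```python
import calendar
from datetime import date
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences are ignored.
    return " ".join(text.lower().split())


def dedupe_claims(claims: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first occurrence of each claim and drop near-identical repeats.

    ASSUMPTION: similarity is a character-level ratio with a 0.9 cutoff.
    """
    kept: List[str] = []
    for text in claims:
        is_duplicate = any(
            SequenceMatcher(None, _normalize(text), _normalize(k)).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(text)
    return kept


def quarter_end(year: int, quarter: int) -> date:
    """Normalize a timeframe such as 'end of Q1 2025' to a concrete end date."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    month = quarter * 3
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)
```

For instance, `quarter_end(2025, 1)` yields March 31, 2025, which is the kind of end date a later evaluation step needs.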

What counts as a claim

Athena extracts multiple types of verifiable claims — not just explicit price targets. Directional calls, implied positions, conviction statements, and conditional predictions are all captured and scored against outcomes where possible.

Each claim is classified by type before evaluation. Type classification affects which scoring pillars are updated — for example, a vague sentiment call contributes to Specificity Quality but not to Outcome Alignment.
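
One way to picture type-based routing is a lookup from claim type to the pillars it may update. In the Python sketch below, only the sentiment row comes from the text (it feeds Specificity Quality and not Outcome Alignment). The type names, the other rows, and the function `pillars_to_update` are hypothetical placeholders, not Athena's actual taxonomy.

```python
from enum import Enum
from typing import FrozenSet


class ClaimType(Enum):
    # HYPOTHETICAL type names, loosely based on the claim kinds listed above.
    PRICE_TARGET = "price_target"
    DIRECTIONAL = "directional"
    CONDITIONAL = "conditional"
    SENTIMENT = "sentiment"


# HYPOTHETICAL routing table. Only the SENTIMENT row reflects the document;
# the remaining rows are illustrative guesses.
PILLARS_BY_TYPE = {
    ClaimType.PRICE_TARGET: frozenset({"outcome_alignment", "specificity_quality", "calibration"}),
    ClaimType.DIRECTIONAL: frozenset({"outcome_alignment", "specificity_quality", "calibration"}),
    ClaimType.CONDITIONAL: frozenset({"outcome_alignment", "specificity_quality"}),
    ClaimType.SENTIMENT: frozenset({"specificity_quality"}),
}


def pillars_to_update(claim_type: ClaimType) -> FrozenSet[str]:
    """Return the pillars a claim of this type is allowed to influence."""
    return PILLARS_BY_TYPE[claim_type]
```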

Evaluation

How Outcomes Are Evaluated

Predictions are evaluated against historical price data once their stated timeframe expires. Athena uses free public APIs (Coinbase, Binance) — no paid data vendor dependency.

Hit
100 pts

Direction correct, and either the target was hit or, where no target was specified, the direction matched and the timeframe was met.

Partial
50 pts

Direction correct but target not met, or timeframe missed but >50% progress toward target made.

Miss
0 pts

Direction incorrect, or target clearly not reached with <50% progress, or timeframe expired with negligible movement.

Open
Awaiting

Timeframe has not yet passed. Claim is on the ledger and will be evaluated when evidence becomes available.

Three accuracy dimensions scored independently:
  • Direction accuracy: Was the stated direction (up/down/neutral) correct at evaluation time?
  • Target accuracy: Was the stated price target reached? (only applies to claims with a numeric target)
  • Timeframe accuracy: Did the outcome occur within the stated timeframe? (only applies to time-specific claims)
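
The verdict rules above can be sketched as a small Python classifier. The rules as written overlap in places (for example, "direction correct but target not met" versus "target clearly not reached with <50% progress"), so this sketch makes its own assumptions explicit: progress above 50% toward the target is PARTIAL, anything else is MISS, only "up" and "down" directions are handled, and the "negligible movement" rule is omitted. The names `Claim`, `classify`, and `POINTS` are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Points per verdict, as listed in this section. OPEN claims carry no points yet.
POINTS = {"HIT": 100, "PARTIAL": 50, "MISS": 0}


@dataclass
class Claim:
    direction: str                  # "up" or "down" (neutral is not handled here)
    start_price: float              # price when the claim was made
    end_date: date                  # normalized end of the stated timeframe
    target: Optional[float] = None  # numeric price target, if any


def classify(claim: Claim, as_of: date, price_at_end: float,
             target_hit_in_window: bool) -> str:
    """Return HIT, PARTIAL, MISS, or OPEN under the assumptions stated above."""
    if claim.direction not in ("up", "down"):
        raise ValueError("this sketch only handles 'up' and 'down'")
    if as_of < claim.end_date:
        return "OPEN"  # timeframe has not passed yet
    if claim.direction == "up":
        direction_ok = price_at_end > claim.start_price
    else:
        direction_ok = price_at_end < claim.start_price
    if not direction_ok:
        return "MISS"
    if claim.target is None or target_hit_in_window:
        return "HIT"
    span = claim.target - claim.start_price
    if span == 0:
        return "HIT"  # degenerate target equal to the starting price
    # Fraction of the distance from start price to target actually covered.
    progress = (price_at_end - claim.start_price) / span
    # ASSUMPTION: exactly 50% progress falls on the MISS side.
    return "PARTIAL" if progress > 0.5 else "MISS"
```

A claim with start price 100 and target 200 that ends the window at 160 without touching the target has 60% progress and is classified PARTIAL here, worth `POINTS["PARTIAL"]` = 50.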

Under the Hood

Four agents. One specimen. End-to-end provenance.

Every credibility score is produced by a pipeline of purpose-built AI agents — each with a single responsibility, each traceable to documented evidence.

Extraction Agent · Active

Reads transcripts and source content. Identifies verifiable claims (direction, asset, conditions, timeframe) and writes them to the ledger.

IN: transcript, metadata
OUT: structured claims
claims written to ledger

Verification Agent · Active

When evidence becomes available, evaluates each claim against observable outcomes. Classifies as HIT, MISS, PARTIAL, or OPEN. Never premature.

IN: claim ledger, price history
OUT: HIT · MISS · PARTIAL · OPEN
verdicts recorded

Accountability Agent · Active

Tracks behavioral patterns over time: revisions, deletions, corrections, confidence calibration. Maintains an append-only history that cannot be rewritten.

IN: verdicts, behavior events
OUT: append-only history
behavioral signals aggregated

Explainability Agent · Active

Computes 100+ behavioral factors across 6 pillars. Produces a score that is fully traceable: every number maps back to documented evidence.

IN: all signals
OUT: 100+ factors → 6 pillars
score emitted
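
The document does not say how the Accountability Agent's append-only history is enforced. One common technique, shown here purely as an illustrative Python sketch, is a hash chain: each entry stores the hash of the previous entry, so any later rewrite becomes detectable on verification. The class name `AppendOnlyLog` and the entry layout are assumptions. Note that a hash chain makes tampering evident rather than physically impossible.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AppendOnlyLog:
    """Hash-chained event log (a sketch, not Athena's actual storage)."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(event, sort_keys=True)  # stable serialization
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev_hash, "event": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; False means some entry was altered."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev + entry["event"]).encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```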

Athena Index

0 – 100 · fully traceable · methodology-versioned


Data Pipeline

1. Collect
2. Transcribe
3. Extract
4. Evaluate
5. Score
Confidence

How Confidence Language Is Factored In

Each extracted claim is assigned a confidence level based on the language used by the entity. This feeds into the Calibration pillar — an entity who expresses certainty and misses is penalized more than one who hedges. Inflated confidence language relative to actual outcomes is a key signal in Athena’s behavioral analysis.

                   Outcome: HIT         Outcome: MISS
High confidence    Calibrated reward    Severe penalty
Low confidence     Slight bonus         Standard penalty
Calibration principle: The score rewards entities whose expressed confidence matches their historical accuracy — not those who always sound certain or always hedge. An entity with a strong track record who uses measured language scores better on Calibration than one who constantly claims certainty regardless of outcome.
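
To make the calibration principle concrete, the Python sketch below uses the Brier score, a standard forecasting metric, as a stand-in. It is not presented as Athena's formula. It assumes each claim already carries a stated probability between 0 and 1 (how confidence language maps to a number is not specified in this document) and an outcome of 1 for a hit or 0 for a miss. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Forecast = Tuple[float, int]  # (stated probability, outcome: 1 = hit, 0 = miss)


def brier_score(forecasts: List[Forecast]) -> float:
    """Mean squared gap between stated probability and outcome. Lower is better."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)


def hit_rate_by_confidence(forecasts: List[Forecast]) -> Dict[float, float]:
    """Observed hit rate for each stated confidence level.

    A well-calibrated entity has a hit rate close to the stated level,
    e.g. about 0.8 for claims made at 80% confidence.
    """
    buckets = defaultdict(list)
    for p, o in forecasts:
        buckets[p].append(o)
    return {p: sum(outcomes) / len(outcomes) for p, outcomes in sorted(buckets.items())}
```

Under this metric a miss stated at 0.99 costs about 0.98, while a miss stated at 0.6 costs 0.36, which mirrors the "severe" versus "standard" penalty cells in the table above.
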
Contestability

The Dispute Process

Scores are contestable. Any subject, viewer, or researcher can file a dispute if they believe a claim was extracted incorrectly, an outcome was evaluated against the wrong data, or a factor was computed from bad inputs.

What can be disputed

  • A prediction was extracted that was not actually a prediction
  • The evaluation used the wrong price or wrong timeframe
  • A disclosure was present but not detected
  • A claim was attributed to the wrong speaker (host vs. guest)
  • A factor was computed from a corrupted transcript

What cannot be disputed

  • The weighting of pillars (methodology decision, not a data error)
  • Scores computed from correct data, even if unfavorable
  • Hidden factor computations (fraud detection is never disclosed)
  • Historical evaluations where market data is unambiguous

Dispute timeline

  • Acknowledged within 3 business days
  • Evidence review: 5–7 business days
  • Confirmed corrections visible within 24 hours
Limitations

What Athena Does Not Claim

Not financial advice

The Athena Index does not recommend buying or selling any asset. A high score does not mean an entity's current predictions are correct.

Not a measure of intent

Athena measures behavior, not intent. An entity with low transparency may simply not have known that disclosure was expected; a low Transparency score does not necessarily indicate bad faith.

Not a complete record

Athena scores based on indexed content. Claims in private groups, newsletters, unindexed interviews, or platforms not yet covered are not included.

Not real-time

Scores are recomputed weekly. Recent behavior changes take time to be reflected. New transcripts may take days to weeks to be processed.

Not definitive at low sample sizes

Entities with fewer than 20 evaluated claims have Low or Minimal confidence ratings. Their scores are directionally informative, not statistically significant.

Not fraud detection

A low score does not mean an entity is committing fraud. Hidden factors identify manipulation signals, but Athena reports behavioral patterns — not legal violations.

Examples

How Claims Are Scored: Examples

The following examples illustrate how Athena scores specific types of claims — correctly captured hits, correctly captured misses, and the limits of vague predictions.

Correctly scored: Specific Hit
"BTC will reach $90,000 by end of Q1 2025 — I'm very confident."

Outcome: BTC closed at $92,400 on March 31, 2025.

outcome alignment: +High · specificity: +High · calibration: +Good

Specific price target + clear timeframe + stated confidence. All three accuracy dimensions verified.

Correctly scored: Specific Miss
"ETH will hit $10,000 in 2024. This is absolutely guaranteed."

Outcome: ETH peaked at $3,983 in 2024.

outcome alignment: −Miss · specificity: +High (benefits calibration) · calibration: −Severe overconfidence penalty

High specificity is noted correctly. The overconfident language ("guaranteed") multiplies the calibration penalty.

Vague claim: limited scoring
"Crypto is going to explode soon. Just trust me."

Outcome: Market rose 12% over the next 30 days.

outcome alignment: Weak (direction only, no target) · specificity: −Low · calibration: Neutral

No price target, no timeframe. Extracted as sentiment-type only. The market rise does not count as a hit — the claim is not specific enough to evaluate.

Transparency: Undisclosed promotion
"I love this exchange and recommend everyone use it." [4-minute paid segment, no disclosure]

Outcome: N/A — evaluated as behavior, not prediction.

transparency: −Undisclosed promotion risk

Athena detects the absence of disclosure language in identified sponsored segments. This reduces the Transparency pillar score regardless of whether the recommended product performed well.

Score Quality

Score Confidence Levels

Every Athena Index score is accompanied by a confidence level reflecting the size and quality of the scoring sample. Low-confidence scores are displayed with appropriate caveats.

High
50+ evaluated claims

Statistically robust — score is reliable

Medium
20–49 evaluated claims

Directionally reliable — watch for changes

Low
5–19 evaluated claims

Indicative only — treat with caution

Minimal
< 5 evaluated claims

Score is a prior estimate, not a measurement
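
The thresholds above map directly to a small lookup. The Python sketch below uses the claim-count bands exactly as listed; the function name `score_confidence` is an illustrative assumption.

```python
def score_confidence(evaluated_claims: int) -> str:
    """Map a count of evaluated claims to the confidence level listed above."""
    if evaluated_claims < 0:
        raise ValueError("claim count cannot be negative")
    if evaluated_claims >= 50:
        return "High"
    if evaluated_claims >= 20:
        return "Medium"
    if evaluated_claims >= 5:
        return "Low"
    return "Minimal"
```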

Methodology version: v1.23
Scoring model last updated: May 2026