Athena Methodology · v1.27.0

How Athena Measures Credibility

This document explains every component of the Athena Index: what is measured, how it is computed, what data is used, and — critically — what Athena does not and cannot claim. It is written for investors, journalists, subjects under review, and enterprise buyers.

6 behavioral pillars · Outcome-verified · Contestable · Explainable

Example Athena Index output

72 / 100

Good — Above average credibility

847 behavioral signals · 6 pillars evaluated · 94 claims indexed

Overview

What the Athena Index Measures

The Athena Index is a behavioral credibility score from 0–100. It measures how a public figure behaves when making verifiable claims over time — not whether they are likeable, influential, or popular.

The score is computed from six independent pillars, each representing a distinct behavioral dimension. Prediction accuracy is one input among several — the majority of the score comes from content behavior, disclosure practices, consistency, calibration, and accountability over time.

Scoring

The Six Credibility Pillars

Each pillar is computed from a weighted set of factors. Pillar scores are normalized to 0–1 before being combined into the Athena Index. Factors with insufficient data are excluded rather than penalizing subjects for missing information.
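The combination step described above can be sketched as follows. This is a minimal illustration, not Athena's actual implementation: the pillar weights are the percentages listed in this section, while the per-factor weights, the function names, and the choice to renormalize over the factors that have data are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Pillar weights as listed in this section (they sum to 1.0).
PILLAR_WEIGHTS: Dict[str, float] = {
    "outcome_alignment": 0.22,
    "calibration": 0.13,
    "consistency": 0.15,
    "specificity_quality": 0.20,
    "transparency": 0.15,
    "accountability": 0.15,
}


def pillar_score(factors: List[Tuple[float, Optional[float]]]) -> Optional[float]:
    """Weighted mean of (weight, value) factors, skipping factors with no data.

    Each value is assumed to lie in [0, 1]; None marks insufficient data.
    Renormalizing over the available factors is an assumption about how
    "excluded" factors are handled. Returns None if no factor has data.
    """
    available = [(w, v) for w, v in factors if v is not None]
    total_weight = sum(w for w, _ in available)
    if total_weight == 0:
        return None
    return sum(w * v for w, v in available) / total_weight


def athena_index(pillars: Dict[str, float]) -> float:
    """Combine normalized pillar scores (0-1) into a 0-100 index."""
    return 100 * sum(weight * pillars[name] for name, weight in PILLAR_WEIGHTS.items())
```

Under this sketch, a pillar with one missing factor is scored from its remaining factors only, so the missing input neither raises nor lowers the result.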

Outcome Alignment

22%

How often an entity's explicit predictions match the actual market outcome after the stated timeframe.

Calibration

13%

Whether expressed confidence levels match actual accuracy. A well-calibrated entity who says "80% confident" should be right ~80% of the time.

Consistency

15%

Behavioral stability over time: does the entity hold the same position before and after prices move, and avoid unexplained position reversals?

Specificity Quality

20%

How precise and falsifiable an entity's claims are. Vague sentiment scores lower than specific, verifiable claims with price targets and timeframes.

Transparency

15%

Disclosure behavior: does the entity reveal paid partnerships, portfolio positions, and potential conflicts of interest?

Accountability

15%

How the entity handles being wrong: do they acknowledge missed predictions, correct the record, or deflect and reframe?

Transparency

Factor Visibility Tiers

Athena factors are distributed across three visibility tiers. This protects both the integrity of the scoring model and the ability of subjects to understand their scores.

Public Factors

Fully disclosed in profile pages and the "Why This Score?" section. Examples: prediction accuracy, disclosure rate, citation quality.

Used in: UI, dispute process, subject appeals

Internal Factors

Used in scoring but only summarized (not enumerated) in subject-facing views. Examples: sub-components of calibration, NLP confidence signals.

Used in: Scoring engine, audit trail

Hidden Factors

Never disclosed. Used exclusively for fraud and manipulation detection. Disclosure would enable gaming.

Used in: Fraud detection only

Data Pipeline

How Predictions Are Extracted

Athena identifies verifiable claims from YouTube transcripts, video titles, and captions using a structured LLM extraction pipeline. Claims are not manually labeled.

YouTube (Captions / Audio) → Transcription (Deepgram / ASR) → Extraction (Gemini LLM) → Deduplication (Near-match filter) → Resolution (Asset registry) → Evaluation (Price data)

Extraction pipeline

1. YouTube transcript fetched via caption API where available
2. Deepgram audio transcription applied where captions are unavailable
3. Gemini extracts structured predictions from the full transcript
4. Deduplication removes near-identical claims within the same video
5. Asset symbol resolved against a known asset registry
6. Timeframe normalized to an end date for later evaluation
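Step 4 of the list above (near-match deduplication) could look roughly like this. The source does not say which similarity measure Athena uses; this sketch swaps in the standard library's `difflib.SequenceMatcher`, and the function name `dedupe_claims` and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def dedupe_claims(claims: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first occurrence of each claim and drop later near-duplicates.

    Two claims count as near-identical when their similarity ratio is at or
    above `threshold` (an assumed cutoff, not a documented Athena value).
    """
    kept: List[str] = []
    for claim in claims:
        candidate = _normalize(claim)
        is_duplicate = any(
            SequenceMatcher(None, candidate, _normalize(existing)).ratio() >= threshold
            for existing in kept
        )
        if not is_duplicate:
            kept.append(claim)
    return kept
```

The comparison is pairwise, which is adequate for the claims of a single video but would need indexing or blocking for larger batches.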

What counts as a claim

Athena extracts multiple types of verifiable claims — not just explicit price targets. Directional calls, implied positions, conviction statements, and conditional predictions are all captured and scored against outcomes where possible.

Each claim is classified by type before evaluation. Type classification affects which scoring pillars are updated — for example, a vague sentiment call contributes to Specificity Quality but not to Outcome Alignment.
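One way to picture the routing from claim type to pillars is a simple lookup table. Only two rows below are grounded in this document: the sentiment row (Specificity Quality but not Outcome Alignment) and the price-target row (the worked examples show such claims touching outcome alignment, specificity, and calibration). The type names and the directional row are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, FrozenSet


class ClaimType(Enum):
    # Hypothetical type names; the document lists claim kinds but no identifiers.
    PRICE_TARGET = "price_target"
    DIRECTIONAL = "directional"
    SENTIMENT = "sentiment"


# Which pillars a claim of each type may update.
PILLARS_UPDATED: Dict[ClaimType, FrozenSet[str]] = {
    ClaimType.PRICE_TARGET: frozenset({"outcome_alignment", "specificity_quality", "calibration"}),
    ClaimType.DIRECTIONAL: frozenset({"outcome_alignment", "specificity_quality"}),  # assumed
    ClaimType.SENTIMENT: frozenset({"specificity_quality"}),
}


def updates_pillar(claim_type: ClaimType, pillar: str) -> bool:
    """Return True if a claim of this type contributes to the given pillar."""
    return pillar in PILLARS_UPDATED[claim_type]
```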

Evaluation

How Outcomes Are Evaluated

Predictions are evaluated against historical price data once their stated timeframe expires. Athena uses free public APIs (Coinbase, Binance) — no paid data vendor dependency.

Hit

100 pts

Direction correct AND either the target was hit, or no target was specified and the direction held through the stated timeframe.

Partial

50 pts

Direction correct but target not met, or timeframe missed but >50% progress toward target made.

Miss

0 pts

Direction incorrect, or target clearly not reached with <50% progress, or timeframe expired with negligible movement.

Open

Awaiting

Timeframe has not yet passed. Claim is on the ledger and will be evaluated when evidence becomes available.
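The four verdicts above can be read as a small decision function. The sketch below is one interpretation, not the documented rule set: it treats "progress" as the fraction of the distance from the starting price to the target that was covered, assigns exactly 50% progress to Partial (the text leaves that boundary open), and does not model the "negligible movement" case for claims without a target. All names are hypothetical.

```python
from typing import Optional


def progress_toward_target(start: float, target: float, observed: float) -> float:
    """Fraction of the start-to-target distance covered by the observed price.

    Works for both upward and downward targets. Which observed price to use
    (close or extreme within the timeframe) is left to the caller.
    """
    if target == start:
        raise ValueError("target must differ from the starting price")
    return (observed - start) / (target - start)


def classify(expired: bool, direction_correct: bool, progress: Optional[float]) -> str:
    """Return OPEN, HIT, PARTIAL, or MISS for a single claim.

    `progress` is None when the claim has no numeric target.
    """
    if not expired:
        return "OPEN"      # timeframe has not passed yet
    if not direction_correct:
        return "MISS"      # wrong direction is always a miss
    if progress is None or progress >= 1.0:
        return "HIT"       # no target with correct direction, or target reached
    return "PARTIAL" if progress >= 0.5 else "MISS"
```

Mapping the verdict to points (100, 50, 0) is then a plain dictionary lookup, with Open claims left unscored.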

Three accuracy dimensions scored independently:

Direction accuracy: Was the stated direction (up/down/neutral) correct at evaluation time?
Target accuracy: Was the stated price target reached? (only applies to claims with a numeric target)
Timeframe accuracy: Did the outcome occur within the stated timeframe? (only applies to time-specific claims)
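A sketch of how the three dimensions might be computed independently, with `None` standing for "does not apply". The `Claim` fields, the 2% band used to judge a "neutral" call, and the fallback for time-specific claims without a target are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Claim:
    direction: str                    # "up", "down", or "neutral"
    start_price: float                # price when the claim was made
    target: Optional[float] = None    # numeric price target, if any
    deadline: Optional[date] = None   # end of the stated timeframe, if any


def accuracy_dimensions(
    claim: Claim,
    eval_price: float,
    target_reached_on: Optional[date] = None,
    neutral_band: float = 0.02,       # assumed tolerance for "neutral" calls
) -> Dict[str, Optional[bool]]:
    """Score direction, target, and timeframe separately."""
    change = (eval_price - claim.start_price) / claim.start_price
    if claim.direction == "up":
        direction_ok = change > 0
    elif claim.direction == "down":
        direction_ok = change < 0
    else:
        direction_ok = abs(change) <= neutral_band

    # Target accuracy only applies to claims with a numeric target.
    target_ok = None if claim.target is None else target_reached_on is not None

    # Timeframe accuracy only applies to time-specific claims.
    if claim.deadline is None:
        timeframe_ok = None
    elif claim.target is None:
        timeframe_ok = direction_ok   # assumes eval_price was taken at the deadline
    else:
        timeframe_ok = target_reached_on is not None and target_reached_on <= claim.deadline

    return {"direction": direction_ok, "target": target_ok, "timeframe": timeframe_ok}
```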

Under the Hood

Four agents. One specimen.
End-to-end provenance.

Every credibility score is produced by a pipeline of purpose-built AI agents — each with a single responsibility, each traceable to documented evidence.

Extraction Agent · Active

Reads transcripts and source content. Identifies verifiable claims (direction, asset, conditions, timeframe) and writes them to the ledger.

IN: transcript, metadata

OUT: structured claims

claims written to ledger

Verification Agent · Active

When evidence becomes available, evaluates each claim against observable outcomes. Classifies as HIT, MISS, PARTIAL, or OPEN. Never premature.

IN: claim ledger, price history

OUT: HIT · MISS · PARTIAL · OPEN

verdicts recorded

Accountability Agent · Active

Tracks behavioral patterns over time: revisions, deletions, corrections, confidence calibration. Maintains an append-only history that cannot be rewritten.

IN: verdicts, behavior events

OUT: append-only history

behavioral signals aggregated
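The document does not say how the append-only property is enforced. One common technique is a hash chain, where each entry stores the hash of the previous one so that any later rewrite becomes detectable. The class below is a minimal sketch of that technique; `AppendOnlyHistory` and its methods are hypothetical names, not Athena's API.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class AppendOnlyHistory:
    """In-memory log where each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, str]] = []

    def append(self, event: dict) -> str:
        """Add an event and return its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"event": payload, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; False means some entry was altered or reordered."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256((prev_hash + entry["event"]).encode("utf-8")).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this detects tampering but does not prevent it; durable storage and access control would still be needed in practice.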

Explainability Agent · Active

Computes 100+ behavioral factors across 6 pillars. Produces a score that is fully traceable: every number maps back to documented evidence.

IN: all signals

OUT: 100+ factors → 6 pillars

score emitted

Athena Index

0 – 100 · fully traceable · methodology-versioned

Data Pipeline

1. Collect: YouTube · X
2. Transcribe: Audio → text
3. Extract: Identify claims
4. Evaluate: Verify outcomes
5. Score: 100+ factors → Index

Confidence

How Confidence Language Is Factored In

Each extracted claim is assigned a confidence level based on the language used by the entity. This feeds into the Calibration pillar — an entity who expresses certainty and misses is penalized more than one who hedges. Inflated confidence language relative to actual outcomes is a key signal in Athena’s behavioral analysis.

High confidence, outcome HIT: Calibrated reward
High confidence, outcome MISS: Severe penalty
Low confidence, outcome HIT: Slight bonus
Low confidence, outcome MISS: Standard penalty

Calibration principle: The score rewards entities whose expressed confidence matches their historical accuracy — not those who always sound certain or always hedge. An entity with a strong track record who uses measured language scores better on Calibration than one who constantly claims certainty regardless of outcome.
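The calibration idea can be illustrated with two small functions. Neither is confirmed as Athena's method: the first uses the Brier score, a standard scoring rule, to show why a confident miss costs more than a hedged miss, and the second measures the gap between stated confidence and observed hit rate. Mapping confidence language to a probability (for example "very confident" to 0.9) is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def brier_penalty(confidence: float, was_hit: bool) -> float:
    """Squared gap between stated confidence and the outcome (lower is better)."""
    outcome = 1.0 if was_hit else 0.0
    return (confidence - outcome) ** 2


def calibration_gaps(claims: List[Tuple[float, bool]]) -> Dict[float, float]:
    """For each stated confidence level, return hit_rate minus confidence.

    A gap near zero means the entity is well calibrated at that level;
    a negative gap means overconfidence.
    """
    buckets: Dict[float, List[float]] = defaultdict(list)
    for confidence, was_hit in claims:
        buckets[confidence].append(1.0 if was_hit else 0.0)
    return {c: sum(hits) / len(hits) - c for c, hits in buckets.items()}
```

With assumed confidences of 0.9 (high) and 0.6 (low), the Brier penalties order the four cases as high-confidence hit, low-confidence hit, low-confidence miss, high-confidence miss, from best to worst.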

Contestability

The Dispute Process

Scores are contestable. Any subject, viewer, or researcher can file a dispute if they believe a claim was extracted incorrectly, an outcome was evaluated against the wrong data, or a factor was computed from bad inputs.

What can be disputed

A prediction was extracted that was not actually a prediction
The evaluation used the wrong price or wrong timeframe
A disclosure was present but not detected
A claim was attributed to the wrong speaker (host vs. guest)
A factor was computed from a corrupted transcript

What cannot be disputed

The weighting of pillars (methodology decision, not a data error)
Scores computed from correct data, even if unfavorable
Hidden factor computations (fraud detection is never disclosed)
Historical evaluations where market data is unambiguous

Acknowledged within 3 business days · Evidence review: 5–7 business days · Confirmed corrections visible within 24 hours

Limitations

What Athena Does Not Claim

Not financial advice

The Athena Index does not recommend buying or selling any asset. A high score does not mean an entity's current predictions are correct.

Not a measure of intent

Athena measures behavior, not intent. An entity with a low Transparency score may simply not have known that disclosure was expected; a low score does not by itself indicate bad faith.

Not a complete record

Athena scores based on indexed content. Claims in private groups, newsletters, unindexed interviews, or platforms not yet covered are not included.

Not real-time

Scores are recomputed weekly. Recent behavior changes take time to be reflected. New transcripts may take days to weeks to be processed.

Not definitive at low sample sizes

Entities with fewer than 20 evaluated claims have Low or Minimal confidence ratings. Their scores are directionally informative, not statistically significant.

Not fraud detection

A low score does not mean an entity is committing fraud. Hidden factors identify manipulation signals, but Athena reports behavioral patterns — not legal violations.

Examples

How Claims Are Scored: Examples

The following examples illustrate how Athena scores specific types of claims — correctly captured hits, correctly captured misses, and the limits of vague predictions.

Correctly scored: Specific Hit

"BTC will reach $90,000 by end of Q1 2025 — I'm very confident."

Outcome: BTC closed at $92,400 on March 31, 2025.

outcome alignment: +High · specificity: +High · calibration: +Good

Specific price target + clear timeframe + stated confidence. All three accuracy dimensions verified.

Correctly scored: Specific Miss

"ETH will hit $10,000 in 2024. This is absolutely guaranteed."

Outcome: ETH peaked at $3,983 in 2024.

outcome alignment: −Miss · specificity: +High (benefits calibration) · calibration: −Severe overconfidence penalty

High specificity is noted correctly. The overconfident language ("guaranteed") multiplies the calibration penalty.

Vague claim: limited scoring

"Crypto is going to explode soon. Just trust me."

Outcome: Market rose 12% over the next 30 days.

outcome alignment: Weak (direction only, no target) · specificity: −Low · calibration: Neutral

No price target, no timeframe. Extracted as sentiment-type only. The market rise does not count as a hit — the claim is not specific enough to evaluate.

Transparency: Undisclosed promotion

"I love this exchange and recommend everyone use it." [4-minute paid segment, no disclosure]

Outcome: N/A — evaluated as behavior, not prediction.

transparency: −Undisclosed promotion risk

Athena detects the absence of disclosure language in identified sponsored segments. This reduces the Transparency pillar score regardless of whether the recommended product performed well.
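As a rough illustration of checking a segment for disclosure language, a keyword pattern is enough to show the idea. The phrase list and function name are hypothetical, how sponsored segments are identified in the first place is out of scope here, and a production system would likely rely on more than a regular expression.

```python
import re

# Assumed disclosure phrases; not an official or exhaustive list.
DISCLOSURE_PATTERN = re.compile(
    r"\b(sponsor(ed|ship)?|paid (partnership|promotion)|affiliate link)\b",
    re.IGNORECASE,
)


def has_disclosure(segment_text: str) -> bool:
    """Return True if the segment contains any assumed disclosure phrase."""
    return DISCLOSURE_PATTERN.search(segment_text) is not None
```

Applied to the quoted segment above, this check finds no disclosure phrase, which is the condition the example describes.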

Score Quality

Score Confidence Levels

Every Athena Index score is accompanied by a confidence level reflecting the size and quality of the scoring sample. Low-confidence scores are displayed with appropriate caveats.

High

50+ evaluated claims

Statistically robust — score is reliable

Medium

20–49 evaluated claims

Directionally reliable — watch for changes

Low

5–19 evaluated claims

Indicative only — treat with caution

Minimal

< 5 evaluated claims

Score is a prior estimate, not a measurement
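The thresholds above translate directly into a lookup. The function name is hypothetical; the cutoffs are the ones listed in this section.

```python
def score_confidence(evaluated_claims: int) -> str:
    """Map the number of evaluated claims to a confidence level."""
    if evaluated_claims < 0:
        raise ValueError("evaluated_claims cannot be negative")
    if evaluated_claims >= 50:
        return "High"
    if evaluated_claims >= 20:
        return "Medium"
    if evaluated_claims >= 5:
        return "Low"
    return "Minimal"
```

For the example output at the top of this document, the 94 indexed claims would map to High only if at least 50 of them have already been evaluated, since Open claims do not count toward the sample.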

Methodology version: v1.27.0

Scoring model last updated: July 2026
