description: Evaluates text, articles, claims, or content for factual accuracy against the current state of the world in 2026. Use this skill whenever the user asks to evaluate, fact-check, assess accuracy, or review content for what it gets right and what needs improvement — especially when they ask what's accurate "in 2026", "currently", or "today". Also use when asked to evaluate a critique, review, or evaluation of another piece of content (meta-evaluation mode). Always use this skill when the prompt includes phrases like "evaluate this", "what does it get right", "what needs improvement to be accurate", "fact-check this", "is this still true", or "use your skills evaluator". Produces a structured, consistent report with labeled sections so outputs can be compared across multiple AI services.
---
# Accuracy Evaluator
Evaluates content for factual accuracy in 2026, producing a structured report that clearly separates what is correct, what is outdated, what is missing, and what cannot be verified — and why.
---
## Core Objective
Produce a structured, honest accuracy report. The output format is intentionally consistent so it can be compared side-by-side with reports from other AI services.
---
## Pre-Flight: Determine Evaluation Mode
Before starting, determine what kind of content was submitted:
**Standard Mode** — The content makes claims about the world (an article, tutorial, guide, argument, description). Evaluate the claims directly.
**Meta-Evaluation Mode** — The content is itself a critique, review, evaluation, or analysis of another piece of content. In this case:
1. Evaluate the **critique's own claims** using the standard process below (not just the underlying content it references).
2. Check whether the critique's praise and objections are accurate, well-supported, and complete.
3. Note what the critique missed, overclaimed, or got wrong.
4. If both the source content and the critique are available, evaluate both documents in sequence, labeling each clearly.
Do not skip meta-evaluation mode. If someone submits a document that says "here is what the AI got right and wrong," evaluate *that document* as rigorously as you would evaluate the original.
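A minimal sketch of the mode decision above, written in Python. The names `EvaluationMode` and `plan_reports`, the label strings, and the choice to evaluate the source before the critique are all illustrative assumptions; the skill itself only requires that both documents are evaluated in sequence and labeled clearly.

```python
from enum import Enum
from typing import List


class EvaluationMode(Enum):
    STANDARD = "standard"
    META = "meta-evaluation"


def plan_reports(mode: EvaluationMode, source_available: bool = False) -> List[str]:
    """Return the labels of the report blocks to produce, in order.

    Hypothetical helper: the ordering (source first, critique second)
    is an assumption, not something the skill prescribes.
    """
    if mode is EvaluationMode.STANDARD:
        return ["Submitted content"]
    labels = []
    if source_available:
        labels.append("Source content")
    # In meta-evaluation mode the critique itself is always evaluated.
    labels.append("Critique")
    return labels
```

For example, `plan_reports(EvaluationMode.META, source_available=True)` yields two labeled report blocks, one per document.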
---
## Evaluation Process
### Pre-Step: Resolve Knowledge Gaps with Web Search
Before evaluating, identify any claims involving **named products, tools, models, APIs, companies, or technologies** that may have emerged or changed after August 2025 — Claude's knowledge cutoff. For any such claim where currency matters and uncertainty exists, **run a web search before rendering a verdict**. Do not guess or extrapolate from pre-cutoff knowledge when current data is retrievable. This applies especially to AI model comparisons, software version claims, company status, product availability, and pricing.
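The search trigger described above can be read as three conditions that must all hold. The sketch below encodes that reading; the `Claim` fields and the function name `needs_web_search` are hypothetical and exist only to make the rule explicit.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    names_specific_entity: bool  # product, tool, model, API, company, or technology
    currency_matters: bool       # a post-cutoff change would alter the verdict
    uncertain: bool              # pre-cutoff knowledge does not settle the question


def needs_web_search(claim: Claim) -> bool:
    """Search before rendering a verdict only when all three conditions hold."""
    return claim.names_specific_entity and claim.currency_matters and claim.uncertain
```

A claim about the newest release of a named model would typically satisfy all three conditions, while a claim about a stable physical constant would satisfy none.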
### Step 1: Identify Claim Types
Before evaluating, classify the claims in the content:
- **Fast-changing facts** — AI capabilities, software versions, company status, market conditions, geopolitics, regulations, prices, personnel (high staleness risk)
- **Opinions or predictions** — assess whether they were reasonable given what was known and whether they have proven out
- **Procedural/how-to claims** — may be outdated if tools or APIs have changed
- **Domain best-practice claims** — recommendations about how something *should* be done; flag these for the omission check in Step 2b
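One way to represent the claim types listed above is a small enumeration, with a helper that signals when the omission check in Step 2b applies. The identifiers `ClaimType` and `needs_omission_check` are illustrative assumptions, not part of the skill.

```python
from enum import Enum
from typing import Iterable


class ClaimType(Enum):
    FAST_CHANGING_FACT = "fast-changing fact"        # high staleness risk
    OPINION_OR_PREDICTION = "opinion or prediction"
    PROCEDURAL = "procedural/how-to"                 # may be outdated if tools changed
    DOMAIN_BEST_PRACTICE = "domain best practice"    # flagged for the omission check


def needs_omission_check(claim_types: Iterable[ClaimType]) -> bool:
    """True when any classified claim is a domain best-practice claim."""
    return ClaimType.DOMAIN_BEST_PRACTICE in set(claim_types)
```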
### Step 2a: Evaluate Each Significant Claim
For each meaningful claim, assess:
1. Is it factually correct as of early 2026?
2. Was it correct when written but has since become outdated?
3. Is it misleading in framing even if technically accurate?
4. Is it unverifiable from available knowledge?
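The four questions above can be folded into a single per-claim outcome. The sketch below shows one possible mapping; the `Assessment` fields, the order in which the questions are applied, and the outcome strings are assumptions made for illustration rather than labels the skill defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    correct_now: Optional[bool]                  # None: unverifiable from available knowledge
    correct_when_written: Optional[bool] = None  # None: timing could not be determined
    misleading_framing: bool = False


def classify(a: Assessment) -> str:
    """Map the four answers to a hypothetical outcome label."""
    if a.correct_now is None:
        return "cannot verify"
    if a.correct_now:
        if a.misleading_framing:
            return "needs improvement: misleading framing"
        return "accurate"
    if a.correct_when_written is None:
        return "needs improvement: timing unknown"
    if a.correct_when_written:
        return "needs improvement: has become wrong since"
    return "needs improvement: wrong when written"
```

Checking verifiability first keeps an unverifiable claim from being reported as an error.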
### Step 2b: Check for Significant Omissions (Domain Best-Practice Gap Check)
This step is separate from staleness checking. Ask:
**"Are there well-established best practices, standard approaches, or critical warnings for this domain that the content fails to mention, and whose absence would lead a reader toward a worse outcome?"**
This catches errors of omission that are not staleness issues — they were gaps when the content was written too. Apply this check especially to:
- Technical tutorials and how-to guides
- Architectural or design recommendations
- Safety-critical or high-stakes domains
- Content that presents a workflow as complete when it omits a foundational step
For each significant omission, ask: Would a practitioner in this field consider the omission a meaningful gap? If yes, it belongs in the report.
Always structure your response exactly as follows. Use these exact section headers so outputs are comparable across AI services. If evaluating multiple documents (e.g., meta-evaluation mode), repeat the full report block for each document, clearly labeled.
---
### ACCURACY EVALUATION REPORT
**Content evaluated:** [1-sentence description of what was submitted]
List claims that are accurate as of 2026. For each, briefly explain why it holds up. Include confidence level.
---
#### WHAT NEEDS IMPROVEMENT
List claims that are inaccurate, outdated, or misleading as of 2026. For each:
- State the original claim
- State what is accurate in 2026
- Note whether this was **wrong when written** or **has become wrong since**
- Include confidence level
---
#### SIGNIFICANT OMISSIONS
List well-established best practices, standard approaches, or critical warnings that are absent from the content and whose absence would lead a reader toward a worse outcome. For each:
- Describe what is missing
- Explain why it matters (what goes wrong without it)
- Note whether this is a **domain best-practice gap** (was missing when written) or a **temporal gap** (the practice became standard later)
- Include confidence level
If nothing significant is missing, state: "None identified."
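As an illustration of the section rules above, the following sketch renders the omissions section, including the required fallback text when nothing is missing. The `Omission` dataclass, the single-line bullet layout, and the confidence scale are assumptions; only the header and the "None identified." fallback come from the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Omission:
    missing: str         # what is absent from the content
    why_it_matters: str  # what goes wrong without it
    gap_type: str        # "domain best-practice gap" or "temporal gap"
    confidence: str      # e.g. "high", "medium", "low" (scale is an assumption)


def render_omissions(omissions: List[Omission]) -> str:
    """Render the SIGNIFICANT OMISSIONS section as Markdown text."""
    lines = ["#### SIGNIFICANT OMISSIONS"]
    if not omissions:
        lines.append("None identified.")
    for o in omissions:
        lines.append(
            f"- {o.missing}: {o.why_it_matters} "
            f"({o.gap_type}; confidence: {o.confidence})"
        )
    return "\n".join(lines)
```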
---
#### WHAT HAS CHANGED SINCE THIS WAS WRITTEN
(Skip this section if the content appears to be recent or if no temporal shift is detectable.)
Summarize developments that have materially changed the picture — things the author couldn't have known or that have evolved. This is distinct from errors; these are legitimate updates.
---
#### CANNOT VERIFY
List claims you cannot assess with confidence. Be specific about why — knowledge cutoff, domain limits, missing context, or rapidly changing situation.
Split into two sub-categories:
**Missing Context** — Claims that cannot be assessed because required context is absent from the submitted material (e.g., references to images, prior conversations, proprietary data, or named individuals not described). For each, specify what context is needed.
**Knowledge Limit** — Claims that cannot be assessed due to Claude's knowledge cutoff, domain depth limits, or a rapidly changing situation that search could not resolve. For each, specify why the limit applies and suggest how the reader could verify independently.
If neither category has entries, state: "None identified."
---
#### OVERALL ASSESSMENT
One paragraph. Summarize the overall accuracy quality of the content, note the most significant issues, and give a directional rating:
- **Strong** — mostly accurate, minor gaps or omissions
- **Mixed** — significant accurate content alongside notable errors, outdated material, or important omissions
- **Weak** — substantial inaccuracies, outdated framing, or missing foundations that undermine the content's usefulness
- **Indeterminate** — cannot assess meaningfully without more context
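The rating itself is a judgment call, but a rough heuristic can make the four levels concrete. In the sketch below, the function name `directional_rating` and the thresholds (15% and 50% problem share, and the rule for Indeterminate) are purely hypothetical and are not defined by the skill; an evaluator should override them whenever a single severe error outweighs the counts.

```python
from enum import Enum


class Rating(Enum):
    STRONG = "Strong"
    MIXED = "Mixed"
    WEAK = "Weak"
    INDETERMINATE = "Indeterminate"


def directional_rating(accurate: int, problems: int, unverifiable: int) -> Rating:
    """Suggest a rating from claim counts.

    `problems` counts inaccurate, outdated, or misleading claims plus
    significant omissions. All thresholds are illustrative assumptions.
    """
    assessed = accurate + problems
    if assessed == 0 or unverifiable > assessed:
        return Rating.INDETERMINATE
    share = problems / assessed
    if share <= 0.15:
        return Rating.STRONG
    if share <= 0.5:
        return Rating.MIXED
    return Rating.WEAK
```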
---
## Domain-Specific Accuracy Guidance for 2026
Apply heightened scrutiny in these fast-moving areas:
**Artificial Intelligence**
- Model capability claims shift rapidly; GPT-4-era assumptions may be obsolete
- Benchmark comparisons from 2023-2024 are often outdated
- Agent, reasoning, and multimodal capabilities have advanced significantly
- Company positions (OpenAI, Anthropic, Google DeepMind, Meta AI, Mistral) have all shifted
- Regulatory landscape (EU AI Act, US executive orders) has evolved
**Technology & Software**
- Version numbers, API structures, and recommended practices change frequently