# The retrieval and grounding test: what each system must answer

This is the question set for outside retrieval/research systems being tested against this dataset. It is a test of **retrieval accuracy and hallucination containment**, not scoring: Yardstick's scoring methodology is proprietary and deliberately excluded. Your system answers factual questions and proves every answer with an exact quote.

## The input: name and description only

For each test company, the system under test receives exactly two fields: the **company name** and a **one-line description** of what the company does. No website, no category metadata, no hints. Identifying the right company is part of the retrieval test.

## The ten questions, per company

1. What year was the company founded?
2. Where is the company's headquarters?
3. What is the company's funding or valuation, and who are its named investors? Cite at least one source independent of the company.
4. What is the company's approximate headcount?
5. Who currently leads the company (CEO plus key executives)? Use sources no older than 12 months.
6. What is the most significant company news item from the past 12 months?
7. What are the company's exact published pricing tiers? If pricing is not published, answer "quote-only" and support it with a web-archive check showing whether pricing was published in the past.
8. Which native integrations does the company name on its own integrations or partners page?
9. Which named customers does the company claim? Cross-check at least one with a source independent of the company.
10. Which security or compliance certifications and attestations does the company document (for example SOC 2, ISO 27001, HIPAA)?

## The output contract (every answer, no exceptions)

For each question, return four fields:

1. **Summary**: one or two sentences answering the question, from your own research.
2. **Supporting quote**: a verbatim sentence from a real source that backs the answer. Copied exactly. Never paraphrased.
3. **Source URL**: the live URL where that quote appears.
4. **Provenance tag**: `VENDOR-CLAIMED` if the source is the company's own site or press release; `THIRD-PARTY` if independent (news, analyst, regulator, review); `ESTIMATED` if the value is your own inference.

If no public source exists for a question, leave the quote and URL blank and write "no public source found" in the summary. That is a correct answer. An invented quote, URL, integration, price, or ownership detail is a hallucination.

## How answers are graded

Grading is deterministic, not model-judged. Every returned quote is string-matched at its cited URL and compared against a fact-checked answer key of validated quotes and URLs that Yardstick already holds for these companies. A quote that is missing, altered, or paraphrased counts as a hallucination. A provenance tag that does not match the source tier (for example, a company's own page tagged `THIRD-PARTY`) counts as a mistag. Your hallucination rate is hallucinations divided by total claims.

## Why these companies

The test set is deliberately hard: thin-evidence companies where retrieval systems are most tempted to fill gaps by inventing. Rates measured here are an upper bound; typical companies run cleaner.
