Methodology

How we build a cohort. The same way, every time.

Yardstick is an independent software research and consulting agency. We publish guides cohort by cohort on the protocol below.

2-min walkthrough / cohort-agnostic

How Yardstick scores any cohort

The three principles, in detail

The video above covers the protocol at a glance. The sections below add the auditable detail: what we explicitly do not do, what is auditable versus editorial, and the in-scope / out-of-scope lines for vendor right-of-reply.

01

Same evidence stack, every vendor in the cohort.

Every claim is labelled. Every cell on every tear-sheet carries one of six provenance labels so the buyer can see exactly where each fact came from:
- MEASURED. Observed first-hand on the free tier or trial seat, or graded by us against a sample prospect (e.g., personalization output quality, setup-time stopwatch on the free tier, UI heuristics from the trial seat).
- ESTIMATED. Derived by us from the vendor's own pricing page and feature limits (e.g., cost-per-seat efficiency).
- VENDOR-CLAIMED. Sourced from the vendor's own materials (docs, pricing page, feature pages, vendor-published benchmarks). Linked to the source.
- THIRD-PARTY. Sourced from a third-party (G2, Capterra, customer case study, practitioner discussion, news). Linked to the source.
- VIDEO-OBSERVED. Observed by us in a recorded vendor demo or walkthrough when no free tier or trial seat is available.
- UNKNOWN. The vendor does not disclose, and we could not verify externally. Recorded as a gap rather than guessed at.
Auditable vs. editorial. The evidence-intake template, the rubric weights, and the per-tear-sheet evidence log (every claim, its label, and the URL or screenshot it traces to) are auditable. They live in the published Yardstick Report appendix. The "best for" call at the end of each tear-sheet is editorial judgement, sourced from the scored data, and labelled as such.
02

Score on observable outcomes, with weights we publish.

One rubric, every vendor. Every vendor in the Yardstick catalog is scored on the same six dimensions with the same weights, regardless of cohort. A buyer comparing an AI SDR to a call-center agent to a clinical-documentation tool sees the same yardstick under each. The dimensions and weights:
- Data integration & accuracy: 25%. Does the tool connect cleanly to the buyer's stack, and are its outputs factually correct on the buyer's data? The frontier requirement: an AI tool that can't be trusted with the buyer's data, or can't produce accurate output once it has it, fails on this dimension no matter what else it does.
- Core output quality: 25%. How well the tool performs its primary job, graded against the cohort's standard task. Sales cohorts grade against a sample prospect's personalization output; CX cohorts grade against a sample ticket resolution; call-center cohorts grade against a sample transcript and agent-assist suggestion; clinical cohorts grade against a sample encounter note. The grading rubric is universal; the standard task is cohort-specific.
- Cost efficiency: 15%. $/output-unit at the buyer's expected scale, including setup, integrations, and the licensing tier the buyer actually needs. Calculated from the vendor's published pricing under documented assumptions; formula in the appendix.
- Trust & security posture: 15%. SOC 2 Type II, ISO 27001, ISO/IEC 42001, HIPAA/PCI/FedRAMP where applicable, data-residency options, model-training opt-out, breach history. Scored from the vendor's trust page plus independent audit-report verification.
- Time-to-value: 10%. Days from purchase to first useful production output. Stopwatched on the free tier or trial seat where available; estimated from documentation plus practitioner reports otherwise.
- Operator usability: 10%. UI heuristics, error states, controllability for the day-to-day operator. Scored by two graders on the trial seat against the same usability rubric every other vendor sees.
Auditable vs. editorial. Every raw input is in the appendix with its evidence label. The weights are public. The rubric anchors that turn raw observations into a 0-4 dimension score are published. What is editorial: the choice of which dimensions to include and at what weight. We invite arguments on that choice. See the section at the bottom of this page.
03

Publish. Vendors check facts. Affiliate links disclosed.

What's in scope. Factual errors only. If we listed a HubSpot integration as native and it's actually via Zapier, we fix it. If we listed a price as $400 a seat and it's $450 a seat, we fix it. If we cited a vendor benchmark and the vendor has a more recent published figure, we update the citation.

What's out of scope. Rankings. Rubric weights. Editorial language. The decision to include or exclude the vendor. The decision to mark a dimension as out-of-scope on the rubric.

Affiliate disclosure. Where the guide or this site links to a vendor's product, that link may earn Yardstick a commission. Disclosed on every page where the link appears. The commission rate is not a factor in the score, the ranking, or the cohort.

The evaluation protocol. What each vendor walks through

Every vendor in the cohort goes through the same five-step protocol. The protocol is fixed before any vendor is loaded into the workbook, and it does not change once evaluation opens. Each step writes to the per-vendor evidence log that ships in the Yardstick Report appendix.

A

Public-information sweep

Vendor docs, pricing page, feature pages, integrations page, security/trust page, recent blog posts, recent funding news, and the vendor's own published benchmarks are pulled into the workbook. Every URL is captured. Every claim sourced from this step is labelled VENDOR-CLAIMED on the tear-sheet, with the source URL linked.
B

Free-tier or trial-seat hands-on

Where the vendor offers a free tier or a self-serve trial seat, we sign up under standard terms and exercise the product against the same sample prospect every other vendor sees. Setup time is stopwatched; UI heuristics graded; email reputation and integration screens screenshotted; data accuracy sampled where the tier exposes enrichment. Outputs from this step are labelled MEASURED. Where no free tier or trial seat exists, we say so on the tear-sheet and fall back to step C for the affected dimensions.
C

Third-party evidence

G2, Capterra, public customer case studies, recent video walkthroughs, and practitioner discussion (LinkedIn, Reddit, Pavilion). Used to triangulate setup time, UI heuristics, and integration depth on dimensions we cannot exercise on a free tier. Outputs from this step are labelled THIRD-PARTY with the source URL on the tear-sheet. Where the only evidence is a recorded vendor demo or walkthrough, the label is VIDEO-OBSERVED. Where the vendor does not disclose and we cannot verify externally, the cell is labelled UNKNOWN rather than guessed at.
D

Sample-prospect output grading

A single sample prospect, full ICP profile, public LinkedIn, public company page, is loaded into each tool that can produce outbound copy. Two graders score the output blind against the personalization rubric. Output and grade live in the appendix. Outputs from this step are labelled MEASURED.
E

Cost-per-seat estimation

From the vendor's published pricing and feature limits, a $/month-per-output-unit figure is calculated under documented assumptions. The formula and assumptions are printed in the appendix. Outputs from this step are labelled ESTIMATED. Where no list pricing is published, the dimension is scored "not disclosed" and the tear-sheet says so.

Cohorts we publish

Each cohort's rubric is published on its own page. The protocol on this page is the same across all of them.

AI sales tools: outbound sequencers, AI SDR agents, conversation intelligence, embedded CRM AI.
Open the AI sales cohort
AI customer experience tools: agentic CX, AI-first CRM, ticketing-native AI agents, FinServ digital CX.
Open the AI CX cohort
AI call-center tools: CCaaS, AI agent-assist, post-call analytics, voicebot containment.
Open the AI call-center cohort
AI recruiting tools: talent intelligence, conversational hourly hiring, video-interview AI, mid-market ATS+CRM.
Open the AI recruiting cohort
AI BD & partnerships tools: ecosystem-led growth, account mapping, co-sell automation, partner-relationship management.
Open the AI BD cohort
AI customer success tools: health-score signals, churn prediction, agentic CSMs, retention and expansion automation.
Open the AI customer success cohort
AI tax & compliance tools: sales-tax engines, tax-prep workflows, accounting AI.
Open the AI tax cohort
AI ERP / operations tools: Microsoft, SAP, Oracle, NetSuite copilots and agentic ERP.
Open the AI ERP cohort
AI social-media-marketing tools: content generation, publishing, listening, brand governance.
Open the AI social cohort
More cohorts in flight. Got a vendor we should score next?
Submit a vendor

Vendor right-of-reply

A factual claim should survive contact with the vendor's own team. A ranking is editorial and is not subject to appeal. The right-of-reply protocol enforces both.

How it works. Seven calendar days before a guide is published, every vendor in the cohort receives their scored tear-sheet by email. They have those seven days to flag factual errors. If they reply with corrections, we verify the correction against our raw data and amend the tear-sheet if the correction is upheld. If they don't reply, the tear-sheet publishes as-is.

In scope. Pricing tier listed incorrectly. Feature misquoted. Integration listed as native that's actually via a third party (or vice versa). Customer count off. Founding year off. ICP mis-stated.

Out of scope. The score. The ranking. The rubric. The weights. The decision to mark a dimension as out-of-scope on the rubric. The editorial language in the tear-sheet. The decision to include or exclude the vendor from the cohort.

The full right-of-reply email template is in the brand kit and reproduced in every Yardstick Report appendix.

Conflicts of interest

Every revenue line that touches a vendor is disclosed here, on every relevant page, and in the Yardstick Report footer.

Affiliate links. Where the guide or this site links to a vendor's product, that link may earn Yardstick a commission. The presence or rate of an affiliate program does not affect the score, the ranking, or whether the vendor is included in the cohort. A vendor without an affiliate program is treated identically to one with the largest commission in the category.

Paid placements. None. We do not accept sponsored content, paid placements, vendor research contracts, or "thought leadership" trades.

Vendor advisory roles. None. Yardstick staff do not hold paid advisory positions, formal or informal, at any vendor in any cohort we cover.

Equity. Yardstick holds no equity in any vendor in any cohort we cover. If that ever changes, the holding is disclosed in the affected tear-sheet, in the appendix, and in the footer of every page where the vendor appears.

Argue with our methodology

Find a flaw. Tell us.

Every methodology is wrong somewhere. The way it stops being wrong is by surviving public critique from people who know more than we do about a corner of it. If you spot a flaw, a weight that's miscalibrated, a dimension we should add or drop, an inclusion criterion that's screening out the wrong vendors, a baseline number we should update, write to us.

hello@yardstickresearch.app

Email a critique

Or read more about who we are / take the audit

Take the free 4-minute readiness audit.

Get your score, peer benchmarks, and three tailored vendor recommendations. No email required to see your results.

Take the audit

How we build a cohort. The same way, every time.

How Yardstick scores any cohort

The three principles, in detail

Same evidence stack, every vendor in the cohort.

Score on observable outcomes, with weights we publish.

Publish. Vendors check facts. Affiliate links disclosed.

The evaluation protocol. What each vendor walks through

Public-information sweep

Free-tier or trial-seat hands-on

Third-party evidence

Sample-prospect output grading

Cost-per-seat estimation

Cohorts we publish

Vendor right-of-reply

Conflicts of interest

Find a flaw. Tell us.

Take the free 4-minute readiness audit.