Yardstick Research tear-sheet / healthcare-clinical cohort

Methodology · how we score · rubric weights in plain sight · vendors received this sheet seven days before publication and could flag factual errors, never rankings

OpenEvidence

Identity

Total score: 76.25 / 100

Headline numbers

Metric | Value | Evidence
Largest single-day scale signal | 1 million clinical consultations between NPI-verified physicians and the AI system on March 10, 2026 | [VENDOR-CLAIMED + THIRD-PARTY - https://www.prnewswire.com/news-releases/openevidence-achieves-historic-milestone-1-million-clinical-consultations-between-verified-doctors-and-an-artificial-intelligence-system-in-a-single-day-302712459.html]
Monthly consultation volume | 18 million in December 2025 (vs ~3M monthly a year prior) | [THIRD-PARTY - https://news.crunchbase.com/venture/openevidence-ai-doctors-doubles-valuation-seriesd/]
Physician reach | Used daily by 40-65% of US physicians depending on source; "majority of all practicing physicians" per vendor; 760K registered NPIs per Wikipedia | [VENDOR-CLAIMED + THIRD-PARTY - https://en.wikipedia.org/wiki/OpenEvidence, https://ai2.work/blog/openevidence-quietly-becomes-the-ai-tool-65-of-us-doctors-use]
Hospital footprint | 10,000+ hospitals and medical centers reached | [VENDOR-CLAIMED + THIRD-PARTY - https://research.contrary.com/company/openevidence, https://news.crunchbase.com/venture/openevidence-ai-doctors-doubles-valuation-seriesd/]
Total funding | ~$700M across Series A through Series D | [THIRD-PARTY - https://techcrunch.com/2026/01/21/openevidence-hits-12b-valuation-with-new-round-led-by-thrive-dst/]
Most recent valuation | $12B Series D post-money, January 21, 2026 | [THIRD-PARTY - https://techcrunch.com/2026/01/21/openevidence-hits-12b-valuation-with-new-round-led-by-thrive-dst/]
2025 revenue | $150M (1,803% YoY growth from $7.9M in 2024); 90% gross margin per Sacra | [THIRD-PARTY - https://sacra.com/c/openevidence/]
Consumer-tier ARPU | ~$124 per verified user per year, ad-supported | [THIRD-PARTY - https://sacra.com/c/openevidence/]
USMLE benchmark | 100% on August 2025 administration (vendor-claimed first AI to score perfect) | [VENDOR-CLAIMED - https://www.prnewswire.com/news-releases/openevidence-creates-the-first-ai-in-history-to-score-a-perfect-100-on-the-united-states-medical-licensing-examination-usmle-302531156.html]
Independent subspecialty benchmark | 34% (Quick Consult) / 41% (DeepConsult) on 100 MedXpertQA scenarios; 77% / 72% repeatability; preprint not peer-reviewed | [THIRD-PARTY contradicting VENDOR-CLAIMED on standardized exams - https://www.medrxiv.org/content/10.64898/2025.11.29.25341091v1]
Named Epic-embedded enterprise deployments | 2: Sutter Health (Feb 11, 2026) + Mount Sinai (April 2026, 7 hospitals, extended to nurses + pharmacists) | [THIRD-PARTY - https://www.businesswire.com/news/home/20260211318919/en/Sutter-Health-Collaborates-with-OpenEvidence-to-Bring-Evidence-Based-AI-Powered-Insights-into-Physician-Workflows, https://www.mountsinai.org/about/newsroom/2026/mount-sinai-health-system-collaborates-with-openevidence-to-provide-evidence-based-knowledge-within-electronic-medical-record]
Content partnerships | NEJM Group (1990-present), JAMA Network (12 journals), NCCN, AMA, AAFP, ACEP, Cochrane Library, PubMed, FDA, CDC | [VENDOR-CLAIMED - https://www.openevidence.com/announcements/openevidence-and-nejm, https://www.fiercehealthcare.com/ai-and-machine-learning/jama-signs-multi-year-deal-openevidence-inform-ai-powered-medical-search]
Free tier? | Yes - full product surface (DeepConsult + Visits + Coding Intelligence + Dialer + CME) free to every verified US NPI | [VENDOR-CLAIMED - https://www.openevidence.com/announcements/visits-real-time-medical-intelligence]
HIPAA compliance | Achieved April 2025; BAA available | [VENDOR-CLAIMED + THIRD-PARTY - https://www.accountablehq.com/post/is-open-evidence-hipaa-compliant-everything-you-need-to-know]
SOC 2 Type II | Yes (Security trust services category); audit date undisclosed | [VENDOR-CLAIMED - https://www.accountablehq.com/post/is-open-evidence-hipaa-compliant-everything-you-need-to-know]
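
The growth figures in the headline numbers can be sanity-checked with a few lines of arithmetic. This is a minimal Python sketch; the function names are illustrative, and the inputs are the rounded figures quoted above ($7.9M and $150M revenue, ~3M and 18M monthly consultations). On those rounded inputs revenue growth comes out near 1,799%, slightly under the 1,803% Sacra reports. The gap is presumably input rounding, which is an assumption and not something the source states.

```python
def yoy_growth_pct(prior: float, current: float) -> float:
    """Year-over-year growth as a percentage of the prior-period value."""
    return (current - prior) / prior * 100


def growth_multiple(prior: float, current: float) -> float:
    """How many times larger the current value is than the prior value."""
    return current / prior


# Rounded inputs as quoted in the table (USD millions; consultations in millions).
revenue_growth = yoy_growth_pct(prior=7.9, current=150.0)
consult_multiple = growth_multiple(prior=3.0, current=18.0)

if __name__ == "__main__":
    print(f"Revenue YoY growth on rounded inputs: {revenue_growth:.0f}%")
    print(f"Monthly consultation multiple: {consult_multiple:.1f}x")
```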

Dimension scores

Dimension | Score | Weight | Weighted | Evidence
Clinical accuracy + safety | 4/4 | 20 | 20.00 | [VENDOR-CLAIMED + THIRD-PARTY citation-provenance corroboration + THIRD-PARTY subspecialty caveat] First AI to score 100% on a USMLE administration (Aug 15, 2025), inline citation surface to NEJM / JAMA / NCCN / Cochrane / AMA-licensed content described as "currently the strongest citation provenance surface in the physician-AI category" by independent iatroX 2026 landscape, vendor "no-hallucination policy: only answer when evidence exists." Pre-existing D1 score of 4/4 reflects citation depth + content-license breadth + USMLE result. Counterpoint that limits anchor-4 confidence: the first independent peer-review-tracked subspecialty benchmark (medRxiv Nov 2025, 100 MedXpertQA scenarios) scored Quick Consult at 34% and DeepConsult at 41%, placing OpenEvidence mid-pack against the 14-46% range observed across 11 LLMs - a meaningful gap from the standardized-exam 100% number. Source-recency case documented (ME/CFS GET recommendation based on guidance NIH reversed in 2022) shows the licensed-content moat is simultaneously a freeze-the-evidence-at-license-date risk. Score holds at 4 because the cohort rubric weights citation provenance + content-license breadth + structured-exam accuracy heavily, but the subspecialty + source-recency gaps belong in the right-of-reply queue. (openevidence.md §"Clinical accuracy + safety") - https://www.prnewswire.com/news-releases/openevidence-creates-the-first-ai-in-history-to-score-a-perfect-100-on-the-united-states-medical-licensing-examination-usmle-302531156.html, https://www.medrxiv.org/content/10.64898/2025.11.29.25341091v1, https://www.iatrox.com/blog/clinical-ai-landscape-2026-chatgpt-openevidence-iatrox-medwise
EHR integration depth | 2/4 | 15 | 7.50 | [THIRD-PARTY + UNKNOWN gaps] Two named Epic-embedded enterprise deployments at research date: Sutter Health (Feb 11, 2026, FHIR-based architecture, evidence search inside Epic chart) and Mount Sinai Health System (April 2026, enterprise-wide across seven hospitals, extended to nurses + pharmacists - first deal to cover the full clinical care team). Pre-existing D1 score of 2/4 (50%) is the honest placement vs cohort leaders. Gaps: no Partners and Pals tier confirmed (Abridge holds first-Pal status; OpenEvidence does not), no Epic Showroom listing surfaced, no Cerner / Oracle Health CODE program participation, no Meditech / Athenahealth / eClinicalWorks marketplace listings, no published FHIR R4 conformance statement, no public developer surface. Primary product surface remains standalone web + mobile app for verified clinicians; Epic embed is the emerging surface, not the default. Sits at anchor 2 ("production integration with one major EHR + bidirectional surface in some form via SMART on FHIR"); anchor 3-4 blocked by Epic-only depth + missing multi-EHR breadth + missing developer surface. (openevidence.md §"EHR integration depth") - https://www.businesswire.com/news/home/20260211318919/en/Sutter-Health-Collaborates-with-OpenEvidence-to-Bring-Evidence-Based-AI-Powered-Insights-into-Physician-Workflows, https://www.mountsinai.org/about/newsroom/2026/mount-sinai-health-system-collaborates-with-openevidence-to-provide-evidence-based-knowledge-within-electronic-medical-record, https://www.iatrox.com/blog/epic-ai-charting-vs-openevidence-visits-ehr-vs-evidence-engine-workflow-slot-2026
Workflow fit + clinician burden reduction | 3/4 | 15 | 11.25 | [VENDOR-CLAIMED + THIRD-PARTY adoption corroboration] Adoption breadth is the cohort's strongest workflow-fit signal: 1M clinical consultations in a single day on March 10, 2026; 18M monthly in December 2025; "majority of practicing US physicians" use daily per vendor and corroborating third-party sources. Visits (Aug 2025) added ambient transcription + assessment-and-plan enrichment with 37M minutes of patient interaction since limited release. Coding Intelligence (Mar 2026) surfaces inline ICD-10 / E/M / CPT suggestions. DeepConsult reasoning agent free to every verified clinician despite >100x standard-search compute cost. Gap to 4/4: no peer-reviewed clinician-burden-reduction study specific to OpenEvidence has been published (Abridge has KUMC + WashU/BJC, Ambience has internal panel studies); Visits + Coding Intelligence are newer surfaces with shallower EHR-write-back depth than the ambient-scribe specialists; specialty-by-specialty maturity not publicly enumerated; multi-language coverage absent. Sits at anchor 3 ("broad clinician adoption + multi-surface workflow fit + named-customer scale"); anchor 4 blocked by absence of published clinician-level burden-reduction outcome evidence + shallower documentation-time-saved evidence than cohort scribe leaders. (openevidence.md §"Workflow fit + clinician burden reduction") - https://www.prnewswire.com/news-releases/openevidence-achieves-historic-milestone-1-million-clinical-consultations-between-verified-doctors-and-an-artificial-intelligence-system-in-a-single-day-302712459.html, https://www.openevidence.com/announcements/visits-real-time-medical-intelligence, https://www.fiercehealthcare.com/health-tech/openevidence-rolls-out-ai-medical-coding-feature
Compliance + PHI posture | 4/4 | 15 | 15.00 | [VENDOR-CLAIMED + THIRD-PARTY corroboration + UNKNOWN gaps] HIPAA compliance achieved April 2025 (Privacy + Security + Breach Notification Rules); SOC 2 Type II certified for Security trust services category; BAA available; PHI encrypted in transit and at rest. Pre-existing D1 score of 4/4 anchors on the SOC 2 Type II + HIPAA + BAA combination plus the documented architectural disclosures. Caveats that limit anchor-4 confidence: SOC 2 Type II audit date not publicly disclosed; HITRUST CSF not surfaced on any vendor or third-party source (the cohort rubric anchor-3 / anchor-4 distinguishing certification, so absence is a material gap); ISO 27001 + PCI-DSS not surfaced; model-training opt-out default not documented; sub-processor list not surfaced; ad-targeting wall (independence of ad metadata from PHI captured in Visits) not documented. Ad-supported clinical-decision-support model is structurally novel in the cohort - no live litigation but procurement-due-diligence reaction varies sharply by buyer. Score holds at 4 on the documented certifications + content-license trust posture, but the HITRUST gap + opt-out-default gap + ad-wall gap are at the top of the right-of-reply queue. (openevidence.md §"Compliance + PHI posture") - https://www.accountablehq.com/post/is-open-evidence-hipaa-compliant-everything-you-need-to-know, https://research.contrary.com/company/openevidence, https://www.the-geyser.com/openevidence-targets-ads/
Ease of data integration + accuracy | 2/4 | 25 | 12.50 | [THIRD-PARTY + VENDOR-CLAIMED + UNKNOWN gaps] Data integration: primary surface is a standalone web + mobile app for any verified US NPI; enterprise EHR-embed is two Epic deployments (Sutter Feb 2026, Mount Sinai Apr 2026) with FHIR-based integration. No public OpenAPI, no published FHIR R4 conformance statement, no Cerner / Meditech / Athena marketplace presence, no developer SDKs, no public sandbox. Integration model is "free standalone app + selective Epic embed on enterprise contract." Output accuracy + underlying model: vendor claims "smaller, highly-specialized models trained on in-domain medical data" indexing 35M+ peer-reviewed publications; underlying LLM stack not publicly disclosed (proprietary-vs-frontier-orchestration question is open). Vendor 100% USMLE result (Aug 2025) is real and a meaningful structured-exam signal. Critical counterpoint: the November 2025 medRxiv subspecialty pilot scored Quick Consult at 34% and DeepConsult at 41% on 100 MedXpertQA scenarios with 77% / 72% repeatability - mid-pack against 14-46% across 11 LLMs evaluated on the same dataset. Pre-existing D1 score of 2/4 (50%) is the honest placement: integration depth is anchor 2 (one major EHR via FHIR + emerging enterprise surface); output accuracy is anchor 3 on structured exams + anchor 2 on independent subspecialty benchmarking, averaging to anchor 2. Sub-score detail: integration ~2/4 (Epic-only emerging surface, no developer platform), output ~2/4 averaged (4/4 USMLE + 2/4 subspecialty per medRxiv). (openevidence.md §"Clinical accuracy + safety" + §"EHR integration depth") - https://www.medrxiv.org/content/10.64898/2025.11.29.25341091v1, https://research.contrary.com/company/openevidence, https://www.iatrox.com/blog/epic-ai-charting-vs-openevidence-visits-ehr-vs-evidence-engine-workflow-slot-2026
Cost economics | 4/4 | 5 | 5.00 | [VENDOR-CLAIMED] Full consumer product surface - DeepConsult + Visits + Coding Intelligence + AI Dialer + CME + unlimited search across the licensed content corpus - is free to every verified US physician with no per-seat purchase decision, no procurement cycle, no Epic-integration project, no IT lift. Pre-existing D1 score of 4/4 reflects the cohort-unique zero-cost adoption surface. Enterprise EHR-embed tier (Sutter + Mount Sinai) is custom-priced; Sacra triangulates 5-10x consumer ARPU at the enterprise tier (~$620-$1,240 / clinician / year if the $124 consumer ARPU anchor holds) - still well below the cohort's ambient-scribe vendor list rates ($199-$800 / clinician / month). Sits at anchor 4 ("published, transparent, defensible cost economics with documented ROI evidence at multiple deployment sizes") on the consumer-tier free-with-NPI reality; enterprise tier pricing is undisclosed but the consumer-tier-free reality dominates the cohort cost calculus. (openevidence.md §"Cost economics at common deployment sizes") - https://sacra.com/c/openevidence/, https://www.openevidence.com/announcements/visits-real-time-medical-intelligence
Time-to-value | 4/4 | 5 | 5.00 | [VENDOR-CLAIMED + THIRD-PARTY corroboration] Solo physician: sub-five-minutes from signup to first cited answer - NPI verification + email confirmation + first natural-language query. The fastest time-to-value in the cohort by a wide margin. Mass adoption pattern is bottom-up via individual physicians on their own NPI, before any procurement or IT-project gate. Enterprise Epic-embedded deployment timeline is multi-quarter (Sutter Feb 2026, Mount Sinai Apr 2026), consistent with the rest of the cohort at enterprise scale - but the consumer-tier reality is what drives the score. Pre-existing D1 score of 4/4 anchors on the consumer-tier sub-eight-weeks reality which is the dominant adoption pattern for this product. (openevidence.md §"Time-to-value") - https://www.openevidence.com/announcements/openevidence-the-fastest-growing-application-for-physicians-in-history-announces-dollar210-million-round-at-dollar35-billion-valuation, https://www.prnewswire.com/news-releases/openevidence-achieves-historic-milestone-1-million-clinical-consultations-between-verified-doctors-and-an-artificial-intelligence-system-in-a-single-day-302712459.html
Total | | 100 | 76.25 |
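
The Weighted column appears to follow weighted = (score / 4) x weight. That formula is inferred from the rows above and is not quoted from the rubric itself, but it reproduces every row and the 76.25 total. A minimal Python sketch under that assumption follows; the names ROWS, MAX_ANCHOR, weighted, and total_score are illustrative.

```python
from fractions import Fraction

# (dimension, anchor score out of 4, rubric weight), copied from the table above.
ROWS = [
    ("Clinical accuracy + safety", 4, 20),
    ("EHR integration depth", 2, 15),
    ("Workflow fit + clinician burden reduction", 3, 15),
    ("Compliance + PHI posture", 4, 15),
    ("Ease of data integration + accuracy", 2, 25),
    ("Cost economics", 4, 5),
    ("Time-to-value", 4, 5),
]
MAX_ANCHOR = 4  # assumed top of the anchor scale, per the "n/4" score notation


def weighted(score: int, weight: int) -> Fraction:
    """Weighted contribution of one dimension: (score / MAX_ANCHOR) * weight."""
    return Fraction(score, MAX_ANCHOR) * weight


def total_score(rows) -> Fraction:
    """Sum of weighted contributions across all dimensions."""
    return sum(weighted(score, weight) for _, score, weight in rows)


if __name__ == "__main__":
    for name, score, weight in ROWS:
        print(f"{name}: {float(weighted(score, weight)):.2f}")
    total_weight = sum(weight for _, _, weight in ROWS)
    print(f"Total: {float(total_score(ROWS)):.2f} / {total_weight}")
```

Exact fractions are used so that quarter-point contributions such as 11.25 carry no floating-point drift into the total.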

Pricing detail

Source: Reconstructed from vendor announcements + Sacra equity-research profile + Contrary Research. OpenEvidence does not publish a public enterprise pricing page. - https://sacra.com/c/openevidence/, https://research.contrary.com/company/openevidence
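
The enterprise-tier triangulation cited in the cost economics row can be reproduced directly. The Python sketch below takes the $124 consumer ARPU, the 5-10x Sacra multiple, and the $199-$800 monthly ambient-scribe list rates from this sheet; the function and constant names are illustrative, and annualizing the scribe rates by multiplying by 12 is an assumption made here for like-for-like comparison.

```python
CONSUMER_ARPU = 124               # USD per verified user per year (Sacra)
ENTERPRISE_MULTIPLE = (5, 10)     # Sacra's triangulated multiple of consumer ARPU
SCRIBE_LIST_MONTHLY = (199, 800)  # USD per clinician per month, cohort scribe vendors


def enterprise_band(arpu: int, multiple: tuple) -> tuple:
    """Low and high enterprise price per clinician per year."""
    low, high = multiple
    return arpu * low, arpu * high


def annualize(monthly_band: tuple) -> tuple:
    """Convert a (low, high) monthly rate band to a yearly band."""
    low, high = monthly_band
    return low * 12, high * 12


enterprise_yearly = enterprise_band(CONSUMER_ARPU, ENTERPRISE_MULTIPLE)
scribe_yearly = annualize(SCRIBE_LIST_MONTHLY)

if __name__ == "__main__":
    print(f"Enterprise tier (est.): ${enterprise_yearly[0]:,}-${enterprise_yearly[1]:,} / clinician / year")
    print(f"Ambient-scribe list:    ${scribe_yearly[0]:,}-${scribe_yearly[1]:,} / clinician / year")
```

On these inputs the top of the estimated enterprise band sits below the bottom of the annualized scribe band, which is the comparison the cost economics row draws.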

Integrations

Editorial assessment

OpenEvidence is the most-adopted physician-facing AI product in US healthcare by every consultation-volume metric available, and it occupies a structurally distinct seat from the rest of the healthcare-clinical cohort. Where Abridge, Ambience, Nabla, Suki, and DAX Copilot compete on Epic-native ambient-scribe documentation depth, OpenEvidence's product center of gravity is a citation-grounded clinical-evidence search engine that any verified US physician can use for free, anywhere, with no IT integration required. The March 10, 2026 milestone of 1 million clinical consultations in a single 24-hour period is the largest physician-side scale signal in the cohort by any metric, and the inline citation surface to NEJM / JAMA / NCCN / Cochrane / AMA-licensed content is the strongest citation provenance in the consumer-physician AI category per the independent iatroX 2026 landscape assessment.

The load-bearing tension in the dossier is the accuracy gap between standardized-exam performance and complex subspecialty reasoning. The vendor's August 2025 USMLE 100% score is real and a meaningful signal of structured-question competence. The November 2025 medRxiv pilot on 100 MedXpertQA subspecialty scenarios scored OpenEvidence Quick Consult at 34% and DeepConsult at 41%, placing OpenEvidence in the middle of the LLM pack (14-46% across 11 LLMs evaluated on the same dataset). Both numbers belong in the buyer's understanding. The preprint is not yet peer-reviewed and the authors caution against using it to guide clinical practice, but it is the strongest independent signal on real-world clinical-task performance available at research date. The source-recency case documented in independent commentary - OpenEvidence recommending graded exercise therapy for ME/CFS based on guidance NIH reversed in 2022 - is the operational manifestation: the licensed-content moat is simultaneously a freeze-the-evidence-at-license-date risk that buyers should expect to manage through procurement-side guideline-refresh cadence requirements.

The compliance posture documents the certifications buyers expect (HIPAA, BAA, SOC 2 Type II, encryption at rest and in transit) and leaves three load-bearing questions open: HITRUST CSF status, the model-training opt-out default, and the architectural wall between ad-targeting metadata and clinical-question / Visits / PHI data. The ad-supported clinical-decision-support model is structurally novel in the cohort - pharmaceutical + medical-device CPMs at $70-$1,000+ inside a tool physicians use at the point of care. CEO Daniel Nadler has publicly signaled the ad model "may change." Buyer reactions to the model vary sharply: some health systems are comfortable because pharma marketing to physicians is already a $20B / year industry that exists with or without OpenEvidence; others view it as a procurement-disqualifying conflict in a clinical-decision tool. The cohort's other major vendors do not have an analogous structural conflict for procurement to evaluate.

The EHR integration depth is in early-stage build-out relative to the cohort's leaders. Sutter (February 2026) and Mount Sinai (April 2026) are the only two named Epic-embedded enterprise deployments at research date - both meaningful but both early - against a cohort backdrop where Abridge holds first-Pal status in Epic's Partners and Pals program with a 150+ health-system production footprint and Ambience holds Toolbox + Hyperdrive + Haiku depth across multiple major IDNs. No Cerner / Meditech / Athena / eClinicalWorks integration is confirmed at research date. The bet a buyer makes at the enterprise tier is that OpenEvidence's content + adoption + cost-economics advantage is large enough to outweigh the integration-depth gap; that bet is defensible for an Epic-anchored health system and gets harder for a buyer running mixed-EHR or non-Epic primary.

The cost economics are the dossier's brightest spot. The full consumer product surface is free to every verified US physician with no per-seat purchase decision, no procurement cycle, no Epic-integration project, no IT lift. For Foundation- and Pilot-stage buyers the cost ceiling is zero, which makes adoption an evidence-and-policy question rather than a budget question.

When to revisit: when OpenEvidence publishes HITRUST CSF tier and audit date, the underlying LLM stack disclosure, a documented model-training opt-out default, an ad-targeting-vs-clinical-data architectural wall statement, the next independent peer-reviewed accuracy benchmark on real-world clinical tasks, or a Cerner / Oracle Health CODE program tier. Any one closes a current gap; two or three would push the case for OpenEvidence as the cohort #1 rather than the strong-second placement the rubric currently produces.

Best for

Right-of-reply

OpenEvidence received this tear-sheet, including all measured numbers, sample outputs, and editorial assessment, seven calendar days before publication of the Yardstick Research 2026 Yardstick Report. OpenEvidence was given the opportunity to flag factual errors - incorrect pricing, misquoted feature availability, outdated screenshots, factual misstatements in the editorial assessment. OpenEvidence was not given the opportunity to request a score revision, dispute the rubric or its weights, withdraw from inclusion, negotiate ranking placement, or suggest changes to the editorial assessment beyond factual correction. Where OpenEvidence flagged a factual correction, the correction was applied if verified and noted here; where OpenEvidence disputed scoring, the dispute is recorded in the appendix but the score stands. Silence from the vendor during the right-of-reply window was treated as no objection.

Right-of-reply gaps

Specific [UNKNOWN] items surfaced in the dossier and explicitly raised with the vendor in right-of-reply:

  1. Underlying LLM stack. Is the production inference fully proprietary, or does it orchestrate over frontier providers (OpenAI / Anthropic / Google)? If frontier-provider, which models on which workloads, and where do those API calls land geographically?
  2. Model-training opt-out default. Is physician-side query data used to train the specialized models? Is PHI captured in Visits used to train? Is opt-out configurable at the enterprise-contract level? What is the default in the standard MSA / BAA / DPA?
  3. Ad-targeting architectural wall. Is the ad-targeting metadata pipeline architecturally independent of the clinical-question / Visits / PHI data pipeline? Can the vendor publish a one-page wall statement?
  4. HITRUST CSF certification. Held or not held? If held, which tier (e1 / i1 / r2)? Audit date?
  5. SOC 2 Type II audit date. The certification is disclosed; the audit date is not.
  6. ISO 27001, PCI-DSS, FedRAMP, TX-RAMP. Any of these held?
  7. MedHELM / MedQA / MedBench / MedConceptsQA performance. Beyond the 100% USMLE score, are there published standardized-benchmark numbers OpenEvidence is willing to publish or co-author?
  8. FDA SaMD positioning. With Coding Intelligence + Visits + DeepConsult expanding the product surface, what is the current regulatory positioning and trajectory?
  9. FHIR R4 conformance and resource scope. Which FHIR resources does the Sutter / Mount Sinai integration consume + write back? Is there a published conformance statement?
  10. Cerner / Oracle Health CODE program participation. Tier? Roadmap?
  11. Meditech / Athenahealth / eClinicalWorks marketplace listings. Live? At what depth?
  12. HIE / TEFCA QHIN participation. Live?
  13. Enterprise per-clinician list rate. Sutter + Mount Sinai contract economics?
  14. Sutter + Mount Sinai clinician counts at production. How many clinicians are live on each deployment at research date?
  15. Source-recency cadence. How frequently is the indexed-content corpus refreshed against guideline updates (NIH, USPSTF, specialty-society)? What is the SLA for incorporating a published reversal of prior guidance?
  16. Sub-processor list. Public list?
  17. Headcount. No third-party number surfaced as of research date.
  18. HQ confirmation. Miami, FL (per Wikipedia + Crunchbase News + recent press) vs Cambridge, MA (per Sacra)?
  19. Ad-revenue trajectory. CEO Nadler has signaled the ad model "may change" - timeline? What replaces or supplements it?
  20. OpenEvidence Visits scribe depth vs cohort ambient-scribe leaders. EHR-write specifics? How does Visits compare functionally to Abridge Inside on Epic Hyperdrive for the chart-write step?
  21. NPI verification depth. Anti-fraud controls beyond NPI lookup at signup?
  22. Pricing transparency. Any willingness to publish an enterprise rate-card range?