The Yardstick Feed

AI updates, frontier-model releases, and standards work.

A weekly digest for B2B AI buyers: capability and safety announcements from frontier labs, AI standards and regulatory updates, agent-reliability research, and Yardstick methodology notes. Sourced from a fixed list of labs, regulators, and research institutions.

June 12, 2026 Yardstick

We audited our own AI for hallucinations: the method, the data, and the script that checks our math

25 models, the 30 hardest companies we cover, 454 audited runs, every claim checked by deterministic code against frozen copies of real source pages. No model produced zero hallucinations, and the gate caught every one the winning pipeline produced before anything was published. The whitepaper, the full anonymized dataset, and a verification script that recomputes every number are free to share and republish under CC BY 4.0.

By Yardstick Research
May 22, 2026 Weekly Digest

AI Updates, Week of May 22, 2026

Frontier-lab consolidation and oversight signals both showed up this week. Cohere shipped Command A+ and announced two strategic MOUs the same day, DeepMind released its Co-Scientist research-agent system, AWS Bedrock unlocked per-request cost attribution, and the UK AI Security Institute published its Frontier AI Trends Report.
Frontier labs
- Introducing Command A+ Cohere Blog
  An open-weights mixture-of-experts model built for high-performance agentic tasks that can be privately deployed on as little as two H100 GPUs. The pitch: enterprise-grade reasoning and multimodal capability without the inference-cost profile of a frontier dense model.
  Read at cohere.com →
- Co-Scientist: A multi-agent AI partner to accelerate research Google DeepMind
  A multi-agent system built on Gemini that iteratively generates, debates, and refines novel hypotheses for life-sciences research. Positioned as a collaborator for working scientists rather than a one-shot reasoning engine.
  Read at deepmind.google →
- Cohere acquires Reliant AI to expand sovereign enterprise AI Cohere Blog
  Reliant AI is a Montreal- and Berlin-based biopharma AI company. The acquisition extends Cohere's sovereign-AI positioning into healthcare-vertical workflows where data residency and compliance shape vendor selection more than raw model capability.
  Read at cohere.com →
Infrastructure & platforms
- Amazon Bedrock expands support for request-level usage attribution AWS What's New
  Customers can now tag InvokeModel and InvokeModelWithResponseStream calls with team, application, environment, and experiment metadata for per-request cost reporting. Closes a long-standing gap for enterprises that need to attribute Bedrock spend across business units.
  Read at aws.amazon.com →
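For teams planning chargeback on top of this, here is a minimal sketch of the roll-up step, assuming usage records have already been exported with their tags. The record shape, field names, and dollar figures are hypothetical illustrations, not the Bedrock API or any AWS billing schema.

```python
from collections import defaultdict

# Hypothetical per-request usage records. Field names are illustrative,
# not the actual Bedrock or AWS billing export schema.
records = [
    {"team": "search", "environment": "prod", "cost_usd": 0.012},
    {"team": "search", "environment": "dev", "cost_usd": 0.003},
    {"team": "support", "environment": "prod", "cost_usd": 0.020},
    {"team": "search", "environment": "prod", "cost_usd": 0.008},
]


def cost_by_tag(records, tag):
    """Sum request cost per value of one tag; untagged requests land in 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(tag, "untagged")] += record["cost_usd"]
    # Round to suppress floating-point noise in the reported totals.
    return {key: round(value, 6) for key, value in totals.items()}


by_team = cost_by_tag(records, "team")
by_environment = cost_by_tag(records, "environment")
print(by_team)
print(by_environment)
```

The same function works for any of the tag dimensions named above (team, application, environment, experiment), provided the exported records carry that key.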
Standards & regulation
- Frontier AI Trends Report UK AI Security Institute
  An assessment of the AI oversight landscape, its robustness to capability advances, and the pathways that could lead to its degradation. Pairs with NIST's RMF as buyer-facing evidence on whether a vendor's claims about model safety hold up to independent evaluation.
  Read at aisi.gov.uk →
By Yardstick Research
May 15, 2026 Weekly Digest

AI Updates, Week of May 15, 2026

Distribution and oversight dominated the week. OpenAI stood up a Deployment Company to embed engineers with enterprise customers, the UK AI Security Institute shipped two safety papers, and Anthropic announced a four-year, $200M partnership with the Gates Foundation.
Frontier labs
- OpenAI launches the OpenAI Deployment Company OpenAI
  A new arm of OpenAI focused on embedding Forward Deployed Engineers with enterprise customers, paired with the acquisition of Tomoro. Signals an industry shift toward services-and-models bundles rather than model-API-only sales.
  Read at openai.com →
- Anthropic forms $200 million partnership with the Gates Foundation Anthropic News
  Four-year commitment of grant funding, Claude usage credits, and technical support across global health, life sciences, education, and economic mobility. Foundation-grade enterprises now have a defined path to Claude deployment with vendor support included.
  Read at anthropic.com →
Standards & regulation
- Alignment research paper UK AI Security Institute
  AISI's alignment work probes whether frontier-model behavior actually matches stated objectives under adversarial conditions. Useful as a procurement reference when a vendor claims its agent is safe by design.
  Read at aisi.gov.uk →
- Cyber autonomous systems capabilities research UK AI Security Institute
  AISI's narrow-cyber suite finds that the length of tasks frontier models can autonomously complete in cybersecurity scenarios is doubling every few months. Direct implication for any AI buyer doing threat modeling: the offensive-capability curve outpaces typical procurement cycles.
  Read at aisi.gov.uk →
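To make the doubling claim concrete, here is a back-of-envelope sketch. The doubling periods and the 12-month procurement cycle are assumed inputs for illustration only; the summary above says just "every few months", and steady exponential growth is itself an assumption.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplier on autonomous task length after elapsed_months,
    assuming steady exponential growth with the given doubling period."""
    return 2 ** (elapsed_months / doubling_months)


# Assumed inputs, not figures from the AISI research.
procurement_cycle_months = 12
for doubling_months in (3, 4, 6):
    factor = growth_factor(procurement_cycle_months, doubling_months)
    print(
        f"doubling every {doubling_months} months -> "
        f"{factor:.0f}x over {procurement_cycle_months} months"
    )
```

Under these assumptions, a threat model written at the start of a year-long procurement cycle would be facing task lengths somewhere between 4x and 16x longer by the time the contract is signed.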
By Yardstick Research
May 8, 2026 Weekly Digest

AI Updates, Week of May 8, 2026

Three deployment-relevant shifts. OpenAI released GPT-5.5 Instant with a measured drop in hallucination rate, AWS Bedrock pushed AgentCore into GovCloud, and HiDream-O1-Image, a state-of-the-art image model, was released under open weights.
Frontier labs
- GPT-5.5 Instant: smarter, clearer, and more personalized OpenAI
  New default model in ChatGPT. OpenAI reports 52.5% fewer hallucinated claims than GPT-5.3 Instant in internal evaluations covering medicine, law, and finance. Worth knowing if you are running a Claude-vs-GPT bake-off for a B2B vertical with hallucination risk.
  Read at openai.com →
Infrastructure & platforms
- Amazon Bedrock AgentCore now available in AWS GovCloud (US-West) AWS What's New
  Enterprise-grade agentic AI capabilities now reach AWS GovCloud, unlocking agent deployment for workloads with elevated compliance needs. Relevant for federal contractors and regulated buyers whose data-residency rules previously blocked Bedrock.
  Read at aws.amazon.com →
- Amazon Bedrock now offers OpenAI models, Codex, and Managed Agents (Limited Preview) AWS What's New
  GPT-5.5 and GPT-5.4 come to Bedrock with unified security, governance, and cost controls. The Codex coding agent runs inside existing AWS environments, processing inference through Bedrock and applying usage toward AWS commitments. Useful for shops with AWS spend commitments that want OpenAI-tier coding agents without a separate vendor contract.
  Read at aws.amazon.com →
Research
- HiDream-O1-Image: open-weights text-to-image released Hugging Face
  8B-parameter image model open-sourced with both undistilled and distilled variants plus a Reasoning-Driven Prompt Agent. Debuted at #8 on the Artificial Analysis Text-to-Image Arena. A useful reference point for buyers evaluating generative-imagery vendors against an open-weights baseline.
  Read at huggingface.co →
By Yardstick Research
