OpenAI HealthBench Dataset

News

OpenAI releases HealthBench dataset to test AI in health care

The dataset—called HealthBench—is OpenAI's first major independent health care project. It includes 5,000 "realistic health conversations," each with detailed grading tools to evaluate AI ...

CNET, 2 months ago

OpenAI Launches HealthBench, a Dataset That Benchmarks Health Care AI ...

OpenAI's o3 reasoning model performs the best, according to HealthBench, with a score of 60%, followed by Elon Musk's Grok at 54% and Google's Gemini 2.5 Pro at 52%.

MobiHealthNews, 2 months ago

OpenAI unveils HealthBench to evaluate LLMs' safety in healthcare

OpenAI has announced the launch of HealthBench, a benchmark to evaluate AI models in healthcare using real-world applicability and physician judgment. "The 5,000 conversations in HealthBench simulate ...

Longview News-Journal, 2 months ago

OpenAI Releases HealthBench Dataset to Test AI in Health Care

Experts say it improves AI evaluation but warn that more review is needed. TUESDAY, May 13, 2025 (HealthDay News) -- OpenAI has unveiled a large dataset to help test how well artificial ...

Hosted on MSN, 2 months ago

OpenAI Launches HealthBench, a Dataset That Benchmarks Health ... - MSN

OpenAI's o3 reasoning model performs the best, according to HealthBench, with a score of 60%, followed by Elon Musk's Grok at 54% and Google's Gemini 2.5 Pro at 52%. In an example on OpenAI's ...

SF Weekly, 2 months ago

OpenAI Releases HealthBench Dataset to Test AI in Health Care

Experts say it improves AI evaluation but warn that more review is needed. TUESDAY, May 13, 2025 (HealthDay News) -- OpenAI has unveiled a large dataset to help test how well artificial intelligence (AI ...
