OpenAI HealthBench Dataset

News

Benchmarks in medicine: the promise and pitfalls of evaluating AI tools ...

HealthBench challenges traditional AI benchmarks in healthcare, emphasizing real-world clinical judgment over fact recall and precision.

SiliconANGLE, 1 month ago

OpenAI to retain deleted ChatGPT conversations following court order

OpenAI will retain users’ deleted ChatGPT conversations to comply with a recently issued court order. Brad Lightcap, the artificial intelligence developer’s chief operating officer, disclosed ...

Ars Technica, 1 month ago

OpenAI confronts user panic over court-ordered retention of ChatGPT ...

Late Thursday, OpenAI confronted user panic over a sweeping court order requiring widespread chat log retention—including users' deleted chats—after moving to appeal the order that allegedly ...

Bleeping Computer, 1 month ago

OpenAI is hopeful GPT-5 will compete a little more

OpenAI's next big foundational model is GPT-5, and the AI startup is hoping that the model will compete a little more with rivals.

MediaNama, 1 month ago

OpenAI Launches 'Sign in with ChatGPT' for Third-Party Apps

OpenAI has asked developers to integrate ChatGPT login option into their apps so that users can sign into third-party apps using ChatGPT.

thecardiologyadvisor.com, 1 month ago

OpenAI Releases HealthBench Dataset to Test AI in Health Care

HealthBench, OpenAI's first major independent health care project, includes 5000 “realistic health conversations,” each with detailed grading tools to evaluate AI responses. HealthDay News — OpenAI ...

optometryadvisor, 1 month ago

OpenAI Releases HealthBench Dataset to Test AI in Health Care

OpenAI has unveiled a large dataset to help test how well artificial intelligence (AI) models answer health care questions.

Business Insider, 1 month ago

OpenAI Launches Codex, a Multitasking AI Coding Agent - Business Insider

OpenAI launched Codex, an AI tool that automates coding tasks like fixing bugs, but could also help you make a dinner reservation.

Wired, 1 month ago

OpenAI Launches an Agentic, Web-Based Coding Tool | WIRED

As vibe coding takes off, OpenAI says Codex will help advanced developers automate chores in a safe and explainable way.

The New York Times, 1 month ago

OpenAI Unveils New Tool for Computer Programmers

The tool, Codex, will be able to handle multiple tasks at the same time, the company said. OpenAI is also in talks to acquire a coding tool called Windsurf for $3 billion.

Observer, 1 month ago

ChatGPT Used for Medical Advice, OpenAI Launches HealthBench Tool ...

As more people turn to ChatGPT for health concerns, OpenAI introduces a new benchmark to evaluate the safety and accuracy of its medical responses.

MobiHealthNews, 1 month ago

OpenAI unveils HealthBench to evaluate LLMs' safety in healthcare

OpenAI has announced the launch of HealthBench, a benchmark to evaluate AI models in healthcare using real-world applicability and physician judgment. "The 5,000 conversations in HealthBench simulate ...
