OpenAI published a new write-up about elevated errors in ChatGPT that significantly increased failed conversation attempts. The issue was caused by a misconfigured internal experiment.
According to OpenAI:
“On February 19, 2025, from 9:48 AM to 11:19 AM PT, ChatGPT experienced a service degradation, leading to a significant increase in failed conversation attempts. This resulted in blank responses for many users.
The root cause was a misconfigured internal experiment that unintentionally triggered a surge in traffic, overwhelming our inference infrastructure. This increase in load led to saturation of compute resources, causing failures in generating responses.
After identifying the root cause, we took immediate action by temporarily shedding load from free-tier users to stabilize the system. As capacity was restored, paid users gradually recovered, and the full service was restored by 11:19 AM PT.”
OpenAI Continues To Work On Solutions
The incident response goes on to note that they continue to work on changes that will prevent similar outages from happening, writing:
“Stronger Safeguards: Building better protections around experiment changes and configurations by moving from a uniform approval process to a risk-based model to ensure safer rollouts of experiments.
Faster Root Cause Identification: Automating notifications for relevant changes and experiments to more quickly identify root causes of increased failures.”
Read the incident report