'This was an especially difficult outage in that it involved a combination of several factors,' David Baszucki, Roblox founder and CEO, said in a blog post.
'A core system in our infrastructure became overwhelmed, prompted by a subtle bug in our backend service communications while under heavy load. This was not due to any peak in external traffic or any particular experience. Rather the failure was caused by the growth in the number of servers in our datacenters. The result was that most services at Roblox were unable to effectively communicate and deploy.
'Due to the difficulty in diagnosing the actual bug, recovery took longer than any of us would have liked. Upon successfully identifying this root cause, we were able to resolve the issue through performance tuning, re-configuration, and scaling back of some load. We were able to fully restore service as of this afternoon.'
Baszucki said Roblox will publish a post-mortem with more details once the company has completed its analysis, along with actions it will take to prevent this from happening again.