Degraded load times and live updates
Incident Report for ProductOS
Postmortem

Some API servers began consuming large amounts of CPU, which caused them to fail health checks and be restarted. This led to intermittent failed or slow network requests and degraded performance for real-time updates. We scaled up the number of API instances to compensate, but at the same time a misconfiguration caused us to hit a bug in our load balancer that prevented network traffic from reaching our backend instances. The load balancer issue was fixed quickly and service returned to normal.

Posted Jul 27, 2020 - 15:29 UTC

Resolved
This incident has been resolved.
Posted Jul 27, 2020 - 15:26 UTC
Update
We are continuing to monitor for any further issues.
Posted Jul 27, 2020 - 15:16 UTC
Monitoring
The number of available API servers has been increased and we're monitoring the situation.
Posted Jul 27, 2020 - 14:12 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 27, 2020 - 13:43 UTC
Identified
The Jellyfish API servers are experiencing elevated CPU usage, resulting in regular instance restarts. This is affecting API query times and real-time updates, including messaging. We are increasing the number of available servers to mitigate the issue.
Posted Jul 27, 2020 - 13:43 UTC
This incident affected: Jellyfish (API).