This morning we found that last night’s release introduced a problem that affected the stability of the platform. This problem caused the downtime early today, which lasted a little over an hour.
We also had about 20 minutes of downtime at 2 PM PST. Its cause was related to, but different from, the problem we experienced earlier in the morning.
We do know that the instability is coming from part of the distributed caching infrastructure we use to speed up data access operations. However, we don’t yet know exactly why this is happening. We’re working feverishly to find the source of the problem and fix it.
As part of this work, we are now taking a 30-minute planned downtime to restart some components and remove the new systems we added earlier today.
If the problem happens again later today, we may have to roll back last night’s release to make sure that a subtle bug in it isn’t responsible for this instability.
This problem is affecting runtime, not storage, so the data on your Ning Networks remains safe.