Post-incident report for service disruption on 28/02
Posted Friday, February 28th 2014
Today we experienced a service disruption affecting all services hosted from our Sydney data centre, and a partial disruption to services hosted from our Brisbane data centre, for approximately 20 minutes between 11:18AM and 11:38AM AEDST.
The disruption was caused by a power surge and a brief loss of power on the primary feed, which occurred when the primary UPS bank overheated following an earlier air conditioning failure. Once on site, data centre technicians manually transferred the supply directly to mains power, which restored power to the primary feed.
Even though the majority of our infrastructure draws power from both the primary and secondary feeds, it appears that some of our provider's networking equipment does not, and this caused the complete loss of network services to the facility. Most of our cPanel shared hosting and VPS customers who had already been migrated to our new Dell platform did not experience a loss of power, so their servers' uptime was unaffected.
When power was restored to the primary feed just after 11:30AM AEDST, our provider was able to restore network connectivity and our team was able to carry out a post-incident assessment. At that point, approximately 95% of services in Sydney had returned to normal operation; however, there were some minor issues with legacy hardware that is still powered by a single feed only.
This is the third power-related issue we have had in this facility, and we have made it clear that we would relocate all services out of this data centre and into the new NextDC S1 enterprise data centre as soon as we were able. This move has been in planning for several months and is already scheduled to take place on the evening of Saturday, March 22. A status event will be posted next Monday, followed shortly afterwards by an email to all customers.
The move to NextDC S1 will rectify a number of key issues we have with the current facility, including power availability and network redundancy.
Every device will be supplied by dual power feeds, and any single-corded legacy devices will be connected through an ATS (Automatic Transfer Switch), which detects a loss of power on one feed and automatically switches the load to the other without disruption. Our network will also no longer rely on a single provider to deliver multiple services; instead, it will use a number of providers and diverse network paths for full redundancy.
The new network in NextDC S1 will be provided by Synergy Wholesale, which features a brand new Brocade-powered network with dual redundant connectivity from a tier 1 transit provider, a tier 2 transit provider, and gigabits of peering from multiple internet exchanges in NSW.
These changes will ensure that an issue of this nature cannot happen again short of extraordinary circumstances.
Providing a high quality service to all of our customers is paramount to our business and our management team, and we are satisfied that the work already completed, together with the changes due to be made in three weeks' time, will prevent issues of this nature from happening in the future.