Post-Incident Report: Sydney 05/12
The following report has been prepared for affected customers. VentraIP Australia sincerely apologises for any inconvenience caused.
At approximately 6:30AM AEDT on Wednesday December 5, our internal and external monitoring alerted our technical team to an issue with our Sydney PoP. Monitoring reported all services in Sydney, and some services to Brisbane, as critical. Within minutes our CTO was on the phone to our fibre provider in Sydney to ask whether there were any known issues, and was told there were none.
We immediately dispatched our on-site technician and within three minutes he was inside the data centre performing preliminary investigations. In consultation with our CTO, all elements of our network infrastructure were checked and re-checked. All of our equipment was operating without error.
Our CTO phoned our fibre provider again and, after a lengthy wait on hold, was advised that a fibre cut had occurred somewhere in Sydney and that there was no estimated time to restore (ETR).
Just after 7:30AM our CEO began communicating with our Account Manager, as an outage of this nature should have been impossible given the redundant infrastructure we believed we were connected to. He also began communicating with another provider in the same facility to coordinate a contingency plan, should a satisfactory outcome not be reached with our primary provider within a reasonable time frame.
Meanwhile, our provider was preparing to locate the break in the fibre using OTDR (optical time-domain reflectometer) traces taken from the Sydney CBD and North Sydney. No further updates were provided at that stage.
At 9:39AM our internal and external monitoring detected that connectivity to our Sydney PoP had returned. Our CTO performed precautionary checks on the network infrastructure to confirm that full routing tables had returned and that all links were active. At the same time, customers began reporting that services had been restored.
We received an email from our fibre provider at 9:48AM informing us that the on-site facilities manager had successfully migrated our fibre circuits to local PoP infrastructure that operates on a protected circuit and that all of our services had been restored.
The root cause has been determined to be a cut in a fibre cable in the Mosman, North Sydney area. Our fibre provider believes that the cable was deliberately cut and has referred the matter to NSW Police for investigation, as there have been reports of similar incidents in the same area affecting other carriers.
Although our services were restored at 9:39AM, the cable itself was not repaired until 7:58PM, once splicing work had been completed.
Since this incident, the same cable has been cut on two further occasions in similar areas; however, these cuts did not affect our services in any way.
Post-Incident Follow-Up
In November 2011, during an incident caused by a software error on a core network device belonging to our provider, we discovered that our services from the Sydney CBD to our facility in Cromer were not connected to protected infrastructure. Before we moved into this facility, we were informed that we were contracted to, and connected to, a service that included a redundant secondary fibre path. At that time we were told that a local PoP build had been delayed by other projects, and that it would be complete "within a few months", at which point our services would be migrated.
Management staff from our provider have apologised profusely for this migration not taking place, and confirmed that we should have been contacted many months ago to have it performed; however, it was simply overlooked. We understand that things like this happen and that it was an unintended mistake.
As a result of the changes made during this incident, we are now connected to the local PoP infrastructure, which is protected by multiple fibre paths. These paths have been tested over the past seven days, during which two further fibre cuts occurred; all of our services remained online without incident.
We have been informed that, in the coming weeks, we will need to schedule some additional work to tidy up the changes made during this incident, reducing cable lengths and moving to more optimal paths. We will schedule an outage for this work in the early hours of the morning to minimise disruption.
As part of Project Re-Energise, announced by our CEO last week, our technical team will review our network infrastructure in all locations and evaluate it against future requirements. We intend to begin operating from a second facility in Sydney in 2013. This will add a further layer of redundancy to the Cromer facility, as we will be taking a fibre cross-connect to this facility from an alternate provider.