M1: downtime could have lasted weeks if not for contingency plans

January 22nd, 2013 | by Alfred Siew

M1 has responded to criticisms of its 3G disruption last week by saying that its contingency plans had kicked in to cut the downtime from a possible 12 to 16 weeks to less than three days.

The telecom operator yesterday detailed how the fault was created in the early hours of January 15, and claimed that the network was fully restored by 6pm on January 17.

It is keen to prove its case because an upcoming investigation by the government regulator could well determine whether it faces a heavy penalty for one of the country’s most serious outages.

In the early hours of January 15, an M1 vendor upgrading the transmission network created sparks and smoke in the telco’s network centre while connecting a power cable to a rack of network equipment.

This set off one of the 88 water sprinklers there, as well as a gas that is used to suppress fires. The water damaged one of the network switches, and that damage caused all the trouble for the telco and thousands of users islandwide, M1 explained yesterday evening in a press statement.

It said it had two contingency plans that kicked into action. It diverted traffic from the damaged equipment and increased the capacity on its important core network, which handled all its traffic.

Separately, the telco said it had to reconfigure 416 base stations, connecting them to the alternative network switch and making sure the traffic was balanced across the new equipment. This was followed by drive-by tests.

The incident happened while M1 was working on a “network resiliency” plan that began in 2011 and was due to be completed by mid-2013, it revealed yesterday.

Understandably, users are still angry a week after the incident, despite an apology by the telco and an offer of free calls, messages and Internet surfing for three days.

Questions remain, too, about M1’s explanation. For example, why was there so little backup for such a critical mobile network switch, and what was its solution if more than one had broken down or been damaged? Is this the standard that other telcos have adopted?

These are likely to be among the questions asked by the Infocomm Development Authority. It confirmed last week that it would be investigating this case and urged M1 to address customer concerns actively.

The regulator fined M1 S$300,000 in 2011 for another network disruption, and handed out a S$400,000 penalty to SingTel for a more serious incident in 2012.

4 Comments

  1. Shame on you, M1. Your explanation does not make sense at all. I suspect you are telling half-truths to your subscribers.

  2. In a data center environment, gas would normally be activated before the water sprinklers. Should the fire become uncontrollable, the heat would set off the sprinkler pipe to further contain the damage. M1 should explain why both gas and water sprinklers are configured in such a manner. They should not be activated at the same time. M1 should also run tests on equipment before plugging it live into the production area. It suggests to me that the M1 data center does not follow proper best practice.

  3. jk says:

    LTE network is still not up!! How can we say fully restored?

  4. Boh gas suppression says:

    If all 88 sprinklers were set off, will there be close to a year of “possible” downtime? But then again, why use water sprinklers instead of fire suppression gas in a data center?
