We removed the defective card from the router. This caused the 7 other cards to reboot:
0/2/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/3/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/4/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/5/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/6/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/7/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
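For illustration only, here is a minimal Python sketch that reads a listing like the one above and reports which cards are still booting. It is not a tool that was used during the incident. The column meanings (node, card type, state, config flags) are an assumption based on how the lines look, and the names `STATUS_OUTPUT` and `parse_cards` are hypothetical.

```python
# Card status lines copied verbatim from the listing above.
STATUS_OUTPUT = """\
0/2/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/3/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/4/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/5/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/6/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
0/7/CPU0 A9K-8T-L MBI-BOOTING PWR,NSHUT,MON
"""


def parse_cards(text):
    """Split each line into four whitespace-separated fields.

    Assumed column order: node, card type, state, config flags.
    Lines that do not have exactly four fields are skipped.
    """
    cards = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        node, card_type, state, flags = fields
        cards.append(
            {
                "node": node,
                "type": card_type,
                "state": state,
                "flags": flags.split(","),
            }
        )
    return cards


cards = parse_cards(STATUS_OUTPUT)
booting = [c["node"] for c in cards if c["state"] == "MBI-BOOTING"]
print(f"{len(booting)} of {len(cards)} listed cards still booting: {', '.join(booting)}")
```

Run against the six lines listed above, every card it finds is an A9K-8T-L in the MBI-BOOTING state.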
All traffic routed by rbx-g1-a9, one of our core routers in Roubaix, was impacted between approximately 12:35 and 12:55. One of the new 24x10G cards that we inserted last night (http://status.ovh.co.uk/?do=details&id=2272) was found to be defective while we were activating the new ports.
Sequence of events during the outage:
- Traffic through the router started to drop (significant packet loss).
- The new ports were immediately taken out of service, but the problem persisted.
- Card 0 was removed from the chassis: the packet loss stopped, but all the 8T-L cards rebooted (the other 24x10GE card did not). The router instantly lost 48x10G of its capacity. Routing was then largely handled by rbx-g2-a9.
- However, traffic was impacted again, this time because several links were saturated and because of "side effects" on the other routers caused by the loss of all these links.
- The cards rebooted, but on this kind of equipment the line cards take several long minutes to become operational again.
- Finally, we replaced the 24x10GE card that caused the failure with an 8T-L card, and we will set up the uplinks on this card. The router was back to its normal state after 20 minutes.
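As a quick sanity check of the 48x10G figure in the sequence above, the arithmetic can be sketched as follows. The assumptions are that six 8T-L cards rebooted (the six lines in the card listing) and that each A9K-8T-L carries 8 ports of 10GE, as its name suggests. The function name `lost_capacity` is hypothetical.

```python
CARDS_REBOOTED = 6    # assumption: the six 8T-L cards shown as MBI-BOOTING
PORTS_PER_CARD = 8    # assumption: "8T" means 8 x 10GE ports per card
PORT_SPEED_GBPS = 10  # 10GE ports


def lost_capacity(cards, ports_per_card, port_speed_gbps):
    """Return (number of ports down, aggregate capacity down in Gbit/s)."""
    ports = cards * ports_per_card
    return ports, ports * port_speed_gbps


ports, gbps = lost_capacity(CARDS_REBOOTED, PORTS_PER_CARD, PORT_SPEED_GBPS)
print(f"{ports}x10G ports down, {gbps} Gbit/s of capacity unavailable")
```

Under these assumptions the result matches the 48x10G stated above, which is why the remaining links and the other routers saturated while the cards were rebooting.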
We are currently working with Cisco to identify the origin of the problem and to replace the defective card as soon as we can.
We are waiting for the spare card, which should arrive this week. These are very new cards and the stock of spares is not yet in place at Cisco.