OVHcloud Network Status

FS#7462 — 188.165.13/24 188.165.14/24 188.165.15/24 178.33.122/24
Scheduled Maintenance Report for Network & Infrastructure
Completed
Following the update of the N5 switches, we found a bug in the newest version that sometimes makes ARP disappear from the network.

We are urgently downgrading to an older version.

Update(s):

Date: 2012-10-13 03:43:40 UTC
The intervention is completed. All ports are UP and all HG are up in the monitoring.

The origin of the problem:
2 days ago we updated the software on some HG switches. Tonight, the switch suddenly reported that the servers' ports were down.
We first downgraded the software from version 5.2.1b to 5.2.1, because yesterday we saw the first signs that the b release has problems.
In the end we had to downgrade to 5.1.3, and only then did all the problems go away.

This is an unusual problem caused by software bugs in the network equipment we use. It is rare, very rare, but it happens.

We are sorry for the trouble.

Affected customers are entitled to 1 free month, since the SLA has been breached by a wide margin.


Date: 2012-10-13 03:13:54 UTC
10/13/2012 04:47:14.816521: Module register received
10/13/2012 04:47:14.818478: Registration response sent
10/13/2012 04:47:15.401136: Module Online Sequence
10/13/2012 04:47:19.281549: Module Online


The FEX is up. Ports are UP.
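
The log lines above can be used to measure how long the FEX took to come back online after registering. The following is a minimal Python sketch, assuming the `MM/DD/YYYY HH:MM:SS.ffffff: message` layout seen in the pasted logs. The names `parse_fex_log` and `seconds_between` are hypothetical helpers for illustration, not part of any switch or OVH tooling.

```python
from datetime import datetime

# Timestamp layout assumed from the pasted log lines.
LOG_FORMAT = "%m/%d/%Y %H:%M:%S.%f"

def parse_fex_log(lines):
    """Return (timestamp, message) pairs; lines that do not match are skipped."""
    events = []
    for line in lines:
        # The first ": " separates the timestamp from the message.
        stamp, sep, message = line.strip().partition(": ")
        if not sep:
            continue
        try:
            events.append((datetime.strptime(stamp, LOG_FORMAT), message.strip()))
        except ValueError:
            continue  # not a timestamped log line
    return events

def seconds_between(events, start_msg, end_msg):
    """Seconds from the first start_msg event to the first end_msg event."""
    start = next(t for t, m in events if m == start_msg)
    end = next(t for t, m in events if m == end_msg)
    return (end - start).total_seconds()

log = [
    "10/13/2012 04:47:14.816521: Module register received",
    "10/13/2012 04:47:14.818478: Registration response sent",
    "10/13/2012 04:47:15.401136: Module Online Sequence",
    "10/13/2012 04:47:19.281549: Module Online",
]

if __name__ == "__main__":
    events = parse_fex_log(log)
    elapsed = seconds_between(events, "Module register received", "Module Online")
    print(f"register -> online: {elapsed:.3f} s")
```

For the four lines in this update, the gap between "Module register received" and "Module Online" is about 4.5 seconds.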

Date: 2012-10-13 03:12:26 UTC
10/13/2012 04:45:59.702382: Image preload successful.
10/13/2012 04:46:00.822397: Deleting route to FEX
10/13/2012 04:46:00.831361: Module disconnected
10/13/2012 04:46:00.833211: Module Offline
10/13/2012 04:46:00.839272: Deleting route to FEX
10/13/2012 04:46:00.847072: Module disconnected
10/13/2012 04:46:00.890047: Offlining Module
10/13/2012 04:46:00.892061: Deleting route to FEX
10/13/2012 04:46:00.899818: Module disconnected
10/13/2012 04:46:00.963837: Offlining Module


Date: 2012-10-13 02:45:54 UTC
Updating the FEX image to 5.1.3.
Logs:
10/13/2012 04:41:46.636029: Module register received
10/13/2012 04:41:46.637450: Image Version Mismatch
10/13/2012 04:41:46.638126: Registration response sent
10/13/2012 04:41:46.638647: Requesting satellite to download image

Date: 2012-10-13 02:45:07 UTC
The configuration is applied. Everything is up.

We will now properly set up FEX 105, which was replaced by the spare.
These servers will be down for 10 more minutes.

Date: 2012-10-13 02:29:18 UTC
The ports configuration is lost. We will re-apply it.

Date: 2012-10-13 02:28:08 UTC
In 5 minutes we will swap the spare FEX for the original FEX, which is in the rack and is working correctly.

Date: 2012-10-13 02:19:55 UTC
Everything is UP.

CONCLUSION:
Version 5.2.1X is RADIOACTIVE!!

Date: 2012-10-13 02:18:19 UTC
The FEX are booting. Ports are UP, FINALLY!

Date: 2012-10-13 02:17:32 UTC
FEX update

Logs:
10/13/2012 04:05:19.324425: Module register received
10/13/2012 04:05:19.325823: Image Version Mismatch
10/13/2012 04:05:19.326266: Registration response sent
10/13/2012 04:05:19.326737: Requesting satellite to download image

Date: 2012-10-13 02:17:09 UTC
The N5 has booted. The configuration has started loading.

The FEX will then start booting and will have to be updated. This usually takes 10 minutes per FEX, and all of them are updated simultaneously.

Date: 2012-10-13 02:12:13 UTC
The images are on the N5.

We are rebooting everything.

Date: 2012-10-13 02:11:44 UTC
We will upload an older software version. We will move from 5.2.1.N1.1b.bin to 5.2.1.N1.1.bin and then switch to 5.1.3.N2.1.bin.

We need 5 minutes to put the images on the two N5 switches, then we will update them quickly and hard-reboot everything, cutting power to the switches and the FEX.

Date: 2012-10-13 02:05:00 UTC
The spare FEX has started. Same result.

So now...

Date: 2012-10-13 01:45:10 UTC
The FEX that was power-cycled is up, but the ports are still down.

We will wait for the spare to start.

Date: 2012-10-13 01:44:11 UTC
The FEX has been physically replaced with a new one, and its power was cut.

Date: 2012-10-13 01:43:31 UTC
Replaced a cable. Same result.

Date: 2012-10-13 01:43:02 UTC
Same result. The ports are down.

Date: 2012-10-13 01:02:40 UTC
sw-n5-13.248# reload
WARNING: This command will reboot the system
Do you want to continue? (y/n) [n] y


Date: 2012-10-13 01:02:26 UTC
We restarted the FEX.

Same result.

We rebooted the system.

Date: 2012-10-13 01:01:50 UTC
Many ports are in the inactive state:

Eth100/1/1   server  inactive   589    full  10G  --
Eth100/1/2   server  inactive   589    full  10G  --
Eth100/1/3   server  inactive   589    full  10G  --
Eth100/1/4   server  notconnec  589    full  10G  --
Eth100/1/5   server  notconnec  589    full  10G  --
Eth100/1/6   server  inactive   589    full  10G  --
Eth100/1/7   server  inactive   589    full  10G  --
Eth100/1/8   server  inactive   589    full  10G  --
Eth100/1/9   server  inactive   589    full  10G  --
Eth100/1/10  server  inactive   589    full  10G  --
Eth100/1/11  server  sfpAbsent  588    full  10G  --
Eth100/1/12  server  inactive   589    full  10G  --
Eth100/1/13  server  inactive   589    full  10G  --
Eth100/1/14  server  inactive   589    full  10G  --
Eth100/1/15  server  inactive   589    full  10G  --
Eth100/1/16  server  inactive   589    full  10G  --
Eth100/1/17  server  inactive   589    full  10G  --
Eth100/1/18  server  inactive   589    full  10G  --
Eth100/1/19  server  connected  trunk  full  10G  --
Eth100/1/20  server  sfpAbsent  trunk  full  10G  --
Eth100/1/21  server  notconnec  588    full  10G  --
Eth100/1/22  server  connected  588    full  10G  --
Eth100/1/23  server  inactive   589    full  10G  --
Eth100/1/24  server  connected  trunk  full  10G  --
Eth100/1/25  server  inactive   589    full  10G  --
Eth100/1/26  server  inactive   589    full  10G  --
Eth100/1/27  server  inactive   589    full  10G  --
Eth100/1/28  server  inactive   589    full  10G  --
Eth100/1/29  server  sfpAbsent  588    full  10G  --
Eth100/1/30  server  inactive   589    full  10G  --
Eth100/1/31  server  sfpAbsent  588
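
A listing like the one above can be tallied by status to see at a glance how many server ports are affected. The following is a minimal Python sketch, assuming whitespace-separated columns in the order port, name, status, as they appear in the listing. The function name `tally_port_status` is hypothetical, and the sample is a handful of lines copied from the listing rather than the full output.

```python
from collections import Counter

def tally_port_status(lines):
    """Count ports per status from interface-status style lines.

    Assumes the third whitespace-separated field is the status
    (an assumption based on the listing, not a documented format).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not an Eth port row.
        if len(fields) < 3 or not fields[0].startswith("Eth"):
            continue
        counts[fields[2]] += 1
    return counts

sample = [
    "Eth100/1/1 server inactive 589 full 10G --",
    "Eth100/1/4 server notconnec 589 full 10G --",
    "Eth100/1/11 server sfpAbsent 588 full 10G --",
    "Eth100/1/19 server connected trunk full 10G --",
    "Eth100/1/23 server inactive 589 full 10G --",
]

if __name__ == "__main__":
    for status, count in tally_port_status(sample).most_common():
        print(f"{status}: {count}")
```

Truncated rows still count as long as the status field is present, so the last line of the listing would be tallied as sfpAbsent.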

Date: 2012-10-13 01:01:07 UTC
The problem appears only on 188.165.13/24, but we will downgrade everything we upgraded 2 days earlier.
Posted Oct 13, 2012 - 00:14 UTC