
FS#346 — FS#3830 — Internal Routing Roubaix

Attached to Project: Network
Maintenance
the whole network
CLOSED
100%
In order to manage the traffic between our backbone routers in Roubaix (rbx-1-6k <> rbx-2-6k <> vss-1-6k <> vss-2-6k <> rbx-99-6k), we are setting up a new routing architecture. The switch to this new architecture will take place tonight, starting at midnight.
This maintenance concerns the links Roubaix <> Brussels (bru-1-6k).

We will switch the links one by one, which should not cause any impact on traffic.
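
For illustration only (the log does not describe how each link is drained): a common way to take a backbone link out of service gracefully before moving it is to raise its IGP cost so that traffic converges away from it first, for example with OSPF on IOS. The interface and the use of OSPF here are assumptions, not details from this maintenance:

rbx-1-6k#conf t
rbx-1-6k(config)#interface TenGigabitEthernet x/y
rbx-1-6k(config-if)#ip ospf cost 65535
rbx-1-6k(config-if)#end

Once the routing protocol has moved the traffic off the link, it can be re-homed without impact and its cost restored afterwards.
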
Date:  Saturday, 31 July 2010, 02:25AM
Reason for closing:  Done
Comment by OVH - Saturday, 31 July 2010, 01:45AM

The maintenance is not going well. We are seeing CRC errors between the routers. We have reverted to the initial configuration, with extra pain caused by bugs:

rbx-99-6k#sh inter ten 9/1
[...]
30 second output rate 90000 bits/sec, 98 packets/sec
[...]

There is no way to pass traffic.

rbx-99-6k#conf t
Enter configuration commands, one per line. End with CNTL/Z.
rbx-99-6k(config)#inter ten 9/1
rbx-99-6k(config-if)#shutdown
rbx-99-6k(config-if)#no shutdown
rbx-99-6k#sh inter ten 9/1
[...]
30 second output rate 2345596000 bits/sec, 384765 packets/sec
[...]

This is what we call a nice bug, the kind that wastes two hours in the middle of the night.


Comment by OVH - Saturday, 31 July 2010, 01:47AM

We believe the CRC problems are caused by incompatible optics (!?) between the Cisco N5 and the Cisco 6509 ...

We are retesting.
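
As an illustration (these exact commands are not quoted in the log), the usual checks to confirm the CRC errors and identify the optics on the 6500 side would look like:

rbx-99-6k#show interfaces ten 9/1 | include CRC
rbx-99-6k#show inventory

The first command shows whether the CRC counter is still incrementing on the link; the second lists the installed transceiver part numbers, which can then be compared against each platform's supported-optics list.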


Comment by OVH - Saturday, 31 July 2010, 01:49AM

No luck.

We will put the links back as they were before and report the bugs to Cisco ...


Comment by OVH - Saturday, 31 July 2010, 01:50AM

The problem is probably due to the MTU, which is XXXXX managed on the N5.
(Replace the XXXX with "badly", "differently", etc.)


Comment by OVH - Saturday, 31 July 2010, 01:52AM

We modified the MTU configuration of the N5 switches and switched the link rbx-1 <> rbx-2 over. The BGP session is now stable. We are going to switch the other links progressively.
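
A minimal way to check that stability (the command is shown for illustration; it is not quoted in the log) is the BGP summary on one of the routers. The Up/Down timer should keep increasing and the State/PfxRcd column should show a prefix count rather than Idle or Active:

rbx-1-6k#show ip bgp summary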


Comment by OVH - Saturday, 31 July 2010, 01:53AM

We are switching the links rbx-1 <> vss-2 and rbx-2 <> vss-1.


Comment by OVH - Saturday, 31 July 2010, 01:57AM

We found problems on the link rbx-1 <> vss-2 even before the switch started. We have set up a temporary fibre and are planning a maintenance intervention to repair it once and for all.

We are measuring abnormally high attenuation on the links vss-2 <> rbx-99, which we will fix.
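
For illustration (not taken from the log): where the platform and optics support digital optical monitoring, the receive power can be read directly from the router to quantify that attenuation, for example:

rbx-99-6k#show interfaces ten 9/1 transceiver

Otherwise, the attenuation is measured with an optical power meter at each end of the fibre.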


Comment by OVH - Saturday, 31 July 2010, 02:01AM

Repair of the defective links will take place tonight from 23:00. Depending on how that part progresses, we will carry on with switching the routing links onto the new internal routing switches.


Comment by OVH - Saturday, 31 July 2010, 02:01AM

We are starting the maintenance.


Comment by OVH - Saturday, 31 July 2010, 02:04AM

The defective links are now repaired. We are taking the opportunity to repair other defective links.


Comment by OVH - Saturday, 31 July 2010, 02:06AM

We attempted again to switch the 10G links onto the new infrastructure, but we are still running into difficulties. We are switching back to the old configuration, except for rbx-1 <> rbx-2, which is the only link running correctly via the new infrastructure.


Comment by OVH - Saturday, 31 July 2010, 02:08AM

Tonight there will be work on the Roubaix 2 network. We are switching the traffic vss-1 <> vss-2 onto a new Nexus infrastructure. In case of a problem, we will switch back immediately.


Comment by OVH - Saturday, 31 July 2010, 02:09AM

We are starting the switching operation.


Comment by OVH - Saturday, 31 July 2010, 02:09AM

The traffic is switched.


Comment by OVH - Saturday, 31 July 2010, 02:11AM

It is an MTU problem and a bug.

There is no problem between the Nexus 5000 and a standard 6509 and/or one running SXF.
We set the MTU to 9216 and it works properly.

Nexus 5000:
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo
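
To confirm that the policy has actually taken effect on the Nexus 5000 side (a verification step added here for illustration; it is not quoted in the log), the per-class MTU can be read back from the queuing information of an interface. The switch and interface names below are assumptions:

sw.int-1# show queuing interface ethernet 1/1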


BOOTLDR: s72033_rp Software (s72033_rp-IPSERVICESK9-M), Version 12.2(18)SXF16, RELEASE SOFTWARE (fc2)
interface Port-channelXXX
mtu 9216

The bug exists between the Nexus 5000 and a VSS running SXI.
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9-M), Version 12.2(33)SXI3, RELEASE SOFTWARE (fc2)
2 bytes are missing.
with

interface Port-channelXXX
mtu 9216

there are CRC errors on the interfaces
with

interface Port-channelXXX
mtu 9214

No more problems.

We noticed it in the segment size used by the BGP sessions:
Datagrams (max data segment is 9214 bytes):

# ping ip XXXX size 9216 df-bit

Type escape sequence to abort.
Sending 5, 9216-byte ICMP Echos to XXXX, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

-> and it is OK at 9214:

#ping ip XXXX size 9214 df-bit

Type escape sequence to abort.
Sending 5, 9214-byte ICMP Echos to XXXX, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/52/204 ms

We are going to finalise the internal routing infrastructure with this "workaround", then report the bug to Cisco ...


Comment by OVH - Saturday, 31 July 2010, 02:20AM

We are continuing the work tonight, hoping that handling the MTU fixes the problem once and for all and lets us switch completely onto the new infrastructure.


Comment by OVH - Saturday, 31 July 2010, 02:20AM

We are starting the tasks.


Comment by OVH - Saturday, 31 July 2010, 02:21AM

We are switching the traffic onto the new links sw.int-1 <> vss-1/2 and rbx-99.


Comment by OVH - Saturday, 31 July 2010, 02:23AM

The switch is complete. One defective link remains (rbx-1 <> sw.int-1), which was put through on a temporary path tonight; it should be fixed by tomorrow.


Comment by OVH - Saturday, 31 July 2010, 02:25AM

The MTU problem is resolved by moving from the Nexus 5000 to the Nexus 7000:

http://status.ovh.co.uk/?do=details&id=345