OVHcloud Network Status

FS#5162 — rbx-g1-a9
Incident Report for Network & Infrastructure
Resolved
Line card #6 (0/6/CPU0) restarted after a fault:

LC/0/6/CPU0:Feb 20 18:57:14 UTC: pfm_node_lc[227]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1
LC/0/6/CPU0:Feb 20 18:57:14 UTC: sysmgr[87]: %OS-SYSMGR-2-REBOOT : reboot required, process (pfm_node_lc) reason (pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1)
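
Each dated message in these logs has the same layout: the reporting node (e.g. LC/0/6/CPU0), a timestamp, the process name with its PID in brackets, then a %...-SEVERITY-MNEMONIC code whose numeric field is the usual syslog severity (0 is emergency, 7 is debug). The CARD_RESET_REQ line above is therefore severity 0. The sketch below splits such a line into fields. It is a minimal illustration, assuming the regular expression and the names `parse_line`, `LogMessage` and `fault_fields` (all hypothetical, fitted only to the lines in this report). It is not a Cisco-provided parser, and it does not handle uptime-style stamps such as "LC/0/6/CPU0:8:35:".

```python
import re
from typing import NamedTuple, Optional

# Standard syslog severity names, indexed by level (0 = most severe).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

# Hypothetical pattern fitted to the dated lines in this report:
#   <node>:<timestamp> UTC: <process>[<pid>]: %<CODE>-<severity>-<MNEMONIC> : <text>
LINE_RE = re.compile(
    r"^(?P<node>(?:LC|RP)/[^:]+):"            # e.g. LC/0/6/CPU0
    r"(?P<timestamp>.+? UTC): "               # e.g. Feb 20 18:57:14 UTC
    r"(?P<process>[\w-]+)\[(?P<pid>\d+)\]: "  # e.g. pfm_node_lc[227]
    r"%(?P<code>[A-Z0-9_-]+?)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
    r"\s*:\s*(?P<text>.*)$"
)


class LogMessage(NamedTuple):
    node: str
    timestamp: str
    process: str
    pid: int
    code: str
    severity: int
    mnemonic: str
    text: str

    @property
    def severity_name(self) -> str:
        return SEVERITIES[self.severity]


def parse_line(line: str) -> Optional[LogMessage]:
    """Split one dated syslog line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    g = m.groupdict()
    return LogMessage(g["node"], g["timestamp"], g["process"], int(g["pid"]),
                      g["code"], int(g["severity"]), g["mnemonic"], g["text"])


def fault_fields(text: str) -> dict:
    """Collect key=value pairs such as NP=5 or memId=17 from the message text."""
    return dict(re.findall(r"(\w+)=(\w+)", text))


line = ("LC/0/6/CPU0:Feb 20 18:57:14 UTC: pfm_node_lc[227]: "
        "%PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, "
        "Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, "
        "Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, "
        "CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1")
msg = parse_line(line)
print(msg.code, msg.severity, msg.severity_name, msg.mnemonic)
# PLATFORM-PFM 0 emergency CARD_RESET_REQ
print(fault_fields(msg.text))
# {'NP': '5', 'memId': '17', 'subMemId': '0x1'}
```

The non-greedy `code` group lets multi-part codes such as PLATFORM-PFM or L2-BM stay together, because it extends only until it reaches the single-digit severity field.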

Update(s):

Date: 2011-02-21 03:43:12 UTC
The cards were replaced.

Date: 2011-02-21 03:42:46 UTC
We will replace card #6 in g1 with card #4 from g2, whose ports are either unused or carry little traffic.

Date: 2011-02-20 19:59:33 UTC
RP/0/RSP1/CPU0:Feb 20 18:55:16 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-PUNT_FABRIC_DATA_PATH_FAILED : Set|online_diag_rsp[229493]|System Punt/Fabric/data Path Test(0x2000004)|failure threshold is 3, (slot, NP) failed: (8, 5)
LC/0/6/CPU0:Feb 20 18:57:05 UTC: pfm_node_lc[227]: %PLATFORM-DIAGS-0-LC_NP_LOOPBACK_FAILED : Set|online_diag_lc[139345]|Line card NPU loopback Test(0x2000006)|NP loopback failure count crossed threshold, $s
LC/0/6/CPU0:Feb 20 18:57:05 UTC: pfm_node_lc[227]: prm_fast_reset_subset fast reset api succeeded for chan 5
LC/0/6/CPU0:Feb 20 18:57:05 UTC: pfm_node_lc[227]: NP loopback recovery action: Succeded (NP bitmask:0x20)
LC/0/6/CPU0:Feb 20 18:57:10 UTC: ETHER_CTRL[155]: %PLATFORM-ETHER_CTRL-3-IF_OPERATION_FAIL : sending MAC to PRM, port 1, error code File exists
RP/0/RSP1/CPU0:Feb 20 18:57:12 UTC: BM-DISTRIB[1142]: %L2-BM-6-ACTIVE : TenGigE0/6/0/1 is Active as part of Bundle-Ether26
LC/0/6/CPU0:Feb 20 18:57:14 UTC: pfm_node_lc[227]: %PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR : Set|prm_server[139342]|Network Processor Unit(0x1007005)|NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1
LC/0/6/CPU0:Feb 20 18:57:14 UTC: pfm_node_lc[227]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1
LC/0/6/CPU0:Feb 20 18:57:14 UTC: sysmgr[87]: %OS-SYSMGR-2-REBOOT : reboot required, process (pfm_node_lc) reason (pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1)
LC/0/6/CPU0:Feb 20 18:57:14 UTC: sysmgr[87]: %OS-LIBSYSMGR-3-PARSE : parse_args: parse error: unmatched \"
LC/0/6/CPU0:Feb 20 18:57:14 UTC: sysmgr[87]: %OS-SYSMGR-3-ERROR : sysmgr_shutdown_cleanup_handler: shutdown script execution timed-out! Node will reset
LC/0/6/CPU0:8:35: sysmgr[87]: %OS-SYSMGR-7-DEBUG : sysmgr_shutdown_cleanup_handler: shutdown script execution timed-out! Node will reset
LC/0/6/CPU0:8:35: syslog_dev[85]: pfm_node_lc[227]: Request Graceful Reboot via Sysmgr: Reason: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 139342 (prm_server), Fault Sev: 0, Target node: 0/6/CPU0, CompId: 0x1f, Device Handle: 0x1007005, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=5, memId=17, subMemId=0x1
LC/0/6/CPU0:Feb 20 18:57:14 UTC: sysmgr[87]: %OS-SYSMGR-3-ERROR : sysmgr_shutdown_cleanup_handler: shutdown triggered by (pfm_node_lc) did not complete in 45 seconds, shutting down
RP/0/RSP0/CPU0:Feb 20 18:57:35 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-SRSP_STANDBY_EOBC_FAILED : Set|online_diag_rsp[229493]|SRSP standby EOBC Test(0x2000001)|failure threshold is 3, slot(s) failed: 8
RP/0/RSP1/CPU0:Feb 20 18:57:36 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR-3-NODE_CPU_RESET : Node 0/6/CPU0 CPU reset detected.
RP/0/RSP1/CPU0:Feb 20 18:57:36 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-8T-L state:BRINGDOWN
RP/0/RSP1/CPU0:Feb 20 18:57:36 UTC: invmgr[214]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/6/CPU0, state: BRINGDOWN
RP/0/RSP0/CPU0:Feb 20 18:57:38 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-SRSP_ACTIVE_EOBC_FAILED : Set|online_diag_rsp[229493]|SRSP active EOBC Test(0x2000002)|failure threshold is 3, slot(s) failed: 8
RP/0/RSP0/CPU0:Feb 20 18:57:40 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-SRSP_STANDBY_EOBC_FAILED : Clear|online_diag_rsp[229493]|SRSP standby EOBC Test(0x2000001)|failure threshold is 3, slot(s) failed: 8
RP/0/RSP1/CPU0:Feb 20 18:57:41 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-8T-L state:ROMMON
RP/0/RSP0/CPU0:Feb 20 18:57:43 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-SRSP_ACTIVE_EOBC_FAILED : Clear|online_diag_rsp[229493]|SRSP active EOBC Test(0x2000002)|failure threshold is 3, slot(s) failed: 8
RP/0/RSP0/CPU0:Feb 20 18:57:56 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-PUNT_FABRIC_DATA_PATH_FAILED : Clear|online_diag_rsp[229493]|System Punt/Fabric/data Path Test(0x2000004)|failure threshold is 3, (slot, NP) failed: (8, 5)
RP/0/RSP1/CPU0:Feb 20 18:58:04 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR_HAL-6-BOOT_REQ_RECEIVED : Boot Request from 0/6/CPU0, RomMon Version: 1.3
RP/0/RSP1/CPU0:3w3d:11h:24:13: shelfmgr[314]: %PLATFORM-MBIMGR-7-IMAGE_VALIDATED : Remote location 0/6/CPU0: : MBI tftp:/disk0/asr9k-os-mbi-4.0.1/lc/mbiasr9k-lc.vm validated
RP/0/RSP1/CPU0:Feb 20 18:58:04 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-8T-L state:MBI-BOOTING
RP/0/RSP1/CPU0:Feb 20 18:58:26 UTC: pfm_node_rp[282]: %PLATFORM-DIAGS-3-PUNT_FABRIC_DATA_PATH_FAILED : Clear|online_diag_rsp[229493]|System Punt/Fabric/data Path Test(0x2000004)|failure threshold is 3, (slot, NP) failed: (8, 5)
RP/0/RSP1/CPU0:Feb 20 18:59:01 UTC: shelfmgr[314]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-8T-L state:MBI-RUNNING
LC/0/6/CPU0:15: init[65540]: %OS-INIT-7-MBI_STARTED : total time 8.832 seconds
LC/0/6/CPU0:18: insthelper[60]: %INSTALL-INSTHELPER-7-PKG_DOWNLOAD : MBI running; starting software download
LC/0/6/CPU0:Feb 20 18:59:13 UTC: sysmgr[87]: %OS-SYSMGR-5-NOTICE : Card is COLD started
LC/0/6/CPU0:29: init[65540]: %OS-INIT-7-INSTALL_READY : total time 22.858 seconds
Posted Feb 20, 2011 - 19:16 UTC
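
The update's log can also be read as a recovery timeline. Going by shelfmgr's NODE_STATE_CHANGE messages, node 0/6/CPU0 spent about 5 s in BRINGDOWN, 23 s in ROMMON and 57 s in MBI-BOOTING before reaching MBI-RUNNING. The "NP bitmask:0x20" in the loopback-recovery message (0x20 is 1 << 5) points at the same NP 5 that the ECC error names. The sketch below reproduces both figures from timestamps copied out of the log. The helper names are hypothetical, and reading bit n of the mask as NP n is an assumption: it fits these messages, but they do not state it.

```python
from datetime import datetime

# Node 0/6/CPU0 state changes reported by shelfmgr, times copied from the log (UTC, Feb 20).
TRANSITIONS = [
    ("BRINGDOWN", "18:57:36"),
    ("ROMMON", "18:57:41"),
    ("MBI-BOOTING", "18:58:04"),
    ("MBI-RUNNING", "18:59:01"),
]


def state_durations(transitions):
    """Return (state, seconds until the next state change) for all but the last state."""
    times = [datetime.strptime(t, "%H:%M:%S") for _, t in transitions]
    return [(state, int((nxt - cur).total_seconds()))
            for (state, _), cur, nxt in zip(transitions, times, times[1:])]


def np_indices(bitmask: int) -> list:
    """Return the bit positions set in the mask; assumes bit n stands for NP n."""
    return [n for n in range(bitmask.bit_length()) if (bitmask >> n) & 1]


for state, secs in state_durations(TRANSITIONS):
    print(f"{state:<12}{secs:>4} s")
# prints 5, 23 and 57 seconds for BRINGDOWN, ROMMON and MBI-BOOTING
print(np_indices(0x20))  # [5]
```

The log timestamps have one-second resolution, so these durations are only accurate to about a second.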