
SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD


I have a branch office with a cluster of SRX220H2s in which the secondary node recently started flapping. Every 5-10 minutes, the secondary node is kicked out of the cluster, rejoins several minutes later, and then the cycle starts over. We've tried hard-rebooting the secondary node to see whether it would rejoin and stay in the cluster, but that hasn't helped.

Additionally, I've noticed that the control-plane CPU on the primary node is consistently at 100%, with the jsrpd process consuming a large share of it. We have a number of essentially identical branch clusters elsewhere, and none of them show jsrpd using significant CPU. I know that process is involved in cluster operation, in terms of messaging between the nodes. Checking the jsrpd logs, I'm seeing something very unusual:

May 14 16:55:04 TCP-S: accepted client connection.
May 14 16:55:04 TCP-S: TCP client from 130.16.0.1/56547 connected
May 14 16:55:04 TCP-S: TCP peer closed connection
May 14 16:55:04 last message repeated 100 times (hit threshold of (100))
May 14 16:55:04 last message repeated 200 times (hit threshold of (200))
May 14 16:55:04 last message repeated 300 times (hit threshold of (300))
May 14 16:55:04 last message repeated 400 times (hit threshold of (400))
May 14 16:55:04 last message repeated 500 times (hit threshold of (500))
May 14 16:55:04 last message repeated 600 times (hit threshold of (600))
May 14 16:55:05 last message repeated 700 times (hit threshold of (700))
May 14 16:55:05 last message repeated 800 times (hit threshold of (800))
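To make the pattern in that excerpt easier to compare against a healthy cluster, here is a minimal Python sketch that summarizes a saved copy of the jsrpd log. It assumes the log has been copied off the box as plain text in the format shown above; the function name `summarize` and the `LOG` sample are illustrative only and are not part of any Junos tooling.

```python
import re

# Sample lines copied from the jsrpd log excerpt in this post.
LOG = """\
May 14 16:55:04 TCP-S: accepted client connection.
May 14 16:55:04 TCP-S: TCP client from 130.16.0.1/56547 connected
May 14 16:55:04 TCP-S: TCP peer closed connection
May 14 16:55:04 last message repeated 100 times (hit threshold of (100))
May 14 16:55:05 last message repeated 800 times (hit threshold of (800))
"""

CLIENT_RE = re.compile(r"TCP client from (\S+)/(\d+) connected")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def summarize(log_text):
    """Return connecting clients and the most-repeated message (hypothetical helper)."""
    clients = []
    max_repeat = 0
    last_message = None
    repeated_message = None
    for line in log_text.splitlines():
        # Assumed layout: month, day, time, then the message text.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        message = parts[3]
        repeat = REPEAT_RE.match(message)
        if repeat:
            count = int(repeat.group(1))
            if count > max_repeat:
                max_repeat = count
                # The repeat counter refers to the message logged just before it.
                repeated_message = last_message
            continue
        last_message = message
        client = CLIENT_RE.search(message)
        if client:
            clients.append((client.group(1), int(client.group(2))))
    return {
        "clients": clients,
        "max_repeat": max_repeat,
        "repeated_message": repeated_message,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

Run against the excerpt, this reports a single client (130.16.0.1, source port 56547) and shows that the message being repeated hundreds of times per second is "TCP-S: TCP peer closed connection".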

Here's the output of show system processes extensive:

show system processes extensive
node0:
--------------------------------------------------------------------------
last pid: 47616;  load averages:  1.28,  1.26,  1.42  up 431+22:43:27    16:59:15
140 processes: 19 running, 108 sleeping, 2 zombie, 11 waiting

Mem: 210M Active, 149M Inact, 1036M Wired, 145M Cache, 112M Buf, 432M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1403 root        5  76    0   996M 58812K RUN    0    ??? 102.20% flowd_octeon_hm
 1406 root        1 139    0 14096K  7032K RUN    0 727.7H 76.66% jsrpd
   22 root        1 171   52     0K    16K RUN    0 7574.2  0.00% idle: cpu0
   23 root        1 -20 -139     0K    16K RUN    0 118.8H  0.00% swi7: clock
    5 root        1 -16    0     0K    16K rtfifo 0  42.7H  0.00% rtfifo_kern_recv
   25 root        1 -40 -159     0K    16K WAIT   0  40.4H  0.00% swi2: netisr 0
 1413 root        1  76    0 12452K  5768K select 0  33.9H  0.00% license-check
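Since the same check is worth running on the other branch clusters for comparison, here is a rough Python sketch that picks out processes above a WCPU threshold from saved output in the format above. The function name `busy_processes` and the 50% threshold are assumptions for illustration; the column positions are taken from the header row shown in this post.

```python
# Process rows copied from the node0 output in this post.
OUTPUT = """\
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1403 root        5  76    0   996M 58812K RUN    0    ??? 102.20% flowd_octeon_hm
 1406 root        1 139    0 14096K  7032K RUN    0 727.7H 76.66% jsrpd
   22 root        1 171   52     0K    16K RUN    0 7574.2  0.00% idle: cpu0
 1413 root        1  76    0 12452K  5768K select 0  33.9H  0.00% license-check
"""


def busy_processes(text, threshold=50.0):
    """Return (pid, command, wcpu) for rows at or above threshold (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Twelve columns; the last split keeps commands with spaces ("idle: cpu0") whole.
        parts = line.split(None, 11)
        if len(parts) < 12 or not parts[0].isdigit():
            continue  # header row or unrelated line
        wcpu = parts[10]
        if not wcpu.endswith("%"):
            continue
        try:
            percent = float(wcpu[:-1])
        except ValueError:
            continue
        if percent >= threshold:
            rows.append((int(parts[0]), parts[11].strip(), percent))
    return rows


if __name__ == "__main__":
    for pid, command, percent in busy_processes(OUTPUT):
        print(pid, command, percent)
```

On the rows above, this flags flowd_octeon_hm and jsrpd; running it on output from one of the healthy clusters would show whether jsrpd appears there at all.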

show chassis cluster interfaces:

Control link status: Up

Control interfaces:
    Index   Interface        Status   Internal-SA
    0       fxp1             Up       Disabled

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up
    fab0
    fab1    ge-3/0/5           Up   / Up
    fab1

Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
    reth2        Up          1

Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-3/0/0          255       Down      1
    ge-0/0/0          255       Up        1

{primary:node0}
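For anyone comparing this against other clusters, a small Python sketch like the following can pull the Down entries out of the Interface Monitoring table in saved output. It assumes the four-column layout shown above; `down_monitored` is a hypothetical helper name. Note that the Down state on ge-3/0/0 here may simply reflect the node1 switch ports that are currently disabled.

```python
# Table copied from the "show chassis cluster interfaces" output in this post.
OUTPUT = """\
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-3/0/0          255       Down      1
    ge-0/0/0          255       Up        1

{primary:node0}
"""


def down_monitored(text):
    """Return (interface, weight, redundancy_group) for monitored interfaces that are Down."""
    down = []
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Interface Monitoring:":
            in_section = True
            continue
        if not in_section:
            continue
        fields = line.split()
        if len(fields) != 4:
            break  # a blank line or the CLI prompt ends the table
        name, weight, status, group = fields
        if not weight.isdigit():
            continue  # column header row
        if status == "Down":
            down.append((name, int(weight), int(group)))
    return down


if __name__ == "__main__":
    print(down_monitored(OUTPUT))
```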

Last 100 lines of show log chassisd:

show log chassisd | last 100
May 14 16:39:58 SCC: pseudo_create_devs_swfab: Skipping creation of swfab1, since fabric presence is set to true
May 14 16:39:58 SCC: lcc_detach_interfaces_not_online lcc 1
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(3)
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(4)
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(5)
May 14 16:40:06 SCC: pfpc ready fpc 3 i2c 1897
May 14 16:40:06 SCC: fpc 3 clean, bringing online
May 14 16:40:06 SCC: lcc_send_fpc_online_cmd_generic:  lcc 1 fpc 0
May 14 16:40:06 SCC: pic_online_req for fpc 3, pic 0  lcc_slot 1 in lcc_recv_pic_online_req
May 14 16:40:06 SCC: lcc_send_pic_online_ack: On Switch-chassis: fpc 3 pic 0 pic_type 0x669 msg_len 20 tlv_len 0
May 14 16:40:06 SCC: From SCC send: fru 13361152 lcc_slot 1 online ack to LCC
May 14 16:40:06 SCC: From Switch-Chassis send: fpc 3 pic 0 online ack to LCC
May 14 16:40:08 SCC: lcc_recv_pic_attach: pic attach pic 0, flags 0x0, portcount 8, fpc 3
May 14 16:40:08 SCC: pic_set_online: i2c 0x669 pic 0 fpc 3 state 5 in_issu 0
May 14 16:40:08 SCC:  pic_type=1641 pic_slot=0 fpc_slot=3 pic_i2c_id=1641

May 14 16:40:08 SCC: fpc slot 3 pic_present 0x0 => 0x1
May 14 16:40:08 SCC: FPC 3 PIC 0, attaching clean
May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 0

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 0
May 14 16:40:08 SCC: Created pic for ge-3/0/0

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 1

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 1
May 14 16:40:08 SCC: Created pic for ge-3/0/1

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 2

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 2
May 14 16:40:08 SCC: Created pic for ge-3/0/2

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 3

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 3
May 14 16:40:08 SCC: Created pic for ge-3/0/3

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 4

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 4
May 14 16:40:08 SCC: Created pic for ge-3/0/4

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 5

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 5
May 14 16:40:08 SCC: Created pic for ge-3/0/5

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 6

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 6
May 14 16:40:08 SCC: Created pic for ge-3/0/6

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 7

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 7
May 14 16:40:08 SCC: Created pic for ge-3/0/7

May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/0
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/0
May 14 16:40:08 SCC: ge-3/0/0: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/1
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/1
May 14 16:40:08 SCC: ge-3/0/1: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/2
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/2
May 14 16:40:08 SCC: ge-3/0/2: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/3
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/3
May 14 16:40:08 SCC: ge-3/0/3: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/4
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/4
May 14 16:40:08 SCC: ge-3/0/4: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/5
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/5
May 14 16:40:08 SCC: ge-3/0/5: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/6
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/6
May 14 16:40:08 SCC: ge-3/0/6: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/7
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/7
May 14 16:40:08 SCC: ge-3/0/7: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 SCC: PIC (fpc 3 pic 0) message operation: add. ifd count 8, flags 0x3 in mesg
May 14 16:40:08 LCC: ignoring PIC message on LCC

For the moment, I've disabled the switch ports for the second node (node1), which keeps flapping, just so I don't keep seeing it drop in and out of the cluster, but I can re-enable them if needed.

Any thoughts are appreciated!

 

