I'm busy setting up our new SRX345 firewalls and, in all honesty, it has been a complete nightmare! I finally managed to get the two clustered over our layer 2 network with no errors (by doing a factory reset and repeating the exact same config, step by step). At that point both control links and the dual fabric links were connected, and all the subnets were serviced via VLANs on LACP reth0. Everything appeared to be working properly.
The problems are now with failover and failback.
When I issue a shutdown on the port channel on the Cisco that node0 connects to, it fails over almost immediately according to 'show log messages', but the continuous ping to the vlan20 interface is lost for between 30 seconds and 10 minutes, normally around 6 minutes. I had preempt set, and failback (triggered by entering no shutdown on the Cisco port channel) was considerably quicker, only losing pings for about 30-60 seconds.
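To put firmer numbers on the ping loss, the outage could be timed from timestamped probe results rather than by watching the terminal. Below is a minimal Python sketch, assuming each probe is recorded as a (timestamp in seconds, success) pair. The function names are hypothetical and nothing here is specific to Junos or the Cisco switch:

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, probe succeeded?)


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for each run of failed probes.

    start is the timestamp of the first failed probe in a run; end is the
    timestamp of the first successful probe after it. If the log ends while
    probes are still failing, end is the timestamp of the last sample.
    """
    windows: List[Tuple[float, float]] = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # outage begins
        elif ok and start is not None:
            windows.append((start, ts))   # outage ends at first good probe
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))  # still down at end of log
    return windows


def longest_outage(samples: List[Sample]) -> float:
    """Length in seconds of the longest outage, or 0.0 if none occurred."""
    return max((end - start for start, end in outage_windows(samples)), default=0.0)
```

Feeding it one sample per ping, taken during a failover test, would give the outage length to compare against the failover time shown in the logs.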
I still have no idea why failover is taking such a long time, and currently no idea how to start diagnosing it, but I now have a worse problem. A colleague suggested I try a more realistic failover and simulate a power cut to node0. This took only a minute to fail over to node1, but on restoring the power, node0 claims it cannot connect to node1. The Cisco reports that LACP is not enabled on the reth, and the control and fabric ports do not appear to have initialized; the Cisco is behaving as if the ports are all connected to a hub.
In addition, node0 is very slow to respond to the CLI over the rollover cable and reports the following on the console:
Message from syslogd@FW01 at Nov 7 17:29:37 ...
FW01 SCHED: Thread 4 (Module Init) ran for 1045 ms without yielding
Message from syslogd@FW01 at Nov 7 17:29:37 ...
FW01 Scheduler Oinker
Message from syslogd@FW01 at Nov 7 17:29:37 ...
FW01 Frame 00: sp = 0x510a68c8, pc = 0x182204e8
Message from syslogd@FW01 at Nov 7 17:29:37 ...
FW01 Frame 01: sp = 0x510a6970, pc = 0x182082e4
'show interfaces terse' does not list the physical interfaces on node0.
Does anyone know what's going on, or how to fix it?