Both FWSMs in our core switches went active/active, and the protected subnets now have only partial accessibility.
- We have 2 x Cisco Catalyst 6509 switches running IOS version 12.2(18).
- Each switch has a FWSM internal firewall module.
- Both run in multiple-context mode, with the contexts split into 2 failover groups.
- One FWSM should be active for both groups while the other FWSM should be standby.
We upgraded our 6509s to IOS version 12.2(33). This bounced each switch. After the upgrade we noticed that both FWSM were active (not a good thing). We ended up reverting the IOS on both 6509s, but both FWSM remained active.
Before the upgrade our busiest context was at 97% memory utilization, as reported by the message printed after modifying the rule set on the command line. Now the rules won't compile fully and we're left with a partial config.
show memory reports 67% of the 1 GB of RAM free.
show resource acl shows 12 partitions, and the context reporting 97% memory utilization has only 6,053 of a maximum 14,173 rules in use.
show failover shows each FWSM thinks it is the secondary and its mate is unknown.
We have one of the FWSMs powered off and are trying to get the other one up and stable with a good config, but it still refuses to compile the rule set.
Can somebody explain what the 97% memory utilization really represents? If this is a resource issue it doesn't appear to be rule related based on the show resource acl output, but the 97% memory utilization messages make me wonder. What next steps should I take at this point?
It turns out we ran into a node limit on the FWSM. Evidently you can be within your rule limit but hit the node ceiling. This doc https://supportforums.cisco.com/docs/DOC-8786 details the compilation memory exhaustion issue that hit us. Quoting:
> The command show np 3 acl stats, run in the context in question, will show if the total node limit has been reached. This limit may be reached even before the ACL limit is reached. Each ACE may take a minimum of 2 nodes up to a maximum of 5 nodes, depending on where the ACL is being called. An ACL tied to MPF (modular policy framework) may take up more nodes than an ACL tied to a NAT or to the access-group. There is no way to calculate the number of nodes. The best way to monitor this is to regularly look at the above output to make sure the node count is not exceeded.
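Since there's no exact formula, the best you can do is bracket the node count from the ACE count. A minimal sketch in Python, using the 2-to-5-nodes-per-ACE range from the quote above and our own numbers (6,053 ACEs in the busiest context; 28,356 is the FWSM 3.x node ceiling we later confirmed):

```python
# Rough node-count bounds for an FWSM rule set. Per the Cisco doc quoted
# above, each ACE consumes roughly 2 to 5 compilation-tree nodes, so the
# real count can only be bracketed, not computed exactly.

NODES_PER_ACE_MIN = 2
NODES_PER_ACE_MAX = 5

def node_bounds(ace_count):
    """Return the (min, max) possible node usage for a given ACE count."""
    return ace_count * NODES_PER_ACE_MIN, ace_count * NODES_PER_ACE_MAX

# Our busiest context: 6,053 ACEs against the FWSM 3.x node ceiling of 28,356.
low, high = node_bounds(6053)
print(low, high)     # 12106 30265
print(high > 28356)  # True: the worst case blows past the node ceiling even
                     # though 6,053 is well under the 14,173-rule limit
```

This is exactly the trap we fell into: by the rule counter we looked fine, but the upper node bound had already crossed the ceiling.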
Our recovery plan:
1. Power down the secondary FWSM and power on the primary.
2. Revise our configuration, removing rules we could do without (stale rules, plus recent changes allowing certain Development environments access to Production), to bring down the rule count and therefore the node count.
3. Ensure the new config compiles then reset the primary FWSM.
4. Verify the FWSM comes up clean with no errors.
5. Power down the primary FWSM and power up the secondary FWSM.
6. Repeat steps 3 and 4 with the secondary FWSM then power it down.
7. Power on the primary, ensure it's clean, power on the secondary.
8. Ensure the primary's config is copied to the secondary and that the primary is Active and sees the secondary as Standby.
Follow-up actions:
1. We upgraded our development environment (2 x 6509 with Active/Standby FWSMs) from FWSM version 3.2(23) to 4.1(13). The upgrade to the 4.x train raised our maximum node count from 28,356 to 38,439. More information: http://www.cisco.com/en/US/prod/collateral/modules/ps2706/product_bulletin_c25-478751.html.
2. We will upgrade our Production FWSM during the next change window.
3. We implemented a Kiwi CatTools job to run show np 3 acl stats in each of our contexts and e-mail us a daily report. Whenever we make a rule change, we also run the command to confirm we're within the node and rule limits.
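The same headroom check our CatTools job does by eye can be scripted. A minimal sketch in Python: the "Total Nodes: used/max" line format in the regex is a hypothetical placeholder, since the exact show np 3 acl stats output varies by FWSM release, so adjust NODE_RE to match what your module actually prints:

```python
import re

# Hypothetical node-usage line; the real `show np 3 acl stats` output
# differs by FWSM release, so adapt this regex to your module's format.
NODE_RE = re.compile(r"Total\s+Nodes\s*:\s*(\d+)\s*/\s*(\d+)", re.IGNORECASE)

def check_node_headroom(show_output, warn_ratio=0.90):
    """Parse node usage from the command output and flag when usage
    crosses warn_ratio of the ceiling. Returns (used, limit, alarm)."""
    m = NODE_RE.search(show_output)
    if m is None:
        raise ValueError("node line not found; adjust NODE_RE for your output")
    used, limit = int(m.group(1)), int(m.group(2))
    return used, limit, used >= warn_ratio * limit

# Example with fabricated numbers close to our old 3.2 ceiling:
sample = "Total Nodes: 27500/28356"
print(check_node_headroom(sample))  # (27500, 28356, True)
```

Wiring the alarm flag into the daily e-mail (or any monitoring system) would have caught our problem long before a power cycle forced a recompile.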
It's interesting behavior that the FWSM let us keep adding rules; it only failed when it was power cycled and had to recompile the rule set. Lessons learned!