Thursday, August 5, 2021

Cisco 6880X - Part 4 (CSCux07070)

It's the year 2020 and the month of April -- still paying for the mistake I made 7 years ago when I decided to try out Cisco's 6880 platform. It is one of those nightmares that you can't seem to snap out of no matter how hard you try. I want to say a lot more s*** about it but there is no point; we have fewer than ten left that need to be thrown out the window. One of the "things" (I don't even feel like calling it a switch anymore) started rebooting randomly -- 7 reboots in about a 12-hour period. Our team reached out to Cisco TAC, and TAC came to the conclusion that both power supplies needed to be RMA'd (they couldn't find any SEA log files on the system). How that was determined is beyond me -- since I wasn't on the TAC call, I didn't have a chance to ask why in the world both power supplies would need to be replaced. In any case, knowing the 6880 platform the way I do, I just didn't buy the power supply story, so I went looking -- nothing fancy, just went through the system logs and noticed this right before every reboot:

5/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/8]
1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/21]
1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/9]
1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/1]
5/-1: MAJ, GOLD, diag_publish_result[5/-1]: cpu limit hit, skip publishing result, test_id[38], testing_type[4]
5/-1: MAJ, GOLD, diag_publish_result[5/-1]: cpu limit hit, skip publishing result, test_id[39], testing_ty
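
If you want to check your own logs for the same pattern, here is a minimal Python sketch that counts these diagnostic lines per slot. The regex is based only on the handful of lines quoted above, and the function name and sample list are made up for illustration, so treat it as a starting point and not as a parser for every GOLD message format.

```python
import re
from collections import Counter

# Shape of the lines quoted in this post: "<slot>/-1: MAJ, <facility>, <message>".
# Assumption: other releases may format these messages differently.
DIAG_LINE = re.compile(
    r"^(?P<slot>\d+)/-1: MAJ, (?P<facility>DIAG_BU|GOLD), (?P<message>.*)$"
)

def summarize_diag_lines(lines):
    """Count matching diagnostic lines per (slot, facility) pair."""
    counts = Counter()
    for line in lines:
        match = DIAG_LINE.match(line.strip())
        if match:
            counts[(int(match["slot"]), match["facility"])] += 1
    return counts

# The log lines from this post, copied as-is (the last one is truncated in the original).
SAMPLE_LOG = [
    "5/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/8]",
    "1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/21]",
    "1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/9]",
    "1/-1: MAJ, DIAG_BU, online_diag_flush_pak_queue: flushed 1 packets when testing [1/-1/1]",
    "5/-1: MAJ, GOLD, diag_publish_result[5/-1]: cpu limit hit, skip publishing result, test_id[38], testing_type[4]",
    "5/-1: MAJ, GOLD, diag_publish_result[5/-1]: cpu limit hit, skip publishing result, test_id[39], testing_ty",
]

if __name__ == "__main__":
    for (slot, facility), count in sorted(summarize_diag_lines(SAMPLE_LOG).items()):
        print(f"slot {slot} {facility}: {count} line(s)")
```

A burst of these lines shortly before each reload is the thing to look for; the counts alone do not prove the bug is the cause.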

show mod:
--- ----- -------------------------------------- ------------------ -----------
  1   20  DCEF-X 16P SFP+ Multi-Rate             C6880-X-16P10G     
  5   20  6880-X 16P SFP+ Multi-Rate (Active)    C6880-X-SUP        

Those logs and a quick Google search landed me here:

High CPU due to Interrupt on C6880-X-16P10G
CSCux07070




Friday, September 20, 2019

Cisco 9504 %SYSMGR-2-CFGWRITE_ABORTED: Configuration copy aborted.


%SYSMGR-2-CFGWRITE_ABORTED: Configuration copy aborted.
%SYSMGR-3-CFGWRITE_FAILED: Configuration copy failed (error-id 0x401E0000).
%SYSMGR-2-CFGWRITE_ABORTED_CONFELEMENT_RETRIES: Copy R S failed as config-failure retries are ongoing. Type "show nxapi retries" for checking the ongoing retries.
%SYSMGR-3-CFGWRITE_SRVFAILED: Service "confelem" failed to store its configuration (error-id 0x00000079).

Ran into this issue a couple of weeks ago while working on replacing some old 7Ks with the newer 9500s running NXOS 7.0.3.I7.6. We had a few hundred lines of config and decided to copy/paste them. Everything went fine -- or so we thought, until we tried saving the config.

The issue was related to a missing/incomplete config. Neither I nor my coworker noticed any error messages when pasting the config, but when you try to save the config, you are greeted with the error messages above. The issue was resolved by the TAC engineer after he noticed that we were missing the HSRP config under the SVIs (we only had the command "hsrp version 2" but were missing the HSRP group and the IP). It would have been nice if the error message were a little more descriptive or at least offered a hint towards the actual issue.
"show nxapi retries" didn't return any info either.

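Since the box gives no hint, a pre-paste sanity check on the config text can catch this particular mistake. Below is a minimal Python sketch that flags SVIs carrying "hsrp version 2" without an HSRP group and virtual IP. The function names, the sample config, and the addresses in it are hypothetical, and the parsing assumes the usual indented NX-OS layout; it is not a full config parser.

```python
import re

def split_interfaces(config_text):
    """Map each interface name to the stripped lines indented beneath it."""
    sections = {}
    current = None
    for raw in config_text.splitlines():
        if raw.startswith("interface "):
            current = raw.split(None, 1)[1].strip()
            sections[current] = []
        elif raw[:1].isspace() and current is not None:
            sections[current].append(raw.strip())
        else:
            current = None  # blank or top-level line ends the section
    return sections

def svis_with_incomplete_hsrp(config_text):
    """Return SVIs that set 'hsrp version 2' but lack a group or a virtual IP."""
    flagged = []
    for name, body in split_interfaces(config_text).items():
        if not name.lower().startswith("vlan"):
            continue
        has_version = "hsrp version 2" in body
        has_group = any(re.fullmatch(r"hsrp \d+", line) for line in body)
        # "ip address ..." does not match here; only the HSRP "ip <vip>" line does.
        has_vip = any(re.fullmatch(r"ip \d+\.\d+\.\d+\.\d+", line) for line in body)
        if has_version and not (has_group and has_vip):
            flagged.append(name)
    return flagged

# Hypothetical config: Vlan10 is complete, Vlan20 reproduces the mistake.
SAMPLE_CONFIG = """\
interface Vlan10
  ip address 10.0.10.2/24
  hsrp version 2
  hsrp 10
    ip 10.0.10.1

interface Vlan20
  ip address 10.0.20.2/24
  hsrp version 2

interface Ethernet1/1
  description uplink
"""

if __name__ == "__main__":
    print(svis_with_incomplete_hsrp(SAMPLE_CONFIG))
```

Running it against the text you are about to paste takes seconds, which beats finding out at "copy r s" time.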
Tuesday, September 17, 2019

Cisco Firepower - FireSight- FTD/FDM/FMC

I've been meaning to write a few things about Cisco FireSight/Firepower/FMC/FDM/FTD (please feel free to share acronyms for this product that I might have missed) for a while now but decided against it -- until now. If you are a Cisco employee working on the Firepower product or just a hardcore Cisco security lover, this post will probably not sit well with you. With that being said, my intention is not to bash the Firepower team but rather to provide constructive criticism -- or something of that nature :) I'm not going to go into any technical details, as that would take a long time since I like to have all the evidence when incriminating someone or something. And since this is not a paid post, I will keep it simple.

My first experience with Firepower was on an ASA using the CX modules around 2015 (FireSight/FMC 5.4.x) and after a few hours of use, I had a list of things that I thought required immediate attention (I've done a few beta tests for this product over the past few years as well):

1. Antiquated interface -- reminds me of the 1990's web interfaces.
2. Dashboard widgets took forever to load.
3. Excruciatingly slow when deploying changes. Even a minor change took several minutes to deploy. The initial deployment logic never made sense to me: it stopped the Snort engine services, dropped traffic, then removed and reapplied the config -- just bad design I guess. The process has gotten better over the years, but the time it takes to deploy has not. The screenshot below is after the 6.4 upgrade (before someone from the Cisco team asks: yes, we are using the FS 4000, not vFMC; the 4100's are still running 6.3 but the 2130's are on 6.4).


4. No real live logs in FMC. Noticeable lag when looking at Connection Events.
5. Extremely limited/nonexistent VPN-related troubleshooting via FMC. Have to rely on the CLI.
6. Throughput limitations when enabling all the required features -- IPS/URL Filtering/Malware etc.
7. No support for SSL decryption in hardware (until 2017/2018 with Cavium Nitrox crypto chips). Even with the hardware support, SSL decryption sucks -- try it on your "Prod" 4110 if you have one and let me know how much more love you gained for Cisco afterwards :)

I dare not get into the details of the pain of using FMC on a daily basis, the pain of upgrading 4100's (FXOS/FTD, FMC HA), or the countless hours of headaches just to find out that there is no real feature parity with the ASA's or an alternative to it. So fast forward to September 2019: we are now running 6.4.0.4, and guess what? If you said not much has changed, then you are a winner. It baffles me that an industry giant like Cisco is failing yet again to deliver a solid product (remember LMS 4.0, now called Cisco Prime, how about Cisco Cius, or my favorite, 6880's with IA's). They spent $2.7 billion on the Sourcefire acquisition and millions on redoing the code, and what do we have? A half-baked product that still can't stand up to the competition. Oh wait, here is something that I've been complaining about for years that has NOT been fixed (we are in the year 2019 -- do I need to say more):


Now you might be thinking that I'm just venting -- well guess what, I am, thanks to the Cisco Firepower team :) But don't just take my word for it, look at what Todd has to say...yes, Todd Lammle, not my friend Todd who thought I studied for "Sysco" exams and drove the "Sysco Foods" big rig for a living until I educated him on Cisco:


I'm not a big fan of Palo Alto Networks (mainly due to their poor support) but PAN and Fortinet have a much better firewall product than Cisco. In the coming weeks, Cisco is going to release 6.5 to the general public, and those drinking the Cisco Kool-Aid are already raving about how it's going to turn this flawed product into one of the best -- I, on the other hand, am not holding my breath. I tried to do my part by providing feedback: I showed up to the meetings (including the security forum, beta testing, multiple Cisco Lives) and sent emails and made phone calls to the AM's, SE's, BU and anyone who was willing to listen. I had a detailed/granular list of all the issues, resolutions, and feedback that I kept until late last year but no more -- I realized it is just a waste of my time, and that the Cisco Firepower team literally has no clue what to do with this flawed product/software (no offense Cisco team, it is what it is). I personally think that the Cisco Firepower team is doing nothing more than putting lipstick on a pig with this release of FTD 6.5.

Are you an FTD/FMC user? Let me know your thoughts/comments.

Sunday, October 5, 2014

Cisco 6880X & 6800ia Part 3

We continue to have problems with these boxes. Here is the latest "Bus Error":






This was in production, so we took about 15 minutes of downtime. Not pretty. Still working with Cisco TAC to find the root cause.

For Palo Alto Networks Cult members :)

https://www.nsslabs.com/blog/seriously

Floor is open for comments :)

Tuesday, September 9, 2014

Cisco 6880X & 6800ia Part 2

Our deployment of 6880’s and 6800ia’s is in a large healthcare system, which in hindsight was not the best move. We received the 6880’s with 15.1(2)SY1 code, which had some serious issues, the biggest being SDP error messages that caused the 6800 extenders to flap constantly. Upon contacting Cisco TAC, I was told to upgrade the code to 15.1(2)SY2 (the TAC engineer knew about the issue beforehand but there was no official documentation available….hmmm :)

Upgrading to SY2 did fix the flapping issue but caused a few other problems. The most important was that the extenders were getting stuck during the code upgrade (while pulling the new code from the parent switch). After a month-long correspondence with TAC and BU engineers, we were told that “most” of these issues have been addressed in the SY3 release of the code, so we should upgrade to SY3. Keep in mind that every time you upgrade the 6880, the attached fex’s have to pull down the new code all over again, and in the case of VSS, the fex’s reboot twice during the process (when using eFSU). This may not be a big deal for a small setup but it is a huge problem in a 24x7 hospital environment. The problem is compounded by the fact that fex’s take 6-9 minutes to be fully operational after the reload. To make a long story short, even after the upgrade to SY3 we are still having major issues (including VSL link failure).

Here is a list of outstanding issues for 6880x (August 31, 2014):
  • Random VSL Link Failure.
  • “ENTROPY_FAILURE: Unable to collect sufficient entropy”
  • SSH stops working if the active switch goes into recovery mode. The only “fix” we have found so far is to reload both shelves.
  • 7-9 minutes boot time for the fex’s.
  • In case of a single homed fex, uplink interface shutdown/failure or a simple twinax/fiber failure will reboot the fex (per Cisco, it’s a “security” feature but it’s an issue for me).
  • ISSU/eFSU doesn’t provide much visibility into the upgrade process, leaving you wondering if it is stuck in the process (per Cisco, SY3 will show more “messages” during the upgrade process; I haven’t seen anything new so far).

We have received 12 of these boxes and have had 3 DOA linecards (C6880-X-16P10G) so far. Keep in mind that you can’t interchange C6880-X-LE-16P10G & C6880-X-16P10G linecards, as the LE is for Lite Edition (smaller hardware table) and will absolutely NOT work on the X (bigger hardware table) chassis. Installing/inserting these linecards into the chassis is tricky as well. If you don’t have the card aligned at exactly the proper angle, it will get stuck and you will not be able to yank it out without messing up something else…..poorly designed linecard to say the least.


If you are not bothered by any of this stuff then you are a good candidate for deploying 6880’s :)

Wednesday, August 27, 2014

Cisco 6880X & 6800ia Part 1

If you are thinking about deploying 6880X with some 6800ia’s then you may want to read this post in its entirety before making the final decision. I will be updating either this post or adding new posts as I come across new/relevant information.

I’ve long been a Cisco fan and one could have sold me a brick with a Cisco logo on it (up until a couple of years ago) and I would have been very happy with that purchase. I would still have the same love for Cisco had I not touched the Cisco ACS (4.x & 5.x), Cisco Prime LMS 4.1, and the new 6880X with 6800 instant access switches in the last few years. For now, let’s forget about the ACS/LMS and discuss the 6880/6800ia. Since routing and switching is one of Cisco’s core competencies, one would expect a very stable, reliable, and feature-rich (new features) product (like the 6500, Nexus 7K, ASR etc.). Unfortunately the 6880X doesn’t enjoy any of those traits for now.
There is no doubt that the idea is good: you take the same model as the Nexus 7K/5K with fex’s and make it available outside of the datacenter environment. But the execution of this plan has been subpar. With that being said, let us start with some of the “good” stuff about these boxes:

    1.      Extremely competitive price (compared to the 6500/6807 with Sup 2T).
    2.      PoE availability on the 6800 instant access switches.
    3.      Great 10G port density for the price.
    4.      Feature rich (L2/L3, full MPLS, GRE in hardware).

Please note that my assumption of “good” is heavily based on the pricing.
6880X datasheet:

And now the “not so good” list:
  1. Max instant access switch/fex ports are restricted to 1008. With 48-port 6800ia’s, this means that you can only deploy 21 6800ia switches/fex’s per VSS pair (1008 / 48 = 21). Per Cisco, this number will be increased to perhaps 2000 ports or more by the end of the year.
  2. You can only stack up to (3) 6800ia switches.
  3. You can only use FEX IDs 1-12 for now. So if you have a deployment where you need 15 single 6800ia switches….well, you can only deploy 12 for now.
Here is a complete list of restrictions and the things you can’t do on the 6800ia’s:
http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/15-1SY/config_guide/sup2T/15_1_sy_swcg_2T/instant_access.pdf
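
To see how the port limit, the stack limit, and the FEX ID limit interact, here is a small Python sketch of the arithmetic. The 1008-port, 3-member-stack, and 12-FEX-ID numbers come from the list above; the 48 ports per 6800ia and the idea that each stack consumes exactly one FEX ID are assumptions on my part, and the function names are made up for illustration.

```python
MAX_FEX_PORTS = 1008     # port limit per VSS pair, from the list above
MAX_FEX_IDS = 12         # FEX IDs 1-12, from the list above
MAX_STACK_MEMBERS = 3    # stack limit, from the list above
PORTS_PER_6800IA = 48    # assumption: 48-port 6800ia models

def max_by_port_limit():
    """Switches allowed by the port budget alone."""
    return MAX_FEX_PORTS // PORTS_PER_6800IA

def deployable_switches(stack_size):
    """Switches you can deploy when every FEX is a stack of `stack_size` members.

    Assumption: one FEX ID per stack, so the FEX ID limit caps the number of stacks.
    """
    if not 1 <= stack_size <= MAX_STACK_MEMBERS:
        raise ValueError(f"stack size must be 1..{MAX_STACK_MEMBERS}")
    by_fex_ids = MAX_FEX_IDS * stack_size
    return min(by_fex_ids, max_by_port_limit())

if __name__ == "__main__":
    for size in range(1, MAX_STACK_MEMBERS + 1):
        print(f"stacks of {size}: up to {deployable_switches(size)} switches")
```

Under these assumptions, single (unstacked) 6800ia’s run out of FEX IDs long before the port budget matters, while stacks of two or three hit the 1008-port ceiling first.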

In the next post, I will share our 6880/6800ia deployment and the ongoing struggles with these boxes.