Tuesday, September 9, 2014

Cisco 6880X & 6800ia Part 2

Our deployment of 6880’s and 6800ia’s is in a large healthcare system, which in hindsight was not the best move. We received the 6880’s with 15.1(2)SY1 code, which had some serious issues, the biggest being SDP error messages that were causing the 6800 extenders to flap constantly. Upon contacting Cisco TAC, I was told to upgrade the code to 15.1(2)SY2 (the TAC engineer knew about the issue beforehand, but there was no official documentation available... hmmm :) ).

Upgrading to SY2 did fix the flapping issue but caused a few other problems. The most important was that the extenders were getting stuck during the code upgrade (while pulling the new code from the parent switch). After a month-long correspondence with TAC and BU engineers, we were told that “most” of these issues had been addressed in the SY3 release of the code, so we should upgrade to SY3. Keep in mind that every time you upgrade the 6880, the attached fex’s have to pull down the new code all over again, and in the case of VSS, the fex’s reboot twice during the process (when using eFSU). This may not be a big deal for a small setup, but it is a huge problem in a 24x7 hospital environment. The problem is compounded by the fact that fex’s take 6-9 minutes to be fully operational after a reload. To make a long story short, even after the upgrade to SY3 we are still having major issues (including VSL link failure).
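
To put a rough number on that, the outage each upgrade imposes on a fex can be estimated from the two figures above (two reboots under eFSU on VSS, 6-9 minutes per reload). This is only a back-of-the-envelope sketch in Python: the function name is made up for illustration, and it assumes the two reboots happen back to back and ignores the time spent pulling the image from the parent switch.

```python
def fex_outage_minutes(reboots, boot_min, boot_max):
    """Return the (best, worst) case outage in minutes for one fex.

    Assumption (not from Cisco): the reboots happen one after another,
    and the image download time is not counted.
    """
    return reboots * boot_min, reboots * boot_max


# Figures from the post: two reboots per eFSU upgrade on VSS,
# 6-9 minutes for a fex to become fully operational after a reload.
best, worst = fex_outage_minutes(reboots=2, boot_min=6, boot_max=9)
print(f"Estimated outage per upgrade: {best}-{worst} minutes")
```

Under those assumptions, every upgrade costs each fex-attached port somewhere between 12 and 18 minutes of downtime, which is hard to schedule in a 24x7 environment.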

Here is a list of outstanding issues for 6880x (August 31, 2014):
  • Random VSL link failure.
  • “ENTROPY_FAILURE: Unable to collect sufficient entropy”
  • SSH stops working if the active switch goes into recovery mode. The only “fix” we have found so far is to reload both shelves.
  • 7-9 minutes boot time for the fex’s.
  • In the case of a single-homed fex, an uplink interface shutdown/failure or a simple twinax/fiber failure will reboot the fex (per Cisco, it’s a “security” feature, but it’s an issue for me).
  • ISSU/eFSU doesn’t provide much visibility into the upgrade process, leaving you wondering whether it is stuck (per Cisco, SY3 will show more “messages” during the upgrade process, but I haven’t seen anything new so far).

We have received 12 of these boxes and have had 3 DOA linecards (C-6880-X-16P10G) so far. Keep in mind that you can’t interchange C-6880-X-LE-16P10G and C-6880-X-16P10G linecards: LE stands for Lite Edition (smaller hardware table), and the LE card will absolutely NOT work in the X (bigger hardware table) chassis. Installing these linecards into the chassis is tricky as well. If you don’t have the card aligned at exactly the proper angle, it will get stuck and you will not be able to yank it out without messing up something else... a poorly designed linecard, to say the least.


If you are not bothered by any of this stuff then you are a good candidate for deploying 6880’s :)

14 comments:

  1. I'm working on a DC design... guess I'll consider the old 4500 and 3750X stacks, at least they're working :) Thanks, your pain has avoided mine ;)

  2. For DC designs go with Nexus

  3. We have a full setup of 6807-XL with Instant access and it's a DISASTER.

    We have a lot of errors like yours, crashes, etc.

    Don't upgrade to 15.2.1. Worst release of a Cisco IOS I ever saw.

  4. I confirm that 15.2.1(SY1) is a nightmare! The problem is that this is the only version which supports more than 12 FEX ids..

    My biggest problem now is QoS. For some reason some packets are being dropped (those that hit Q4). Despite all the changes and tweaks I've made, nothing really helped unless I put all the traffic into Queue 2 or 3 (which have bigger buffers assigned to them).

    Have you guys had any issues with QoS or it's just me?

  5. Looks like 15.2(1)SY1a resolves a lot of what I've seen being reported by others. After evaluating the solution for a while, we decided to go with 6880-X-LE and 6800ia switches at the access layer. Building and testing now and trying to recreate many of the above items just to see what happens prior to anything going into production.

  6. I did try 15.2(1)SY1a and it is indeed looking good so far. I heard from a Cisco guy they've fixed most of the issues known to IA, so fingers crossed...

  7. Hi,
    We have a large campus deployment, and it seems that if you go above 1000 dot1x/MAB-enabled interfaces in total, none of the authentication works. TAC is currently looking into that, but we also had lots of teething problems with this platform...

    Replies
    1. Jman,
      Is the issue still not resolved? Can you share your TAC case number?
      We will be having that type of deployment very soon.

      thanks,

    2. Hi,
      TAC confirmed it to be a bug...
      Bug ID: CSCuv50743 is fixed in 15.2(01)SY02
      We will be upgrading hopefully this Thursday and I will give feedback again...just remind me...
      Ciao

    3. Ciao, did it work after the upgrade?

      thanks,

      Yatao

    4. Hi,
      We upgraded to 15.2.1SY2.
      So far it is looking good. We have 1300 ports configured with dot1x/MAB and we don't see any issues.
      We will add some more fexes shortly.
      Ciao

  8. There is also a memory leak issue with 15.2(1)SY1a that has caused our 6880 and 6800i switches to crash. TAC said an upgrade to 15.2(1)SY2 fixes this issue.

  9. I have tried to upgrade from 12.1.2.sy9 to 15.2.1.sy3 via ISSU, but on my active sup the VSL links went into error-disable, and the standby sup came up in recovery mode on the old software version instead of booting up on the new SY3.
    Has anyone seen this before?
