
Cisco 6500: redundancy keeps falling apart

Hi all.

 

I have a Catalyst 6509 with two SUP720-10GE supervisors ("heads") running in SSO mode.

IOS s72033-adventerprisek9-mz.151-2.SY11

 

At some point, when I try to enter configuration mode, I get refused with the message:

Config mode cannot be entered during Standby initialization

 

Looking into it, I see that one of the supervisors has dropped out.

 

#sh redundancy
Redundant System Information :
------------------------------
       Available system uptime = 6 days, 23 hours, 45 minutes
Switchovers system experienced = 1
              Standby failures = 14
        Last switchover reason = active unit failed

                 Hardware Mode = Duplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso
              Maintenance Mode = Disabled
                Communications = Up

Current Processor Information :
-------------------------------
               Active Location = slot 6
        Current Software state = ACTIVE
       Uptime in current state = 6 days, 22 hours, 37 minutes
                 Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
                          BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102

Peer (slot: 5) information is not available because it is in 'DISABLED' state

 

ds1#sh module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1101CTTW
  2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021P4JY
  3   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1023QBRV
  4   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021NW0X
  5    0  Supervisor-Other                       Unknown            Unknown
  6    5  Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL1223T4DA
  7   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1052BYZA
  8   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL09370BH6

Mod MAC addresses                       Hw    Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1  001a.6cbe.a460 to 001a.6cbe.a48f   2.5   12.2(14r)S5  15.1(2)SY11  Ok
  2  0017.e041.1d9c to 0017.e041.1dcb   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  3  0018.1833.8644 to 0018.1833.8673   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  4  0017.0ed4.82e4 to 0017.0ed4.8313   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  5  0000.0000.0000 to 0000.0000.0000   0.0   Unknown      Unknown      Unknown
  6  0019.e8bb.3114 to 0019.e8bb.311b   2.0   8.5(2)       15.1(2)SY11  Ok
  7  001a.2f80.f5c0 to 001a.2f80.f5ef   2.5   12.2(14r)S5  15.1(2)SY11  Ok
  8  0015.6245.f740 to 0015.6245.f76f   2.3   12.2(14r)S5  15.1(2)SY11  Ok

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Centralized Forwarding Card WS-F6700-CFC       SAD103102LM  3.0    Ok
  2  Centralized Forwarding Card WS-F6700-CFC       SAL1019MBDB  2.0    Ok
  3  Centralized Forwarding Card WS-F6700-CFC       SAL1029W0ZC  2.0    Ok
  4  Centralized Forwarding Card WS-F6700-CFC       SAL1017LFEZ  2.0    Ok
  5  Policy Feature Card 3       VS-F6K-PFC3C       SAL12372VEC  1.0    Other
  6  Policy Feature Card 3       VS-F6K-PFC3C       SAL1222S0GS  1.0    Ok
  6  MSFC3 Daughterboard         VS-F6K-MSFC3       SAL1224TW93  1.0    Ok
  7  Centralized Forwarding Card WS-F6700-CFC       SAL10360MGJ  3.0    Ok
  8  Centralized Forwarding Card WS-F6700-CFC       SAL093813ST  2.0    Ok

Mod  Online Diag Status
---- -------------------
  1  Pass
  2  Pass
  3  Pass
  4  Pass
  5  Unknown
  6  Pass
  7  Pass
  8  Pass

 

I tried resetting the module (and later physically reseating it), but it didn't help: the module reboots in a loop and never pairs up with the active supervisor. It reports a timeout. The logs show the following:

 

Apr 22 16:52:36 ds1 294872: Apr 22 13:52:34.015: %OIR-SP-3-PWRCYCLE: Card in module 5, is being power-cycled 'Module reset'
Apr 22 17:03:09 ds1 295197: Apr 22 14:03:08.094: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
Apr 22 17:06:25 ds1 295308: Apr 22 14:06:25.094: %PFREDUN-SP-6-ACTIVE: Standby initializing for SSO mode
Apr 22 17:06:28 ds1 295310: Apr 22 14:06:28.705: %RF_ISSU-SP-3-RF_MSG_NOT_OK: RF ISSU msg type (101) for client (3) on domain (0) is not ok
Apr 22 17:06:29 ds1 295311: Apr 22 14:06:28.705: %RF-SP-5-SEND_FAIL: RF client progression send failure for reason (RF_BAD_MESSAGE)
Apr 22 17:06:29 ds1 295313: Apr 22 14:06:28.705: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:03 to ensure console debugging output.
Apr 22 17:06:30 ds1 295314: Apr 22 14:06:29.840: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
Apr 22 17:06:31 ds1 295315: Apr 22 14:06:29.840: %OIR-SP-3-PWRCYCLE: Card in module 5, is being power-cycled 'Module reset'
Apr 22 17:09:43 ds1 295429: Apr 22 14:09:42.858: %PFREDUN-SP-6-ACTIVE: Standby initializing for SSO mode
Apr 22 17:19:57 ds1 295777: Apr 22 14:19:57.777: %ONLINE-SP-6-BOOT_TIMER: Module 5, Proc. 0. Failed to bring online because of boot timer event
Apr 22 17:19:58 ds1 295778: sm(cygnus_oir_bay slot5), running yes, state empty
Apr 22 17:19:58 ds1 295779: Last transition recorded: (remove)-> occupied (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty (insert)-> may_be_occupied (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty
Apr 22 17:19:59 ds1 295781: Apr 22 14:19:57.781: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:10:14 to ensure console debugging output.
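The log excerpt mixes redundancy events with console-flush notices and an OIR state-machine dump. To follow one boot cycle of the standby supervisor, the tagged IOS messages can be pulled out into a timeline. A minimal Python sketch, assuming the log lines look exactly like the ones above (syslog header, then an IOS timestamp with milliseconds, then a `%FACILITY-SEVERITY-MNEMONIC:` tag); `standby_timeline` is a hypothetical helper name, not part of any Cisco tool:

```python
import re
from typing import List, Tuple

# IOS message tag with an optional processor suffix, e.g.
# "%PFREDUN-SP-6-ACTIVE: text" -> ("PFREDUN-SP", "6", "ACTIVE", "text")
TAG_RE = re.compile(r"%([A-Z0-9_]+(?:-[A-Z]+)?)-(\d)-([A-Z0-9_]+):\s*(.*)")
# IOS timestamp with milliseconds; the syslog header timestamp has none, so it is skipped
TS_RE = re.compile(r"([A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}):")


def standby_timeline(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (IOS timestamp, tag, message) for every tagged IOS message,
    dropping SYS logger-flush notices, which say nothing about redundancy."""
    events = []
    for line in lines:
        tag = TAG_RE.search(line)
        ts = TS_RE.search(line)
        if not tag or not ts:
            continue  # untagged lines such as the "sm(cygnus_oir_bay slot5)" dump
        facility, severity, mnemonic, text = tag.groups()
        if mnemonic == "LOGGER_FLUSHED":
            continue
        events.append((ts.group(1), f"{facility}-{severity}-{mnemonic}", text.strip()))
    return events


# A few lines copied from the log above
sample = [
    "Apr 22 17:06:25 ds1 295308: Apr 22 14:06:25.094: %PFREDUN-SP-6-ACTIVE: Standby initializing for SSO mode",
    "Apr 22 17:06:28 ds1 295310: Apr 22 14:06:28.705: %RF_ISSU-SP-3-RF_MSG_NOT_OK: RF ISSU msg type (101) for client (3) on domain (0) is not ok",
    "Apr 22 17:06:29 ds1 295313: Apr 22 14:06:28.705: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:03 to ensure console debugging output.",
    "Apr 22 17:06:30 ds1 295314: Apr 22 14:06:29.840: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode",
    "Apr 22 17:19:58 ds1 295778: sm(cygnus_oir_bay slot5), running yes, state empty",
]

for ts, tag, text in standby_timeline(sample):
    print(ts, tag, text)
```

Applied to this sample, the timeline shows the standby starting SSO initialization, the RF ISSU message failure about three seconds later, and the standby being removed again about a second after that.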

 

A full chassis reload fixed it, but the problem came back a week later on the same module.

Incidentally, this supervisor ran on its own in simplex mode for several years without any trouble. About a month ago we added a second one to pair with it and upgraded IOS.

A second chassis runs exactly the same IOS in exactly the same configuration, also with two VS-S720-10G, and has no such problems.

 

Does anyone know what this could be and how to get it out of this state without reloading the whole chassis? My suspicion is that when the module drops out, redundancy gets stuck in SSO and then cannot recover. Shouldn't it, in theory, move from SSO to some other state?

All in all, something strange is going on :( Any ideas?

 

 


zhenya` said:

Plug a console into it and read what it says. Most likely it's RIP.

I did connect a console. It also reports a timeout there and goes into a reboot loop.

What does RIP mean?


vurd said:

Rest in peace

Ah :) Well, after a chassis reload everything works.

What looks suspicious is that after one supervisor drops out, redundancy does not switch to simplex. Why does it stay in SSO? That can't be right.

Maybe that's the root of the problem.


#show redundancy domain all
Redundant System Information :
------------------------------
       Available system uptime = 1 week, 1 hour, 58 minutes
Switchovers system experienced = 1
              Standby failures = 18
        Last switchover reason = active unit failed

                 Hardware Mode = Duplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso
              Maintenance Mode = Disabled
                Communications = Up

Current Processor Information :
-------------------------------
               Active Location = slot 6
        Current Software state = ACTIVE
       Uptime in current state = 1 week, 50 minutes
                 Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
                          BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102

Peer Processor Information :
----------------------------
              Standby Location = slot 5
        Current Software state = DISABLED
       Uptime in current state = 1 hour, 41 minutes
                 Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
        Configuration register = 0x2102



Redundant System Information (domain# 1):
------------------------------
       Available system uptime = 1 week, 1 hour, 58 minutes
Switchovers system experienced = 0
              Standby failures = 0
        Last switchover reason = none

                 Hardware Mode = Simplex
              Maintenance Mode = Disabled
                Communications = Down      Reason: Failure

Current Processor Information :
-------------------------------
               Active Location = slot 6
        Current Software state = DISABLED
       Uptime in current state = 1 week, 1 hour, 58 minutes
                 Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
                          BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102

Peer (slot: 5) information is not available because it is in 'DISABLED' state

 

 

So why is it in SSO right now when it doesn't see the module at all?

 

#show redundancy states
       my state = 13 -ACTIVE
     peer state = 1  -DISABLED
           Mode = Duplex
           Unit = Secondary
        Unit ID = 6

Redundancy Mode (Operational) = sso
Redundancy Mode (Configured)  = sso
Redundancy State              = sso
     Maintenance Mode = Disabled
 Communications = Up

   client count = 148
 client_notification_TMR = 30000 milliseconds
          keep_alive TMR = 9000 milliseconds
        keep_alive count = 0
    keep_alive threshold = 18
           RF debug mask = 0x0
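The `show redundancy states` output above shows the combination being asked about: `peer state` is DISABLED, yet `Mode` is still Duplex and the operational redundancy mode is still sso. To flag that combination automatically from collected CLI output, a minimal Python sketch could look like this. It assumes the command output has already been captured as text (how it is collected is out of scope), and `parse_redundancy_states` and `sso_with_dead_peer` are hypothetical helper names, not part of any Cisco tool:

```python
from typing import Dict


def parse_redundancy_states(output: str) -> Dict[str, str]:
    """Parse the 'key = value' lines of 'show redundancy states' into a dict.
    Keys are whitespace-normalised and lower-cased; values are kept as printed."""
    result = {}
    for line in output.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, _, value = line.partition("=")
        result[" ".join(key.split()).lower()] = value.strip()
    return result


def sso_with_dead_peer(states: Dict[str, str]) -> bool:
    """Hypothetical check: True when the box still reports Duplex and
    operational SSO while the peer supervisor is DISABLED."""
    return (
        "DISABLED" in states.get("peer state", "")
        and states.get("mode") == "Duplex"
        and states.get("redundancy mode (operational)") == "sso"
    )


# Excerpt of the output above
sample_output = """\
       my state = 13 -ACTIVE
     peer state = 1  -DISABLED
           Mode = Duplex
           Unit = Secondary
        Unit ID = 6

Redundancy Mode (Operational) = sso
Redundancy Mode (Configured)  = sso
Redundancy State              = sso
"""

print(sso_with_dead_peer(parse_redundancy_states(sample_output)))
```

The check only reports the combination; it says nothing about whether IOS is expected to leave the mode at sso in this situation.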

 


zhenya` said:

Standby failures = 18

kind of hints that it already has one foot in the grave.

Well, it keeps booting the module in a loop, hence so many failures. In my first post it was 14 :) That's how much it ticked up in an hour.
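The growth of the counter can be measured from the two `show redundancy` outputs in the thread: `Standby failures` went from 14 to 18 while `Available system uptime` went from 6 days, 23 hours, 45 minutes to 1 week, 1 hour, 58 minutes. A small Python sketch that computes both differences, assuming uptime is printed as comma-separated week/day/hour/minute terms (any other term, such as years, is silently ignored); `uptime_minutes` and `snapshot` are hypothetical helper names:

```python
import re
from typing import Tuple

MINUTES_PER = {"week": 7 * 24 * 60, "day": 24 * 60, "hour": 60, "minute": 1}


def uptime_minutes(text: str) -> int:
    """Convert an IOS uptime string like '1 week, 1 hour, 58 minutes' to minutes."""
    return sum(int(n) * MINUTES_PER[unit]
               for n, unit in re.findall(r"(\d+)\s+(week|day|hour|minute)s?", text))


def snapshot(output: str) -> Tuple[int, int]:
    """Extract (uptime in minutes, standby failure count) from 'show redundancy'.
    Only the first occurrence of each field is used."""
    uptime = re.search(r"Available system uptime = (.+)", output).group(1)
    failures = int(re.search(r"Standby failures = (\d+)", output).group(1))
    return uptime_minutes(uptime), failures


first = """\
       Available system uptime = 6 days, 23 hours, 45 minutes
              Standby failures = 14"""
second = """\
       Available system uptime = 1 week, 1 hour, 58 minutes
              Standby failures = 18"""

t1, f1 = snapshot(first)
t2, f2 = snapshot(second)
print(f"{f2 - f1} new standby failures in {t2 - t1} minutes")  # 4 new standby failures in 133 minutes
```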


Any ideas how to get it out of the configuration lock right now without reloading the chassis?


I reloaded it and swapped the supervisors between slots 5 and 6. Let's see...

#sh modul
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1101CTTW
  2   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021P4JY
  3   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1023QBRV
  4   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021NW0X
  5    5  Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL1223T4DA
  6    5  Supervisor Engine 720 10GE (Hot)       VS-S720-10G        SAL12372PJX
  7   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1052BYZA
  8   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL09370BH6

Mod MAC addresses                       Hw    Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1  001a.6cbe.a460 to 001a.6cbe.a48f   2.5   12.2(14r)S5  15.1(2)SY11  Ok
  2  0017.e041.1d9c to 0017.e041.1dcb   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  3  0018.1833.8644 to 0018.1833.8673   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  4  0017.0ed4.82e4 to 0017.0ed4.8313   2.3   12.2(14r)S5  15.1(2)SY11  Ok
  5  0019.e8bb.3114 to 0019.e8bb.311b   2.0   8.5(2)       15.1(2)SY11  Ok
  6  001d.45e2.6030 to 001d.45e2.6037   2.0   8.5(2)       15.1(2)SY11  Ok
  7  001a.2f80.f5c0 to 001a.2f80.f5ef   2.5   12.2(14r)S5  15.1(2)SY11  Ok
  8  0015.6245.f740 to 0015.6245.f76f   2.3   12.2(14r)S5  15.1(2)SY11  Ok

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Centralized Forwarding Card WS-F6700-CFC       SAD103102LM  3.0    Ok
  2  Centralized Forwarding Card WS-F6700-CFC       SAL1019MBDB  2.0    Ok
  3  Centralized Forwarding Card WS-F6700-CFC       SAL1029W0ZC  2.0    Ok
  4  Centralized Forwarding Card WS-F6700-CFC       SAL1017LFEZ  2.0    Ok
  5  Policy Feature Card 3       VS-F6K-PFC3C       SAL1222S0GS  1.0    Ok
  5  MSFC3 Daughterboard         VS-F6K-MSFC3       SAL1224TW93  1.0    Ok
  6  Policy Feature Card 3       VS-F6K-PFC3C       SAL12372VEC  1.0    Ok
  6  MSFC3 Daughterboard         VS-F6K-MSFC3       SAL12351G4C  1.0    Ok
  7  Centralized Forwarding Card WS-F6700-CFC       SAL10360MGJ  3.0    Ok
  8  Centralized Forwarding Card WS-F6700-CFC       SAL093813ST  2.0    Ok

Mod  Online Diag Status
---- -------------------
  1  Pass
  2  Pass
  3  Pass
  4  Pass
  5  Pass
  6  Pass
  7  Pass
  8  Pass

 

