kostas Posted April 22, 2019

Greetings, everyone. I have a Catalyst 6509 with two SUP720-10GE supervisors running in SSO mode. IOS: s72033-adventerprisek9-mz.151-2.SY11

At some point, when I tried to enter configuration mode, the switch refused with this message:

Config mode cannot be entered during Standby initialization

I checked and saw that one of the supervisors had dropped out.

#sh redundancy
Redundant System Information :
------------------------------
Available system uptime = 6 days, 23 hours, 45 minutes
Switchovers system experienced = 1
Standby failures = 14
Last switchover reason = active unit failed

Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up

Current Processor Information :
-------------------------------
Active Location = slot 6
Current Software state = ACTIVE
Uptime in current state = 6 days, 22 hours, 37 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102

Peer (slot: 5) information is not available because it is in 'DISABLED' state

ds1#sh module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1101CTTW
  2    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021P4JY
  3    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1023QBRV
  4    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021NW0X
  5     0 Supervisor-Other                       Unknown            Unknown
  6     5 Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL1223T4DA
  7    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1052BYZA
  8    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL09370BH6

Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 001a.6cbe.a460 to 001a.6cbe.a48f      2.5 12.2(14r)S5  15.1(2)SY11  Ok
  2 0017.e041.1d9c to 0017.e041.1dcb      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  3 0018.1833.8644 to 0018.1833.8673      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  4 0017.0ed4.82e4 to 0017.0ed4.8313      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  5 0000.0000.0000 to 0000.0000.0000      0.0 Unknown      Unknown      Unknown
  6 0019.e8bb.3114 to 0019.e8bb.311b      2.0 8.5(2)       15.1(2)SY11  Ok
  7 001a.2f80.f5c0 to 001a.2f80.f5ef      2.5 12.2(14r)S5  15.1(2)SY11  Ok
  8 0015.6245.f740 to 0015.6245.f76f      2.3 12.2(14r)S5  15.1(2)SY11  Ok

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   1 Centralized Forwarding Card WS-F6700-CFC       SAD103102LM     3.0 Ok
   2 Centralized Forwarding Card WS-F6700-CFC       SAL1019MBDB     2.0 Ok
   3 Centralized Forwarding Card WS-F6700-CFC       SAL1029W0ZC     2.0 Ok
   4 Centralized Forwarding Card WS-F6700-CFC       SAL1017LFEZ     2.0 Ok
   5 Policy Feature Card 3       VS-F6K-PFC3C       SAL12372VEC     1.0 Other
   6 Policy Feature Card 3       VS-F6K-PFC3C       SAL1222S0GS     1.0 Ok
   6 MSFC3 Daughterboard         VS-F6K-MSFC3       SAL1224TW93     1.0 Ok
   7 Centralized Forwarding Card WS-F6700-CFC       SAL10360MGJ     3.0 Ok
   8 Centralized Forwarding Card WS-F6700-CFC       SAL093813ST     2.0 Ok

Mod  Online Diag Status
---- -------------------
   1 Pass
   2 Pass
   3 Pass
   4 Pass
   5 Unknown
   6 Pass
   7 Pass
   8 Pass

I tried resetting the module (and later reseating it), but that did not help: the module reboots in a loop and never pairs with the active supervisor. It reports a timeout. The logs show this:

Apr 22 16:52:36 ds1 294872: Apr 22 13:52:34.015: %OIR-SP-3-PWRCYCLE: Card in module 5, is being power-cycled 'Module reset'
Apr 22 17:03:09 ds1 295197: Apr 22 14:03:08.094: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
Apr 22 17:06:25 ds1 295308: Apr 22 14:06:25.094: %PFREDUN-SP-6-ACTIVE: Standby initializing for SSO mode
Apr 22 17:06:28 ds1 295310: Apr 22 14:06:28.705: %RF_ISSU-SP-3-RF_MSG_NOT_OK: RF ISSU msg type (101) for client (3) on domain (0) is not ok
Apr 22 17:06:29 ds1 295311: Apr 22 14:06:28.705: %RF-SP-5-SEND_FAIL: RF client progression send failure for reason (RF_BAD_MESSAGE)
Apr 22 17:06:29 ds1 295313: Apr 22 14:06:28.705: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:03 to ensure console debugging output.
Apr 22 17:06:30 ds1 295314: Apr 22 14:06:29.840: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
Apr 22 17:06:31 ds1 295315: Apr 22 14:06:29.840: %OIR-SP-3-PWRCYCLE: Card in module 5, is being power-cycled 'Module reset'
Apr 22 17:09:43 ds1 295429: Apr 22 14:09:42.858: %PFREDUN-SP-6-ACTIVE: Standby initializing for SSO mode
Apr 22 17:19:57 ds1 295777: Apr 22 14:19:57.777: %ONLINE-SP-6-BOOT_TIMER: Module 5, Proc. 0. Failed to bring online because of boot timer event
Apr 22 17:19:58 ds1 295778: sm(cygnus_oir_bay slot5), running yes, state empty
Apr 22 17:19:58 ds1 295779: Last transition recorded: (remove)-> occupied (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty (insert)-> may_be_occupied (remove)-> empty (remove)-> empty_clr_persist (remove)-> empty
Apr 22 17:19:59 ds1 295781: Apr 22 14:19:57.781: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:10:14 to ensure console debugging output.

A full reload of the chassis cured it, but a week later it happened again on the same module. Incidentally, this module ran alone in simplex for several years without any trouble. About a month ago we installed a second one alongside it and upgraded IOS. A second chassis runs exactly the same IOS in exactly the same configuration with two VS-S720-10G and has no such problems.

Does anyone know what this could be, and how to get the switch out of this state without reloading the whole chassis? I suspect that when the module drops out, redundancy gets stuck in SSO and then cannot recover. In theory SSO should move to some other state, shouldn't it? In short, something strange is going on :( Any ideas?

zhenya` Posted April 22, 2019

Plug a console into it and read the output. Most likely RIP.

kostas Posted April 22, 2019 Author

6 minutes ago, zhenya` said: Plug a console into it and read the output. Most likely RIP.

I did plug in the console. It also reports a timeout there and goes into a reboot loop. What does RIP mean?

kostas Posted April 22, 2019 Author

12 minutes ago, vurd said: Rest in peace

Ahh) Well, after a chassis reload everything works. What looks suspicious is that after one supervisor drops out, redundancy does not switch to simplex. Why does it stay in SSO? That cannot be right. Maybe that is the root of the problem.

kostas Posted April 22, 2019 Author

#show redundancy domain all
Redundant System Information :
------------------------------
Available system uptime = 1 week, 1 hour, 58 minutes
Switchovers system experienced = 1
Standby failures = 18
Last switchover reason = active unit failed

Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up

Current Processor Information :
-------------------------------
Active Location = slot 6
Current Software state = ACTIVE
Uptime in current state = 1 week, 50 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102

Peer Processor Information :
----------------------------
Standby Location = slot 5
Current Software state = DISABLED
Uptime in current state = 1 hour, 41 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
Configuration register = 0x2102

Redundant System Information (domain# 1):
------------------------------
Available system uptime = 1 week, 1 hour, 58 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none

Hardware Mode = Simplex
Maintenance Mode = Disabled
Communications = Down Reason: Failure

Current Processor Information :
-------------------------------
Active Location = slot 6
Current Software state = DISABLED
Uptime in current state = 1 week, 1 hour, 58 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY11, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2017 by Cisco Systems, Inc.
Compiled Fri 21-Jul-17 06:12 by prod_rel_team
BOOT = sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY11.bin,12;sup-bootdisk:s72033-adventerprisek9-mz.151-2.SY10.bin,12;sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXJ9.bin,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102

Peer (slot: 5) information is not available because it is in 'DISABLED' state

So why is it in the SSO state right now, when it does not see the module at all?

#show redundancy states
my state = 13 -ACTIVE
peer state = 1 -DISABLED
Mode = Duplex
Unit = Secondary
Unit ID = 6

Redundancy Mode (Operational) = sso
Redundancy Mode (Configured) = sso
Redundancy State = sso
Maintenance Mode = Disabled
Communications = Up

client count = 148
client_notification_TMR = 30000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 0
keep_alive threshold = 18
RF debug mask = 0x0
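
The mismatch asked about above (operational mode still "sso" while the peer is DISABLED) can be checked mechanically from saved output. This is only a minimal Python sketch, assuming the `show redundancy states` text has been captured to a string; the function names `parse_redundancy_states` and `is_inconsistent` are made up for illustration and are not part of any Cisco tooling.

```python
def parse_redundancy_states(text: str) -> dict:
    """Collect 'key = value' pairs from captured `show redundancy states` text."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a key/value pair
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_inconsistent(fields: dict) -> bool:
    """True when the peer is DISABLED but the operational mode still reads sso."""
    peer = fields.get("peer state", "")
    oper = fields.get("Redundancy Mode (Operational)", "")
    return "DISABLED" in peer and oper.lower() == "sso"


# Sample copied from the output posted in this thread.
SAMPLE = """\
my state = 13 -ACTIVE
peer state = 1 -DISABLED
Mode = Duplex
Unit = Secondary
Unit ID = 6
Redundancy Mode (Operational) = sso
Redundancy Mode (Configured) = sso
Redundancy State = sso
"""

fields = parse_redundancy_states(SAMPLE)
print(is_inconsistent(fields))
```

The check only reports the symptom; it says nothing about why the active supervisor keeps reporting sso.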

zhenya` Posted April 22, 2019

Standby failures = 18 kind of hints that it already has one foot in the grave.

kostas Posted April 22, 2019 Author

1 minute ago, zhenya` said: Standby failures = 18 kind of hints that it already has one foot in the grave.

It keeps booting the module in a loop, which is why there are so many failures. In the first post it was 14 :) That is how many more it racked up in an hour.
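
To watch that counter grow between two captures, something like the following Python sketch could be used. It assumes the `show redundancy` output is available as text; `standby_failures` is a hypothetical helper name, and it reads only the first "Standby failures" line, which in `show redundancy domain all` is the one for the first section shown.

```python
import re


def standby_failures(show_redundancy: str) -> int:
    """Return the first 'Standby failures' counter found in the captured text."""
    match = re.search(r"Standby failures\s*=\s*(\d+)", show_redundancy)
    if match is None:
        raise ValueError("no 'Standby failures' line found")
    return int(match.group(1))


# Counter values taken from the two outputs posted in this thread.
first = "Switchovers system experienced = 1\nStandby failures = 14\n"
later = "Switchovers system experienced = 1\nStandby failures = 18\n"

delta = standby_failures(later) - standby_failures(first)
print(delta)  # 4
```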

kostas Posted April 23, 2019 Author

Any ideas on how to get it out of the configuration lockout right now without reloading the chassis?

kostas Posted April 25, 2019 Author

I reloaded it and swapped the modules. We'll see...

#sh modul
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1101CTTW
  2    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021P4JY
  3    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1023QBRV
  4    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1021NW0X
  5     5 Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL1223T4DA
  6     5 Supervisor Engine 720 10GE (Hot)       VS-S720-10G        SAL12372PJX
  7    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1052BYZA
  8    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL09370BH6

Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 001a.6cbe.a460 to 001a.6cbe.a48f      2.5 12.2(14r)S5  15.1(2)SY11  Ok
  2 0017.e041.1d9c to 0017.e041.1dcb      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  3 0018.1833.8644 to 0018.1833.8673      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  4 0017.0ed4.82e4 to 0017.0ed4.8313      2.3 12.2(14r)S5  15.1(2)SY11  Ok
  5 0019.e8bb.3114 to 0019.e8bb.311b      2.0 8.5(2)       15.1(2)SY11  Ok
  6 001d.45e2.6030 to 001d.45e2.6037      2.0 8.5(2)       15.1(2)SY11  Ok
  7 001a.2f80.f5c0 to 001a.2f80.f5ef      2.5 12.2(14r)S5  15.1(2)SY11  Ok
  8 0015.6245.f740 to 0015.6245.f76f      2.3 12.2(14r)S5  15.1(2)SY11  Ok

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   1 Centralized Forwarding Card WS-F6700-CFC       SAD103102LM     3.0 Ok
   2 Centralized Forwarding Card WS-F6700-CFC       SAL1019MBDB     2.0 Ok
   3 Centralized Forwarding Card WS-F6700-CFC       SAL1029W0ZC     2.0 Ok
   4 Centralized Forwarding Card WS-F6700-CFC       SAL1017LFEZ     2.0 Ok
   5 Policy Feature Card 3       VS-F6K-PFC3C       SAL1222S0GS     1.0 Ok
   5 MSFC3 Daughterboard         VS-F6K-MSFC3       SAL1224TW93     1.0 Ok
   6 Policy Feature Card 3       VS-F6K-PFC3C       SAL12372VEC     1.0 Ok
   6 MSFC3 Daughterboard         VS-F6K-MSFC3       SAL12351G4C     1.0 Ok
   7 Centralized Forwarding Card WS-F6700-CFC       SAL10360MGJ     3.0 Ok
   8 Centralized Forwarding Card WS-F6700-CFC       SAL093813ST     2.0 Ok

Mod  Online Diag Status
---- -------------------
   1 Pass
   2 Pass
   3 Pass
   4 Pass
   5 Pass
   6 Pass
   7 Pass
   8 Pass