Intel NetStructure® ZT 7102 Chassis Management Module
Software Technical Product Specification
February 2006
Order Number: 311609-001
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Intel, Intel logo, Intel NetStructure, and Intel XScale are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © Intel Corporation 2003-2006. All rights reserved.
Contents
1 Introduction
    1.1 Overview
    1.2 Third party chassis integration
    1.3 ZT 7102 Features
    1.4 Acronyms
    1.5 Specification conformance
    1.6 Related documents
2 Software Specifications
    2.1 Embedded Debug and Bootstrap
    2.2 Operating System
    2.3 Firmware Interfaces
        2.3.1 Command Line Interface (CLI)
        2.3.2 SNMP
        2.3.3 Remote Procedure Call (RPC) Interface
    2.4 Ethernet Interfaces
    2.5 FTP
    2.6 Sensor Event Logs (SEL)
        2.6.1 CMM SEL Architecture
        2.6.2 Satellite Management Controller (SMC) Boards
        2.6.3 Baseboard Management Controller (BMC) Boards
        2.6.4 Retrieving a SEL
        2.6.5 Clearing a SEL
        2.6.6 Retrieving the Raw SEL
3 Built-In Self Test (BIST)
    3.1 BIST Test Flow
    3.2 Boot-BIST
    3.3 Early-BIST
    3.4 Mid-BIST
    3.5 Late-BIST
    3.6 QuickBoot Feature
        3.6.1 Configuring QuickBoot
    3.7 Event Log Area and Event Management
    3.8 OS Flash Corruption Detection and Recovery Design
        3.8.1 Monitoring the Static Images
        3.8.2 Monitoring the Dynamic Images
        3.8.3 CMM Failover
    3.9 BIST Test Descriptions
        3.9.1 Flash Checksum Test
        3.9.2 Base Memory Test
        3.9.3 Extended Memory Tests
        3.9.4 FPGA Version Check
        3.9.5 DS1307 RTC Test
        3.9.6 NIC Presence/Local PCI Bus Test
        3.9.7 OS Image Checksum Test
        3.9.8 CRC32 Checksum
        3.9.9 IPMB Bus Busy/Not Ready Test
4 Setting Up the CMM
    4.1 Connecting to the CMM
    4.2 Initial setup
        4.2.1 Setting IP address properties
        4.2.2 Setting a hostname
        4.2.3 Setting the amount of time for auto-logout
    4.3 Connecting to the CMM
        4.3.1 Connecting to the CMM using a secure login session
        4.3.2 Connecting to the CMM using Telnet
        4.3.3 Connecting to the CMM using FTP
    4.4 Rebooting the CMM
5 Command Line Interface
    5.1 Overview
    5.2 Command Line Syntax And Arguments
    5.3 cmmget and cmmset Syntax
    5.4 Location Parameter
    5.5 Target Parameter
    5.6 Dataitem Parameter
        5.6.1 bladestatus dataitem
    5.7 Value Parameter
    5.8 IPMI Error Completion Codes
6 Redundancy, Synchronization, and Failover
    6.1 Overview
    6.2 Synchronization
    6.3 Heterogeneous synchronization
        6.3.1 State and data cache files
        6.3.2 Interpretation of CMM-related SEL events
        6.3.3 SDR and SIF synchronization
        6.3.4 Configuring synchronization of user scripts
        6.3.5 Synchronization requirements
    6.4 Initial data synchronization
        6.4.1 Initial Data Sync Failure
    6.5 DataSync Status Sensor
        6.5.1 Sensor bitmap
        6.5.2 Querying the DataSync Status sensor
        6.5.3 SEL event
        6.5.4 SNMP traps
        6.5.5 System Health
    6.6 CMM failover
        6.6.1 Scenarios that prevent failover
        6.6.2 Scenarios that fail over to a healthier standby CMM
        6.6.3 Manual failover from the active CMM
        6.6.4 Scenarios that force a failover
    6.7 CMM Status events
    6.8 Manual failover from the standby CMM
        6.8.1 Two types of promotion
        6.8.2 Disabling and enabling automatic failover
        6.8.3 Promotion of the standby and the active Ethernet interface
        6.8.4 Both CMMs active
    6.9 Failover when switch link to active CMM drops
7 Process Monitoring and Integrity
    7.1 Overview
        7.1.1 Process existence monitoring
        7.1.2 Thread watchdog monitoring
        7.1.3 Process Integrity monitoring
    7.2 Processes monitored
    7.3 Process monitoring targets
    7.4 Process monitoring dataitems
        7.4.1 Examples
    7.5 Process monitoring CMM events
    7.6 Failure scenarios and event processing
        7.6.1 No action recovery
        7.6.2 Successful restart recovery
        7.6.3 Successful failover and restart recovery
        7.6.4 Successful failover and reboot recovery
        7.6.5 Failed failover and reboot recovery for a non-critical process
        7.6.6 Failed failover and reboot recovery for a critical process
        7.6.7 Excessive restarts and escalation is no action
        7.6.8 Excessive restarts and successful escalation of failover and reboot
        7.6.9 Excessive restarts with failed escalation of failover and reboot for a non-critical process
        7.6.10 Excessive restarts with failed escalation of failover and reboot for a critical process
        7.6.11 Process administrative action
        7.6.12 Excessive failover and reboots with administrative action
    7.7 Process Integrity Executable
    7.8 Configuring pms.ini
        7.8.1 Global data
        7.8.2 Data specific to each process
        7.8.3 Process definition section of pms.ini
    7.9 PIE-specific data configuration
        7.9.1 PIE section name
        7.9.2 ProcessIntegrityExecutable parameter
        7.9.3 UniqueID parameter
        7.9.4 AdminState parameter
        7.9.5 ProcessIntegrityInterval parameter
    7.10 PmsPieSnmp
        7.10.1 SNMP PIE section of pms.ini
    7.11 WP/BPM PIE
        7.11.1 WP/BPM Section of pms.ini
8 Resetting the Password
    8.1 Resetting the Password in a Single CMM System
    8.2 Resetting the Password in a Dual CMM System
9 Formatting Events and Traps
    9.1 System Events Overview
    9.2 SNMP Trap Output
        9.2.1 Sending SNMP traps for unrecognized events
    9.3 SEL Entries
        9.3.1 SEL Header Format
        9.3.2 SEL Text Translation Format
        9.3.3 SEL Raw Format
        9.3.4 Configuring the SEL display format
        9.3.5 Displaying unrecognized SEL events
10 Sensors
    10.1 Introduction
    10.2 Threshold-Based Sensors
    10.3 Threshold-based sensors on the CMM
    10.4 Discrete Sensors
        10.4.1 Event description strings
    10.5 Current Value
    10.6 Sensor information details
        10.6.1 SEL entries
        10.6.2 SNMP Trap Event Syntax
    10.7 Sensor Targets
11 Health Events
    11.1 Introduction
    11.2 Health events listing
    11.3 Healthevents Queries
        11.3.1 Health Events Queries for Individual Sensors
        11.3.2 Health Events Queries for All Sensors on a Location
        11.3.3 No Active Events
        11.3.4 Not Present or Non-IPMI Locations
    11.4 Health LEDs
12 Slot Power Control
    12.1 Comparing the HEALTHY# Signal and Health
    12.2 Slot Power-Up Sequence
        12.2.1 Assertion of BD_SEL#
        12.2.2 Assertion of HEALTHY# During Power-Up
    12.3 Obtaining the Power State of a Board
    12.4 Controlling the Power State of a Slot
        12.4.1 Powering Off a Board
        12.4.2 Powering On a Board
        12.4.3 Resetting a Board
    12.5 Power Sequencing Commands and Policies
        12.5.1 Retrieving the Healthy# Ramp-Up Time
        12.5.2 Setting the Healthy# Ramp-Up Time
        12.5.3 Retrieving the Maximum Power-Up Attempts
        12.5.4 Setting the Maximum Power-Up Attempts
        12.5.5 Power Sequencing SEL Events
    12.6 Power Sequencing Policy and the Manual Recovery State
        12.6.1 Controlling the Power Sequencing Policy
        12.6.2 Querying the State of the Power Sequencing Policy
        12.6.3 Obtaining the Power Sequencing Policy
        12.6.4 Exiting the Manual Recovery State
        12.6.5 SEL Events
        12.6.6 Chassis Health Events
13 Power Supplies
    13.1 Detecting power supplies
    13.2 Inhibit, Degrade, and Fail hardware signals
    13.3 Inhibiting power supplies
    13.4 Precautions for inhibiting power supplies
    13.5 Inhibiting Power Supplies
    13.6 Enabling power supplies
    13.7 Power Supply Slot Sensor
        13.7.1 CMM interface for Power Supply Slot sensor
        13.7.2 Power Supply Slot sensor SEL events
        13.7.3 Power Supply Slot Sensor Health Events
        13.7.4 Power Supply Slot Sensor SNMP Trap
        13.7.5 Power Supply Slot Sensor
    13.8 IPMI power supplies
14 Drone Mode SBC Support
        14.0.1 Adding a Drone Mode Capable SBC
        14.0.2 Removing a Drone Mode Capable SBC
15 Active and Offline Slots
    15.1 Determining the State of a Slot
    15.2 Setting a Slot to Offline Mode
    15.3 Setting a Slot to Active Mode
    15.4 Limitations of the Offline Mode
16 Obtaining FRU Information
    16.1 FRU Information
    16.2 FRU Query Syntax
    16.3 FRU Data Output Format
17 Fan Control and Monitoring
    17.1 Setting Fan Speed
        17.1.1 Considerations for Turning Off the Fans
    17.2 Automatic Fan Control
    17.3 Querying Fan Tray Sensors
18 Simple Network Management Protocol
    18.1 CMM MIB
    18.2 MIB Design
        18.2.1 MIB Tree
        18.2.2 CMM MIB Objects
        18.2.3 System location MIB objects
        18.2.4 MIB Usage
        18.2.5 Querying non-IPMI compliant blades and power supplies
    18.3 SNMP agent
        18.3.1 Configuring the SNMP agent port
        18.3.2 Configuring the agent to respond to SNMP v3 requests
        18.3.3 Configuring the agent back to SNMP v1
        18.3.4 Setting up an SNMP v3 MIB browser
    18.4 SNMP trap utility
        18.4.1 Configuring the SNMP trap port
        18.4.2 Configuring the CMM to send SNMP v3 traps
        18.4.3 Configuring the CMM to send SNMP v1 traps
    18.5 Configuring and enabling SNMP trap addresses
        18.5.1 Configuring an SNMP trap address
        18.5.2 Enabling and disabling SNMP traps
        18.5.3 Alerts using SNMP v3
    18.6 SNMP v3 Security: Authentication Protocol and Privacy Protocol
    18.7 snmpd.conf
    19.1 Setting Up the RPC Interface
    19.2 Using the RPC Interface
        19.2.1 GetAuthCapability()
        19.2.2 ChassisManagementApi()
        19.2.3 ChassisManagementApi() Threshold Response Format
        19.2.4 ChassisManagementApi() String Response Format
        19.2.5 ChassisManagementApi() Integer Response Format
        19.2.6 FRU String Response Format
    19.3 RPC Sample Code
    19.4 RPC Usage Examples
    20.1 System Details
    20.2 Startup and Shutdown Scripts
    20.3 System Resources Available to User Applications
        20.3.1 File System Storage Constraints
        20.3.2 RAM Constraints
        20.3.3 Interrupt Constraints
    20.4 RAM Disk Directory Structure
    21.1 CLI scripting
        21.1.1 Script Synchronization
    21.2 Event scripting
        21.2.1 Triggering scripts from health events
        21.2.2 Triggering scripts from event codes
        21.2.3 Triggering scripts from slot events
        21.2.4 Listing scripts associated with events
        21.2.5 Disassociating scripts from an event
    21.3 Environment variables
    21.4 Error processing and messages
        21.4.1 Invalid pathname
        21.4.2 Script does not exist
        21.4.3 Pathname specified is a directory
        21.4.4 Moved or removed script still associated with event
        21.4.5 Script has zero bytes
Remote Procedural Calls................................................................................................................. 172
19.1 19.2
19.3 19.4
20
Application Hosting ......................................................................................................................... 192
20.1 20.2 20.3
20.4
21
CMM Scripting ............................................................................................................................... 196
21.1 21.2
21.3 21.4
        21.4.6 Script lacks execute permission ........ 202
        21.4.7 Script is on the standby CMM ........ 202
    21.5 Unable to write to actionscripts.cfg ........ 202
22 Command and Error Logging ........ 204
    22.1 Command Logging ........ 204
    22.2 Error Logging ........ 204
        22.2.1 error.log ........ 204
        22.2.2 debug.log ........ 204
    22.3 cmmdump utility ........ 205
    22.4 Kernel crash logging ........ 205
        22.4.1 Kinds of data logged ........ 205
        22.4.2 Accessing logged data ........ 205
        22.4.3 Sample log file ........ 206
    22.5 logger command ........ 208
        22.5.1 Syntax and semantics ........ 208
        22.5.2 Customized logging ........ 209
        22.5.3 Log rotation ........ 209
        22.5.4 Restarting syslogd ........ 211
        22.5.5 Caveats and limitations ........ 211
23 Telco Alarm Sensors ........ 212
    23.1 Configuring the Telco Alarm Connector Pins ........ 212
    23.2 Obtaining the Configuration of the Telco Alarm Sensor Connector Pins ........ 212
    23.3 Telco Alarm Connector Sensor Naming ........ 212
        23.3.1 Sensor Names using SNMP ........ 213
    23.4 Telco Alarm Sensors and Redundancy ........ 213
24 Updating CMM Firmware ........ 214
    24.1 Key Features of the Firmware Update Process ........ 214
    24.2 Update Process Architecture ........ 214
    24.3 Critical Software Update Files and Directories ........ 215
    24.4 Update package ........ 216
        24.4.1 Update Package File Validation ........ 216
        24.4.2 Update Firmware Package Version ........ 217
        24.4.3 Component Versioning ........ 217
    24.5 SaveList and Data Preservation ........ 217
        24.5.1 Changes from earlier versions ........ 218
    24.6 Update Mode ........ 219
    24.7 Update_Metadata File ........ 219
    24.8 Firmware Update Synchronization/Failover Support ........ 219
    24.9 Synchronized files that are also on the saveList ........ 220
    24.10 Configuring automatic or manual failover ........ 221
        24.10.1 Setting Failover Configuration Flag ........ 221
        24.10.2 Retrieving the Failover Configuration Flag ........ 222
    24.11 Single CMM System ........ 222
    24.12 Redundant CMM Systems ........ 222
    24.13 CLI Firmware Update Procedure ........ 223
    24.14 Hooks for User Scripts ........ 224
        24.14.1 Update Mode User Scripts ........ 224
        24.14.2 Data Restore User Scripts ........ 224
        24.14.3 Examples ........ 225
    24.15 Update Process ........ 226
    24.16 Update Process Status and Logging ........ 227
    24.17 Collecting debugging information ........ 227
    24.18 Update process status and logging ........ 227
    24.19 Update Process Sensor and SEL Events ........ 228
    24.20 RedBoot* Monitor Update Process ........ 228
        24.20.1 Required Setup ........ 228
        24.20.2 Update Procedure ........ 228
25 FRU Update Utility ........ 230
    25.1 Overview ........ 230
    25.2 FRU update architecture ........ 230
    25.3 FRU update process ........ 231
    25.4 FRU recovery process ........ 231
    25.5 FRU verification ........ 231
    25.6 FRU display ........ 232
    25.7 FRU update command line interface ........ 232
    25.8 Setting the library path and invoking the utility ........ 234
    25.9 Using the location option ........ 235
    25.10 Updating the FRU ........ 235
    25.11 Retrieving the inventory ........ 235
    25.12 Viewing the contents of the FRU ........ 235
    25.13 Writing the contents of the FRU ........ 236
    25.14 Dumping the contents of the FRU ........ 236
    25.15 Installing a chassis FRU file with fruUpdate ........ 236
26 FRU Update Configuration File ........ 238
    26.1 Configuration File Format ........ 238
        26.1.1 File Format ........ 238
        26.1.2 String Constraints ........ 238
        26.1.3 Numeric Constraints ........ 239
        26.1.4 Tags ........ 239
        26.1.5 Control Commands ........ 239
        26.1.6 Probing Commands ........ 241
        26.1.7 Update Commands ........ 244
    26.2 Input of Data ........ 250
        26.2.1 Display Commands ........ 251
        26.2.2 Input Commands ........ 251
        26.2.3 Command Quick Reference ........ 253
    26.3 Examples ........ 256
27 IPMI Pass-Through ........ 258
    27.1 Overview ........ 258
    27.2 Command Syntax and Interface ........ 258
        27.2.1 Command Request String Format ........ 258
        27.2.2 Response String ........ 259
    27.3 Usage examples ........ 259
        27.3.1 Using the CLI ........ 259
    27.4 Using SNMP ........ 259
28 Third Party Chassis Integration ........ 262