IBM ESS800 Array Disk Replacement Quick Maintenance Manual v3.2

ESS operator panel indicators:

                           Ready   Power Complete   Message
Cluster 1 / Line Cord 1    ON      ON               OFF
Cluster 2 / Line Cord 2    ON      ON               OFF

Line Cord 1 Line Cord 2 : ESS

Messages Cluster 1 Cluster 2 Messages licensed internal code (LIC) cluster Messages cluster Local Power ESSESS cluster power offcluster power on power on cluster Unit Emergency ()

ESS ESS 1. ESS 2. ESS 3 5 3. Power Complete Power CompleteNext Level Support 4. 10 5. Unit Emergency Only OFF Bulk Power Assembly OFF

ESS Bulk Power Assembly ON 3 ESS Line Cord 1 Line Cord 2 Power Complete 3 10Power Complete Next Level SupportPower Complete CLUSTER READY 25 Cluster Ready Next Level Support

ESS ESS

ESS Service Terminal

1. ThinkPad ESSNet (Windows NT 4.0 PC, Administrator password)
   1) ESS RS/6000 9 9 Cluster1 Cluster2 2 S2 S2
   2) NetTerm: "IBM 2105 ESS (Direct connect, IBM3151 emulation)"
   3) Log in as "service"; the password is the one displayed on the Cluster LCD
2. Master Console (RedHat Linux PC; service / service)
   1) "ESS Terminal Selector"
   2) Cluster
   3) Log in as "service"; the password is the one displayed on the Cluster LCD

ESS (Problem Log)

1. ESS Problem Message Service Terminal Message Problem Close Cancel Message
   1) "Repair Menu"
   2) "Show / Repair Problems Needing Repair"
   3) Problem
2. 7 Service Terminal Problem Expire Repair Menu Utility
   1) "Utility"
   2) "Problem Log Menu"
   3) "List Problems"
   4) "OPEN" "PENDING" "EXPIRE" Problem
   5) Problem

ESS

1. "Repair Menu"
2. "End of Call Status":
   1) "The following problems are still OPEN or PENDING": "None"
   2) "The following resources are still quiesced": "None"
   3) "The following resources are still fenced": "None"
   4) "Pinned Data": "None"
   5) "DDM, Array or Rank Status": "Normal"
   6) "Cluster dual hard drives status": "Normal" (ESS800)
   "d" "q"

ESS PE Password? PE L2/L3 ESS ( PE Password 7 *24168

1. ESS NET Console Master Console PE Password:
   1) ESS NET Console Master Console
   2) ESS Specialist
   3) Communication
   4) "Reset PE password"
   5) "yes"
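The "End of Call Status" checklist above can be sketched as a small comparison routine. This is only an illustration: the item texts and expected values come from the list above, while the function name and the idea of typing the observed values into a dict are assumptions, not part of any ESS tool.

```python
# Expected value for each "End of Call Status" item, as listed in this manual.
EXPECTED_END_OF_CALL = {
    "The following problems are still OPEN or PENDING": "None",
    "The following resources are still quiesced": "None",
    "The following resources are still fenced": "None",
    "Pinned Data": "None",
    "DDM, Array or Rank Status": "Normal",
    "Cluster dual hard drives status": "Normal",  # ESS800 only
}


def end_of_call_issues(observed):
    """Return (item, expected, actual) for every item that does not match.

    `observed` is a hypothetical dict filled in by the engineer from the
    service terminal screen. A missing item counts as a mismatch.
    """
    issues = []
    for item, expected in EXPECTED_END_OF_CALL.items():
        actual = observed.get(item)
        if actual is None or actual.strip().lower() != expected.lower():
            issues.append((item, expected, actual))
    return issues


if __name__ == "__main__":
    screen = dict(EXPECTED_END_OF_CALL)
    screen["Pinned Data"] = "2 tracks"  # made-up value, for illustration only
    for item, expected, actual in end_of_call_issues(screen):
        print(f"{item}: expected {expected!r}, got {actual!r}")
```

An empty result means every item shows the value the checklist expects.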

2. Service Terminal PE Password: 1)"Configuration Options Menu" 2)"Configure Communications Resources Menu" 3)"Call Home / Remote Services Menu" 4)"Enable Product Engineering Access" ESS PE Package PE package ESS RS/6000 snap L2/L3 ESS ESS 2 cluster PE Package cluster AIX A: 1."Utility" 2."Trace/State Save Menu" 3."Build PE Package and Off-load to Diskettes" 4."PE data""yes""PE Copy Services data" CopyService "yes""no" 5."y""a" AIX 1 AIX L2/L3 AIX PEpkg 6. cluster PE package B:FTP 1. ThinkPad ESS Master Console IP 172.31.1.88 ESS 2 cluster IP 2 cluster IP Machine Test MenuExternal Connections MenuCluster-Cluster Communication Test 2. ThinkPad FTP server Netterm NetFtpd 3. NetFtpd Options Define FTP Server Access Accept anyone who calls Allow anonymous access Windows Serv-U FTP server disable security 4. Master Console Console Launcher Call Home Setting 5. Dumps and Traces Use the IBM FTP Data Repository Server Use the same setup as Call Home 6.Use Passive Transfers 7. Destination server Host Name ThinkPad IP 8.User ID/passwd/Port anonymous/******/21Destination Server directory / 9. support PE Initiate PE Package or Trace/Dump RetrievalService 10. Local/Alternate cluster PE dataPhone Number 11. cluster PE Package Master Console 8

12.Master Console 9 5 Console Launcher Query Management Pending Transfer call home 13. Query Management Call home Increase Priority 14. Console Launcher Console Status Display message file Master Console ThinkPad FTP 15.ThinkPad c:\.tar.zip PE package formatter ESS Statesave Statesave ESS snapshot RS/6000 coredump L2/L3 A: 1."Utility" 2."Trace/State Save Menu" 3."Off-load - Statesave/Trace/Dump Files to Diskettes" 4."Off-load Statesave Files to Diskettes" 5. PMH Statesave ( PMH ) 6. 4~7M 4~6 7. Cluster B:FTP PE package Initiate PE Package or Trace/Dump Retrieval PE data Local/Alternate cluster dump/trace files to retrieve Statesave c:\ cpssdump02.05.1.zip PE package StateSave img FTP EMT for Windows start wizard I build image from A: image P-Series ThinkPad P-Series :

dd if=/dev/rfd0 of=/tmp/xxxxxx.img

1 image
2 image PE package StateSave
3 PE package StateSave ESFSC4 (IP 9.189.71.208, cufsc / esfsc), directory e/DASD_LOG/XXXXX000672/ (XXXXX = PMH number)
4 cluster PE package StateSave image file names:
  PE Package: PE_CL1_yymmdd.img, PE_CL2_yymmdd.img
  StateSave:  SS_CL1_yymmdd.img, SS_CL2_yymmdd.img
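The image naming convention above (PE_CLn_yymmdd.img for a PE package, SS_CLn_yymmdd.img for a StateSave) can be generated rather than typed by hand. The following is a minimal sketch; the function name and the "pe" / "statesave" keywords are hypothetical choices made for this example.

```python
from datetime import date

# File name prefixes taken from the naming convention in this manual.
PREFIXES = {"pe": "PE", "statesave": "SS"}


def image_name(kind, cluster, day):
    """Build an image file name such as PE_CL1_050324.img.

    kind    -- "pe" or "statesave" (hypothetical keywords for this sketch)
    cluster -- 1 or 2
    day     -- datetime.date the data was collected
    """
    if kind not in PREFIXES:
        raise ValueError(f"unknown image kind: {kind!r}")
    if cluster not in (1, 2):
        raise ValueError("cluster must be 1 or 2")
    return f"{PREFIXES[kind]}_CL{cluster}_{day:%y%m%d}.img"


if __name__ == "__main__":
    print(image_name("pe", 1, date(2005, 3, 24)))
    print(image_name("statesave", 2, date(2005, 3, 24)))
```

Keeping the cluster number and date in the name makes it easier to tell the two clusters' images apart once they are uploaded to the same PMH directory.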

PE PackageFTP PEPKG 1. PE Package *.img copy PEPKG 2. pepkg.exe xxxxx.img /o 3. PE.tar.Z \XXXXXXX.CLY XXXXXXX Y Cluster \7525660.CL2\ PE Package 1. PE Package *.img copy PEPKG CLL_1.img CLL_2.img 2. CLL_2.img CLL_1.002 CLL_1 002 3. pepkg.exe CLL_1.img /opepkg.exe CLL_1.002 PE.tar.Z 2.4.*.* ESS 800 PE Package P-Series AIX 1. AIX restore f /tmp/*.img*.img CLL.img 2. mount volume 1 3. PE.tar.zip 2.4.*.* ESS 800 PE

Package PE.tar FTP PC PE.tar.zip P-Series PE Package 1. *.img FTP Cluster 2 PE pkg CLL_1.img CLL_2.img 2. restore f /datadump/pepkg/CLL.imgCLL.img 3. mount volume 1 Login/Telnet Session 2 cp cp CLL_1.img CLL.img 4. Session 1 mount volume 2 Login/Telnet Session 2 cp CLL_2.img CLL.img 5. PE.tar.zip 2.4.*.* ESS 800 PE Package PE.tar FTP PC PE.tar.zip Cluster PE Package Shark PE Formatter package Shark PE Formatter *.tar.Z *.tar.zip PE ESS modem 1. ESS Net Modem Expender ESS E20/F20 9600,8,N,1

Thinkpad modem

Ctrl+E Ctrl E Caps Lock e CtrlShifte APS >/C 1 /C 2 ESS cluster1 cluster2 service terminal 2. Master Console MSA ESS F20 800 Netterm Netterm 9600,8,N,1

modem , Deskport modem: dsq2m Multitech modem: dsq2mkl Redhat Linux User : remote Password : rem2enc

1

ESS Cluster

PMH Health check open ESS PMH addtxt FI 0000(FI form insert FA FA )

Health Check ESS

Line1 ESS Line2 Line345 Cluster12 Line6 Line7 PE package Line8PE password Line9 copy service Line10 ESS Line12134 host bay 4 adapter PE PE Package ESS PMH FA 1661 1661

xSubsystem Impact ESS F11 submit F11

PMH Queue L1.5 STGTSG80K L2 ESDASD672 L3 2105PE680 ESS Call Home Query CMOSCN80K ESS PMH Query STGTSG 80K TSG review ESS Hostname: esfsc4.toyosu.japan.ibm.com IP: 9.189.71.208 /: cufsc/esfsc : e/DASD_LOG/xxxxx000672/ e/DASD_LOG/11882000672/

xxxxx PMH

v1.2.1

repair menu Show / Repair Problems Needing Repair CE G5 +2 1.5.2.x 1. CE 20 2.5 7 10 2. 4 CE onsite AIX IO cluster 1. Repair Menu 5 2.Repair/VerifyDDM 10 (Make resources not available for customer use.) 3.Format/Resume DDM 4 Model/Machine Type FRU

1. Repair Menu 5

Main Service Menu
Move cursor to desired item and press Enter.
  Repair Menu
  Install/Remove Menu
  Configuration Options Menu
  Licensed Internal Code Maintenance Menu
  Machine Test Menu
  Utility Menu

Repair Menu Move cursor to desired item and press Enter. Show / Repair Problems Needing Repair Replace a FRU Repair / Verify DDM(s) Format / Resume DDM(s) Show Result of DDM Format / Resume Operation Alternate Cluster Repair Menu Close a Previously Repaired Problem End of Call Status cluster 1 cluster2 Repair Menu Move cursor to desired item and press Enter. Select a Problem to View or Repair Move cursor to desired item and press Enter. Use arrow keys to scroll. # Note: See MAP 1200: "Prioritizing Symptoms and Problems" # in the isolation chapter of the service guide if # more than one problem log is listed below. # # CLUSTER BAY 1 PROBLEMS: # ID ESC SRN Date Time Problem Description # mm-dd-yyyy hh:mm:ss Status

# 38 E100 49501 03-24-2005 00:17:21 PENDING FRU FAILURE # CLUSTER BAY 2 PROBLEMS: # No problems were found on cluster 2 Select a FRU to replace then begin the Repair Move cursor to desired item and press Enter. Use arrow keys to scroll. [TOP] # Problem ID ............. = 38 # ESC .................... = E100 # SRN .................... = 49501 # Problem Status ......... = PENDING # Description ............ = FRU FAILURE # First Occurrence ....... = Wed Mar 23 08:29:07 2005 # Last Occurrence ........ = Thu Mar 24 00:17:21 2005 # Reporting Unit ......... = 2105-800 75-28268 # # Possible FRUs to replace: # Engineering FRU Likely FRU Location and/or # FRU Name Name to fix FRU Error Code #--------------------------------------------------------------------rsDDM0906 72.8GB 15K DDM 100% R1-U2-W4-D6 # # ESC E100: The description for this ESC varies depending on other # factors. Consult the Service Guide for details # # Action: Repair this problem by replacing one or more FRUs in the # above list. # # There is no MAP specified for this problem. # #--------------------------------------------------------------------# Additional Information for the FRUs listed above: # # Engineering FRU Name ... = rsDDM0906 # Part Number .......... = 18P5441 cluster cluster cluster 1 cluster2
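When the problem detail screen above is captured from the terminal emulator, the "Name ..... = value" lines can be turned into a dictionary for pasting into a PMH update. This is a sketch under the assumption that captured lines keep the dotted-leader layout shown above; the function name is hypothetical.

```python
import re

# Matches lines such as "# Problem ID ............. = 38".
FIELD_RE = re.compile(
    r"^#?\s*(?P<key>[A-Za-z][A-Za-z /]*?)\s*\.{2,}\s*=\s*(?P<value>.*\S)\s*$"
)


def parse_problem_fields(text):
    """Collect the dotted 'key ..... = value' fields from a captured screen."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


if __name__ == "__main__":
    capture = """\
# Problem ID ............. = 38
# ESC .................... = E100
# SRN .................... = 49501
# Problem Status ......... = PENDING
# Reporting Unit ......... = 2105-800 75-28268
#   Part Number .......... = 18P5441
"""
    for key, value in parse_problem_fields(capture).items():
        print(f"{key}: {value}")
```

Lines without a dotted leader and an equals sign, such as the ESC description text, are ignored.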

repair menu 1. cluster 2. cluster 3. cluster cluster ssa subsystem 4.

: E100 49501 02-04-2005 17:17:49 PENDING FRU FAILURE EB00 31000 02-04-2005 17:17:49 PENDING SSA subsystem detec (EB00 Description SSA subsystem 100% 50% case,) SSA : E291 49501

02-03-2005 18:13:13 PENDING SSA subsystem detec

rsDDM0906 72.8G 15K DDM 100% R1-U2-W4-D6 location ( location code SSA location code.)
Part Number .......... = 18P5441 FRU FRU

FRU P/N    TYPE        SPEED      NOTE
18P6143    18.2 GB     10K RPM
18P5162    18.2 GB     15K RPM    Model 800 only
18P6144    36.4 GB     10K RPM
18P5164    36.4 GB     15K RPM    Model 800 only
18P6145    72.8 GB     10K RPM
17P6311    72.8 GB     15K RPM    Model 800 only
18P6146    145.6 GB    10K RPM    Model 800 only
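The FRU part number table above lends itself to a simple lookup, for example to confirm that a replacement DDM matches the capacity and speed of the failed one. The sketch below only encodes the part numbers from the table; the function names are hypothetical. Note that the sample screen earlier reports part number 18P5441, which is not in the table, so the lookup returns None for it.

```python
# Part number -> (capacity in GB, speed in RPM), copied from the FRU table.
FRU_TABLE = {
    "18P6143": (18.2, 10000),
    "18P5162": (18.2, 15000),
    "18P6144": (36.4, 10000),
    "18P5164": (36.4, 15000),
    "18P6145": (72.8, 10000),
    "17P6311": (72.8, 15000),
    "18P6146": (145.6, 10000),
}


def describe_fru(part_number):
    """Return e.g. '72.8 GB 15K RPM', or None if the part is not in the table."""
    entry = FRU_TABLE.get(part_number)
    if entry is None:
        return None
    capacity_gb, rpm = entry
    return f"{capacity_gb} GB {rpm // 1000}K RPM"


def parts_for(capacity_gb, rpm):
    """List the table's part numbers for a given capacity and speed."""
    return sorted(
        pn for pn, (cap, speed) in FRU_TABLE.items()
        if cap == capacity_gb and speed == rpm
    )


if __name__ == "__main__":
    print(describe_fru("17P6311"))
    print(parts_for(72.8, 15000))
    print(describe_fru("18P5441"))  # not in the table above
```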

2Repair/Verify DDM 10 repair menu Repair / Verify DDM(s) Repair Menu Move cursor to desired item and press Enter.

Show / Repair Problems Needing Repair Replace a FRU Repair / Verify DDM(s) Format / Resume DDM(s) Show Result of DDM Format / Resume Operation Alternate Cluster Repair Menu Close a Previously Repaired Problem End of Call Status State fail DDM Repair Menu Select the DDMs you would like to repair Move cursor to desired item and press F7. ONE OR MORE items can be selected. Press Enter AFTER making all selections. [TOP] # NOTES: # 1. No more than one DDM on the same loop may be selected # 2. DDM's needing repair have DDM state = Fail # 3. All Failing DDM's may NOT be listed here. Please reference # the problem logs and Map 3149. # 4. If a listed DDM is not selectable then use the problem log # to repair other problems on the SSA loop before returning. .() ?[MORE...14] ? ? rsDDM0505 Spare Violet 72.8 15000 R1-U1-W8-D5 ? rsDDM0605 Spare Violet 72.8 15000 R1-U1-W7-D5 ? rsDDM0702 Spare Blue 72.8 15000 R1-U1-W6-D2 ? rsDDM0705 Spare Blue 72.8 15000 R1-U1-W6-D5 ? rsDDM0906 Fail Red 72.8 15000 R1-U2-W4-D6 ? rsDDM1005 Spare Red 72.8 15000 R1-U2-W3-D5 ? rsDDM1105 Spare Orange 72.8 15000 R1-U2-W2-D5 ? rsDDM1205 Spare Orange 72.8 15000 R1-U2-W1-D5 .() # * DDM has previously been repaired and needs to # be formatted and/or repaired. F7 fail 1 Repair Menu
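The DDM selection screen above lists one DDM per row (name, state, loop, size, RPM, location), and its notes say that DDMs needing repair have state Fail and that no more than one DDM on the same loop may be selected. A captured screen can be checked against both notes with a sketch like the following; the row layout is taken from the screen above, everything else (names, the handling of the stray "?" border characters) is an assumption.

```python
from collections import Counter, namedtuple

Ddm = namedtuple("Ddm", "name state loop size_gb rpm location")


def parse_ddm_rows(text):
    """Parse rows such as 'rsDDM0906 Fail Red 72.8 15000 R1-U2-W4-D6'."""
    rows = []
    for line in text.splitlines():
        # Screen borders show up as '?' in captures; treat them as blanks.
        parts = line.replace("?", " ").split()
        if len(parts) == 6 and parts[0].startswith("rsDDM"):
            name, state, loop, size, rpm, location = parts
            rows.append(Ddm(name, state, loop, float(size), int(rpm), location))
    return rows


def failed_ddms(rows):
    """DDMs needing repair have DDM state = Fail (note 2 on the screen)."""
    return [d for d in rows if d.state == "Fail"]


def loops_selected_twice(selection):
    """Loops with more than one selected DDM (violates note 1 on the screen)."""
    counts = Counter(d.loop for d in selection)
    return sorted(loop for loop, n in counts.items() if n > 1)


if __name__ == "__main__":
    capture = """\
? rsDDM0705 Spare Blue 72.8 15000 R1-U1-W6-D5
? rsDDM0906 Fail Red 72.8 15000 R1-U2-W4-D6
? rsDDM1005 Spare Red 72.8 15000 R1-U2-W3-D5
"""
    rows = parse_ddm_rows(capture)
    print([d.name for d in failed_ddms(rows)])
    print(loops_selected_twice(rows[1:]))
```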

Mo ? Take the following resources away from customer use ? ? ? ?Move cursor to desired item and press Enter. ? ? ? ?[TOP] ? ? # The following resources are needed for the repair. ? ? # Ensure that the customer has performed the appropriate actions to ? ? # make the following resources unavailable ? ? # ? ? # --- You have asked to Quiesce the following resources: ? ? # ? ? # SSA Disk Drive Module R1-U2-W4-D6 rsDDM0906 ? ? # ? ? # --- This requires that Service Mode be set for the following: ? ? # ? ? # SSA Disk Drive Module R1-U2-W4-D6 rsDDM0906 ? ? # ? ? # --- Which will cause the host systems to lose access to: ? ? # Access will not be lost to any additional resources. ? ? # ? Make resources not available for customer use. [BOTTOM] Make resources not available for customer use

Repair Menu Mo ? Multiple DDM Repair ? ? ? ?Move cursor to desired item and press Enter. ? ? ? ?[TOP] ? ? # The following DDMs were successfully quiesced and should be ? ? # physically replaced. ? ? # ? ? # See the "FRU Removal and Replacement Procedures" ? ? # chapter in the Service Guide to replace the following ? ? # DDM(s) now: ?

# ? # DDM DDM Loop Size ? # Name State Name (GB) RPM Location ? # rsDDM0906 xxxx Red 72.8 15000 R1-U2-W4-D6 ? # Do not press continue until all DDMs have been physically replaced. # # To avoid causing additional damage to the DDM(s) being removed, # do the following: # ? # ? ? # 1. Unlatch the DDM(s) ? ? # 2. Wait a minimum of 3 seconds for the DDM(s) to stop spinning ? ? # 3. Remove the DDM(s) from the DDM Bay ? ? # ? ? # Note: Additional damage to a returned DDM can have a negative ? ? # effect on failure analysis and the warranty recovery costs. ? ? # ? Continue Repair location code location code DDM CHECK [ TopGun 3 3 failure analysis state xxxx Continue Repair Verification tests in progress This may take approximately 30 minutes to complete Testing has started ... Start checking configuration (ssa001 ssa101 1 rsDDM0906)... Querying warmstart & reset counts. This could take up to 6 minutes.... Updating ODM Queueing up to configure ssa001. This could take up to 30 minutes.... Configuring ssa001 Getting list of drives and their hop counts Checking drive capacities in the loop Verifying drives and loop connections Running drive diagnostics ................................. Checking DDM capacities, RPMs, and data rates

Checking for power/fan faults Verifying drives and loop connections Getting list of drives and their hop counts ............ ? Verification Results ? ? ? ?Move cursor to desired item and press F7. Use arrow keys to scroll. ? ? ONE OR MORE items can be selected. ? ?Press Enter AFTER making all selections. ? ? ? ?[MORE...6] # The following DDMs were successfully repaired. Use the # Format/Resume DDM(s) to format and # resume the DDM(s). # DDM DDM Loop Size ? ? # Name State Name (GB) RPM Location ? ? # rsDDM0906 Free Red 72.8 15000 R1-U2-W4-D6 ? ? # ? ? # This repair is complete and the problem status has been changed to ? ? # Closed for the following problem(s). Please select the problems tha ? ? # you would like to close the PMH for in RETAIN: ? ? 38 ? NONE [BOTTOM] PMH NONE COMMAND STATUS Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below. [TOP] Add comments to be sent in the Call Home (y/n) : call home yes or no 1 This repair is complete. /usr/lpp/searas/bin/rsCHEOR: Outgoing Call Home records are disabled on this ma. Cannot create End of Repair Call Home.

ending /usr/lpp/searas/bin/rsCHEOR -h rsCurLog020405-173525.history [Fri ] None [BOTTOM] F3 repair menu Show / Repair Problems Needing Repair cluster close

3Format/Resume DDM 4 DDM Format/Initialize/Certify/Resume CE 4 15

Repair Menu Move cursor to desired item and press Enter. Show / Repair Problems Needing Repair Replace a FRU Repair / Verify DDM(s) Format / Resume DDM(s) Show Result of DDM Format / Resume Operation Alternate Cluster Repair Menu Close a Previously Repaired Problem End of Call Status

Format/Resume DDMs ? ? ?Move cursor to desired item and press F7. ? ? ONE OR MORE items can be selected. ? ?Press Enter AFTER making all selections. ? ? ? ? # The following DDMs need to be formatted/resumed. Select one or more ? ? # DDMs to format/resume. ? ? # ? ? #Name Location Description Status ? ? rsDDM0906 R1-U2-W4-D6 SSA Disk Drive Module 2 Ready to format Format/Resume 1 DDMs listed above

F7 format Format/Resume 1 DDMs listed above

Confirm the selected DDMs to format/resume Move cursor to desired item and press Enter. #Name Location Description Status # rsDDM0906 R1-U2-W4-D6 SSA Disk Drive Module 2 Ready to format Start format/resume operation # # Note: You will be logged out only if a format operation is started # for a selected DDM. Start format/resume operation : COMMAND STATUS Command: running stdout: yes stderr: no

Before command completion, additional instructions may appear below. DDMs to be resumed: none DDMs to be changed from 'failed' status to 'ready to format' status: none DDMs to be formatted: rsDDM0906 Format operation has started. You will be logged off in 20 seconds. CE say Good-Bye, log off logoff format, format format service terminal WEB , format

ENTER PASSWORD DISPLAYED ON 2105 CLUSTER OP-PANEL service's Password: Logging in... There are background processes running. The following processes must complete before any service action is allowed... Select a Process to Show Status

Move cursor to desired item and press Enter. Use arrow keys to scroll. # One or more DDM Format/Certify processes are in progress. # Select a process (ID) to view its status summary. # # NOTE: Format/Certify process(es) must end before service can continu # #ID Cluster Operation Quantity Status Start Dat # Bay 26628 1 Format/Initialize/Certify 2 DDMs Running Fri Feb # ###

NOTE: Estimated times with no system activity.

            1 DDM, times in minutes        384 DDMs, times in hours
            =========================      =========================
Capacity    Format  Initialize  Certify    Format  Initialize  Certify
36GB        30      35          35         1       5           7

Start Format/Initialize/Certify operation for 2 DDM(s) ... (ID=26628)

Total     Format     Initialize   Certify    Total     Elapsed Time
Started   Started/   Started/     Started/   Passed/   in Minutes
          Passed     Passed       Passed     Failed
1         1/1        1/1          1/0        0/0       1.0

DDM Format/Initialize/Certify operation (ID=26628) is still running
Press Enter to update the screen, or enter Q to quit

FRU: 18P6143 18G 10Krpm 9
FRU: 18P6144/18P5164 36G 10K/15Krpm 16
FRU: 18P6145/17P6311 72G 10K/15Krpm 30
FRU: 18P6146 146G 10Krpm 50
Service Used Part 70 120
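The per-FRU figures above (9, 16, 30, 50) can be used to estimate when a format will finish. The sketch below assumes those figures are minutes, which the text does not state explicitly but which agrees with the 29.0 minute elapsed time logged for a 73GB DDM later in this manual. The part number printed as 18P61456 is read here as 18P6146, matching the FRU part number table. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Format time per FRU, ASSUMED to be minutes (see the lead-in above).
FORMAT_MINUTES_BY_FRU = {
    "18P6143": 9,    # 18G 10Krpm
    "18P6144": 16,   # 36G 10Krpm
    "18P5164": 16,   # 36G 15Krpm
    "18P6145": 30,   # 72G 10Krpm
    "17P6311": 30,   # 72G 15Krpm
    "18P6146": 50,   # 146G 10Krpm
}


def expected_finish(start, part_number):
    """Estimate the end of a DDM format started at `start`."""
    minutes = FORMAT_MINUTES_BY_FRU[part_number]
    return start + timedelta(minutes=minutes)


if __name__ == "__main__":
    started = datetime(2005, 7, 12, 19, 22)
    print(expected_finish(started, "17P6311"))
```

An unknown part number raises KeyError, which is deliberate: guessing a duration for an unlisted FRU would be misleading.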

Total Passed/ failed 1/1 DDM Format/Initialize/Certify operation has ended (0) Parent process (ID=23722) is still running DDM Format/Initialize/Certify operation has ended (0). format Repair Menu Move cursor to desired item and press Enter. Show / Repair Problems Needing Repair Replace a FRU Repair / Verify DDM(s) Format / Resume DDM(s) Show Result of DDM Format / Resume Operation Alternate Cluster Repair Menu Close a Previously Repaired Problem End of Call Status Tue Jul 12 19:22:00 TAIST 2005 - Starting format and resume previously repaired DDMs ... DDMs to be formatted and resumed: rsDDM2808 NOTE: Estimated times with no system activity. 1 DDM, times in minutes 384 DDMs, times in hours ========================= ========================= Capacity Format Initialize Certify Format Initialize Certify 73GB 30 50 35 1 10 9 Start Format/Initialize/Certify operation for 1 DDM(s) ... (ID=18420) Total Format Initialize Certify Total Elapsed Time Started Started/ Started/ Started/ Passed/ in Minutes Passed Passed Passed Failed 1 1/1 1/1 1/1 1/0 29.0 DDM Format/Initialize/Certify operation has ended (0) Tue Jul 12 19:51:48 TAIST 2005 - The following resources have problem opened du

Tue Jul 12 19:51:48 TAIST 2005 none DDMs to be resumed: rsDDM2808 - Successful Tue Jul 12 19:52:12 TAIST 2005 - Format/Resume operation ended successfully. format Result for DDM Format/Resume operation can not be found.
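The timestamps in the format/resume log above can be used to work out how long the whole operation took, which is handy when recording the repair in the PMH. This is a minimal sketch: it assumes each log line starts with a timestamp in the layout shown above and simply skips the time zone token (TAIST here) rather than interpreting it.

```python
import re
from datetime import datetime

# "Tue Jul 12 19:22:00 TAIST 2005 - ..." -> month, day, clock, year.
STAMP_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4})"
)


def log_time(line):
    """Return the timestamp at the start of a log line, or None."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    month, day, clock, year = match.groups()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def elapsed(start_line, end_line):
    """Time between two log lines, as a datetime.timedelta."""
    start, end = log_time(start_line), log_time(end_line)
    if start is None or end is None:
        raise ValueError("both lines must begin with a timestamp")
    return end - start


if __name__ == "__main__":
    first = "Tue Jul 12 19:22:00 TAIST 2005 - Starting format and resume previously repaired DDMs ..."
    last = "Tue Jul 12 19:52:12 TAIST 2005 - Format/Resume operation ended successfully."
    print(elapsed(first, last))
```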