sundot home page unix consulting in the uk and mainland europe (archives)

Storage Automated Diagnostic Environment replaces StorTools - a step forward in testing of SAN/Fibre attached Sun hardware.

posted by hal 20030529 (archived) | permalink | path | initial version: 20030529

After struggling with StorTools last year, I recently needed to test an A5000. It seems the old StorTools package has been decommissioned; in its place we get the Storage Automated Diagnostic Environment. Version 2.2 was released last week and can be downloaded (registration required) from the Sun download/sysadmin area.

The package installs in /opt/SUNWstade and, just like SyMon and SunVTS, is a bit bloated. Recent devices such as fibre channel tape drives and the V880 disks and backplane are supported. Sun seems to think that we want to manage all things at all times from anywhere (not a bad idea), so by default a web server listens on port 7654 (login: ras/agent), and you can set up your storage to be diagnosed and monitored in a NIS-like master/slave/client style. Installation is straightforward, but the web interface is unusable until you have entered the site information. The (master) install modifies your /etc/inet/services, /etc/inet/inetd.conf and /var/spool/cron/crontabs/root.
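
Since the install touches three system files, it may be worth keeping copies so the changes can be reviewed afterwards. A minimal sketch follows; the function names snapshot and compare and the directory /var/tmp/pre-stade are made up for illustration and are not part of the package:

```shell
#!/bin/sh
# Files the (master) install is said to modify.
FILES="/etc/inet/services /etc/inet/inetd.conf /var/spool/cron/crontabs/root"

# flat_name PATH: turn /etc/inet/services into etc_inet_services
flat_name() {
    echo "$1" | sed -e 's|^/||' -e 's|/|_|g'
}

# snapshot DIR: copy each existing file in $FILES into DIR
snapshot() {
    mkdir -p "$1" || return 1
    for f in $FILES; do
        if [ -f "$f" ]; then
            cp -p "$f" "$1/`flat_name "$f"`"
        fi
    done
    return 0
}

# compare DIR: diff each saved copy in DIR against the current file
compare() {
    for f in $FILES; do
        saved="$1/`flat_name "$f"`"
        if [ -f "$saved" ]; then
            diff "$saved" "$f" || :   # a difference is expected, not an error
        fi
    done
    return 0
}

# Hypothetical usage, as root:
#   snapshot /var/tmp/pre-stade     (before installing)
#   compare  /var/tmp/pre-stade     (after installing)
```

The comparison only reports what changed; it does not undo anything.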

The web interface gives you great control over the test options (i.e. you can select individual disks in the A5000-Disk test) and is just as usable as the command line alternative. When running individual tests from the command line, use the verbose option: some tests take a long time, and it gives you an idea of what is going on behind the scenes. Here is some sample output from testing an A5200:

freya# pwd
/opt/SUNWstade/Diags/bin
freya#
freya# ./a5ksestest -v -o dev=/dev/es/ses0
"a5ksestest:  called with options: dev=/dev/es/ses0"
"a5ksestest: Started."
"Started test on /dev/es/ses0"
"Box Name = freysen00"
"SES physical path: /devices/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ses@w5080020000026901,0:0"
"Power Supplies (0,2 in front, 1 in rear)"
"Power Supply 0 is OK"
"Power Supply 1 is OK"
"Power Supply 2 is OK"
"Fans (0 in front, 1 in rear)"
"Fan 0 detected to be OK"
"Fan 1 detected to be OK"
"ESI Interface board(IB) (A top, B bottom)"
"Interface Board A: OK()"
"GBIC module (1 on left, 0 on right in IB)"
"Interface Board A: GBIC Module: 0 OK(mod.-05)"
"Interface Board A: GBIC Module: 1 NOT INSTALLED"
"Interface Board B: OK()"
"GBIC module (1 on left, 0 on right in IB)"
"Interface Board B: GBIC Module: 0 NOT INSTALLED"
"Interface Board B: GBIC Module: 1 OK(mod.-05)"
"Disk backplane (0 in front, 1 in rear)"
"Front backplane"
"Front backplane is OK"
"Temperature sensors (on front backplane)"
"Temperature sensor 0, on the front backplane is OK: 40C "
"Temperature sensor 1, on the front backplane is OK: 37C "
"Temperature sensor 2, on the front backplane is OK: 37C "
"Temperature sensor 3, on the front backplane is OK: 36C "
"Temperature sensor 4, on the front backplane is OK: 39C "
"Temperature sensor 5, on the front backplane is OK: 40C "
"Temperature sensor 6, on the front backplane is OK: 39C "
"All temperature sensors on front backplane are OK "
"Back backplane"
"Back backplane is OK"
"Temperature sensors (on rear backplane)"
"Temperature sensor 0, on the rear backplane is OK: 34C "
"Temperature sensor 1, on the rear backplane is OK: 39C "
"Temperature sensor 2, on the rear backplane is OK: 37C "
"Temperature sensor 3, on the rear backplane is OK: 37C "
"Temperature sensor 4, on the rear backplane is OK: 36C "
"Temperature sensor 5, on the rear backplane is OK: 36C "
"Temperature sensor 6, on the rear backplane is OK: 37C "
"All temperature sensors on rear backplane are OK "
"(freysen00) Interconnect assembly"
"(freysen00) Interconnect Assembly OK"
"(freysen00) Loop  configuration"
"(freysen00) Loop A is configured as a single loop"
"(freysen00) Loop B is configured as a single loop"
"Drive 0, on the front is Installed"
"Drive 0, on the front is OK"
"Node WWN for Drive 0, on the front is 20000020370c45e0 "
"Drive 0, on the back is Installed"
"Drive 0, on the back is OK"
"Node WWN for Drive 0, on the back is 20000020370c3e10 "
...
...
"Node WWN for Drive 5, on the back is 20000020370c450b "
"Drive 6, on the front is Installed"
"Drive 6, on the front is OK"
"Node WWN for Drive 6, on the front is 20000020370c4df8 "
"Drive 6, on the back is Installed"
"Drive 6, on the back is OK"
"Node WWN for Drive 6, on the back is 20000020370c3e1c "
"a5ksestest: Stopped successfully."
freya# 
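
With this much verbose output it can help to filter for the lines of interest. The sketch below picks out temperature sensor lines whose reading exceeds a limit. The function name hot_sensors and the limit are illustrative only, the line format is taken from the sample above, and it assumes an awk that supports -v and match() (nawk rather than the old /usr/bin/awk on Solaris):

```shell
#!/bin/sh
# hot_sensors LIMIT: read test output on stdin and print the
# temperature sensor lines whose reading is above LIMIT (degrees C).
hot_sensors() {
    awk -v limit="$1" '
        /Temperature sensor [0-9]+,/ {
            # The reading is the number directly before "C", e.g. "OK: 40C"
            if (match($0, /[0-9]+C/)) {
                temp = substr($0, RSTART, RLENGTH - 1) + 0
                if (temp > limit) print
            }
        }'
}

# Hypothetical usage, assuming the test writes these lines to stdout:
#   ./a5ksestest -v -o dev=/dev/es/ses0 | hot_sensors 38
```

Run against the sample output above with a limit of 38, this would print only the sensor lines reading 39C or 40C.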

freya# ./a5ktest -v -o "dev=/dev/rdsk/c0t22d0s2|fstest=disable"
"Options: dev=/dev/rdsk/c0t22d0s2|fstest=disable"
grep: can't open ../../../../var/opt/SUNWstade/DATA/WWN_MAP
"a5ktest: Started."
grep: can't open ../..//DATA/WWN_MAP
"Write/Read Device Buffer Loopback: Pattern: 0x7e7e7e7e"
"Write/Read Device Buffer Started: pattern: 0x7e7e7e7e"
"Write/Read Device Buffer Completed: 1000 passes"
"Write/Read Device Buffer Loopback: Pattern: 0x1e1e1e1e"
"Write/Read Device Buffer Started: pattern: 0x1e1e1e1e"
"Write/Read Device Buffer Completed: 1000 passes"
...
...
"Write/Read Device Buffer Loopback: Pattern: 0xfdfdfdfd"
"Write/Read Device Buffer Started: pattern: 0xfdfdfdfd"
"Write/Read Device Buffer Completed: 1000 passes"
"number of blocks 17682084"
"Testing 176820 blocks on disk"
"blk_base(),base=1,nblk=17682084"
"Start AsyncIO test from block 1 to 176821. "
"Start SyncIO test. "
"Test passed."
"End of Rawtest: /dev/rdsk/c0t22d0s2"
"Checking RDLS counts"
"RDLS counts are the same"
"a5ktest: Stopped successfully."
freya# 

Use luxadm -v display your_a5000 to find the device paths of your socal devices, then append the string ":devctl" to the path when running the socal test:

freya# ./socaltest -v -o "dev=/devices/sbus@2,0/SUNW,socal@d,10000/sf@0,0:devctl" 
"called with options: dev=/devices/sbus@2,0/SUNW,socal@d,10000/sf@0,0:devctl"
"socaltest: Started."
"Begin socaltest on /devices/sbus@2,0/SUNW,socal@d,10000:0: board 1, slot d, port 0"
"socaltest on /devices/sbus@2,0/SUNW,socal@d,10000:0 done"
"socaltest: Stopped successfully."
freya#
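
Building the dev= option for socaltest by hand is easy to get wrong, so the path handling can be wrapped in a small helper. This is only a sketch: devctl_arg is a made-up name, and the path still has to be read from the luxadm output manually.

```shell
#!/bin/sh
# devctl_arg PATH: print a dev= option for socaltest from a socal
# device path, adding the ":devctl" suffix unless it is already there.
devctl_arg() {
    case "$1" in
        *:devctl) echo "dev=$1" ;;
        *)        echo "dev=$1:devctl" ;;
    esac
}

# Hypothetical usage with the path from the run above:
#   ./socaltest -v -o "`devctl_arg /devices/sbus@2,0/SUNW,socal@d,10000/sf@0,0`"
```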

A5000 (and other arrays) best practice which is hardly ever seen - no wonder why!

posted by hal 20021213 (archived) | permalink | path | initial version: 20021212

The Sun manual "Mass Storage Subsystem Best Practices" (806-1949-10) has some useful hints:

I presume the above is related to the X interface version of stortools (possibly even an old version). I could not get this to run (with any of OpenWindows, CDE or Gnome 2.0):

munin# stortools
vtsui: DirectColor is available but not default on this system.
Colormap problems may result.
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  1 (X_CreateWindow)
  Serial number of failed request:  136
  Current serial number in output stream:  142
munin#

By unsetting the DISPLAY variable (and including /opt/SUNWvtsst/bin in the path) I was able to run stortools. I got the best display results by using dtterm and making sure TERM was set to dtterm. It dumped core halfway through, but that did not seem to affect the end result after I started the tty interface again. The real screen looks much better than this; what follows is not a screenshot, just an HTML pre element with a class to emulate a screen:


***Hostname:munin~~~~Model:"Ultra*Enterprise"~~~~~SunVTS version:4.1~~~~~MP=4***
* start    reset         quit           reprobe     test_mode    tests_select  *
* grouping set_options   option_files   log_files   connect_to   intervention  *
********************************************************************************
*****************Status****************xx**************Test_Groups**************
*           System_status:idle         xx [*] StorEdge   Options               *
*System_passes:1     Total_errors:0    xx                                      *
*         Elapsed_time:000:07:01       xx                                      *
*                         Status_view  xx                                      *
* ***********************************Message********************************** *
*  (munin)Testing completed:  1 pass(es), 0 error(s)                         * *
*  press [ESC] to dismiss                                                    * *
*                                                                            * *
*  *************************************************************************** *
*    sena00_soc0_0            1      0 xx                                      *
*     a5k*ses0_s0(a5ksestest) 1      0 xx                                      *
*     a5k*ses1_s0(a5ksestest) 1      0 xx                                      *
*     c0t0d0*f0(a5ktest)      1      0 xx                                      *
********************************************************************************
*************************************Console************************************
* Connection test complete                                                     *
*                                                                              *
*                                                                              *
********************************************************************************
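
The environment changes that got the tty interface running can be collected in one place. A minimal sketch for a Bourne-style shell inside a dtterm window; the function name stortools_env is made up, while the path and TERM value are the ones mentioned in the text:

```shell
#!/bin/sh
# stortools_env: prepare the environment described above.
stortools_env() {
    unset DISPLAY                    # without DISPLAY the X GUI is not attempted
    PATH=$PATH:/opt/SUNWvtsst/bin    # where the stortools command lives
    TERM=dtterm                      # gave the best display results
    export PATH TERM
}

# Hypothetical usage:
#   stortools_env
#   stortools
```

Because the function changes the current shell's environment, it has to be defined and called in the same shell that then runs stortools.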

I have not been able to find an HTML or PDF version of this best practices manual on the web (Sun's main website and docs.sun.com) or through searches on Google/AlltheWeb. I suspect that a lot of people have thrown this manual away or never seen it. StorTools can be downloaded from Sun. It is probably a good idea to try StorTools before you have problems instead of after. I am not too impressed by it: crashing on a freshly patched machine and failing to launch the X GUI does not really build confidence (it has a total bug count of 334 in SunSolve and there is no patch for 4.2). You can always try to run the tests manually, and there is a PDF of the manual in the download package.


All content on this website is governed by a Creative Commons license.