SCSI disk going offline
pfau
Member
Posts: 33
Location: North Brunswick, NJ
Joined: 12.03.08
Posted on August 09 2011 14:22
One of my SCSI drives keeps going offline. When I connect to the system, I find the drive stuck in mount verification timeout. I can dismount the drive, but any attempt to do anything with it results in "medium is offline", including attempting to access it with rztools. There are no errors logged against the device.
The disk comes back if I reboot the system, but after a few failures a couple of weeks apart, it has now failed twice today.
Does anyone have any ideas on how I can figure out why it's going offline?
RE: SCSI disk going offline
pfau
Posted on August 09 2011 16:01
I may have figured it out. I opened the system to check the connections and ended up pulling the drives out. The top of this particular drive was very hot, so I moved it to another bay where it could get more airflow across it. Some searching on Google suggests that log sense page 13 (0x0d) holds temperature data. If that's true, the drive is operating very close to its cutoff temperature. I put a fan in front of the system where it will draw air out of the case across the top of the drive. According to rztools, it's cooling off.
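For anyone wanting to check the same thing: the SCSI Primary Commands (SPC) standard defines log page 0x0D as the Temperature log page. Parameter code 0000h carries the current temperature and 0001h the reference temperature, each in degrees Celsius in the second byte of a 2-byte value, with FFh meaning no valid reading. Below is a minimal Python sketch (not part of rztools) that assumes you already have the raw LOG SENSE response bytes from some tool; the function name, constants and sample bytes are hypothetical.

```python
import struct

TEMPERATURE_PAGE = 0x0D   # SPC Temperature log page code
PARAM_CURRENT = 0x0000    # parameter code: current temperature
PARAM_REFERENCE = 0x0001  # parameter code: reference temperature


def parse_temperature_page(data: bytes) -> dict:
    """Decode a raw LOG SENSE page 0x0D response into degrees Celsius.

    Returns {"current": int or None, "reference": int or None};
    None means the parameter was absent or reported as 0xFF (no valid reading).
    """
    if len(data) < 4:
        raise ValueError("response shorter than the 4-byte log page header")
    page_code = data[0] & 0x3F  # low 6 bits of byte 0 hold the page code
    if page_code != TEMPERATURE_PAGE:
        raise ValueError(f"expected page 0x0D, got 0x{page_code:02X}")
    (page_length,) = struct.unpack(">H", data[2:4])  # big-endian, excludes header
    end = min(4 + page_length, len(data))

    result = {"current": None, "reference": None}
    offset = 4
    # Each log parameter: 2-byte code, 1 control byte, 1 length byte, then value.
    while offset + 4 <= end:
        code, _control, length = struct.unpack(">HBB", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + length]
        if len(value) >= 2:
            reading = None if value[1] == 0xFF else value[1]
            if code == PARAM_CURRENT:
                result["current"] = reading
            elif code == PARAM_REFERENCE:
                result["reference"] = reading
        offset += 4 + length
    return result


if __name__ == "__main__":
    # Hypothetical response bytes (not from a real drive): 45 C current, 60 C reference.
    sample = bytes([0x0D, 0x00, 0x00, 0x0C,
                    0x00, 0x00, 0x03, 0x02, 0x00, 0x2D,
                    0x00, 0x01, 0x03, 0x02, 0x00, 0x3C])
    print(parse_temperature_page(sample))  # {'current': 45, 'reference': 60}
```

If the current reading sits within a few degrees of the reference value, that would support the overheating theory.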
RE: SCSI disk going offline
pfau
Posted on August 10 2011 07:54
Well, not quite fixed. I just checked on the system and the disk is down again.
RE: SCSI disk going offline
somersdave
Member
Posts: 67
Location: bristol,UK
Joined: 23.03.07
Posted on August 11 2011 06:46
I believe it's quite warm in the NE USA at the moment.
Someone with a lot of Alpha experience told me to fit 'blanking panels' (I don't know the correct term) over the unused disk drive slots (for the 'bricks') in order to maintain airflow, rather than letting it escape straight out of the machine.
I've been running my Alpha 4000 continuously at about 380 W (without monitor), and the system temperature reaches a maximum of about 30 °C (system shutdown is at 51 °C). The 4100 (4 CPUs) draws nearly 800 W.
Edited by somersdave on August 11 2011 06:48
RE: SCSI disk going offline
pfau
Posted on August 11 2011 16:42
My system is a PWS 600au. The disks are mounted internally, not in an expansion cabinet. The box doesn't have much airflow through it. There's a fan on the processor that blows air across the CPU and out the front, and another fan in the power supply. I think that's it. I'll have to try to fit some more fans in the case; there's hardly any airflow across the drives.
RE: SCSI disk going offline
somersdave
Posted on August 12 2011 06:44
This MIGHT be of use when monitoring some systems. I found that more information is returned when logged in as SYSTEM. Found on the web, courtesy of one Scott Belviso.
$ ty enviro.com
$! Program: ENV_CHECK.COM
$!
$! Purpose: Gathers and displays the state of the internal
$! Fans/Temperature/Thermal/Power Supplies of the system. Not all systems
$! are capable of reporting this information so the output will be
$! different for each type. Some systems can't report any.
$!
$! History:
$! Scott Belviso 02/07/03 - Original Creation
$!
$! Parameters:
$! none
$!
$! Run instructions:
$! @env_check
$!
$!
$ thermal_ctr = 0
$ thermal_size = 2
$ thermal_length = 32
$ fan_ctr = 0
$ fan_size = 2
$ fan_length = 32
$ temp_ctr = 0
$ temp_size = 2
$ temp_length = 32
$ power_ctr = 0
$ power_size = 2
$ power_length = 32
$ tv = f$getsyi("thermal_vector")
$ fv = f$getsyi("fan_vector")
$ temp_v = f$getsyi("temperature_vector")
$ pv = f$getsyi("power_vector")
$!
$! Main
$!
$main:
$ gosub thermal_loop
$ gosub fan_loop
$ gosub temp_loop
$ gosub power_loop
$ goto done
$!
$! Begin subroutines
$!
$thermal_loop:
$ thermal_ctr = thermal_ctr + 1
$ if thermal_ctr * thermal_size .gt. thermal_length then return
$ thermal'thermal_ctr = -
      f$extract(thermal_length - (thermal_size * thermal_ctr),thermal_size,tv)
$ if thermal'thermal_ctr .eqs. "01" -
      then write sys$output "Thermal ''thermal_ctr' is Good"
$ if thermal'thermal_ctr .eqs. "00" -
      then write sys$output "Thermal ''thermal_ctr' is BAD"
$! if thermal'thermal_ctr .eqs. "FF" -
$!     then write sys$output "Thermal ''thermal_ctr' is Not Present"
$ goto thermal_loop
$!
$fan_loop:
$ fan_ctr = fan_ctr + 1
$ if fan_ctr * fan_size .gt. fan_length then return
$ fan'fan_ctr = f$extract(fan_length - (fan_size * fan_ctr),fan_size,fv)
$ if fan'fan_ctr .eqs. "01" -
      then write sys$output "FAN ''fan_ctr' is Good"
$ if fan'fan_ctr .eqs. "00" -
      then write sys$output "FAN ''fan_ctr' is BAD"
$! if fan'fan_ctr .eqs. "FF" -
$!     then write sys$output "FAN ''fan_ctr' is Not Present"
$ goto fan_loop
$!
$temp_loop:
$ temp_ctr = temp_ctr + 1
$ if temp_ctr * temp_size .gt. temp_length then return
$ temp'temp_ctr = -
      f$extract(temp_length - (temp_size * temp_ctr),temp_size,temp_v)
$ if temp'temp_ctr .nes. "FF"
$ then
$     actual_temp = temp'temp_ctr
$     actual_temp = %x'actual_temp
$     write sys$output "Temp ''temp_ctr' is ''actual_temp' Celsius"
$ endif
$ goto temp_loop
$!
$power_loop:
$ power_ctr = power_ctr + 1
$ if power_ctr * power_size .gt. power_length then return
$ power'power_ctr = -
      f$extract(power_length - (power_size * power_ctr),power_size,pv)
$ if power'power_ctr .eqs. "01" -
      then write sys$output "Power Supply ''power_ctr' is Good"
$ if power'power_ctr .eqs. "00" -
      then write sys$output "Power Supply ''power_ctr' is BAD"
$! if power'power_ctr .eqs. "FF" -
$!     then write sys$output "Power Supply ''power_ctr' is Not Present"
$ goto power_loop
$!
$done:
$ exit
Output for the Alpha 4000:
$ @enviro
Thermal 1 is Good
FAN 1 is Good
FAN 2 is Good
Temp 1 is 24 Celsius
Power Supply 1 is Good
Power Supply 3 is Good
$
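To make the vector decoding in ENV_CHECK.COM easier to follow, here is a rough Python equivalent of what its loops do. It assumes, as the script does, that each F$GETSYI vector comes back as a 32-character string of hex digits that is read two digits at a time from the right-hand end, with 01 = good, 00 = bad and FF = not present. The function names are hypothetical, and the vector strings in the example are made up to reproduce the Alpha 4000 output above, not captured from a real system.

```python
VECTOR_LENGTH = 32  # hex characters, as assumed by the script (thermal_length etc.)
FIELD_SIZE = 2      # two hex digits per slot


def slots(vector: str):
    """Yield (slot_number, field) pairs counting from the right, like ENV_CHECK.COM."""
    n = 1
    while n * FIELD_SIZE <= VECTOR_LENGTH:
        start = VECTOR_LENGTH - FIELD_SIZE * n
        yield n, vector[start:start + FIELD_SIZE]
        n += 1


def status_lines(label: str, vector: str) -> list:
    """Report slots marked 01 (Good) or 00 (BAD); FF and anything else is skipped."""
    lines = []
    for n, field in slots(vector):
        if field == "01":
            lines.append(f"{label} {n} is Good")
        elif field == "00":
            lines.append(f"{label} {n} is BAD")
    return lines


def temperature_lines(vector: str) -> list:
    """Convert each non-FF hex field to degrees Celsius (skips empty fields too)."""
    return [f"Temp {n} is {int(field, 16)} Celsius"
            for n, field in slots(vector) if field and field != "FF"]


if __name__ == "__main__":
    # Hypothetical vectors chosen to match the Alpha 4000 output in the post.
    report = (status_lines("Thermal", "FF" * 15 + "01")
              + status_lines("FAN", "FF" * 14 + "0101")
              + temperature_lines("FF" * 15 + "18")
              + status_lines("Power Supply", "FF" * 13 + "01FF01"))
    for line in report:
        print(line)
```

Note that a slot reading FF produces no line at all, which is why the Alpha 4000 output jumps from Power Supply 1 to Power Supply 3.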