The crashes are random: a kernel fault (invalid memory address) or just a console freeze.
It's not predictable and can happen at any time. Today the system crashed twice: the first time when nothing was going on, the second time while an FTP file transfer was in progress. Earlier we have seen the crash while some X windows were open (through Reflection X) but idle.
Twice dsk0 (the root file system) was corrupted and we had to reinstall Tru64. I now keep a backup of all the disk images.
Posts: 63 Location: nr Heathrow, Middlesex, UK Joined: 18.03.10
Posted on August 11 2011 02:28
Hi,
Could you post the crash details, please?
- captured from the console immediately after the panic, or
- captured in /var/adm/messages after the next Tru64 boot, or
- captured in /var/adm/crash/crash-data.X (if it was saved OK) after the next Tru64 boot.
1) I did not capture the console messages, so I will post them after the next crash, whenever that is.
2) Some excerpts from /var/adm/messages:
Aug 10 12:11:34 AXPES40 vmunix: No B-cache detected
Aug 10 12:11:34 AXPES40 vmunix: Alpha boot: available memory from 0xd48000 to 0x7ffe000
Aug 10 12:11:34 AXPES40 vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Wed Aug 10 12:08:57 IST 2011
Aug 10 12:11:34 AXPES40 vmunix: physical memory = 128.00 megabytes.
Aug 10 12:11:34 AXPES40 vmunix: available memory = 114.70 megabytes.
Aug 10 12:11:34 AXPES40 vmunix: using 454 buffers containing 3.54 megabytes of memory
Aug 10 12:11:34 AXPES40 vmunix: Firmware revision: 7.0
Aug 10 12:11:34 AXPES40 vmunix: PALcode: UNIX version 1.46
Aug 10 12:11:34 AXPES40 vmunix: AlphaServer 400 4/166
Aug 10 12:11:34 AXPES40 vmunix: DECchip 21071
Aug 10 12:11:34 AXPES40 vmunix: 82378IB (SIO) PCI/ISA Bridge
Aug 10 12:11:34 AXPES40 vmunix: pci0 (primary bus:0) at nexus
Aug 10 12:11:34 AXPES40 vmunix: Loading SIOP: script 800000, reg 82000000, data 40d4e000
Aug 10 12:11:34 AXPES40 vmunix: scsi0 at psiop0 slot 0 rad 0
Aug 10 12:11:34 AXPES40 vmunix: isa0 at pci0
Aug 10 12:11:34 AXPES40 vmunix: gpc0 at isa0
Aug 10 12:11:34 AXPES40 vmunix: gpc1 not probed
Aug 10 12:11:34 AXPES40 vmunix: ace0 at isa0
Aug 10 12:11:34 AXPES40 vmunix: ace1 at isa0
Aug 10 12:11:34 AXPES40 vmunix: lp0 at isa0
Aug 10 12:11:34 AXPES40 vmunix: tu0: DECchip 21040: Revision: 2.0
Aug 10 12:11:34 AXPES40 vmunix: tu0 at pci0 slot 11
Aug 10 12:11:34 AXPES40 vmunix: tu0: DEC TULIP (10Mbps) Ethernet Interface, hardware address: 00-50-BA-8D-B1-6B
Aug 10 12:11:34 AXPES40 vmunix: tu0: console mode: selecting 10BaseT (UTP) port: half duplex
Aug 10 12:11:34 AXPES40 vmunix: kernel console: ace0
Aug 10 12:11:34 AXPES40 vmunix: dli: configured
Aug 10 12:11:34 AXPES40 vmunix: NetRAIN configured.
Aug 10 12:11:35 AXPES40 vmunix: Random number generator configured.
Aug 10 12:11:35 AXPES40 vmunix: vm_swap_init: swap is set to eager allocation mode
Aug 10 12:11:47 AXPES40 vmunix: Environmental Monitoring Subsystem Configured.
Aug 10 12:20:55 AXPES40 vmunix: Can't find an OSF-BASE, UNIX-WORKSTATION, or UNIX-SERVER license PAK
Aug 10 12:43:45 AXPES40 vmunix: Can't find an OSF-BASE, UNIX-WORKSTATION, or UNIX-SERVER license PAK
Aug 10 12:44:38 AXPES40 vmunix: Can't find an OSF-BASE, UNIX-WORKSTATION, or UNIX-SERVER license PAK
Aug 11 11:40:05 AXPES40 vmunix: No B-cache detected
Aug 11 11:40:05 AXPES40 vmunix: Alpha boot: available memory from 0xd48000 to 0x7ffe000
Aug 11 11:40:05 AXPES40 vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Wed Aug 10 12:08:57 IST 2011
Aug 11 11:40:05 AXPES40 vmunix: physical memory = 128.00 megabytes.
Aug 11 11:40:05 AXPES40 vmunix: available memory = 114.70 megabytes.
Aug 11 11:40:05 AXPES40 vmunix: using 454 buffers containing 3.54 megabytes of memory
Aug 11 11:40:05 AXPES40 vmunix: Firmware revision: 7.0
Aug 11 11:40:05 AXPES40 vmunix: PALcode: UNIX version 1.46
Aug 11 11:40:05 AXPES40 vmunix: AlphaServer 400 4/166
Aug 11 11:40:05 AXPES40 vmunix: DECchip 21071
Aug 11 11:40:05 AXPES40 vmunix: 82378IB (SIO) PCI/ISA Bridge
Aug 11 11:40:05 AXPES40 vmunix: pci0 (primary bus:0) at nexus
Aug 11 11:40:05 AXPES40 vmunix: Loading SIOP: script 800000, reg 82000000, data 40d4e000
Aug 11 11:40:05 AXPES40 vmunix: scsi0 at psiop0 slot 0 rad 0
Aug 11 11:40:05 AXPES40 vmunix: isa0 at pci0
Aug 11 11:40:05 AXPES40 vmunix: gpc0 at isa0
Aug 11 11:40:05 AXPES40 vmunix: gpc1 not probed
Aug 11 11:40:05 AXPES40 vmunix: ace0 at isa0
Aug 11 11:40:05 AXPES40 vmunix: ace1 at isa0
Aug 11 11:40:05 AXPES40 vmunix: lp0 at isa0
Aug 11 11:40:05 AXPES40 vmunix: tu0: DECchip 21040: Revision: 2.0
Aug 11 11:40:05 AXPES40 vmunix: tu0 at pci0 slot 11
Aug 11 11:40:05 AXPES40 vmunix: tu0: DEC TULIP (10Mbps) Ethernet Interface, hardware address: 00-50-BA-8D-B1-6B
Aug 11 11:40:05 AXPES40 vmunix: tu0: console mode: selecting 10BaseT (UTP) port: half duplex
Aug 11 11:40:06 AXPES40 vmunix: kernel console: ace0
Aug 11 11:40:06 AXPES40 vmunix: dli: configured
Aug 11 11:40:06 AXPES40 vmunix: NetRAIN configured.
Aug 11 11:40:06 AXPES40 vmunix: Random number generator configured.
Aug 11 11:40:06 AXPES40 vmunix: vm_swap_init: swap is set to eager allocation mode
Aug 11 11:40:18 AXPES40 vmunix: Environmental Monitoring Subsystem Configured.
Aug 11 11:46:00 AXPES40 vmunix: No B-cache detected
Aug 11 11:46:01 AXPES40 vmunix: Alpha boot: available memory from 0xd48000 to 0x7ffe000
Aug 11 11:46:01 AXPES40 vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Wed Aug 10 12:08:57 IST 2011
Aug 11 11:46:01 AXPES40 vmunix: physical memory = 128.00 megabytes.
Aug 11 11:46:01 AXPES40 vmunix: available memory = 114.70 megabytes.
Aug 11 11:46:01 AXPES40 vmunix: using 454 buffers containing 3.54 megabytes of memory
Aug 11 11:46:01 AXPES40 vmunix: Firmware revision: 7.0
Aug 11 11:46:01 AXPES40 vmunix: PALcode: UNIX version 1.46
Aug 11 11:46:01 AXPES40 vmunix: AlphaServer 400 4/166
Aug 11 11:46:01 AXPES40 vmunix: DECchip 21071
Aug 11 11:46:01 AXPES40 vmunix: 82378IB (SIO) PCI/ISA Bridge
Aug 11 11:46:01 AXPES40 vmunix: pci0 (primary bus:0) at nexus
Aug 11 11:46:01 AXPES40 vmunix: Loading SIOP: script 800000, reg 82000000, data 40d4e000
Aug 11 11:46:01 AXPES40 vmunix: scsi0 at psiop0 slot 0 rad 0
Aug 11 11:46:01 AXPES40 vmunix: isa0 at pci0
Aug 11 11:46:01 AXPES40 vmunix: gpc0 at isa0
Aug 11 11:46:01 AXPES40 vmunix: gpc1 not probed
Aug 11 11:46:01 AXPES40 vmunix: ace0 at isa0
Aug 11 11:46:01 AXPES40 vmunix: ace1 at isa0
Aug 11 11:46:01 AXPES40 vmunix: lp0 at isa0
Aug 11 11:46:01 AXPES40 vmunix: tu0: DECchip 21040: Revision: 2.0
Aug 11 11:46:01 AXPES40 vmunix: tu0 at pci0 slot 11
Aug 11 11:46:01 AXPES40 vmunix: tu0: DEC TULIP (10Mbps) Ethernet Interface, hardware address: 00-50-BA-8D-B1-6B
Aug 11 11:46:01 AXPES40 vmunix: tu0: console mode: selecting 10BaseT (UTP) port: half duplex
Aug 11 11:46:01 AXPES40 vmunix: kernel console: ace0
Aug 11 11:46:01 AXPES40 vmunix: dli: configured
Aug 11 11:46:01 AXPES40 vmunix: NetRAIN configured.
Aug 11 11:46:01 AXPES40 vmunix: Random number generator configured.
Aug 11 11:46:01 AXPES40 vmunix: vm_swap_init: swap is set to eager allocation mode
Aug 11 11:46:13 AXPES40 vmunix: Environmental Monitoring Subsystem Configured.
Aug 11 12:03:47 AXPES40 vmunix: No B-cache detected
Aug 11 12:03:47 AXPES40 vmunix: Alpha boot: available memory from 0xd48000 to 0x7ffe000
Aug 11 12:03:47 AXPES40 vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Wed Aug 10 12:08:57 IST 2011
Aug 11 12:03:47 AXPES40 vmunix: physical memory = 128.00 megabytes.
Aug 11 12:03:47 AXPES40 vmunix: available memory = 114.70 megabytes.
You can see multiple boots, each one following a crash.
3) There are no files under /var/adm/crash. I will check again after the next crash.
Posted on August 11 2011 08:21
OK, thanks for the update.
Since there is nothing under /var/adm/crash, there will be no crash detail in /var/adm/messages either. An empty crash directory means either that there was a problem writing the crash dump to (or reading it back from) the swap device, or that it wasn't a 'normal' Tru64 panic. So the console capture is the only way forward: please log the /dev/console tty output, or copy and paste it if and when the next crash occurs.
The system has not crashed since yesterday, but today I hit a new problem. This time I captured the console output; the same messages also appear in /var/adm/messages.
In the middle of an FTP file transfer, the data disk (dsk1) became inaccessible.
The machine was still alive, but /data (the mount point for dsk1h) was not accessible.
ADVFS panic
===========
Console messages
================
login: live_dump: bs_frag_alloc: invalid frag group
grpHdrp->self is 948764922 and grpPg is 1072
grpHdrp->fragType is 2385653177 and fragType is 3
grpHdrp->freeFrags is -1368193776 - it should be non-zero
bs_frag_alloc: invalid frag group
AdvFS Domain Panic; Domain domain_dsk1h Id 0x4e3bc1f3.000adb50
An AdvFS domain panic has occurred due to either a metadata write error or an internal
inconsistency. This domain is being rendered inaccessible.
Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to
take to recover this domain.
12-Aug-2011 12:36:39 [600] AdvFS: An AdvFS domain panic has occurred on domain_dsk1h
saving /var/adm/crash/vmzcore.0 .................. done
AdvFS I/O error:
A read failure occurred - the AdvFS domain is inaccessible (paniced)
Domain#Fileset: domain_dsk1h#data
Mounted on: /data
Volume: /dev/disk/dsk1h
Tag: 0x00000f27.8002
Page: 0
Block: 10930176
Block count: 16
Type of operation: Read
Error: 5
EEI: 0x300 (Advfs cannot retry this)
AdvFS initiated retries: 0
Total AdvFS retries on this volume: 0
To obtain the name of the file on which
the error occurred, type the command:
/sbin/advfs/tag2name /data/.tags/3879
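The tag in the I/O error report is printed in hex, while the .tags path handed to tag2name uses the tag number in decimal (0xf27 is 3879). A minimal Python sketch of that conversion, assuming the field before the dot is the tag number and the field after it is a sequence number that the path does not use; the helper name `tag_path` is hypothetical and not part of AdvFS:

```python
def tag_path(mount_point: str, tag: str) -> str:
    """Build the .tags path for a hex AdvFS tag such as '0x00000f27.8002'.

    Assumption: the part before '.' is the tag number in hex and the part
    after it is the sequence number, which the .tags path does not need.
    """
    tag_hex, _sequence = tag.split(".", 1)
    tag_number = int(tag_hex, 16)  # base 16 also accepts the '0x' prefix
    return f"{mount_point.rstrip('/')}/.tags/{tag_number}"


print(tag_path("/data", "0x00000f27.8002"))
```

For the tag in this report the result is /data/.tags/3879, the same path the console message suggests.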
Command output after the panic
==============================
# df -k
Filesystem 1024-blocks Used Available Capacity Mounted on
root_domain#root 261616 127003 128728 50% /
/proc 0 0 0 100% /proc
usr_domain#usr 1430720 263806 1133216 19% /usr
usr_domain#var 1430720 23846 1133216 3% /var
domain_dsk1h#data 5947880 3561555 2371640 61% /data
# /sbin/advfs/advscan dsk1
Scanning devices /dev/rdisk/dsk1
Found domains:
# ls /data
ls: /data not found
# cd data
ksh: data: permission denied
# cd /var/adm/crash
# ls -l
total 17561
-rw-r--r-- 1 root system 4 Aug 12 12:36 bounds
-rw-r----- 1 root system 14595408 Aug 12 12:36 vmunix.0
-rw-r----- 1 root system 3377220 Aug 12 12:36 vmzcore.0
Posted on August 12 2011 12:12
Hi,
You have had an AdvFS domain panic.
- invalid frag group
- write error or internal inconsistency (in/with the frag group metadata)
- and an AdvFS I/O error (read) of block 10930176 within dsk1h.
The domain is then 'offlined' and inaccessible.
This would normally be attributed to a storage problem: a disk error, or an on-disk data inconsistency, perhaps introduced by the earlier problems.
The AdvFS domain will only become available again after a reboot.
However, you should consider re-creating the volume, as the problem will likely recur when the same block or its neighbours are accessed again. Alternatively, you may try to repair the damaged section(s) with fixfdmn(8), but given the I/O error noted above, it may not be completely successful.
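To get a feel for where the failing read sits, the block number from the I/O error can be turned into byte figures. A rough Python sketch, assuming the "Block" field counts 512-byte sectors from the start of the dsk1h volume and that the size df -k shows for domain_dsk1h is close to the volume size (df may not count every block, so the percentage is only indicative); the helper `describe_block` is hypothetical:

```python
# Assumptions (illustrative only): the AdvFS "Block" field counts 512-byte
# sectors from the start of /dev/disk/dsk1h, and df -k's size for
# domain_dsk1h approximates the size of that volume.
SECTOR_BYTES = 512


def describe_block(block: int, block_count: int, volume_kib: int) -> dict:
    """Turn an AdvFS block/count pair into byte figures within a volume."""
    start = block * SECTOR_BYTES
    length = block_count * SECTOR_BYTES
    return {
        "start_byte": start,
        "length_bytes": length,
        "percent_into_volume": 100.0 * start / (volume_kib * 1024),
    }


# Values taken from the AdvFS I/O error report and the df -k output.
info = describe_block(block=10930176, block_count=16, volume_kib=5947880)
print(info)
```

With these inputs the failing 16-block read is 8 KiB long and starts a little under 92% of the way into the volume. Mapping it to an offset inside the host image file would additionally need the start of the h partition, for example from the disk label.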
As you observe, this is not the same issue as the one described in your original post.
Posted on August 13 2011 06:09
Also, you may wish to examine the Tru64 binary error log (in /var/adm/) using dia (DECevent) or uerf, and check whether any additional disk error information is visible around the time of the AdvFS error.
Also check the main FreeAXP log (not the PuTTY /dev/console log) to see whether it reports anything 'odd' for the disk unit backing dsk1.
regards,
John M
Edited by John Manger on August 15 2011 10:17
Now I got another crash, moments after running SQLPlus (we installed the Oracle 8.1.7 client, the same version that is already running on a physical ES40).
When I ran the program from a normal (non-root) user account and gave a random username/password, it reported some errors and exited (as expected). About 5 seconds later I got this OS crash, so I think it was not caused by SQLPlus.
BTW: I have already changed the network card type to DE500 (21140).
******************************************
trap: invalid memory write access from kernel mode
faulting virtual address: 0x000000011ffffca8
pc of faulting instruction: 0xfffffc0000558fd0
ra contents at time of fault: 0xfffffc0000564440
sp contents at time of fault: 0x000000011fffb0b0
trap: invalid memory write access from kernel mode
faulting virtual address: 0x000000011ffffcfc
pc of faulting instruction: 0xfffffc000055da00
ra contents at time of fault: 0xfffffc000055d9e0
sp contents at time of fault: 0x000000011fffab70
DUMP: second crash dump skipped: 'dump_savecnt' enforced.
halted CPU 0
halt code = 5
HALT instruction executed
PC = fffffc000055ed70
>>>
FreeAXP VLC log for the session that crashed:
*************************************
FreeAXP Virtual Alpha x86 version 2.0.0.377 (Jun 8 2011 11:06:05)
Windows workstation version 5.1 SP 3.0, build 2600 (Service Pack 3, v.3311) suite 100 (WMI Name: Microsoft Windows XP Professional|C:\WINDOWS|\Device\Harddisk0\Partition5)
2 processor cores of family 17, stepping 0a (WMI Name: Intel Pentium III Xeon processor)
File opened at 2011-08-12 18:48:09
%XNV-I-RESTST: NVRAM restored from F:\AXPES40\AXPES40_AXPES40.nvr
DFL-I-MOUNT: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.0(file): Mounted file F:\FreeAXP\AXP_V5.1B.iso, handle 0000019C, 1320704 512-byte blocks, 1876/16/44 as DEC RRD42 4.5d SRL0000.
DFL-I-MOUNT: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.1(file): Mounted file F:\AXPES40\SYSDISK.vdisk, handle 000001A8, 3907911 512-byte blocks, 186091/7/3 as DEC RZ73 T366 SRL0101.
DFL-I-MOUNT: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.3(file): Mounted file F:\AXPES40\swapdisk.vdisk, handle 000001A4, 1954050 512-byte blocks, 3722/15/35 as DEC RZ57 6000 SRL0303.
DFL-I-MOUNT: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.4(file): Mounted file F:\AXPES40\DATARECO.img, handle 00000198, 17773524 512-byte blocks, 493709/12/3 as DEC RZ40L 8203 SRL0404.
XTI-I-RESTST: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).toy(bq3287): TOY restored from F:\AXPES40\AXPES40_AXPES40.toy
Actor framework started with 9 thread(s)
CTS-I-NEWSESS: New session on cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).serial0(i16550) from IP address 127.0.0.1
Can not set cache size: SetSystemFileCacheSize not found in KERNEL32.DLL.
AC4-I-DECOMP: Decompressing ROM image... done.
AC4-I-PATCHROM: Patching ROM for speed.
AXP-I-CPUSTRT: cp(control).AXPES40(alpha (AS400)).cpu0(EV4): CPU Starting
ESL-I-EXIT: Normal emulator shutdown requested.
%XNV-I-SAVEST: NVRAM saved to F:\AXPES40\AXPES40_AXPES40.nvr
XTI-I-SAVEST: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).toy(bq3287): Flash saved to F:\AXPES40\AXPES40_AXPES40.toy
ASY-I-FREEMEM: cp(control).AXPES40(alpha (AS400)): Freeing memory in use by system...
Bq3287: asserted 322856936 times
de-asserted 322856936 times
re-asserted 174915 times
DFL-I-CLOSE: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.0(file): Closing file.
IOC STATISTICS FOR cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.0(file)
read (async): issued 0 times, completed 0 times
read (sync ): issued 0 times, completed 0 times
write (async): issued 0 times, completed 0 times
write (sync ): issued 0 times, completed 0 times
DFL-I-CLOSE: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.1(file): Closing file.
IOC STATISTICS FOR cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.1(file)
read (async): issued 0 times, completed 0 times
read (sync ): issued 12672 times, completed 0 times
write (async): issued 0 times, completed 0 times
write (sync ): issued 27945 times, completed 0 times
DFL-I-CLOSE: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.3(file): Closing file.
IOC STATISTICS FOR cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.3(file)
read (async): issued 0 times, completed 0 times
read (sync ): issued 5 times, completed 0 times
write (async): issued 0 times, completed 0 times
write (sync ): issued 0 times, completed 0 times
DFL-I-CLOSE: cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.4(file): Closing file.
IOC STATISTICS FOR cp(control).AXPES40(alpha (AS400)).pcibus(dc21071da).pci6(symbios).disk0.4(file)
read (async): issued 0 times, completed 0 times
read (sync ): issued 910 times, completed 0 times
write (async): issued 0 times, completed 0 times
write (sync ): issued 171 times, completed 0 times
Crash dump files were saved in /var/adm/crash.
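Each DFL-I-MOUNT line in the FreeAXP log gives a block count and a geometry triple. As a quick consistency check on the virtual disk definitions, a small Python sketch, assuming the triple is cylinders/heads/sectors-per-track, multiplies it out and compares the product with the reported block count; the figures are copied from the log and the helper name `geometry_blocks` is hypothetical:

```python
# (file, 512-byte blocks, "C/H/S") as reported by the DFL-I-MOUNT lines.
MOUNTS = [
    ("AXP_V5.1B.iso", 1320704, "1876/16/44"),
    ("SYSDISK.vdisk", 3907911, "186091/7/3"),
    ("swapdisk.vdisk", 1954050, "3722/15/35"),
    ("DATARECO.img", 17773524, "493709/12/3"),
]


def geometry_blocks(chs: str) -> int:
    """Multiply out a 'cylinders/heads/sectors' string (assumed order)."""
    cylinders, heads, sectors = (int(x) for x in chs.split("/"))
    return cylinders * heads * sectors


for name, blocks, chs in MOUNTS:
    gib = blocks * 512 / 2**30
    status = "OK" if geometry_blocks(chs) == blocks else "MISMATCH"
    print(f"{name:15} {blocks:>9} blocks {gib:6.2f} GiB  geometry {status}")
```

For all four files the product matches the block count exactly, so the geometry the emulator presents is at least self-consistent; on its own this neither confirms nor rules out the disks as the cause of the crashes.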
I need to know why this is unstable. If it's only my instance that behaves like this, what do I need to do to get a stable system with FreeAXP?
I can confirm that SQLPlus was not involved in the crash I reported in my previous post. I repeated the same steps (wrong username/password) after rebooting and there has been no crash so far. I am also able to communicate with a DB running on another machine through SQLPlus.
Posted on September 03 2011 06:16
After further testing and diagnosis, it appears that this issue only occurs under the 32-bit release of FreeAXP. The 64-bit version of FreeAXP and the commercial version 'Avanti' do not have the problem. Thus a 'workaround' is to run the emulator on a 64-bit Windows platform, which will also give improved performance in comparison to a 32-bit host.
After moving to a 64-bit host (Win7) and the latest version of FreeAXP, the system is stable and trouble-free. Thanks, John, Camiel and Bruce, for the prompt support.