Author |
Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 14 2012 20:23 |
|
|
Hi,
I created a two-node OpenVMS cluster using the TCP/IP cluster interconnect. I set the expected votes to 1. However, when one of the cluster nodes goes down, the other node hangs (it remains hung until the other node is back up).
I next attached an FC shared disk to both nodes of the cluster, after which I enabled the quorum disk on one of the nodes and rebooted after an AUTOGEN. Then I enabled the quorum disk on the second node and ran AUTOGEN with reboot. After the reboot, the second node does not boot up completely (I do not get the login prompt on the second node). However, the SHOW CLUSTER command shows that both nodes are cluster members.
Could someone point out if I am missing something or going wrong somewhere? I am a beginner user of OpenVMS systems.
My configuration details are as below
Node 1
-----------
SCSNODE: I64029
SYSTEMID: 10
ALLOCLASS: 10
CLUSTER Group: 10
Quorum Disk: $1$DGA30
Node 2
-------------
SCSNODE: I64030
SYSTEMID: 20
ALLOCLASS: 10
CLUSTER Group: 10
Quorum Disk: $1$DGA30
Regards,
Sajith Sathiadevan |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
malmberg
Moderator
Posts: 530
Joined: 15.04.08 |
Posted on February 15 2012 03:25 |
|
|
Expected Votes dynamically adjusts based on the number of votes seen in the running cluster. The SYSGEN value is just used for the booting of the cluster.
The shutdown parameter "REMOVE_NODE" adjusts the expected_votes parameter down to allow removing a node.
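For illustration, roughly how that looks in practice (a sketch; the prompts are paraphrased from memory, not an exact transcript):
$ @SYS$SYSTEM:SHUTDOWN
...
Shutdown options (enter as a comma-separated list): REMOVE_NODE
Answering REMOVE_NODE tells the remaining cluster members to adjust quorum downward so they keep running after this node leaves.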
You need to make sure that your Fibre disk array is set up for the VMS operating system, so that all systems have an active path to the disk.
|
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
abrsvc
Member
Posts: 108
Joined: 12.03.10 |
Posted on February 15 2012 03:29 |
|
|
Having the same ALLOCLASS on each machine could be contributing to the problem. Each machine should have a different ALLOCLASS. This is what defines the "local" devices. In other words, the disks on machine A are seen as $xx$dyyy devices, where xx is the allocation class. With both machines having devices with the same class, there may be confusion between the machines. Change one of them to another value and try again. Report the new behavior.
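One way to change it, roughly (just a sketch; 20 is an arbitrary example value):
$ EDIT SYS$SYSTEM:MODPARAMS.DAT      ! change ALLOCLASS=10 to, say, ALLOCLASS=20
$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT NOFEEDBACK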
Also, a more detailed listing of your hardware setup would be helpful.
Dan |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
malmberg
Moderator
Posts: 530
Joined: 15.04.08 |
Posted on February 16 2012 02:58 |
|
|
Fibre-SCSI disk drives are always Alloclass 1.
|
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
abrsvc
Member
Posts: 108
Joined: 12.03.10 |
Posted on February 16 2012 07:53 |
|
|
I was referring to any local drives that may be confusing the issue, not the FC drives. |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 16 2012 08:29 |
|
|
I will try changing the ALLOCLASS today. Meanwhile, my hardware setup is as below.
2 x RX2660 Integrity servers with OpenVMS 8.4, booting from an FC SAN disk (Netapp).
Both nodes have three disks each: one LUN 0, one FC boot disk, and one shared FC LUN used for quorum.
I have not installed DECnet. The cluster is created over IP interconnect. |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 17 2012 01:58 |
|
|
Even after changing the ALLOCLASS, the issue persists. Also, the quorum disk is listed on both nodes as $1$DGA30.
Anything else I could be missing? |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
malmberg
Moderator
Posts: 530
Joined: 15.04.08 |
Posted on February 17 2012 03:22 |
|
|
There should be some cluster status messages on the console of the hung node that describe the state of the quorum.
My guess is that for some reason the hung system cannot access the quorum disk. This could be due to several issues.
1. Fibre interconnect not set up correctly for Fabric.
2. Bad Fibre hardware somewhere.
3. Fibre disk controller not set to VMS mode for the disk or the host. Some Fibre controllers need you to set the host type on them; otherwise they may default to an OS type that is not compatible with VMS. In some cases the mismatch will work for a single node, but not for cluster access.
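As a quick sanity check (assuming the quorum disk really is $1$DGA30), something like the following on each node should show whether it can see and reach the disk, and what the quorum state looks like:
$ SHOW DEVICE/FULL $1$DGA30
$ ANALYZE/SYSTEM
SDA> SHOW CLUSTER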
I have no experience with using Netapp SAN devices with VMS.
|
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
Bruce Claremont
Member
Posts: 623
Joined: 07.01.10 |
Posted on February 17 2012 04:43 |
|
|
To provide a more comprehensive view of your hardware, run VMS_INFO on each system and post the results.
http://www.migrationspecialties.com/VMS_INFO.html |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 17 2012 08:45 |
|
|
The console messages of the node that hangs are pasted below. Meanwhile, I have collected the VMS_INFO output from both servers. I will upload it to a web location and paste the link soon.
Details:
----------------------------------
HP OpenVMS Industry Standard 64 Operating System, Version V8.4
© Copyright 1976-2010 Hewlett-Packard Development Company, L.P.
PGQBT-I-INIT-UNIT, boot driver, PCI device ID 0x2532, FW 4.04.04
PGQBT-I-BUILT, version X-33, built on Sep 15 2010 @ 15:12:50
PGQBT-I-LINK_WAIT, waiting for link to come up
PGQBT-I-TOPO_WAIT, waiting for topology ID
%DECnet-I-LOADED, network base image loaded, version = 05.17.00
%DECnet-W-NOOPEN, could not open SYS$SYSROOT:[SYSEXE]NET$CONFIG.DAT
%VMScluster-I-LOADIPCICFG, loading IP cluster configuration files
%VMScluster-S-LOADEDIPCICFG, Successfully loaded IP cluster configuration files
%CNXMAN, Using remote access method for quorum disk
%SMP-I-CPUTRN, CPU #1 has joined the active set.
%SMP-I-CPUTRN, CPU #3 has joined the active set.
%SMP-I-CPUTRN, CPU #2 has joined the active set.
%VMScluster-I-LOADSECDB, loading
the cluster security database
%EWA0, Auto-negotiation mode assumed set by console
%EWA0, Merl5704 located in 64-bit, 66-mhz PCI-X slot
%EWA0, Device type is BCM5704C (UTP) Rev B0 (21000000)
%EWB0, Auto-negotiation mode assumed set by console
%EWB0, Merl5704 located in 64-bit, 66-mhz PCI-X slot
%EWB0, Device type is BCM5704C (UTP) Rev B0 (21000000)
%EWA0, Link up: 1000 mbit, full duplex, flow control (txrx)
%EWB0, Link up: 1000 mbit, full duplex, flow control (txrx)
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%CNXMAN, Sending VMScluster membership request to system I64029
%CNXMAN, Now a VMScluster member -- system I64030
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%PEA0, Configuration data for IP cluster found
%PEA0, Cluster communication enabled on IP interface, WE0
%PEA0, Successfully initialized with TCP/IP services
%PEA0, Remote node Address, 10.73.64.29, added to unicast list of IP bus, WE0
%PEA0, Hello sent on IP bus WE0
%PEA0, Cluster communication successfully initialized on IP interface , WE0
--------------------------------- |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 19 2012 03:38 |
|
|
The links to the VMS_INFO output from the two nodes of the cluster are pasted below. The Node 2 VMS_INFO was taken after disabling the quorum disk, since the node hangs with the quorum disk enabled.
https://docs.google.com/document/d/1QPiLnLXkbM0lzGn6bMWB0K30AAOVbLjIqX39RQqsm2U/edit
https://docs.google.com/document/d/1SAMqDzWlo_tSgAdakQ37OBrRgs3lk6TAZ7AV8LkvrX0/edit |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
Bruce Claremont
Member
Posts: 623
Joined: 07.01.10 |
Posted on February 20 2012 05:03 |
|
|
In MODPARAMS.DAT, DISK_QUORUM is not set consistently. If you are AUTOGEN'ing, this would cause problems. I suggest cleaning up MODPARAMS.DAT so there is only one occurrence of each parameter, to avoid confusion. I like SORT/NODUP when doing this. |
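Roughly like this (a sketch; review the sorted file before putting it back, since SORT will also reorder the comment lines):
$ SORT/NODUPLICATES SYS$SYSTEM:MODPARAMS.DAT SYS$SYSTEM:MODPARAMS.SORTED
$ TYPE SYS$SYSTEM:MODPARAMS.SORTED   ! check it, then copy it over the original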
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 23 2012 06:06 |
|
|
I did a cleanup of MODPARAMS.DAT and then saw that adding the quorum disk from CLUSTER_CONFIG.COM does not modify EXPECTED_VOTES. I manually changed it from 1 to 2 after adding the quorum disk, and now both my systems boot up with the quorum disk enabled.
However, I still find that when one of the nodes is rebooted, the other node loses quorum (even though it should have one vote contributed by the node itself and one vote by the quorum disk).
Anything else that I could be missing here? |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
abrsvc
Member
Posts: 108
Joined: 12.03.10 |
Posted on February 24 2012 01:04 |
|
|
Try setting the VOTES value for each node to 2, with the quorum disk set to contribute 1 vote. The calculation should end up requiring 3 votes for a valid cluster. I seem to recall that I set up a client's cluster this way with no problems.
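If I have the quorum arithmetic right (quorum is roughly (EXPECTED_VOTES + 2) / 2 with integer division, and EXPECTED_VOTES should be the sum of all the VOTES plus QDSKVOTES), it works out like this:
  VOTES: 2 (node 1) + 2 (node 2) + 1 (quorum disk) = EXPECTED_VOTES of 5
  Quorum: (5 + 2) / 2 = 3
  One node down: 2 (surviving node) + 1 (quorum disk) = 3, so quorum is still held.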
Dan |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on February 25 2012 00:58 |
|
|
I tried increasing the votes for the nodes to 2 and set the expected votes to 3. After an AUTOGEN and reboot, one of the nodes again hangs (blocking user login) even though it shows as part of the cluster. Pasted below is the output from the System Dump Analyzer (SDA) and MODPARAMS.DAT.
System Dump Analyzer
======================
SDA> show cluster
VMScluster data structures
--------------------------
--- VMScluster Summary ---
Quorum Votes Quorum Disk Votes Status Summary
------ ----- ----------------- --------------
3 4 N/A quorum
--- CSB list ---
Address Node CSID ExpVotes Votes State Status
------- ---- ---- -------- ----- ----- ------
891AC2C0 I64029 00010015 3 2 local member,qf_same
8954D5C0 I64030 00010013 3 2 open member,qf_same
--- Cluster Block (CLU 89082800 ---
Flags: 10082001 cluster,tdf_valid,init,quorum
Quorum/Votes 3/4 Last transaction code 02
Quorum Disk Votes 1 Last trans. number 41
Nodes 2 Last coordinator CSID 00010013
Quorum Disk $1$DGA30 Last time stamp 25-FEB-2012
Found Node SYSID 000000000014 11:49:12
Founding Time 24-FEB-2012 Largest trans. id 00000029
13:38:23 Resource Alloc. retry 0
Index of next CSID 0016 Figure of Merit 00000000
Quorum Disk Cntrl Block 89082FC0 Member State Seq. Num 0027
Timer Entry Address 00000000 Foreign Cluster 00000000
CSP Queue empty
--- Cluster Failover Control Block (CLUFC 89082960 ---
Flags: 00000000
Failover Step Index 0000003A CSB of Synchr. System 8954D5C0
Failover Instance ID 00000029
--- Cluster Quorum Disk Control Block (CLUDC 89082FC0 ---
State : 0001 qs_rem_ina
Flags : 0000
CSP Flags : 0001 csp_ack
Iteration Counter 0 UCB address 00000000
Activity Counter 0 TQE address 89083240
Quorum file LBN 00000000 IRP address 89082D80
Watcher CSID 00000000
--- I64029 Cluster System Block (CS 891AC2C0 ---
State: 0B local
Flags: 070A000A member,selected,send_ext_status,local,status_rcvd,send_status
qf_same
Cpblty: 00003BF2 vcc,cwcreprc,threads,cwlogicals,ipc_demult_conn,rmbxfr,wbm_shad
ow,sched_class,wbm_amcvp,wbm_type
SWVers: V8.4 LNM Seqnum: 0000000100000004
HWName: HP rx2660 (1.67GHz/9.0M
ExpVotes/Votes 3/2 Last seqnum used 0000 Sent queue 00000000
Quor. Disk Vote 1 Last seqnum ackd 0000 Resend queue 00000000
CSID 00010015 Last rcvd seqnum 0000 Block xfer Q. 891AC448
Protocol/ECO 30/0 Unacked msgs/lim 0/0 CDT address 00000000
Reconn. time 00000000 Lock Rem/Dir wt 5/0 PDT address 00000000
Ref. count 2 Forked msgs/lim 0/300 TQE address 00000000
Ref. time 25-FEB-2012 Incarnation 25-FEB-2012 SB address 88217D80
11:49:12 11:48:56 Current CDRP 00000001
--- I64030 Cluster System Block (CS 8954D5C0 ---
State: 01 open
Flags: 0202010A member,cluster,selected,status_rcvd
qf_same
Cpblty: 00003BBA vcc,ext_status,cwcreprc,threads,ipc_demult_conn,rmbxfr,wbm_shad
ow,sched_class,wbm_amcvp,wbm_type
SWVers: V8.4 LNM Seqnum: 0000000000000000
HWName: HP rx2660 (1.67GHz/9.0M
ExpVotes/Votes 3/2 Last seqnum used 1FE8 Sent queue 89779180
Quor. Disk Vote 1 Last seqnum ackd 1FE7 Resend queue 00000000
CSID 00010013 Last rcvd seqnum 1206 Block xfer Q. 8954D748
Protocol/ECO 30/0 Unacked msgs/lim 0/25 CDT address 8954D300
Reconn. time 00000000 Lock Rem/Dir wt 5/0 PDT address 893E35B8
Ref. count 2 Forked msgs/lim 0/300 TQE address 00000000
Ref. time 25-FEB-2012 Incarnation 25-FEB-2012 SB address 8954A540
11:24:38 11:24:23 Current CDRP 00000000
SDA> exit
$
==============================
Modparams.dat
==============================
$ type sys$system:modparams.dat
!
! CLUSTER_CONFIG_LAN appending for ADD operation on 14-FEB-2012 12:16:18.47
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 14-FEB-2012 16:30:35.35
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 14-FEB-2012 18:02:33.16
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 17-FEB-2012 19:31:19.43
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 20-FEB-2012 14:37:13.07
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 23-FEB-2012 06:05:12.49
! CLUSTER_CONFIG_LAN appending for CHANGE operation on 23-FEB-2012 06:43:03.35
! CLUSTER_CONFIG_LAN end
! Created during installation of OpenVMS AXP V8.4 14-FEB-2012 09:30:37.62
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
AGEN$INCLUDE_PARAMS SYS$MANAGER:AGEN$NEW_NODE_DEFAULTS.DAT
ALLOCLASS=10
BOOTNODE="N"
DISK_QUORUM="$1$DGA30"
EXPECTED_VOTES=3
INTERCONNECT="NI"
MSCP_LOAD=1
MSCP_SERVE_ALL=2
NISCS_LOAD_PEA0=1
NISCS_USE_UDP=1
QDSKVOTES=1
SCSNODE="I64029"
SCSSYSTEMID=10
VAXCLUSTER=2
VOTES=2
$ |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
abrsvc
Member
Posts: 108
Joined: 12.03.10 |
Posted on February 27 2012 01:32 |
|
|
I have set up a two-node cluster for DR with the following parameters. Please note that in my case there is no quorum disk. The cluster survives transitions without problems when the secondary (DR machine) fails, and requires intervention when the primary fails (expected). This is due to external hardware changes required for the secondary to function as the primary. Your situation is a little different, but this scenario may prove helpful in understanding your circumstances.
The machines are set up as follows:
Machine A: Votes = 2, Expected = 1
Machine B: Votes = 1, Expected = 1
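If the same quorum arithmetic applies here (this is my reading of the dynamic quorum adjustment, so treat it as a sketch), the running quorum adjusts up to (3 + 2) / 2 = 2 once both members are up. Losing machine B (1 vote) leaves A with 2 votes and the cluster stays up; losing machine A (2 votes) leaves B with 1 vote, below quorum, which matches the intervention case described above.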
Hope this helps a bit...
Dan |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
sajithsathian
Member
Posts: 12
Location: India
Joined: 14.02.12 |
Posted on March 01 2012 06:08 |
|
|
I added one more node to the cluster, removed the quorum disk, set 3 votes for each server and expected votes to 9, and now my cluster is working fine.
Thanks for all your help, guys.
Still, I will try to see why my quorum disk configuration did not work. Once I finish the current configuration, I will move back to the two-node cluster with the quorum disk and try again.
Thanks and Regards,
Sajith Sathiadevan |
|
Author |
RE: Need help on OpenVMS 8.4 Integrity systems cluster with IP interconnect and FC Quorum Disk |
malmberg
Moderator
Posts: 530
Joined: 15.04.08 |
Posted on March 01 2012 16:56 |
|
|
The symptoms indicate that only one host was allowed to connect to the quorum disk at a time.
The typical cause of this with Fibre-SCSI is that either the disk controller does not support VMS concurrent access, or the controller has not been put in VMS mode.
|
|