Sun Cluster 3.1 cheat sheet
 
Daemons

clexecd This daemon is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (such as the cluster shutdown command). The daemon registers with failfastd so that a failfast device driver will panic the kernel if it is killed and not restarted in 30 seconds.
cl_ccrad This daemon provides access from userland management applications to the CCR.
It is automatically restarted if it is stopped.
cl_eventd The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events.
The daemon is automatically respawned if it is killed.
cl_eventlogd The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.
failfastd This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.
rgmd The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.fed This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.pmfd This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
pnmd The public network management daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.
scdpmd The disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the scdpm -p command. It is automatically restarted if it is stopped.
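
A quick way to check that these framework daemons are running on a node is a standard process listing (nothing beyond the base OS tools is assumed here):

# ps -ef | egrep 'clexecd|cl_ccrad|cl_event|failfastd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd'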

File locations

man pages /usr/cluster/man
log files /var/cluster/logs
/var/adm/messages
sccheck logs /var/cluster/sccheck/report.<date>
CCR files /etc/cluster/ccr
Cluster infrastructure file /etc/cluster/ccr/infrastructure
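
The CCR files and the cluster logs can be inspected with standard tools, for example (read-only; the CCR should never be edited by hand):

# more /etc/cluster/ccr/infrastructure
# ls /var/cluster/logs
# tail -f /var/adm/messages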

SCSI Reservations

Display reservation keys

scsi2:
/usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2

scsi3:
/usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner

scsi2:
/usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2

scsi3:
/usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
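
For example, to inspect the SCSI-3 keys and the current reservation owner of quorum device d4 (d4 is only an example; confirm the DID mapping first):

# scdidadm -L | grep d4
# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
# /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2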

Cluster information

Quorum info scstat -q
Cluster components scstat -pv
Resource/Resource group status scstat -g
IP Networking Multipathing scstat -i
Status of all nodes scstat -n
Disk device groups scstat -D
Transport info scstat -W
Detailed resource/resource group scrgadm -pv
Cluster configuration info scconf -p
Installation info (prints packages and version) scinstall -pv
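
As a quick overall health check, scstat run with no options reports the status of all components in one pass; scconf -p can be captured to keep a record of the configuration (the output file name is just a suggestion):

# scstat
# scconf -p > /var/tmp/cluster-config.out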

Cluster Configuration

Integrity check sccheck
Configure the cluster (add nodes, add data services, etc.) scinstall
Cluster configuration utility (quorum, data services, resource groups, etc.) scsetup
Add a node scconf -a -T node=<host>
Remove a node scconf -r -T node=<host>
Prevent new nodes from entering scconf -a -T node=.
Put a node into maintenance state

scconf -c -q node=<node>,maintstate

Note: use the scstat -q command to verify; the vote count for that node should be zero while it is in maintenance state.

Get a node out of maintenance state

scconf -c -q node=<node>,reset

Note: use the scstat -q command to verify; the vote count for that node should be back to one once it is out of maintenance state.
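
A worked example of the round trip, assuming a node named node2 (the node name is only an illustration):

# scconf -c -q node=node2,maintstate
# scstat -q              (node2 should now show zero possible/present votes)
# scconf -c -q node=node2,reset
# scstat -q              (node2 should be back to one vote)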


Admin Quorum Device
 
Quorum votes come from both nodes and quorum disk devices, so the total quorum vote count is the sum of all node votes and quorum device votes.
You can use the menu-driven scsetup utility to add/remove quorum devices, or use the commands below.

Adding a device to the quorum

scconf -a -q globaldev=d11

Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace (see the worked example at the end of this section).

Removing a device from the quorum scconf -r -q globaldev=d11
Remove the last quorum device

Evacuate all nodes

put the cluster into install mode
# scconf -c -q installmode

remove the quorum device
# scconf -r -q globaldev=d11

check the quorum devices
# scstat -q
Resetting quorum info

scconf -c -q reset

Note: this will bring all offline quorum devices online

Bring a quorum device into maintenance mode

obtain the device number
# scdidadm -L
# scconf -c -q globaldev=<device>,maintstate

Bring a quorum device out of maintenance mode scconf -c -q globaldev=<device>,reset
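
Worked example of adding quorum device d11 and handling the scrub error mentioned above (d11 is the example DID device used throughout this section):

# scdidadm -L | grep d11           (confirm the DID device and its paths)
# scconf -a -q globaldev=d11
# scgdevs                          (only needed if scconf reports "unable to scrub device"; retry the scconf command afterwards)
# scstat -q                        (verify the new quorum device and vote counts)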

Device Configuration  
List all the configured devices including paths across all nodes. scdidadm -L
List all the configured devices including paths on the local node only. scdidadm -l
Reconfigure the device database, creating new instance numbers if required. scdidadm -r
Perform the repair procedure for a particular path (use this when a disk gets replaced) scdidadm -R <c0t0d0s0> - by device
scdidadm -R 2 - by device id

Configure the global device namespace scgdevs
Status of all disk paths scdpm -p all:all

Note: the path format is <host>:<disk>
Monitor a device path scdpm -m <node:disk path>
Unmonitor a device path scdpm -u <node:disk path>
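
A short example of the repair flow after physically replacing a disk, assuming the failed disk was c1t3d0 (the disk name is illustrative only):

# scdidadm -l | grep c1t3d0        (find the DID instance that maps to the replaced disk)
# scdidadm -R c1t3d0               (run the repair procedure against the new disk)
# scdpm -p all:all                 (confirm the disk path now reports Ok)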

Disk groups
 
Adding/Registering scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
Removing scconf -r -D name=<disk group>
Adding a single node scconf -a -D type=vxvm,name=appdg,nodelist=<host>
Removing a single node scconf -r -D name=<disk group>,nodelist=<host>
Switch scswitch -z -D <disk group> -h <host>
Put into maintenance mode scswitch -m -D <disk group>
Take out of maintenance mode scswitch -z -D <disk group> -h <host>
Onlining a disk group scswitch -z -D <disk group> -h <host>
Offlining a disk group scswitch -F -D <disk group>
Resync a disk group scconf -c -D name=appdg,sync
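
For example, registering a VxVM disk group called appdg across two nodes and then switching it to node1 (node names are illustrative):

# scconf -a -D type=vxvm,name=appdg,nodelist=node1:node2,preferenced=true
# scstat -D                              (check the device group status)
# scswitch -z -D appdg -h node1          (bring appdg online on node1)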

Transport cable

Enable scconf -c -m endpoint=<host>:qfe1,state=enabled
Disable scconf -c -m endpoint=<host>:qfe1,state=disabled

Note: it gets deleted
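
For example, to check the transport paths before and after disabling an endpoint (node1 and qfe1 are example names):

# scstat -W
# scconf -c -m endpoint=node1:qfe1,state=disabled
# scstat -W              (the path using qfe1 should no longer show as online)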

Resource Groups

Adding

scrgadm -a -g <res_group> -h <host>,<host>

Removing scrgadm -r -g <res_group>
Changing properties scrgadm -c -g <res_group> -y <property=value>
Listing scstat -g
Detailed List scrgadm -pv -g <res_group>
Display mode type (failover or scalable) scrgadm -pv -g <res_group> | grep 'Res Group mode'
Offlining scswitch -F -g <res_group>
Onlining scswitch -Z -g <res_group>
Unmanaging

scswitch -u -g <res_group>

Note: all resources in the group must be disabled first

Managing scswitch -o -g <res_group>
Switching scswitch -z -g <res_group> -h <host>
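
For example, switching a group named rg-app between nodes and taking it offline and back online (rg-app and node2 are example names):

# scswitch -z -g rg-app -h node2         (move rg-app to node2)
# scswitch -F -g rg-app                  (take the group offline)
# scswitch -Z -g rg-app                  (bring the group and its enabled resources back online)
# scstat -g                              (verify the group status)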
 
Resources
 
Adding a failover network resource scrgadm -a -L -g <res_group> -l <logicalhost>
Adding a shared network resource scrgadm -a -S -g <res_group> -l <logicalhost>
Adding a failover apache application and attaching the network resource
scrgadm -a -j apache_res -g <res_group> \
-t SUNW.apache -y Network_resources_used=<logicalhost> \
-y Scalable=False -y Port_list=80/tcp \
-x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource
scrgadm -a -j apache_res -g <res_group> \
-t SUNW.apache -y Network_resources_used=<logicalhost> \
-y Scalable=True -y Port_list=80/tcp \
-x Bin_dir=/usr/apache/bin
Create an HAStoragePlus failover resource
scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
-x FileSystemMountPoints=/oracle/data01 \
-x AffinityOn=true
Removing

scrgadm -r -j res-ip

Note: must disable the resource first

Changing properties scrgadm -c -j <resource> -y <property=value>
List scstat -g
Detailed List scrgadm -pv -j res-ip
scrgadm -pvv -j res-ip
Disable resource monitor scrgadm -n -M -j res-ip
Enable resource monitor scrgadm -e -M -j res-ip
Disabling scswitch -n -j res-ip
Enabling scswitch -e -j res-ip
Clearing a failed resource scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
Find the network of a resource # scrgadm -pvv -j <resource> | grep -i network
Removing a resource and resource group

offline the group
# scswitch -F -g rgroup-1

remove the resource
# scrgadm -r -j res-ip

remove the resource group
# scrgadm -r -g rgroup-1
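
End-to-end example: creating a failover group with a logical host and an apache resource, then bringing it online (rg-web, web-lh and the node names are illustrative; the apache settings mirror the failover example above):

# scrgadm -a -g rg-web -h node1,node2
# scrgadm -a -L -g rg-web -l web-lh
# scrgadm -a -t SUNW.apache                (register the resource type if not already registered)
# scrgadm -a -j apache_res -g rg-web -t SUNW.apache \
  -y Network_resources_used=web-lh -y Scalable=False \
  -y Port_list=80/tcp -x Bin_dir=/usr/apache/bin
# scswitch -Z -g rg-web
# scstat -g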

Resource Types
                     
Adding scrgadm -a -t <resource type>    e.g. SUNW.HAStoragePlus
Deleting scrgadm -r -t <resource type>
Listing scrgadm -pv | grep 'Res Type name'
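
For example, the SUNW.HAStoragePlus type must be registered before the HAStoragePlus resource shown earlier can be created:

# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -pv | grep 'Res Type name'      (confirm the type is now registered)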