ODS (Online DiskSuite)
Sun's volume manager has many names
ODS is a disk storage management solution, which offers
Raid Levels
The disk management software offers the common raid levels
raid 0 (Striping) | Data is striped across a number of disks, which together give the appearance of one very large disk. There is no redundancy, so the loss of one disk loses the whole device. |
raid 1 (Mirroring) | A single disk is mirrored by another disk. If one disk fails the system is unaffected, as it can use the mirror. |
raid 5 | Raid stands for Redundant Array of Inexpensive Disks. Data is striped with parity across 3 or more disks. If one of the disks fails, the data on the failed disk is reconstructed from the parity information. |
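To show how parity can rebuild a failed disk, here is a toy sketch in plain shell arithmetic. It assumes the parity is a simple XOR of the data blocks (which is how raid 5 parity works in general); the three "blocks" are just made-up integers, not anything ODS exposes.

```shell
#!/bin/sh
# Toy model of raid 5 parity: three data "blocks" (small integers here)
# and one parity block computed as the XOR of the data blocks.
d1=173
d2=58
d3=201

parity=$(( d1 ^ d2 ^ d3 ))

# Pretend the disk holding d2 has failed: XOR the surviving blocks with
# the parity block to get the lost block back.
rebuilt_d2=$(( d1 ^ d3 ^ parity ))

echo "parity=$parity original_d2=$d2 rebuilt_d2=$rebuilt_d2"
```

The same trick works whichever single block is lost, which is why raid 5 survives one disk failure but not two.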
Metadevice and Metadevice Database
A metadevice is a name for a group of physical slices that appear as a single logical device (virtual device). The default maximum number of metadevices is 128, but this can be adjusted by editing /kernel/drv/md.conf and changing the nmd parameter (1024 maximum).
A metadevice database (otherwise known as the state database) stores information about the ODS configuration and tracks changes made to it; this database is what makes the ODS configuration persistent across reboots. The database has multiple copies known as replicas (a minimum of 3 is required), which ensures that the database is always valid. You should spread the replicas across different disks in case a disk fails, thus reducing single points of failure. The database is never more than 10MB and is generally stored on a single slice of each disk.
ODS uses a majority consensus algorithm to determine whether a replica is corrupted. When changes are made, each replica is updated in turn in case a power failure happens during the update, so when the system starts, the state held by the majority of replicas is used. The algorithm guarantees the following:
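As I understand the Sun documentation, the rules come down to simple arithmetic on the number of replicas that are still available: the system stays up with at least half of them, panics with fewer than half, and will only reboot into multiuser mode with a majority (half + 1). The sketch below just encodes that arithmetic; the function name and the three state labels are my own and are not ODS output.

```shell
#!/bin/sh
# replica_state TOTAL AVAILABLE
# Prints what the majority consensus rules imply:
#   panic    - fewer than half of the replicas are available
#   running  - at least half are available, so the system stays up,
#              but there is no majority, so a reboot would not reach
#              multiuser mode
#   boot-ok  - a majority (half + 1) is available
replica_state() {
    total=$1
    avail=$2
    half_plus_one=$(( total / 2 + 1 ))
    if [ $(( avail * 2 )) -lt "$total" ]; then
        echo panic
    elif [ "$avail" -ge "$half_plus_one" ]; then
        echo boot-ok
    else
        echo running
    fi
}

replica_state 3 2   # one of three replicas lost
replica_state 4 2   # two disks with two replicas each, one disk lost
replica_state 4 1
```

The second case is why replicas on just two disks are awkward: losing either disk leaves exactly half, which keeps the system running but is not enough for a clean reboot.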
Hot Spares
ODS uses a hot spare pool, which is a collection of disk slices reserved by ODS that will automatically be used when a disk slice fails. Hot spares provide increased data protection, however I have very rarely used them as I normally replace a failed disk pretty quickly. See the Sun Documentation for detailed information on hot spares.
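For completeness, this is roughly what setting up a hot spare pool looks like. The sketch only prints the commands so they can be reviewed before being run on a real system; the pool name hsp001, the sub-mirror d11 and the slice names are example values I have made up, and the function name is my own.

```shell
#!/bin/sh
# hotspare_plan POOL SUBMIRROR SLICE...
# Prints (does not run) the commands to create a hot spare pool and
# associate it with a sub-mirror.
hotspare_plan() {
    pool=$1
    submirror=$2
    shift 2
    echo "metainit $pool $*"               # create the pool from the spare slices
    echo "metaparam -h $pool $submirror"   # associate the pool with the sub-mirror
    echo "metahs -i $pool"                 # display the status of the pool
}

hotspare_plan hsp001 d11 c2t1d0s0 c3t1d0s0
```

Hot spares only make sense for sub-mirrors and raid 5 metadevices, since a plain stripe or concatenation has no redundant data to rebuild the spare from.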
Growing/Shrinking Filesystem
Expanding a filesystem under ODS is possible, although not without problems. Shrinking a filesystem under ODS is not possible; normally you create a new, smaller filesystem, copy the data across, then cut over to the new filesystem.
This is one area where the Veritas volume manager excels, as it is very easy to grow and shrink a filesystem.
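A minimal sketch of growing a UFS filesystem that sits on a concatenated metadevice: attach another slice, then grow the filesystem into the new space. The commands are printed rather than executed; d0, c3t0d0s0 and /export are example values and the function name is my own.

```shell
#!/bin/sh
# grow_plan METADEVICE NEW_SLICE MOUNTPOINT
# Prints (does not run) the commands to add a slice to a metadevice and
# grow the mounted UFS filesystem on it.
grow_plan() {
    md=$1
    slice=$2
    mountpoint=$3
    echo "metattach $md $slice"                     # add the slice to the metadevice
    echo "growfs -M $mountpoint /dev/md/rdsk/$md"   # grow the mounted filesystem
    echo "df -k $mountpoint"                        # confirm the new size
}

grow_plan d0 c3t0d0s0 /export
```

For a mirror I would expect to attach a slice to each sub-mirror before running growfs against the mirror itself, but check the Sun Documentation before trying that on live data.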
Filesystem Logging
ODS uses translogs to log changes made to the filesystem; if the system crashes, the log is replayed, thus avoiding an fsck (which can take a long time depending on the size of the filesystem). However, newer versions of Solaris offer UFS logging; here is a list of advantages/disadvantages of both
ODS logging
My preference is to use UFS logging, and since its introduction in Solaris 7 I have only ever used this.
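UFS logging is switched on with the "logging" mount option, normally in the seventh (mount options) field of /etc/vfstab. As a quick audit, the sketch below reads vfstab-format text and lists the UFS mount points that do not have it set. The function name and the sample entries are my own.

```shell
#!/bin/sh
# ufs_without_logging: read vfstab-format text on stdin and print the
# mount point of every UFS entry whose options do not include "logging".
# vfstab fields: device, fsck device, mount point, FS type, fsck pass,
# mount at boot, mount options.
ufs_without_logging() {
    awk '
        /^#/ || NF < 7 { next }    # skip comments and incomplete lines
        # wrap the options in commas so "nologging" does not match
        $4 == "ufs" && index("," $7 ",", ",logging,") == 0 { print $3 }
    '
}

# Sample input for illustration only.
ufs_without_logging <<'EOF'
#device           device             mount  FS     fsck  mount    mount
/dev/md/dsk/d10   /dev/md/rdsk/d10   /      ufs    1     no       logging
/dev/md/dsk/d20   /dev/md/rdsk/d20   /var   ufs    1     no       -
/dev/md/dsk/d30   /dev/md/rdsk/d30   /opt   ufs    2     yes      nosuid,logging
swap              -                  /tmp   tmpfs  -     yes      -
EOF
```

On a real host you would run it as `ufs_without_logging < /etc/vfstab`.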
Naming Convention
There is no set standard on what you call your metadevices, but I have my own convention and undoubtedly there are many others.
The main metadevice (raid 0, 1 or 5), which is where the filesystem will be placed, will always end in 0, so for example d0, d10, d20, d30, d40, etc
A sub-mirror will end in either a 1 (first sub-mirror) or a 2 (second sub-mirror), so for example d1 and d2, d11 and d12, d21 and d22, etc
A raid slice will end in 1..n (n depends on the number of disks), so for example d1 & d2 & d3, d21 & d22 & d23, etc
So for an example
This is my own preference and you are welcome to have your own naming convention
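The convention is mechanical enough to script. This little sketch prints the component names that go with a main metadevice number; it is only an illustration of the numbering, and the function name is my own.

```shell
#!/bin/sh
# component_names BASE COUNT
# BASE is the number of the main metadevice (it ends in 0); prints the
# names of its COUNT components, d<BASE+1> .. d<BASE+COUNT>.
component_names() {
    base=$1
    count=$2
    i=1
    names=""
    while [ "$i" -le "$count" ]; do
        names="$names d$(( base + i ))"
        i=$(( i + 1 ))
    done
    echo "d$base:$names"
}

component_names 10 2   # a mirror and its two sub-mirrors
component_names 20 3   # a raid 5 device built on three slices
```

Because the main device always ends in 0, the convention leaves room for at most nine components per device.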
File Locations
ODS uses a number of different files, below are the most useful ones:
/kernel/drv/md.conf | This file is the ODS device driver configuration file. The only modifiable field is 'nmd', which represents the number of metadevices supported by the driver. If you change this file you must reboot the system for the changes to take effect. In a configuration that uses a lot of devices I increase this to the maximum of 1024. |
/etc/lvm/mddb.cf | This file keeps track of metadevice state database replicas, each metadevice state database has a unique entry in this file. You can display the file using 'cat' but do not edit it manually. |
/etc/lvm/md.tab | This file is used by the metainit, metadb and metahs commands. Each line contains the rest of the command line for one of those commands. The file can be edited manually or populated from the output of 'metastat -p'. |
/etc/lvm/md.cf | This file is a backup of the configuration in md.tab format and is used for disaster recovery purposes; it is updated automatically. |
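To give a feel for the md.tab format, the sketch below reads md.tab-style lines (the same format 'metastat -p' prints) and reports each mirror with its sub-mirrors. The sample lines are made up for illustration and the function name is my own.

```shell
#!/bin/sh
# list_mirrors: read md.tab-format lines on stdin and print each mirror
# followed by its sub-mirrors. A mirror line looks like
# "d10 -m d11 d12 1"; anything after -m that is not a metadevice name
# (such as a trailing number) is ignored.
list_mirrors() {
    awk '
        /^#/ { next }
        $2 == "-m" {
            parts = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parts = parts " " $i
            print $1 " ->" parts
        }
    '
}

# Sample input for illustration only.
list_mirrors <<'EOF'
# root mirror
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c1t0d0s0
d20 -r c1t0d0s3 c2t0d0s3 c3t0d0s3
EOF
```

On a real host you could feed it live data with `metastat -p | list_mirrors`, and save a copy of the running configuration with `metastat -p > /etc/lvm/md.tab`.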
Meta Commands
I am not going to explain in detail how ODS works but simply supply a list of commands that I use regularly. If you want a more detailed explanation then I suggest you refer to the Sun Documentation.
Metadatabase Commands |
|
Create | metadb -a -f -c 3 c0t0d0s6 c1t0d0s6 c2t0d0s6 (-a = attach metadatabase to device; -f = force, needed when creating the first replicas; -c 3 = 3 replicas on each slice) |
Add | metadb -a -c 3 c3t0d0s6 |
Remove | metadb -d c3t0d0s6 |
Display | metadb -i |
Repairing | # The only way to repair a replica is to delete all the replicas on the device and then recreate them # First confirm that the replicas are corrupted and that you have the device name # Delete the corrupted replicas and reboot # Now recreate them, making sure you have 3 copies |
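The repair steps map onto the metadb commands already listed above. This sketch prints them in order for a given slice so they can be checked before use; the slice c3t0d0s6 is just the example device from the table, the use of 'init 6' for the reboot is my assumption, and the function name is my own.

```shell
#!/bin/sh
# replica_repair_plan SLICE [COPIES]
# Prints (does not run) the commands to delete and recreate the
# replicas on one slice. COPIES defaults to 3.
replica_repair_plan() {
    slice=$1
    copies=${2:-3}
    echo "metadb -i"                     # confirm which replicas are corrupted
    echo "metadb -d $slice"              # delete all replicas on the slice
    echo "init 6"                        # reboot
    echo "metadb -a -c $copies $slice"   # recreate the replicas
    echo "metadb -i"                     # confirm they are healthy
}

replica_repair_plan c3t0d0s6
```

In the 'metadb -i' output, upper-case letters in the flags column indicate a problem with that replica.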
Metadevice Commands |
|
Create Concatenated device | metainit d0 3 1 c1t0d0s0 1 c2t0d0s0 1 c3t0d0s0 (d0 = metadevice name; 3 stripes of 1 slice each, concatenated) |
Create stripe metadevice | metainit d0 1 2 c1t0d0s0 c2t0d0s0 -i 64k (d0 = metadevice name; 1 stripe of 2 slices; -i 64k = interlace size) |
Create Mirror metadevice | # First create two metadevices (these will become sub-mirrors) # Then attach the second sub-mirror, using the metadevice d12 created above, to the mirror d10 # Display the mirrored metadevice and confirm that the mirror has completed its resync operation |
Create Raid 5 metadevice | metainit d10 -r c1t0d0s0 c2t0d0s0 c3t0d0s0 (-r = specify that it's a raid 5 configuration; a raid 5 metadevice needs a minimum of 3 slices) |
Mirroring the root filesystem | # Let's say you want to mirror the main disk which has the following filesystems configured, we will be using # The first step is to make sure the partition information is the same on the new mirror disk (c1t0d0) # Then we want to install the boot block on the new mirror device, this allows you to boot the disk should # Create the first metadevice, which will become a sub-mirror of d10 # Create the second metadevice, which will become the other sub-mirror of d10, we do not need the -f option (force) # At this point we have two metadevices d11 (contains root filesystem) and d12 (the new disk) # We now have to update the /etc/system and /etc/vfstab with the new root metadevice information # Now reboot the server so that the new mirror metadevice is mounted and the kernel parameters for ODS # Once the mirrors are synced you have a root filesystem that is highly available, you can now perform |
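The mirror steps in the table can be sketched as the two command sequences below. Both functions only print the commands for review. I have assumed a SPARC system with root on slice 0, c0t0d0 as the existing root disk, and d10 as the mirror name (to match d11 and d12 under my naming convention); the function names are my own.

```shell
#!/bin/sh
# mirror_plan MIRROR SUB1 SLICE1 SUB2 SLICE2
# Prints (does not run) the commands to build a two-way mirror.
mirror_plan() {
    echo "metainit $2 1 1 $3"   # first sub-mirror
    echo "metainit $4 1 1 $5"   # second sub-mirror
    echo "metainit $1 -m $2"    # one-way mirror on the first sub-mirror
    echo "metattach $1 $4"      # attach the second, the resync starts
    echo "metastat $1"          # confirm the resync has completed
}

# root_mirror_plan ROOT_DISK MIRROR_DISK
# Prints (does not run) the commands to mirror the root filesystem.
root_mirror_plan() {
    src=$1
    dst=$2
    # copy the partition table to the new disk
    echo "prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2"
    # install the boot block on the new disk (SPARC)
    echo 'installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/'"${dst}s0"
    echo "metainit -f d11 1 1 ${src}s0"   # -f because root is mounted
    echo "metainit d12 1 1 ${dst}s0"
    echo "metainit d10 -m d11"
    echo "metaroot d10"                   # updates /etc/system and /etc/vfstab
    echo "lockfs -fa"
    echo "init 6"
    echo "metattach d10 d12"              # after the reboot
    echo "metastat d10"
}

mirror_plan d10 d11 c1t0d0s0 d12 c2t0d0s0
root_mirror_plan c0t0d0 c1t0d0
```

The second sub-mirror is attached only after the reboot, so the system comes up on a one-way mirror first and the resync then copies root onto the new disk.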
Other ODS Commands |
|
Display Metadatabase | metadb -i |
Display Metadevices | metastat |
Display metadevice in md.tab format | metastat -p |
ODS Errors
A list of some of the more common errors of ODS
"no such file or directory error" when trying to configure a metadevice | # Update the nmd parameter in the /kernel/drv/md.conf file and reboot, I normally increase this to its maximum of 1024. |
Metadevice in maintenance state | # Disks do go bad from time to time, however there is a difference between a total disk failure and a partial one # First access the disk via format; if you can, run an analyze on the disk to repair/map out any bad blocks # Then replace the component: metareplace d0 c1t0d0s0 <new device name> # Again confirm that the disk has re-synced |
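There are two flavours of metareplace worth knowing: replacing the failed slice with a different one, and re-enabling the same slice in place (with -e) once the disk has been repaired or physically swapped. The sketch below prints the right one depending on whether a new slice is given; the device names are examples and the function name is my own.

```shell
#!/bin/sh
# replace_plan METADEVICE FAILED_SLICE [NEW_SLICE]
# Prints (does not run) the metareplace command for a failed component.
replace_plan() {
    md=$1
    old=$2
    new=${3:-}
    if [ -z "$new" ] || [ "$new" = "$old" ]; then
        echo "metareplace -e $md $old"     # same slice, repaired or swapped in place
    else
        echo "metareplace $md $old $new"   # move to a different slice
    fi
    echo "metastat $md"                    # confirm the resync
}

replace_plan d0 c1t0d0s0 c2t0d0s0
replace_plan d0 c1t0d0s0
```

If the disk was physically swapped, remember to put the partition table back on it before running the -e form.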