On leave after major dental surgery, I am currently learning and testing these four DRBD/LVM combinations and thinking about which one I would use on a real production setup.
1. DRBD over plain device
2. DRBD over LVM
3. LVM over DRBD
4. LVM over DRBD over LVM
1. DRBD over plain device. This puts actual device names such as sdb1 in the DRBD configuration, which I don't like. There are ways around this, such as using multipath or /dev/disk/by-id. I haven't tested those with DRBD yet, but the point is that the actual device names sit in the configuration files, and they had better still agree with the real devices (after years of uptime and a changeover of sysadmins :).
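For illustration, the disk line could point at a stable by-id path instead of a raw name (a sketch only; the id string below is made up, not from my machines):
disk /dev/disk/by-id/ata-EXAMPLE_DISK_S1234567-part1;   # hypothetical by-id path instead of /dev/sdb1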
2. DRBD over LVM. This puts an abstraction layer at the lowest level of the stack and avoids having to place actual device names in DRBD resource files. For example:
/etc/drbd.d/some-resource.res
resource __ {
    ...
    ...
    device /dev/drbd0;
    disk /dev/vg/lvdisk0;
    ...
}
There you go, no /dev/sdb1 or whatever in the disk configuration. This avoids problems arising from devices switching device names on reboot.
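The backing LV itself is created on the physical disk in the usual way, something like this (a sketch; the physical device name and size are just examples):
pvcreate /dev/sdb1                # example physical device
vgcreate vg /dev/sdb1
lvcreate -L 10G -n lvdisk0 vg     # matches the disk line in the resource file above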
3. LVM over DRBD
As the name implies, this puts the provisioning flexibility above the DRBD layer, closer to the application. It would make typical provisioning tasks such as disk allocation, destruction, extension and shrinking much easier. However, I still do not like writing device names in the config files...
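In this arrangement the DRBD device itself becomes the physical volume, roughly like this (a sketch; the VG and LV names are made up):
pvcreate /dev/drbd0
vgcreate vgdrbd /dev/drbd0            # hypothetical VG on top of the replicated device
lvcreate -L 1G -n data0 vgdrbd        # hypothetical LV handed to the application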
4. LVM over DRBD over LVM.
LVM over DRBD over LVM is probably the most flexible solution. There are no actual device names in the DRBD configuration, and LVM is very resilient across machine restarts because it auto-detects its metadata in whatever order the physical disks come up. With this combination I can rearrange the physical backing storage and at the same time have the flexibility of LVM on the upper layer. The only issue is having to adjust some things in /etc/lvm/lvm.conf.
in /etc/lvm/lvm.conf
# filter example --
# /dev/vd* on the physical layer,
# /dev/drbd* on the drbd layer
filter = [ "a|^/dev/vd.*|", "a|drbd.*|", "r|.*/|" ]
write_cache_state = 0
use_lvmetad = 0
Just a few lines of config. This is fine. The problem is having to remember what all these configuration lines mean after 2 years of uptime...
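A quick sanity check after editing the filter is to list what LVM actually sees (pvs is standard; the column selection is just an example):
pvs -o pv_name,vg_name     # should list only the /dev/vd* and /dev/drbd* devices the filter allows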
---
JondZ 20171015
Wednesday, October 11, 2017
reducing lvm drbd disk size
Here is a snippet of my notes for reducing drbd disk size (assuming that the physical device is on LVM which can be resized).
Just remember that a drbd device is a container and has its own metadata, so think of it like a filesystem. Also, this procedure will only work if the disks are ONLINE (the disks are attached and drbd is running).
In this example, a filesystem holds only 100 MB worth of data; we want to shrink the physical store down from 500 MB to about 120 MB.
WARNING: This procedure can be destructive if done wrong.
1. Note the filesystem's consumed size. For this example the filesystem contains 100 MB worth of data. Shrink the filesystem; note that -M resizes to the minimum size.
umount /dev/drbd0
fsck -f /dev/drbd0
resize2fs -M /dev/drbd0
At this point the filesystem on /dev/drbd0 should be at the minimum (i.e., close to the consumed size---about 100 MB in this example). If you are not sure, mount the filesystem again and use "df", or use tune2fs (if ext4), to make sure.
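For ext4, something like this shows the current size (block count times block size; the grep pattern is just illustrative):
tune2fs -l /dev/drbd0 | grep -Ei 'block count|block size'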
2. Resize the drbd device. Make sure it is larger than the filesystem size because drbd also uses disk space for metadata!
drbdadm -- --size=110M resize r0
If you type "lsblk" at this point, drbd0 should show about 110M.
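For example (exact numbers will vary a little because of metadata):
lsblk /dev/drbd0     # the SIZE column should now read roughly 110M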
3. Shrink the physical backing device to a bit larger than the drbd device:
on the first node (drb7): lvresize -L 120M /dev/drt7/disk1
on the second node (drb8): lvresize -L 120M /dev/drt8/disk1
4. Size up the drbd device to use up all available LV space:
drbdadm resize r0
5. Finally, size up the filesystem:
resize2fs /dev/drbd0
6. Mount and verify that the filesystem is indeed about 120 MB.
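For instance (the mount point here is just an example):
mount /dev/drbd0 /mnt/tmp
df -h /mnt/tmp       # Size should now be close to 120M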
Friday, October 06, 2017
Stress testing drbd online verification
DRBD is so nice. It is a genuinely useful skill to have, at home or in office production: being able to string together two computers with a network cable and have their disks replicate from one to the other automatically.
I just stress tested the "online verification" procedure. Basically I wanted to see how I would formulate a recovery procedure for a corrupted disk. In summary this is what I did---
1. Configure a checksum method for online verification (a sample resource snippet is shown after step 2).
2. Perform online verification to compare disks. This is as simple as typing out "drbdadm verify <resource>" and watching the logs in /var/log/messages.
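The checksum method from step 1 goes in the net section of the resource file, roughly like this (sha1 is just one common choice):
net {
    verify-alg sha1;
}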
STRESS TEST. To make sure I could recover from a failed disk, I tested this scenario:
3. On node1, stop drbd (systemctl stop drbd)
4. Force a disk corruption, for example dd if=/dev/zero of=/dev/vdb1
5. Start drbd (systemctl start drbd)
RECOVERY PROCEDURE: Here is what I came up with.
6. drbdadm verify r0 # r0 is the resource name
At this point I would notice the disk corruption in /var/log/messages.
7. On the "bad" node:
drbdadm secondary r0
drbdadm invalidate r0
That is the summary of the procedure (perhaps with some minor detail I forgot about). After the "invalidate" instruction the disk should sync again. Just make sure that the correct disk on the correct node is identified and invalidated.
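To watch the resync afterwards, something like this works (the exact field names are from memory, so treat them as approximate):
drbdadm status r0     # the invalidated node should show replication:SyncTarget and a done: percentage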
-------
JondZ
Thursday, October 05, 2017
drbd diskless mode
I am still scratching my head over this one--that it is actually possible. Sure, I have run diskless setups like iSCSI with special hardware cards before, but drbd?
I detached the disk and then made the resource primary. So basically the node without a disk is talking to a node with a disk, pretending that the disk is local.
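From memory, the commands were roughly these two (a sketch, using the r0 resource from the earlier posts):
drbdadm detach r0      # drop the local backing disk
drbdadm primary r0     # the resource stays usable, now diskless
Here is the resulting status: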
[root@drb6 tmp]# drbdadm status
r0 role:Primary
disk:Diskless
drb5 role:Secondary
peer-disk:UpToDate
[root@drb6 tmp]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/drbd0 1014612 33860 964368 4% /mnt/tmp
On this node there is no local backing store behind drbd0, yet drbd0 is for all practical purposes a normal block device.
That is amazing...
jondz
Wednesday, October 04, 2017
My first DRBD cluster test
Here is my first cluster. It took me the WHOLE MORNING to figure out (I misunderstood the meaning of the "clone-node-max" property). Anyhow this is a 4-node active/passive drbd storage cluster.
In this example, only the Primary (pcs "Master") can use the block device at any one time. The nodes behave correctly in that they are promoted/demoted as expected when they leave/enter the cluster.
I will have to re-do this entire thing from scratch to make sure I can do it again and keep notes (so many things to remember!). I will also enable some service here to use the block device: maybe an nfs or LIO iSCSI server or something.
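When I get to that, a Filesystem resource tied to the DRBD master would probably look something like this (untested sketch; fs0 and /mnt/tmp are made-up names):
pcs resource create fs0 ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/mnt/tmp fstype=ext4
pcs constraint colocation add fs0 with master block0drbms INFINITY
pcs constraint order promote block0drbms then start fs0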
Here are my raw notes and a sample "pcs status" output:
------------RAW NOTES-- SORT IT OUT LATER ----------
pcs resource create block0drb ocf:linbit:drbd drbd_resource=r0
pcs resource master block0drbms block0drb master-max=1 master-node-max=1 clone-max=4
# pcs resource update block0drbms clone-node-max=3 THIS IS WRONG--SHOULD BE 1 BECAUSE ONLY 1 CLONE SHOULD RUN ON EACH NODE (see below later)
pcs resource update block0drbms meta target-role='Started'
pcs resource update block0drbms notify=true
[root@drb3 cores]# systemctl disable drbd
Removed symlink /etc/systemd/system/multi-user.target.wants/drbd.service.
pcs resource update block0drb meta target-role="Started"
pcs resource update block0drb drbdconf="/etc/drbd.conf"
pcs property set stonith-enabled=false
pcs resource update block0drbms clone-node-max=1
pcs resource enable block0drbms
Also (info from the web), fix wrong permissions if needed:
--- chmod 777 some file in /var if needed ----
chmod 777 /var/lib/pacemaker/cores
---------- EXAMPLE PCS STATUS OUTPUT --------
[root@drb3 ~]# pcs status
Cluster name: drbdemo
Stack: corosync
Current DC: drb2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Wed Oct 4 11:38:16 2017
Last change: Wed Oct 4 11:30:36 2017 by root via cibadmin on drb4
4 nodes configured
4 resources configured
Online: [ drb1 drb2 drb3 drb4 ]
Full list of resources:
Master/Slave Set: block0drbms [block0drb]
Masters: [ drb2 ]
Slaves: [ drb1 drb3 drb4 ]
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@drb3 ~]#