Friday, October 06, 2017

Stress testing DRBD online verification

DRBD is so nice.  It is a really useful skill to have available. I would use it personally, at home or in production at the office.  Being able to string together 2 computers with a network cable and replicate disks from one to the other automatically is a very practical thing in the real world.

I just stress tested the "online verification" procedure.  Basically, I wanted to see how I would formulate a recovery procedure for a corrupted disk.  In summary, this is what I did:

1. Configure checksum method for online verification.
2. Perform online verification to compare disks.  This is as simple as typing out "drbdadm verify <resource>" and watching the logs in /var/log/messages.
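
For step 1, a minimal sketch of what the checksum setting might look like in the resource configuration.  This assumes DRBD 8.4, where verify-alg lives in the net section (in 8.3 it was under syncer), and it assumes sha1 as the algorithm; any digest the kernel supports should work, and the post does not say which one was actually used.  The resource name r0 is the one used later in this post.

```
resource r0 {
  net {
    # checksum used by "drbdadm verify" to compare blocks (assumed: sha1)
    verify-alg sha1;
  }
  # ... existing disk and "on <host>" sections stay as they are ...
}
```

After editing the file on both nodes, "drbdadm adjust r0" should apply the change to the running resource.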

STRESS TEST.  To make sure I would recover from a failed disk I tested out this scenario:

3. On node1, stop DRBD (systemctl stop drbd)
4. Force a disk corruption, for example dd if=/dev/zero of=/dev/vdb1
5. Start DRBD (systemctl start drbd)

RECOVERY PROCEDURE: Here is what I came up with as a procedure.

6. drbdadm verify r0 # r0 is the resource name

At this point I would notice the disk corruption in /var/log/messages.
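
A small sketch of how the log check could be scripted rather than eyeballed.  It assumes the kernel reports mismatched blocks with lines containing "Out of sync", which is the wording DRBD normally uses during a verify run, and that the messages land in /var/log/messages as they did here.  The function name count_out_of_sync is made up for this example.

```shell
#!/bin/sh
# count_out_of_sync [LOGFILE]
# Print how many "Out of sync" lines appear in the log.
# LOGFILE defaults to /var/log/messages (assumption: syslog writes there).
count_out_of_sync() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function from reporting a failure.
    grep -c 'Out of sync' "$logfile" || true
}
```

Running "count_out_of_sync" after the verify finishes and getting anything other than 0 means the two disks differ somewhere.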

7.  On the "bad" node:

drbdadm secondary r0
drbdadm invalidate r0

This is a summary of the procedure (perhaps with some minor details I forgot about).  After the "invalidate" command, the disk should sync again from the good node.  Just make sure that the correct disk on the correct node is identified and invalidated, because invalidating the wrong side throws away the good copy.
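
Step 7 could be wrapped in a small helper so the two commands always run in the same order.  This is only a sketch: the function name invalidate_local_copy and the DRBDADM override variable are invented for illustration (the override just makes it possible to dry-run with "echo").  It must be run on the "bad" node only.

```shell
#!/bin/sh
# invalidate_local_copy RESOURCE
# Demote this node and mark its copy of RESOURCE as invalid so that
# DRBD resyncs it from the peer.  Run ONLY on the node with the bad disk.
# Set DRBDADM=echo to print the commands instead of running them.
invalidate_local_copy() {
    res="${1:-}"
    if [ -z "$res" ]; then
        echo "usage: invalidate_local_copy <resource>" >&2
        return 2
    fi
    # Stop here if the demotion fails (for example, the device is still mounted).
    "${DRBDADM:-drbdadm}" secondary "$res" || return 1
    "${DRBDADM:-drbdadm}" invalidate "$res"
}
```

For example, "DRBDADM=echo; invalidate_local_copy r0" prints the two drbdadm subcommands without touching anything, which is a cheap way to double-check the resource name before doing it for real.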

-------
JondZ
