JondZ: Stress testing drbd online verification

Friday, October 06, 2017

Stress testing drbd online verification

DRBD is really nice. It is a practical skill to have, and one I would use personally, at home or in production at the office: being able to string together two computers with a network cable and replicate disks from one to the other automatically.

I just stress tested the "online verification" procedure. Basically, I wanted to see how I would formulate a recovery procedure for a corrupted disk. In summary, this is what I did:

1. Configure checksum method for online verification.
2. Perform online verification to compare disks. This is as simple as typing out "drbdadm verify <resource>" and watching the logs in /var/log/messages.
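For step 1, a minimal sketch of what the checksum setting might look like in the resource definition. This assumes DRBD 8.4-style configuration, where verify-alg sits in the net section (older 8.3 setups put it in the syncer section), and sha1 is only an example digest; any digest the kernel's crypto API provides should do.

```
resource r0 {
  net {
    verify-alg sha1;   # digest used by "drbdadm verify" (example choice)
  }
  # ... existing device, disk, and "on <host>" sections stay as they are ...
}
```

After changing the file on both nodes, "drbdadm adjust r0" should apply the new setting without restarting the resource.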

STRESS TEST. To make sure I could recover from a failed disk, I tested this scenario:

3. On node1, stop drbd (systemctl stop drbd)
4. Force a disk corruption, for example dd if=/dev/zero of=/dev/vdb1
5. Start drbd (systemctl start drbd)
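Step 4 as written zeroes the whole partition. If the resource uses internal metadata, which DRBD keeps at the end of the backing device, that would presumably wipe the metadata too, so a bounded overwrite may be the gentler test. Here is a rough rehearsal of a bounded dd on a scratch file rather than a real device. The scratch file is a hypothetical stand-in for /dev/vdb1 and the sizes are arbitrary.

```shell
#!/bin/sh
# Rehearse a bounded overwrite on a scratch file (NOT a real device).
scratch=$(mktemp)    # hypothetical stand-in for the backing device

# Fill 4 MiB with random data and remember its checksum.
dd if=/dev/urandom of="$scratch" bs=4096 count=1024 2>/dev/null
before=$(cksum < "$scratch")

# Zero only the first 1 MiB; conv=notrunc leaves the rest of the file intact.
dd if=/dev/zero of="$scratch" bs=4096 count=256 conv=notrunc 2>/dev/null
after=$(cksum < "$scratch")

size=$(wc -c < "$scratch" | tr -d ' ')
rm -f "$scratch"

if [ "$before" != "$after" ]; then
    result="corrupted"
else
    result="unchanged"
fi
echo "$result size=$size"
```

On a real node the same bs/count/conv=notrunc arguments would go on the dd in step 4, with drbd stopped first as in step 3.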

RECOVERY PROCEDURE: Here is what I came up with as a procedure.

6. drbdadm verify r0 # r0 is the resource name

At this point I would notice the disk corruption in /var/log/messages.
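Instead of eyeballing the log, something like the following could count the relevant lines. This is only a sketch: it assumes the kernel messages for mismatched blocks contain the phrase "Out of sync", and count_out_of_sync is a name I made up for illustration.

```shell
#!/bin/sh
# Count log lines that report out-of-sync blocks after a verify run.
# Assumption: such lines contain the phrase "Out of sync".
count_out_of_sync() {
    logfile=${1:-/var/log/messages}   # default to the log named in the post
    grep -c 'Out of sync' "$logfile"
}
```

Calling count_out_of_sync with no argument reads /var/log/messages; a nonzero count after "drbdadm verify r0" finishes would suggest the disks differ.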

7. On the "bad" node:

drbdadm secondary r0
drbdadm invalidate r0

This is a summary of the procedure (I may have forgotten a minor detail or two). After the "invalidate" command, the disk should sync again from the peer. Just make sure that the correct disk on the correct node is identified and invalidated.
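Since invalidating the wrong node would throw away the good copy, a dry-run wrapper around step 7 might be worth having. The sketch below is my own illustration, not a DRBD tool: RES, DRY_RUN, run, and recover_bad_node are made-up names, and by default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of step 7, meant for the node whose disk is bad.
# DRY_RUN=1 (the default here) only prints the commands.
RES=${RES:-r0}            # resource name, r0 as in the post
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_bad_node() {
    run drbdadm secondary "$RES"     # demote this node first
    run drbdadm invalidate "$RES"    # discard local data and resync from the peer
}

recover_bad_node
```

Running it as-is prints the two drbdadm commands; setting DRY_RUN=0 would execute them, so that is the point to double-check the hostname.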

-------
JondZ
