Friday, March 31, 2017

Experiment on learning active-active httpd

Not a bad way to spend a Friday afternoon.  Here are my raw notes.  I make these tech notes for myself, and this is not a bad addition:

Fri Mar 31 14:56:32 EDT 2017 LESSON: SIMPLE ACTIVE-ACTIVE HTTPD CLUSTER

This is based on the RedHat manual "Linux 7 High Availability Add On
Administration" except that this follows an active-active setup and assumes
there is a cluster filesystem available.

PACKAGES NEEDED:

wget - needed by pacemaker (for status checks; supposedly "curl" is also
       supported but must then be specified via the ocf client= option)
lynx - OPTIONAL; to test status yourself.
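
Both packages can be installed with yum on RHEL/CentOS 7 (a quick sketch;
these are the standard package names):

        yum install -y wget lynx    # lynx is only for manual status checks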

ASSUMED CONDITIONS:

- httpd is already installed; furthermore it is enabled and running as stock
  via systemd
- there is a clustered gfs filesystem on /volumes/data1 (for common
  html content)
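
A quick sanity check of these assumptions before continuing (a sketch):

        # httpd should report "enabled" and "active" at this point:
        systemctl is-enabled httpd && systemctl is-active httpd
        # the shared gfs2 filesystem should already be mounted:
        mount -t gfs2 | grep /volumes/data1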

PART 1: HTTPD

Set up the document root as desired.  In this example, the html documents
are rooted at /volumes/data1/www, which is common to all nodes.  In the
config file /etc/httpd/conf/httpd.conf:

        DocumentRoot "/volumes/data1/www"
        <Directory "/volumes/data1/www">
            AllowOverride None
            Require all granted
        </Directory>

Put some data in the directory; in this simple example there would be a file
named /volumes/data1/www/index.html:

        <html>
        <body>
        <h1>hello</h1>
        This is a test website from JondZ
        </body>
        </html>

At the end of the config, put the following; this is used by pacemaker to check
status.

        <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
        </Location>

Use lynx to check that it works:

        lynx http://127.0.0.1/server-status
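
If lynx is not handy, curl works just as well for a quick manual check (and,
per the note above, is supposedly also usable by the OCF agent via client=):

        curl -s http://127.0.0.1/server-status | head -20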


When satisfied that things are working, disable httpd activation by systemd;
the service will be managed by pacemaker instead.

        systemctl disable httpd
        systemctl stop httpd

PART 2: LOGROTATE

Edit the file /etc/logrotate.d/httpd and modify the "postrotate" section:

        # This is the old stuff.  Comment this out. 
        # Since httpd is going to be managed
        # by pacemaker, not by systemd, this is no longer valid:
        #
        # /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true
        #
        # This is the correct line that RedHat recommends.  Note that
        # the PID file is produced by pacemaker (or httpd itself?) and is
        # probably true only as long as httpd is not managed by systemd.
        #
        /usr/sbin/httpd -f /etc/httpd/conf/httpd.conf -c \
        "PidFile /var/run/httpd.pid" -k graceful > /dev/null 2>/dev/null \
        || true
        #
        # This is how I personally respawn apache on old production systems
        # but this NO LONGER RELIABLY WORKS (need testing).
        #
        # /sbin/apachectl graceful > /dev/null 2>/dev/null && true

Test logrotate.  First of all make sure that /var/run/httpd.pid is current.
Then force rotations with "logrotate -f /etc/logrotate.conf".  Also watch the
PID changes on the httpd processes (on a separate terminal you could run
watch -n1 "ps -efww | grep httpd" and watch the PIDs being replaced).
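
As a concrete sketch of that test sequence:

        cat /var/run/httpd.pid               # note the master PID
        logrotate -f /etc/logrotate.conf     # force all rotations
        # meanwhile, in a second terminal:
        watch -n1 "ps -efww | grep [h]ttpd"  # worker PIDs should be replaced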

PART 3: PCS RESOURCE ENTRY

I added the pcs resource as follows:

        pcs resource create batwww apache \
            configfile="/etc/httpd/conf/httpd.conf" \
            statusurl="http://127.0.0.1/server-status" clone

The option "clone" makes the httpd run an all nodes (instead of just one
instance). 
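
To verify, the clone should show as started on every node, and the status
page should answer locally on each of them (a sketch):

        pcs status                 # the batwww clone should be Started on all nodes
        lynx -dump http://127.0.0.1/server-status | head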


JondZ 201703

Thursday, March 30, 2017

Old Fashioned

I value my data: I possess an organizer that has no internet connection and I use a tape drive for backup.

Unlike modern gadgets, my organizer does not have to be recharged every day.  It also does not require a backlight, so it is easier on my eyes.  I also do not trust the "cloud".  I had an Android-based cell phone password organizer a while ago: not any more.

Tape is very cheap and I do not have to worry about having to replace spinning disks every 2-5 years.  Tape is still the least expensive option and is extremely easy to use.  I can just type this in the morning:

screen
tar cvfpb /dev/st1 128 files...
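
To verify a backup afterwards, a listing pass is usually enough (a sketch;
the device and the 128-block blocking factor match the command above):

mt -f /dev/st1 rewind          # rewind the tape before reading it back
tar tvfb /dev/st1 128 | less   # list the archive contents without extracting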

Contrary to popular myth, tape drives are actually fast.  I have a slow server (an Athlon 5350 motherboard) and slow disk (actually QLA iSCSI over a NetGear NAS device), and I measure tape speed at 20 to 30 megabytes per second.  It sounds as if my server cannot deliver the bytes fast enough, resulting in a motor pause every few seconds.  That implies that the LTO-3 tape drive is capable of more throughput.  In my case, I might also use an LTO-1 drive for smaller jobs just to keep the motor humming nicely.



Tape drive (LTO-3) bought from eBay.
A Palm Organizer

Wednesday, March 22, 2017

Learning experiments on gfs2 clustering: no-quorum-policy, interleave, ordered

It has been probably a week of a gfs2 (Global File System 2) crash course in my personal study of clustered filesystems.  Here are in-depth experiment results on 3 detail points that are mentioned in the RedHat manual:

Point 1: set no-quorum-policy to freeze
Point 2: when creating dlm and clvmd clones, set interleave=true
Point 3: when creating dlm and clvmd clones, set ordered=true

Experiments and explanations:

Point 1: What does "no-quorum-policy=freeze" do?

To differentiate "freeze" from the alternatives, a gfs2 cluster filesystem is tested with the following two settings:

pcs property set no-quorum-policy=stop
pcs property set no-quorum-policy=freeze

With "stop", the resources are stopped, resulting in the gfs2 filesystems being unmounted (because the filesystems are just services).

With "freeze", I/O is blocked, until the problem is corrected.  Specifically, commands like this are frozen:

   ls > /path/to/gfs2/filesystem/sample-output.txt

When the problem gets fixed and the cluster becomes quorate again, the command resumes normally.
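
A convenient way to watch the quorum state while testing is corosync's own
tool (a sketch):

corosync-quorumtool -s     # shows "Quorate: Yes/No" plus the vote counts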

Point 2: interleave=true

This is the parameter that caused me much grief for a day or so.  When I had my first successful gfs2 clustered filesystem configured, I was disappointed that the filesystems were being unmounted whenever a node re-entered the cluster.  I found the answer by searching the web: when interleave=false, ALL instances of the dlm and/or clvmd clones need to restart before ANY gfs2 mount.

So basically, if a resource2 clone depends on a resource1 clone and interleave=false, then ALL instances of resource1 have to be present before ANY instance of resource2 can run.  This results in the gfs2 filesystems being unmounted and re-mounted (in our example, resource2 is the gfs2 mounts and resource1 is dlm/clvmd).

Thanks to the person who posted that answer, which I found on Google.
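
For reference, this is roughly how the dlm and clvmd clones get created with
interleave (and ordered) set, following the RedHat GFS2 procedure; treat it
as a sketch, since the resource names here are the conventional ones and not
copied from my cluster:

# dlm and clvmd as interleaved, ordered clones:
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s \
    on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s \
    on-fail=fence clone interleave=true ordered=true
# clvmd starts after (and runs alongside) dlm on each node:
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone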

Point 3: ordered=true

I have no observable difference to report; it seems to make no difference either way.  I have tested this set to true and to false, and the dlm/clvmd processes seem to start the same way on all nodes.

JondZ 201603

Monday, March 13, 2017

DISCARD EFFECT ON THIN VOLUMES
Notes by JondZ
2017-03-14

This note was prompted by my need to use SNAPPER to protect a massive amount of data.  This morning I realized the very good effect of discard on space savings; when dealing with terabytes of data it is good to save as much space as possible.

In this example the thin POOL is tp1 and the thin VOLUME of interest is te1.  It is like this because I am merely testing out a configuration that already exists.

These are dumped-out unedited notes.
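
(For context, a similar thin pool/volume layout could be created with
something like the following sketch; the VG name "bmof" and the sizes come
from the lvs output below, and the ext4 filesystem type is my assumption:)

lvcreate --type thin-pool -L 10G -n tp1 bmof   # the thin POOL
lvcreate --thin -V 1G -n te1 bmof/tp1          # a 1G thin VOLUME in that pool
mkfs.ext4 /dev/bmof/te1                        # ext4 is an assumption here
mount /dev/bmof/te1 /volumes/te1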

INITIAL CONDITIONS
------------------------------------------------------------------------
te1 is a 1-Gig (thin) disk mounted on /volumes/te1.
The actual, physical thin volume POOL is sized at 10.35 Gigs right now.

lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.77                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1#


EFFECT 1: A 500-MEG FILE INSERTED
-------------------------------------------------------------------
Notice the increase in usage of "te1", now up to 52.36.  The thin pool tp1 increased as well, to 27.22 usage.

dd if=/dev/zero of=500MFILE21 bs=1024 count=500000

root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.36                                 
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.22  18.26                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1#

EFFECT 2: 500-MEG FILE REMOVED
---------------------------------------------------------------------
Removing the file did not reduce the thin volume usage.  The pool use percentages are essentially unchanged as well.

root@epike-OptiPlex-7040:/volumes/te1# rm 500MFILE21
root@epike-OptiPlex-7040:/volumes/te1# df -h -P .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/bmof-te1  976M  1.3M  908M   1% /volumes/te1
root@epike-OptiPlex-7040:/volumes/te1#

root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.45                                 
  te1-snapshot1   bmof Vri---tz-k   1.00g tp1  te1                                          
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.23  18.46                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1#

EFFECT 3: fstrim
-----------------------------------------------------------------
FSTRIM will reclaim space on the thin volume AND the thin pool:

root@epike-OptiPlex-7040:/volumes/te1# fstrim -v /volumes/te1
/volumes/te1: 607.2 MiB (636727296 bytes) trimmed
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.77                                  
  te1-snapshot1   bmof Vri---tz-k   1.00g tp1  te1                                          
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.23  18.55                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m

Well, it does not show here, but I recall that the thin POOL usage also goes down.  Perhaps the snapshot gets in the way?  It was created automatically (by snapper) while I was composing this text.

There, much better:

root@epike-OptiPlex-7040:/volumes/te1# snapper -c te1 delete 1
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.77                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1# fstrim -v /volumes/te1
/volumes/te1: 239.4 MiB (251031552 bytes) trimmed
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.77                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1#

The numbers are down to 4.77 consumed on the Thin VOLUME, and 22.62 percent on the thin POOL.

EFFECT 4: mount with DISCARD option automatically reclaims THIN space
----------------------------------------------------------------------
This example demonstrates that thin volume space is reclaimed and returned to
the POOL automatically, without needing to run fstrim manually.


root@epike-OptiPlex-7040:/volumes/te1# mount -o remount,discard /dev/mapper/bmof-te1

root@epike-OptiPlex-7040:/volumes/te1# !dd
dd if=/dev/zero of=500MFILE24 bs=1024 count=500000
500000+0 records in
500000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 0.553593 s, 925 MB/s
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.39                                 
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.22  18.26                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1# rm 500MFILE24
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.39                                 
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.22  18.26                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs -a | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.79                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28                          
  [tp1_tdata]     bmof Twi-ao----  10.35g                                                   
  [tp1_tmeta]     bmof ewi-ao----   8.00m                                                   
root@epike-OptiPlex-7040:/volumes/te1# ls
lost+found
root@epike-OptiPlex-7040:/volumes/te1#

-----------------------------------------------------------------------
But does the reclamation work through snapshot layers?  Well, it would be difficult to test all combinations, but let's at least verify that the space is reclaimed when the snapshots are deleted.

First, remount with the discard option:

root@epike-OptiPlex-7040:~# !mount
mount -o remount,discard /dev/mapper/bmof-te1
root@epike-OptiPlex-7040:~#

The initial conditions are:

  te1             bmof Vwi-aotz--   1.00g tp1         4.77                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28  

OK, so the LV is at 4.77 percent and the LV POOL at 22.62 percent.

So... consume some space.

root@epike-OptiPlex-7040:/volumes/te1# !dd
dd if=/dev/zero of=500MFILE24 bs=1024 count=500000
500000+0 records in
500000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 0.559463 s, 915 MB/s
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.37                                 
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.22  18.26                          
root@epike-OptiPlex-7040:/volumes/te1#

Snapshot, and consume some more space...

root@epike-OptiPlex-7040:/volumes/te1# snapper -c te1 create
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         52.45                                 
  te1-snapshot1   bmof Vri---tz-k   1.00g tp1  te1                                          
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             27.23  18.36                          
root@epike-OptiPlex-7040:/volumes/te1#

root@epike-OptiPlex-7040:/volumes/te1# !dd:p
dd if=/dev/zero of=500MFILE24 bs=1024 count=500000
root@epike-OptiPlex-7040:/volumes/te1# dd if=/dev/zero of=200mfile bs=1024 count=200000
200000+0 records in
200000+0 records out
204800000 bytes (205 MB, 195 MiB) copied, 0.211273 s, 969 MB/s

root@epike-OptiPlex-7040:/volumes/te1# !snapper
snapper -c te1 create
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         71.53                                 
  te1-snapshot1   bmof Vri---tz-k   1.00g tp1  te1                                          
  te1-snapshot2   bmof Vri---tz-k   1.00g tp1  te1                                          
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             29.08  19.82                          
root@epike-OptiPlex-7040:/volumes/te1#

Then remove the files.  The numbers should not go down, since there are snapshot volumes.

root@epike-OptiPlex-7040:/volumes/te1# rm 200mfile 500MFILE24
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.78                                  
  te1-snapshot1   bmof Vri---tz-k   1.00g tp1  te1                                          
  te1-snapshot2   bmof Vri---tz-k   1.00g tp1  te1                                          
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             29.08  20.12     

OK, so I stand corrected:  the LVM VOLUME usage went down, but the LVM POOL did not.
That actually makes sense, since the snapshots still consume the space.
What happens when the snapshots are removed; is the space reclaimed into the thin POOL?

root@epike-OptiPlex-7040:/volumes/te1# snapper -c te1 delete 1
root@epike-OptiPlex-7040:/volumes/te1# snapper -c te1 delete 2
root@epike-OptiPlex-7040:/volumes/te1# !lvs
lvs | grep tp1
  te1             bmof Vwi-aotz--   1.00g tp1         4.78                                  
  te2             bmof Vwi-aotz--   1.00g tp1         97.66                                 
  te3             bmof Vwi-aotz--   1.00g tp1         4.75                                  
  te4             bmof Vwi-aotz--   3.00g tp1         42.32                                 
  tp1             bmof twi-aotz--  10.35g             22.62  15.28                          
root@epike-OptiPlex-7040:/volumes/te1#

It is!!!  When the snapshot volumes are removed, the space is reclaimed into the thin pool.

CONCLUSION:
--------------------
When working with thin volumes, use the DISCARD mount option, even (or especially) when not using SSDs.

OTHER TESTS
-----------
I tested mounting normally, consuming space, then remounting with the discard option.  What happens is that the space is not automatically reclaimed just by mounting; fstrim needs to run, and snapshots need to be deleted.  Still, there is no harm, and in fact an advantage, in adding the "discard" option in fstab even for existing (thin volume) mounts.
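
The fstab line for this test volume would then look something like this
(again, the ext4 filesystem type is an assumption):

# /etc/fstab entry (sketch) with discard enabled on the thin volume:
/dev/mapper/bmof-te1   /volumes/te1   ext4   defaults,discard   0  2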


JondZ 20170314



                 












