Monday, March 08, 2021

Creating ipip tunnels in RedHat 8 that are compatible with LVS/TUN.

Every few seasons I build a virtual LVS (Linux Virtual Server) mock-up just to keep in shape.  The LVS-TUN (LVS tunnel mode) ipip configuration for RH8 had me stumped for a while.

There was a change from RH7 to RH8: network scripts are being deprecated, so the /etc/sysconfig/ network scripts behave just a bit differently.  For one thing, I could not build a reliable ipip tunnel interface in RH8 even after installing and using the traditional RH-style network scripts.

After a lot of testing, here is my solution for keeping an ipip tunnel interface up on RH8:

# In this example 192.168.131.239 is the RIP (Real Server IP) address
# and 192.168.131.235 is the VIP address.
nmcli con add type ip-tunnel ip-tunnel.mode ipip con-name tun0 \
          ifname tun0 remote 0.0.0.0 local 192.168.131.239
nmcli con mod tun0 ipv4.addresses 192.168.131.235
nmcli con mod tun0 ipv4.method manual
nmcli con up tun0

The key thing to remember is that the remote IP should be 0.0.0.0.  In RH7 you could use an ipip tunnel for LVS-TUN without specifying remote and local IPs.
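
Once the connection is up, a quick sanity check on the real server helps; this is a sketch using standard iproute2/nmcli commands, with the interface name and addresses taken from the example above:

```shell
# Verification sketch, run on the real server as root.
ip -d link show tun0       # should show "ipip remote any local 192.168.131.239"
ip addr show tun0          # should list 192.168.131.235
nmcli -f GENERAL.STATE con show tun0
```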

If this helps somebody out there with a similar problem, please let me know by replying to this blog.



Wednesday, February 17, 2021

Raspberry pi simple fan control script

Here is a simple fan control script that I made to turn a Raspberry Pi fan on and off.  It turns on the fan when the temperature reaches a threshold (in this case, 60 degrees C).

#!/bin/bash
#
# Wed Feb 17 19:14:54 EST 2021 JondZ very simple fan controller
GPIO_PIN=4
GPIO_NAME=gpio${GPIO_PIN}
GPIO_PATH=/sys/class/gpio/$GPIO_NAME/value
TEMP_THRESHOLD=60000
TEMP_SOURCE_FILE=/sys/class/thermal/thermal_zone0/temp
# 600: 10 minutes
# 900: 15 minutes
# 1800: 30 minutes
SLEEP_TIME=600
SLEEP_POLL=10
function get_temp {
         if [[ -e $TEMP_SOURCE_FILE ]] ; then
            cat $TEMP_SOURCE_FILE
         fi
         }
function init_ports {
         if [[ ! -e /sys/class/gpio/$GPIO_NAME ]]; then
            echo $GPIO_PIN > /sys/class/gpio/export && \
            sleep 5 && \
            echo out > /sys/class/gpio/$GPIO_NAME/direction
         fi
         }
init_ports
function turn_on_fan {
         temp=$(get_temp)
         if [[ -e $GPIO_PATH ]]; then
            v=$(< $GPIO_PATH )
            if [[ $v == 0 ]]; then
               # log only on the off->on transition
               echo "$(date) fan on ($temp)"
               echo 1 > $GPIO_PATH
            fi
         fi
         }
function turn_off_fan {
         temp=$(get_temp)
         if [[ -e $GPIO_PATH ]]; then
            v=$(< $GPIO_PATH )
            if [[ $v == 1 ]]; then
               echo "$(date) fan off ($temp)"
               echo 0 > $GPIO_PATH
            fi
         fi
         }
while :
   do
   CT=$(get_temp)
   if [[ $TEMP_THRESHOLD -lt $CT ]]; then
      turn_on_fan
      sleep $SLEEP_TIME
   else
      turn_off_fan
   fi
   sleep $SLEEP_POLL
   done
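
The threshold comparison itself can be exercised without any GPIO hardware by pointing the script's variables at a throwaway file; this is a minimal sketch (the fake sysfs path and the reading are made up):

```shell
# Sketch: exercise the threshold logic against a fake sysfs file, so no
# GPIO or Pi hardware is needed.
workdir=$(mktemp -d)
echo 65123 > "$workdir/temp"     # fake reading: 65.123 degrees C in millidegrees

TEMP_THRESHOLD=60000
TEMP_SOURCE_FILE=$workdir/temp

get_temp() { cat "$TEMP_SOURCE_FILE"; }

CT=$(get_temp)
if [[ $TEMP_THRESHOLD -lt $CT ]]; then
    echo "fan on ($CT)"
else
    echo "fan off ($CT)"
fi
```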
 

Here is the systemd unit file:

[Unit]
Description=Simple Fan Controller
DefaultDependencies=no
After=sysinit.target

[Service]
Type=simple
ExecStart=/root/scriptsd/system-fan-control.sh

[Install]
WantedBy=graphical.target multi-user.target

As far as I can see it is working fine.

Here is a picture of the circuit that I soldered.  I copied the circuit from some other site on the web.  It comprises a resistor, a transistor, and a diode.


Here is what the fan circuit looks like in my setup.  The case is already crowded with an OLED display and an RTC module:



There may be minor bugs.  I was tired from soldering and rushed the coding in a few minutes, so comments are welcome...

EP 20210217


Sunday, February 16, 2020

RH/CentOS 8 dummy network interface

This is how to set up a dummy interface on RedHat/CentOS 8.x.  I cannot make the old-style init scripts in /etc/sysconfig/network-scripts/ work anymore for a dummy network interface.

It's all NetworkManager now; here is what I pieced together from surfing the web.  This one stays up even after a reboot:

nmcli connection add type dummy ifname dummy0 con-name dummy0
nmcli con mod dummy0 ipv4.addresses 192.168.8.88/32
nmcli con mod dummy0 ipv4.method manual autoconnect yes
nmcli con up dummy0
nmcli con show


Monday, September 16, 2019

Old DDS/4mm Tape drive speed issues

Ok, so I bought a tape drive from eBay.  It's a Dell PowerVault 100T DDS4 tape drive (internal Seagate STD1400LW).  I really like the 4mm tape format because the tapes are so small--just perfect for freezing my usual data output (notes, documents).

At first I thought I had bought a lemon, since the drive was super slow.  As usual I played around with block sizes.  Normally I leave the block size at 512, which is usually the default for tape drives.  I usually would do this:

mt -f /dev/st1 setblk 512  # if necessary
tar cvfp /dev/st1 ....

Or sometimes, if I want the default tar blocking factor to match the hardware block size:

mt -f /dev/st1 setblk 10240 # that's 20 tar blocks of 512 bytes each
tar cvfp /dev/st1 ...

But the tape drive was SUPER SLOW no matter what I tried... until I tried 4096 (the page size?) on a whim.  And this drive just flew!  So this is how to make this drive spin fast:

mt -f /dev/st1 setblk 4096 # page size
tar cvfpb /dev/st1 8 ...
tar tvfb /dev/st1 8 .....

What a surprise that was... and I have been using tapes for years.
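
Since the blocking factor counts 512-byte blocks, b=8 means tar reads and writes the archive in 4096-byte records.  That is easy to confirm locally on a scratch file, no tape drive needed (the paths below are throwaway):

```shell
# Sketch: confirm that blocking factor 8 means 4096-byte records
# (8 * 512 = 4096).  tar always pads the archive out to whole records.
workdir=$(mktemp -d)
echo "hello" > "$workdir/file.txt"
tar cfb "$workdir/out.tar" 8 -C "$workdir" file.txt
size=$(stat -c %s "$workdir/out.tar")
echo "archive size: $size"
echo "size mod 4096: $((size % 4096))"    # 0: a whole number of records
```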

--------------UPDATE-------------

20190918

The slow speed is for READING BACK DATA, not writing it.  For some reason the tape drive stalls delivering data for transfers larger than 4k.  Therefore, to read data back, specify transfer buffers of 4k or less:

mbuffer_style#  mbuffer -s 4096 -i /dev/st2 | tar tvf -
normal_tar# tar xvfpb /dev/st2 8



JondZ 20190917

Wednesday, July 03, 2019

pacemaker fail count timing experiments

Occasionally we are troubled by failure alerts; since I installed the check_crm nagios monitor plugin the alerts seem more 'sensitive'.  I have come to understand that pacemaker needs manual intervention when things fail.  The thing that especially fails on our site is the VMware fencing--every few weeks one or two logins to VMware fail, forcing me to log in and issue a "pcs resource cleanup" to reset the failure.

I am doing this experiment now to understand the parameters that I need to adjust on a live system.  These observations were done on a RH7.6 cluster, with a dummy shell script as the resource.

dummy shell script, configured to fail:

#! /bin/bash
exit 1 # <-- this is removed when we want the result to succeed
while :
  do
  date
  sleep 5
  done > /tmp/do_nothing_log


OBSERVATIONS 1:

  • CLUSTER CONDITIONS: cluster property "start-failure-is-fatal" to defaults (true).
  • RESOURCE CONDITIONS: defaults
  • RESULT: node1 is tried ONCE, node2 is tried ONCE, then nothing is tried again.  When a resource fails the fail count is immediately set to INFINITY (1000000).  This is why the documentation says "the node will no longer be allowed to run the failed resource" until a manual intervention happens.
OBSERVATIONS 2:

  •  CLUSTER CONDITIONS: "start-failure-is-fatal" to FALSE ("pcs property set start-failure-is-fatal=false; pcs resource cleanup")
  • RESOURCE CONDITIONS: defaults
  • RESULT: the resource is restarted on node1 nonstop (to infinity?).  It does not appear to be attempted on the other node.
OBSERVATIONS 3:
  • CLUSTER CONDITIONS:  "start-failure-is-fatal" set to FALSE ("pcs property set start-failure-is-fatal=false; pcs resource cleanup")
  • RESOURCE CONDITIONS:  migration-threshold=10 ("pcs resource update resname meta migration-threshold=10; pcs resource cleanup")
  • RESULT:  Resource is retried 10 times on node1, then retried 10 times on node2, then retried no longer.
OBSERVATIONS 4:
  • CLUSTER CONDITIONS:  start-failure-is-fatal=false,  cluster-recheck-interval=180.
  • RESOURCE CONDITIONS:  migration-threshold=10 and failure-timeout=2min ("pcs resource update resname  meta failure-timeout=2min")
  • RESULT: Resource is retried 10 times on node1, 10 times on node2.  Errors are cleared after 2 minutes.  After that, the resource is tried ONCE on node1 but 10 times on node2 every cluster-recheck-interval (3 minutes).  That's because the error condition is gone but the counters do not necessarily reset (though sometimes they do on other nodes while the resource is being tried on one node).
GROUP RESOURCES CONSIDERATION:
  • I am unable to apply migration-threshold and failure-timeout to a group at this moment.  They still seem to be properties of the individual resources.
  • Update the resource meta as usual, regardless of whether the resource is part of a group or not; behavior should be as expected (the group proceeds from one resource to the next in the list anyway).
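
For reference, here is the combination from OBSERVATIONS 4 collected in one place; this is just a sketch of the commands already quoted above, and "resname" is a placeholder for the real resource name:

```shell
# Sketch: the settings from OBSERVATIONS 4, applied together.
pcs property set start-failure-is-fatal=false
pcs property set cluster-recheck-interval=180
pcs resource update resname meta migration-threshold=10 failure-timeout=2min
pcs resource cleanup
```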

JondZ 20190703

Friday, March 15, 2019

pacemaker unfencing errors

While testing pacemaker clustering with iscsi (on redhat 8 beta) I came upon this error:

Pending Fencing Actions:
* unfencing of rh-8beta-b pending: client=pacemaker-controld.2603, origin=rh-8beta-b

It took me almost the whole morning to understand how to clear the error.  The stonith resource includes the clause "meta provides=unfencing", which means the fencing agent is expected to handle unfencing; the fix is simply to reboot the node (rh-8beta-b in this case).

RedHat documentation explains this as well: " ...The act of booting in this case implies that unfencing occurred..."

Wednesday, October 31, 2018

VDO Troubles

Once in a while, on very large VDO volumes, storage just disappears.  In my case I had one 42TB volume that refused to boot.  Here are some of the symptoms:
  • Boot dropped into a single-user shell
  • "vdo start" says the vdo volume was already started, but there is no entry in /dev/mapper/.
In this case I retried "vdo stop" then "vdo start" a few times until the vdo volume became available.

Further observation suggests that systemd kills the VDO startup process.  What happened was that VDO was interrupted during its long boot-up--a timing issue.

A working solution for me was to include a timeout value in /etc/fstab for the particular vdo volumes:

# /etc/fstab
 /dev/vgsample/lvsample /somedirectory ext4 defaults,discard,x-systemd.requires=vdo.service,x-systemd.device-timeout=10min 0 0

Working so far, survived a few reboots.

JondZ 20181031

Sunday, September 02, 2018

Clustering legacy websites with keepalived and haproxy.

We had a legacy website we could not get rid of; it was very stable on Debian Lenny 5.0.  After much grief I managed to compile an RPM version of PHP 5.2, which it needed--not PHP 5.3, not 5.anything-else, it HAS to be 5.2.  Finally I was able to get it running on CentOS 6.

I also clustered the filesystem using GFS2.  I did not remember until yesterday that I had configured this server to also serve an SMB share.  That was like 5 years ago and users are still on it!  So I also had to use clustered Samba (CTDB).  Finally I got it all working again.

I was able to get this thing clustered using keepalived and haproxy cookie tricks--basically haproxy tricks a user's browser into connecting to one and the same backend webserver, while different users connect to different backends.  That's basically active/active already.  I just finished and tested this, even on a holiday weekend.

So what we have now is

---ext ip ----- frontserver1/frontserver2 ------- webserver1/webserver2

The webservers themselves also host MySQL.  Basically a four-pack HA setup.  I tested rebooting nodes and the websites continued to work.

It's good, though unfortunately a lot of IP addresses were allocated: about 7 instead of just one.  But everything's virtual, so I guess that's OK.

JondZ

Tuesday, August 28, 2018

a day at the office, spare rh ha license

Ok, so I finally got HA/GFS2 nodes at work.  I'm getting a lot of mileage out of the licenses I was allowed to use, more than I expected.  First of all, a RH "unit" actually counts as one half when run as a virtual machine.  This enabled me to run a 6-pack stack consisting of 2 routers (keepalived/haproxy), 2 fileservers (GFS2), and 2 PostgreSQL nodes (active/passive).

Today I discovered that the RH Resilient Storage (GFS2) license already includes the HA (pacemaker) license.  I called RH support to verify and make sure.  So I can take that license and build corosync quorum (Q) devices.  Our 2-node clusters are going to be upgraded with Q devices.

It is a nice day.

Wednesday, May 02, 2018

vdo in an lvm sandwich

I have finished migrating one of the backup servers we have (where I work) from a standard LVM setup to one with a VDO engine.  It turns out the compression ratio was about 41%--something like 10TB physical holding 25TB of logical blocks, if I understood the vdostats output correctly.

Someday I might post a quick how-to here.  I already posted a maintenance document on our internal website, so I am a little tired right now.

Anyhow what I ended up doing was an LVM-over-VDO-over-LVM setup like this:

-------------------------------------------------------
LVM disks -- actual exposed disks -- /data1, /data2, etc.
-------------------------------------------------------
VGS (actual usable volume group, vg1 for example)
-------------------------------------------------------
vdo1 (dedup/compression/zero-elimination)
-------------------------------------------------------
Logical Volume /dev/vg0base/disk1
-------------------------------------------------------
VGS (vg0base for example)
-------------------------------------------------------
PHYSICAL DISKS
-------------------------------------------------------
What I like about this (apart from the added complexity) is that the scheme doesn't really change existing setups too much.  Provisioning is done in the normal way starting from the vg1 volume group above.  After setup I can pretty much forget about the lower levels and do the usual disk creation such as "lvcreate -n disk1 -L 1G vg1".

The advantages of this setup are as follows:

1. Physical disks can be added if needed.  Unfortunately I had to write a maintenance manual (where I work) because all the elements of the stack need to be expanded as well.
2. The VDO layer is easy to expand.  If I want to over-provision I can simply expand the logical size of vdo1.  The upper layers will follow the size (though not automatically--maintenance manual again).

Since there was existing data already (about 25T) I had to bootstrap from a small existing space and slowly expand the VDO layer.  Fortunately the backup data were contained in separate partitions (/volumes/data{1,2,3,4}) so I was able to incrementally migrate them, though the whole process took me a week to accomplish.
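
For the record, building that sandwich from bare disks would look roughly like this.  This is only a sketch: the devices (/dev/sdb, /dev/sdc) and sizes are invented, and it assumes the RHEL 7.5+/8 vdo CLI:

```shell
# Sketch: the LVM-over-VDO-over-LVM stack from the diagram, bottom up.
# /dev/sdb and /dev/sdc are example devices; sizes are examples too.
pvcreate /dev/sdb /dev/sdc
vgcreate vg0base /dev/sdb /dev/sdc
lvcreate -n disk1 -l 100%FREE vg0base
vdo create --name=vdo1 --device=/dev/vg0base/disk1 --vdoLogicalSize=25T
pvcreate /dev/mapper/vdo1
vgcreate vg1 /dev/mapper/vdo1
lvcreate -n disk1 -L 1G vg1    # normal provisioning from here on
```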


jondz

Wednesday, February 28, 2018

host to host nc/tar tricks

Here are some tricks to transfer files between hosts using tar.  In the old days, by habit, I would just do something like tar cvfp - . | ssh host "cd /path/to && tar xvfp -", but here are even simpler approaches, assuming you have open terminals on both hosts.
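
The tar-to-tar pipe pattern can be tried completely locally before putting ssh or nc in the middle; here is a sketch using throwaway directories:

```shell
# Sketch: the tar-to-tar pipe run locally between two scratch directories.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir "$src/subdir"
echo "payload" > "$src/a.txt"
echo "more"    > "$src/subdir/b.txt"

# Same shape as the ssh variant, minus the network:
tar cf - -C "$src" . | tar xf - -C "$dst"

ls -R "$dst"
```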

CASE 1: SENDER IS LISTENING
Here are some commands that will work; the sender command needs to be typed first then it will block until a consumer appears on the other side of the pipe.

sender% tar cvf - . | nc -l -p 1234
sender% tar cvf >(nc -l -p 1234) . # process substitution

receiver% tar xvf - < /dev/tcp/ip-of-sender/1234
receiver% tar xvf <( dd obs=10240 < /dev/tcp/ip-of-sender/1234)

The magic number 10240 is 512*20, i.e. tar's default blocking factor of 20 records of 512 bytes each.

CASE 2. RECEIVER IS LISTENING

This is the case when you are on the receiving end and want to initiate the receive before switching over to the sending terminal.  Type the receiver command first and I/O will block until there is a sender:

receiver% nc -l -p 1234 | tar xvf -

sender% tar cvf - . > /dev/tcp/host-or-ip-of-receiver/1234
sender% > /dev/tcp/host-or-ip/1234 tar cvf - .
sender% tar cvf >(cat > /dev/tcp/host-or-ip/1234) .

NOTES:
Sometimes it is necessary to add the "-N" option to nc, depending on the platform/distro.

Edward 20180228


Saturday, December 16, 2017

tape drive over iscsi problems

One of the problems with running a tape drive as an iSCSI target is that I found no software that worked.  I tried IETD, TGT, and of course TARGETCLI.  After thinking and googling about this problem for about a day, I decided to see if I could patch the Python code that targetcli runs on.  I am surprised I can still code!  This took me perhaps half an hour to figure out; it has been a long time since I wrote anything.

This file is ... rtslib/utils.py

-----------PATCH 1 of 2 --------------------------------------
1. In function convert_scsi_path_to_hctl

OLD:
    try:
        hctl = os.listdir("/sys/block/%s/device/scsi_device"
                          % devname)[0].split(':')
    except:
        return None
    return [int(data) for data in hctl]

NEW:
    try:
       hctl = os.listdir("/sys/block/%s/device/scsi_device"
                             % devname)[0].split(':')
       return [int(data) for data in hctl]
    except OSError: pass

    try:
       hctl = os.listdir("/sys/class/scsi_tape/%s/device/scsi_device"
                          % devname)[0].split(':')
       return [int(data) for data in hctl]
    except OSError: pass

    return None

-----------PATCH 2 of 2 --------------------------------------
In function convert_scsi_hctl_to_path
OLD:
    for devname in os.listdir("/sys/block"):
        path = "/dev/%s" % devname
        hctl = [host, controller, target, lun]
        if convert_scsi_path_to_hctl(path) == hctl:
            return os.path.realpath(path)
NEW:
    for devname in os.listdir("/sys/block"):
        path = "/dev/%s" % devname
        hctl = [host, controller, target, lun]
        if convert_scsi_path_to_hctl(path) == hctl:
            return os.path.realpath(path)
    try:
        for devname in os.listdir("/sys/class/scsi_tape"):
            path = "/dev/%s" % devname
            hctl = [host, controller, target, lun]
            if convert_scsi_path_to_hctl(path) == hctl:
                return os.path.realpath(path)
    except OSError: pass

Friday, December 01, 2017

pgsql archive command notes

This is a critique of the documented "archive_command" usage in PostgreSQL.  There is an example which says:

  archive_command = 'test ! -f /destination/%f && cp %p /destination/%f'

I wouldn't use this.  The problem is that if the disk is full, I have seen cp produce short (partial) files.  cp does exit with a nonzero code, but the partial file is left behind, and every later attempt fails as well ("test ! -f __" is false since the file is already there), so that WAL segment is stuck with a corrupt copy in the archive until someone cleans it up.
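
The stuck-archive failure mode is easy to reproduce with plain files; this is a sketch where the directories and the WAL segment name are made up:

```shell
# Sketch: once a partial file exists, the documented archive_command
# never succeeds again for that segment.
wal=$(mktemp -d); dest=$(mktemp -d)
f=000000010000000000000001
echo "full segment data" > "$wal/$f"
echo "short" > "$dest/$f"     # simulate a truncated copy left by a full disk

# Run the documented archive_command the way postgres would (via a shell):
status=0
sh -c "test ! -f '$dest/$f' && cp '$wal/$f' '$dest/$f'" || status=$?
echo "archive_command exit status: $status"   # nonzero, and stays nonzero
cat "$dest/$f"                                # the truncated copy is still there
```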
What I would do is use rsync.  With its default settings it does not create short files (it writes to a temporary file and renames it into place):

  archive_command = 'test ! -f /destination/%f && rsync %p /destination/%f'

ep

Sunday, October 15, 2017

drbd and lvm: so many combinations

On vacation after major dental surgery, I am currently learning and testing these four DRBD/LVM combinations and thinking about which one I would use in a real production setup.

1. DRBD over plain device
2. DRBD over LVM
3. LVM over DRBD
4. LVM over DRBD over LVM

1. DRBD over plain device.  This puts actual device names such as sdb1 in the DRBD configuration.  I don't like that.  There are ways around it, such as using multipath or /dev/disk/by-id.  I haven't tested those with DRBD yet, but the point is the actual device names are in the configuration files, and they had better agree with the real devices (after years of uptime and changeover of sysadmins :).

2. DRBD over LVM.  This puts an abstraction layer at the lowest level and avoids having to place actual device names in DRBD resource files.  For example:

/etc/drbd.d/some-resource.res

resource __ {
  ...
  ...
  device /dev/drbd0
  disk /dev/vg/lvdisk0
  ...
  }

There you go: no /dev/sdb1 or whatever in the disk configuration.  This avoids problems arising from devices switching names on reboot.

3. LVM over DRBD

As the name implies, this puts the flexibility of provisioning above the DRBD layer, where it is closer to the application.  It would make typical provisioning tasks such as disk allocation, destruction, extension, and shrinking much easier.  However, I still do not like writing device names in the config files...

4. LVM over  DRBD over LVM.

LVM over DRBD over LVM is probably the most flexible solution.  There are no actual device names in the DRBD configuration, and LVM is very resilient across machine restarts due to its auto-detection of metadata in whatever order the physical disks come up.  With this combination I can rearrange the physical backing storage and at the same time have the flexibility of LVM on the upper layer.  The only issue is having to ADJUST STUFF IN /etc/lvm/lvm.conf.

in /etc/lvm/lvm.conf

    # filter example -- 
    # /dev/vd* on the physical layer, 
    # /dev/drbd* on the drbd layer
    filter = [ "a|^/dev/vd.*|", "a|drbd.*|", "r|.*/|" ]
    write_cache_state = 0
    use_lvmetad = 0

Just a few lines of config.  This is fine.  The problem is having to remember what all this configuration means after 2 years of uptime...

---

JondZ 20171015



Wednesday, October 11, 2017

reducing lvm drbd disk size

Here is a snippet of my notes for reducing a DRBD disk's size (assuming the physical device is on LVM, which can be resized).

Just remember that a DRBD device is a container and HAS METADATA.  Therefore think of it as a filesystem.  Also, this procedure will only work if the disks are ONLINE (the disks are attached and DRBD is running).

In this example, a filesystem has only 100 megs worth of data; we want to shrink the physical store from 500 down to about 120 megs.

WARNING: This procedure can be destructive if done wrong.

1. Note the filesystem's consumed size.  For this example the filesystem contains 100M worth of data.  Shrink the filesystem; note that -M resizes to the minimum size.

   umount /dev/drbd0
   fsck -f /dev/drbd0
   resize2fs -M /dev/drbd0

At this point the filesystem on /dev/drbd0 should be at its minimum (i.e., close to the consumed size--about 100 MB in this example).  If you are not sure, mount the filesystem again and use "df", or use tune2fs (if ext4), to MAKE SURE.

2. Resize the drbd device.  Make sure it is larger than the filesystem size, because drbd also uses disk space for metadata!

   drbdadm -- --size=110M resize r0

If you type "lsblk" at this point, drbd0 should show about 110M.

3. Shrink the physical backing device to a bit larger than the drbd device:

   on the first node (drb7): lvresize -L 120M /dev/drt7/disk1
   on the second node (drb8): lvresize -L 120M /dev/drt8/disk1

4. Size up the drbd device to use up all available LV space:

   drbdadm resize r0

5. Finally size up the filesystem:

   resize2fs /dev/drbd0

6. Mount and verify that the filesystem is indeed about 120 megs.

Friday, October 06, 2017

Stress testing drbd online verification

DRBD is so nice.  It is really very nice to have this skill available--I would use it personally, at home or in office production.  It is a very practical real-world talent to be able to string together 2 computers with a network cable and replicate disks from one to the other automatically.

I just stress tested the "online verification" procedure.  Basically I wanted to see how I would formulate a recovery procedure for a corrupted disk.  In summary this is what I did---

1. Configure checksum method for online verification.
2. Perform online verification to compare disks.  This is as simple as typing out "drbdadm verify <resource>" and watching the logs in /var/log/messages.

STRESS TEST.  To make sure I would recover from a failed disk I tested out this scenario:

3.  On node1, stop drbd (systemctl stop drbd)
4. Force a disk corruption, for example dd if=/dev/zero of=/dev/vdb1
5. start drbd (systemctl start drbd)

RECOVERY PROCEDURE: Here is what I came up with as a procedure.

6. drbdadm verify r0 # r0 is the resource name

At this point I would notice the disk corruption in /var/log/messages.

7.  On the "bad" node:

drbdadm secondary r0
drbdadm invalidate r0

This is the summary of the procedure (perhaps with some minor detail I forgot).  After the "invalidate" instruction the disk should sync again.  Just make sure that the correct disk on the correct node is identified and invalidated.

-------
JondZ

Thursday, October 05, 2017

drbd diskless mode

I am still scratching my head over this one--that it is even possible.  Sure, I ran diskless setups like iSCSI with special hardware cards before, but DRBD?

I detached the disk and then made the resource primary.  So basically a node without a disk is talking to a node with a disk and pretending that the disk is local:

[root@drb6 tmp]# drbdadm status
r0 role:Primary
  disk:Diskless
  drb5 role:Secondary
    peer-disk:UpToDate

[root@drb6 tmp]# df
Filesystem          1K-blocks    Used Available Use% Mounted on
/dev/drbd0            1014612   33860    964368   4% /mnt/tmp



On this node there is no backing store for drbd0, yet drbd0 is for all practical purposes a normal block device.

That is amazing...


jondz

Wednesday, October 04, 2017

My first DRBD cluster test

Here is my first cluster.  It took me the WHOLE MORNING to figure out (I had misunderstood the meaning of the "clone-node-max" property).  Anyhow, this is a 4-node active/passive DRBD storage cluster.

In this example, only the Primary (pcs "Master") can use the block device at any one time.  The nodes behave correctly: they are promoted and demoted as expected as they leave and enter the cluster.

I will have to redo this entire thing from scratch to make sure I can do it again, and keep notes (so many things to remember!).  I will also enable some service to use the block device: maybe NFS or a LIO iSCSI target or something.

Here are my raw notes and a sample "pcs status" output:

------------RAW NOTES-- SORT IT OUT LATER ----------

pcs resource create block0drb ocf:linbit:drbd drbd_resource=r0
pcs resource master block0drbms block0drb master-max=1 master-node-max=1 clone-max=4
# pcs resource update block0drbms clone-node-max=3 THIS IS WRONG--SHOULD BE 1 BECAUSE ONLY 1 CLONE SHOULD RUN ON EACH NODE (see below later)

pcs resource update block0drbms meta target-role='Started'
pcs resource update block0drbms notify=true

[root@drb3 cores]# systemctl disable drbd
Removed symlink /etc/systemd/system/multi-user.target.wants/drbd.service.

pcs resource update block0drb meta target-role="Started"
pcs resource update block0drb drbdconf="/etc/drbd.conf"

pcs property set stonith-enabled=false
pcs resource update block0drbms clone-node-max=1

pcs resource enable block0drbms



Also (info from the web), fix wrong permissions if needed:

--- chmod 777 some file in /var if needed ---- 

chmod 777 /var/lib/pacemaker/cores

---------- EXAMPLE PCS STATUS OUTPUT --------

[root@drb3 ~]# pcs status
Cluster name: drbdemo
Stack: corosync
Current DC: drb2 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Wed Oct  4 11:38:16 2017
Last change: Wed Oct  4 11:30:36 2017 by root via cibadmin on drb4

4 nodes configured
4 resources configured

Online: [ drb1 drb2 drb3 drb4 ]

Full list of resources:

 Master/Slave Set: block0drbms [block0drb]
     Masters: [ drb2 ]
     Slaves: [ drb1 drb3 drb4 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@drb3 ~]#

Wednesday, May 03, 2017

Random Ansible stuff -- commenting out variables

I am currently learning Ansible.  This is because I realized I needed a way to configure many servers simultaneously: I was up to 6 (six) virtual CentOS servers for learning GlusterFS, and manually configuring each one was getting troublesome.  Anyhow, I think I am a week into this already.

Today's lesson is: how to replace variables while leaving a comment on top.  This is a personal favorite style of mine, specifically in the form:

   # Previous value was VAR=value changed 20170504
   VAR=newvalue

I use this pattern A LOT, in fact on all of my configuration changes whenever I can.

Here are two examples from my personal tests.

One way is to simply use newlines:

         This task:
         ====================================================
         - name: positional backrefs embedded newline hacks
           lineinfile:
              dest: /tmp/testconfig.cfg
              regexp: '^(TESTCONFIGVAR9)=(.*)'
              line: '# \1 modified {{mod_timestamp_long}}
                    \n# \1 = (old value was) \2
                    \n\1=newvalue'
              backrefs: yes
              state: present
         ====================================================
         Produces this output:
         ====================================================
         # TESTCONFIGVAR9 modified 20170504T001157
         # TESTCONFIGVAR9 = (old value was) test
         TESTCONFIGVAR9=newvalue
         ====================================================


Another way is to split up the task:

         These tasks:
         ====================================================
         - name: another attempt at custom mod notes, step 1
           lineinfile:
              dest: /tmp/testconfig.cfg
              regexp: '^(TESTCONFIGVAR6=.*)'
              line: '# OLD VALUE: \1 {{ mod_timestamp_long }}'
              backrefs: yes
         - name: another attempt at custom mod notes, step 2
           lineinfile:
              dest: /tmp/testconfig.cfg
              insertafter: '# OLD VALUE: '
              line: 'TESTCONFIGVAR6=blahblahblah'
         ====================================================
         Results in this output:
         ====================================================
         # OLD VALUE: TESTCONFIGVAR6=test 20170504T001157
         TESTCONFIGVAR6=blahblahblah
         ====================================================


If anybody is reading this, I am open to suggestions (since I am still learning this at the moment).

JondZ Thu May  4 00:16:35 EDT 2017

Wednesday, April 05, 2017

today's random thoughts

As I write this my home server is down; I was learning GlusterFS when I accidentally rebooted the Xen server that was holding all my virtual machines.  It has been a few minutes; this is unusual, so the server may have crashed.

Anyhow--

Today's lesson is: FIX HOSTNAMES FIRST before setting up GlusterFS.  GlusterFS needs working hostname resolution in order to work.  Gluster is miserable with broken DNS.

It also does not help that somewhere along the line something, or somebody (*cough* ISP *cough*), modifies DNS queries so that failed resolutions return some far-off IP address.

Specifically make sure these work and actually point to your servers:

       node-testing-1.yoursubdomain.domain.net
       node-testing-2.yoursubdomain.domain.net
       node-testing-3.yoursubdomain.domain.net
       node-client-test.yoursubdomain.domain.net

ALSO make sure these work and actually point to your servers (this is the part where something in the DNS query path might return a random IP address, making the gluster server contact some unknown far-off host):

      node-testing-1
      node-testing-2
      node-testing-3
      node-client-test

The way I did this was to put "yoursubdomain.domain.net" in the "search" parameter of /etc/resolv.conf.  Others will probably just put the entries in /etc/hosts.  Whatever works.
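
A quick pre-flight loop makes this check painless; here is a sketch (the node names are placeholders for your own hosts):

```shell
# Sketch: verify that every node name resolves before installing gluster.
check_names() {
    for name in "$@"; do
        if getent hosts "$name" > /dev/null; then
            echo "OK $name"
        else
            echo "FAIL $name"
        fi
    done
}

# Run it against both the FQDNs and the bare hostnames from the lists above:
check_names node-testing-1.yoursubdomain.domain.net node-testing-1
```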

By the way, configuring the search parameter in resolv.conf differs between Debian- and RedHat-derived distributions.  For Debian-derived it is best to install "resolvconf" and put a keyword in /etc/network/interfaces; for RedHat-derived it is easier to just use "nmtui" or put a keyword in the appropriate /etc/sysconfig/network-scripts/ifcfg-* file.

My server is back online...thank you for reading this.

JondZ 20170505
