Monday, March 08, 2021

Creating ipip tunnels in RedHat 8 that are compatible with LVS/TUN.

Every few seasons I build a virtual LVS (Linux Virtual Server) mock-up just to keep in shape.  The LVS-TUN (LVS tunnel mode) ipip configuration for RH8 had me stumped for a while.  

There was a change from RH7 to RH8 in that the network scripts are being deprecated, so the /etc/sysconfig/network-scripts files behave a bit differently.  For one thing, I could not build a reliable ipip tunnel interface in RH8 even after installing and using the traditional RH-style network scripts.  

After a lot of difficult tests, I am posting the solution for keeping an ipip tunnel interface up on RH8.  Here it is:

# In this example 192.168.131.239 is the RIP (Real Server IP) IP address, 
# 192.168.131.235 is the VIP address.
nmcli con add type ip-tunnel ip-tunnel.mode ipip con-name tun0 \
          ifname tun0 remote 0.0.0.0 local 192.168.131.239
nmcli con mod tun0 ipv4.addresses 192.168.131.235
nmcli con mod tun0 ipv4.method manual
nmcli con up tun0

The key thing to remember is that the remote IP should be 0.0.0.0.  In RH7 you could use an ipip tunnel for LVS-TUN without specifying remote and local IPs.  
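For a quick sanity check, here is a sketch (not from the original setup) of how to confirm the tunnel exists and carries the VIP, plus the usual LVS-TUN real-server sysctls; the sysctl values are assumptions, since every site tunes them differently:

# Verify the tunnel device and its address (tun0 and the VIP from above)
ip -d link show tun0
ip addr show dev tun0

# Typical LVS-TUN real-server settings (assumed values, adjust for your site):
# keep the real server from answering ARP for the VIP, relax rp_filter on tun0
sysctl -w net.ipv4.conf.tun0.arp_ignore=1
sysctl -w net.ipv4.conf.tun0.arp_announce=2
sysctl -w net.ipv4.conf.tun0.rp_filter=0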

If this helps somebody out there with a similar problem, please let me know by replying to this blog.



Wednesday, February 17, 2021

Raspberry Pi simple fan control script

Here is a simple fan control script that I made to turn a Raspberry Pi fan on and off.  It turns the fan on when the temperature reaches a threshold (in this case 60 degrees Celsius, i.e. 60000 in the kernel's millidegree units).  

#!/bin/bash
#
# Wed Feb 17 19:14:54 EST 2021 JondZ very simple fan controller
GPIO_PIN=4
GPIO_NAME=gpio${GPIO_PIN}
GPIO_PATH=/sys/class/gpio/$GPIO_NAME/value
TEMP_THRESHOLD=60000
TEMP_SOURCE_FILE=/sys/class/thermal/thermal_zone0/temp
# 600: 10 minutes
# 900: 15 minutes
# 1800: 30 minutes
SLEEP_TIME=600
SLEEP_POLL=10
function get_temp {
         if [[ -e $TEMP_SOURCE_FILE ]] ; then
            cat $TEMP_SOURCE_FILE
         fi
         }
function init_ports {
         if [[ ! -e /sys/class/gpio/$GPIO_NAME ]]; then
            echo $GPIO_PIN > /sys/class/gpio/export && \
            sleep 5 && \
            echo out > /sys/class/gpio/$GPIO_NAME/direction
         fi
         }
init_ports
function turn_on_fan {
         temp=$(get_temp)
         if [[ -e $GPIO_PATH ]]; then
            v=$(< $GPIO_PATH )
            if [[ $v == 0 ]]; then
               echo "$(date) fan on ($temp)"
               echo 1 > $GPIO_PATH
            fi
         fi
         }
function turn_off_fan {
         temp=$(get_temp)
         if [[ -e $GPIO_PATH ]]; then
            v=$(< $GPIO_PATH )
            if [[ $v == 1 ]]; then
               echo "$(date) fan off ($temp)"
               echo 0 > $GPIO_PATH
            fi
         fi
         }
while :
   do
   CT=$(get_temp)
   if [[ $TEMP_THRESHOLD -lt $CT ]]; then
      turn_on_fan
      sleep $SLEEP_TIME
   else
      turn_off_fan
   fi
   sleep $SLEEP_POLL
   done
 

Here is the systemd unit file:

[Unit]
Description=Simple Fan Controller
DefaultDependencies=no
After=sysinit.target

[Service]
Type=simple
ExecStart=/root/scriptsd/system-fan-control.sh

[Install]
WantedBy=graphical.target multi-user.target

As far as I can see it is working fine.
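To install and enable it, assuming the unit file is saved as /etc/systemd/system/system-fan-control.service (the filename here is just my choice), something like this should do:

systemctl daemon-reload
systemctl enable --now system-fan-control.service
systemctl status system-fan-control.service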

Here is a picture of the circuit that I soldered.  I copied the circuit from some other site on the web.  It consists of a resistor, a transistor, and a diode.


Here is what the fan circuit looks like on my setup.  The case is already crowded with an OLED display and an RTC clock:



There may be minor bugs.  I was tired from soldering, so I wrote the code in a few minutes' rush; comments are welcome...

EP 20210217


Sunday, February 16, 2020

RH/CentOS 8 dummy network interface

This is how to set up a dummy interface on RedHat/CentOS 8.x.  I cannot make the old-style init scripts in /etc/sysconfig/network-scripts/ work anymore for a dummy network interface.

It's all NetworkManager now--here is what I pieced together from surfing the web.  This one stays up even after a reboot:

nmcli connection add type dummy ifname dummy0 con-name dummy0
nmcli con mod dummy0 ipv4.addresses 192.168.8.88/32
nmcli con mod dummy0 ipv4.method manual autoconnect yes
nmcli con up dummy0
nmcli con show
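A quick way to verify that the interface and its address actually came up (using the same names as above):

ip addr show dev dummy0
nmcli con show dummy0 | grep -E 'ipv4.addresses|autoconnect'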


Monday, September 16, 2019

Old DDS/4mm Tape drive speed issues

OK, so I bought a tape drive from eBay: it's a Dell PowerVault 100T DDS4 tape drive (internally a Seagate STD1400LW).  I really like the 4mm tape format because the tapes are so small--just perfect for freezing my usual data output (notes, documents).

At first I thought I had bought a lemon, since the drive was super slow.  As usual I played around with block sizes.  Normally I leave the block size at 512, which is the default I usually find on tape drives.  I would usually do this:

mt -f /dev/st1 setblk 512  # if necessary
tar cvfp /dev/st1 ....

Or sometimes, if I want the default tar blocking factor to match the hardware block size:

mt -f /dev/st1 setblk 10240 # that's 20 tar blocks of 512 each
tar cvfp /dev/st1 ...

But the tape drive was SUPER SLOW no matter what I tried... until I tried 4096 (the page size?) on a whim, and the drive just flew!  So this is how to make this drive spin fast:

mt -f /dev/st1 setblk 4096 # page size
tar cvfpb /dev/st1 8 ...
tar tvfb /dev/st1 8 .....

What a surprise that was... and I have been using tapes for years.

--------------UPDATE-------------

20190918

The slow speed is for READING BACK DATA, not writing it.  For some reason the tape drive stalls when delivering data in transfers larger than 4k.  Therefore, to read back data, specify transfer buffers of 4k or less:

mbuffer_style#  mbuffer -s 4096 -i /dev/st2 | tar tvf -
normal_tar# tar xvfpb /dev/st2 8
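To double-check what block size the drive is currently set to before reading back, mt can report it (same device path as above); the "Tape block size" line in the output should show 4096:

mt -f /dev/st2 status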



JondZ 20190917

Wednesday, July 03, 2019

pacemaker fail count timing experiments

Occasionally we are troubled by failure alerts; since I installed the check_crm Nagios monitor plugin, the alerts seem more 'sensitive'.  I have come to understand that Pacemaker needs manual intervention when things fail.  The thing that fails most often on our site is the VMware fencing---every few weeks one or two logins to VMware fail, forcing me to log in and issue a "pcs resource cleanup" to reset the failure.

I am doing this experiment now to understand the parameters that I need to adjust on a live system.  These observations were done on a RH7.6 cluster, with a dummy shell script as a resource.

dummy shell script, configured to fail:

#! /bin/bash
exit 1 # <-- this is removed when we want the result to succeed
while :
  do
  date
  sleep 5
  done > /tmp/do_nothing_log
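For reference, one possible way (an assumption on my part, not necessarily how it was wired up here) to run such a script as a cluster resource is the ocf:heartbeat:anything agent:

# hypothetical resource name and script path
pcs resource create do_nothing ocf:heartbeat:anything \
    binfile=/usr/local/bin/do_nothing.sh \
    op monitor interval=30s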


OBSERVATIONS 1:

  • CLUSTER CONDITIONS: cluster property "start-failure-is-fatal" left at its default (true).
  • RESOURCE CONDITIONS: defaults
  • RESULT: node1 is tried ONCE, node2 is tried ONCE, then nothing is tried again.  When a resource fails the fail count is immediately set to INFINITY (1000000).  This is why the documentation says "the node will no longer be allowed to run the failed resource" until a manual intervention happens.
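The fail count that drives this behavior can be inspected directly (resource and node names below are placeholders):

pcs resource failcount show resname
crm_failcount --query -r resname -N node1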
OBSERVATIONS 2:

  •  CLUSTER CONDITIONS: "start-failure-is-fatal" to FALSE ("pcs property set start-failure-is-fatal=false; pcs resource cleanup")
  • RESOURCE CONDITIONS: defaults
  • RESULT: the cluster keeps trying to restart the resource on node1 nonstop (to infinity?).  It does not appear to try restarting it on another node.
OBSERVATIONS 3:
  • CLUSTER CONDITIONS: "start-failure-is-fatal" set to FALSE ("pcs property set start-failure-is-fatal=false; pcs resource cleanup")
  • RESOURCE CONDITIONS:  migration-threshold=10 ("pcs resource update resname meta migration-threshold=10; pcs resource cleanup")
  • RESULT:  Resource is retried 10 times on node1, then retried 10 times on node2, then retried no longer.
OBSERVATIONS 4:
  • CLUSTER CONDITIONS:  start-failure-is-fatal=false,  cluster-recheck-interval=180.
  • RESOURCE CONDITIONS:  migration-threshold=10 and failure-timeout=2min ("pcs resource update resname  meta failure-timeout=2min")
  • RESULT: Resource is retried 10 times on node1, then 10 times on node2.  Errors are cleared after 2 minutes.  After that, the resource is tried ONCE on node1 but 10 times on node2 every cluster-recheck-interval (3 minutes).  That's because the error condition is gone but the counters do not necessarily reset (though sometimes they do on other nodes when it is tried on one node).  The full set of commands for this combination is sketched below.
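For convenience, here is the combination from observation 4 in one place (the resource name "resname" is a placeholder):

pcs property set start-failure-is-fatal=false
pcs property set cluster-recheck-interval=180
pcs resource update resname meta migration-threshold=10 failure-timeout=2min
pcs resource cleanup resname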
GROUP RESOURCES CONSIDERATION:
  • I am unable to apply migration-threshold and failure-timeout to a group at this moment; they still seem to be properties of the individual resources. 
  • Update the resource meta attributes (pcs resource update resname meta ...) as usual, regardless of whether the resource is part of a group; behavior should be as expected (the group proceeds from one resource to the next in the list anyway).

JondZ 20190703

Friday, March 15, 2019

pacemaker unfencing errors

While testing Pacemaker clustering with iSCSI (on RedHat 8 beta) I came upon this error:

Pending Fencing Actions:
* unfencing of rh-8beta-b pending: client=pacemaker-controld.2603, origin=rh-8beta-b

It took me almost the whole morning to understand how to clear the error.  Since the stonith resource includes the clause "meta provides=unfencing", the fencing agent is expected to handle unfencing, which means the fix is simply to reboot the node (rh-8beta-b in this case).

RedHat documentation explains this as well: " ...The act of booting in this case implies that unfencing occurred..."
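For context, here is a minimal sketch of how a stonith resource ends up with that clause; the agent (fence_scsi) and the device path are assumptions, not the exact configuration of this cluster:

pcs stonith create fence_iscsi fence_scsi \
    devices=/dev/disk/by-id/example-shared-disk \
    meta provides=unfencing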

Wednesday, October 31, 2018

VDO Troubles

Once in a while, on very large VDO volumes, storage just disappears.  In my case I had one 42TB volume that just refused to come up at boot.  Here are some of the symptoms:
  • Boot dropped into a single-user shell
  • "vdo start" says the vdo volume was already started, but there is no entry in /dev/mapper/.
In this case I retried "vdo stop" then "vdo start" a few times until the vdo volume became available.

Further observation suggests that systemd kills the vdo startup process.  What happened was that vdo was interrupted during its long boot-up---a timing issue.

A working solution for me was to include a timeout value in /etc/fstab for the particular vdo volumes:

# /etc/fstab
 /dev/vgsample/lvsample /somedirectory ext4 defaults,discard,x-systemd.requires=vdo.service,x-systemd.device-timeout=10min 0 0

Working so far, survived a few reboots.
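A complementary tweak that might also help (untested here, just a sketch) is to give vdo.service itself a longer start timeout with a drop-in, e.g. /etc/systemd/system/vdo.service.d/timeout.conf, followed by "systemctl daemon-reload":

[Service]
TimeoutStartSec=10min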

JondZ 20181031
