NVMe I/O errors on Proxmox

I migrated my system from an (unstable) NUC to a home-built AMD-based tower system with a little more space for cooling. After a couple of days the system crashed, reporting I/O errors on the secondary NVMe.

After some investigation this turned out to be related to an SSD power-saving feature, Autonomous Power State Transitions (APST), which puts the NVMe (partially) to sleep and was causing the freezes. To disable it I added nvme_core.default_ps_max_latency_us=0 to the kernel command line in /etc/default/grub:


GRUB_TIMEOUT=10

GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"

GRUB_CMDLINE_LINUX=""
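
The change only takes effect after the GRUB configuration is regenerated and the system is rebooted. Below is a minimal sketch for checking the result afterwards. The helper name apst_disabled is my own, and I assume the nvme_core module exposes its parameter under /sys/module, as it does on current kernels.

```shell
# After editing /etc/default/grub (as root):
#   update-grub && reboot

# Hypothetical helper: succeeds when the APST latency limit is 0 (APST disabled).
# An alternative parameter file can be passed as $1.
apst_disabled() {
    param_file="${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
    [ "$(cat "$param_file" 2>/dev/null)" = "0" ]
}
```

After the reboot, apst_disabled && echo "APST is disabled" should print the message.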

Mount disk based on UUID

Device names such as sda and sdb are assigned in the order in which the kernel detects the disks, so they can change between reboots: sdb1 and sda1 may swap, for example. This is easy to miss when the disks are exactly the same brand and type, which is quite normal on virtualisation platforms. To solve this we need to mount those disks based on their UUID.

Use the following command to retrieve the disk UUID:

blkid

Result:

/dev/sdb1: UUID="c924eac3-eb44-453e-8eb3-e9c8c8afebc0" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="f61232c6-01"
/dev/sda5: UUID="64fbe892-0cb1-4854-87ee-775dea97f60d" TYPE="swap" PARTUUID="cfd518f5-05"
/dev/sda1: UUID="c2f61b6b-4f90-48b7-a823-2b0a6863eb6e" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="cfd518f5-01"

Determine which disk you want to mount and adjust /etc/fstab accordingly:

UUID=c924eac3-eb44-453e-8eb3-e9c8c8afebc0       /data   ext4    defaults        0       0
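
To avoid copying the UUID by hand, the line can also be generated. This is a minimal sketch: fstab_line is a hypothetical helper, and /dev/sdb1, /data and ext4 are the values from the example above.

```shell
# Hypothetical helper: print an fstab line for a UUID, mount point and file system type.
fstab_line() {
    printf 'UUID=%s\t%s\t%s\tdefaults\t0\t0\n' "$1" "$2" "$3"
}

# Example (as root):
#   fstab_line "$(blkid -s UUID -o value /dev/sdb1)" /data ext4 >> /etc/fstab
#   mount -a    # mounts everything in /etc/fstab; an error here points at a bad entry
```

Running mount -a before the next reboot is a cheap way to catch a typo in /etc/fstab while the system is still reachable.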

Writing top output to file

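In this case we sort by memory usage and filter on dotnet8 processes. The options run top in batch mode (-b), show the full command line (-c), set the output width to 200 columns (-w 200), sort by %MEM (-o %MEM) and take a sample every 30 seconds (-d 30):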

top -b -c -w 200 -o %MEM -d 30 | grep dotnet8 >> top-output.log

Install a clean Kubernetes cluster on Debian 12

Ensure that /etc/hosts contains the same entries on all three machines:

192.168.20.218 K8S01.verhaeg.local K8S01
192.168.20.219 K8S02.verhaeg.local K8S02
192.168.20.220 K8S03.verhaeg.local K8S03

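Disable swap (by default the kubelet refuses to start while swap is enabled). The unit name dev-sda3.swap below is specific to my machine: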

systemctl --type swap

  UNIT          LOAD   ACTIVE SUB    DESCRIPTION
  dev-sda3.swap loaded active active Swap Partition

systemctl mask dev-sda3.swap
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
reboot
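
The sed expression above only matches when swap is surrounded by spaces, so it can miss fstab lines that use tabs. A slightly more defensive sketch follows. comment_swap is a hypothetical helper that prints the modified content instead of editing the file, so the result can be reviewed first.

```shell
# Hypothetical helper: print the content of an fstab file with every
# not-yet-commented swap entry commented out. The file itself is not changed.
comment_swap() {
    sed -E '/^[[:space:]]*#/! s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$1"
}

# Example (as root):
#   comment_swap /etc/fstab > /tmp/fstab.new   # review, then copy over /etc/fstab
#   swapoff -a                                 # turns swap off immediately
```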

Prepare the installation of containerd:

cat <<EOF | tee /etc/modules-load.d/containerd.conf 
overlay 
br_netfilter
EOF

modprobe overlay && modprobe br_netfilter

cat <<EOF | tee /etc/sysctl.d/99-kubernetes-k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system

Install containerd:

apt-get update && apt-get install containerd -y

Configure containerd so that it works with Kubernetes:

containerd config default | tee /etc/containerd/config.toml >/dev/null 2>&1

Both the kubelet and the container runtime use control groups (cgroups) to enforce resource management for pods and containers, such as CPU and memory requests and limits. To do so, both must use the same cgroup driver. On a systemd-based distribution such as Debian, set SystemdCgroup to true in the runc options on all the nodes:

nano /etc/containerd/config.toml

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
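
With three nodes to configure, the same change can be made without opening an editor. This is a minimal sketch, assuming the generated default configuration contains the line SystemdCgroup = false. The helper name enable_systemd_cgroup is hypothetical.

```shell
# Hypothetical helper: switch the runc cgroup driver to systemd in a
# containerd configuration file. A backup is kept as <file>.bak.
enable_systemd_cgroup() {
    sed -i.bak 's/SystemdCgroup = false/SystemdCgroup = true/' "$1"
}

# Example (as root, on every node):
#   enable_systemd_cgroup /etc/containerd/config.toml
#   grep SystemdCgroup /etc/containerd/config.toml
```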

Restart and enable containerd on all nodes:

systemctl restart containerd && systemctl enable containerd

Add Kubernetes apt repository:

apt-get install curl gpg -y
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /" | tee /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

Install Kubernetes tools:

apt-get update && apt-get install kubelet kubeadm kubectl -y && apt-mark hold kubelet kubeadm kubectl

Install the Kubernetes cluster with kubeadm. Passing kubelet settings as command-line options is deprecated, so I suggest putting the configuration in a file instead, for example kubelet.yaml, with the following content.

Create the kubelet.yaml file on the master node (K8S01):

nano kubelet.yaml

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: "1.28.11" # Use a version from the installed kubeadm release (v1.28 repository above)
controlPlaneEndpoint: "K8S01"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

Initialise the cluster:

kubeadm init --config kubelet.yaml --upload-certs

Result:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8s01.verhaeg.local:6443 --token 965cpz.xvmun07kjrezlzg9 \
        --discovery-token-ca-cert-hash sha256:3ea38e43e5304e0124e55cd5b3fb00937026a2b53bc9d930b6c2dab95482225a \
        --control-plane --certificate-key e48ada5b6340b8e217bcf4c7c5427ae245704be43eee46c07bfa0b6e1c4abdd8

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s01.verhaeg.local:6443 --token 965cpz.xvmun07kjrezlzg9 \
        --discovery-token-ca-cert-hash sha256:3ea38e43e5304e0124e55cd5b3fb00937026a2b53bc9d930b6c2dab95482225a

To start interacting with the cluster, run the following commands on the master node:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Let the other nodes join the cluster as worker nodes:

kubeadm join k8s01.verhaeg.local:6443 --token bcd2xw.32pzfgroijg1sax3 \
        --discovery-token-ca-cert-hash sha256:0c0f18cf32bc2342024efce9313e0e4fcf8a2b87275fd33e9ceb853d77b41f8b

Result:

root@K8S01:~# kubectl get nodes
NAME    STATUS     ROLES           AGE   VERSION
k8s01   NotReady   control-plane   62s   v1.28.11
k8s02   NotReady   <none>          26s   v1.28.11
k8s03   NotReady   <none>          21s   v1.28.11

The nodes remain NotReady until a pod network is deployed. Install Calico (container networking and security):

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
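
It can take a few minutes before the Calico pods are running and the nodes report Ready. Below is a small sketch for checking this from a script. not_ready_nodes is a hypothetical helper that reads the output of kubectl get nodes --no-headers on standard input.

```shell
# Hypothetical helper: print the names of all nodes whose STATUS column is
# not exactly "Ready". Cordoned nodes ("Ready,SchedulingDisabled") are listed too.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Example:
#   kubectl get nodes --no-headers | not_ready_nodes
# Or let kubectl block until all nodes are Ready:
#   kubectl wait --for=condition=Ready nodes --all --timeout=300s
```

The helper prints nothing once every node is Ready.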

Moving measurement data between InfluxDB databases

I want to move my energy measurement data to another InfluxDB database on the same server to create a new downsampling policy.

select * into Verhaeg_Energy..[measurement_name_destination] from Verhaeg_IoT..[measurement_name_source] group by *

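Note the .. between the database name and the measurement name: it leaves out the retention policy, so the default retention policy of each database is used. The group by * clause keeps the tags as tags in the destination; without it they are converted to fields.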
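
For a large measurement it may be safer to copy the data in time-bounded chunks instead of all at once. This is a sketch under assumptions: copy_query is a hypothetical helper that only builds the query text, the measurement name energy and the dates are placeholders, and the measurement keeps the same name in the destination database.

```shell
# Hypothetical helper: print an InfluxQL query that copies one measurement
# from a source to a destination database for a given time range.
# $1 source db, $2 destination db, $3 measurement, $4 start, $5 end (RFC3339)
copy_query() {
    printf "SELECT * INTO \"%s\"..\"%s\" FROM \"%s\"..\"%s\" WHERE time >= '%s' AND time < '%s' GROUP BY *\n" \
        "$2" "$3" "$1" "$3" "$4" "$5"
}

# Example with the InfluxDB 1.x command-line client:
#   influx -execute "$(copy_query Verhaeg_IoT Verhaeg_Energy energy 2024-01-01T00:00:00Z 2024-02-01T00:00:00Z)"
```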

InfluxDB api unavailable after x attempts

The InfluxDB start-up script checks if the HTTP service is running by trying to connect to it. However, I have disabled HTTP in my configuration and use HTTPS. This behavior is also described on GitHub.

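You can work around this issue by adjusting the InfluxDB service file (/etc/systemd/system/influxd.service) so that systemd starts influxd directly instead of through the start-up script. The commented-out lines are the old configuration; the line above each of them is its replacement. Run systemctl daemon-reload and restart the service afterwards.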

[Unit]
Description=InfluxDB is an open-source, distributed, time series database
Documentation=https://docs.influxdata.com/influxdb/
After=network-online.target

[Service]
User=influxdb
Group=influxdb
LimitNOFILE=65536
EnvironmentFile=-/etc/default/influxdb
ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf $INFLUXD_OPTS
#ExecStart=/usr/lib/influxdb/scripts/influxd-systemd-start.sh
KillMode=control-group
Restart=on-failure
Type=simple
#Type=forking
PIDFile=
#PIDFile=/var/lib/influxdb/influxd.pid

[Install]
WantedBy=multi-user.target
Alias=influxd.service

Downsampling smart meter data with InfluxDB

This article is based on the official InfluxDB documentation on Downsampling and data retention.

I’m using a P1 (smart energy meter) database for this example.

First, change your default retention policy and create at least one additional retention policy:

CREATE RETENTION POLICY "168_hours" ON "P1_External" DURATION 168h REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "2yr" ON "P1_External" DURATION 104w REPLICATION 1

Create a test query that summarizes the data that needs to be stored in the downsampled data:

SELECT mean("current_delivery") as "current_delivery", mean("current_usage") as "current_usage", last("total_usage_gas") as "total_usage_gas", last("total_usage_t1") as "total_usage_t1", last("total_usage_t2") as "total_usage_t2", last("total_delivery_t1") as "total_delivery_t1", last("total_delivery_t2") as "total_delivery_t2" FROM energy_p1_actual GROUP BY "name", time(1h) ORDER BY time DESC LIMIT 10

Then, define a continuous query from this:

CREATE CONTINUOUS QUERY "cq_60m" on "P1_External" BEGIN SELECT mean("current_delivery") as "current_delivery", mean("current_usage") as "current_usage", last("total_usage_gas") as "total_usage_gas", last("total_usage_t1") as "total_usage_t1", last("total_usage_t2") as "total_usage_t2", last("total_delivery_t1") as "total_delivery_t1", last("total_delivery_t2") as "total_delivery_t2" INTO "2yr"."energy_p1_history" FROM energy_p1_actual GROUP BY "name", time(1h) END

Because the query groups by time(1h), the continuous query runs every hour and summarizes the previous hour of data into the "2yr" retention policy.

Configure TLS for Mosquitto using a self-signed certificate

This article describes how to configure TLS for Mosquitto using a self-signed certificate. I assume that Mosquitto is installed and running.

Browse to the right directory:

cd /etc/mosquitto/certs

Generate an RSA private key for the certificate authority (CA) using OpenSSL and put it in the Mosquitto directory for certificates. The -des3 option encrypts the key with a 3DES passphrase:

openssl genrsa -des3 -out ca.key 2048

Generate the self-signed CA certificate using the CA key:

openssl req -new -x509 -days 3650 -key ca.key -out ca.crt

Copy the certificate to the right directory:

sudo cp ca.crt /etc/mosquitto/ca_certificates/

Generate an RSA private key for the server (without a passphrase, so Mosquitto can read it unattended):

openssl genrsa -out server.key 2048

Generate a certificate signing request (CSR) for the server key:

openssl req -new -out server.csr -key server.key

Sign the CSR with the CA to create the server certificate:

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 3650
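
Before pointing Mosquitto at the new files, it can help to check that the server certificate was really signed by the CA. This is a minimal sketch: verify_chain is a hypothetical helper around openssl verify, and the file names are the ones generated above.

```shell
# Hypothetical helper: succeeds when the certificate in $2 validates
# against the CA certificate in $1.
verify_chain() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Example:
#   cd /etc/mosquitto/certs
#   verify_chain ca.crt server.crt && echo "chain OK"
#   openssl x509 -noout -subject -issuer -in server.crt   # show who issued what
```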

Configure Mosquitto to listen for TLS connections:

cd /etc/mosquitto/conf.d
nano listener.conf

listener xxxx 192.168.x.x
cafile /etc/mosquitto/ca_certificates/ca.crt
certfile /etc/mosquitto/certs/server.crt
keyfile /etc/mosquitto/certs/server.key
require_certificate false

With require_certificate false, clients do not have to present a certificate of their own; I don’t enforce that.

Go to the certificates folder and give the right permissions to the generated certificates.

cd /etc/mosquitto/certs
chmod 400 server.key
chmod 444 server.crt
chown mosquitto server*

Restart the Mosquitto service:

systemctl restart mosquitto.service

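This is working for me now. While documenting this process I wondered whether I had mixed up the CA and server certificates in the Mosquitto configuration. As far as I can tell, cafile should point to the CA certificate and certfile and keyfile to the server certificate and key, which is what the configuration above does. Something to double-check later.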

Make Proxmox VLAN aware

I’m using Proxmox as a hypervisor to run my virtual machines and use two VLANs in my home network: one for normal traffic and a separate one for IoT traffic. Each virtual machine should be connected to one of those networks. The normal network is typically untagged (VLAN ID 20), while the IoT traffic is tagged with VLAN 21.

Configuration file: /etc/network/interfaces

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

auto vmbr0.20
iface vmbr0.20 inet static
        address 192.168.20.x/24
        gateway 192.168.20.1

This should result in the following Proxmox network configuration:

[Image: Proxmox host system network configuration]

Now you can easily add a network adapter to a virtual machine and tag it with the correct VLAN.

[Image: Virtual machine network adapter configuration, including tagged VLAN]
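
The same adapter can also be configured from the Proxmox command line. This is a sketch under assumptions: vlan_net_option is a hypothetical helper, the VM ID 100 is a placeholder, and setting net0 this way replaces the existing adapter (with a newly generated MAC address).

```shell
# Hypothetical helper: build the value for the --net0 option of qm set,
# using a virtio adapter on the given bridge with the given VLAN tag.
vlan_net_option() {
    printf 'virtio,bridge=%s,tag=%s\n' "$1" "$2"
}

# Example (on the Proxmox host):
#   qm set 100 --net0 "$(vlan_net_option vmbr0 21)"
#   ifreload -a    # applies changes to /etc/network/interfaces (ifupdown2)
```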

Remove X11 from Raspberry Pi

sudo apt-get remove --purge x11-common
sudo apt-get autoremove
sudo apt-get update --allow-releaseinfo-change