Sunday, March 22, 2026

Setting Up a 4-Node GlusterFS Cluster on CentOS 9: Is it Still Worth it?


Hi there. In this blog post, I'll set up a storage cluster on Linux machines using a hyperconverged architecture, but first let me clarify what this mumbo jumbo means. By hyperconverged, I mean that I don't have any disk enclosure or any hardware dedicated to storing data. Instead, I have multiple identical machines, each with a bunch of unused disks. I'll unify this disk space in a redundant and highly available configuration and serve it to other machines, i.e. clients. It's important to have identical machines, because files will be split into chunks and these chunks will be distributed across all nodes for redundancy and performance. Therefore, once the smallest disk in the cluster fills up, new file chunks can no longer be written to all disks. Additionally, the CPU and memory should also be equal, so that a machine that finishes writing doesn't have to wait for the other nodes.

I'll be using GlusterFS as the distributed file system, primarily because its initial setup is relatively simple, even though advanced configuration can get quite complicated. I won't be going into those details in this article. As an alternative, I could have used Ceph or Quobyte, but Ceph, in my opinion, is much more complicated than GlusterFS. I worked with Quobyte on a project, and unlike GlusterFS and Ceph, its setup and management are incredibly easy. However, GlusterFS is (was) directly supported by RHEL. But I'll get to that.

Normally, servers with terabyte-sized storage are used for this kind of project. To demonstrate a proof of concept, I'll use four machines with 20 GB disks and share a total of 60 GB of disk space with 75% efficiency, calculated as 20 GB * 75% * 4 = 60 GB. The remaining 25% of the space will be used for high availability and error correction. The usable capacity is directly proportional to the capacity of the underlying disks, but the number of machines plays a more important role. If I were to set up this cluster with three machines, the recommended configuration would be a RAID1-like setup with 33% efficiency. This means you get a 20 GB shared disk from three machines, each with a 20 GB disk. Of course, the cluster could also be set up using RAID0-like logic, but in that case, rebooting even a single machine in the cluster leads to data corruption and loss. A four-machine cluster offers quite high efficiency in terms of usable space with the smallest number of nodes and without compromising on high availability. In this configuration, the disks work under RAID5-like logic, and the system isn't affected even if one machine crashes or gets rebooted. On the other hand, I mentioned that the configuration could get complicated. For example, four machines could also be configured with RAID10-like logic, where half of the file chunks are stored on one node and the other half on another, with the remaining two machines mirroring the data of the first two. In this case, the system can withstand the failure of two machines, as long as these two machines are not in the same mirror group, but the storage efficiency drops to 50%.
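The capacity figures above can be reproduced with a bit of shell arithmetic. This is only a sketch: the two function names are made up for illustration, and the formulas simply restate the RAID-like reasoning from the paragraph (data bricks times brick size for the RAID5-like layout, number of mirror groups times brick size for the mirrored layouts).

```shell
# usable_dispersed_gb BRICKS REDUNDANCY BRICK_GB
# RAID5-like layout: every brick except the redundancy bricks holds data.
usable_dispersed_gb() {
    echo $(( ($1 - $2) * $3 ))
}

# usable_replicated_gb BRICKS COPIES BRICK_GB
# Mirrored layout: BRICKS / COPIES mirror groups, each as large as one brick.
usable_replicated_gb() {
    echo $(( $1 / $2 * $3 ))
}

usable_dispersed_gb 4 1 20     # four nodes, RAID5-like  -> 60
usable_replicated_gb 3 3 20    # three nodes, RAID1-like -> 20
usable_replicated_gb 4 2 20    # four nodes, RAID10-like -> 40
```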

I will use the samba service to share the disk over the network. If there are only Linux clients on the network, NFS is also an option. Another reason to choose samba is that it can provide high availability through the ctdb service. Ctdb is part of samba, and it also supports NFS. On the other hand, if there are Windows clients on the network, too, samba is the only viable option. And if disk or directory authorization is going to be managed via Active Directory (AD), it is easier to handle this with samba.

As the distro, I will be using CentOS 9, but I also tested the GlusterFS commands and the samba configuration on Ubuntu. In that regard, the configuration steps are as distro-agnostic as possible, except for dnf. I will clone the machines from a cloud image.

First of all, since I'll be installing on multiple machines, I opened four panes in tmux, connected to one node in each pane, and ran tmux's set syn command (short for set synchronize-panes). This way, every command I type is sent to all machines simultaneously.

An update on four panes in tmux

After establishing an ssh session to each machine, I installed the necessary packages. The first line installs tools for troubleshooting. The second line installs the repository containing the GlusterFS packages, and the third line installs the glusterfs-server package. In the following lines, I install the samba-related packages, update the system and finally reboot it if necessary, using needs-restarting.

dnf -y install tcpdump telnet wget epel-release
dnf -y install centos-release-gluster9
dnf -y install glusterfs-server

dnf -y install samba cifs-utils samba-common-tools samba-winbind ctdb --enablerepo=resilientstorage

dnf update
needs-restarting -r || reboot

GlusterFS requires that the machines are able to reach each other using their hostnames. Normally, this should be done using DNS records. Since I'm working in a tiny test environment, I manually add the machines' IP addresses to their hosts files. I name the machines server01 through server04 as shown below. I then assign the hostnames to the machines using the second command. Of course, this command must be entered on each machine individually with its own name, and if you want bash to display the new hostname, you must log out and log back in (for Ubuntu, use hostnamectl set-hostname server01).

cat >> /etc/hosts << EOF
172.18.186.101   server01
172.18.186.102   server02
172.18.186.103   server03
172.18.186.104   server04
EOF

nmcli gen hostname server01
chronyc sources

GlusterFS is a very time-sensitive service, so ideally there should be no time difference between the machines. Therefore, I check whether chrony is running and synchronized using the last command above. By the way, Ubuntu ships its own NTP client instead of chronyd, but you can install chrony explicitly, and the system's own NTP client will be uninstalled automatically.

output of chronyc sources
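Since GlusterFS depends on every node name resolving, it may be worth double-checking the hosts file before going further. The following is a minimal sketch with a hypothetical helper, missing_hosts, which only inspects a hosts-format file and does not query DNS.

```shell
# missing_hosts HOSTS_FILE NAME...
# Prints every NAME that has no entry in HOSTS_FILE (hosts file format:
# IP address first, then one or more names, '#' starts a comment).
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Drop comments, then look for the name in any column after the IP
        if ! sed 's/#.*//' "$hosts_file" |
             awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                               END { exit !found }'; then
            echo "$name"
        fi
    done
}

# Example: prints nothing when all four nodes are listed
#   missing_hosts /etc/hosts server01 server02 server03 server04
```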

Now, setting up the disks. First, I check the disks using lsblk . My disk is /dev/vdb. To make it more flexible, I create a disk partition, and set up a logical volume using LVM inside this partition and then use it.

fdisk /dev/vdb
# I won't go into the partitioning details here. You can find how to do it in previous posts.

pvcreate /dev/vdb1
vgcreate vg_gluster /dev/vdb1
lvcreate -l 100%FREE -n lv_brick vg_gluster
mkfs.xfs -f -i size=512 /dev/vg_gluster/lv_brick
mkdir -p /data/glusterfs/brick1
echo "/dev/vg_gluster/lv_brick  /data/glusterfs/brick1  xfs  defaults  1  2" >> /etc/fstab
systemctl daemon-reload
mount -a

If everything has gone smoothly so far, the final command should not produce any output, and the newly mounted partition should appear in df -hP output. At this point, the machines are ready for GlusterFS installation.
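One caveat with the echo ... >> /etc/fstab pattern used above: if the commands are run a second time, the line is appended again. A small guard avoids this. This is a sketch with a hypothetical helper name, append_once, not something the distro provides.

```shell
# append_once FILE LINE
# Appends LINE to FILE only if an identical line is not already present.
append_once() {
    file=$1
    line=$2
    # -x: match whole lines only, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example with the brick entry from above:
#   append_once /etc/fstab "/dev/vg_gluster/lv_brick  /data/glusterfs/brick1  xfs  defaults  1  2"
```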

I create a GlusterFS volume called shared_storage using the commands below. The systemctl command has to be run on all machines. The remaining gluster commands are run on the first machine only.

systemctl enable --now glusterd

gluster peer probe server02
gluster peer probe server03
gluster peer probe server04
gluster volume create shared_storage disperse 4 redundancy 1 \
server01:/data/glusterfs/brick1/brick \
server02:/data/glusterfs/brick1/brick \
server03:/data/glusterfs/brick1/brick \
server04:/data/glusterfs/brick1/brick

gluster volume start shared_storage

gluster volume status shared_storage

gluster volume status shared_storage

In the image above, since I entered the command on all machines at the same moment, the first two machines are waiting for the command to finish on the others. This is normal. The command likely ran on server03 and server04 a few milliseconds apart. Normally, it would be run, and produce output, on a single machine only.

By the way, I do not make any configuration on the host firewall, as firewalld isn't running on my machines.

At this point, I have to use a little workaround for samba. Samba used to have GlusterFS integration, and it worked quite well [1]. However, Red Hat decided to end GlusterFS support as of the end of 2024 and shift its resources to Ceph [2][3]. For this reason, the integration has been removed from the distros [5]. In other words, the samba-vfs-glusterfs package is no longer available for Red Hat and its compatibles. Fedora still ships it [4], but I do not know for how long. The issue can be worked around as follows (to be run on all machines):

mkdir -p /mnt/gluster_shared
echo "localhost:shared_storage  /mnt/gluster_shared  glusterfs  defaults,_netdev  0  0" >> /etc/fstab
systemctl daemon-reload
mount -a

In this way, every machine mounts the GlusterFS shared disk on itself. Since I cannot get samba to communicate directly with GlusterFS, I mount this resource and serve it via samba as if it were a normal directory. The drawback is slightly lower performance, because file system operations have to go through the kernel twice instead of once [1].

Now I can configure ctdb and share the disk over the network, but before I do that, I need to set selinux to Permissive mode. A few sources mention that a ctdb cluster can also be set up without disabling selinux by adjusting only a couple of booleans, but that didn't work for me.

sed -i -e "s/SELINUX=enforcing/SELINUX=permissive/" /etc/selinux/config
setenforce 0

Only the IPs of the nodes should be in /etc/ctdb/nodes file. In /etc/ctdb/public_addresses file, you must enter the floating IP of the cluster and the network interface to which this IP will be assigned.

cat > /etc/ctdb/nodes << EOF
172.18.186.101
172.18.186.102
172.18.186.103
172.18.186.104
EOF

cat > /etc/ctdb/public_addresses << EOF
172.18.186.200/24    eth0
EOF

The contents of smb.conf file should be like this:

[global]
        netbios name = GLUSTER_CLUSTER
        workgroup = SAMBA
        #security = user
        clustering = yes

        passdb backend = tdbsam
        idmap config * : backend = tdb
        idmap config * : range = 1000000-1999999

        #printing = cups
        #printcap name = cups
        #load printers = yes
        #cups options = raw

[shared_storage]
    comment = GlusterFS
    path = /mnt/gluster_shared
    valid users = sambauser
    read only = no
    guest ok = yes
    create mask = 0664
    directory mask = 0775

I then configure ctdb as follows:

systemctl stop smb nmb
systemctl disable smb nmb
ctdb event script enable legacy 00.ctdb
ctdb event script enable legacy 10.interface
ctdb event script enable legacy 50.samba
systemctl enable --now ctdb

The enabled scripts are for ctdb to manage the floating IP and clustered samba. The status of the cluster can be checked using ctdb status command. If everything is configured correctly, all nodes should be "OK" in the output, a few seconds after running the last command. By the way, it is also possible to assign multiple floating IPs to the cluster, I only assigned one. The status of the IP(s) can be checked using ctdb ip all command.

ctdb status
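Because the nodes need a few seconds to become healthy, a small polling loop can replace repeated manual checks. This is a sketch under assumptions: wait_until and all_nodes_ok are hypothetical helper names, and all_nodes_ok assumes that ctdb status prints one line per node starting with "pnn:" and containing the state, with OK for healthy nodes.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds; gives up after TIMEOUT_SECONDS.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# all_nodes_ok: reads `ctdb status` output on stdin and succeeds only if
# there is at least one node line and none of them lacks the OK state.
all_nodes_ok() {
    nodes=$(grep '^pnn:') || return 1
    ! printf '%s\n' "$nodes" | grep -qv ' OK'
}

# Example: wait up to 30 seconds, checking every 2 seconds
#   cluster_ready() { ctdb status 2>/dev/null | all_nodes_ok; }
#   wait_until 30 2 cluster_ready && echo "all nodes OK"
```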

The final step is to create a user on all samba nodes:

groupadd -g 2000 sambagroup
useradd -u 2000 -g sambagroup -s /sbin/nologin sambauser
smbpasswd -a sambauser

The system password of sambauser isn't needed as samba keeps its own database.

With that, the storage cluster is up and running. Now I need a client to mount and test it. Setting this up is much easier. I run the following commands on a fifth machine on the same subnet:

dnf -y install centos-release-gluster9
dnf -y install glusterfs-fuse samba-client cifs-utils
mkdir -p /mnt/my_shared_storage

cat >> /etc/hosts << EOF
172.18.186.101   server01
172.18.186.102   server02
172.18.186.103   server03
172.18.186.104   server04
EOF

mount -t glusterfs 172.18.186.200:/shared_storage /mnt/my_shared_storage/
df -hP

With the first command, I install the glusterfs repository. With the second command, I install the package needed to mount glusterfs, along with the samba client. On the client, I also extend the hosts file with the list of all servers. Even though the client mounts the storage via the floating IP, it has to be able to resolve the names of the cluster nodes, because it then communicates with them directly. Of course, this step is not necessary in an environment with a DNS server. In the final step, I mount shared_storage as glusterfs, and when I check with df, a 60 GB disk is mounted. But I have not mounted it as a samba share yet.

Before mounting as samba, I can first view shared resources using the first command below, and then connect to this share using the second command:

smbclient -L //172.18.186.200 -U sambauser
smbclient //172.18.186.200/shared_storage -U sambauser

Before mounting this share, I unmount the GlusterFS volume mounted in the previous step, and then mount the resource to the same mount point as a samba share:

umount /mnt/my_shared_storage
mount -t cifs //172.18.186.200/shared_storage /mnt/my_shared_storage -o username=sambauser

And add the following line to /etc/fstab to make it permanent:

//172.18.186.200/shared_storage  /mnt/my_shared_storage  cifs  credentials=/etc/samba/user.cred,iocharset=utf8,_netdev 0 0

/etc/samba/user.cred has the login credentials.

cat >> /etc/samba/user.cred << EOF
username=sambauser
password=secret
domain=SAMBA
EOF

chmod go= /etc/samba/user.cred
chown root:root /etc/samba/user.cred
systemctl daemon-reload
mount -a
df -hP
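The commands above create the credentials file first and tighten its permissions afterwards. As a small variation, the file can be created with owner-only permissions from the start. This is a sketch: write_cred is a hypothetical helper, and unlike the append above, it replaces any existing file.

```shell
# write_cred FILE USERNAME PASSWORD DOMAIN
# The function body runs in a subshell, so the umask change does not leak
# into the calling shell.
write_cred() (
    umask 077          # files created from here on get mode 0600
    rm -f -- "$1"      # make sure the file is created fresh with that mode
    cat > "$1" << EOF
username=$2
password=$3
domain=$4
EOF
)

# Example:
#   write_cred /etc/samba/user.cred sambauser secret SAMBA
```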

I conclude this post here for now, as it has got long enough. I plan to explain how to integrate this with AD or LDAP in a future post.


Sources:

[1]: https://lalatendu.org/2014/04/20/glusterfs-vfs-plugin-for-samba/
[2]: https://en.wikipedia.org/wiki/Gluster#cite_ref-10
[3]: https://www.reddit.com/r/kubernetes/comments/zojdl7/whats_the_story_behind_the_abandonment_with/
[4]: https://pkgs.org/search/?q=samba-vfs-glusterfs
[5]: https://www.samba.org/samba/docs/4.7/man-html/vfs_glusterfs.8.html

Friday, February 20, 2026

Using CUPS on Raspberry 4 to Connect Canon MF4450 to Wi-Fi


Hi there. As the title suggests, I'll explain how I converted my old Canon MF4450 printer, which doesn't have a network port, into a shared network printer on my home network using a Raspberry Pi 4. This is actually quite easy to do, but I wanted to explain the steps I took and the difficulties I encountered along the way. Therefore, I first want to discuss the design before the actual configuration.

Note: In the rest of this article, I will use the abbreviation RPi instead of Raspberry Pi.


The first thing is selecting the platform. As I mentioned in the title, I worked with an RPi 4, or to be more precise, I had to work with it. I also have an RPi 1B at home, and in terms of hardware it's actually more than sufficient for this task. If this machine is going to be a dedicated print server, an 8 GB SD card (half of it just for the operating system) is fine as storage. And since the RPi 1B has no onboard WiFi adapter, a WiFi dongle is also needed. I'm using a 128 GB SD card on the RPi 4, because I run other services on it, but it is also mostly empty. The RPi 4 has onboard WiFi, so no additional hardware is required for wireless networking.

Using an RPi 4 solely for this task is clearly wasting resources, but here's what happened to me with RPi 1:


Selecting OS Distro

For the print server, I first planned to use the RPi 1B. It has a 32-bit CPU. The 32-bit version of Raspberry Pi OS (formerly Raspbian) can be downloaded from the official website. This is a Debian-based distro. I opted for the lite version, because even though the RPi 1 is able to launch a GUI, it is unusably slow. Apart from that, I also evaluated Alpine RPi, DietPi and piCore, but only found DietPi somewhat useful. It is also Debian-based and gave me the impression of being a customized version of RPi OS. However, the DietPi configuration script requires a working internet connection during the very first boot. I'll address this issue later.

The RPi 4 is 64-bit. For this, I downloaded the 64-bit version of RPi OS from its official website. I also downloaded Manjaro ARM, Ubuntu and Fedora IoT, but I didn't have much time to use the first two. Fedora IoT's approach is quite different from both classic Fedora and other distributions. It uses a tool called Ignition for the initial configuration, which is similar to cloud-init, but based on JSON. RPi OS, on the other hand, is configured using a classic startup script.


WiFi Config on Raspberry Pi 1

I had a TP-Link AC600 WiFi dongle to use with my RPi 1, and I realized that this stick is not recognized automatically, because the driver for its chipset (Realtek rtl8821au) is missing in RPi OS as well as in DietPi. There is a driver for this chipset, openly developed on GitHub, but you still need internet access to download it. Therefore, I had to connect the RPi via cable initially. As I mentioned earlier, DietPi cannot be initially configured without internet. This creates a chicken-and-egg problem. An alternative is to download the GitHub repo on another computer and copy it directly to the SD card.

In the meantime, I ordered TP-Link's TL-WN725N model (rtl8188eu chipset). Its driver is shipped with RPi OS and the card is recognized automatically. While my order was on its way, I compiled the driver I had downloaded from GitHub. gcc, make and bc are prerequisites. The driver is compiled with a simple install-driver.sh script and installed via DKMS, but due to the RPi 1's CPU speed, compilation takes about 4-5 hours. I also didn't like having to install extra packages that I'd only be using once or twice, especially on a distribution like DietPi, which is supposed to be optimized for being small.


Printer Config on Raspberry Pi 1

The second problem I had with the RPi1 (after WiFi), was that there is no official 32-bit ARM driver for Canon MF4450. If you have a different model printer, that works perfectly on RPi 1, feel free to skip this section.
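Whether a given board is affected can be read from uname -m. The check below is only a sketch: has_prebuilt_driver is a hypothetical helper, and the list of architectures simply mirrors the platforms for which the driver archive discussed next contains packages (ARM64 and 32/64-bit x86).

```shell
# has_prebuilt_driver MACHINE
# Succeeds if MACHINE (a `uname -m` value) is one of the architectures
# with a prebuilt package; assumption: aarch64, x86_64 and 32-bit x86 only.
has_prebuilt_driver() {
    case "$1" in
        aarch64|x86_64|i?86) return 0 ;;
        *)                   return 1 ;;
    esac
}

if has_prebuilt_driver "$(uname -m)"; then
    echo "prebuilt package available"
else
    echo "no prebuilt package, source build required"
fi
```

An RPi 1B reports armv6l here and therefore falls into the second branch, while an RPi 4 running a 64-bit OS reports aarch64.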

I downloaded the printer driver from Canon's official website. According to their website, Linux ARM is supported. The file (linux-UFRII-drv-v620-m17n-20.tar.gz) contains .deb and .rpm files for the ARM64, x86 and x64 platforms, but none for ARM32. Therefore, I decided to compile the source code in the Sources directory. But this means installing a bunch of packages again, which will only be used for compilation. Moreover, spoiler alert: I didn't get a working driver even after compiling it (if you're not curious about the process, you can skip this part, too). To compile the driver, you'd need to install a total of 11 prerequisite packages.

apt install autoconf automake libcups2-dev libglib2.0-dev libltdl-dev libtool-bin libglade2-0 libglade2-dev libgtk-3-dev libxml2 libxml2-dev

After installing these, I ran allgen.sh script in the cnrdrvcups-common-6.20 and cnrdrvcups-lb-6.20 directories in sequence, to compile the code and copied the files with make install. When compiling cnrdrvcups-lb-6.20 on DietPi, a pointer conversion error occurs. Therefore, CFLAGS variable needs to include -fpermissive flag. To do this, I added the following case to the case structure in allgen.sh :

    "armv6l")
        _machine_type="MACHINETYPE=armv6"
        _cflags="CFLAGS=-fpermissive"
    ;;

After setting all this up, I got no errors from the compilation or from CUPS while sending print jobs, but even local print jobs did not reach the printer. I haven't solved the problem yet, but I still have some cards up my sleeve. If I can solve this issue later, it could be a future blog post. TL;DR: Due to both the WiFi issue (even though it's resolved now) and the printer driver, I decided not to waste time with the RPi 1B and to focus on the RPi 4 instead.


Configuration Steps

In this section, I based my work on the "Raspberry Pi Print Server" article. I won't go into the details of the RPi 4 setup. After writing the image file to the SD card and booting it up, a wizard asks for details such as the username and WiFi settings and prepares the system accordingly. If SSH isn't enabled yet, enable it with the following command:

sudo systemctl enable --now ssh

Then I installed the packages mentioned in the article:

sudo apt install cups printer-driver-gutenprint hplip samba cups-bsd

Frankly, hplip and printer-driver-gutenprint packages didn't work for me. According to the article, they contain drivers for many printers, including Canon, but apparently the MF4450 model isn't there. cups-bsd is just a dependency of Canon driver.

CUPS is configured through a web UI, and a user must be in the lpadmin group or be root to configure it. The following commands add the current user to this group and allow access to CUPS from other machines:

sudo usermod -a -G lpadmin $USER
cupsctl --remote-any
sudo systemctl restart cups

After this step, you should be able to access CUPS webUI via http://<machineIP>:631/ from a computer on your network. If not, the host firewall may be running. In this case, you can disable it by running sudo ufw disable or allow the port to the network. If it's still not working, check whether there is a process listening on that port and whether CUPS is running.
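To check whether something is listening on the CUPS port, the output of ss -ltn can be filtered. This is a minimal sketch with a hypothetical helper, port_listening, which assumes the usual ss column layout where the fourth column is the local address and port.

```shell
# port_listening PORT
# Reads `ss -ltn` output on stdin; succeeds if any socket listens on PORT.
port_listening() {
    # NR > 1 skips the header line; $4 is "Local Address:Port"
    awk -v p="$1" 'NR > 1 && $4 ~ (":" p "$") { found = 1 }
                   END { exit !found }'
}

# Example:
#   ss -ltn | port_listening 631 && echo "something is listening on port 631"
```

If the port only shows up with a 127.0.0.1 or [::1] local address, CUPS is running but still not reachable from the network.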

If everything is fine so far, I need to install the printer driver.


Printer Config on Raspberry Pi 4

To install the printer on the RPi 4, I used the same driver package that I used for the RPi 1. After connecting the printer, I installed the .deb file in the ARM64/Debian directory using the following command:

sudo dpkg -i cnrdrvcups-ufr2-uk_6.20-1.20_arm64.deb

There is one big detail here: If a window manager is running, a window will pop up on the desktop during the installation (see the image on the right). If you're installing via SSH, it may seem like the installation is frozen, while the installer is actually waiting for input in the GUI. Therefore, a monitor should be connected during the installation. The installer finds the connected printer automatically. In the next step, you select the printer model and the installation is complete.



After spending hours compiling in various ways on the RPi 1, I never reached this point, which I got to in about ten minutes with the RPi 4. Next, I went back to the service configuration.


samba, which I installed above, is necessary to share the printer with Windows clients. I didn't need to make any changes in cupsd.conf. Since I won't be creating any Windows shares, I commented out the [homes] section in smb.conf. In the [printers] and [print$] sections, I set guest ok = yes. I checked for errors using the testparm -s command and loaded the new settings with the systemctl reload smbd command.

These steps are slightly different in DietPi. First, it's recommended to install CUPS and samba using dietpi-software command. Second, DietPi has a customized smb.conf. You should copy the default [printers] and [print$] sections from smb.conf.example file.

CUPS and the driver are independent of each other. Even if the driver isn't installed, CUPS can find the printer, display it in the web interface and add it without any issue. To do this, I first clicked 'Administration' in the upper bar. An SSL certificate warning should appear in the browser, because the Administration page is served over HTTPS and you're getting redirected to HTTPS now. I skipped it and logged in with the user I added to lpadmin. When I clicked "Find New Printers", it automatically found the printer. After I clicked "Add This Printer", a page appeared, asking me what name to give it. The only important thing here is to select "Share This Printer". If the driver is already installed, Canon's model list appears on the next page. You can still proceed without installing the driver by providing the PPD file for the printer. The correct file for the Canon MF4450 printer is CNRCUPSMF4400ZK.ppd, and it can be extracted from any .deb or .rpm file in the driver package. In Fedora, this can be achieved with

rpm2cpio cnrdrvcups-ufr2-uk-6.20-1.20.aarch64.rpm | cpio -idmv

command. PPD files are definition files that contain the specs of printers. They're platform and distribution independent. The driver selection page looks very similar to the one in the image below, but since I had already added the printer, I have included an image of the "Modify Printer" page.
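On Debian-based systems, the .deb file can be unpacked with dpkg-deb -x instead, and the PPD located with find. A minimal sketch follows; find_ppd is a hypothetical helper, and it assumes the PPD is stored uncompressed with a .ppd suffix.

```shell
# find_ppd DIR MODEL
# Lists PPD files under DIR whose file name contains MODEL (case-insensitive).
find_ppd() {
    find "$1" -type f -iname "*$2*.ppd"
}

# Typical use, with the package file from the installation step:
#   dpkg-deb -x cnrdrvcups-ufr2-uk_6.20-1.20_arm64.deb extracted/
#   find_ppd extracted MF4400
```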


After adding the printer, it is listed on the Printers page, reachable from the rightmost entry of the upper bar. (At this point, I had to delete the printer and add it again with a different name, which is why the printer name is different in the following screenshots.)


When I click on this printer, the "Printer and Queue Settings" page opens (on the left). Here, I check whether I have loaded the correct PPD file. The default paper size differs (A4 vs Letter) between the European and American versions of the same PPD file. If I accidentally load the US version, the print job is delivered to the printer with Letter-sized paper, and an annoying warning appears during printing. As can be seen from the screenshot, the settings are correct here. If Letter appears in these settings, you should select 'Set Default Options' from the 'Administration' menu

... and change the paper size to A4 on the page that appears next.
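The default paper size can also be read from the PPD file itself before loading it, since PPD files store it in a *DefaultPageSize line. A minimal sketch, with a hypothetical helper name:

```shell
# ppd_default_pagesize PPD_FILE
# Prints the value of the first *DefaultPageSize entry, e.g. A4 or Letter.
ppd_default_pagesize() {
    # tr removes carriage returns in case the file has DOS line endings
    tr -d '\r' < "$1" | sed -n 's/^\*DefaultPageSize:[[:space:]]*//p' | head -n 1
}

# Example:
#   ppd_default_pagesize CNRCUPSMF4400ZK.ppd
```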


Finally, I took a test printout on this interface via Maintenance -> Print Test Page.


If this gets printed correctly and without problems, the printer is installed. When I tried this for the first time, providing only the PPD file without installing the driver, there was no printer activity, and when I clicked the 'Show All Jobs' button, it showed an error with rastertoufr2. This means the driver has not been (or could not be) installed correctly. This file should normally be located at /usr/lib/cups/filter/rastertoufr2. At this point, there is no way around installing the driver. However, it is the PPD file that specifies that this filter must be called. Perhaps something can still be done by changing the corresponding line in the PPD file.
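Which filter programs a PPD file asks for can be listed from its *cupsFilter lines, whose value has the form "mime/type cost program". The sketch below uses a hypothetical helper, ppd_filters. I haven't reproduced the exact line from Canon's PPD here, so treat the format handling as an assumption based on the general PPD conventions of CUPS.

```shell
# ppd_filters PPD_FILE
# Prints the filter programs named in *cupsFilter / *cupsFilter2 lines.
ppd_filters() {
    tr -d '\r' < "$1" |
        sed -n 's/^\*cupsFilter2\{0,1\}:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' |
        awk '{ print $NF }' |   # the program is the last word of the value
        sort -u
}

# Example: report filters that are not installed
#   for f in $(ppd_filters CNRCUPSMF4400ZK.ppd); do
#       [ -x "/usr/lib/cups/filter/$f" ] || echo "missing filter: $f"
#   done
```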

And finally, one last magic touch: Debian shuts down the wireless network card, when it hasn't detected any activity for a long time. To prevent this, you need to run

nmcli connection mod <SSID> 802-11-wireless.powersave 2

command. So, the printer is now online on the local network.


Installing the Printer to Windows and Linux Clients

Installing a network printer on clients is quite easy. Moreover, I must admit that it's easier in Windows than in Fedora. The procedure is probably similar in other Linux distros, but I have Fedora, so I'll focus on that.


To add a printer in Fedora, open System Settings -> Printers and click the '+ Add' button at the top. The network printer is found quickly. When I clicked 'Select Recommended Driver' here (below), IPP Everywhere got selected.


I continued by selecting this, and the form on the right image appeared. I continued without changing anything here. Queue name must be a unique name. Description and Location can be changed as desired, if the defaults are not descriptive enough.

Since there is no other printer installed in my system, 'Default printer' option isn't important. If everything is OK, click 'Add' to add the printer, and it should appear in the printers list.


A test job can be sent, by clicking on this printer and then 'Print Test Page' button in the window.

In Windows (v10 and v11), when you select printers from Settings or type "Printers" in the Start Menu, a form appears with an 'Add a printer or scanner' entry and an 'Add device' button next to it. The printer gets listed automatically when you click this button. There is another 'Add device' button next to the newly found printer. When you click it, the printer is installed automatically, but this step may take 1-2 minutes. The drivers are probably being downloaded from the internet during this time.