Hi there. In this blog post, I'll be setting up a storage cluster on Linux machines using a hyperconverged architecture, but first let me clarify what this mumbo jumbo means. By hyperconverged, I mean that there is no disk enclosure or any hardware dedicated to storing data. Instead, I have multiple identical machines, each with a bunch of unused disks. I'll unify this disk space in a redundant and highly available configuration and serve it to the other machines, i.e. the clients. It's important that the machines are identical, because files will be split into chunks, and these chunks will be distributed across all nodes for redundancy and performance. If the smallest disk in the cluster fills up, new file chunks can no longer be written to all disks. The CPU and memory should also be equal, so that when one machine finishes writing, it doesn't have to wait for the other nodes.
I'll be using GlusterFS as the distributed file system, primarily because its initial setup is relatively simple, although advanced configuration can get quite complicated; I won't go into those details in this article. As alternatives, I could have used Ceph or Quobyte. Ceph, in my opinion, is much more complicated than GlusterFS. I worked with Quobyte on a project, and unlike GlusterFS and Ceph, its setup and management are incredibly easy. However, GlusterFS is (was) directly supported by RHEL. But I'll get to that.
Normally, servers with terabyte-sized storage are used for this kind of project. As a proof of concept, I'll use four machines with 20 GB disks and share a total of 60 GB of disk space at 75% efficiency (20 GB * 4 * 75% = 60 GB). The remaining 25% of the space is used for high availability and error correction. This capacity is directly proportional to the capacity of the underlying disks; the number of machines, on the other hand, plays a more important role. If I were to set up this cluster with three machines, the recommended configuration would be a RAID1-like setup with 33% efficiency. That means three machines, each with a 20 GB disk, yield a 20 GB shared disk. Of course, the cluster could also be set up with RAID0-like logic, but in that case even a single machine rebooting would cause data corruption and loss. A four-machine cluster offers quite high efficiency in terms of usable space with the smallest number of nodes and without compromising on high availability. In this configuration, the disks work with RAID5-like logic, and the system isn't affected even if one machine crashes or gets rebooted. As I mentioned, the configuration can get complicated: four machines could also be configured with RAID10-like logic, where half of the file chunks are stored on one node and the other half on another, with the remaining two machines mirroring the data of the first two. In that case, the system can withstand the failure of two machines, as long as they are not in the same mirror group, but the storage efficiency drops to 50%.
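The arithmetic above can be sketched as a quick shell calculation. For a dispersed GlusterFS volume, the usable capacity is brick size times (nodes minus redundancy); the numbers below match the four-node setup in this post:

```shell
# Capacity of a "disperse 4 redundancy 1" volume:
# usable = brick_size * (nodes - redundancy)
nodes=4; redundancy=1; brick_gb=20
usable=$(( brick_gb * (nodes - redundancy) ))
efficiency=$(( 100 * (nodes - redundancy) / nodes ))
echo "${usable} GB usable at ${efficiency}% efficiency"
# prints: 60 GB usable at 75% efficiency
```

With three nodes and redundancy 1, the same formula gives 40 GB at 66% efficiency, which is why the RAID1-like replica setup is usually recommended there instead.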
I will use the samba service to share the disk over the network. If there are only Linux clients on the network, NFS is also an option. Another reason to choose samba is that it provides high availability through the ctdb service; ctdb is part of samba and also supports NFS. If there are Windows clients on the network too, samba is the only viable option. And if disk or directory authorization is going to be managed via Active Directory (AD), that is easier to handle with samba.
As the distro, I will be using CentOS 9, but I also tested the GlusterFS commands and samba configuration on Ubuntu. The configuration steps will therefore be as distro-agnostic as possible, apart from the dnf commands. I will clone the machines from a Cloud Image.
First of all, since I'll be installing on multiple machines, I opened four panes in tmux and, after connecting to one node in each pane, ran tmux's setw synchronize-panes command. This way, the commands run on all machines simultaneously.
![]() |
| An update on four panes in tmux |
After establishing an ssh session to each machine, I installed the necessary packages. The first line installs the repository containing the GlusterFS packages, and the second line installs the glusterfs-server package. In the following lines, I install the samba-related packages, update the system, and finally reboot if necessary using needs-restarting.
dnf -y install centos-release-gluster9
dnf -y install glusterfs-server
dnf -y install samba cifs-utils samba samba-common-tools samba-winbind ctdb --enablerepo=resilientstorage
dnf -y update
needs-restarting -r || reboot
GlusterFS requires that the machines can reach each other by their hostnames. Normally this is done with DNS records. Since I'm working in a tiny test environment, I manually add the machines' IP addresses to their hosts files. I name the machines server01 through server04, as shown below. I then assign each hostname using the second command. Of course, this command must be entered individually on each machine, and if you want bash to display the new hostname in the prompt, you must log out and log back in (on Ubuntu, use hostnamectl set-hostname server01).
cat >> /etc/hosts << EOF
172.18.186.101 server01
172.18.186.102 server02
172.18.186.103 server03
172.18.186.104 server04
EOF
nmcli gen hostname server01
chronyc sources
GlusterFS is a very time-sensitive service, so there should ideally be no time difference between the machines. Therefore, I check whether chrony is running using the last command above. By the way, Ubuntu ships its own NTP client (systemd-timesyncd) instead of chronyd, but you can install chrony explicitly and the system's own NTP client will be disabled automatically.
![]() |
| output of chronyc sources |
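Instead of eyeballing the output, the check can be scripted: chronyc marks the currently selected time source with `^*`. This is a minimal sketch; the heredoc below is hypothetical sample output standing in for the real `chronyc sources` command:

```shell
# Hypothetical sample of `chronyc sources` output; in practice, pipe the
# real command instead of the $sample variable.
sample='MS Name/IP address           Stratum Poll Reach LastRx Last sample
^* ntp1.example.org                   2   6   377    15   +110us[ +130us] +/- 5100us
^- ntp2.example.org                   2   7   377    33  -2012us[-2012us] +/-   30ms'
if printf '%s\n' "$sample" | grep -q '^\^\*'; then
    echo "time synchronized"
else
    echo "WARNING: no selected NTP source"
fi
```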
Now, on to setting up the disks. First, I check the disks using lsblk; my disk is /dev/vdb. To keep things flexible, I create a partition on the disk, set up an LVM logical volume inside that partition, and use that.
# I won't go into the partitioning details here. You can find how to do it in previous posts.
pvcreate /dev/vdb1
vgcreate vg_gluster /dev/vdb1
lvcreate -l 100%FREE -n lv_brick vg_gluster
mkfs.xfs -f -i size=512 /dev/vg_gluster/lv_brick
mkdir -p /data/glusterfs/brick1
echo "/dev/vg_gluster/lv_brick /data/glusterfs/brick1 xfs defaults 1 2" >> /etc/fstab
systemctl daemon-reload
mount -a
If everything has gone smoothly so far, the final command should produce no output, and the newly mounted partition should appear in the df -hP output. At this point, the machines are ready for the GlusterFS setup.
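The check can also be scripted, for example to run across all nodes at once. This is a sketch where a hypothetical sample line stands in for the real `df -hP` output:

```shell
# Look for the brick path in df output; the sample line below mimics a node
# where the LVM volume is mounted correctly. Pipe `df -hP` in practice.
sample='/dev/mapper/vg_gluster-lv_brick  20G  175M  20G  1% /data/glusterfs/brick1'
printf '%s\n' "$sample" | awk '$NF == "/data/glusterfs/brick1" {print "brick mounted, size", $2}'
# prints: brick mounted, size 20G
```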
I create a GlusterFS volume called shared_storage using the commands below. The systemctl command runs on all machines; the remaining gluster commands run on the first machine only.
systemctl enable --now glusterd
gluster peer probe server02
gluster peer probe server03
gluster peer probe server04
gluster volume create shared_storage disperse 4 redundancy 1 \
server01:/data/glusterfs/brick1/brick \
server02:/data/glusterfs/brick1/brick \
server03:/data/glusterfs/brick1/brick \
server04:/data/glusterfs/brick1/brick
gluster volume start shared_storage
gluster volume status shared_storage
![]() |
| gluster volume status shared_storage |
In the image above, since I entered the command on all machines at the same moment, the first two machines are waiting for the command to finish on the others. This is expected: the command probably reached server03 and server04 a few milliseconds apart. Normally, it should be run on, and produce output on, just a single machine.
By the way, I do not make any configuration on the host firewall, as firewalld isn't running on my machines.
At this point, I have to use a little workaround for samba. Normally, samba has GlusterFS integration, and it used to work quite well [1]. However, Red Hat decided to drop GlusterFS support at the end of 2024 and shift those resources to Ceph [2][3]. For this reason, the integration has been removed from the distros [5]; in other words, the samba-vfs-glusterfs package is no longer available for RHEL and its derivatives. Fedora still ships it [4], but I don't know for how long. The issue can be worked around as follows (to be run on all machines).
echo "localhost:shared_storage /mnt/gluster_shared glusterfs defaults,_netdev 0 0" >> /etc/fstab
systemctl daemon-reload
mount -a
In this way, every machine mounts the GlusterFS shared disk locally. Since I cannot get samba to talk to GlusterFS directly, I mount the volume and serve it via samba as if it were a normal directory. The drawback is slightly lower performance, because file system operations have to pass through the kernel twice instead of once [1].
Now I can configure ctdb and share the disk over the network, but before that, I need to set SELinux to permissive mode. A few sources mention that a ctdb cluster can be set up without disabling SELinux by adjusting just a couple of booleans, but that didn't work for me. Note that setenforce 0 does not survive a reboot; set SELINUX=permissive in /etc/selinux/config to make the change permanent.
setenforce 0
Only the IPs of the nodes should be listed in the /etc/ctdb/nodes file. In the /etc/ctdb/public_addresses file, you enter the floating IP of the cluster and the network interface to which this IP will be assigned.
cat > /etc/ctdb/nodes << EOF
172.18.186.101
172.18.186.102
172.18.186.103
172.18.186.104
EOF
cat > /etc/ctdb/public_addresses << EOF
172.18.186.200/24 eth0
EOF
The contents of the smb.conf file should look like this:
[global]
netbios name = GLUSTER_CLUSTER
workgroup = SAMBA
#security = user
clustering = yes
passdb backend = tdbsam
idmap config * : backend = tdb
idmap config * : range = 1000000-1999999
#printing = cups
#printcap name = cups
#load printers = yes
#cups options = raw
[shared_storage]
comment = GlusterFS
path = /mnt/gluster_shared
valid users = sambauser
read only = no
guest ok = yes
create mask = 0664
directory mask = 0775
And then I configure ctdb as follows:
systemctl disable smb nmb
ctdb event script enable legacy 00.ctdb
ctdb event script enable legacy 10.interface
ctdb event script enable legacy 50.samba
systemctl enable --now ctdb
The enabled scripts are for ctdb to manage the floating IP and clustered samba. The status of the cluster can be checked using ctdb status command. If everything is configured correctly, all nodes should be "OK" in the output, a few seconds after running the last command. By the way, it is also possible to assign multiple floating IPs to the cluster, I only assigned one. The status of the IP(s) can be checked using ctdb ip all command.
![]() |
| ctdb status |
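The ctdb status output shown above also lends itself to a small scripted health check, e.g. for monitoring. This is a sketch; the heredoc is hypothetical sample output mimicking the real command, which should be piped in instead:

```shell
# Count unhealthy nodes from `ctdb status`-style output. Node lines look like
# "pnn:0 <ip> OK (THIS NODE)"; anything other than OK in the third field is bad.
sample='Number of nodes:4
pnn:0 172.18.186.101     OK (THIS NODE)
pnn:1 172.18.186.102     OK
pnn:2 172.18.186.103     DISCONNECTED|UNHEALTHY|INACTIVE
pnn:3 172.18.186.104     OK'
bad=$(printf '%s\n' "$sample" | awk '/^pnn:/ && $3 != "OK" {n++} END {print n+0}')
echo "unhealthy nodes: $bad"
# prints: unhealthy nodes: 1
```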
Now the final step is to create a user on all samba nodes:
groupadd sambagroup  # the group must exist before useradd can reference it
useradd -u 2000 -g sambagroup -s /sbin/nologin sambauser
smbpasswd -a sambauser
The system password of sambauser isn't needed as samba keeps its own database.
Thus, the storage cluster is up and running. Now I need a client to mount and test it. Setting up the client is much easier. I run the following commands on a fifth machine on the same subnet:
dnf -y install centos-release-gluster9
dnf -y install glusterfs-fuse samba-client cifs-utils
mkdir -p /mnt/my_shared_storage
cat >> /etc/hosts << EOF
172.18.186.101 server01
172.18.186.102 server02
172.18.186.103 server03
172.18.186.104 server04
EOF
mount -t glusterfs 172.18.186.200:/shared_storage /mnt/my_shared_storage/
df -hP
With the first command, I install the glusterfs repository. With the second, I install the package needed to mount glusterfs, plus the samba client. On the client, I also create the hosts file containing all the servers. Even though the client mounts the storage via the floating IP, it must be able to resolve the IP addresses of the cluster nodes for intra-cluster communication. This step is of course unnecessary in an environment with a DNS server. Finally, I mount shared_storage as glusterfs, and when I check with df, a 60 GB disk is mounted. But I have not mounted it via samba yet.
Before mounting as samba, I can first view shared resources using the first command below, and then connect to this share using the second command:
smbclient -L //172.18.186.200 -U sambauser
smbclient //172.18.186.200/shared_storage -U sambauser
Before mounting this share, I unmount the GlusterFS volume mounted in the previous step, and then mount the resource at the same mount point as a samba share:
mount -t cifs //172.18.186.200/shared_storage /mnt/my_shared_storage -o username=sambauser
And add the following line to /etc/fstab to make it permanent; /etc/samba/user.cred holds the login credentials:
echo "//172.18.186.200/shared_storage /mnt/my_shared_storage cifs credentials=/etc/samba/user.cred,_netdev 0 0" >> /etc/fstab
cat > /etc/samba/user.cred << EOF
username=sambauser
password=secret
domain=SAMBA
EOF
chmod go= /etc/samba/user.cred
chown root:root /etc/samba/user.cred
systemctl daemon-reload
mount -a
df -hP
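As a side note, the chmod go= above strips all group and other permission bits from the credentials file, so only root can read the stored password. The effect can be illustrated on a scratch file:

```shell
# Demonstrate what `chmod go=` does to a file's mode (shown on a temp file,
# not on the real credentials file).
f=$(mktemp)
chmod 644 "$f"            # a typical default mode: owner rw, everyone else r
chmod go= "$f"            # strip group/other bits entirely
mode=$(stat -c '%a' "$f")
echo "mode is now $mode"
# prints: mode is now 600
rm -f "$f"
```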
I conclude this post here for now, as it has got long enough. I plan to explain how to integrate this with AD or LDAP in a future post.
Sources:
[1]: https://lalatendu.org/2014/04/20/glusterfs-vfs-plugin-for-samba/
[2]: https://en.wikipedia.org/wiki/Gluster#cite_ref-10
[3]: https://www.reddit.com/r/kubernetes/comments/zojdl7/whats_the_story_behind_the_abandonment_with/
[4]: https://pkgs.org/search/?q=samba-vfs-glusterfs
[5]: https://www.samba.org/samba/docs/4.7/man-html/vfs_glusterfs.8.html