The Code Segment

Saturday, August 6, 2022

Linear Algebra and Its Relationship with Video Subtitles

In memory of Prof. Dr. Metin DEMİRALP, who helped me understand the topics of linear algebra and the relationships between them, from which I still benefit today.

Hi there. I was recently syncing subtitles for some of the content I have, and I realized that this is actually a linear algebra problem, and quite a simple one: linear equations in two variables. In this post, I will first define the problem and then discuss how I solved it.

First of All, Introduction and Basic Definitions
The most important thing to know when syncing subtitles is the frame rate of the video. Each video is made up of consecutive frames, and the number of frames shown per second is called the frame rate, or frames per second (FPS). It is chosen by the video's creator from among some predefined values, like 23.976, 24, 25 etc. Subtitle sites list this value along with each subtitle, and you need to choose the matching one when downloading.

Example from planetdp.org

There are dozens of different formats for subtitle files, but I will focus on two. The first one is the MicroDVD format with the .sub file extension. It is pretty simple: the frame number where the text appears, between two curly braces; right next to it, another frame number between two curly braces, where the text disappears; and then the subtitle text. Example:

{0}{25}Hello

This will display "Hello" during the first second of a 25 FPS video. This format was especially popular in the early 2000s. It is easy to see that such a subtitle won't fit the same video at a different FPS and needs to be edited. If a 24 FPS subtitle is used for a 25 FPS video, the texts shift by one frame every second, and after 25 seconds the difference becomes 1 sec. Quite annoying!
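The drift described above can be checked with a few lines of Python. This is only a minimal sketch: the function name microdvd_drift and its parameters are my own illustrative choices and do not come from any subtitle tool.

```python
def microdvd_drift(intended_seconds, subtitle_fps, video_fps):
    """Return how many seconds too early (+) or too late (-) a MicroDVD
    subtext is shown when a file authored for subtitle_fps is played
    together with a video running at video_fps."""
    # Frame number stored in the .sub file for the intended moment
    frame = intended_seconds * subtitle_fps
    # Moment at which the player actually reaches that frame
    shown_at = frame / video_fps
    return intended_seconds - shown_at

# A 24 FPS subtitle on a 25 FPS video, for a line meant for second 25
print(microdvd_drift(25, 24, 25))  # → 1.0
```

When both frame rates match, the function returns 0 for every moment, which is the "overlapping lines" case.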

SubRip is the second format with .srt file extension. This one is also quite simple:
1- A sequence number at the first line
2- Two timestamps in HH:MM:SS,TTT format, separated by " --> ". The first timestamp is when the text appears and the second when it disappears.
3- Subtitle text
4- An empty line defines the end of the text.

Example:

1
00:00:00,000 --> 00:00:01,000
Hello

This is the same as the first example. Since this format is FPS independent, it can be used with any FPS. Although this format has also been around since the 2000s, it has overtaken the MicroDVD format in prevalence. It is also more human readable, as there are no obscure frame numbers.
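To make the relation between the two formats concrete, here is a small sketch that converts a single MicroDVD line into a SubRip block for a given frame rate. The function names seconds_to_srt and microdvd_to_srt are hypothetical helpers written for this post; the sketch only handles the simple {start}{end}text layout shown above.

```python
import re

def seconds_to_srt(seconds):
    """Format seconds as an HH:MM:SS,mmm SubRip timestamp."""
    total_ms = round(seconds * 1000)
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

def microdvd_to_srt(line, fps, index):
    """Convert one '{start}{end}text' MicroDVD line to a SubRip block."""
    match = re.match(r"\{(\d+)\}\{(\d+)\}(.*)", line)
    if match is None:
        raise ValueError("not a MicroDVD line: %r" % line)
    start, end, text = match.groups()
    # Frame numbers become times only after dividing by the frame rate
    return "%d\n%s --> %s\n%s\n" % (
        index,
        seconds_to_srt(int(start) / fps),
        seconds_to_srt(int(end) / fps),
        text,
    )

print(microdvd_to_srt("{0}{25}Hello", fps=25, index=1))
```

With fps=25 this prints exactly the SubRip example above. Passing a wrong fps is precisely how a converted .srt file ends up out of sync.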

Of course, subtitle formats are not limited to these. For instance, Aegisub format, used extensively with anime, is way more complex than these two. As far as I know, it stores the text as vectors and the text can be printed anywhere on screen in any font, size and color. But it is beyond the scope of this post, as it is more difficult to edit than text-based formats.

Even though the SubRip format is FPS independent, there are some subtitles on the internet that are somehow out of sync. These are probably subtitles converted from .sub to .srt.

Description of the Problem
Now that I've given the definitions, I can describe the problem. In general, there can be two problematic situations with subtitles: (a) The subtitle is constantly ahead or behind. For example, the subtitled video had an intro at the beginning and somebody cut it out while editing the video. The subtitle then appears later or earlier than it should, but the time difference between the audio and the subtitle is always constant. Or (b) the subtitle is in sync with the audio at some point, but it runs fast or slow, and after a while the sync is lost. As I mentioned above, this is seen especially in .sub files with the wrong FPS and/or .srt files derived from them. Even more frustrating, these two can coexist: the subtitle is never in sync and the difference is changing constantly.

I've visualized both situations for a better understanding. For simplicity, I assume FPS rate is always 25. Let the X-axis be the time and Y-axis be the frame counter. The frame counter increments by 25 every second and an appropriate subtitle line must be "overlapping" with the video line. If the subtitle is ahead or behind, the subtitle line is parallel to video, either above or below it.

In case of FPS incompatibility, the subtitle is in sync at the beginning but the sync is lost over time. Since they are in sync at first, both lines intersect at the beginning, and they diverge because the slope of the subtitle line is not equal to the slope of the video line.

And situations, where both problems coexist:

After formulating the problem with slope, I can now express everything mathematically.

Linear Equations in Two Variables and Solving Systems of Equations
An equation of the form y = mx + c, where m and c are constants, is called a linear equation in two variables. For instance, let m = 3 and c = 4. It gives y = 3x + 4. For each value of x, I get a y, i.e. if x = 1 then y = 7. If I find all y's that correspond to all x's and mark them as points on the plane, a line is formed. This line intersects the vertical axis at 4 (the x = 0 case) and goes up (increases) 3 units for each unit it goes to the right (where x increases). Therefore, m is called the slope parameter or just the slope of the line. As m decreases, the line gets less steep. If m = 0, the line becomes horizontal, and if m is negative, the line slopes downwards. I can also drag the line up or down by changing the c parameter.

In previous section, I expressed a video with a line. The slope of the video line will be equal to its FPS and c = 0 because the audio starts obviously with the video. For example y = 25x. All thick blue lines are like this. And if the subtitle is compatible with the video, its equation must be the same. If not, my aim is to make them equal. Nice and easy.

As we saw on the graph, if the subtitle is ahead or behind, the lines are parallel. Parallel lines have equal slopes. Having the subtitle behind or ahead is actually the case where, at time zero (x = 0), the subtitle line doesn't intersect the vertical axis at zero. It implies that the subtitle equation is actually y = 25x + c, where c is not equal to zero. What I need to do is to find c and subtract it from (or add it to) all timestamps in the subtitle.

A constantly widening gap between a subtitle and the video means that the slope of the subtitle line is different from that of the video. I need to change the slope parameter, i.e. the value of 'm' here. Since 'm' is a multiplier, I need to find the value that gives the correct time when multiplied by the subtitle's timestamps. E.g. if the subtitle is 24 FPS and the video is 25, I multiply all timestamps by ²⁵/₂₄ to rescale them. If the subtitle gets synced as soon as I do it, I can assume that c = 0 in the subtitle equation, in other words the subtitle equation is y = 24x. In case c is non-zero, I also need to add the difference after equalizing the slope value, as I explained in the previous paragraph.

This is all about equations in two variables. How about systems of equations? They contain more than one equation, and they represent that many lines on a plane. Example:

y = 2x - 9
y = x - 3

Solving them is easy. Because the y's are equal, I can equate the two right hand sides, 2x - 9 = x - 3, and add 9 to both sides to eliminate the nine on the left hand side (LHS) of the equation. (I could have eliminated the 3 on the right hand side (RHS) instead, but then there would be -6 on the LHS and I don't want to deal with negative values.) If I then subtract x from both sides, I find x = 6. The operation I applied is given after the // chars.

2x - 9 = x - 3
2x = x + 6       // +9
x = 6            // -x

When I substitute 6 for x in the second equation, it gives y = 6 - 3 = 3. (x = 6, y = 3) denotes the point where the two lines intersect. Because they are two distinct, non-parallel lines, they intersect at one and only one point. This can be seen in the next figure.

Because the slopes of the lines are not equal, one increases more slowly and the other faster, so they always meet at some point. If they have the same slope, they are parallel and never intersect. An example of this is when the subtitle is constantly behind the audio.

Finally, the lines may be overlapping. In this case, all x values satisfy the system, but whether this is a meaningful solution or not depends on the problem.
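The three cases (one intersection, parallel, overlapping) can be told apart with a few lines of Python. This is a minimal sketch. The function name intersect and its string return values are my own choices for illustration.

```python
def intersect(m1, c1, m2, c2):
    """Relate the lines y = m1*x + c1 and y = m2*x + c2.

    Returns the intersection point (x, y), or the string "parallel"
    or "overlapping" when there is no single intersection point."""
    if m1 == m2:
        return "overlapping" if c1 == c2 else "parallel"
    # m1*x + c1 = m2*x + c2  =>  x = (c2 - c1) / (m1 - m2)
    x = (c2 - c1) / (m1 - m2)
    y = m1 * x + c1
    return (x, y)

print(intersect(2, -9, 1, -3))    # y = 2x - 9 and y = x - 3 → (6.0, 3.0)
print(intersect(25, 0, 25, 50))   # constant offset → parallel
print(intersect(25, 0, 25, 0))    # subtitle in sync → overlapping
```

For subtitles, "overlapping" is the goal, "parallel" is the constant-offset problem and a single intersection is the FPS mismatch.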

Matrix Multiplication and Matrix Solutions of Systems
I explained matrix multiplication in detail in the BLAS and LAPACK article. In summary, the first row of the first matrix is multiplied by the first column of the second matrix element by element. The values are summed up and the first element of the product matrix is found. Then the first row multiplied by the second column and summed gives the second element (first row, second column), etc. In general, row p times column q gives the element at position (p, q). This operation is easier to understand with an example, but I'll first rearrange our system:

2x - y = 9
 x - y = 3

Now, I can easily write them in matrix notation:

[ 2  -1 ]   [ x ]   [ 9 ]
[ 1  -1 ] * [ y ] = [ 3 ]

The first row elements, 2 and -1, multiplied by the first and only column of the second matrix and added together, give 2*x + (-1)y = 9, and the same operation with the second row elements gives x + (-1)y = 3. The matrix multiplication above is exactly the same as the system of equations. Moreover, it is both tidy and stylish. Since the first matrix consists of the coefficients of the equations, it is called the coefficient matrix. The matrix on the right side of the equation, which is actually a vector because it is a 2-by-1 matrix, is called the right hand side vector or simply the RHS vector.

If the system has no unique solution (either none at all or infinitely many), this corresponds to the case where the rows or columns of the matrix are proportional to each other. In linear algebra, this means that the determinant of the matrix is zero, in other words at least one of its eigenvalues is zero, but if I get into that stuff, I will really drift away from the real topic.
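The matrix form of the example system, 2x - y = 9 and x - y = 3, can be handed directly to numpy. This is a minimal sketch. The second matrix S with proportional rows is a made-up example, added only to show what happens when the determinant is zero.

```python
import numpy

# Coefficient matrix and RHS vector of  2x - y = 9,  x - y = 3
A = numpy.array([[2.0, -1.0], [1.0, -1.0]])
b = numpy.array([9.0, 3.0])

print(numpy.linalg.det(A))       # non-zero, so a unique solution exists
x, y = numpy.linalg.solve(A, b)
print(x, y)                      # approximately 6.0 3.0

# Proportional rows (two parallel lines): the determinant is zero and
# numpy.linalg.solve raises LinAlgError instead of returning a result.
S = numpy.array([[1.0, -1.0], [2.0, -2.0]])
singular = False
try:
    numpy.linalg.solve(S, numpy.array([3.0, 9.0]))
except numpy.linalg.LinAlgError as err:
    singular = True
    print("singular system:", err)
```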

Now, I have a video, whose equation I know and I have subtitle whose equation I don't know but I want to synchronize to the video. Let's call the subtitles equation y = m₁x + c₁ and the video equation y = m₂x + c₂. I need to multiply m₁ by such a number (let's call it m'), so that the result gives m₂. I mean: m₁m' = m₂ and c₁m' + c' = c₂. This part could be a bit confusing but it's OK, because I'll explain it later in a simpler way:

Note: The term "subtitle" refers to all the texts of the video as a whole. To distinguish the individual texts appearing on the screen at a given moment, I will refer to each of them as a "subtext" throughout the rest of this article. The term "subtitle" geometrically corresponds to the y = mx + c line, while a "subtext" is the value of y for a given x.

To adapt the subtitle to a video, I first need two subtexts whose exact timing (when they are spoken in the video) I know. I can find their timings by listening or, even better, from a fully synchronous subtitle from another source in any other language. Of course, to judge whether it is synchronous, it has to be in a language that I can more or less understand (or check on Google Translate), or I can pick out proper names from the audio and match them approximately.

Okay, but why two? Let's say I have a timestamp from the video and a corresponding subtext with wrong timing, and I want to sync them. A concrete example: I heard a "Hello" at the 33rd minute of the video, and the timestamp of the "Hello" subtext is at the 35th minute. Should I just subtract two minutes, or should I multiply by ³³/₃₅? I cannot know, because maybe the difference between the audio and the subtext becomes four minutes at the 60th minute (the slope is different). To solve for m and c precisely, I need two subtexts. This is exactly the Euclidean axiom we are stumbling upon: "one and only one line passes through two distinct points".

So, I have two subtexts. For instance, "Hello" is spoken at the 33rd minute and "Good bye" at the 48th minute of the video. The subtitle I have contains the corresponding subtexts at the 35th and 51st minutes. I will give my asynchronous subtitle to a function as input, such that it produces the synchronized subtitle as its output. The general expression of the function is y = mx + c. The x's are the timestamps of the asynchronous subtitle (input) and the y's are the synced timestamps (output). In this case:

35m + c = 33
51m + c = 48

I could find the unknowns by multiplying and subtracting side by side, but I'll do it aesthetically:

[ 35  1 ]   [ m ]   [ 33 ]
[ 51  1 ] * [ c ] = [ 48 ]

Since the coefficient of c is always 1, so are the values in the second column. At this point, I will skip the method of solving this and interpret the results. I left the solution to Wolfram Alpha and found c = ³/₁₆ and m = ¹⁵/₁₆. This means that the FPS ratio between the video and the subtitle is ¹⁵/₁₆ (maybe the subtitle is 24 and the video is 22.5 FPS), and after this skew is corrected, there is still a difference of ³/₁₆ minutes between the audio and the subtitle, which I need to add.
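The same result can be reproduced exactly with Python's fractions module instead of Wolfram Alpha. This is a minimal sketch. The helper name fit_line is hypothetical, and it uses the two-point formula for a line rather than a general matrix solver.

```python
from fractions import Fraction

def fit_line(x1, y1, x2, y2):
    """Return (m, c) of the line y = m*x + c through the integer
    points (x1, y1) and (x2, y2)."""
    m = Fraction(y2 - y1, x2 - x1)
    c = y1 - m * x1
    return m, c

# x: minute found in the subtitle, y: minute heard in the video
m, c = fit_line(35, 33, 51, 48)
print(m, c)                     # → 15/16 3/16

# Sanity check: both reference subtexts land on their spoken times
print(m * 35 + c, m * 51 + c)   # → 33 48
```

Since c comes out positive, the offset is added after the timestamps are scaled by m.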

But how accurate is this function? Is the subtitle synced with a perfect accuracy for each second of the video or are there any errors (and if yes, how much)? Of course, I am not talking about the case that the subtitle itself is inaccurate (maybe some subtext is missing). How reliable is this function I found?

Robustness
The reliability of the function is a measure of how accurately it can sync the subtitle throughout the video. In the math literature, this is expressed as "robustness", which means how well a system can tolerate errors.

The timestamps I found by listening are quite prone to errors, as they depend on how well I listen and how quickly I react. Furthermore, I don't know how much error the selected subtexts contained initially. Maybe the subtext I chose originally had a skew with the audio. Even a skew of 0.2 seconds can lead to serious discrepancies. Let's examine this graphically:

I assume that the subtext at minute 4 is perfectly correct, but I made a mistake of 0.2 seconds at minute 5 (4.8 frames for a 24 FPS video). This causes a skew of about 11 seconds (0.2 sec for each of the 56 minutes after minute 4) by the end of the first hour of the video. The lower part of the above graph shows the impact of my error on the entire video. Now let me move the second point away from the first: the first subtext is again at minute 4 without error, and this time the second subtext is at the 60th minute with the same error of 0.2 sec:

The discrepancy at both ends has shrunk to an inconspicuous extent. The second graph looks way better than the first one. The error of 0.2 seconds at the 60th minute becomes just 0.343 seconds at the 100th minute. If I had synced with a subtext at the very end, it would be even smaller. So, it makes sense to take one subtext from the beginning and another one towards the end of the video for the least error.
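The growth of the error can also be computed directly. This is a minimal sketch under the same assumption as in the text, namely that one reference subtext is exact and the other one is off by a known amount. The function name propagated_error is my own.

```python
def propagated_error(t, t_exact, t_noisy, error):
    """Sync error at time t when the reference at t_exact is exact and
    the reference at t_noisy is off by `error`. The fitted line pivots
    around the exact point, so the error grows linearly with the
    distance from it."""
    return error * (t - t_exact) / (t_noisy - t_exact)

# References at minutes 4 and 60, 0.2 sec mistake at minute 60
print(round(propagated_error(100, 4, 60, 0.2), 3))   # → 0.343
# References only one minute apart (minutes 4 and 5), same mistake
print(round(propagated_error(60, 4, 5, 0.2), 1))     # → 11.2
```

The denominator is the gap between the two reference subtexts, which is why a larger gap gives a smaller error everywhere else.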

If I may go back to linear algebra for a moment, the robustness of a linear system is related to how large the absolute value of the determinant of the coefficient matrix is (or how large the differences between the eigenvalues are... Anyway, I am calming down). I don't need to go into the definition of the determinant or its calculation in detail. Because of the special form of the matrix above (all ones in the second column), the determinant is equal to the difference between the two timestamps. Therefore, the mathematical background of "having a larger gap between two subtexts to reduce the error" has also been shown.

Python Code
The coolest thing is that knowledge of linear equation systems is not really needed in order to sync subtitles. Given the coefficient matrix and the RHS vector, the linalg.solve function in Python's numpy library returns the result. The code snippet below reads two pairs of timestamps (the one in the subtitle and the one it should be) and converts them to seconds using time2num(). It then creates the system and calculates m and c.

#!/usr/bin/python3

# input should be like:
# 00:03:29,632     00:02:13,532
# 00:43:56,890     00:44:00,486
# -------------    --------------
# is in subtitle   needs to be
#
# Formula:
# m * 00:03:29,632 + c*1 = 00:02:13,532
# m * 00:43:56,890 + c*1 = 00:44:00,486
#
# [  209.632   1 ]   [ m ]   [  133.532 ]
# [ 2636.890   1 ] * [ c ] = [ 2640.486 ]
#
# Calculate the m and c parameters with this script, then apply
# the transformation with the subtitle.py script.

import numpy
import re

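def time2num(time):
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    timei = re.sub(r",", ".", time)
    (h, m, s) = timei.split(':')
    return round(int(h) * 3600 + int(m) * 60 + float(s), 3)

if __name__ == '__main__':
    print("is in subtitle    needs to be\n--------------    -------------")

    # Each input line: timestamp found in the subtitle, then the correct one
    (OL1, OG1) = input().split()
    (OL2, OG2) = input().split()

    # Coefficient matrix and RHS vector of the system  m * OL + c = OG
    A = numpy.array([[time2num(OL1), 1.0], [time2num(OL2), 1.0]], float)
    b = numpy.array([time2num(OG1), time2num(OG2)], float)

    # x[0] is m (the slope), x[1] is c (the offset in seconds)
    x = numpy.linalg.solve(A, b)

    print(x)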

The output values are given as input to the second code snippet below, which takes the subtitle file name as a command line parameter. The code reads all the timestamps in the .srt file. Here, a regex search is performed with re.search(): the string "-->" is present in all timestamp lines and is highly unlikely to be found in the subtext. The appearance and disappearance times of the subtexts are captured from these lines, converted to seconds using time2num() and transformed with the m and c values. The new values are converted back to timestamps with num2time() and written to a copy of the file.

#!/usr/bin/python3

import argparse
import os
import re

# First calculate the m and c parameters with the linsolver.py
# script, then enter them as input to this script. They are
# applied in the "Calculations here" block below.

def time2num(time):
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    timei = re.sub(r",", ".", time)
    (h, m, s) = timei.split(':')
    return round(int(h) * 3600 + int(m) * 60 + float(s), 3)

def num2time(numtime):
    """Convert seconds to an HH:MM:SS,mmm timestamp."""
    # Work in whole milliseconds so that rounding can never produce a
    # seconds field of 60,000. Negative results are clamped to zero.
    ms = max(0, round(numtime * 1000))
    (s, ms) = divmod(ms, 1000)
    (m, s) = divmod(s, 60)
    (h, m) = divmod(m, 60)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Modify subtitle timings')
    parser.add_argument('arg_filename', type=str,
                        help='Subtitle file name to open')
    args = parser.parse_args()

    outname = os.path.splitext(args.arg_filename)[0] + ".mod.srt"

    print("Input m and c separated by space")
    # input() returns strings, so convert them before doing arithmetic
    (m, c) = map(float, input().split())

    with open(args.arg_filename, "r", encoding="ISO-8859-9") as hFileIn, \
         open(outname, "w", newline="\r\n") as hFileOut:
        for line in hFileIn:
            if re.search(r"-->", line):
                fields = line.split()
                substart = time2num(fields[0])
                subfinal = time2num(fields[2])

                #####  Calculations here  #####
                substart = substart * m + c
                subfinal = subfinal * m + c

                hFileOut.write(num2time(substart) + ' --> ' + num2time(subfinal) + '\n')
            else:
                hFileOut.write(line)

Long story short, even linear algebra can be useful to learn without complaining about what it is used for in real life.

Note: This article was actually ready to publish in April 2022. However, it was delayed by four months, first because of moving house, then due to a big problem caused by the worst mobile operator ever, because of which I couldn't receive MFA codes to log in, and finally because of the summer vacation.

Saturday, December 18, 2021

Linux from Scratch and Creating initrd for LVM

Hi there. Even though I had wanted to write about electronics for a while, it did not happen this time either, unfortunately. Today's topic is a configuration that took me two weeks: more precisely, creating an initrd. So, why do I need such a thing?

Let me start from LVM first. Although I had bad experiences with it at first, I've gotten used to it so much since 2014, that it's now unimaginable to have a linux installation without it. I do not have partitions spanning multiple disks, but I think LVM's most useful feature is its flexibility in moving and resizing partitions.

The Linux from Scratch (LFS) project is the other side of this subject. It is a project which allows users to compile the necessary tools and the Linux kernel from scratch, i.e. from their source code, in order to build a working Linux environment. It starts with a working compiler, of course. After compiling gcc as a cross compiler, the basic tools are compiled with it and a minimal chroot (change root) environment is created. The Linux kernel is compiled at the last step and made bootable with grub. I got to know this project from Viktor Engelmann's videos. I will follow a different path from his here. We are both actually following slightly different paths from the official LFS documentation (or "the book" in LFS terminology), but both are based on the book in the end. A few of the steps that I will follow are from the (B)eyond LFS book.

There are actually two versions of the book, the systemd and initd versions, but they are basically the same up until chapter 8: Packages. This post is based on the eleventh version of the book for both systemd and initd. There are also two different formats of the book for each version, under "Download" and "Read Online" on the website. Although the .pdf file on the Download page looks more compact, copying and pasting commands is definitely easier from the .html version. Btw, most of the time in this project is spent compiling packages (it takes around 16h on my machine to compile everything, excluding tests). Therefore, I will only mention the steps where I did not follow the book.

Note that the hyperlinks in this post are linked to the latest version of the book as of the date of writing. When a more recent version than v11 is released, some links may lead to different pages.

System Requirements

The book says that a 10 GB partition would be enough to compile all packages, but a 30 GB partition is recommended for growth. My disk usage exceeded 20 GB while I was trying to compile the Fedora kernel, because too many kernel features are enabled in it.

I will install the OS on the first disk of a two-disk virtual machine (VM) and compile the packages on the second disk. The advantage of this is that I do not risk corrupting the disk of my own (host) system in case I accidentally run a command on the host instead of in the chroot environment. Compiling code requires high disk performance. Therefore, working on an SSD is highly recommended. A VM will give an IO performance close to that of its host. Another advantage of working in a VM is that it can provide an isolated environment even if the physical disk doesn't have enough free space to allocate a dedicated partition.

In his videos, Viktor Engelmann downloads and compiles the packages on a USB stick. I personally don't find this logical, because even spinning disks can provide more IOPS than USB sticks. Therefore, working on a USB stick is not efficient for this project. If LFS is to be booted from USB, everything can be configured and compiled on a disk and then copied to a USB stick before the grub step, and the bootloader can be written to the USB stick at the end.

There are no other requirements than disk in this project. A fast CPU will of course shorten the compile time but you will get the same result on a slow CPU.

There is also no restriction on the OS to be installed on the first disk. Since I feel comfortable with RedHat based systems and have wanted to try CentOS Stream 8 for a long time, I'll go for it.

VM Setup and Creating Partitions

To keep this post as short as possible, I won't go into the minor details of this setup. I used both the VBox and vmware virtualization platforms. In the previous post, I wrote that two more kernel features must be enabled when compiling the kernel for vmware; I will mention them again in the kernel compilation section. I created a VM with 2 GB RAM and a 20 GB disk. I chose "Minimal Install" as the base and selected "Development Tools" as additional software (left image). All settings are as in the image below. LVM is created automatically during installation; I did not configure it manually. My main concern is creating the LVM for LFS, and that one I will configure manually. But first I have to add another 40 GB disk. A new disk cannot be added to a VM on the fly in VBox (revision: it is actually possible). In vmware, if you add a new disk on the fly, it can be discovered with the echo "- - -" > /sys/class/scsi_host/host2/scan command or by rebooting the VM. The same command can be applied in VBox. The new disk was connected to host2 in my VM.

On CentOS, sshd is enabled by default. I found the VM's IP and connected via SSH, because it is easier to copy and paste commands into a terminal than to type them on the console. If the VM has a NAT configuration, you have to configure "port forwarding" to connect to it. I mentioned this in one of the previous posts.

My LVM template is a 512 MB /boot partition at the beginning of the disk and an LVM partition on the rest. In the LVM partition, there are two 4 GB partitions for /var and /home, two 2 GB partitions for /tmp and swap, and the rest is for the root partition. Total usage on the partitions other than root does not exceed 60 MB, so the 12.5 GB allocated to them remains practically untouched. As I mentioned above, root partition usage can reach 16 GB. Therefore, I added a relatively large disk of 40 GB. When the LFS compilation finished, the net size of the vmdk disk file was 28.8 GB.

I quickly partitioned the disk with following command:

echo -ne "n\np\n1\n\n+512M\nn\np\n2\n\n\nt\n2\n8E\np\nw\n" | sudo fdisk /dev/sdb

The result can be checked from the output:

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdb1          2048  1050623  1048576  512M 83 Linux
/dev/sdb2       1050624 83886079 82835456 39.5G 8e Linux LVM

Then I created partitions in LVM:

sudo pvcreate /dev/sdb2
sudo vgcreate vg_lfs /dev/sdb2
sudo lvcreate -n lv_var  -L 4G vg_lfs
sudo lvcreate -n lv_home -L 4G vg_lfs
sudo lvcreate -n lv_swap -L 2G vg_lfs
sudo lvcreate -n lv_tmp  -L 2G vg_lfs
sudo lvcreate -n lv_root -l100%FREE vg_lfs
sudo lvscan

In the output of last command, I should see both new and existing partitions. Partitions need to be formatted after they are created:

sudo mkfs.ext4 /dev/sdb1
sudo mkfs.ext4 /dev/vg_lfs/lv_root
sudo mkfs.ext4 /dev/vg_lfs/lv_tmp
sudo mkfs.ext4 /dev/vg_lfs/lv_home
sudo mkfs.ext4 /dev/vg_lfs/lv_var 
sudo mkswap    /dev/vg_lfs/lv_swap

The book of LFS assumes ext4 FS is used during the installation (section 2.5).

Let's Create LFS Work Environment

When I saved and ran the script in section 2.2, only python3 and makeinfo were not found. I installed python3 with sudo dnf install python3 command. I will come to the latter package later.

I created an LFS environment variable with the export LFS=/mnt/lfs command and added this to .bash_profile as well (section 2.6). I also created this directory. Now, I need to mount the partitions (section 2.7), but I wrote a script so that I don't have to mount all four partitions manually:

#!/bin/bash

if [[ x$LFS == "x" ]]; then
    echo '$LFS' variable is empty.
    exit 1
fi

STEP=1
for PARTITION in "/" "var" "home" "tmp"; do 
    if [[ $PARTITION == "/" ]]; then
        LVMNAME="root";
    else
        LVMNAME=$PARTITION;
    fi

    echo "[ $STEP / 5 ] Mounting $LVMNAME partition"
    if [ ! -d "$LFS/$PARTITION" ]; then
        sudo mkdir -pv "$LFS/$PARTITION"; 
        sudo chown $USER:$GROUPS "$LFS/$PARTITION";
    fi

    sudo mount "/dev/vg_lfs/lv_$LVMNAME" $LFS/$PARTITION
    sudo chown $USER:$GROUPS "$LFS/$PARTITION";

    STEP=$((STEP+1))
done

echo "[ 5 / 5 ] Activating swap.."
sudo swapon /dev/vg_lfs/lv_swap  2> /dev/null

If this script is called by just a single user, $USER and $GROUPS variables can be substituted with the username and group name. And swap doesn't necessarily need to be activated but I did it nevertheless.

LFS Packages

By "package", I do not mean .rpm or .deb files. These are source code packages. Before downloading them, I need some additional software, i.e. wget, vim-enhanced and makeinfo, which I previously skipped. makeinfo comes in the texinfo package, but its repository "powertools" is disabled by default. So, I installed them with the following command:

sudo dnf install --enablerepo="powertools" texinfo wget vim-enhanced

Then I created "sources" directory (section 3.1), downloaded the files in this directory using wget list and checked their hashes. If there are some problems with download, you can give --no-check-certificate parameter to wget.

I switched to root to run the commands in section 4.2, but first I exported LFS variable again, for root (because I had first exported it for normal user). Then I ran commands. I don't need to create an lfs user (4.3) since I am in a VM. I changed the ownership of the directories to my normal user. Then I saved the given .bashrc to my home directory (not root) with the name "lfs_env.sh" and loaded it with source command. If I reboot the VM, I will run this again.

I followed the fifth and sixth chapters exactly as they are. I ran the commands in section 7.2, up to 7.3.2. Since next commands (starting from section 7.3.3) will run on each entry to the chroot env. and the mounted resources must be unmounted in reverse order on exit, I created a script from the commands:

#!/bin/bash

if [[ x$LFS == "x" ]]; then
    echo '$LFS' variable is empty.
    exit 1
fi

mount -v --bind /dev $LFS/dev
mount -v --bind /dev/pts $LFS/dev/pts
mount -vt proc proc $LFS/proc
mount -vt sysfs sysfs $LFS/sys
mount -vt tmpfs tmpfs $LFS/run

if [ -h $LFS/dev/shm ]; then
  mkdir -pv $LFS/$(readlink $LFS/dev/shm)
fi

chroot "$LFS" /usr/bin/env -i HOME=/root  TERM="$TERM"  PS1='(lfs chroot) \u:\w\$ ' PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/bash --login +h

umount -v $LFS/run
umount -v $LFS/sys
umount -v $LFS/proc
umount -v $LFS/dev/pts
umount -v $LFS/dev

If /dev/pts is not unmounted while exiting chroot, a new terminal cannot be opened in VM and VM needs to be restarted.

I entered chroot and continued to create necessary files and directories. Btw, /etc/passwd and /etc/group files in section 7.6 are the first point where systemd and sysVinit differ.

Since the script does unmount the resources when exiting chroot, the unmount commands in section 7.14 are not needed anymore. And I also can create a VM snapshot for backup, so the rest is also not needed.

In section 8.25, while compiling shadow with cracklib support, I got "undefined reference to `FascistCheck'" error. I reconfigured with following command and then the compilation succeeded:

LDFLAGS=-lcrack ./configure --sysconfdir=/etc --with-libcrack --with-group-name-max-length=32

The "make -k check" step in section 8.26 takes so long that I started the test before going to bed and it was still incomplete when I woke up. From my understanding, there are more than 350K tests and some of them are stress tests. It is also written in this section that some tests are known to fail. Test results are available here. My results were almost the same as these. There is a very simple sanity check at the end of the section. IMHO, just doing this test would be enough, but the book considers the "make check" step critical and not to be skipped.

Starting from section 8.69, the packages of the systemd and sysVinit versions become significantly different.

At the end of chapter 8, I removed the +h parameter that I gave to bash, in lfs_enter_chroot.sh script and saved it.

The ninth chapter handles the initd/systemd settings. This means that this chapter is completely different in each version. In this chapter, I followed the book exactly. I entered KEYMAP=trq in /etc/sysconfig/console for initd, or in /etc/vconsole.conf for systemd, to set the Turkish keyboard layout. If there is no KEYMAP set, the English layout is loaded by default. I skipped section 9.10.3 of the systemd book because I have a separate partition for /tmp.

Section 10.2 is very important because fstab is created there. Here, I have to add all the partitions I created in the beginning to fstab. The /boot partition is currently /dev/sdb1 because it is still on the second disk. But when I detach the first disk from the VM to boot from LFS, this partition will become /dev/sda1. Hence, I cannot use this device name. Each filesystem under Linux has a unique and fixed UUID. I have to use this, so that /boot can always be mounted regardless of whether it is on the first or the second disk. The value linked to /dev/sdb1 in the ls -la /dev/disk/by-uuid output is the UUID I need. Or using,

lsblk -o NAME,MAJ:MIN,RM,SIZE,RO,TYPE,MOUNTPOINT,UUID

command, I can list the disks and their UUIDs. The lsblk output is more verbose, but its UUID column is empty in the chroot environment. Therefore, I ran the command outside of chroot (in the VM), noted the value down and created fstab with it:

/dev/mapper/vg_lfs-lv_root   /       ext4   defaults  1  1
/dev/mapper/vg_lfs-lv_var    /var    ext4   defaults  0  0
/dev/mapper/vg_lfs-lv_home   /home   ext4   defaults  0  0
/dev/mapper/vg_lfs-lv_tmp    /tmp    ext4   defaults  0  0
/dev/mapper/vg_lfs-lv_swap   swap    swap   pri=1     0  0
UUID=01234567-89ab-cdef-0123-456789abcdef /boot ext4 defaults 0 0

For the initd version, the proc, sysfs, devpts etc. entries must be added as well. I have not included them here; they are already in the book.
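
The by-uuid lookup described above can also be scripted. This is only a minimal sketch and not something from the LFS book: the function names uuid_of and fstab_boot_line are hypothetical, and the optional second argument of uuid_of (the directory to scan, /dev/disk/by-uuid by default) exists just to make the function easy to try on a test directory.

```shell
#!/bin/sh
# uuid_of DEVICE [DIR]
# Print the UUID whose symlink in DIR (default: /dev/disk/by-uuid)
# points to DEVICE. Returns 1 if no symlink matches.
uuid_of() {
    dev_name=$(basename "$1")
    dir=${2:-/dev/disk/by-uuid}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        target=$(readlink "$link")      # e.g. ../../sdb1
        if [ "$(basename "$target")" = "$dev_name" ]; then
            basename "$link"            # the link name itself is the UUID
            return 0
        fi
    done
    return 1
}

# fstab_boot_line UUID
# Print a /boot entry in the same form as the fstab shown above.
fstab_boot_line() {
    printf 'UUID=%s /boot ext4 defaults 0 0\n' "$1"
}
```

With these two helpers, fstab_boot_line "$(uuid_of /dev/sdb1)" prints the /boot line for whatever UUID /dev/sdb1 currently has, so the value does not have to be copied by hand.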

Compiling the Linux Kernel

After compiling all the packages, it is time to compile the kernel. Section 10.3 of both versions is about compiling the kernel. I created a default configuration by running make mrproper and make defconfig. I have explained these commands in detail in my previous article. I will use make menuconfig to select other features.


make menuconfig TUI

There is not much to change in defconfig for the initd version of the book. It is enough to have uevent helper turned off and devtmpfs support on. For the systemd version of the book, more features need to be enabled (next image).

systemd features

It is mentioned in the systemd book's errata that CONFIG_SECCOMP is not under the "Processor type and features" submenu, but that's OK because this feature is already enabled in defconfig. It can still be searched for in menuconfig or found in .config:

(lfs chroot) root:/sources/linux-5.13.12# grep -n SECCOMP  .config 
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set

Btw, this is just my personal opinion, but I prefer to keep such errors to myself because, based on my own experience in IRC and in their mailing list archive, some people in the LFS support channels are not friendly at all.

Back to the topic. My goal is to install LFS with LVM. LFS contains only the bare minimum; for example, it doesn't have a window manager. LVM is not considered basic either, and it is part of another project called (B)eyond LFS, or BLFS for short. In the LVM section of the BLFS book, there is another list of features that need to be enabled for LVM support. I will enable them and build the modular ones into the kernel (just a personal preference). In the meantime, I looked at the Gentoo documentation for LVM and enabled a few more features that are not mentioned in BLFS but are recommended by Gentoo. I prefer booting a kernel that is a few KBs larger to recompiling it because of missing features.

Finally, readers working with VMware have to activate "Fusion MPT device support" as well, which is the main topic of the previous article. This is basically the driver for the SCSI controller in VMware; without it, the kernel cannot find the hard disk and cannot boot. This feature is under the "Device Drivers" submenu.
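
Before starting a long kernel build, the features collected in this section can be checked in .config in one go instead of grepping them one by one. This is a minimal sketch and not from either book: missing_kconfig is a hypothetical helper name, and the option names in the usage line below are only my guess at the symbols behind the features mentioned above, so they should be verified in menuconfig.

```shell
#!/bin/sh
# missing_kconfig CONFIG_FILE OPTION...
# Print every OPTION that is neither built in (=y) nor a module (=m)
# in CONFIG_FILE. Returns 1 if at least one option is missing.
missing_kconfig() {
    config_file=$1
    shift
    status=0
    for opt in "$@"; do
        # Match the whole line so that CONFIG_SECCOMP does not
        # accidentally match CONFIG_SECCOMP_FILTER.
        if ! grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            echo "$opt"
            status=1
        fi
    done
    return $status
}
```

For example, missing_kconfig .config CONFIG_DEVTMPFS CONFIG_SECCOMP CONFIG_BLK_DEV_DM CONFIG_FUSION prints only the options that still have to be turned on. It does not cover features that must be off, like the uevent helper.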

After completing all these steps, I went back to LFS section 10.3.1, compiled the kernel with make and then installed the modules. Since my /boot partition is on /dev/sdb1, I mounted it with the mount /dev/sdb1 /boot command and copied the vmlinuz (kernel), System.map and config files there.
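
The copy step can be wrapped in a small function. This is a sketch under a few assumptions: install_kernel is a hypothetical name, the kernel image path arch/x86/boot/bzImage applies to x86 builds only, and the System.map-VERSION and config-VERSION target names are my assumption (only the vmlinuz name is fixed by the grub.cfg in this post).

```shell
#!/bin/sh
# install_kernel SRC_DIR BOOT_DIR VERSION LABEL
# Copy the kernel image, System.map and .config from the kernel source
# tree SRC_DIR into BOOT_DIR. Stops at the first copy that fails.
install_kernel() {
    src=$1
    boot=$2
    version=$3
    label=$4
    cp "$src/arch/x86/boot/bzImage" "$boot/vmlinuz-$version-$label" || return 1
    cp "$src/System.map" "$boot/System.map-$version" || return 1
    cp "$src/.config" "$boot/config-$version" || return 1
}
```

After mounting /boot, a call like install_kernel /sources/linux-5.13.12 /boot 5.13.12 lfs-11.0-systemd would produce the vmlinuz-5.13.12-lfs-11.0-systemd file that the GRUB configuration refers to.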

GRUB Bootloader

After compiling the kernel and copying it to /boot, now it's time to set up GRUB. At the moment, I have two options: I can install GRUB on /dev/sdb (actually, this was my original plan) or I can add LFS to CentOS's existing GRUB on /dev/sda. The advantage of the first option is that the disk is configured to run independently of CentOS. The advantage of the latter is that it is easy to set up.

First, I configured the first option by running the grub-install /dev/sdb command. My grub.cfg file is slightly different from the one in the book:

set default=0
set timeout=5

insmod ext2
set root=(hd0,1)

menuentry "GNU/Linux, Linux 5.13.12-lfs-11.0-systemd" {
    linux   /vmlinuz-5.13.12-lfs-11.0-systemd root=/dev/vg_lfs/lv_root ro
}

With the "set root" keyword, GRUB's root device is set to the first partition of the zeroth disk. From GRUB's point of view, the LFS disk is not the zeroth one yet, but it will be once the CentOS disk is detached. There is no need to prefix paths with "/boot" because /boot is a separate partition. The root argument given to the kernel is the device path of the root partition. Btw, the configuration above is for systemd. For initd, the "menuentry" and "linux" lines will not have "-systemd", that's all.
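
Since the systemd and initd entries differ only in the "-systemd" suffix, the menuentry block can be generated instead of edited by hand. This is a minimal sketch: lfs_menuentry is a hypothetical helper, and the "lfs-11.0" part of the name is hard-coded to match the entry above.

```shell
#!/bin/sh
# lfs_menuentry KERNEL_VERSION FLAVOR ROOT_DEVICE
# Print a GRUB menuentry block. FLAVOR is "systemd" or "initd";
# only the systemd flavor gets the "-systemd" suffix.
lfs_menuentry() {
    version=$1
    flavor=$2
    root_dev=$3
    case $flavor in
        systemd) name="$version-lfs-11.0-systemd" ;;
        initd)   name="$version-lfs-11.0" ;;
        *)       echo "unknown flavor: $flavor" >&2; return 1 ;;
    esac
    cat <<EOF
menuentry "GNU/Linux, Linux $name" {
    linux   /vmlinuz-$name root=$root_dev ro
}
EOF
}
```

Calling lfs_menuentry 5.13.12 systemd /dev/vg_lfs/lv_root reproduces the entry above, and replacing systemd with initd gives the variant without the suffix.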

I exited the chroot environment after saving this. Then I shut the VM down and removed its first disk. When I powered the VM up again, I saw the GRUB menu I had just created and got a kernel panic while trying to boot this entry: Yaay! OK, it's not something to be happy about, but it indicates two things: (1) GRUB is set up correctly, (2) the kernel is properly compiled and copied to /boot.

So, why did I get a kernel panic, then? As seen in the call trace, in the mount_block_root function, the kernel could not find the disk (specified with root= in GRUB) to mount as the root directory. Why? Because LVM has not been activated yet. Unfortunately, there is nothing to do here, so I added the virtual disk back and returned to CentOS.

Do I have to remove the disk to boot to LFS and add it back to boot to CentOS whenever a problem occurs? Hell, no! I appended the following lines to /etc/grub.d/40_custom in CentOS:

menuentry "GNU/Linux, Linux 5.13.12-lfs-11.0-systemd" {
  set root=(hd1,1)
  linux   /vmlinuz-5.13.12-lfs-11.0-systemd root=/dev/mapper/vg_lfs-lv_root ro
}

This is the second GRUB configuration option that I mentioned above. It is essentially the same configuration, except that the "set root" keyword is inside the menuentry block. This snippet is for systemd again, and there will be no "-systemd" part for initd. Then I regenerated CentOS' grub.cfg so that it includes the lines I added:

GRUB_DISABLE_OS_PROBER=true  grub2-mkconfig -o /boot/grub2/grub.cfg

OS prober is a very nice feature for finding other installed OSes and adding them to GRUB automatically, but it doesn't work well due to a bug. Now it is easier to switch between OSes, so I can continue from where I left off.

initramfs

I opened the LVM section (or systemd LVM section) of the BLFS book. The second to last paragraph of About LVM says that an initramfs is required to use LVM on the root file system. An initramfs, short for 'Initial RAM Filesystem', is a compressed virtual disk containing some basic programs and configs. If this file exists, the bootloader loads it into memory, the kernel unpacks it to the root directory, and the programs and configs in it do the necessary operations for the system to continue to boot. The "rescue kernel" that comes with many distros is actually a simple initramfs containing a shell.

BLFS has its own initramfs creation script. To add LVM support to the initramfs, LVM must be installed first. And for this:
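
To make "the necessary operations" more concrete, here is a heavily simplified sketch of what an init script inside an initramfs does for an LVM root. It is not the BLFS init.in: the names run and early_boot, the DRY_RUN switch and the /.root mount point are my own choices for illustration, and a real script also starts udev, parses the kernel command line and replaces itself with switch_root through exec.

```shell
#!/bin/sh
# run CMD...
# Execute CMD, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# early_boot ROOT_DEVICE
# Bring the system far enough up to mount an LVM root and hand over to it.
early_boot() {
    root_dev=$1
    # Kernel pseudo file systems needed by the tools below
    run mount -t proc proc /proc || return 1
    run mount -t sysfs sysfs /sys || return 1
    run mount -t devtmpfs devtmpfs /dev || return 1
    # Activate all volume groups; this is the step the kernel cannot do alone
    run vgchange -a y || return 1
    # Mount the real root read-only and switch to it
    run mount -o ro "$root_dev" /.root || return 1
    run switch_root /.root /sbin/init
}
```

Running it as DRY_RUN=1 early_boot /dev/vg_lfs/lv_root only prints the six commands in order, which is a safe way to read the sequence on a running system.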

1) First libaio, which is a prerequisite of LVM
2) the which utility, which is actually not for LVM but is very useful for troubleshooting
3) mdadm. Its tests can be skipped; the command doesn't even run the tests.*
4) cpio to pack the initramfs archive
5) LVM. Its tests also take a long time and some of them are even problematic. I configured LVM with the --with-thin* and --with-cache* parameters given in the book, as well as the --with-vdo=none parameter. There is an extra command in the systemd LVM section, though it's not critical.
6) and the initramfs script must be installed.

* I haven't tested it, but LVM should also work without mdadm.

The mkinitramfs script consists of two parts. The first part is the script itself and the second part is the file named init.in, which is copied into the initramfs file by the script. In LFS v10.1, this script had a bug. It was searching for coreutils and util-linux components (like ls, cp, mount, umount etc.), which are essential for the initramfs, in /bin instead of /usr/bin and in /lib instead of /usr/lib, so the script was ending with an error. As a workaround, I had linked the missing files to where they should be in /usr/bin and /usr/lib. This bug is fixed in v11.

Since I gave the kernel version as a parameter to the script, it added the kernel modules (.ko files) to the initramfs and created it.

(lfs chroot) root:~# mkinitramfs 5.13.12
Creating initrd.img-5.13.12... done.

I copied the file to /boot (the partition must be mounted first) and changed /boot/grub/grub.cfg as follows before exiting the chroot:

menuentry "GNU/Linux, Linux 5.13.12-lfs-11.0-systemd" {
  linux /vmlinuz-5.13.12-lfs-11.0-systemd root=/dev/vg_lfs/lv_root ro
  initrd  /initrd.img-5.13.12
}

The configuration I made above will only work when the CentOS disk is removed. I am actually using CentOS' GRUB. So, after exiting chroot, I added the same initrd line to /etc/grub.d/40_custom and reran this command:

GRUB_DISABLE_OS_PROBER=true  grub2-mkconfig -o /boot/grub2/grub.cfg

I restarted the VM, chose LFS and voilà:

and the same result for the systemd version:

Although some services are still failing on the systemd machine, both of them boot without any problem in general.