Hi there. In this article of the series, I will write about the FAT, file allocation table. Writing about FAT was initially my main goal, but as I mentioned in the first article, FAT would be confusing without boot sector, as well as boot sector would be also up in the air without MBR.
Basic Concepts
FAT is the a data structure that comes after the boot sector on disks and shows whether clusters are empty and in case they are not empty, their sequential cluster. I will explain in the next section what the clusters are.
Directories are special files which cannot be directly read by users and they consist of list of files within the directory. This list is called as directory table. The elements of these tables are called directory entries.
Basic Concepts
FAT is the a data structure that comes after the boot sector on disks and shows whether clusters are empty and in case they are not empty, their sequential cluster. I will explain in the next section what the clusters are.
Directories are special files which cannot be directly read by users and they consist of list of files within the directory. This list is called as directory table. The elements of these tables are called directory entries.
Standard FAT has types such as FAT12, FAT16 and FAT32. Other than these, many vendors have customized FAT by keeping different data (e.g. file owner) in reserved fields of file/directory entries. I will not mention these as they are not standard. FAT numbers are the number of bits representing any cluster in the allocation table. Interestingly FAT12 is the most complex FAT among these. Apart from these, exFAT is another FAT version from MS optimized for flash disks.
VFAT, on the other hand, is not a special FAT version but an expansion over existing FAT versions to store long file names (LFNs), longer than 8+3. FAT normally supports legacy CP/M 8+3 filenames. Microsoft has done some magic to extend this limit.
Clusters
Clusters are smallest building blocks of the FAT. What a sector means for the disk system, has the same relationship between clusters and file system. A cluster can consist one or more sectors and either contains data from files or of course it can be empty, too. A file also consists of one or more clusters.
As the FAT first released (originally created as FAT8), there was no cluster concept yet. Addressing sectors directly was not a problem for FAT for disks of that time. Although the capacities increased over time, when the disk could not be used as a single partition, there was the option of partitioning the disk even if it is impractical.
FAT12, with 12 bits, could address up to 4096 sectors, i.e. max. 2 MB (in practice this limit was slightly lower). In 1984, MSDOS 3.0 came with FAT16 support. At the same time, IBM PC ATs were just released with their 20 MB hard drives and MS didn't want to lose these computers. First FAT16, could address up to 65 536 sectors or 32 MB using 16 bits. This was not the same as what we know as FAT16 today. Today's FAT16 came out with Compaq DOS v3.31 in 1987. With the upper limit of the file system units remaining the same, a certain number of sectors would be grouped under new file system unit named "clusters". So the units could be bigger than sectors, by adding only a multiplier to the algorithm and thus partition space would be increased: FAT12 partitions up to 16 MB and FAT16 partitions up to 2GB. This multiplier is the "sectors per cluster" value, mentioned in the previous post [1].
In the middle of 90s, disk capacities reached the 2 GB limit. Since FAT16 had reached its limit, FAT32 is designed and the largest addressable disk became 2 TB. With FAT32, the number of clusters increased so much that calculating free space in the partition or finding first free cluster became an issue. Therefore other FAT file systems most likely will not be developed.
Although high number of clusters causes performance problems, low number of clusters means that the same disk consists of larger clusters. Since a cluster is the fundamental unit of the file system, any file smaller or equal to cluster size occupies one cluster. In other words, a 1 KB file occupies 4 KB space on a disk with 4 KB clusters. Similarly, a 5 KB file will take up 8 KB of space with its two clusters on the same disk. This wasted space is called "slack space". Having a lot of small files increased this wasted space.
There is a number in FAT corresponding to the aforementioned clusters. This number holds whether the cluster is empty or full and if full, where the file continues. If a cluster is the last one of a file, FAT contains an end of file (EOF) or end of chain (EOC) mark. If OS encounters IO errors while reading disk, it marks the cluster containing that sector as bad cluster in FAT.
File Allocation TableBefore going into the details of FAT, let's take a look at the layout of disk structures (next figure). As we already know, MBR is at the beginning of any disk except for floppies or any other non-partitioned media. Partition entries in MBR point to boot sectors. As we can see, FAT is (almost always) kept as two copies and its beginning (fat_start) is equal to the sum of hidden sectors and reserved sectors values in boot sector, as I mentioned in previous article. The number of FATs and their length in sectors also exist in boot sector. Number of FATs is multiplied by FAT sectors and added to fat_start value, gives the sector (root_dir_start) where the root directory (purple area) starts. The size of root directory depends on how much space is preallocated for root directory entries. When this size is added to root_dir_start value, it implies (data_start) the beginning of 2nd cluster, the data area. The zeroth and first clusters have a special purpose and they are not found on disk.
These are also known from my previous article. After calculating data_start value, any cluster number can be translated to sector number by decreasing the cluster number by two then multiplying it with sectors per cluster value and adding data_start to that finally:
For simplicity, the figure contains only one partition and its not scaled of course.
FAT12
FAT12 is still used with floppy disks today (are floppies still used today at all?). Normally, FAT12 supports up to 4096 clusters using 12 bit entries but in practice, this limit is only 4078 when reserved cluster number are excluded.
To analyze FAT12, I will use the FreeDOS system floppy, I created for the previous article. (Recall: I created a freedos.ima file of 1 474 560 bytes using dd. Then I inserted it into FreeDOS VM and formatted it with FORMAT A: /S command). I chose this for analysis because the data on it is smaller compared to MSDOS system floppy. There are two files on the floppy, KERNEL.SYS and COMMAND.COM. When I divide their file sizes by 512 (size of a cluster in bytes, obtained from bytes per sector times number of sectors per cluster) and round up, first file has 92 clusters and second has 131 clusters. Now, I check the contents of the floppy image file with following command:
In the previous article, I had calculated the fat_start value as 1 for floppies. So, the FAT starts at 512th byte (0x0200). A snippet of the FAT is below:
00000200 f0 ff ff 03 40 00 05 60 00 07 80 00 09 a0 00 0b |....@..`........|VFAT, on the other hand, is not a special FAT version but an expansion over existing FAT versions to store long file names (LFNs), longer than 8+3. FAT normally supports legacy CP/M 8+3 filenames. Microsoft has done some magic to extend this limit.
Clusters
Clusters are smallest building blocks of the FAT. What a sector means for the disk system, has the same relationship between clusters and file system. A cluster can consist one or more sectors and either contains data from files or of course it can be empty, too. A file also consists of one or more clusters.
As the FAT first released (originally created as FAT8), there was no cluster concept yet. Addressing sectors directly was not a problem for FAT for disks of that time. Although the capacities increased over time, when the disk could not be used as a single partition, there was the option of partitioning the disk even if it is impractical.
FAT12, with 12 bits, could address up to 4096 sectors, i.e. max. 2 MB (in practice this limit was slightly lower). In 1984, MSDOS 3.0 came with FAT16 support. At the same time, IBM PC ATs were just released with their 20 MB hard drives and MS didn't want to lose these computers. First FAT16, could address up to 65 536 sectors or 32 MB using 16 bits. This was not the same as what we know as FAT16 today. Today's FAT16 came out with Compaq DOS v3.31 in 1987. With the upper limit of the file system units remaining the same, a certain number of sectors would be grouped under new file system unit named "clusters". So the units could be bigger than sectors, by adding only a multiplier to the algorithm and thus partition space would be increased: FAT12 partitions up to 16 MB and FAT16 partitions up to 2GB. This multiplier is the "sectors per cluster" value, mentioned in the previous post [1].
In the middle of 90s, disk capacities reached the 2 GB limit. Since FAT16 had reached its limit, FAT32 is designed and the largest addressable disk became 2 TB. With FAT32, the number of clusters increased so much that calculating free space in the partition or finding first free cluster became an issue. Therefore other FAT file systems most likely will not be developed.
Although high number of clusters causes performance problems, low number of clusters means that the same disk consists of larger clusters. Since a cluster is the fundamental unit of the file system, any file smaller or equal to cluster size occupies one cluster. In other words, a 1 KB file occupies 4 KB space on a disk with 4 KB clusters. Similarly, a 5 KB file will take up 8 KB of space with its two clusters on the same disk. This wasted space is called "slack space". Having a lot of small files increased this wasted space.
There is a number in FAT corresponding to the aforementioned clusters. This number holds whether the cluster is empty or full and if full, where the file continues. If a cluster is the last one of a file, FAT contains an end of file (EOF) or end of chain (EOC) mark. If OS encounters IO errors while reading disk, it marks the cluster containing that sector as bad cluster in FAT.
File Allocation TableBefore going into the details of FAT, let's take a look at the layout of disk structures (next figure). As we already know, MBR is at the beginning of any disk except for floppies or any other non-partitioned media. Partition entries in MBR point to boot sectors. As we can see, FAT is (almost always) kept as two copies and its beginning (fat_start) is equal to the sum of hidden sectors and reserved sectors values in boot sector, as I mentioned in previous article. The number of FATs and their length in sectors also exist in boot sector. Number of FATs is multiplied by FAT sectors and added to fat_start value, gives the sector (root_dir_start) where the root directory (purple area) starts. The size of root directory depends on how much space is preallocated for root directory entries. When this size is added to root_dir_start value, it implies (data_start) the beginning of 2nd cluster, the data area. The zeroth and first clusters have a special purpose and they are not found on disk.
These are also known from my previous article. After calculating data_start value, any cluster number can be translated to sector number by decreasing the cluster number by two then multiplying it with sectors per cluster value and adding data_start to that finally:
clus2sect(c) = (c - 2) * sectors_per_cluster + data_start [2]
For simplicity, the figure contains only one partition and its not scaled of course.
FAT12
FAT12 is still used with floppy disks today (are floppies still used today at all?). Normally, FAT12 supports up to 4096 clusters using 12 bit entries but in practice, this limit is only 4078 when reserved cluster number are excluded.
To analyze FAT12, I will use the FreeDOS system floppy, I created for the previous article. (Recall: I created a freedos.ima file of 1 474 560 bytes using dd. Then I inserted it into FreeDOS VM and formatted it with FORMAT A: /S command). I chose this for analysis because the data on it is smaller compared to MSDOS system floppy. There are two files on the floppy, KERNEL.SYS and COMMAND.COM. When I divide their file sizes by 512 (size of a cluster in bytes, obtained from bytes per sector times number of sectors per cluster) and round up, first file has 92 clusters and second has 131 clusters. Now, I check the contents of the floppy image file with following command:
hexdump -C -v freedos.ima | less
In the previous article, I had calculated the fat_start value as 1 for floppies. So, the FAT starts at 512th byte (0x0200). A snippet of the FAT is below:
[ SNIP ]
00000280 05 57 80 05 59 a0 05 5b c0 05 5d f0 ff 5f 00 06 |.W..Y..[..].._..|
[ SNIP ]
00000340 0d d7 80 0d d9 a0 0d db c0 0d dd e0 0d df 00 0e |................|
00000350 ff 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[ SNIP ]
As I mentioned before, each FAT entry is 12 bits, i.e. 1.5 bytes. As if it couldn't be more complicated, each entry is kept as little endian. To find the nth entry, n has to be multiplied by 1.5 and rounded down. The easiest way to do, is to put n into n + (n >> 1) formula:
#Entry | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | n |
Offset | 0 | 1 | 3 | 4 | 6 | 7 | 9 | ... | n + (n >> 1) |
Lower eight bits of the zeroth entry are in the zeroth byte and remaining higher four bits (nibble) are contained in the lower nibble of the first byte because it's in little endian. In other words, last twelve bits of the first word. This is the case for all even numbered entries.
Lower four bits of the first entry are in the higher nibble of the first byte, i.e. in the half of first byte that doesn't belong to previous entry and its rest is in the next byte. This also can be generalized to all odd-numbered entries. Coding this is easier than explaining:
The whole code is in my github.
I made following listing using the output produced by my code, from the FAT sample above (offsets are in parentheses):
Cluster0: 0xFF0 (0x0000)
Cluster1: 0xFFF (0x0001)
Cluster2: 0x003 (0x0003)
Cluster3: 0x004 (0x0004)
Cluster4: 0x005 (0x0006)
...
Cluster90: 0x05B (0x0287)
Cluster91: 0x05C (0x0288)
Cluster92: 0x05D (0x028A)
Cluster93: 0xFFF (0x028B)
Cluster94: 0x05F (0x028D)
...
Cluster222: 0x0DF (0x034D)
Cluster223: 0x0E0 (0x034E)
Cluster224: 0xFFF (0x0350)
Cluster225: 0x0 (0x0351)
Cluster226: 0x0 (0x0353)
...
The first two entries have special meaning, I will mention these later. They are not important for now since the data area is starting from second cluster. The value of the second entry is 3, which means the following cluster is third. The value of the third entry is 4, so the following cluster is fourth and so on. This chain continues up to the 93rd cluster and the 93rd entry has the value 0xFFF. This means end of chain or end of file, indicating that cluster is the final cluster of the file. Since this value also indicates end of directories, the term EOC can be said more correct but the term EOF is also correct if directories are considered as a special type of file that contains other files. I will go over this in "Directories" chapter. All values between 0xFF8 and 0xFFF are EOF. The sample FAT block contains zeroes starting from cluster 225 to its end. Zero means that the cluster is empty. We can say that there are two files up until the cluster 226. To speak more precisely, we have to take a look at all the entries.
If the OS encounters any bad sector in a cluster while doing IO on it, OS needs to mark that cluster containing bad sector with 0xFF7, as bad block and that cluster is not used anymore. Because the first cluster is special, the value 0x001 is reserved. No other entry can point to first cluster. The values from 0xFF0 to 0xFF6 are also reserved. I also explained that the remaining values are pointing to the consecutive cluster. We will find which file exists in the second cluster by looking at the directory table.
The zeroth entry contains the value called FAT ID. This is actually the media descriptor byte in the boot sector adjusted to FAT bit length. I wrote that the media descriptor byte mostly 0xF0 for floppies or 0xF8 for disks. So it is 0xFF0 for FAT12 floppies, 0xFF8 for FAT12 disks and 0xFFF8 for FAT16 disks. And the first entry always contains an EOF mark.
FAT16
There is no difference in FAT16 except that the entries are 16 bits. While theoretical limit for the number of clusters is 65 536, it supports up to 65 524 clusters because some values are reserved.
As an example of FAT16, I will analyze the disk of FreeDOS VM. Since there are many files as well as directories on this disk, its FAT contains more EOFs than the previous example. In VirtualBox, disks are in .vdi format however the format specifications are beyond the purpose of this post. The only thing to know is that in .vdi files, disk data starts at offset 0x200000 (actually there is a pointer to the disk data at the offset 0x158, this 0x200000 value is hardcoded so the data starts always at that offset). So, I will open the .vdi file with hexdump and then scroll down until the start offset:
Alternatively:
This "skip" value above should be 4096 to skip directly to MBR, 4159 to boot sector and 4160 to FAT. Since reserved sectors value is 1, FAT comes right after the boot sector. Note that I did not use -v parameter with hexdump, since there are many zero blocks in the .vdi file.
00208000 f8 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00208010 ff ff 0a 00 ff ff 0c 00 0d 00 ff ff 95 0a ff ff |................|
[ SNIP ]
00208030 ff ff ff ff ff ff 1c 00 ff ff ff ff 2a 00 ff ff |............*...|
[ SNIP ]
0020b220 ff ff 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 |................|
0020b230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0020b240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 19 |..............(.|
0020b250 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[ SNIP ]
It is easier to read as entries are 16 bits. Its code is also quite simple:
My code is also capable to open .vdi files. So, I can easily read FreeDOS.vdi file. As I analysed the FAT snippet above;
Cluster0: 0xFFF8 (0x0000)
Cluster1: 0xFFFF (0x0002)
Cluster2: 0xFFFF (0x0004)
...
Cluster9: 0x000A (0x0012)
Cluster10: 0xFFFF (0x0014)
Cluster11: 0x000C (0x0016)
Cluster12: 0x000D (0x0018)
Cluster13: 0xFFFF (0x001A)
Cluster14: 0x0A95 (0x001C)
Cluster15: 0xFFFF (0x001E)
...
Cluster30: 0x002A (0x003C)
...
Cluster2709: 0xFFFF (0x152A)
...
Cluster6416: 0xFFFF (0x3220)
Cluster6417: 0x0000 (0x3222)
Cluster6418: 0xFFFF (0x3224)
...
Cluster6439: 0x1928 (0x324E)
Cluster6440: 0xFFFF (0x3250)
...
The zeroth entry is FAT ID again like in FAT12 or media identifier byte but of course in 16 bits. Since this is a hard disk, its value is 0xFFF8. First entry also contains EOF like in FAT12. EOF values are 0xFFF8 to 0xFFFF, bad clusters are marked with 0xFFF7, and reserved values are 0xFFF0 to 0xFFF6. All these values are same as the values with FAT12, except prefixed with 0b1111.
In this example, cluster11 points to 12, cluster12 points to 13 and cluster13 contains EOF. Interestingly, cluster14 points to 2709. So, a piece of a file is in cluster14 and next piece of that file is in cluster2709. When I check 2709th entry, there is EOF there. Similarly, entry 30 points to 42. The rest cannot be seen in the above example however the sequence is as follows: 30->42->51->74->210->245->EOF. A file ends in cluster 6416, albeit the next cluster is empty, the cluster after this also contains EOF. Then, following nineteen clusters are also empty, but clusters 6439 and 6440 contains two successive clusters of a file. So, what is the cause of this chaos?
Fragmentation
Fragmentation means disintegration, going into pieces. There are two types of fragmentation example in the listing above. First one is caused by the clusters of any file not being sequential. An example for it can be seen in cluster 14 and a more prominent example in cluster 30. Since the clusters are not consecutive, various sectors in various cylinders must be read in mixed order, in order to read these files linearly. This causes longer disk access times, especially for rotating disks. Two significant components, which contribute to this latency, are (1) the time spent for moving the heads onto the correct cylinder i.e. seek time and (2) the time spent by the plate rotation until the sector to be read (or write) is located under the head, i.e. rotational latency.
Second type of fragmentation is caused by non-successive empty clusters. For example, clusters 6417 and 6419 are empty but their previous and next clusters contain data. This is usually caused by deleting either relatively smaller files or larger but fragmented files. When creating a file, DOS starts writing that file to the first cluster of an empty cluster block (next free cluster after the last occupied cluster for example), not to the first available empty cluster, in order to avoid first type fragmentation.
Directories and Directory Table
So far, I have explained how successive clusters are tracked on the disk. But it doesn't make any sense to track successive clusters without knowing which file starts at which cluster or which file does the cluster chain belongs to. At this point, directory tables come to help. In the previous article, three different values were calculated when booting to FreeDOS: fat_start, root_dir_start ve data_start. Then, the code had loaded KERNEL.SYS into the memory, by finding on which cluster it starts on the disk from the record in root_dir_start area. This is the beginning of the root directory.
Let's calculate again the beginning of the root directory with the help of the screenshot above. Since there is only one reserved sector (field 1 in the image above), sector next to the boot sector is the beginning of FAT. There are two FATs (field 2) and each is 171 sectors in size (field 3), FAT is 342 sectors in total. Considering that I am looking at the absolute sector 63 (field 4), right now, the root directory starts at the sector 406. I press Alt+P in the disk editor and open that sector:
Alternatively, I already found that I need to skip first 4159 blocks with dd in order seek the boot sector directly. So, I use the following command in linux without even calculating:
This sector contains a (root) directory table and this table contains the list of files in the root directory, if minor differences are ignored. Each entry is 32 bytes in size and their format is:
Filename consists of 8 characters, all caps. If the file name is shorter that 8 characters, space (0x20) characters are added to the left to fill this space. The same holds for extension as well. File attributes byte consists of following flag bits:
Read-only bit can be clearly understood from its name. Hidden files are files not listed with dir command. System files are files that disk tools should not touch while OS is running. Volume label is a special kind of file that overrides the volume label in the boot sector. Directory bit can also be understood clearly from its name. Archive bit, when a file is modified, is set by the program and backup software then backup those files with this bit set and resets this bit. Device driver files are not found on the disk but this bit is only set in the device driver file entries in memory.
The fields at offsets 0x0E and 0x10 are reserved in MS-DOS6 and earlier versions. FreeDOS and DOS7 (Win95), on the other hand, keep here the file creation time and date, respectively. In DOS6, these fields are zero but for example if I boot DOS6 VM with DSL and create a file, it is not zero because the FAT driver in DSL supports the usage of this field and writes the creation date to this field but DOS6 ignores this. Similar thing is with the field at offset 0x14. With the files, I created in DSL, this field is not zero, but there is no documentation on this. It can be a copy of file modify time.
Modification time at offset 0x16 is common to all FATs and it is kept as little endian. Its format is as follows (H is hours, M is minutes, S is seconds and the time resolution is 2 seconds because seconds are represented with only five bits):
Likewise, modification date at offset 0x18 is also common to all FATs and kept in little endian. The year value starts with 1980. (Y: Year, M: Month, D: Day):
Cluster number at offset 0x1A, points to the first cluster of the file. After finding the first cluster of the file, we already know how to track the rest on the disk. Finally, the file size is the size of file in bytes, obviously.
If the first character of the filename is 0x0, it means the end of the directory table, so if OS encounters an entry with zero in its first character, it stops reading the rest of the table. If a file is deleted, DOS replaces the first character of the entry with either 0x5 or 0xE5 and zeroes the cluster entries in FAT. In other words, entries starting with 0x5 or 0xE5 belong to deleted files. If the former clusters of a deleted file are not overwritten and if the deleted file is not fragmented, the file can be undeleted by replacing its first character with the correct one. I have an example for a deleted file in the next section.
Let's take a deeper look at the directory table in the screenshot above:
00000000 46 52 45 45 44 4f 53 32 30 31 36 08 00 00 00 00 |FREEDOS2016.....|
00000010 00 00 00 00 00 00 e2 b8 0b 4f 00 00 00 00 00 00 |.........O......|
[ SNIP ]
00000040 46 44 4f 53 20 20 20 20 20 20 20 10 00 00 e8 b8 |FDOS .....|
00000050 0b 4f 00 00 00 00 e8 b8 0b 4f 03 00 00 00 00 00 |.O.......O......|
00000060 4b 45 52 4e 45 4c 20 20 53 59 53 20 00 00 f3 b8 |KERNEL SYS ....|
00000070 0b 4f 00 00 00 00 40 ad ab 48 1e 00 5d b6 00 00 |.O....@..H..]...|
[ SNIP ]
000000e0 41 64 00 69 00 73 00 6b 00 31 00 0f 00 22 00 00 |Ad.i.s.k.1..."..|
000000f0 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00000100 44 49 53 4b 31 20 20 20 20 20 20 10 00 00 73 1a |DISK1 ...s.|
00000110 0c 4f 0c 4f 00 00 73 1a 0c 4f 02 00 00 00 00 00 |.O.O..s..O......|
The first entry is FREEDOS2.016 and from its attributes, it is a volume label. So this is a special kind of entry (not a real file). Its cluster and its size is zero.
It can be seen that the second entry "FDOS", has its remaining four characters and its extension padded with spaces. It is a directory as the directory flag is set in its attribute byte. Its creation and modification date and time are the same. Its time is 0xB8E8 i.e. 10111 000111 01000 = 23:07:32 (seconds field is 16 but the time resolution was 2 seconds, therefore it will be multiplied by two). And its date is: 0x4F0B i.e. 0100111 1000 01011 = 1980+39/8/11. This directory starts at the third cluster and has zero as length. Let's check its date and time first:
As it can be seen on the figure titled "FreeDOS boot sector", there are 512 entries preallocated (field 5) in the root directory. Since each directory entry is 32 bytes and a sector is 512 bytes, the root directory table is 32 sectors long. I had previously calculated that the root directory is at sector 406, adding 32 sectors to it implies that the data area begins at sector 438 on this disk. Using formula [2], the second cluster is begins at sector 438 and the third cluster begins at sector 438+16=454. In Linux;
When I go to the sector 454 in disk editor (using Alt+P), the view in disk editor changes automatically. Disk editor detects that there is a directory here and displays it as directory (right figure) and in this view, disk editor automatically parses entries and shows the information like file size, date, cluster etc. like I manually calculated above. If I switch to Hex view with F2, I see that the first two directory entries are "." (the directory itself) and "..". Cluster number of "." entry is 3, like the directory itself and cluster number of ".." entry is zero because the parent (upper) directory is the root directory. If I generalize, "." entry which represents the directory itself, has the same cluster number where the directory is; and ".." entry which represents the parent directory, has the cluster number of parent directory. If the parent directory is the root directory, cluster number points to zero since root directory has no cluster number. When I was examining FAT16, there was EOF in third clusters entry. This means that the list of files and subdirectories of this directory, i.e. the directory table spans only this cluster (no continuation). Directory table can grow and span multiple clusters, when the it contains more entries. But no matter how large it is, the file size field in its directory entry is always zero. There is an important sentence in the Design of the FAT file system article in Wikipedia: "A directory table is a special type of file that represents a directory". Directory tables, like files, are kept in clusters and have FAT entries except the root directory. Both in FAT12 and FAT16, the root directory doesn't have a FAT entry. To be able to talk about the file system hierarchy, a root directory must be located first. Therefore it is guaranteed by design, that the root directory cannot fragment, cannot be moved and it's always at the beginning of data area, which makes finding root directory easier.
If the next KERNEL.SYS entry is examined, it can be seen that its creation date is same as FDOS entry except the difference in seconds field by 3, which means it is created after about 5-6 seconds after FDOS directory. Its modification date, 11.05.2016, must be the modification date of the source file on installation CD. The file starts at cluster 0x1E (sector 886) and is 0xB65D=46 685 bytes long.
After this entry, there is a hidden partition label with zeros and lowercase letters in its name. Its cluster number is zero and size is 4 GB. disk1 can be read in filename field and has the same file name as the next valid entry, but this entry does not conform to the directory entry standard.
Long File Name (LFN) Support
As it is known, filenames in DOS are limited to 8+3 characters (capitals, numbers and some punctuation), but since Win95, long file names could be used with FAT. This is thanks to an extension called VFAT and this extension can be used with all FAT versions as long as the FS driver of OS supports it.
The trick of VFAT is keeping LFN string within invalid entries, which exist before the actual entry. Although "DISK1" is not longer than eight chars, in the example above, it has a VFAT entry because it has lowercase letters in its name. While old version disk editor does not recognize these LFN entries, its new version recognizes them as LFN as file type (upper right figure).
To create a better example, I booted the VM with DSL, inserted FreeDOS floppy and run following commands in terminal:
Note: When -t vfat parameter is not given, the floppy will be mounted without LFN support. I typed a few characters into the file using vi, saved it with :wq and quit. Same file could have been created like this, too:
While saving the file, vi creates a temporary .swp file and deletes it after saving. At this point, when I look at the image file using hexdump -C freedos.ima | less command, I see there are (deleted) file entries with character 0xE5 at the beginning of each, between the offsets 0x2640 and 0x2700. This means that 0xE5 character is written to the first character of all entries, when LFN files are deleted. Among these entries, the LONGFI~1.TXT entry at the offset 0x2780 plays the primary role, actually. People can remember, that many files in Win95 would look like this under DOS, because LFN entries were ignored by it. I will inverse the order of the entries to make them easier to read (their actual order can be identified by the offset numbers at the line beginning):
00002780 4c 4f 4e 47 46 49 7e 31 54 58 54 20 00 00 25 87 |LONGFI~1TXT ..%.|
00002790 26 52 26 52 00 00 25 87 26 52 f9 00 29 00 00 00 |&R&R..%.&R..)...|
00002760 01 6c 00 6f 00 6e 00 67 00 20 00 0f 00 d4 66 00 |.l.o.n.g. ....f.|
00002770 69 00 6c 00 65 00 20 00 6e 00 00 00 61 00 6d 00 |i.l.e. .n...a.m.|
00002740 02 65 00 20 00 28 00 4c 00 46 00 0f 00 d4 4e 00 |.e. .(.L.F....N.|
00002750 29 00 20 00 73 00 75 00 70 00 00 00 70 00 6f 00 |). .s.u.p...p.o.|
00002720 03 72 00 74 00 20 00 6f 00 6e 00 0f 00 d4 20 00 |.r.t. .o.n.... .|
00002730 46 00 41 00 54 00 20 00 66 00 00 00 69 00 6c 00 |F.A.T. .f...i.l.|
00002700 44 65 00 20 00 73 00 79 00 73 00 0f 00 d4 74 00 |De. .s.y.s....t.|
00002710 65 00 6d 00 2e 00 74 00 78 00 00 00 74 00 00 00 |e.m...t.x...t...|
The only valid entry is the original one at 0x2780. Values such as file size and cluster number are in this entry. The first character of LFN entries is a sequence number and it increases as the offset gets lower. In the last entry of the sequence, 6th bit of the first character must be set e.g. 0x44 at fourth and the last entry at offset 0x2700. After the sequence number, there are five 16-bit long characters in UCS charset. Thus, it's also possible to use Chinese or Cyrillic characters in file names. The 11th byte of the entry, which is normally file attributes byte, is always 0x0F with LFN entries. This means the file has hidden+system file+read only+volume label attributes at the same time. Having more than one volume label in disk is undefined, additionally having the other attributes set in the entry causes the entries to be ignored by the OS.
12th byte of LFN entries are always zero and they have a checksum at the 13th byte. The checksum algorithm can be find in Wikipedia under VFAT long file names title. 12 bytes after this checksum, between 14th and 25th bytes, contain 6 more characters. The cluster number field is always zero in LFN entries and the remaining 4 bytes contain 2 more characters. This makes 13 characters per entry in total. LFNs are null terminated strings and if characters in the last entry are less than 13 bytes, remaining fields are padded with 0xFFs. It cannot be observed in the above example because the last entry is fully used but in the DISK1 example above, null string terminator is at the 14th byte and afterwards 0xFFs are clearly visible.
With this article, I have completed the main topics of the disk system series. Since FAT32 is a huge topic by itself, it would be better to discuss it in a separate article. The logic behind the root directory is completely different with FAT32 and directory entries as well as LFN records are slightly different with FAT32, too. Other than that, I am going to publish additional articles on topics such as GPT and NTFS in future.
[1]: https://en.wikipedia.org/wiki/File_Allocation_Table
Lower four bits of the first entry are in the higher nibble of the first byte, i.e. in the half of first byte that doesn't belong to previous entry and its rest is in the next byte. This also can be generalized to all odd-numbered entries. Coding this is easier than explaining:
for(int k = 0; k < clusters; k++) {
cur_cluster = (k >> 1) + k; // multiply by 1.5
if(k & 1) // shift by 4 for odd numbered clusters
fat_entry = *((unsigned short *) &FAT[cur_cluster]) >> 4;
else // take last 12 bits for even numbered clusters
fat_entry = *((unsigned short *) &FAT[cur_cluster]) & 0x0FFF;
printf("%d -> %d [%X] \n", k, fat_entry, fat_entry);
}
cur_cluster = (k >> 1) + k; // multiply by 1.5
if(k & 1) // shift by 4 for odd numbered clusters
fat_entry = *((unsigned short *) &FAT[cur_cluster]) >> 4;
else // take last 12 bits for even numbered clusters
fat_entry = *((unsigned short *) &FAT[cur_cluster]) & 0x0FFF;
printf("%d -> %d [%X] \n", k, fat_entry, fat_entry);
}
The whole code is in my github.
I made following listing using the output produced by my code, from the FAT sample above (offsets are in parentheses):
Cluster0: 0xFF0 (0x0000)
Cluster1: 0xFFF (0x0001)
Cluster2: 0x003 (0x0003)
Cluster3: 0x004 (0x0004)
Cluster4: 0x005 (0x0006)
...
Cluster90: 0x05B (0x0287)
Cluster91: 0x05C (0x0288)
Cluster92: 0x05D (0x028A)
Cluster93: 0xFFF (0x028B)
Cluster94: 0x05F (0x028D)
...
Cluster222: 0x0DF (0x034D)
Cluster223: 0x0E0 (0x034E)
Cluster224: 0xFFF (0x0350)
Cluster225: 0x0 (0x0351)
Cluster226: 0x0 (0x0353)
...
The first two entries have special meaning, I will mention these later. They are not important for now since the data area is starting from second cluster. The value of the second entry is 3, which means the following cluster is third. The value of the third entry is 4, so the following cluster is fourth and so on. This chain continues up to the 93rd cluster and the 93rd entry has the value 0xFFF. This means end of chain or end of file, indicating that cluster is the final cluster of the file. Since this value also indicates end of directories, the term EOC can be said more correct but the term EOF is also correct if directories are considered as a special type of file that contains other files. I will go over this in "Directories" chapter. All values between 0xFF8 and 0xFFF are EOF. The sample FAT block contains zeroes starting from cluster 225 to its end. Zero means that the cluster is empty. We can say that there are two files up until the cluster 226. To speak more precisely, we have to take a look at all the entries.
If the OS encounters any bad sector in a cluster while doing IO on it, OS needs to mark that cluster containing bad sector with 0xFF7, as bad block and that cluster is not used anymore. Because the first cluster is special, the value 0x001 is reserved. No other entry can point to first cluster. The values from 0xFF0 to 0xFF6 are also reserved. I also explained that the remaining values are pointing to the consecutive cluster. We will find which file exists in the second cluster by looking at the directory table.
The zeroth entry contains the value called FAT ID. This is actually the media descriptor byte in the boot sector adjusted to FAT bit length. I wrote that the media descriptor byte mostly 0xF0 for floppies or 0xF8 for disks. So it is 0xFF0 for FAT12 floppies, 0xFF8 for FAT12 disks and 0xFFF8 for FAT16 disks. And the first entry always contains an EOF mark.
FAT16
There is no difference in FAT16 except that the entries are 16 bits. While theoretical limit for the number of clusters is 65 536, it supports up to 65 524 clusters because some values are reserved.
As an example of FAT16, I will analyze the disk of FreeDOS VM. Since there are many files as well as directories on this disk, its FAT contains more EOFs than the previous example. In VirtualBox, disks are in .vdi format however the format specifications are beyond the purpose of this post. The only thing to know is that in .vdi files, disk data starts at offset 0x200000 (actually there is a pointer to the disk data at the offset 0x158, this 0x200000 value is hardcoded so the data starts always at that offset). So, I will open the .vdi file with hexdump and then scroll down until the start offset:
hexdump -C ~/VirtualBox\ VMs/FreeDOS/FreeDOS.vdi | less
Alternatively:
dd if=~/VirtualBox\ VMs/FreeDOS/FreeDOS.vdi \
bs=512 skip=4096 | hexdump -C | less
bs=512 skip=4096 | hexdump -C | less
This "skip" value above should be 4096 to skip directly to MBR, 4159 to boot sector and 4160 to FAT. Since reserved sectors value is 1, FAT comes right after the boot sector. Note that I did not use -v parameter with hexdump, since there are many zero blocks in the .vdi file.
00208000 f8 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00208010 ff ff 0a 00 ff ff 0c 00 0d 00 ff ff 95 0a ff ff |................|
[ SNIP ]
00208030 ff ff ff ff ff ff 1c 00 ff ff ff ff 2a 00 ff ff |............*...|
[ SNIP ]
0020b220 ff ff 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 |................|
0020b230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0020b240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 19 |..............(.|
0020b250 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[ SNIP ]
It is easier to read as entries are 16 bits. Its code is also quite simple:
for(int k = 0; k < clusters; k++) {
fat_entry = *(unsigned short *)&FAT[k << 1];
printf("%d -> %d [%X] \n", k, fat_entry, fat_entry);
}
fat_entry = *(unsigned short *)&FAT[k << 1];
printf("%d -> %d [%X] \n", k, fat_entry, fat_entry);
}
My code is also capable to open .vdi files. So, I can easily read FreeDOS.vdi file. As I analysed the FAT snippet above;
Cluster0: 0xFFF8 (0x0000)
Cluster1: 0xFFFF (0x0002)
Cluster2: 0xFFFF (0x0004)
...
Cluster9: 0x000A (0x0012)
Cluster10: 0xFFFF (0x0014)
Cluster11: 0x000C (0x0016)
Cluster12: 0x000D (0x0018)
Cluster13: 0xFFFF (0x001A)
Cluster14: 0x0A95 (0x001C)
Cluster15: 0xFFFF (0x001E)
...
Cluster30: 0x002A (0x003C)
...
Cluster2709: 0xFFFF (0x152A)
...
Cluster6416: 0xFFFF (0x3220)
Cluster6417: 0x0000 (0x3222)
Cluster6418: 0xFFFF (0x3224)
...
Cluster6439: 0x1928 (0x324E)
Cluster6440: 0xFFFF (0x3250)
...
The zeroth entry is FAT ID again like in FAT12 or media identifier byte but of course in 16 bits. Since this is a hard disk, its value is 0xFFF8. First entry also contains EOF like in FAT12. EOF values are 0xFFF8 to 0xFFFF, bad clusters are marked with 0xFFF7, and reserved values are 0xFFF0 to 0xFFF6. All these values are same as the values with FAT12, except prefixed with 0b1111.
In this example, cluster11 points to 12, cluster12 points to 13 and cluster13 contains EOF. Interestingly, cluster14 points to 2709. So, a piece of a file is in cluster14 and next piece of that file is in cluster2709. When I check 2709th entry, there is EOF there. Similarly, entry 30 points to 42. The rest cannot be seen in the above example however the sequence is as follows: 30->42->51->74->210->245->EOF. A file ends in cluster 6416, albeit the next cluster is empty, the cluster after this also contains EOF. Then, following nineteen clusters are also empty, but clusters 6439 and 6440 contains two successive clusters of a file. So, what is the cause of this chaos?
Fragmentation
Fragmentation means disintegration, going into pieces. There are two types of fragmentation example in the listing above. First one is caused by the clusters of any file not being sequential. An example for it can be seen in cluster 14 and a more prominent example in cluster 30. Since the clusters are not consecutive, various sectors in various cylinders must be read in mixed order, in order to read these files linearly. This causes longer disk access times, especially for rotating disks. Two significant components, which contribute to this latency, are (1) the time spent for moving the heads onto the correct cylinder i.e. seek time and (2) the time spent by the plate rotation until the sector to be read (or write) is located under the head, i.e. rotational latency.
Second type of fragmentation is caused by non-successive empty clusters. For example, clusters 6417 and 6419 are empty but their previous and next clusters contain data. This is usually caused by deleting either relatively smaller files or larger but fragmented files. When creating a file, DOS starts writing that file to the first cluster of an empty cluster block (next free cluster after the last occupied cluster for example), not to the first available empty cluster, in order to avoid first type fragmentation.
Directories and Directory Table
So far, I have explained how successive clusters are tracked on the disk. But it doesn't make any sense to track successive clusters without knowing which file starts at which cluster or which file does the cluster chain belongs to. At this point, directory tables come to help. In the previous article, three different values were calculated when booting to FreeDOS: fat_start, root_dir_start ve data_start. Then, the code had loaded KERNEL.SYS into the memory, by finding on which cluster it starts on the disk from the record in root_dir_start area. This is the beginning of the root directory.
FreeDOS boot sector |
Let's calculate again the beginning of the root directory with the help of the screenshot above. Since there is only one reserved sector (field 1 in the image above), sector next to the boot sector is the beginning of FAT. There are two FATs (field 2) and each is 171 sectors in size (field 3), FAT is 342 sectors in total. Considering that I am looking at the absolute sector 63 (field 4), right now, the root directory starts at the sector 406. I press Alt+P in the disk editor and open that sector:
FreeDOS root directory table |
Alternatively, I already found that I need to skip first 4159 blocks with dd in order seek the boot sector directly. So, I use the following command in linux without even calculating:
dd if=~/VirtualBox\ VMs/FreeDOS/FreeDOS.vdi bs=512 \
skip=$((4159+1+171*2)) | hexdump -C | less
skip=$((4159+1+171*2)) | hexdump -C | less
This sector contains a (root) directory table and this table contains the list of files in the root directory, if minor differences are ignored. Each entry is 32 bytes in size and their format is:
Offset | Size | Description |
---|---|---|
0x00 | 8 byte | File name |
0x08 | 3 byte | File extension |
0x0B | 1 byte | File attributes |
0x0C | 2 byte | Reserved |
0x0E | word | MSDOS: Reserved FreeDOS: Creation time |
0x10 | word | MSDOS: Reserved FreeDOS: Creation date |
0x12 | 2 byte | Reserved |
0x14 | 2 byte | Reserved* |
0x16 | word | Modify time |
0x18 | word | Modify date |
0x1A | word | Cluster number |
0x1C | dword | File size |
Filename consists of 8 characters, all caps. If the file name is shorter that 8 characters, space (0x20) characters are added to the left to fill this space. The same holds for extension as well. File attributes byte consists of following flag bits:
Bit | Description |
---|---|
0 | Read-only |
1 | Hidden |
2 | System file |
3 | Volume label |
4 | Directory |
5 | Archive |
6 | Device driver |
7 | Reserved |
Read-only bit can be clearly understood from its name. Hidden files are files not listed with dir command. System files are files that disk tools should not touch while OS is running. Volume label is a special kind of file that overrides the volume label in the boot sector. Directory bit can also be understood clearly from its name. Archive bit, when a file is modified, is set by the program and backup software then backup those files with this bit set and resets this bit. Device driver files are not found on the disk but this bit is only set in the device driver file entries in memory.
The fields at offsets 0x0E and 0x10 are reserved in MS-DOS6 and earlier versions. FreeDOS and DOS7 (Win95), on the other hand, keep here the file creation time and date, respectively. In DOS6, these fields are zero but for example if I boot DOS6 VM with DSL and create a file, it is not zero because the FAT driver in DSL supports the usage of this field and writes the creation date to this field but DOS6 ignores this. Similar thing is with the field at offset 0x14. With the files, I created in DSL, this field is not zero, but there is no documentation on this. It can be a copy of file modify time.
Modification time at offset 0x16 is common to all FATs and it is kept as little endian. Its format is as follows (H is hours, M is minutes, S is seconds and the time resolution is 2 seconds because seconds are represented with only five bits):
Offset 0x17 | Offset 0x16 | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
H4 | H3 | H2 | H1 | H0 | M5 | M4 | M3 | M2 | M1 | M0 | S4 | S3 | S2 | S1 | S0 |
Likewise, modification date at offset 0x18 is also common to all FATs and kept in little endian. The year value starts with 1980. (Y: Year, M: Month, D: Day):
Offset 0x19 | Offset 0x18 | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Y6 | Y5 | Y4 | Y3 | Y2 | Y1 | Y0 | M3 | M2 | M1 | M0 | D4 | D3 | D2 | D1 | D0 |
Cluster number at offset 0x1A, points to the first cluster of the file. After finding the first cluster of the file, we already know how to track the rest on the disk. Finally, the file size is the size of file in bytes, obviously.
If the first character of the filename is 0x0, it means the end of the directory table, so if OS encounters an entry with zero in its first character, it stops reading the rest of the table. If a file is deleted, DOS replaces the first character of the entry with either 0x5 or 0xE5 and zeroes the cluster entries in FAT. In other words, entries starting with 0x5 or 0xE5 belong to deleted files. If the former clusters of a deleted file are not overwritten and if the deleted file is not fragmented, the file can be undeleted by replacing its first character with the correct one. I have an example for a deleted file in the next section.
Let's take a deeper look at the directory table in the screenshot above:
00000000 46 52 45 45 44 4f 53 32 30 31 36 08 00 00 00 00 |FREEDOS2016.....|
00000010 00 00 00 00 00 00 e2 b8 0b 4f 00 00 00 00 00 00 |.........O......|
[ SNIP ]
00000040 46 44 4f 53 20 20 20 20 20 20 20 10 00 00 e8 b8 |FDOS .....|
00000050 0b 4f 00 00 00 00 e8 b8 0b 4f 03 00 00 00 00 00 |.O.......O......|
00000060 4b 45 52 4e 45 4c 20 20 53 59 53 20 00 00 f3 b8 |KERNEL SYS ....|
00000070 0b 4f 00 00 00 00 40 ad ab 48 1e 00 5d b6 00 00 |.O....@..H..]...|
[ SNIP ]
000000e0 41 64 00 69 00 73 00 6b 00 31 00 0f 00 22 00 00 |Ad.i.s.k.1..."..|
000000f0 ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff |................|
00000100 44 49 53 4b 31 20 20 20 20 20 20 10 00 00 73 1a |DISK1 ...s.|
00000110 0c 4f 0c 4f 00 00 73 1a 0c 4f 02 00 00 00 00 00 |.O.O..s..O......|
The first entry is FREEDOS2.016 and from its attributes, it is a volume label. So this is a special kind of entry (not a real file). Its cluster and its size is zero.
It can be seen that the second entry "FDOS", has its remaining four characters and its extension padded with spaces. It is a directory as the directory flag is set in its attribute byte. Its creation and modification date and time are the same. Its time is 0xB8E8 i.e. 10111 000111 01000 = 23:07:32 (seconds field is 16 but the time resolution was 2 seconds, therefore it will be multiplied by two). And its date is: 0x4F0B i.e. 0100111 1000 01011 = 1980+39/8/11. This directory starts at the third cluster and has zero as length. Let's check its date and time first:
As it can be seen on the figure titled "FreeDOS boot sector", there are 512 entries preallocated (field 5) in the root directory. Since each directory entry is 32 bytes and a sector is 512 bytes, the root directory table is 32 sectors long. I had previously calculated that the root directory is at sector 406, adding 32 sectors to it implies that the data area begins at sector 438 on this disk. Using formula [2], the second cluster is begins at sector 438 and the third cluster begins at sector 438+16=454. In Linux;
dd if=~/VirtualBox\ VMs/FreeDOS/FreeDOS.vdi bs=512 \
skip=$((4160+171*2+32+16)) | hexdump -C | less
skip=$((4160+171*2+32+16)) | hexdump -C | less
When I go to the sector 454 in disk editor (using Alt+P), the view in disk editor changes automatically. Disk editor detects that there is a directory here and displays it as directory (right figure) and in this view, disk editor automatically parses entries and shows the information like file size, date, cluster etc. like I manually calculated above. If I switch to Hex view with F2, I see that the first two directory entries are "." (the directory itself) and "..". Cluster number of "." entry is 3, like the directory itself and cluster number of ".." entry is zero because the parent (upper) directory is the root directory. If I generalize, "." entry which represents the directory itself, has the same cluster number where the directory is; and ".." entry which represents the parent directory, has the cluster number of parent directory. If the parent directory is the root directory, cluster number points to zero since root directory has no cluster number. When I was examining FAT16, there was EOF in third clusters entry. This means that the list of files and subdirectories of this directory, i.e. the directory table spans only this cluster (no continuation). Directory table can grow and span multiple clusters, when the it contains more entries. But no matter how large it is, the file size field in its directory entry is always zero. There is an important sentence in the Design of the FAT file system article in Wikipedia: "A directory table is a special type of file that represents a directory". Directory tables, like files, are kept in clusters and have FAT entries except the root directory. Both in FAT12 and FAT16, the root directory doesn't have a FAT entry. To be able to talk about the file system hierarchy, a root directory must be located first. Therefore it is guaranteed by design, that the root directory cannot fragment, cannot be moved and it's always at the beginning of data area, which makes finding root directory easier.
If the next KERNEL.SYS entry is examined, it can be seen that its creation date is same as FDOS entry except the difference in seconds field by 3, which means it is created after about 5-6 seconds after FDOS directory. Its modification date, 11.05.2016, must be the modification date of the source file on installation CD. The file starts at cluster 0x1E (sector 886) and is 0xB65D=46 685 bytes long.
After this entry, there is a hidden partition label with zeros and lowercase letters in its name. Its cluster number is zero and size is 4 GB. disk1 can be read in filename field and has the same file name as the next valid entry, but this entry does not conform to the directory entry standard.
Long File Name (LFN) Support
As it is known, filenames in DOS are limited to 8+3 characters (capitals, numbers and some punctuation), but since Win95, long file names could be used with FAT. This is thanks to an extension called VFAT and this extension can be used with all FAT versions as long as the FS driver of OS supports it.
Upper part from new version lower part from old version |
To create a better example, I booted the VM with DSL, inserted FreeDOS floppy and run following commands in terminal:
Note: When -t vfat parameter is not given, the floppy will be mounted without LFN support. I typed a few characters into the file using vi, saved it with :wq and quit. Same file could have been created like this, too:
echo test \
> "long file name (LFN) support on FAT file system.txt"
> "long file name (LFN) support on FAT file system.txt"
While saving the file, vi creates a temporary .swp file and deletes it after saving. At this point, when I look at the image file using hexdump -C freedos.ima | less command, I see there are (deleted) file entries with character 0xE5 at the beginning of each, between the offsets 0x2640 and 0x2700. This means that 0xE5 character is written to the first character of all entries, when LFN files are deleted. Among these entries, the LONGFI~1.TXT entry at the offset 0x2780 plays the primary role, actually. People can remember, that many files in Win95 would look like this under DOS, because LFN entries were ignored by it. I will inverse the order of the entries to make them easier to read (their actual order can be identified by the offset numbers at the line beginning):
00002780 4c 4f 4e 47 46 49 7e 31 54 58 54 20 00 00 25 87 |LONGFI~1TXT ..%.|
00002790 26 52 26 52 00 00 25 87 26 52 f9 00 29 00 00 00 |&R&R..%.&R..)...|
00002760 01 6c 00 6f 00 6e 00 67 00 20 00 0f 00 d4 66 00 |.l.o.n.g. ....f.|
00002770 69 00 6c 00 65 00 20 00 6e 00 00 00 61 00 6d 00 |i.l.e. .n...a.m.|
00002740 02 65 00 20 00 28 00 4c 00 46 00 0f 00 d4 4e 00 |.e. .(.L.F....N.|
00002750 29 00 20 00 73 00 75 00 70 00 00 00 70 00 6f 00 |). .s.u.p...p.o.|
00002720 03 72 00 74 00 20 00 6f 00 6e 00 0f 00 d4 20 00 |.r.t. .o.n.... .|
00002730 46 00 41 00 54 00 20 00 66 00 00 00 69 00 6c 00 |F.A.T. .f...i.l.|
00002700 44 65 00 20 00 73 00 79 00 73 00 0f 00 d4 74 00 |De. .s.y.s....t.|
00002710 65 00 6d 00 2e 00 74 00 78 00 00 00 74 00 00 00 |e.m...t.x...t...|
The only valid entry is the original one at 0x2780. Values such as file size and cluster number are in this entry. The first character of LFN entries is a sequence number and it increases as the offset gets lower. In the last entry of the sequence, 6th bit of the first character must be set e.g. 0x44 at fourth and the last entry at offset 0x2700. After the sequence number, there are five 16-bit long characters in UCS charset. Thus, it's also possible to use Chinese or Cyrillic characters in file names. The 11th byte of the entry, which is normally file attributes byte, is always 0x0F with LFN entries. This means the file has hidden+system file+read only+volume label attributes at the same time. Having more than one volume label in disk is undefined, additionally having the other attributes set in the entry causes the entries to be ignored by the OS.
12th byte of LFN entries are always zero and they have a checksum at the 13th byte. The checksum algorithm can be find in Wikipedia under VFAT long file names title. 12 bytes after this checksum, between 14th and 25th bytes, contain 6 more characters. The cluster number field is always zero in LFN entries and the remaining 4 bytes contain 2 more characters. This makes 13 characters per entry in total. LFNs are null terminated strings and if characters in the last entry are less than 13 bytes, remaining fields are padded with 0xFFs. It cannot be observed in the above example because the last entry is fully used but in the DISK1 example above, null string terminator is at the 14th byte and afterwards 0xFFs are clearly visible.
With this article, I have completed the main topics of the disk system series. Since FAT32 is a huge topic by itself, it would be better to discuss it in a separate article. The logic behind the root directory is completely different with FAT32 and directory entries as well as LFN records are slightly different with FAT32, too. Other than that, I am going to publish additional articles on topics such as GPT and NTFS in future.
[1]: https://en.wikipedia.org/wiki/File_Allocation_Table