Monday, May 6, 2019

What is the Disk System and What isn't? #1: Hard Disks


Hi there. I have been thinking about writing on file systems, especially FAT, for a while, but it makes no sense to write about FAT without mentioning the boot sector, or about the boot sector without mentioning the MBR and, more importantly, the hardware. Therefore I decided to cover this topic from the very beginning. Speaking of the beginning, I will also mention where the BIOS ends and the MBR takes over in the boot process, but I will not dig much deeper into the BIOS (maybe in another article).


Addressing the Hard Disk: CHS, LBA and Int 13h
Cylinders, heads and sectors on a hard disk
CHS is an acronym for Cylinder, Head and Sector. It was standardized with very old disks (from the 1970s) and, although the underlying disk structure has changed dramatically, CHS addressing stayed in use for about 30 more years for backward compatibility. The figure shows the internal structure of a rotating disk. This hard disk consists of four platters and eight heads for access. Each platter surface has concentric data rings called tracks, and the tracks at the same radius on all surfaces together form a cylinder. The two terms are often used loosely for each other; I'll use the term "cylinder" throughout the article. Finally, when each cylinder is cut into slices at a certain angle (like the red triangle in the figure), the areas where the slices cut the cylinder are called sectors. Each sector holds 512 bytes. Nowadays many drives use 4K sectors internally, but the disk controller usually still reports 512 bytes to the OS for backward compatibility. Let's assume we are in the 1980s for now (I wish). We need three numbers for CHS addressing. Important note: while cylinder and head numbers start from zero, sector numbers always start from one.

I find the image below more descriptive:

https://superuser.com/questions/974581/chs-to-lba-mapping-disk-storage

A cluster is a structure consisting of one or more physical sectors. It is the smallest allocation unit of a logical file system. I will not cover it in this article because I am still defining the hardware elements and will come to the logical elements later. In some sources, the terms "cluster" and "sector" are used interchangeably, but this is wrong. A cluster can be equal to a sector (in some special cases, e.g. floppies) but it does not have to be.

5 MB IBM disk being carried by forklift.
Source: thenextweb.com
CHS logic is simple but wasteful. Careful readers will notice that the outer sectors are physically larger than the inner ones, yet every sector holds the same 512 bytes, so CHS uses the outer cylinders inefficiently in terms of data density. In the 1970s, disks of a few tens of MBs costing tens of thousands of dollars had this structure. These disks are called CAV (constant angular velocity) disks. Since the angular velocity is constant, the disk head passes over each sector in a constant time. Having said this, I must also mention that CDs and DVDs are CLV (constant linear velocity) discs. A CD is recorded from the inside out; while it is being read, it spins fastest on the inner tracks and slows down as the head moves toward the outer tracks near the end of the recording. In the 1990s (actually near the end of the 80s), zone bit recording (ZBR) technology was developed for hard disks. With ZBR, the large outer cylinders are divided into more sectors while the relatively smaller inner cylinders have fewer. For backward compatibility, the controller still receives CHS addresses from the OS and converts them into the real physical sector in the background using a formula.

By the way, let's jump forward to the present day for a moment. In SSDs there are no platters or heads, so CHS as well as CAV/ZBR are completely irrelevant. In other words, all of these technologies are obsolete now.

Another addressing method is LBA (Logical Block Addressing). In LBA, the disk is addressed using a single linear sector number. For example, a 1.44MB high density (HD) floppy has 80 cylinders, 2 heads and 18 sectors per track (or 9 sectors per track on double density (DD) floppies). The CHS address of the last sector is (79, 1, 18). Its LBA address is 2879 (80 * 2 * 18 = 2880 total sectors; since the first LBA sector is zero, the last one is 2879).
 
Let c be the cylinder number, h the head number and s the sector number. Let Nhead be the number of heads of the disk and NSPT the number of sectors per track (that is, per cylinder per head). The formula to translate a CHS address to LBA is the following:
 
LBA(c, h, s) = (c * Nhead + h) * NSPT + (s - 1)    [ 1 ]
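Formula [1] is easy to try out in a few lines of Python. This is only a sketch: the function name chs_to_lba and the 16-head, 63-sector geometry are illustrative assumptions, and the last parameter is the sector count that the formula multiplies by (sectors per track).

```python
def chs_to_lba(c, h, s, n_heads, sectors_per_track):
    """Translate a CHS address to an LBA using formula [1]."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector numbers run from 1 to sectors_per_track")
    if not 0 <= h < n_heads:
        raise ValueError("head numbers run from 0 to n_heads - 1")
    if c < 0:
        raise ValueError("cylinder numbers start at 0")
    return (c * n_heads + h) * sectors_per_track + (s - 1)


# Hypothetical geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))  # first sector of the disk
print(chs_to_lba(0, 1, 1, 16, 63))  # first sector under the next head
print(chs_to_lba(1, 0, 1, 16, 63))  # first sector of the next cylinder
```

The three calls print 0, 63 and 1008: the sector number moves fastest, then the head, then the cylinder.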

One advantage of LBA is that it allows disks to be addressed easily even with a complex sector layout like the one in the following image. But that's not the only advantage.

Source: https://venam.nixers.net/blog/unix/2017/11/05/unix-filesystem.html

Int 13h is an interrupt service routine (ISR) provided by the BIOS for disk access. This ISR was widely used in the 80s but started to fall out of favor because of constantly growing disk sizes. It has many functions but I will focus on 02 and 03. Function 02 reads from disk to memory and function 03 writes to disk. Before calling this interrupt, the registers must be set as follows:

AH = 02h/03h (read/write)
AL = Number of sectors to read/write ( >0 )
CH = Low 8 bits of cylinder number
CL = Sector number 1-63 (bits 0 .. 5) + high 2 bits of cylinder number (bits 6 .. 7)
DH = Head number
DL = Drive number; bit 7 is set for hard disks.
     Example: first floppy drive: 00h, second floppy drive: 01h
              first hard drive: 80h, second hard drive: 81h
ES:BX = Pointer to the buffer for the data to be read/written
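The way a 10-bit cylinder number is split between CH and CL is the least obvious part of this register layout, so here is a minimal Python sketch of the packing. The function name pack_int13_chs is my own; it only computes the register values and does not call the BIOS.

```python
def pack_int13_chs(cylinder, head, sector):
    """Return the (CH, CL, DH) register values for an Int 13h CHS request."""
    if not 0 <= cylinder <= 1023:
        raise ValueError("cylinder must fit in 10 bits")
    if not 0 <= head <= 255:
        raise ValueError("head must fit in 8 bits")
    if not 1 <= sector <= 63:
        raise ValueError("sector must be between 1 and 63")
    ch = cylinder & 0xFF                 # low 8 bits of the cylinder
    high2 = (cylinder >> 8) & 0x03       # high 2 bits of the cylinder
    cl = (high2 << 6) | sector           # bits 7-6: cylinder, bits 5-0: sector
    dh = head
    return ch, cl, dh


# First sector of a disk: cylinder 0, head 0, sector 1.
print([hex(v) for v in pack_int13_chs(0, 0, 1)])
# A cylinder above 255 spills into the top two bits of CL.
print([hex(v) for v in pack_int13_chs(520, 0, 63)])
```

For cylinder 520 (208h) the low byte 08h goes to CH and the remaining two bits (10b) land in bits 7-6 of CL, giving CL = BFh together with sector 63.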

 
I created an example of Int 13h usage. Even though Windows XP contains debug.exe, it does not allow such low-level operations. I installed DOS 6.22 in a virtual machine (I downloaded it from my university's server). I will explain how it is set up in the next article. For now, I am just adding screenshots:

Reading the disk using debug.exe

In the above example, I read the first sector of the hard disk. I will also explain the output in the upcoming article. There are two screenshots above: one for the beginning and one for the end of the sector.

If I did not want to use Int 13h, I could also do this operation by accessing the IDE controller directly using hardware ports. I will give an example for this a few paragraphs later.

Let's make a calculation. Three bytes are reserved for CHS addressing in total: the registers CH, CL and DH. In terms of bits: 10 + 6 + 8 = 24 bits. Since a sector's capacity is 512 bytes, 512 bytes * 2^24 = 8 GB is addressable using the CHS scheme. Moreover, since sector numbers start at 1 instead of zero, only 63 of the 64 sector values are usable, so the capacity is actually slightly less than 8 GB: 2^18 * 63 * 512 bytes = 7.875 GB. This means CHS is insufficient to address drives larger than 8 GB. The LBA scheme also has an advantage over CHS in addressing, but before digging deeper into that, we need to take a look at the ATA standards first.

 
(Parallel) ATA Standard
In the previous section, I tried to give an overview of disk access from the BIOS (Int 13h) side. From that side, things might look a bit complicated. As if that were not enough, things are complicated on the hardware side, too. In 1986, when Western Digital announced the first IDE / ATA standard, 22 bits were reserved to address the hard disk: 10 bits for cylinders, 4 bits for heads and 8 bits for sectors, if I am not mistaken. In 1994, in the new EIDE / ATA-2 standard, the cylinder field grew from 10 to 16 bits, making the total number of address bits 28 (up from 22). Moreover, 28-bit LBA addresses were also supported besides CHS. However, IBM had already designed the BIOS interrupts and other standards (like the MBR), and according to these standards 10, 8 and 6 bits were reserved for cylinders, heads and sectors respectively. If we take the smallest field width from the three standards for each component, we see that CHS could support disks up to 512 MB at most, with 10 + 4 + 6 = 20 bits and 512 bytes per sector. In fact, even slightly less than that (504 MB), because sector numbers start at 1 instead of 0.*
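The "smallest common field" argument can be reproduced with a short Python sketch. The field widths are the ones quoted in the paragraph above (the 1986 figures carry the same "if I am not mistaken" caveat); the names chs_capacity and the tuples are my own.

```python
SECTOR_SIZE = 512
MIB = 2 ** 20


def chs_capacity(cyl_bits, head_bits, sector_bits):
    """Bytes addressable with the given field widths (sector numbers start at 1)."""
    cylinders = 2 ** cyl_bits
    heads = 2 ** head_bits
    sectors = 2 ** sector_bits - 1  # sector number 0 is never used
    return cylinders * heads * sectors * SECTOR_SIZE


# (cylinder bits, head bits, sector bits) as quoted in the text.
bios_int13 = (10, 8, 6)
first_ide = (10, 4, 8)
ata2 = (16, 4, 8)

# Each component is limited by the narrowest field among the standards.
common = tuple(min(widths) for widths in zip(bios_int13, first_ide, ata2))

print(common)                                # (10, 4, 6)
print(chs_capacity(*common) // MIB)          # 504 (MiB)
print(chs_capacity(*bios_int13) // MIB)      # 8064 (MiB), i.e. 7.875 GiB
```

The 504 MiB result is the well-known barrier that the head translation trick described next works around.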

Changing the standards was nearly impossible because of backward compatibility. Fortunately, the foundations of LBA were being laid around those years, and a piece of "nonsense" in IBM's standard provided a workaround for this problem: IBM had allocated 8 bits to address disk heads without considering how to fit 128 platters (in other words, 256 IO heads) into a 1 inch high hard disk with a 3.5 inch form factor. Because this was practically impossible, more heads than actually exist in the hard disk were reported to the OS, so that the number of cylinders remained addressable with 10 bits on hard disks larger than 512 MB. In other words, the product of the physical numbers of cylinders and heads was equal to the product of the numbers of cylinders and heads reported to the OS. In the Int 13h code, these numbers were put into formula [1] and converted to LBA, provided that LBA was enabled in the BIOS, of course.**

*: Detailed info on this paragraph:
https://en.wikipedia.org/wiki/Logical_block_addressing#Enhanced_BIOS
https://en.wikipedia.org/wiki/Parallel_ATA

**: Detailed info on this paragraph:
https://en.wikipedia.org/wiki/Logical_block_addressing#LBA-assisted_translation


Example:
Suppose that a 2GB hard disk physically has 8 heads, 8320 cylinders and 63 sectors per cylinder per head. When the OS gets the disk parameters using Int 13h, the returned values will be 128 heads, 520 cylinders and 63 sectors. The product has not changed (8 * 8320 = 128 * 520 = 66560). If the same OS tried to access this disk directly via the IO ports, it would have no way to write a head number above 15 (that is, more than 16 heads) to the IO port.

Now assume that CHS 100, 17, 17 on this disk needs to be accessed. Substituting the values in the formula [1]:

LBA = (100 * 128 + 17) * 63 + (17 - 1) = 807487. This is the sector that will be accessed.
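To see what the translated address means for the physical disk, formula [1] can be inverted. The Python sketch below is an illustration under a simplifying assumption: it pretends the drive maps the LBA back onto its physical 8-head, 63-sector geometry with the plain inverse formula, which real drives with zone bit recording do not necessarily do. The function name lba_to_chs is my own.

```python
def lba_to_chs(lba, n_heads, sectors_per_track):
    """Invert formula [1]: return (cylinder, head, sector) for an LBA."""
    cylinder, rest = divmod(lba, n_heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1  # sector numbers start at 1


# The LBA from the example: logical CHS (100, 17, 17) with 128 reported heads.
lba = (100 * 128 + 17) * 63 + (17 - 1)
print(lba)                       # 807487

# Back to the geometry reported to the OS (128 heads)...
print(lba_to_chs(lba, 128, 63))  # (100, 17, 17)

# ...and to the assumed physical geometry (8 heads).
print(lba_to_chs(lba, 8, 63))    # (1602, 1, 17)
```

Under that assumption the logical address (100, 17, 17) would land on physical cylinder 1602, head 1, sector 17, which fits comfortably within the 8320 physical cylinders.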

In the 1990s, the ATA-2 standard could support hard drives up to 128 GB thanks to the wider cylinder field, but IBM's existing programming interface could only support disks up to 8 GB with LBA conversion. By the way, IBM reserved 32 bits for LBA in the MBR, which I will cover in the upcoming article. In the mid-1990s, IBM and Microsoft extended the Int 13h capabilities with functions such as 42h and 43h. On the other hand, since the LBA field in the MBR is 32 bits wide, the ATA standard also had room to grow. In 2003, with the ATA-6 standard, physical addressing of hard disks became 48 bits wide and at the same time CHS addressing became history.
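The limits of the LBA widths mentioned above follow directly from the number of address bits. A quick Python sketch, assuming 512-byte sectors throughout (the labels are mine):

```python
SECTOR_SIZE = 512
GIB = 2 ** 30


def lba_capacity(address_bits):
    """Bytes addressable with an LBA of the given width."""
    return (2 ** address_bits) * SECTOR_SIZE


for label, bits in (("ATA-2, 28-bit LBA", 28),
                    ("MBR, 32-bit LBA field", 32),
                    ("ATA-6, 48-bit LBA", 48)):
    print(f"{label}: {lba_capacity(bits) // GIB} GiB")
```

This prints 128 GiB for 28 bits, 2048 GiB (2 TiB) for the 32-bit MBR field and 134217728 GiB (128 PiB) for 48 bits.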


How Does BIOS Access the Disk? The Disk Controller
Documentation on the IDE/ATA disk controller ports can be found in the ports.b file in section D of the famous Ralf Brown Interrupt List. Motherboards used to have two EIDE disk controllers. Two disks could be connected to each of these controllers, as master and slave. The controllers' base addresses were 01F0h and 0170h respectively. The ports of the first controller are as follows:


Port | IO | Description
-----+----+----------------------------------------------------
01F0 | RW | Data register
01F1 | R- | Error register++
01F1 | -W | (Write Precompensation Cylinder divided by 4)+
01F2 | RW | Sector count
01F3 | RW | Sector number (CHS mode)
     |    | Address bits 7-0 (LBA mode)
01F4 | RW | Low byte of cylinder number (CHS mode)
     |    | Address bits 15-8 (LBA mode)
01F5 | RW | High byte of cylinder number (CHS mode)
     |    | Address bits 23-16 (LBA mode)
01F6 | RW | Drive and head number (CHS mode)
     |    | Drive and address bits 27-24 (LBA mode)
01F7 | R- | Status register++
01F7 | -W | Command register++


+: This value is unused on newer disks and it can be zero. I will explain this later.
++: Please see the Ralf Brown Interrupt List for the meaning of the bits in these registers.

01F6h is a little more complicated than it looks. Bits 5 and 7 must always be set for backward compatibility (these bits were used with MFM disks whose sectors are not 512 bytes in size). If bit 6 is zero, the access is made in CHS mode; if it is one, in LBA mode. If bit 4 is zero, the operation is performed on the master disk; if it is one, on the slave disk. Bits 3 to 0 are address bits (the head number in CHS mode). The information so far is enough to write the code, but we have not set up an environment to write it in yet. Readers can set up a DOS or FreeDOS VM and try it themselves.
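The bit layout of port 01F6h, and the way a 28-bit LBA is spread over ports 01F3h to 01F6h in the table above, can be sketched in Python. This only computes the byte values; it does not touch any hardware, and the function names drive_head_register and lba28_port_values are hypothetical.

```python
def drive_head_register(lba_mode, slave, low_nibble):
    """Build the byte for port base+6 (01F6h on the first controller)."""
    if not 0 <= low_nibble <= 0x0F:
        raise ValueError("only 4 bits are available for head / LBA bits 27-24")
    value = 0xA0                # bits 7 and 5 are always set
    if lba_mode:
        value |= 0x40           # bit 6: 0 = CHS, 1 = LBA
    if slave:
        value |= 0x10           # bit 4: 0 = master, 1 = slave
    return value | low_nibble   # bits 3-0: head number or LBA bits 27-24


def lba28_port_values(lba, slave=False, base=0x1F0):
    """Map a 28-bit LBA onto ports base+3 .. base+6, following the table."""
    if not 0 <= lba < 2 ** 28:
        raise ValueError("LBA must fit in 28 bits")
    return {
        base + 3: lba & 0xFF,           # address bits 7-0
        base + 4: (lba >> 8) & 0xFF,    # address bits 15-8
        base + 5: (lba >> 16) & 0xFF,   # address bits 23-16
        base + 6: drive_head_register(True, slave, (lba >> 24) & 0x0F),
    }


# CHS mode, master disk, head 0.
print(hex(drive_head_register(False, False, 0)))
# Port values for LBA 807487 (0C523Fh) on the master disk.
print({hex(port): hex(val) for port, val in lba28_port_values(807487).items()})
```

The first call yields A0h (CHS mode, master disk, head 0); the second yields 3Fh, 52h, 0Ch and E0h for ports 01F3h to 01F6h.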

I wrote the offsets at the beginning of each line so that the code is easy to follow, and added comments on the right. The code must be entered in debug.exe using the a (Assemble) command:

0100    mov ax,0001
0103    mov dx,01F2
0106    out dx,al    ; Sector count = 1
0107    inc dx       ; dx = 01F3
0108    out dx,al    ; Sector number = 1
0109    inc dx       ; dx = 01F4
010A    dec ax       ; ax = 0000
010B    out dx,al    ; Cylinder Lo = 0
010C    inc dx       ; dx = 01F5
010D    out dx,al    ; Cylinder Hi = 0
010E    inc dx       ; dx = 01F6
010F    mov al,A0    ; CHS mode, master disk
0111    out dx,al
0112    inc dx       ; dx = 01F7
0113    mov al,20
0115    out dx,al    ; 20h ATA read sector command
0116    in al,dx     ; Read status register
0117    test al,58   ; 0101 1000: Drive ready | Seek complete | Buffer ready
0119    jz 0116      ; Loop while none of these status bits is set
011B    mov dx,01F0
011E    mov bx,0200
0121    in ax,dx     ; Read data (word sized)
0122    mov [bx],ax  ; Copy to the buffer
0124    inc bx
0125    inc bx
0126    cmp bx,0400  ; Stop after 0200h (512) bytes
012A    jnz 0121
012C    int 3        ; Breakpoint


To run the code, I give the g (Go) command, and when I check the memory with the d (Dump) command, I see that the sector was read without problems (same as the previous Int 13h output):
 

This code can only run in the debugger because it ends with a breakpoint. After replacing "int 3" in the last line with "int 20", it can be saved as a file using the following commands:

rbx
0000
rcx
002E
nreadmbr.com
w

Of course, it should be noted that the code does not print anything to the screen. To load the file into debug.exe again, the filename is either given as a command line parameter to debug.exe or entered with the n command and then loaded with the l command. I am leaving the explanation of the LBA code to the next article.
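Going back to the wait loop at offsets 0116 to 0119 in the listing: the mask 58h combines three status bits. The Python sketch below decodes a status byte, assuming the commonly documented ATA bit names (the article itself defers to the Ralf Brown list for them). It also shows a stricter readiness test, BSY clear and DRQ set, which is my suggestion and not what the listing does: the listing leaves the loop as soon as any of the three masked bits is set.

```python
# Assumed ATA status register bit names, most significant bit first.
STATUS_BITS = {
    0x80: "BSY",   # drive is busy
    0x40: "DRDY",  # drive ready
    0x20: "DF",    # device (write) fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request: buffer ready for transfer
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index mark
    0x01: "ERR",   # error, see the error register
}


def decode_status(value):
    """List the names of the bits set in a status byte."""
    return [name for mask, name in STATUS_BITS.items() if value & mask]


def listing_loop_exits(value):
    """Condition used by 'test al,58 / jz': any masked bit set ends the loop."""
    return (value & 0x58) != 0


def data_ready(value):
    """Stricter check: not busy and data requested."""
    return (value & 0x88) == 0x08


print(decode_status(0x58))        # ['DRDY', 'DSC', 'DRQ']
print(listing_loop_exits(0x50))   # True: the loop ends although DRQ is clear
print(data_ready(0x50))           # False
print(data_ready(0x58))           # True
```

In a virtual machine the data is usually available immediately, which may be why the simple loop works in the screenshots; on slower real hardware the stricter test would be the safer choice.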


More Theory
Write Precompensation Cylinder (WPC): At the beginning of the article, I mentioned that on non-ZBR CHS disks the outer sectors are physically larger than the inner ones. The manufacturer specifies a cylinder on these disks, and the disk changes its read/write encoding on the cylinders beyond this WPC. I really don't know whether the BIOS reports this cylinder to the programmer or the programmer has to find it. I never had such an old disk to test.

In very old disks, the heads were driven by a stepper motor. Since these motors are very sensitive to heat, a technology called the "voice coil motor" was adopted instead. The video below gives an idea of how it works:

These motors moved the heads across the platters in proportion to the applied current, and when the current was cut off, springs automatically pulled the heads to an empty area.

Landing Zone (LZ): Before a computer with a stepper motor disk was turned off, the disk heads had to be moved to a special cylinder that does not contain data, called the LZ; in other words, they had to be "parked". If a hard disk was moved without parking the heads, they could slide over the platters and scratch the disk. Function 19h of Int 13h does this parking. Manual parking is obsolete today, and so is the LZ.
 
Voice coil motors coped well with heat but were less precise at positioning the heads accurately, so manufacturers developed a solution for that. For example, they placed positioning marks on one side of a three-platter (6 heads) disk to increase position accuracy. This side was not used to store data. The top head read the position marks, determined its exact position from them and corrected it using feedback if necessary. Thus, interestingly, hard disks with an odd number of heads appeared.

Source: https://www.brainbell.com/tutors/A+/Hardware/_Actuator_Arms.htm

Now I am done with the hardware part and in the next article I will write about MBR.