Saturday, December 27, 2025

Programming VGA: Smooth Scrolling in Text Mode #3: Modifying Base Address


Hi there. This article is an unexpected follow-up to the previous smooth scroll articles. In the second article of the series, I introduced the start address registers and mentioned that the scroll effect could also be made using these registers and VGA pages, without copying memory blocks. I watched the following video while preparing the previous article, and it inspired me to briefly demonstrate this approach as well.



First of all, the owner of this channel does excellent work in retro programming. I'd recommend that anyone interested in retro programming follow this channel. I had mentioned that my former approach is CPU intensive due to memory transfers but, in turn, uses only as much VGA memory as the visible screen area. In the video above, an ASCII art text scrolls up and down at a speed tied to a sine function, and it uses the start address registers to scroll the text.

I've implemented this approach in a less visual way. First, I'm not capable of creating such visually appealing work, and second, I had ready-made code for this task; all I had to do was modify it just a bit. Some time ago, I had written a simple reader for a diskmag. I have always liked text justified on both sides. This is clearly noticeable in my blog's page layout, I think. The diskmag had been published as text files, limited to 80 characters per line. I had developed a justification algorithm to read these texts across the whole screen. The original reader had just a few extra features, like header and footer lines as well as some escape character codes.

This time, I put my code on GitHub Gist and embedded it at the end of this article (let's see; if it doesn't look good, next time I'll just add a link like before). The justify_to_80() function is used to stretch a row to eighty characters. To do this, the number of characters and spaces in a line are counted. The number of additional spaces needed is then calculated from these values. If the spaces needed exceed the number of existing spaces between words in the line (e.g., the line consists of 3-4 long words), then the each variable holds the number of spaces to be added next to each existing space between words (line 33). On the other hand, if the line consists of many short words (i.e. many already-existing spaces) and just a few spaces are needed to complete it to eighty, the extra variable holds the number of spaces that need to be added. Of course, both variables can be non-zero at the same time, but statistically speaking, each is usually zero and extra is non-zero most of the time.

Inserting each space characters is easy, as they are inserted at every existing space anyway (line 42). Distributing the extra spaces evenly over a line is a bit more complicated. The weight variable accumulates the fraction of a space to be added per existing space in the line. weight is a double, because in most cases the number of missing characters cannot be divided by the number of spaces without a remainder. The value of weight increases by (extra + 1) / spaces for each existing space character (line 46). A space character is added whenever this value passes an integer, like when it goes from 1.9 to 2.1. Let's consider the following line:

They were hidden from the road by a shallow ridge, but there was only sparse

It has 76 characters, but strlen() returns 77, because it counts the line ending as well (CR LF is read as a single '\n'). That's also why this number is subtracted from 81 on line 32: missing = 4 and the space count is 14. Therefore each = 0 and extra = 4. In the for loop on line 36, characters are processed one by one. When it hits a space (line 40), the inner for loop has no effect (it's skipped) because each = 0 in this example. Since sp1 = 0, the weight variable is initially 0. At the second space character (between the words "...were hidden..."), weight = 1 * (4 + 1) / 14 ≈ 0.36. Because of sp1, weight increases linearly. So, the following values are obtained for each space in this example:

0.00   0.36   0.71   1.07   1.43   1.79   2.14   2.50   2.86   3.21   3.57   3.93   4.29   4.64

The integer crossings here occur at the fourth (0.71 -> 1.07), seventh (1.79 -> 2.14), tenth (2.86 -> 3.21) and thirteenth (3.93 -> 4.29) spaces.
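The arithmetic of this example can be reproduced with a short C sketch. It is not the code from the gist: the function names are hypothetical, and the formulas each = missing / spaces and extra = missing % spaces are my assumption, chosen because they yield the numbers quoted above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the blanks in a line. */
static int count_spaces(const char *line)
{
    int n = 0;
    for (; *line; line++)
        if (*line == ' ')
            n++;
    return n;
}

/* Splits the missing width into a per-gap part ("each") and a remainder
   ("extra"). 81 is used because strlen() also counts the trailing '\n'. */
static void split_missing(const char *line, int *each, int *extra)
{
    int spaces = count_spaces(line);
    int missing = 81 - (int)strlen(line);
    assert(spaces > 0 && missing >= 0);
    *each = missing / spaces;
    *extra = missing % spaces;
}

/* Stores the 1-based indices of the gaps at which weight passes an
   integer and returns how many such gaps there are. */
static int extra_positions(int spaces, int extra, int *out)
{
    int k, n = 0, prev = 0;
    for (k = 0; k < spaces; k++) {
        double weight = (double)(k * (extra + 1)) / spaces;
        if ((int)weight > prev) {
            out[n++] = k + 1;
            prev = (int)weight;
        }
    }
    return n;
}
```

For the sample line, split_missing() gives each = 0 and extra = 4, and extra_positions(14, 4, pos) reports the gaps 4, 7, 10 and 13.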

Actually, judging by both the explanation above and the number of code lines, this function is more complex than the smooth scroll algorithm itself.

The vga_set_base_addr() function multiplies lineP by eighty (line 62), then writes the high byte of the result to the Start Address High register and its low byte to the Start Address Low register. lineP is a counter that determines which line will be displayed at the top of the screen.
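As a minimal sketch of that calculation (not the function from the gist; start_address_bytes is a hypothetical name), the row number can be split into the two register bytes like this. The port writes are only shown in a comment, since outportb() exists in Turbo C but not in a hosted compiler.

```c
#include <assert.h>

#define COLUMNS 80u

/* Splits the character offset of a text row into the two CRTC bytes.
   With Turbo C they would then be written roughly like this:
     outportb(0x3D4, 0x0C); outportb(0x3D5, high);
     outportb(0x3D4, 0x0D); outportb(0x3D5, low);            */
static void start_address_bytes(unsigned row,
                                unsigned char *high, unsigned char *low)
{
    unsigned offset = row * COLUMNS;
    assert(offset <= 0xFFFFu);      /* the register pair is 16 bits wide */
    *high = (unsigned char)(offset >> 8);
    *low = (unsigned char)(offset & 0xFFu);
}
```

Row 25, for example, gives the offset 2000 = 0x07D0, i.e. 0x07 for Start Address High and 0xD0 for Start Address Low.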

I explained waitbl() in detail in the previous article.

vgaprint() copies the given string to the video memory. Since the lines returned by justify_to_80() don't contain '\n', while those passed in directly (the last line of a paragraph) do have '\n' at the end, it was necessary to implement a workaround like the one on line 86. In the for loop (line 89), exactly eighty characters are printed. Even if a line has fewer than eighty characters (last line of a paragraph), space characters overwrite any existing characters in the rest of the row. I actually added making linecount a local variable to the TODO list (on line 14 of the code). If the value of this variable were increased in main(), it could easily be made local. However, a look at main() below makes it clear that these parts were written a bit hastily.

main() is, first of all, relatively big. The first while block (line 121) and the second one, which handles scrolling (line 137), could have been written as two separate functions. Second, input sanitization should perhaps have been more thorough, but I'm also aware that every diskmag file has lines shorter than 80 characters. The most important constraint is actually that an input file cannot be longer than 409 lines. VGA text mode video memory is 32 KB, between B800:0000 and B800:7FFF (color). A maximum of 409 lines of 80 characters can fit here. The input file size cannot be trusted in this case, because we'll be adding spaces to the file.

A line is read from the file (line 119) even before the first while loop, and it's assumed that this line is not the end of a paragraph (EOP). The following lines are read inside the while loop, and each is checked for being a blank line or the end of file. The aim here is not to justify the line at an EOP: if there is a blank line after the current line, the current line marks the EOP. If the line is not an EOP, it is justified to 80 characters and execution proceeds to the next line. In short, the next line is also checked at each step.

The ESC key leaves the second while block. Keyboard input is checked in the switch/case structure. If the up arrow is pressed, the top line number variable is decreased by 1 (line 142); similarly, the down arrow key increases it by 1. Of course, while doing this, the number of lines in the file is also checked, so that the text always stays wholly on the screen and doesn't scroll off the top or bottom. The scrolling effects here are exactly the same as in the previous code, and even simpler, as no memory is copied. The only difference is that the scroll speed of the arrow keys depends on the SCROLLSTEPFINE parameter, and the scroll speed of the Pg Up and Pg Dn keys depends on the SCROLLSTEPCOARSE parameter. Decreasing these slows down scrolling; increasing them reduces the effect and speeds up the scrolling.

The logic behind the scrolling with Pg Up and Pg Dn keys is exactly the same as scrolling with the arrow keys. However, the effect is intensified by doing 24 small consecutive line scrolls up or down in a for loop.

I've embedded a video of this below, but due to the recording, the scrolling doesn't look right in the clip when scrolling with Pg Up or Pg Dn.

And the source code is given below:

Sunday, December 7, 2025

How to Drive an 8x8 LED Matrix Display with MAX7219 using Arduino


Hi there. In this article, I'll take a look at the MAX7219 integrated circuit and explain how to drive an 8x8 LED matrix with this IC. Since there is an Arduino library for this chip, I'll use Arduino for my examples. However, the SPI (Serial Peripheral Interface) protocol is independent of the microcontroller, and my examples can easily be ported to other microcontrollers as well. The MAX7219 is a fairly simple IC, and it is also really easy to use without any library. I'll give two versions of an example code, one with the library and another without.


MAX7219 and MAX7221 ICs

MAX7219 and MAX7221 ICs provide an interface to 7-segment displays or 8x8 LED matrices via SPI. They are pin and instruction compatible and share the same datasheet and the same functions, but the MAX7221 supports other serial protocols besides SPI and operates in a more robust way. Therefore, it is also more expensive.

These ICs support up to eight 7 segment displays with decimal points, bar graph and 8x8 LED matrix displays, and have builtin decoding options for them. Up to 64 LEDs can be driven through 8 common cathodes. As I bought an assembled kit, I won't dive deeper into IC pin connections.

These ICs can also be cascaded up to eight times, which means that by purchasing eight of these modules and connecting the DOUT pin of each to the DIN pin of the next module in the chain, I can drive a larger number of LED displays. The second 5-pin connector on the far side of the kit is for cascading. Different kinds of displays can also be cascaded.

Quad 8x8 LED matrix displays are also available as pre-assembled kits. I also bought a quad kit to understand and experiment with cascading. I'll explain it in detail later in this article.

By looking at the PCB from back, you can see how simply and easily the cascading is done:

The module has five pins. The Vcc and GND pins require no explanation. The DIN (Data In) pin is where the data is written to. The CS (Chip Select) pin is active low. When this pin is set to active, the latches are enabled and the data coming through DIN is received by the internal shift register. In the meantime, CLK carries the clock signal synchronously with the data coming from DIN. The data is processed on the rising edge of CS.


Registers of MAX7219

At the beginning of this article, I mentioned that the MAX7219 is a fairly simple IC. All the functionality is handled by a total of 14 registers and eight of them are simple data registers that control LEDs. For convenience, I've copied the table of registers from the datasheet and pasted it to the right side.

All of these addresses (or commands from another perspective) are 8-bit, and each of them is followed by another 8-bit data (or operand). In other words, each data packet arriving at DIN must be 16-bit. To send data, CS' (nChipSelect) signal must be set to logic zero first, and the data must be sent to DIN synchronously with CLK. Setting CS' back to logic 1 terminates data transmission. Communication is described in detail in the "Serial Addressing Modes" section of the datasheet.

I'll come to the first register No-Op later, because No-Op doesn't actually perform a No-Op.

The registers Digit 0 to Digit 7 hold the LED states, i.e. one row of an 8x8 LED matrix display or one digit of a 7-segment display each. For example, by sending the value 0x0F to the Digit 0 register, i.e. by pushing the data 0x01 0x0F onto the data bus, I turn the rightmost four LEDs of the first row on and the leftmost four off. Digit 1 controls the second LED row in the same way, Digit 2 the third row, and so on.
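The 16-bit packet format can be illustrated with a small C sketch. The helper names are hypothetical, and the sketch only builds the packet and lists its bits in the order they would be placed on DIN (most significant bit first); toggling the actual pins is left out.

```c
#include <assert.h>

#define REG_DIGIT0 0x01

/* Combines a register address and its data byte into one 16-bit packet. */
static unsigned short max7219_packet(unsigned char reg, unsigned char data)
{
    assert(reg <= 0x0F);            /* register addresses are 4 bits wide */
    return (unsigned short)(((unsigned)reg << 8) | data);
}

/* Unpacks the packet into single bits, most significant bit first. */
static void packet_bits(unsigned short packet, unsigned char bits[16])
{
    int i;
    for (i = 0; i < 16; i++)
        bits[i] = (unsigned char)((packet >> (15 - i)) & 1u);
}
```

max7219_packet(REG_DIGIT0, 0x0F) yields 0x010F, the "0x01 0x0F" sequence from the example above.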

Decode Mode (0x09) controls the internal 7-segment decoder unit of the IC. If it holds 0xFF, the IC only looks at the lower four bits of the Digit registers and decodes them for a 7-segment display. If it holds 0x00, no decoding is performed. This is the appropriate mode for 8x8 LED matrix displays.

Intensity (0x0A) register is used to adjust the LED intensity with PWM.

The Scan Limit (0x0B) register sets how many digits are scanned. If not all digits are used (e.g. only four 7-segment digits are connected), the scan can be limited to those digits, which raises the scan rate. As all eight rows are used in an LED matrix display, this should usually be 7.

If the shutdown (0x0C) register is zero, the IC shuts itself down. The scan oscillator of the IC is stopped and all LEDs turn off. The supply current drops to 150 µA. It is in the mA range during normal operation, and approx. 300 mA when all LEDs are on. The IC boots up in shutdown mode. Therefore, the first step of initialization is to write 0x1 to this register.

When 0x1 is written to the display test (0x0F) register, all LEDs turn on. This allows you to check if any of them are faulty. During the test, the Digit registers are not touched; only the output is overridden. When 0x0 is written here, the IC returns to its normal operating mode.

The No-Op operation (0x0) is used for cascading ICs. It has no effect on the display that receives it. Each IC behaves like a 16-bit shift register: what is clocked into DIN comes out of DOUT again as more bits follow, and every IC in the chain latches the 16 bits it currently holds when CS goes high. For example, in a quad 8x8 LED matrix display, if you need to do something just with the fourth display, the microcontroller sends the actual operation first, followed by three No-Op commands, all while CS is held low. The No-Ops push the operation through the first three ICs, so it ends up in the fourth one, while each of the first three displays is left holding a No-Op. Below is a diagram illustrating this process.
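To see which device ends up with which word, here is a small word-level simulation in C. It is only a model under my own assumptions (every IC is treated as a single 16-bit stage, device 0 is the one wired to the microcontroller, and the function names are hypothetical). In this model, the word sent first travels farthest down the chain.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEVICES 8
#define NO_OP 0x0000u

/* Fills frame[] in transmission order for a write to one device only. */
static void build_frame(unsigned short frame[], int devices,
                        int target, unsigned short word)
{
    int i;
    assert(devices > 0 && devices <= MAX_DEVICES);
    assert(target >= 0 && target < devices);
    for (i = 0; i < devices; i++)
        frame[i] = NO_OP;
    frame[devices - 1 - target] = word;   /* farthest device goes first */
}

/* Each new word pushes the earlier ones one device further away from the
   microcontroller; latched[] is what every device holds when CS rises. */
static void shift_through(const unsigned short frame[], int devices,
                          unsigned short latched[])
{
    int sent, d;
    memset(latched, 0, devices * sizeof latched[0]);
    for (sent = 0; sent < devices; sent++) {
        for (d = devices - 1; d > 0; d--)
            latched[d] = latched[d - 1];
        latched[0] = frame[sent];
    }
}
```

With four devices and target 3 (the fourth display), the frame starts with the actual word followed by three No-Ops, and after shifting only latched[3] holds the word.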



The block diagram in the datasheet doesn't show a register named "No-Op", but since it is covered under the "No-Op Register" section on page ten, I can't tell if this is an operation or a register. By the way, as can be seen above, the No-Op command is also 16 bits long, so it has to be padded with 8 bits of data, even though this data is completely meaningless.


Arduino LedControl Library

After explaining the registers in such detail, using a library may actually seem pointless, but the LedControl library simplifies some tasks quite a lot. For example, while Digit registers provide row-by-row access to LEDs, the library has some other functions like setColumn(), setLed() and setChar() in addition to setRow().

To install the library, go to Tools -> Manage Libraries in the Arduino IDE, search for "LedControl" and click Add. Then, include the LedControl.h header file in your code and create a LedControl object. While creating it, you need to pass the Arduino pin numbers that the module's pins are connected to, and the number of cascaded devices, to the constructor. E.g.

#include "LedControl.h"
LedControl lc = LedControl(12, 11, 10, 4);  // data pin, clock pin, chip select pin, number of devices

In all my examples, DIN is connected to Arduino's 12th pin, CLK to the 11th and CS to the 10th pin, like this:

The Matrix display image in Fritzing has six pins. The top pin is not connected, which is the second Vcc, so it doesn't really matter.


Code Examples

My first example is a simple character scroller. Here, the characters are scrolled module by module. There won't be any bit operations. I uploaded the code to my GitHub account. As I mentioned before, the MAX7219 starts up in shutdown mode. In the for loop on line 61, inside the setup() routine, each display is first taken out of shutdown mode one by one, and the LED intensity is set to the lowest level. The "dizi" array holds the character sequence to be displayed. It is actually a string variable in the broader sense. At the end of the array, the first three characters are repeated for an endless scrolling effect on the display.

The "table" array contains the bitmaps of the characters. I downloaded a font package from here, and used the BIOS.F08 font. This file contains the classic 8x8 BIOS font. I opened it in GIMP and exported it as C source code or C header. Since the entire character table is 2 KB (256 * 8), it is not possible to load the whole table into the Arduino UNO's 2 KB RAM, yet it's not necessary anyway. I only imported the characters that are going to be displayed. In the main for loop, the bitmap values are sent to the MAX7219 row by row using the setRow() function. Although the character sequence is 20 characters long, when the pointer value is 16, the 16th, 17th, 18th and 19th characters appear on the display, so the pointer must not exceed 20 - 4.


Second example (counter) is a counter as its name suggests, which is even simpler than the first example. Arduino increments the values in the "a" array in full speed, starting from the zero-indexed element. In the for loop on line 25, the array elements are checked for overflow at byte boundary. If an overflow occurs, this array element is reset, and the carry is transferred to the next element (line 29), and this binary counter is visualized with LEDs on line 32.
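The carry handling described above can be sketched in plain C as follows. This is not the sketch from my repository, just a minimal illustration under the assumption that the counter is stored as eight bytes with the least significant byte first; the function name is hypothetical.

```c
#include <assert.h>

#define ROWS 8

/* Adds one to a 64-bit counter kept as eight bytes, least significant
   byte first. A byte that wraps around to zero passes a carry on to the
   next element. */
static void increment(unsigned char a[ROWS])
{
    int i;
    assert(a != 0);
    for (i = 0; i < ROWS; i++) {
        a[i] = (unsigned char)(a[i] + 1u);
        if (a[i] != 0)
            break;              /* no overflow, the carry stops here */
    }
}
```

Each element can then be sent to one LED row, so that the display shows the counter in binary.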

Counting from zero to 256 takes less than a second. The LED corresponding to the tenth bit flashes at approximately one second intervals. If we ignore the ten least significant bits, it would take roughly 2^(64-10) = 2^54 seconds for the remaining 54 LEDs to light up completely, which is about 5.709 * 10^8 (571 million) years.
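This estimate is easy to check with a few lines of C. The helper names are hypothetical, and a year is assumed to have 365.25 days.

```c
#include <assert.h>

/* Returns 2^n as a double; valid for 0 <= n < 64. */
static double two_to_the(int n)
{
    assert(n >= 0 && n < 64);
    return (double)(1ULL << n);
}

/* Converts seconds to years, assuming 365.25 days per year. */
static double seconds_to_years(double seconds)
{
    return seconds / (365.25 * 24.0 * 3600.0);
}
```

seconds_to_years(two_to_the(64 - 10)) comes out at roughly 5.7 * 10^8 years.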


My third example is the same as the second one, but I wrote it without the library. In this code, the pins are first set to OUTPUT. Then, the IC is taken out of shutdown mode (line 32), the scan limit is set to 7 (line 37), and decoding is disabled on line 43, as we have an 8x8 LED display. That's the initialization sequence. On line 51, the LED intensity is set to the lowest level and all LEDs are cleared (line 60). The logic in the loop() procedure is the same as in the above example, with one difference: each array element is written directly to the corresponding digit register, row by row (line 77).

As you can see, the IC can also be easily programmed without the library. My goal here was not to perform a speed test with or without the library. I'd probably get similar results anyway. But if my focus were speed, I would be using Assembly.


In the fourth and final example, I created a smooth scrolling text using bit operations. To do this, I took two long German words. German is really perfect for this task. I converted these words into char arrays (lines 16 or 19). I then created a bitmap table like I did in the first example, but with more characters this time. As usual, the IC is initialized in the setup() routine, and the bitmap images of characters are copied to the "kayanyazi" array.

In the loop() function, the first four characters of the "kayanyazi" array are sent to the display. Since the text is scrolled to the left, I save the most significant bit (MSB) of each LED row in the "carry_old" variable, which means that carry_old contains the first column of the character array (or the first LED column). Then all characters are shifted by one bit (line 108), but the first column of each character is copied to carry_new (line 106) before the shift, so that the bit carried out of each byte is not lost; it is inserted as the least significant bit (LSB) of the neighbouring character on its left.
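The idea can be sketched in C for a single LED row. This is a simplified illustration and not the code from my repository: it assumes one byte per character for the row, uses a hypothetical function name, and wraps the leftmost pixel around to the right end.

```c
#include <assert.h>

/* Shifts one LED row, stored as one byte per character, by one pixel to
   the left. The MSB of every byte moves into the LSB of the byte on its
   left; the MSB of the first byte wraps around to the last byte. */
static void shift_row_left(unsigned char row[], int chars)
{
    int i;
    unsigned char wrapped;
    assert(chars > 0);
    wrapped = (unsigned char)(row[0] >> 7);
    for (i = 0; i < chars; i++) {
        /* row[i + 1] has not been shifted yet, so its MSB is still valid */
        unsigned char carry_in = (i + 1 < chars)
            ? (unsigned char)(row[i + 1] >> 7)
            : wrapped;
        row[i] = (unsigned char)((row[i] << 1) | carry_in);
    }
}
```

Applying this to all eight rows of the text once per frame moves the whole text one LED column to the left.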

Sunday, November 9, 2025

Programming VGA: Smooth Scrolling in Text Mode #2


Hi there. In this article, I will continue with the VGA topic, and I have a fantastic example from 1994 that I want to show you. I wanted to write about this code in detail, so I didn't want to cram it into the previous post.

I uploaded the code I mentioned to my GitHub account. I must have downloaded it in the late 90s; as the comment header indicates, it's Basic code written in 1994 (Dear William Yu, if you're reading this, feel free to reach out to me). There are two points I want to highlight in this code. The first one is the code snippet on line 12, which accesses the ninth CRT controller register:

OUT &H3D4, 9
OUT &H3D5, 1

This register consists of the following bits [1]:

Maximum Scan Line Register (Index 09h)
Bit     7     6      5       4-0
Field   SD    LC9    SVB9    Maximum Scan Line

and the "Maximum Scan Line" field, which is modified by the code, repeats each pixel row one more time than the value in this field in graphics modes. If its value is 1, each pixel appears twice as tall, as if the pixel just below it were also set. If the value of this field were 9, the pixels would be 10 times taller. Since the number of scan lines on the screen is constant (480 in our example, which uses the 640x480 mode 12h on line 11), doubling the pixel height halves the visible vertical resolution. In this example, 640x240 pixels are visible on the screen. If we had stretched the pixels by 10, we would have obtained a resolution of 640x48 pixels on the visible screen. Obviously, the video memory size does not change; the pixels below the middle of the screen just won't be visible. They will only be visible again when 0 is written to this field. In standard text mode, this field contains the value 15, i.e. one less than the height of a standard character in pixels, which I mentioned in the previous article. If greater values are written to this field, the lines get spaced out. For much smaller values, the characters get jumbled together and the screen becomes unreadable.

As the calculations in this code are based on 640x240 pixels (e.g. lines 30 and 35), this line cannot simply be commented out. The pixels are doubled in size, so the text on the screen is also twice as large (see right). In the loop between lines 13 and 20, stars are printed on the screen, and the planet is drawn between lines 22 and 27. In the section up to line 35, a triangle (spacecraft) is drawn, and in the section up to line 43, this triangle is moved on the screen as an image block with the GET and PUT commands. The remaining graphic effects are not that important. The most important part is the EarthQuake SUB procedure (lines 87 to 94). Here, some values are written to the eighth register sequentially.

Delay = 5500       ' Increase this or decrease for earthquake delay

FOR X = 1 TO Delay
  OUT &H3D4, 8: OUT &H3D5, X
NEXT X

That's the deal with the eighth register [1]:

Preset Row Scan Register (Index 08h)
Bit     7           6-5             4-0
Field   (unused)    Byte Panning    Preset Row Scan

In my observation, the "Preset Row Scan" field has either no effect in graphics mode, or DosBOX cannot emulate it properly. Normally, this field shifts the origin of the text screen with pixel precision, and it works flawlessly in DosBOX. In other words, if I write 8 to this field, the upper half of the first character line will disappear and another half character line appears at the bottom of the screen. Basically, the screen is shifted up by the number of pixels written in this field. This is the exact register, which I also used for smooth scrolling. The "Byte Panning" field shifts the screen one character wide to the left. Therefore, the screen can be scrolled (but not smoothly) horizontally by 1, 2 or 3 characters wide, depending on the value of this field. For mode 12h, this is 8 pixels per character (640 pixels / 80 characters).

Other than this register, there is another register pair I'd like to mention. These are the Start Address Low and High registers.

Start Address Low Register (Index 0Dh)
Bit     7-0
Field   Start Address Low

Start Address High Register (Index 0Ch)
Bit     7-0
Field   Start Address High

These don't have any bit fields. Normally, the top left corner of the screen is the origin, i.e. the (0, 0) point of the screen, and its memory location is 0x0 in VGA regardless of text or graphics mode. But sometimes, it may be more convenient to take the center of the screen as the origin. In Mode 13h (320 x 200), taking the center of the screen as the origin also means drawing the first image pixel at the point (160, 100). The linear address of this pixel is 320 * 100 + 160 = 32160 = 7DA0h.

OUT &H3D4, &HD: OUT &H3D5, &HA0
OUT &H3D4, &HC: OUT &H3D5, &H7D

With this code snippet, the origin is moved to the center of the screen. From this point on, the value written to the zeroth offset of the video memory will appear in the center of the screen. This is roughly what the WINDOW command in QBasic does. Of course, QBasic also converts the negative coordinates given to PSET, LINE etc. by itself, whereas in lower level programming languages this task is left to the programmer.

If sequentially increasing values are written to this register, it will be seen as a left scrolling effect on the screen. Obviously, it is not the characters that are actually being scrolled, but the origin. Similarly, if values are written to the register in increments equal to the width of the screen (the number of characters per row for text modes, or the number of pixels on the X-axis for graphic modes), it will be seen as an upwards scrolling effect on the screen. The characters aren't moved from one memory block to another. The processor is just busy writing some values through the ports. This is, in theory, the most efficient way to scroll the entire screen in VGA. However, to copy the characters that disappear from the screen to the part that is about to appear on the screen, moving memory blocks is inevitable.
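The start address and the Preset Row Scan field complement each other: one moves the screen by whole text rows, the other by single scan lines. As a rough sketch of how a vertical scroll position in pixels could be split between the two, assuming 80 columns and character cells of 16 scan lines (the structure and function names are hypothetical):

```c
#include <assert.h>

#define COLUMNS 80u
#define CHAR_HEIGHT 16u

typedef struct {
    unsigned start_address;    /* character offset for Start Address High/Low */
    unsigned preset_row_scan;  /* 0..15, scan lines cut off the top row */
} ScrollPos;

/* Splits a vertical scroll position given in pixels into a coarse part
   (whole text rows) and a fine part (scan lines within a row). */
static ScrollPos scroll_position(unsigned pixel_y)
{
    ScrollPos p;
    p.start_address = (pixel_y / CHAR_HEIGHT) * COLUMNS;
    p.preset_row_scan = pixel_y % CHAR_HEIGHT;
    assert(p.start_address <= 0xFFFFu);
    return p;
}
```

A scroll position of 24 pixels, for example, corresponds to a start address of 80 (one full row) and a Preset Row Scan value of 8 (half a row).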


VGA Text Mode Structure

VGA text mode is pretty simple. Here, I'll explain the standard 80x25 text mode. Even though the logic of the 40x25 text mode is quite similar, some addresses need to be recalculated for that mode. Monochrome mode video memory starts at segment 0xB000 and ends at 0xB7FF. Color video memory starts at segment 0xB800 and ends at 0xBFFF, spanning 32 KB each. Each character is one word, i.e. 16 bits. The low byte of this word contains the ASCII code of the character, and the high byte holds the color codes for the character foreground and background [3]. The four lower bits of the color byte are the color of the character. There are 8 standard VGA colors; 0: black, 1: blue, 2: green, 3: cyan, 4: red, 5: magenta, 6: brown/yellow, 7: gray. Adding 8 to these values yields the high intensity versions of these colors. Bits 4, 5 and 6 hold the background color, and the most significant bit (bit 7) makes the character blink. Here is an example of this. The characters 'u' and 'g' normally blink.
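The layout of such a character word can be written down as a small C helper. This is only an illustrative sketch, and vga_cell is a hypothetical name:

```c
#include <assert.h>

/* Builds one text mode cell: ASCII code in the low byte, attribute in the
   high byte (bit 7 blink, bits 6-4 background, bits 3-0 foreground). */
static unsigned short vga_cell(unsigned char ch, unsigned fg,
                               unsigned bg, int blink)
{
    unsigned attr;
    assert(fg <= 15u && bg <= 7u);
    attr = (blink ? 0x80u : 0u) | (bg << 4) | fg;
    return (unsigned short)((attr << 8) | ch);
}
```

A gray 'A' on black is vga_cell('A', 7, 0, 0) = 0x0741, and a blinking high intensity yellow 'u' on blue is vga_cell('u', 14, 1, 1) = 0x9E75.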

VGA text mode consists of 8 pages. I already mentioned that the visible area of the screen consists of 80 x 25 = 2000 characters, i.e. 2000 words. In this case, the visible screen is 4000 bytes (0x0FA0) long. Since the video memory is 32 KB, it can be divided into 8 pages. The first page starts at address 0xB800:0, the next one at 0xB800:0xFA0, the next one at 0xB800:0x1F40 and so on. The number of pages for each screen mode can be seen in this table, and you can switch between pages using Int 10h/AH=05h.

With that much background info, I can now move on to my own smooth scrolling code.


SCROLL.C: Smooth Scrolling in VGA Text Mode

First of all, why didn't I use the start address register for scrolling? The shortest answer is: I could have. If I copied the zeroth (actual) page to the first and second pages and switched to the first page, I would have a copy of the visible screen above and below. Then, I could implement scrolling by increasing or decreasing the start address register by 80, i.e. by one text row, since the register counts character cells rather than bytes. I'll leave this for another post for now.

My code, OTOH, interferes with the other pages as little as possible (just a single line). This is the difference compared to the method described above. I uploaded the code to my GitHub account as usual.

I wrote it in Turbo C v3.0. It compiles and runs without problems. As I mentioned in the previous article, everything is done in DosBOX. Since Turbo C does not have true and false built in, I define them as shown on lines 7 and 8. I also define two pointers on lines 16 and 17. The first one is the VGA video memory pointer for text mode [4]. The second one is a pointer that holds a local copy of the screen memory, but even though I named it DoubleBuff, it doesn't really do double buffering [2], IMO. I'll get back to double buffering in a future article. This is a short pointer, and 80x25 words of memory are allocated for it on line 23. Why words? There are two reasons for this. First, processing data word by word is faster than byte by byte: characters are moved with a single instruction along with their color codes. The second reason is code readability.

The scan codes of the up arrow and down arrow keys are fetched, and based on them either ScrollUp() or ScrollDown() is assigned to the 'fp' function pointer. Both functions scroll the screen by just a single line. Calling these functions 25 times in a for loop scrolls a full screen.


ScrollUp

While the screen is scrolled up by increasing the value of the preset row scan (PRS) field, the first row of the next page, which is normally invisible, becomes partially visible. Therefore, on line 60 of the code, I copy the top row of the visible screen to the top row of the next page before it becomes visible. Then, I copy the first row of the screen to the bottom row of the buffer, the second row to the first row of the buffer, and each subsequent row to the row above its own position in the DoubleBuff array. In other words, line N+1 on the screen goes to line N in the buffer.
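The buffer preparation described here boils down to a rotation by one row. A minimal sketch in portable C, using ordinary arrays instead of the far pointer to video memory (the function name is hypothetical, and memcpy stands in for whatever copy loop the original uses):

```c
#include <assert.h>
#include <string.h>

#define COLS 80
#define ROWS 25

/* Builds the scrolled copy of the screen: screen row N+1 becomes buffer
   row N, and the old top row ends up in the bottom row of the buffer. */
static void build_scrolled_copy(const unsigned short *screen,
                                unsigned short *buffer)
{
    assert(screen != buffer);
    /* rows 1..24 of the screen become rows 0..23 of the buffer */
    memcpy(buffer, screen + COLS, (ROWS - 1) * COLS * sizeof *screen);
    /* the old top row wraps around to the bottom */
    memcpy(buffer + (ROWS - 1) * COLS, screen, COLS * sizeof *screen);
}
```

Once the PRS field has run through its 16 values, copying this buffer to the visible screen and resetting PRS leaves the picture exactly one text row further up.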

Next, I save the initial value of the PRS field and increment it one by one from 0 to 15 (line 75). This makes the screen look like it is scrolling up pixel by pixel. At the same time, the row I copied from the top on line 60, which was initially invisible, begins to appear from the bottom of the screen. On line 80, I copy DoubleBuff to the visible screen using inline assembly. I used assembly here for speed, because the C code I wrote for the same job runs slower and causes flickering on the screen.

Finally, I write the old value of the PRS back. Here, I would actually need to write zero rather than its old value.


ScrollDown

In ScrollDown(), I used a different approach than in ScrollUp(). Since the PRS only scrolls the screen upward, for the downward scrolling effect I first copy the bottom row into a local array (line 108), as this bottom row is going to disappear from the screen during the scroll down. After moving the characters in the video memory, I write the line I copied to the top row and finally write the highest possible value, which is 15, to the PRS field of the register (line 113).

I moved the video memory using assembly code again for speed, like in ScrollUp(). SI has 4000 and DI has 4160. This means that SI points to the first character of the first page (the invisible bottom page), and DI points to the first character on the second row of the first page. When CX, as a counter, has a value of 2000 (the 4000 in CX is divided by two with shr on line 136), the last character doesn't get copied, as copying starts from and includes the first character of the first page. Decreasing SI and DI by two (one word) requires 4 bytes of code (lines 137 .. 140, dec instructions). Instead of that, I increase CX and copy one extra character, and increasing CX is done with a single one-byte inc instruction on line 141, instead of four. I could have decreased the value on line 132, too, but in that case (if I'm not mistaken) I'd have had to increase CX later, because I had decremented the counter. Finally, I set the direction flag and copy backwards, so that the pointer values decrease. Since I'm scrolling downwards, if I had copied in the forward direction, I would have overwritten data that I still need for the next row. After copying (rep movsw, line 148) is done, I reset the direction flag to its original state.
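Why the copy has to run backwards is easier to see in plain C. The following sketch is not a translation of the assembly block; it is a simplified equivalent with a hypothetical function name that moves the rows one row down inside the same buffer:

```c
#include <assert.h>

#define COLS 80

/* Moves rows 0..rows-2 one row down inside the same buffer. Copying
   starts at the last word and walks backwards, like std + rep movsw, so
   that no source word is overwritten before it has been read. */
static void shift_rows_down(unsigned short *cells, int rows)
{
    long i;
    assert(rows >= 1);
    for (i = (long)(rows - 1) * COLS - 1; i >= 0; i--)
        cells[i + COLS] = cells[i];
}
```

A forward loop over the same indices would copy row 0 into row 1 first and then keep duplicating that same row all the way down the screen.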

Immediately after this, since the top row is now free after copying, I write the row I copied to TempLine at the beginning of the function back to it (line 156. Why didn't I do this with assembly?). As I've already written the largest value to the PRS field, only one pixel row of this text row is visible yet (and it is largely black as well), so it doesn't look too bad. On line 160, I complete the scrolling effect by slowly decreasing the value of the PRS field.


waitlinefull and waitlinehalf

CRT monitors create images by scanning the screen. The electron guns normally send electrons to the center of the screen. To display an image on the screen, this electron beam is deflected by the vertical and horizontal deflection coils (or yokes, to be more specific) in color monitors. Scanning starts from the upper left corner of the screen and moves to the upper right first. During that, the voltage on the vertical coil stays constant, and a sawtooth wave is applied to the horizontal coil. This draws the first row of pixels on the screen. There is actually no direct correspondence between a pixel and a point on the phosphor coating of the screen, but let's still treat it as a pixel for simplicity. After the first line is done, the voltage on the vertical coil is increased, and this process is repeated for each row until scanning reaches the bottom right corner of the screen. So basically, both coils get a sawtooth wave with a different frequency. Of course, during this scan, the electron guns don't send electrons continuously; they turn on and off depending on the picture. If they remained on continuously, a blank screen would appear.

Deflection Yokes (Source: Wikipedia)

After each full scan, the beam returns from the bottom of the screen to the top. This return is called "vertical retrace" (VR), and if the video memory is changed at the wrong moment of this cycle, the image appears to flicker. To check the scanning status, the Input Status #1 register of the VGA at port 0x3DA comes to the rescue. Bit 3 of this register (mask 0x08) is set during VR. The programmer can (and should) check this bit before writing anything to the screen. This check is made in the waitlinefull() and waitlinehalf() functions.

In waitlinefull(), if there is no VR, the execution waits in the loop on line 173, because even if VR isn't active at that moment, it can start at any time before the video memory operations are complete, and the image may still flicker. If VR is already in progress, this line has no effect; the execution continues with the next line and waits in that loop until VR is complete.

In waitlinehalf(), the execution waits in the loop only if VR is already in progress. It doesn't wait for the next VR cycle. Waiting until the next VR cycle starts, as waitlinefull() does, wastes a lot of CPU cycles on fast computers, so this step is skipped in waitlinehalf(). It waits for a shorter period, but it may sometimes fail to prevent flickering.
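
The difference between the two functions can be modelled in a few lines of Python. This is a sketch of the logic described above, not the C code from the article: `read_status` and `fake_port` are hypothetical stand-ins for reading port 0x3DA, and the returned poll count exists only to make the behavior visible.

```python
VR_BIT = 0x08  # bit 3 of the value read from port 0x3DA


def wait_line_full(read_status):
    """Wait for the next retrace to start, then wait until it ends."""
    polls = 0
    while not (read_status() & VR_BIT):  # first loop: no retrace yet
        polls += 1
    while read_status() & VR_BIT:        # second loop: retrace in progress
        polls += 1
    return polls


def wait_line_half(read_status):
    """Wait only if a retrace is already in progress."""
    polls = 0
    while read_status() & VR_BIT:
        polls += 1
    return polls


def fake_port(values):
    """Hypothetical stand-in for the port read: replays a list of status bytes."""
    it = iter(values)
    return lambda: next(it)


print(wait_line_full(fake_port([0x00, 0x00, 0x08, 0x08, 0x00])))  # 3
print(wait_line_half(fake_port([0x00])))                          # 0
```

With the same kind of input, wait_line_half() returns immediately when no retrace is active, while wait_line_full() always sits through one complete retrace.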

I put both functions into my code, and after completing the skeleton of the effect, I tried them out on various lines until I achieved a smooth effect. Since waitlinehalf() takes quite a short time, I used waitlinefull() everywhere. As I mentioned in the previous post, DOSBox is just a VGA emulation, so these procedures would need to be readjusted for real hardware. I actually chose waitlinefull() in the code just because it provides a stable wait time that doesn't vary much from machine to machine.


So, I have only scratched the surface of the VGA topic here. The smooth scrolling effect looks quite good, but this is just a small example of what can be done with VGA. By playing around with these registers, it's possible to create a wide variety of effects. In future posts, if I find time, I would like to explain a few more effects. And finally, here is a video of the effect:



[1]: http://www.osdever.net/FreeVGA/vga/crtcreg.htm#09
[2]: http://wiki.osdev.org/Double_Buffering
[3]: https://en.wikipedia.org/wiki/VGA_text_mode#Text_buffer
[4]: https://stackoverflow.com/questions/47588486/cannot-write-to-screen-memory-in-c

Thursday, August 21, 2025

Programming VGA: Smooth Scrolling in Text Mode #1


Hi there. The title of this article probably sounds like a chapter title from an 80s computer book set in a monospace font, because this time I'll indeed be discussing VGA, a technology from the 80s, with a fairly low-level approach.

80s Tech (illustrative image) [1]

More precisely, I explain in this article how to create a smooth scrolling effect in VGA text mode, and how to access video memory and VGA registers directly without using interrupts. In this first part, I give a brief introduction to VGA and explain some basic registers. I also put some nostalgia in between. As a result, this post turned out to be longer than I expected, so I had to divide it into two parts. The essence of smooth scrolling will be covered in the next article, as I want to discuss the double buffering technique along with smooth scrolling. This is also a relatively long topic.

Disclaimer: Writing improper values to VGA registers may cause permanent damage to the hardware. The information provided here might not be accurate, as it has not been tested on a real CRT monitor, and therefore the risk of using it is solely yours. If there is any chance of damage, the author of this article warns you and accepts no responsibility for anything that may happen.

Original VGA Graphics Card (wikipedia)
Well, let me explain the warning above a little bit. Forcing a VGA card to operate at a frequency that a CRT doesn't support can indeed damage your monitor [2]. I haven't seen this happen, but I know such a thing exists. On the other hand, I threw my last CRT monitor away maybe ten years ago (is there anybody out there still using a CRT?). My graphics card isn't a real VGA card, just VGA-compatible, so even the registers I've accessed aren't real. I developed and ran my code in DOSBox and also optimized it in DOSBox. So the timings would likely be off on a real CRT, and I'd need to readjust the waitretrace routines (waitretrace will be covered in the next article). TBH, I don't even know how this code would run on a real 80386.

Introduction

We usually access the VGA card using the int 10h BIOS interface. Everything from selecting the screen mode to moving the cursor around or adjusting its size can be done with this interrupt. What this interrupt really does is access the VGA registers in a "correct way". The VGA BIOS can override the int 10h routines when needed.

I also do not think that int 10h causes a significant slowdown (except for a few routines like putpixel), and one advantage of it is that it lets us avoid complexity in the code. VGA has a lot of registers [3]*. Understanding the function of some of these registers requires some CRT knowledge. On the other hand, I mentioned that I used DOSBox for this article. Although DOSBox can correctly emulate many VGA features, it still cannot display some (non-standard) effects properly. It is known that some DOS games don't work well in DOSBox (configuration errors aside). Yet DOSBox still deserves credit; it is more compatible with DOS than VMware or VirtualBox (and yes, I compared apples to oranges here: the first one is dedicated to DOS and just DOS, while the others are generic virtualization solutions).

* The source mentioned here refers to more than 300 registers, but not even 100 of them are documented there. They probably count non-standard registers in this number, which are added by different vendors. There are approximately 60 standard VGA registers, which is still a big number. 


VGA Registers and Accessing the Card

As I mentioned above, there are a lot of VGA registers. In this link, these are grouped into six categories. These categories are formed according to the HW port numbers used for access. In this article, I mainly worked with CRT controller (CRTC) registers. 

The registers are accessed via roughly six pairs of hardware ports. In other words, not every register has been assigned a port of its own. A list of ports is given in [3]. Generally speaking, you write the number of the register you want to access to an index port (for example 0x3D4), and then read the register's value from the following port (0x3D5), or write to it. There is an exception for 0x3C0, but I won't go into it here. I'll provide plenty of examples on these registers in the following sections.

I didn't want to mention all the registers here and turn this article into a reference manual. So, I'll focus only on the interesting parts. For example, CRTC is accessed via 0x3D4 and 0x3D5 ports. 0x3D4 is the address register and 0x3D5 is the data register [5].


Cursor Start Register (Index 0Ah)
Bit    | 7 | 6 | 5  | 4 3 2 1 0
Field  |   |   | CD | Cursor Scan Line Start


Now, let's take a look at the code for disabling the cursor in [6]:

void disable_cursor()
{
    outb(0x3D4, 0x0A);
    outb(0x3D5, 0x20);
}

First, we write 0xA to 0x3D4 and tell the card that we are going to access register 0xA. Then we write the value 0x20 to this register via port 0x3D5. This value sets bit 5 of the register, which is the Cursor Disable (CD) bit [5]. Pretty easy.
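
To make the index/data mechanism tangible without real hardware, here is a small Python model. The class FakeCrtc and its methods are made up for this illustration; only the port numbers, the register index and the value 0x20 come from the code above.

```python
class FakeCrtc:
    """Hypothetical model of the CRTC index/data port pair (0x3D4 / 0x3D5)."""

    INDEX_PORT = 0x3D4
    DATA_PORT = 0x3D5

    def __init__(self):
        self.index = 0
        self.regs = [0] * 256

    def outb(self, port, value):
        if port == self.INDEX_PORT:
            self.index = value & 0xFF             # select a register
        elif port == self.DATA_PORT:
            self.regs[self.index] = value & 0xFF  # write the selected register
        else:
            raise ValueError("port not modelled")

    def inb(self, port):
        if port == self.DATA_PORT:
            return self.regs[self.index]          # read the selected register
        if port == self.INDEX_PORT:
            return self.index
        raise ValueError("port not modelled")


def disable_cursor(crtc):
    crtc.outb(0x3D4, 0x0A)  # select the Cursor Start Register
    crtc.outb(0x3D5, 0x20)  # set bit 5 (Cursor Disable)


crtc = FakeCrtc()
disable_cursor(crtc)
print(hex(crtc.regs[0x0A]))              # 0x20
print(bool(crtc.regs[0x0A] & (1 << 5)))  # True
```

The point of the model is that the data port has no fixed meaning on its own: what a write to 0x3D5 changes depends on the index written to 0x3D4 just before.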

The Cursor Scan Line Start bits hold the pixel row at which the cursor starts. In the standard 80x25 character mode (mode 3), each character and the cursor itself are actually 8 by 16 pixel (px) images. This is why custom fonts can be loaded in DOS by changing these images. In the screenshot below, which I took from a font editor for DOS, an example character can be seen up close, so that you can count. The font table is located in the VGA BIOS (check Int 10h / 1130h, if you're interested) and custom fonts are written to this area temporarily. After a reboot, the standard font comes back. As fonts are a complicated topic, I don't want to go into further detail.

Back to the topic: In the DOS edit environment, the cursor starts at the 14th pixel row and ends at the 15th, but when the Insert key is pressed, the cursor gets bigger. It then starts at the 0th pixel row and ends at the 15th. The first element behind this effect is the low five bits of the Cursor Start Register, and the second is register 0xB, the Cursor End Register, or more precisely its low five bits:


Cursor End Register (Index 0Bh)
Bit    | 7 | 6 5         | 4 3 2 1 0
Field  |   | Cursor Skew | Cursor Scan Line End


This register holds the bottom pixel row of the cursor. But if a character is 16 px high, why 5 bits? VGA actually supports characters up to 32 px high in text mode [6]. Accordingly, the VGA font table is 8 KB in size: number of chars (256) * height (32 px) * width (8 px) / bits per byte (8). Therefore, 5 bits are allocated in each register for the cursor, but bit 4 has no meaning in the standard text modes, as none of them uses characters higher than 16 pixels. The Cursor Skew field is a leftover from the EGA and is normally left at zero on VGA.
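
The arithmetic above, written out as a quick check. The constant names are mine; the numbers are the ones from the paragraph.

```python
CHARS = 256          # characters in the font table
MAX_HEIGHT_PX = 32   # tallest character the table has room for
WIDTH_PX = 8         # one bit per pixel, eight pixels per row
BITS_PER_BYTE = 8

font_table_bytes = CHARS * MAX_HEIGHT_PX * WIDTH_PX // BITS_PER_BYTE
print(font_table_bytes)  # 8192, i.e. 8 KB

SCAN_LINE_MASK = 0x1F    # the low five bits of the cursor start/end registers
max_scan_line = SCAN_LINE_MASK
print(max_scan_line)     # 31, the last row of a 32 px high character
```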

Another easy-to-understand pair of registers are the Cursor Location High (0xE) and Cursor Location Low (0xF) registers. They hold the linear position of the cursor. When this value is divided by the number of character columns (in our case 80), the quotient is the y-position and the remainder is the x-position of the cursor. Or from the opposite direction: D = Y * 80 + X. Since these registers are byte sized, the high byte of D is written to 0xE and the low byte to 0xF.

Cursor Location High Register (Index 0Eh)
Bit    | 7 6 5 4 3 2 1 0
Field  | Cursor Location High


Cursor Location Low Register (Index 0Fh)
Bit    | 7 6 5 4 3 2 1 0
Field  | Cursor Location Low
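
The conversion between (X, Y) coordinates and the two register bytes can be sketched as follows. The function names are hypothetical and nothing is written to any port here; this is only the D = Y * 80 + X calculation and its inverse.

```python
COLUMNS = 80  # character columns in mode 3


def to_registers(x, y, columns=COLUMNS):
    """Return (high, low): the bytes for registers 0xE and 0xF."""
    d = y * columns + x
    return (d >> 8) & 0xFF, d & 0xFF


def from_registers(high, low, columns=COLUMNS):
    """Return (x, y) for the given register bytes."""
    d = (high << 8) | low
    y, x = divmod(d, columns)  # quotient is the row, remainder is the column
    return x, y


print(to_registers(10, 10))   # (3, 42)
print(from_registers(3, 42))  # (10, 10)
```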


Back to the 80s: QBasic

Now I am going to do a little demo with these two pairs of registers, and interestingly, I'm going to use QBasic for that. Like many others, I started programming with Basic: just a little C64 Basic, then GW-Basic (big thanks to TRT (Turkish State Television) at that time, especially the computer programming courses of the Open Education Faculty on Channel 4 (TRT4)) and finally QBasic. And I claim that everyone who was born in the 80s and had a computer in the 90s has seen the IDE below at least once. I used to write my scripts in QB when .bat files were insufficient for a specific task. Later, when QBasic's notorious slowness became a visible problem for me, it pushed me to learn C and Assembly. (There was also a short Pascal period somewhere in between.) BTW, QBasic was an interpreter, and being unable to compile .exe files was another huge drawback for me. Even though I had the chance to work with Quick Basic v4.5 at almost the same time I started learning C, the horizons that C opened up were completely different. Additionally, finding the Quick Basic 4.5 IDE at that time was quite hard, at least for me.

That said, neither QBasic nor Quick Basic (QB for short) was an incapable programming language. With QB, I could do anything that I could do with C (speed aside). Looking at my old code, I wrote a QB - Int 33h interface for the mouse, and an int 13h interface for low-level disk operations. There is also incredible QB code written by others. But aside from its slowness, the programming logic in QB was also a bit "different", and IMHO it was diverging from where the programming paradigm was heading. I also tried my luck with Visual Basic, but I realized more or less around that time that Windows, in general, wasn't for me. In the early 2000s, I tried Win32 Assembly for visual programming, but it felt too cumbersome to me.

And even though I've been blogging for more than ten years and have written many code snippets in various programming languages in these blog posts, I realized that I haven't given a single QB example. However, I believe QB is well suited to such simple code snippets, because you neither need as many lines as in Assembly to do a simple thing, nor do you have to worry about things like including headers, type casts, or paying attention to buffers, pointers etc. Anyway, enough retrospective. I'll do the following example of this post in QB:


DECLARE SUB ENABLECURSOR (CURSTART%, CUREND%)
DECLARE SUB DISABLECURSOR ()
DECLARE SUB MOVECURSOR (CURSORX%, CURSORY%)

FOR X% = 0 TO 15
  FOR Y% = X% TO 15
    CALL ENABLECURSOR(X%, Y%)
    SLEEP 1
    CALL DISABLECURSOR
    SLEEP 1
  NEXT Y%
NEXT X%

CALL ENABLECURSOR(0, 15)

FOR Y% = 0 TO 10
  FOR X% = 0 TO 10
    CALL MOVECURSOR(X%, Y%)
    SLEEP 1
  NEXT X%
NEXT Y%

SUB DISABLECURSOR
    OUT &H3D4, &HA
    OUT &H3D5, &H20
END SUB

SUB ENABLECURSOR (CURSTART%, CUREND%)
    OUT &H3D4, &HA
    CS1% = INP(&H3D5)
    OUT &H3D5, (CS1% AND &HC0) OR CURSTART%

    OUT &H3D4, &HB
    CE1% = INP(&H3D5)
    OUT &H3D5, (CE1% AND &HE0) OR CUREND%
END SUB

SUB MOVECURSOR (CURSORX%, CURSORY%)
    POSITION% = CURSORY% * 80 + CURSORX%

    OUT &H3D4, &HF
    OUT &H3D5, POSITION% AND 255
    OUT &H3D4, &HE
    OUT &H3D5, POSITION% \ 256
END SUB


The code is a bit long, but it basically contains the same code as in [4]. In the first part, all combinations for the cursor are set in a for loop. The parameters are sent to the relevant VGA registers in the ENABLECURSOR subroutine. The delay (SLEEP) can be skipped by holding down the CTRL key.

After the first for loop, I set the cursor to its biggest size, so that it could be easily seen. Then I moved it around the 10 x 10 section of the screen using MOVECURSOR subroutine. Assuming that the screen is 80 columns wide, I calculated the linear position of the cursor from (X, Y) coordinates in MOVECURSOR subroutine.

In the following article, I will continue discussing VGA registers, primarily those required for smooth scrolling, and provide additional examples in QB. However, since the scroll operation requires high speed, I wrote it in Assembly + C and used the waitretrace function, which I mentioned at the beginning of this article.



[1]: DEC PDP8 Family User's Guide TSS/8 (1970). Link
[2]: https://retrocomputing.stackexc....damage-my-vga-card-by-programming-it-in-assembly-throu
[3]: http://wiki.osdev.org/VGA_Hardware
[4]: http://wiki.osdev.org/Text_Mode_Cursor
[5]: http://www.osdever.net/FreeVGA/vga/crtcreg.htm
[6]: https://en.wikipedia.org/wiki/VGA_text_mode#Fonts

Thursday, July 10, 2025

Add Captions or Labels to Images Using Python

Hi there. In this blog post, I'll be addressing a problem that arose from a practical need: adding labels or any kind of text to all images in a directory in bulk, for example a copyright note or address information. I think this could be solved by creating a stencil in GIMP. But what if the text to be added is not constant? My actual problem was adding sequence numbers to images: consecutive numbers starting from one, in the corner of each photo. In this case, a stencil wouldn't be a solution. And even if it can be done by editing each and every image with GIMP, that won't be very practical for several hundred images. The most practical solution is to write a simple script for this. By the way, I could have done this sequencing with the filenames, but I didn't want to touch them, because I also want to preserve the timestamps of the files. And let's assume I want to use these photos on a web page, where their filenames won't be visible at first glance. Finally, with such a script, you can generate time stamps similar to those of the cameras from the 90s.

When I talk about scripting, bash is the first thing that comes to (my) mind; however, it's unfortunately not the best tool for this job. AFAIK, there is no image library usable with bash. Of course, nothing is impossible. It can be done, but just as you don't use a hammer to knock down a wall when you can have a sledgehammer, choosing the appropriate tools for the job is the first step of every solution. Python lovers may get angry about this, but python, the second best scripting language after bash, has a library for exactly this purpose, called the Python Imaging Library (PIL). I used a fork of PIL called Pillow in my script. This library is easily installed with the pip install pillow command.

I again uploaded the code to my github account to keep the article short. It's a small script, with just a few tricks in it. First of all, like in any other python script, there are import statements at the very beginning. The Image module of the library contains image-related functions, including the one I use to rotate the image. ImageDraw contains simple 2D drawing functions, which I need for writing the text onto the image, and finally ImageFont is for fonts and other text effects.


Exif Data and Orientation

The first challenge of this project is that the images cannot be viewed on the computer as easily as we see it on the phone. Let's take the image below, that I took with my cell phone, as an example:


Above, it appears vertically in the browser. In Gwenview, it also appears in vertical format. But when I open it with the following code

>>> from PIL import Image
>>> img = Image.open("image.jpg")
>>> img.show()

it appears horizontally.

And when I open it with GIMP, a strange dialog box says that the image contains "Exif orientation data" and asks if I want to rotate it. But why?

When you hold a cell phone horizontally* and take a photo, the phone doesn't actually rotate the pixel data. When I open such photos, taken with a cell phone or a digital camera, in python, those that were not taken in the default position are shown on the screen exactly as the sensor recorded them. The camera saves the orientation info (thanks to its gravity sensor) inside the picture, and viewers rotate the picture accordingly while displaying it. If the orientation was not saved, we would have to turn the phone to the exact orientation in which the photo was originally taken, each time.

*: The default position of some cameras is horizontal, while for others it is vertical.

From the above, it's clear that an image doesn't consist of pixel data only. There is a field called Exif, where the metadata of an image is stored, and today most common image formats as well as cameras support Exif. There is a table on Wikipedia about the data stored in Exif. Typical fields are the manufacturer and the model of the camera, image orientation, the time and date the photo was taken, image resolution etc. For example, saving the timestamp of the photo allows you to find out when the photo was actually taken and to sort photos by date, even if the file is renamed afterwards. On the other hand, some phones store the coordinates from the phone's GPS there and thus reveal where the photo was taken, and some platforms can then automatically tag the location of the photo when it is shared. These are things that will send shivers down the spine of those who are sensitive about the privacy of personal data. According to legend, Ukrainians asked Russian soldiers online for their photos, and used the coordinate data of these photos to launch attacks.

In Linux, a tool called exiftool can be used to review this information (exiftool <filename>), to manipulate it, or to wipe it completely (exiftool -all= <filename>). For the example image above, there is a difference of around 77 KB between the original image and the image with all Exif data wiped.

After a long (and unnecessary) explanation about Exif, let's go back to the image orientation. In Pillow, there is an Image.getexif() function [1] to read and parse Exif data. When I print the output of this function to the screen (lines 22-24, commented out), I can see the orientation info of the .jpg files in the directory. The values 2, 7, 4 and 5 mentioned in [1] did not occur in my images. Neither did 8, so I didn't implement 8 in my code either, but it's easy to add. For the value 1, I used "else" in the if structure (line 31), so for 8 and all other values, the image is not rotated.

For rotation, Pillow has the Image.rotate() function [2]. Using that, I rotated the image by 180 degrees for Orientation=3, and by 270 degrees for Orientation=6. I have to open a parenthesis here for line 27: If there is no orientation field in Exif, or if there is no Exif information in the image at all, the code will throw an error. For this reason, best practice is to check the return value of the getexif() function first and rotate only if there is no error. Since all my images have Exif, I did not have any problem and kicked the can down the road. That doesn't mean that you won't have any issues either.
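
A more defensive version of this step could look like the sketch below. It is not the code from my script: the function name upright() is made up, and I include the value 8 here although my script doesn't handle it. The angles are the ones usually given for the three plain-rotation orientation values; Image.rotate() turns counter-clockwise.

```python
ORIENTATION_TAG = 0x0112  # standard Exif tag number for "Orientation"

# Counter-clockwise angle that brings the picture upright
# for each of the three plain-rotation orientation values.
ANGLE_FOR_ORIENTATION = {3: 180, 6: 270, 8: 90}


def upright(img):
    """Return img rotated according to its Exif orientation, if it has one."""
    exif = img.getexif()                         # empty when there is no Exif
    orientation = exif.get(ORIENTATION_TAG, 1)   # 1 means "already upright"
    angle = ANGLE_FOR_ORIENTATION.get(orientation, 0)
    if angle == 0:
        return img                               # nothing to do
    return img.rotate(angle, expand=True)        # expand keeps the whole picture
```

It would be used as img = upright(Image.open("image.jpg")), with Image imported from PIL. Because a missing field falls back to 1, images without Exif pass through unchanged instead of raising an error.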

So far, I've corrected the image orientation. To solve the main problem, I used [3]. It demonstrates how to add text in a simple way; I only adjusted the parameters for my system. In line 35, an ImageDraw object is created to add the text. The next line creates a font object with the ImageFont.truetype() function. As the font used in [3] isn't on my machine (and as I don't check the return value of the function for errors), I chose one of my fonts under /usr/share/fonts/ . The second parameter is the font size. I found its value by trial and error. My photos are relatively large (8 MP), so a size of 128 produced a caption that was just barely visible. In the next line, I add the sequence number "sirano" at the given coordinate of the image (25, 25 - top left), in red and with the font I created on the previous line. At this step, the date and time of the photo could also be added to the image automatically, either from its filename or from its Exif data. An example of an enumerated photo is below:


Finally, line 40 can be uncommented to show the image on the screen and/or line 43 can be uncommented to save the tagged image with the "_enum" suffix. While working on this script, I ran it in a directory with several hundred images, and I wanted neither to display nor to save that many images on each run, so I commented these lines out.
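
The captioning and naming steps can be summarized in a short sketch. Again, this is not the script itself: the function names, the default values and the exact position of the "_enum" suffix in the filename are assumptions made for illustration. The Pillow import sits inside the function, so the naming helper works even without Pillow installed.

```python
from pathlib import Path


def enum_name(path, suffix="_enum"):
    """Insert the suffix before the file extension (assumed placement)."""
    p = Path(path)
    return str(p.with_name(p.stem + suffix + p.suffix))


def add_caption(img, text, font_path, size=128, pos=(25, 25)):
    """Draw red text onto a Pillow image in place and return the image."""
    from PIL import ImageDraw, ImageFont   # needs: pip install pillow

    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype(font_path, size)
    except OSError:                         # font file missing or unreadable
        font = ImageFont.load_default()
    draw.text(pos, text, font=font, fill=(255, 0, 0))
    return img


print(enum_name("IMG_0001.jpg"))  # IMG_0001_enum.jpg
```

Unlike my script, this version falls back to Pillow's built-in default font when the font file cannot be opened. That font is small, so on large photos the caption will hardly be readable, but the script won't stop halfway through a directory.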


Note: Alternatively, it's possible to do the same thing in OpenCV with the cv2.putText() function, but I'm keeping this for another post.


[1]: https://jdhao.github.io/2019/07/31/image_rotation_exif_info/
[2]: https://note.nkmk.me/en/python-pillow-rotate/
[3]: https://www.geeksforgeeks.org/python/adding-text-on-image-using-python-pil/ 

Sunday, January 5, 2025

ETag Calculation on Amazon S3 Objects


Hi there. In this article, I'm going to discuss how the hash of files uploaded to S3, called the ETag, is calculated. It is based on a simple MD5 hash, calculated for everything uploaded to an S3 bucket. We know that files in S3 are not really "files" as we understand them. S3 is called "Object Storage", so in the correct terminology the files are stored as objects.

Above a certain file size, uploads with aws s3 cp or aws s3 sync are automatically split into equal sized chunks for (possibly) easier storage. Such objects are called multipart objects. So how big is this certain size? Although there is no general measure, my files are currently stored in chunks of 8M and 16M. The sources I used for this article say that files up to 5G are not fragmented [1][2], but my observation is that this is no longer true, and this value is also not very important for our problem.

If a file is smaller than this so-called multipart threshold, it is stored as a single part and the ETag of the object is equal to its MD5 hash. It's that simple. If the file is larger than this threshold, it is stored as a multipart object and things get a bit complicated. You can easily tell whether an object is multipart or not by looking at its ETag. A normal MD5 hash consists only of hexadecimal digits, so a hyphen ( - ) does not belong in an MD5 hash. If a file in S3 has a hyphen in its ETag, then it is a multipart object, and the number of parts of this file is given after the hyphen. I will give a concrete example later in the article.

ETag calculation on multipart objects works like this: Each part is hashed separately, the resulting hashes are concatenated and hashed again. This hash is the part of the ETag before the hyphen. The number of fragments is simply added after the hyphen [3].
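
In Python, the calculation could be sketched like this. The function names are mine, and the sketch works on data in memory rather than on a file. One detail that is easy to get wrong, as far as I understand from [3]: the hashes that get concatenated are the raw 16-byte digests, not their hexadecimal text form.

```python
import hashlib


def single_part_etag(data):
    """ETag of an object stored as a single part: its plain MD5 hash."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data, part_size):
    """ETag of an object stored in parts of part_size bytes."""
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()   # raw digest of each part
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return combined + "-" + str(len(digests))


data = b"x" * 2500
print(multipart_etag(data, 1000).split("-")[1])  # 3
```

For real backup files, the parts would of course be read from disk one after another instead of slicing a bytes object.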

I regularly back up my disks with Clonezilla. Backups get written to an external disk and then copied to S3. I usually keep the most recent copy on my external disk and the last three copies on S3. For backwards (FAT32) compatibility, I split backup files into 4G chunks (even though I don't back up to FAT32 media). The need for ETag comparison arose, because I wanted to verify my copies on S3.

At this point, I assume that the aws CLI tool is installed and configured. The settings are made in the .aws/config file, but I won't go into its details, to avoid lengthening this article. Let's take the small file example first:

$ aws s3api head-object --bucket mybucket --key image_backup/2023-10-15-10-img/Info-lshw.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "2023-10-15T18:28:31+00:00",
    "ContentLength": 40960,
    "ETag": "\"fe78f69cb9d41a23ba23b4783e542a7b\"",
    "ContentType": "text/plain",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}

As I mentioned before, this is not a multipart object. So the MD5 hash, i.e. the ETag, can be simply found. Below is an example of a large file:

$ aws s3api head-object --bucket mybucket --key image_backup/2024-12-01-13-img/sda5.ntfs-ptcl-img.xz.ac
{
    "AcceptRanges": "bytes",
    "LastModified": "2024-12-03T17:00:58+00:00",
    "ContentLength": 4096008192,
    "ETag": "\"360f5e8babf8cd28673eaafd32eb405f-489\"",
    "ContentType": "application/vnd.nokia.n-gage.ac+xml",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}

This file is 4096 MB in size and, as you can see from its ETag, it consists of 489 parts. The main thing here is to find the size of the parts. ContentLength divided by 489 is actually very close to 8M. From this, it's safe to assume that the file is divided into 8M chunks, but it would be better to find the exact value in order to use it in a script. To do this, I'll add the --part-number parameter to the command and check a single part. Since files are split into fixed size chunks, only the size of the last part is different. Note that the ETag value returned for each part is the same. In other words, --part-number will not give the MD5 hash of each individual part.

$ aws s3api head-object --bucket mybucket --key image_backup/2023-10-15-10-img/sda5.ntfs-ptcl-img.gz.aac --part-number 1
{
    "AcceptRanges": "bytes",
    "LastModified": "2023-10-15T18:28:31+00:00",
    "ContentLength": 16777216,
    "ETag": "\"aba379cb0d00f21f53da5136fc5b0366-299\"",
    "ContentType": "audio/aac",
    "ServerSideEncryption": "AES256",
    "Metadata": {},
    "PartsCount": 299
}

$ aws s3api head-object --bucket mybucket --key image_backup/2023-10-15-10-img/sda5.ntfs-ptcl-img.gz.aac --part-number 299
{
    "AcceptRanges": "bytes",
    "LastModified": "2023-10-15T18:28:31+00:00",
    "ContentLength": 401408,
    "ETag": "\"aba379cb0d00f21f53da5136fc5b0366-299\"",
    "ContentType": "audio/aac",
    "ServerSideEncryption": "AES256",
    "Metadata": {},
    "PartsCount": 299
}
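
The part count can also be cross-checked with a bit of arithmetic. A small sketch (the function name is made up), using the ContentLength of the 489-part file shown earlier:

```python
def part_layout(content_length, part_size):
    """Return (number of parts, size of the last part) for fixed-size parts."""
    full, rest = divmod(content_length, part_size)
    count = full + (1 if rest else 0)
    last = rest if rest else part_size
    return count, last


MIB = 1024 * 1024

print(part_layout(4096008192, 8 * MIB))   # (489, 2367488)
print(part_layout(4096008192, 16 * MIB))  # (245, 2367488)
```

Only the 8 MiB candidate reproduces the 489 from the ETag, so that file was uploaded in 8 MiB parts.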

According to the official AWS documentation (as of December 2024) [4], the default chunk size is 8 MB, yet as seen above, in October 2023 a file was uploaded with 16 MB chunks. So it makes more sense to get this value from the ContentLength field instead of assuming it to be a constant. It seems that the folks at Amazon change this default when they get bored. By the way, the aws command produces JSON output. When working with bash scripts, it is more elegant to parse the output with jq than with grep:

$ aws s3api head-object --bucket mybucket --key image_backup/2023-10-15-10-img/sda5.ntfs-ptcl-img.gz.aac --part-number 1 | jq -r '.ETag'
"aba379cb0d00f21f53da5136fc5b0366-299"

$ aws s3api head-object --bucket mybucket --key image_backup/2023-10-15-10-img/sda5.ntfs-ptcl-img.gz.aac --part-number 1 | jq -r '.ContentLength'
16777216
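
For completeness, the same fields can be pulled out in Python with the json module. This is just a sketch with a made-up function name; the sample string is a shortened version of the output above. Note that the ETag value itself contains double quotes, which is why they also survive jq -r.

```python
import json


def parse_etag(head_object_json):
    """Return (hash part, number of parts or None) from head-object output."""
    info = json.loads(head_object_json)
    etag = info["ETag"].strip('"')          # the value carries its own quotes
    md5_part, _, parts = etag.partition("-")
    return md5_part, (int(parts) if parts else None)


sample = r'{"ContentLength": 16777216, "ETag": "\"aba379cb0d00f21f53da5136fc5b0366-299\""}'
print(parse_etag(sample))  # ('aba379cb0d00f21f53da5136fc5b0366', 299)
```

A second value of None means there is no hyphen in the ETag, i.e. the object is not multipart.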

I wrote a script to compare all files in the backup directory one by one. It's too long to paste here, so it's available via the repo link. The script simply asks for the bucket name and the name of the directory to which the backups were copied. I keep my backups in subdirectories named in the format <YYYY-MM-DD-HH-img>, under a directory called image_backup. This part (line 12) can be changed when needed. If a file is a single part file, I hash it directly (line 26). If it's multipart, the file is split with dd (line 36) and the individual hashes of the parts are written to a temporary file. When all parts are processed, the resulting file is hashed again and the temp file is deleted (lines 41-42). In the rest of the script, the hashes are compared with bash string operations; if they are the same, OK is printed, and if not, FAIL is printed.


[1]: https://stackoverflow.com/questions/45421156
[2]: https://stackoverflow.com/questions/6591047
[3]: https://stackoverflow.com/questions/12186993
[4]: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#multipart-chunksize