Thursday, December 27, 2018

I2C with Arduino: Three Mini Examples


Hi there. I noticed that I have not written anything about Arduino except one post in May 2015, where it is mentioned in a single sentence. Partly for this reason, but mainly to have some documentation for myself, I decided to write about Arduino. This is a kind of interim post, because I could not yet start the article I have in mind. In this post, I will give three examples of using the I2C bus with Arduino.

Arduino has a convenient interface for the I2C bus. I previously used I2C with the PC serial port and with PIC series microcontrollers, so I was surprised by how easy it is with Arduino. The I2C bus uses two pins for communication. It is quite similar to the old Akbil (short for akıllı bilet, i.e. smart ticket, a touch memory (TOM) device formerly used in Istanbul). Two of the ICs I used in the examples are manufactured by Dallas Semiconductor (later bought by Maxim), which also manufactured the TOMs used as Akbil.


Arduino UNO uses pins A4 and A5 for I2C. Their functions are SDA (Serial Data) and SCL (Serial Clock), respectively. Communication on I2C takes place between two kinds of devices called master and slave. The master generates the clock signal and accesses the slaves by their addresses. A slave reads and/or writes data on the bus once it sees its own address on the bus. The clock signal, SCL, synchronizes the communication, and the data flows on the SDA pin. To keep the post short, I will not dig deeper into the details of the protocol (such as the start and stop conditions) and will start with the examples.

Sources:
https://www.arduino.cc/en/Reference/Board
https://www.arduino.cc/en/Reference/Wire
https://en.wikipedia.org/wiki/I%C2%B2C


External EEPROM with AT24C32 IC
Overture: ATmega328P (i.e. Arduino UNO) has an internal EEPROM of 1 KB. I learned this only after I first met Arduino. I suppose few people use this internal EEPROM because of its limited number of rewrite cycles. At least I have not encountered anyone who uses it. It is quite easy to access this internal storage using the EEPROM library (source); however, I prefer external storage.

I used Atmel's AT24C32 IC. This is a 32 Kbit EEPROM organized as 4096 x 8 bits, i.e. 4096 bytes. Here are the datasheet from the official website and the same document in my Google Drive. There are three address pins: A0, A1 and A2. As I wrote above, every slave has an I2C address. This address is 7 bits long, and it is 1010XXXb for this chip. The last three bits of the address are left to the designer. If all three pins are grounded, the address becomes 1010000b; if they are all connected to Vcc, it becomes 1010111b. Thus, up to eight chips can be used on the same bus.

AT24C32 EEPROM
It is obvious that I2C addresses have to be unique. In theory, a 7-bit address allows up to 128 devices on the same bus (a few addresses are reserved by the I2C specification), as long as their addresses differ. Chips with address pins can be used more than once on the same bus.
The WP pin is used for write protection. I grounded it because I am not going to use it. The SCL and SDA pins are connected to pins A5 and A4 of the Arduino, respectively, and each line has a 4.7K pull-up resistor. The schematic is in the next figure.
Fritzing is "almost officially" the schematics editor of the Arduino project. Fritzing schematics can be found in almost every book or project that has something to do with Arduino. I could not get used to it, because I used to draw my schematics in PCAD and nowadays in EagleCAD. However, there is currently no better option than Fritzing for Arduino schematics, because it contains parts for the Arduino itself as well as most of its shields. It is also possible that I simply could not find such parts for EagleCAD. In the 'Breadboard' view of Fritzing, a schematic can be drawn as easily as plugging components into a breadboard. Many Arduino books include these breadboard-view schematics to look "more user-friendly". I find them pretty useless, maybe because I am old-fashioned: it is very unclear where the wires are connected or where they pass. Therefore, I will only put old-style schematics here. And btw, I am still learning Fritzing.

The project code is as follows:

 /* AT24C32 I2C EEPROM Example */

#define MX24C32  B1010000    // 7-bit I2C address
#include <Wire.h>

void setup()  {
  Serial.begin(9600);
  Wire.begin();
}

void loop()  {
  char hello[8] = {'H', 'e', 'l', 'l', 'o', '!', '!'};
  byte addr = 0;
  int X;
  char t;

  // This part writes 'Hello!!' on the EEPROM and it
  // has to run once. Therefore it's commented out
  //for(int i = 0; i < 7; i++)  {
  //  Wire.beginTransmission(MX24C32); // Send write command
  //  Wire.write(0x00);         // Send high addr.
  //  Wire.write(addr++);       // Send low addr. and increase var. for next char
  //  Wire.write(byte(hello[i]));     // Send the byte
  //  Wire.endTransmission();
  //  delay(100);               // wait until EEPROM is ready
  //}

  // This part sends an empty write command (without data) to
  // the address 0x0 of the EEPROM to seek at that address
  Wire.beginTransmission(MX24C32);
  Wire.write(0x00);
  Wire.write(0x00);
  Wire.endTransmission();

  delay(100);

  // 10 bytes will be read
  for(int i = 0; i < 10; i++)  {
    Wire.requestFrom(MX24C32, 1);       // request a byte from the chip
    X = Wire.available();               // is it available on the bus?
    Serial.print("X = "); Serial.println(X);
    if(X >= 1)  {
      t = Wire.read();                  // read the byte if it's available
      Serial.print("t= "); Serial.println(t);
    }
  }

  while(1);
}


I send the data to the IC using the Wire.write() function. A write is started with the Wire.beginTransmission() function, followed by two bytes of address and one byte of data to be written. Wire.endTransmission() ends the I2C frame. Please refer to the datasheet, p. 9: "A write operation requires two 8-bit data word addresses following the device address word and acknowledgment". The chip has an internal address pointer, so to speak. Reading starts from where the last operation left off. In other words, the write operation has an address operand but the read operation does not. To read from a specific address, a seek is done by sending a write command with the address but no data. This is the general usage. More detailed information can be found in the datasheet.

Here is another similar project:
https://playground.arduino.cc/code/I2CEEPROM


Thermometer using DS1621 IC
DS1621 is an I2C thermometer integrated circuit by Dallas Semiconductor. (Note: Dallas has another chip with a built-in thermometer in the same package as the Akbil.) This IC also supports up to eight chips on the same bus. Tout is the temperature alarm pin. This pin becomes active when the temperature rises above the configured threshold (datasheet, google drive). The schematic is as follows:
DS1621 Thermometer

There is a library for this IC, but I did not use it:
https://github.com/martinhansdk/DS1621-temperature-probe-library-for-Arduino

/* DS1621 temperature sensor interfaced by Arduino
 *
 * I2C ID will be 0x48 if A0 = A1 = A2 = GND
 */

#define DS1621 B1001000
#include <Wire.h>

void setup()   {
  Serial.begin(9600);
  Wire.begin();
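  Wire.beginTransmission(DS1621);
  Wire.write(0xAC);     // Send Access Config command
  Wire.write(0x02);     // Write 02 to config register
                        // Output polarity bit = 1 => Active High
  Wire.endTransmission();       // Actually send the queued bytes
  delay(10);            // give the config write time to complete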

  Wire.beginTransmission(DS1621);
  Wire.write(0xEE);     // send "start convert" command
  Wire.endTransmission();       // Stop bit

}

void loop()  {
  byte SH, SL, X;
  // SH: High order byte of temperature
  // SL: Low order byte of temperature

  Wire.beginTransmission(DS1621);
  Wire.write(0xAA);     // Send "Read Temperature" command
  Wire.endTransmission();


  Wire.requestFrom(DS1621, 2);
  // After sending "read temperature" command
  // request two bytes from IC for temperature
  X = Wire.available();
  if(X >= 2)  {        // if 2 bytes were received
    // SH contains the integer part of the temperature
    SH = Wire.read();
    // SL contains 0x80 for 0.5 degree Celsius
    SL = Wire.read();
    Serial.print(SH, HEX);
    Serial.print("    ");
    Serial.println(SL, HEX);
  }
  else
    Serial.println(X);
  //Wire.endTransmission();

  delay(500);

}
 
0xEE and 0xAC are commands specific to this IC. The list of all commands can be found on the tenth page of the datasheet. During initialization, the mode of operation is written to the config register. For example, 0x02 means One Shot mode = 0 and Polarity = 1. The 0xEE command starts the temperature conversion cycle, and with the 0xAA command the temperature value is read from the IC over the bus. Although there is a busy flag in the config register, for simplicity I did not poll it and just wait with the delay() function.

There is an old project where this IC is connected to a PC through the serial port. I will explain this project in one of the following posts.






Real Time Clock (RTC) with IC DS1307

DS1307 is an RTC chip, again by Dallas Semiconductor. This IC does not have any address pins, because it is pointless to use more than one RTC in the same circuit. There is a VBAT input pin for the battery supply. The clock continues to tick on battery even if the Arduino is powered off. I had made a note to myself, "do not connect the battery to Vcc", but I cannot remember why. This IC can be thought of as a counter with 64 bytes of memory (eight clock and control registers followed by 56 bytes of general-purpose RAM). The counter increments the seconds; when the seconds overflow the minutes are incremented, when the minutes overflow the hours are incremented, and so on.

DS1307 Real Time Clock
There are ready-to-use kits with this IC, and even kits with this IC and the AT24C32 (the IC I discussed in the first example) on the same board. There is also a library for this IC. I wrote my own code, but I borrowed two functions from this library.


This IC requires a 32.768 kHz crystal to run. The X1 and X2 pins are connected to this crystal. Pin 7 is the square wave output. It is left floating simply because it is unused. There are plenty of comments in the code:


/* The code for DS1307 RTC circuit. Do not connect the battery
 * to Vcc. Do not forget the pull-up resistors on SDA and SCL
 */

#include <Wire.h>
#define DS1307 B1101000

// These two functions were borrowed from the source
// code of RTC library of jeelabs:
// https://jeelabs.org/2010/02/05/new-date-time-rtc-library/
static uint8_t bcd2bin (uint8_t val) { return val - 6 * (val >> 4); }
static uint8_t bin2bcd (uint8_t val) { return val + 6 * (val / 10); }

void setup()  {
  Serial.begin(9600);
  Wire.begin();

  // Begin initialization
  Wire.beginTransmission(DS1307);
  Wire.write(0);        // send first "0" as an address byte
  Wire.endTransmission();

  Wire.requestFrom(DS1307, 1);
  int ss = Wire.read(); // read a character
  Serial.println(ss);
  // End initialization
  // If the chip returns a character then it's working


  /*   //  Set the clock first if it is not set yet
  Wire.beginTransmission(DS1307);
  Wire.write(0);
  Wire.write(0x0);    // seconds
  Wire.write(0x21);   // minutes
  Wire.write(0x0);    // hours
  Wire.write(1);      // day of the week (valid range is 1-7)
  Wire.write(0x24);   // day
  Wire.write(0x12);   // month
  Wire.write(0x18);   // year
  Wire.write(0x10);   // config register: SQ Wave out @1Hz
  Wire.endTransmission();
  // */

}

void loop()  {
  Wire.beginTransmission(DS1307);
  Wire.write(0);        // send "0" as an address byte
  Wire.endTransmission();

  Wire.requestFrom(DS1307, 7);  // Read 7 bytes from RTC
  /* Those 7 bytes are:
   * 00H: CH Bit + Seconds BCD (Bit 7 of 00H is the Clock Halt
   *      (CH) bit. It stops the oscillator when set.
   *      "Please note that the initial power-on state of all
   *      registers is not defined. Therefore, it is important
   *      to enable the oscillator (CH bit = 0) during initial
   *      configuration.")
   * 01H: Minutes BCD
   * 02H: Hours BCD (Bit 6 of the hours register selects 12H
   *      mode when set. In 12H mode, bit 5 means AM when
   *      reset and PM when set. If bit 6 is reset, bits 4
   *      and 5 are the tens place of the hours in 24H mode.)
   * 03H: Day of the week. Unused in this code.
   * 04H: Day of the month BCD
   * 05H: Month (BCD)
   * 06H: Year (BCD)
   * 07H: Control register
   */

  // Read the seconds but discard CH bit:
  uint8_t ss = bcd2bin(Wire.read() & 0x7F);
  uint8_t mm = bcd2bin(Wire.read());   // Read minutes
  uint8_t hh = bcd2bin(Wire.read());   // Read hours in 24H
  Wire.read();                         // ignore the week day
  uint8_t d = bcd2bin(Wire.read());    // day
  uint8_t m = bcd2bin(Wire.read());    // month
  uint16_t y = bcd2bin(Wire.read());   // year

  Serial.print(d); Serial.print(".");
  Serial.print(m); Serial.print(".");
  Serial.print(y); Serial.print("   ");
  Serial.print(hh); Serial.print(":");
  Serial.print(mm); Serial.print(":");
  Serial.println(ss);

  delay(1000);  // read every second.
}

Thursday, November 29, 2018

How to Define Special Characters on LCD


Hi there. This post is a kind of continuation of the previous LCD post; however, it is not about hardware but only about software. I will address the problems mentioned in the previous post and show how to define special characters on an LCD. Although this is really simple to do on high-level platforms such as Arduino using libraries, unfortunately there are only a few resources on the Internet about it.

I will try to follow the sequence of events in the previous post.


Problem with the Pins
I could not write any character to the display initially. It occurred to me that the characters were actually being written to the display but I could not see them. I had experienced this a long time ago. It is usually enough to connect the LCD contrast pin (3) to ground through a 4.7K or 2.2K resistor, and that was what I was going to do. Since I was still thinking "it works fine, I just cannot see it", I replaced the resistor with a potentiometer in order to adjust the contrast. Normally, a backlight is not needed on green LCDs either, which is why I chose a green one. Anyway, I connected pin 15 of the LCD to Vcc and pin 16 to GND.

The problem was still not solved, although I had connected everything without any short circuits. I changed the line outb(0, BASE) at the end of the code to outb(255, BASE). I checked the voltage on the data pins, and everything was fine with them. I increased the delay parameter to 3 seconds in order to read the signals with a multimeter and checked the E and RS pins. E was toggling between high and low as expected, but RS showed nothing except a constant high level. The pin connections I used are as follows:

PP Signal      DB25 Pin   Centronics Pin   IC In    IC Out   LCD Signal (Pin)
nStrobe (C0)   1          1                IC2_17   IC2_3    E (6)
nSelect (C3)   17         36               IC2_15   IC2_5    RS (4)
Data0 (D0)     2          2                IC1_2    IC1_18   D0 (7)
Data1 (D1)     3          3                IC1_4    IC1_16   D1 (8)
Data2 (D2)     4          4                IC1_6    IC1_14   D2 (9)
Data3 (D3)     5          5                IC1_8    IC1_12   D3 (10)
Data4 (D4)     6          6                IC1_17   IC1_3    D4 (11)
Data5 (D5)     7          7                IC1_15   IC1_5    D5 (12)
Data6 (D6)     8          8                IC1_13   IC1_7    D6 (13)
Data7 (D7)     9          9                IC1_11   IC1_9    D7 (14)

The source article I used while assembling the circuit shows pin 13 for the nSelect signal. I misunderstood this part, because pin 13 is actually an input pin. Another source [ https://www.lammertbies.nl/comm/cable/parallel.html ], which I usually use, also shows the port directions. I was trying to get an output from pin 13 instead of pin 36, and since pin 13 is an input pin, it was always high. BTW, the plastic housing on both ends of the (colored) jumper cables is wider than the pin spacing of the Centronics port. Therefore, I used wires on the even-numbered pins and jumper cables on the odd-numbered pins (example). At first I suspected a contact problem, but if that were the case, the code would have produced different results on each execution. I will also cover the relationship between signals and ports in the following sections.

By the way, there is a video on LCDs showing that there is no upper limit for the DELAY parameter:


The Problem with the Function lcdKomut()
This function was initially written as follows: 

void lcdKomut(unsigned char veri)    {
    outb(veri, BASE);
    outb(8   , CTRL);    // RS = 0; E = 1
    usleep(DELAY);
    outb(9   , CTRL);    // RS = 0; E = 0
    usleep(DELAY);
}

For some reason, each time after I sent a command with this function, the first character of the following data was printed twice. For example, when I tried to print 'Testing' on the first line and '123' on the second, 'TTesting' and '1123' were printed instead. I thought this happened because the parallel port controller could not reset the pin fast enough: the CPU is too fast compared to the controller and puts the next data on the bus while the E pin is still not zero. But I observed the same behavior even with bigger values of the DELAY parameter. I changed the value 9 to 1 in the code, and the problem was fixed. There is an image I like about these situations:


Since the problem was solved, I did not want to investigate further. Maybe I should examine the signals with an oscilloscope, but as I said, I did not want to deal with it.


Status and Control Signals of Parallel Port
Speaking of signals, I need to briefly mention the status and control signals. The parallel port has three groups of signals: data, status and control. These signals are accessed through the BASE, BASE+1 and BASE+2 I/O ports, respectively. The data signals are simple: parallel port pins 2 to 9 are set or reset according to the byte written to the BASE port, and they hold that state until the next write. The status pins can be used for input because they are designed to read the printer's status, e.g. the printer generates an interrupt request, runs out of paper or has a paper jam.

The nStrobe and nSelect pins are in the control signal group. These signals exist to control the printer, and they are "active low" signals, which is why their names are prefixed with "n" or "~". nStrobe is used as a clock signal when the computer is transmitting data to the printer: the computer has to pulse this signal each time the data pins change. nSelect is asserted when the printer is selected (to print). The nStrobe and nSelect pins are controlled by bits 0 and 3 of the control register, which is mapped to I/O port 0x378 + 2 = 0x37A. Because these two bits are inverted by the port hardware, the nStrobe pin is driven low by writing 1 to the control port and the nSelect pin is driven low by writing 8.

LCD Commands and Defining Special Characters
I have used some of the LCD commands without explaining all of them. Three or four commands are really enough when working with LCDs. I usually use this image when working with LCDs: https://goo.gl/images/QHtKee. Many similar results can be found in Google Images by searching for "lcd commands". The proper way is always to rely on the datasheet of the LCD; however, reading a 60-page document just to build some device quickly is impractical. I uploaded a generic datasheet to my Google Drive and will use it to program the CGRAM and create my own character set on the LCD.

LCD fonts are stored in CGROM (character generator ROM) and CGRAM (character generator RAM). There is also a DDRAM, which stores the codes of the characters shown in the LCD cells. When data is written to DDRAM, CGRAM/CGROM is used as a look-up table to draw the characters. Since CGRAM is writable, it allows users to create their own character set. According to the datasheet, 6 bits are reserved for addressing CGRAM, but I am not sure that all of these bits are used in a standard LCD. The CGRAM address starts from zero.
The table of LCD commands is Table 6 on page 24 of the datasheet. First, I need to specify with a command which character I am going to write. To address CGRAM, I need to issue the command 0b01XX XXXX, where the Xs are the address bits of the character. Therefore, the command is 0x40 for address zero. According to page 31, the data can then be sent with the lcdVeri() function. The character to be defined must be sent as 5-bit-wide bitmap rows. This information is on the previous pages of the datasheet. One important thing: the display is turned off while CGRAM is being written.

I slightly changed my old code to define the characters. The definitions and the lcdVeri() and lcdKomut() functions are the same. I have pasted main() below:

int main(int argc, char* argv[])    {
    int i;

    if(ioperm(BASE, 3, 1))    {
        fprintf(stderr, "Access denied to %x\n", BASE);
        return 1;
    }

    lcdKomut(0x38);    // 8 bit, 2 lines, 5x7 px
    lcdKomut(0x08);    // disable lcd

    const unsigned char specialchars[] = {
        // something like smiley
        0B01110, 0B10001, 0B11011, 0B10001,
        0B11011, 0B10101, 0B10001, 0B01110,
        // inverse of smiley
        0B10001, 0B01110, 0B00100, 0B01110,
        0B00100, 0B01010, 0B01110, 0B10001,
        // spades
        0B00100, 0B01110, 0B11111, 0B11111,
        0B10101, 0B00100, 0B01110, 0B00000,
        // clubs
        0B00000, 0B01110, 0B10101, 0B11111,
        0B10101, 0B00100, 0B01110, 0B00000,
        // heart
        0B00000, 0B00000, 0B01010, 0B11111,
        0B11111, 0B01110, 0B00100, 0B00000,
        // tile (diamond)
        0B00000, 0B00100, 0B01110, 0B11111,
        0B11111, 0B01110, 0B00100, 0B00000
    };



    lcdKomut(0x40);   // 0. CGRAM address
    for (i = 0; i <= 47 ; i++)
      lcdVeri(specialchars[i]);

    lcdKomut(0x01);    // clear the screen
    //lcdKomut(0x80);    // linefeed
    lcdKomut(0x0F);    // enable screen, cursor blink

    lcdVeri('D'); lcdVeri('e'); lcdVeri('n');
    lcdVeri('e'); lcdVeri('m'); lcdVeri('e');
    lcdKomut(0xC0);    // second line
    lcdVeri('1'); lcdVeri('2'); lcdVeri('3');
    lcdVeri(' '); lcdVeri(' '); lcdVeri(' ');

    // special characters
    lcdVeri(0x00); lcdVeri(0x01); lcdVeri(0x02);
    lcdVeri(0x03); lcdVeri(0x04); lcdVeri(0x05);
 
    outb(0, BASE);
    return 0;
}

With this code above, special characters are defined and printed on the display:

Thursday, November 8, 2018

Calculator the Game and Its Solution


Hi there. After a long break, this post is again about a game. I will talk about the game and use Matlab.


I downloaded "Calculator: The Game" at the beginning of summer. The aim of the game is to start with a given number and reach the "goal" number within a limited number of operations. Here is a video of the game:  https://www.youtube.com/watch?v=w5yyY341-4A. The game had been downloaded 5M+ times at that time, so I assume it is a well-known game. The screenshot on the left is from the first level. This level starts with zero. The only allowed operation is +1, and the aim is to obtain 2 in two moves. Since there is only one operation, the solution is obviously pressing +1 twice.

Level 30
In advanced levels, more complicated operations are available. For example, the '<<' operation gives 432 from 4321. Violet buttons with numbers on them, like those in the screenshot on the right, append digits to the current number: pressing the violet 5 on 432 yields 4325. The orange conversion buttons, seen in the same screenshot, replace 1s with 2s ( 1 => 2 ) and 2s with 3s ( 2 => 3 ). The solution of level 30, in the rightmost screenshot, is pressing: 1, 2, 2=>3, 1=>2, 2, 1. The SUM button gives the sum of the digits of the number. Inv10 subtracts each digit from 10. [+]1 works on the calculator buttons and adds 1 to the values on the buttons. Store saves the number in memory so that it can be used more than once. Shift corresponds to the rotate operation in assembly. Mirror appends the reversed digits of the number to its end.

Examples:
4325 (SUM) -> 14 (SUM) -> 5
4325 (Inv10) -> 6785
4325 (Shift>) -> 5432 (<Shift) -> 4325
25 (Mirror) -> 2552 (Mirror) -> 25522552

There are also portals in the game, which are not buttons. For example, if there is a portal from the hundreds place to the ones place and the result of an operation is 100 or greater, the portal adds the digit in the hundreds place to the ones place and removes it from the hundreds place.

Level 60
I played this game for a long time and noticed that, in some levels, a "trial and error" approach over all combinations is more practical than thinking. For example, let's consider level 30 (the screenshot above). The solution is quite easy, yet there are 4^6 = 4096 button combinations in total, since there are four buttons and six moves. For this level, thinking is more practical. On the other hand, let's consider level 60, in the screenshot on the right. This level looks harder than level 30; however, pressing two keys in five moves makes only 2^5 = 32 different combinations. In this level, brute force is much simpler than thinking.

Brute force without a computer is unthinkable. Even though 4096 may seem huge, a computer can try that many combinations in a few seconds. The most important thing is to approach the problem correctly.

If all the operations were written in a single function, they could not be called in different orders in linearly running code. The button operations should be defined as separate functions, and there must be as many functions as there are keys. In other words, each key should be written as a function.

How can the keys be pressed? For example, even though there are five different keys in a level, pressing only one of them consecutively can be a solution. Or pressing the first key, then the second key, then the first again, etc. Let's assume the keys are numbered and any key can be pressed at any move. Since there are five buttons, five digits will be used for numbering. Let's also assume that the number of moves is three for this theoretical level. Pressing the first key three times in a row corresponds to 111; pressing the first three keys in order corresponds to 123. In general, as many numerals are used as there are keys, and the number corresponding to a key sequence has as many digits as there are moves. These are some examples of valid combinations: 125, 134, 225, 334, 415, 531, 555. Since there are 5^3 = 125 combinations in total, there is neither need nor space to list them all. Incidentally, if 0 is used instead of 5 when numbering the keys, the relation to the base-5 number system is easy to notice.

125 means pressing the 1st, 2nd and 5th keys, i.e. calling the first, then the second, then the fifth function. So, how can the functions be called in a certain order according to the value of the number? The easiest way is to work with function pointers rather than the functions themselves. This might sound a bit complicated. Speaking of complicated, this is done in C as follows:
#include<stdio.h>

void fonk1(int i)    {
  fprintf(stdout, "Fonk1: %d\n", i);
}

void fonk2(int i)    {
  fprintf(stdout, "Fonk2: %d\n", i);
}

void fonk3(int i)    {
  fprintf(stdout, "Fonk3: %d\n", i);
}


int main(int argc, char* argv[])    {
  void (*fparray[3])(int i);
  int i;
 
  fparray[0] = fonk1;
  fparray[1] = fonk2;
  fparray[2] = fonk3;
 
  for(i = 0; i < 3; i++)    {
    (*fparray[i])(i);
  }
 
  return 0;
}

In the example above, three simple functions are defined. An array of three pointers to void functions is created in main(), and the addresses of fonk1, fonk2 and fonk3 are assigned to this array. Since the name of a function in C also acts as a pointer to that function, no operator (*, &) is required for the assignments. In other words, the name of the function evaluates to the address of the first instruction of the function. In the for loop, the elements of the array are called one by one with the argument 'i'.
Level 192
Function pointers in C are hard to understand (at least for me). It is confusing which parentheses go where. Instead of C, I used Matlab for the solution; it is much easier to code this in Matlab. (The listings below use GNU Octave syntax such as endfunction, endif and printf.) Similar to the & operator in C, Matlab has the @ operator for function handles. To store the function handles, I used cell arrays instead of matrices, and the feval function to evaluate a button function with a given argument. Instead of dealing with number systems in the code, I used as many nested for loops as there are moves.

For example, level 192 was hard, and I had to write code for it. There are five buttons and six moves, which makes 5^6 = 15625 combinations. There is a portal from the thousands place to the ones place. First, I wrote the functions for the buttons. Please note that functions in Matlab must be saved in files with the same name as the function: the first function is in the file tus1.m, the second is in tus2.m, etc.

function deger = tus1(deger)
  % +8
  deger = deger + 8;
endfunction

function deger = tus2(deger)
  % *4
  deger = deger * 4;
endfunction

function deger = tus3(deger)
  % Inv10
  s_deger = int2str(deger); % convert number to string
  for k1 = 1:length(s_deger)
    if (s_deger(k1) != '0')
      s_deger(k1) = int2str(10 - str2num(s_deger(k1))); % subtract the characters from 10
    endif
  endfor

  deger = str2num(s_deger); % convert the string back to number

endfunction

function deger = tus4(deger)
  % append 9
  deger = deger * 10 + 9;
endfunction

function deger = tus5(deger)
  % 7 => 0
  if( floor(deger / 100) == 7 )
    deger = deger - 700;
  endif
 
  if( mod(floor(deger / 10), 10) == 7 )
    deger = deger - 70;
  endif
 
  if(mod(deger, 10) == 7)
    deger = deger - 7;
  endif
 
endfunction

I treated the portal as a separate function that takes the value returned by any button function as its argument. The result of a button function enters the portal function and comes out adjusted. I named the portal function "girdicikti" (Turkish for "input-output"):

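function deger = girdicikti(deger)
  if(deger > 999)
    binler = floor(deger / 1000);       % digit in the thousands place
    deger = mod(deger, 1000) + binler;  % move it to the ones place
    if(deger > 999)
      deger = girdicikti(deger);        % keep the result of the recursive call
    endif
  endif
endfunction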

Main function is as follows:

ilkdeger = 189; % ilkdeger means initial value

% Function array
f_a = { @tus1, @tus2, @tus3, @tus4, @tus5 };

for k1 = 1:5
  for k2 = 1:5
    for k3 = 1:5
      for k4 = 1:5
        for k5 = 1:5
          %for k6 = 1:5
           
            step1 = girdicikti( feval (f_a{k1}, ilkdeger) );
            step2 = girdicikti( feval (f_a{k2}, step1) );
            step3 = girdicikti( feval (f_a{k3}, step2) );
            step4 = girdicikti( feval (f_a{k4}, step3) );
            step5 = girdicikti( feval (f_a{k5}, step4) );

            if(step5 == 500)
              printf("%d %d %d %d %d\n", k1, k2, k3, k4, k5);
              break
            endif
          %endfor
        endfor
      endfor
    endfor
  endfor
endfor

The 'ilkdeger' variable is passed to a function. As a result of a key press (function), ilkdeger becomes step1. Another key press turns step1 into step2, and so on. Interestingly, this level is solved in five moves instead of six. The search took about 15.96 seconds on my computer. The program prints two different solutions:

Level 199
2 5 4 1 5: 189 (x4) -> 756 (7 => 0) -> 056 (9) -> 569 (+8) -> 577 (7 => 0) -> 500
4 2 1 4 2: 189 (9) -> 900 (x4) -> 603 (+8) -> 611 (9) -> 125 (x4) -> 500

I wrote another program for level 199. There are 4^6 = 4096 button combinations in this level. Since the third button is the same as in the previous example, I skipped it below:

function deger = tus1(deger)
  % append 7
  deger = deger * 10 + 7;
endfunction

function deger = tus2(deger)
  % 3 => 5
  deger = str2num(strrep(num2str(deger), '3', '5'));
endfunction

function deger = tus4(deger)
  % shift >          3002 => 2300
  if(deger > 999)
    deger = mod(deger, 10) * 1000  +  floor(deger / 10);
  elseif(deger > 99)
    deger = mod(deger, 10) * 100  +  floor(deger / 10);
  elseif(deger > 9)
    deger = mod(deger, 10) * 10  +  floor(deger / 10);
  endif
endfunction

The portal from ten thousands place to one's place: 

function deger = girdicikti(deger)
  if(deger > 9999)
    onbinler = floor(deger / 10000);
    % strip ten thousands place
    deger = mod(deger, 10000) + onbinler;
    % add to ones place
    if(deger > 9999)
      deger = girdicikti(deger);  % keep the result of the recursive call
    endif
  endif
endfunction

In the main script, unlike the previous example, I assigned the number of functions to the variable lfa and ran the for loops up to this value:
clc
clear

ilkdeger = 3002; % ilkdeger means initial value

f_a = { @tus1, @tus2, @tus3, @tus4 };
lfa = length(f_a);

for k1 = 1:lfa
  for k2 = 1:lfa
    for k3 = 1:lfa
      for k4 = 1:lfa
        for k5 = 1:lfa
          for k6 = 1:lfa
            step1 = girdicikti( feval (f_a{k1}, ilkdeger) );
            step2 = girdicikti( feval (f_a{k2}, step1) );
            step3 = girdicikti( feval (f_a{k3}, step2) );
            step4 = girdicikti( feval (f_a{k4}, step3) );
            step5 = girdicikti( feval (f_a{k5}, step4) );
            step6 = girdicikti( feval (f_a{k6}, step5) );
            if(step6 == 3507)
              printf("%d %d %d %d %d %d\n", k1, k2, k3, k4, k5, k6);
              return
            endif
          endfor
        endfor
      endfor
    endfor
  endfor
endfor

This level is solved in six moves. The solution is "1 1 2 3 4 1" and it took approximately 0.68 seconds:
3002 (7) -> 30 (7) -> 307 (3 => 5) -> 507 (Inv10) -> 503 (Shift>) -> 350 (7) -> 3507

This level finishes the game and the ending video starts. The approach described in this post can be applied to many levels of the game. However, it is difficult to apply to keys like Store and, more importantly, to keys like [+]2. For the latter, it should be possible to define a global increment value and modify the button functions according to this value. Since I had no difficulty with the levels containing such keys, I did not have to write any code for them.

Tuesday, October 30, 2018

Let's Amplify 3.3V TTL Signals to 5V for Character LCD


Hi there. In my previous post, I said that the events I experienced were loosely connected, but that was only partially true. I needed the desktop computer, which I prepared in the previous post, for its parallel port. Unfortunately, it is almost impossible to find a computer with a parallel port nowadays. If you need one, you have to buy a PCI card manufactured specifically for this purpose. A USB to parallel port adapter cannot provide the standard parallel port interface. What I mean by "standard parallel port interface" is the 0x378 or 0x3BC I/O port, of course. I needed the parallel port for an LCD circuit. I will mention a bit of everything in this post. By the way, my favorite port for the parallel interface is 0x3BC, but my BIOS does not support I/O port selection for the parallel interface. So, I had to use 0x378.



Before using the parallel port, I checked in the BIOS that it is enabled and working in an appropriate mode. In fact, I don't need a specific port mode. So, any mode is appropriate, even "Output only", as long as the port is enabled.

I installed CentOS on this machine and checked the dmesg output for the parallel port:
[root@xxxxxxxx ~]# dmesg | grep -i parport
parport_pc 00:08: reported by Plug and Play ACPI
parport0: PC-style at 0x378, irq 7 [PCSPP]


The port mode is shown between the square brackets. My port works as a standard parallel port (SPP). For bidirectional mode, it should be [PCSPP, Tristate]; for Enhanced Parallel Port (EPP), it should be [PCSPP, Tristate, EPP]; and lastly, for Extended Capabilities Port (ECP), it should be [PCSPP, Tristate] with the auxiliary I/O port in parentheses. SPP is unidirectional and, as far as I can remember, runs at 50KB/s. Bidirectional mode has the same speed and works in both directions, as the name suggests. EPP and ECP run at 500KB/s to 2MB/s; however, these are out of the scope of this post, so I don't want to go deeper into the details.

This is the schematic that I want to build:
Circuit schematics (originally from: http://www.aljaz.info/elektro/lcd/lcd-lpt.htm)
I did not follow the article in the link. I was using another source (a printed book) and the schematic in the book was the same except for the values of the pull-up resistors. I am not really sure which source is the original. Of course, it is not surprising that everyone has built a very similar circuit, because the project is very simple indeed.
After I built the circuit physically, I ran the code*, which is supposed to write something on the LCD. But there was no output on the LCD at all. I tried several things but got the same result each time. I even decreased the rate to one character per second (historically, printer speed is measured in characters per minute (cpm)). Finally, as a last effort, I measured the voltage on the port and found the issue: the port was working at 3.3V. Actually, to make sure, I should have checked in the datasheet of the LCD whether 3.3V falls into the undefined region of the TTL signal, but who cares. I did not do that.
(*): Please find this code near the bottom of the post.



This result is not surprising, considering that the last time I played with the parallel port was on a Celeron 366 machine. I used the code snippet below for testing:

#include<stdio.h>
#include<sys/io.h>

#define BASE 0x378

int main(int argc, char* argv[])        {
    if(ioperm(BASE, 1, 1))      {
        fprintf(stderr, "Access denied to %x\n", BASE);
        return 1;
    }
    outb(1, BASE);
    return 0;
}

https://commons.wikimedia.org/wiki/File:IEEE_1284_36pin_plughead.jpg
This code sets the least significant data bit and quits. The least significant bit (or signal) corresponds to the second pin on both the DB25 parallel port and the Centronics port. BTW, the term "Centronics port" actually refers to the parallel port interface which was developed by the company Centronics and later standardized as IEEE 1284. Here I mean the 36-pin connector, not the standard. This connector can also be seen in my picture above, next to my multimeter.

The real name of this connector (pictured) is the 36-pin Micro Ribbon connector, and it is used for printers. I may keep calling it the Centronics port out of habit.


If the value 255 is sent to the I/O port instead of 1, +3.3V can be measured on all data pins [2..9], so there is no need to worry about picking the right pin.

74HC244 Connection diagram
As I searched on the internet, I saw that CNC people in particular complain about this voltage difference. On the 6502.org forum, it was mentioned that the voltage can be raised to 5V by using small pull-up resistors (between the signal and Vcc) like 4.7K or even 1K [ here ]. But this did not sound safe to me. It depends on the impedance of the circuit and of the parallel port itself. Instead, I thought of raising the signal level with an 8-bit tri-state buffer. I used a 74HC244 for this. According to its datasheet, the minimum high level input voltage is slightly lower than 3.3V under normal operating conditions. This means that 3.3V from the parallel port is accepted as a high level on the input of the IC, and I can get a 5V signal on the output.

I built a very simple circuit, whose schematic is given above. Since I was not using the enable pin of the IC, I connected it to ground. I also added a 100nF bypass capacitor. Since I operated the LCD in 8-bit mode, I needed 10 bits including the RS and E pins. Therefore, I used two 74HC244s (maybe the RS and E pins are not as sensitive to TTL voltage levels as the data pins and +3.3V would work fine on them, which is unlikely but worth a try anyway). I got +5V from the hard disk connector and GND from the chassis of the parallel port. I made the first experiment with a single pin and the result was positive.



The twisted pair cable in the picture above carries +5V. This is how I treat old CAT5 cables. I was too lazy to put the 100nF bypass capacitor from the schematic on the breadboard. The input and output voltages can be seen in the upper and lower parts of the picture above, respectively. Now I can connect all the pins to the LCD through the integrated circuits.





The final circuit is in the picture above. My original plan was to fix the contrast with a resistor and not to use the backlight, to keep the circuit simple. However, I ran into a problem while building the circuit because I had connected one of the pins incorrectly (which I noticed later). I connected all these parts to find out whether I just could not see the characters or the data was still not consistent. And I did not unplug them after I found the problem. The code is below:

#include<stdio.h>
#include<unistd.h>    // usleep()
#include<sys/io.h>

#define BASE 0x378    // Base port address
#define CTRL 0x37A    // Control port address
#define DELAY 3000    // for busy waiting

void lcdKomut(unsigned char veri)    {
    outb(veri, BASE);
    outb(8   , CTRL);    // RS = 0; E = 1
    usleep(DELAY);
    outb(1   , CTRL);    // RS = 1; E = 0
    usleep(DELAY);
}

void lcdVeri(unsigned char veri)    {
    outb(veri, BASE);
    outb(0, CTRL);    // RS = 1, E = 1
    usleep(DELAY);
    outb(1, CTRL);    // RS = 1, E = 0
    usleep(DELAY);
}


int main(int argc, char* argv[])    {
    if(ioperm(BASE, 3, 1))    {
        fprintf(stderr, "Access denied to %x\n", BASE);
        return 1;
    }

    lcdKomut(0x38);    // 8 bit, 2 lines, 5x7 px
    lcdKomut(0x01);    // clear the screen
    //lcdKomut(0x80);    // cursor to the start of the first line
    lcdKomut(0x0F);    // enable screen, cursor blink

    lcdVeri('D'); lcdVeri('e'); lcdVeri('n');
    lcdVeri('e'); lcdVeri('m'); lcdVeri('e');
    lcdKomut(0xC0);    // second line
    lcdVeri('1'); lcdVeri('2'); lcdVeri('3');
 
    outb(0, BASE);
    return 0;
}


The control register of the parallel port is at the base (data) port + 2. Therefore, it would be better to define CTRL as "#define CTRL (BASE + 2)". The DELAY parameter is the time, in microseconds, that the LCD needs to process a command. When I decreased this parameter to 300µs, I had no issue with this code, but I had issues with other codes. I found the value of 3000 by trial and error; with it, I had no issue with any code I tried. If I were using the R/W pin, I could read the busy flag of the LCD on bit D7 instead of waiting. There is no upper limit for the DELAY parameter.

lcdKomut() is a function for sending a command and lcdVeri() is a function for sending data. I wrote the pin values during the execution of these functions in the code comments. Having E = 0 is more important than the value of the RS pin. If the E pin is not reset immediately, the data that comes after the command is sent twice, which I could not understand. I am noting this issue to discuss it in detail in a later post.
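
The magic numbers 8, 1 and 0 written to the control register can be derived from the pin levels. The sketch below is only an illustration under an assumption inferred from the code comments: E is wired to control bit 0 (Strobe) and RS to control bit 3 (Select In), both of which are inverted by the port hardware. The helper name ctrl_byte and the two macros are hypothetical and do not appear in the program above.

```c
/* Assumed wiring, inferred from the comments in the code above:
 * control bit 0 (Strobe, inverted by the port hardware) drives E,
 * control bit 3 (Select In, also inverted) drives RS. */
#define CTRL_E_BIT   0x01
#define CTRL_RS_BIT  0x08

/* Hypothetical helper: returns the byte to write to the control register
 * so that the RS and E pins take the requested logic levels (0 or 1). */
unsigned char ctrl_byte(int rs, int e)
{
    unsigned char value = 0;
    if (!rs) value |= CTRL_RS_BIT;  /* writing 1 pulls an inverted pin low */
    if (!e)  value |= CTRL_E_BIT;
    return value;
}
```

Under this assumption, ctrl_byte(0, 1) gives 8, ctrl_byte(1, 0) gives 1 and ctrl_byte(1, 1) gives 0, which are the values used in lcdKomut() and lcdVeri().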


In the main() function, I sent the fundamental commands to the LCD. I commented out the 0x80 command because, AFAIK, 0x01 contains 0x80 implicitly. I sent the characters to write on the screen using the lcdVeri() function. The cursor advances by itself each time a character is received. The 0xC0 command moves the cursor to the beginning of the second line. If there are more than two lines on the LCD, different commands should be used.

I am finishing the post here because it is about the TTL voltage levels. I will publish an addendum about the pins, the issue I mentioned at the end, and other LCD commands.

Tuesday, July 24, 2018

Running Huawei E5573 Mobile Wifi Under CentOS (Multimode USB)


Hi there. In this post and the next two, I will discuss three recent events I had. They are somewhat unrelated (or loosely connected) to each other. This one is about multimode USB devices. The next one will be about TTL signal voltage levels in LCDs and the last one will be about a game. I will discuss the configuration of multimode USB devices here.

The device is a Huawei E5573 mobile WiFi access point. It can be used as a WiFi repeater as well as a 4G modem by inserting a SIM card. Detailed product information can be found here: https://consumer.huawei.com/lk/support/mobile-broadband/e5573/

Two or three years ago, I preferred to use my cell phone as a WiFi hotspot because USB dongles required installing 3rd party software on my computer. The disadvantage is the disconnection when I receive a call, but this happens only in 3G networks.

Huawei E5573 Wifi Access Point

This device is installed as a network adapter (ethX) upon connection in Linux Mint. This is called USB tethering. This feature can be useful with Linux installations without WiFi drivers, like minimal CentOS. It can also be used as a network adapter on desktop machines without a WiFi adapter. In my case, I had installed CentOS minimal on a desktop machine without a WiFi adapter.

Another feature of this device, which I did not care about previously, is that when I plugged it into another USB port or tried to use it with Windows, it was installed as a CD drive which contains its own Windows (ethernet) drivers. But I was using it as a WiFi access point, so I actually had no need to plug it into my computer (except for charging). I noticed the tethering feature by chance and I never needed to use it in Windows. One day, I needed it for the desktop machine mentioned above, but I could not get it installed as a network adapter in CentOS.

First, I thought that the tethering drivers were not installed because it was a minimal installation. When I plugged the device into my Linux Mint computer, I saw that the cdc_ether driver was loaded:
[239053.520493] usb 1-2: new high-speed USB device number 9 using xhci_hcd
[239053.650038] usb 1-2: New USB device found, idVendor=12d1, idProduct=1f01
[239053.650045] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[239053.650048] usb 1-2: Product: HUAWEI_MOBILE
[239053.650051] usb 1-2: Manufacturer: HUAWEI_MOBILE
[239053.650054] usb 1-2: SerialNumber: 0123456789ABCDEF
[239053.734303] usb-storage 1-2:1.0: USB Mass Storage device detected
[239053.734728] scsi host16: usb-storage 1-2:1.0
[239054.738238] scsi 16:0:0:0: CD-ROM            HUAWEI   Mass Storage     2.31 PQ: 0 ANSI: 2
[239054.739637] sr 16:0:0:0: [sr1] scsi-1 drive
[239054.741976] sr 16:0:0:0: Attached scsi CD-ROM sr1
[239054.742188] sr 16:0:0:0: Attached scsi generic sg2 type 5
[239054.811184] systemd-udevd[7829]: Failed to apply ACL on /dev/sr1: No such file or directory
[239054.811195] systemd-udevd[7829]: Failed to apply ACL on /dev/sr1: No such file or directory
[239054.825401] usb 1-2: USB disconnect, device number 9
[239055.264962] usb 1-2: new high-speed USB device number 10 using xhci_hcd
[239055.394110] usb 1-2: New USB device found, idVendor=12d1, idProduct=14db
[239055.394116] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[239055.394119] usb 1-2: Product: HUAWEI_MOBILE
[239055.394122] usb 1-2: Manufacturer: HUAWEI_MOBILE
[239055.547598] cdc_ether 1-2:1.0 eth1: register 'cdc_ether' at usb-0000:00:14.0-2, CDC Ethernet Device, XX:XX:XX:XX:XX:XX
[239055.584282] cdc_ether 1-2:1.0 eth1: kevent 12 may have been dropped
[239055.599362] cdc_ether 1-2:1.0 eth1: kevent 12 may have been dropped

[239055.601286] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[239055.721397] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[239055.721411] cdc_ether 1-2:1.0 eth1: kevent 12 may have been dropped


But the dmesg output in CentOS is below:



I couldn't see the cdc_ether driver in the lsmod output above. That is actually normal, because the device was not plugged in. I loaded the driver manually using modprobe and then plugged the device in again; however, no new interface was created. I searched for the driver name on Google and checked the links. The first result was this link: https://www.linuxquestions.org/questions/linux-networking-3/how-do-i-configure-a-cdc-ethernet-device-a-4g-usb-dongle-835856/ . There, the creator of the topic had the same (or a very similar) issue as I had, and the answer to it was very important. As I learned from this answer, such devices are generally called "USB dual mode" or "multimode" devices. They can contain more than one device class in a single USB connection. More importantly, that answer links to the usb_modeswitch utility, a tool for using multimode USB devices under Linux.

I downloaded the application on a different computer because I did not have internet on my desktop yet, and copied it to a USB disk. The libusb1-devel package is required for the compilation. Additionally, I downloaded libusb-devel, too. The usbutils package is also required for the lsusb command. I downloaded all those packages to the same USB disk, unmounted it, plugged it into my desktop and installed the prerequisite packages. After that, all that is needed is make & make install.

There are two mandatory parameters for usb_modeswitch:
-v vendor ID of original mode (mandatory)
-p product ID of original mode (mandatory)

There are two optional but important parameters, too:
-V target mode vendor ID (optional)
-P target mode product ID (optional)

I found the vendor ID and product ID using lsusb. These IDs are given to the -v and -p parameters respectively. As can be seen in the output below, the USB ID of the CD drive is 12d1:1f01:


In Mint, on the other hand, the output is the following:

Bus 001 Device 010: ID 12d1:14db Huawei Technologies Co., Ltd. E353/E3131.

I saved the dmesg, lsusb and lsusb -v outputs from Mint into text files on the USB disk and returned to CentOS. I knew which IDs should be given to -V and -P, so I issued the command below and got the following error:

usb_modeswitch -v 12d1 -p 1f01 -V 12d1 -P 14db
[SNIP]
Warning: no switching method given. See documentation.

Segmentation fault.

I checked the usb_modeswitch --help output and noticed three parameters related to Huawei devices: -H, -J and -X. Because there are only three of them and they come without any explanation, I decided to use trial and error. Thus, I found that the -J parameter is what I needed. BTW, -V and -P are not required explicitly either. The utility could find the target mode by itself.


It is obvious from the "Segmentation fault" error that the application has some issues. However, the device now works perfectly as a new ethernet card.

Saturday, June 30, 2018

Matrix Operations using LAPACK and BLAS


"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."
- Antoine de Saint-Exupery

Hi there. In this article, I will discuss how to do matrix operations using libraries. I had already mentioned in one of my previous articles that I was going to discuss this topic. The code had been ready for a while but I could barely find time to write. And it took a whole day to answer these questions: what is the minimum number of modules that have to be compiled to have a working library? What is the minimum number of packages that have to be installed? And how can the library be compiled with the minimum number of commands? I found it appropriate to include the quotation at the beginning of this article because it directly refers to what I was trying to do.

The story begins with a minimal CentOS 6.9 installation as usual. I have already explained the installation and other steps several times in previous articles, and the installation is out of the scope of the current post. Therefore, I assume that a machine is already installed.

As soon as the machine was installed, I updated it with the yum update command and rebooted. This step is not strictly necessary but nice to have. Then I installed the gcc and gcc-gfortran packages using yum:
yum install gcc gcc-gfortran

Then, I downloaded the LAPACK library. The name is an abbreviation of Linear Algebra Package. It is developed in Fortran and contains functions for important matrix operations like solving linear equations, least squares, eigenvalue/eigenvector calculations and matrix factorization algorithms. Its functions are not limited to the operations above; they are just what I could remember after graduation. Although we don't encounter LAPACK directly in everyday life, many scientific computing programs are based on this library.

Another important library is BLAS, which is an abbreviation of Basic Linear Algebra Subprograms. It is a lower level library which provides vector and matrix operations, and LAPACK depends on BLAS.
OK, the first thing that needs to be clarified is whether we really need these libraries for a simple matrix multiplication or addition.
What is Matrix Multiplication?
I don't really want to dig deep into the fundamental definitions and theorems of matrix algebra. I will try to walk around them as much as I can.

So, matrix multiplication is defined for matrices where the first matrix has (m x n) and the second has (n x p) elements (dimensions). Let's name the first matrix A and the second B, and for simplicity let m = n = p, i.e. A and B are square matrices. Let's name the product matrix AB = C, where C is a (p x p) matrix. The first row elements of A are multiplied term-by-term with the first column elements of B. The first element of C is the sum of these p products.

Similarly, the second element of C, which is to the right of the first element, is the sum of p products where the first row elements of A are multiplied term-by-term with the second column elements of B. I know, it is really confusing to handle this topic verbally. Therefore, I am going to define it with mathematical expressions. Let matrix A be defined as in the next figure, and matrix B as in the figure below:

 


Elements of the product matrix C are defined by the formula below:

c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}, \qquad i, j = 1, \dots, p


And the positions of the individual elements in matrix C are shown below:

Expressed as pseudo-code, the operation is as follows:
 
for i = 1 to p
    for j = 1 to p
        c(i)(j) = 0
        for k = 1 to p
            c(i)(j) = c(i)(j) + a(i)(k) * b(k)(j)

The computational complexity of this operation grows with the third power of p and it is expressed as O(n^3) in big O notation. In other words, if two (p x p) matrices are multiplied in one unit of time, then two (2p x 2p) matrices are multiplied in eight units of time. In the literature, different approaches have been developed for this operation and the computational complexity has been reduced to about O(n^2.37) by now. (Further reading: Matrix multiplication algorithm).
In practice, coding and optimizing matrix multiplication is quite different from the theory. For example, a direct C translation of the pseudo-code above is not optimal because of how row-major arrays in C use the cache memory. This is an introductory example in every scientific computing and numerical algorithms lecture. Ordering the loop indices as (i, k, j) provides a far better cache hit ratio than the (i, j, k) order. In other words, more of the work is done in cache memory and the algorithm runs faster.
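
The loop-order remark can be sketched in C as follows. Both functions are illustrative only: the names matmul_ijk and matmul_ikj are hypothetical, square n x n matrices stored row-major in flat arrays are assumed, and the code is not taken from the university project mentioned below. Both compute the same product; only the memory access pattern of the innermost loop differs.

```c
#include <stddef.h>

/* Textbook (i, j, k) order: the inner loop walks down a column of b,
 * so consecutive accesses to b are n elements apart in memory. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* (i, k, j) order: the inner loop walks along a row of b and a row of c,
 * so both are accessed sequentially in memory. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];   /* reused for the whole row of b */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

How large the speed difference is depends on the matrix size, the cache sizes and the compiler flags, so it has to be measured on the target machine.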

I researched these algorithms while I was at university. First, I started with an algorithm similar to the example above and then improved cache usage by reordering the indices and keeping the data close to the processor. Then I added the Strassen algorithm to the code. As a third step, I applied this algorithm recursively and optimized my code to its theoretical limits. I also used compiler optimizations during compilation and reached a limit in computation time.

Then I multiplied the same matrices with BLAS for comparison. I used the same compiler parameters while compiling the library and unfortunately BLAS was still about 20% faster than my algorithm (i.e. 4 seconds instead of 5). Even more sadly, two matrices that were multiplied in about 60 seconds using BLAS were multiplied in a little more than one second using Matlab. The code of Matlab was written (or optimized) much better for the underlying processor architecture. The processor I used in this research had SSE2 support, and when I divided the number of multiplications done in the matrix multiplication by the clock frequency of one processor core, the ratio was very close to one. This means Matlab was utilizing the core so effectively that the core was doing almost nothing except multiplication.

In short, these libraries are really needed to achieve good results. BLAS is a thirty-year-old library and has been developed by hundreds of people since then. It would be unfair to expect better results from a simple research project developed in just a few months.

In BLAS terminology, level 1 operations are vector-vector operations, level 2 are matrix-vector and level 3 are matrix-matrix operations. LAPACK uses BLAS functions and provides more complex functions.

To be honest, I first heard the names of these libraries when I was at university. Likewise, I have compiled these libraries, and programs depending on them, many times as a part of my job. Unfortunately, the use or the applications of the libraries were not taught in lectures. Their names were used like the Phoenix of an unknown land. I used BLAS in some of my projects when I was at university, and never used LAPACK. Looking at it from the outside today, I think this does not speak well for a lecture.
I will return to the point after a long introduction. The latest LAPACK version was 3.8.0 at the time of writing. I downloaded and extracted it.

curl http://www.netlib.org/lapack/lapack-3.8.0.tar.gz > lapack-3.8.0.tar.gz
tar xvfz lapack-3.8.0.tar.gz
Note: I used curl because there is no wget in minimal CentOS.

After extracting LAPACK, I changed to the lapack directory and copied the example make.inc file, which is included by the Makefile, to the real make.inc file. Then I built the targets in the Makefile:

cp make.inc.example make.inc
make lib
make blaslib
make cblaslib
make lapackelib

"lib" is for compiling the Fortran code of LAPACK, "blaslib" is for BLAS library bundled with LAPACK. "cblaslib" is C interface of BLAS and "lapackelib" is C interface of LAPACK. These targets have to be compiled.
I put my code on Google Drive. It can be downloaded here.

The printmat() function dumps a matrix to the screen. The output is Matlab compatible, so I can double check the results with Matlab. I defined size = 3 in main(). This is the matrix size. With the code below, the matrix size can be given on the command line as a parameter.

if(argc > 1)   {
     size = atoi(argv[1]);
}

LAPACK has its own integer type, lapack_int. I assigned size to lapack_size, which is a lapack_int variable. I will work on matrixA. It is initially equal to matrixB. I will use matrixB at the end for the proof. I allocated the matrices with malloc() and filled them with random values. The ipiv vector holds the row order changes (row permutation) of the matrix. It does not have to be initialized: dgetrf() fills it with the pivot indices and dgetri() reads them back.

There are three simple matrix operations in this code. First, I factorized the matrix with the LU method using the dgetrf() function. The name follows the naming convention of LAPACK: the first letter 'd' is for double, 's' for single, 'c' for complex and 'z' for double complex. This is the precision of the matrix elements. The next two characters, 'ge', mean that this function works on general matrices (without a special structure). There are other two-letter codes for symmetric, diagonal, tri-diagonal and triangular matrices. Detailed information can be found in the LAPACK documentation and on its wiki page. The last three letters are the name of the operation, e.g. trf is short for triangular factorization. Below is the call which does the LU factorization:
LAPACKE_dgetrf(LAPACK_ROW_MAJOR, lapack_size, lapack_size, matrixA, lapack_size, ipiv);

In C, two consecutive values of a 2D array are consecutive within a row of the matrix, like a11 and a12. In Fortran, two consecutive array elements are consecutive within a column, like a11 and a21. If LAPACK_ROW_MAJOR is given, the function handles consecutive array elements as members of a row. The following two lapack_size arguments are the numbers of rows and columns of the input matrix, and matrixA is the pointer to the input matrix. The next lapack_size argument is known as LDA, i.e. "Leading Dimension of Array". This value is important when only a submatrix of matrixA is processed. As I apply the operation to the whole matrix, this value should be equal to the matrix size (I am going to explain this in the appendix in order to avoid confusion). I gave the ipiv vector as the last argument.

After that, I apply the inverse operation to the matrix I factorized above.

LAPACKE_dgetri(LAPACK_ROW_MAJOR, lapack_size, matrixA, lapack_size, ipiv);

'tri' means triangular inverse. I did not want to cover algebra topics, but here is a short explanation: The identity element of multiplication in the real numbers is 1. A real number multiplied by 1 equals the number itself. The (multiplicative) inverse of a number is the number which, multiplied by the original one, yields the identity element. In general, the multiplicative inverse of a real number x is 1/x, and each real number except zero has an inverse. These rules are almost valid for matrices (there are some exceptions). The identity element of matrix multiplication, i.e. the identity matrix, consists of ones on the diagonal and zeroes elsewhere. Not all but many matrices have a multiplicative inverse. (A note for curious minds: https://proofwiki.org/wiki/Matrix_is_Invertible_iff_Determinant_has_Multiplicative_Inverse, bonus:  https://proofwiki.org/wiki/Real_Numbers_form_Ring)
Because only square matrices can have an inverse (actually it's better to say a single inverse, I think), the lapack_size argument is given only once here. The fourth argument is LDA.

Using LAPACK functions, I have calculated the inverse of a matrix so far. Now, I will multiply matrix A, which I claim to be the inverse of the initial matrix, with its initial value, which I had saved in matrix B. If it really is the inverse of its initial value, I have to end up with the identity matrix. BLAS functions have their own naming convention, like LAPACK functions. I used dgemm() for the matrix multiplication. The name is short for Double GEneral Matrix Multiplication. The line for the matrix multiplication is:
cblas_dgemm( CblasRowMajor, CblasNoTrans, CblasNoTrans, lapack_size, lapack_size, lapack_size, 1.0, matrixA, lapack_size, matrixB, lapack_size, 0.0, matrixC, lapack_size);

There are also predefined values in CBLAS, like in LAPACK. CblasRowMajor is one of them and its effect is the same as that of LAPACK_ROW_MAJOR. The two CblasNoTrans arguments denote that matrixA and matrixB, respectively, will be processed as is. If one of these arguments were CblasTrans, then the transpose of the corresponding matrix would be processed. In the transpose operation, matrix rows become columns and columns become rows, so an (m x n) matrix becomes (n x m) after the transpose operation. While giving the definition of matrix multiplication, I wrote that only (m x n) sized matrices can be multiplied with (n x p) sized matrices, and the product is an (m x p) sized matrix. The three consecutive lapack_size arguments (fourth, fifth and sixth) carry these dimensions; CBLAS expects them in the order m, p, n (rows of the product, columns of the product, shared dimension). As we assume m = n = p for simplicity, these arguments are all the same.

dgemm does the following operation:
C \leftarrow \alpha \, op(A) \, op(B) + \beta C
It multiplies A's elements by a scalar alpha, then applies matrix multiplication with B, multiplies C's elements by a scalar beta and sums everything into C. Here op(X) is either X or its transpose, as selected by the CblasNoTrans/CblasTrans arguments. What I want to do is only matrix multiplication, so alpha should be 1.0 and beta should be 0.0. The constant before the matrixA argument is the alpha value. The lapack_size argument after matrixA is LDA. After these come the pointer to matrixB and its LDB argument. And lastly come the beta constant with the value 0.0, the pointer to matrixC and the LDC of matrixC.
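
To make the roles of alpha, beta and the leading dimensions concrete, here is a naive sketch of the same operation in plain C. It is not how BLAS implements dgemm: the name ref_dgemm is hypothetical, only the row-major, no-transpose case is covered, and the parameter names m, n, k follow the CBLAS convention (A is m x k, B is k x n, C is m x n), which differs from the m, n, p letters used in the definition earlier.

```c
#include <stddef.h>

/* Naive row-major sketch of C := alpha*A*B + beta*C without transposes.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * The name ref_dgemm is hypothetical; this is not the BLAS implementation. */
void ref_dgemm(size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i * lda + p] * b[p * ldb + j];
            /* with beta == 0 the old content of C is ignored entirely */
            c[i * ldc + j] = alpha * sum
                           + (beta == 0.0 ? 0.0 : beta * c[i * ldc + j]);
        }
}
```

Called with alpha = 1.0, beta = 0.0 and all sizes and leading dimensions equal to 3, it mirrors the cblas_dgemm line above: multiplying the 3 x 3 example matrix from the program output below by its printed inverse gives the identity matrix.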

I put the function prototypes as comments in the code for simplicity.
I compiled the code with the command below:

gcc matris.c -I/root/lapack-3.8.0/CBLAS/include \
-I/root/lapack-3.8.0/LAPACKE/include -L/root/lapack-3.8.0 \
-llapacke -llapack -lrefblas -lcblas -lgfortran -o matris.x

The output is the following:

A matrix = [
0.000000  4.000000  0.000000  ;
3.000000  7.000000  1.000000  ;
5.000000  5.000000  1.000000    ]

post-LU matrixA = [
5.000000  5.000000  1.000000  ;
0.000000  4.000000  0.000000  ;
0.600000  1.000000  0.400000    ]

inverse of A matrix = [
0.250000  -0.500000  0.500000  ;
0.250000  -0.000000  0.000000  ;
-2.500000  2.500000  -1.500000    ]

proof = [
1.000000  -0.000000  0.000000  ;
0.000000  1.000000  0.000000  ;
0.000000  0.000000  1.000000    ]

The results are completely consistent with the results in GNU Octave:


I want to draw attention to the -0's in the output. This is a topic I want to write about in a future blog post.


Appendix: What is LDA and when is it necessary?
This was the LU factorization line:

LAPACKE_dgetrf(LAPACK_ROW_MAJOR, lapack_size, lapack_size, matrixA, lapack_size, ipiv);

Let's assume there is a relatively big matrix, e.g. 20 x 20. I want to process only a sub-matrix of size 5 x 6, i.e. A(1:5, 1:6) in Matlab terms. Because this is a part of a bigger matrix, the next memory element after the last element of a row of the sub-matrix is actually not a part of the sub-matrix. There are 20 - 6 = 14 units (float, complex, etc.) to skip before the next element of the sub-matrix. So, LDA is the distance in memory between two consecutive elements of the same column, which is 20 here. The function call for this example should be:
LAPACKE_dgetrf(LAPACK_ROW_MAJOR, 5, 6, matrixA, 20, ipiv );

To keep this example as simple as possible, I chose the sub-matrix from the upper left corner. Otherwise I would have to deal with some pointer arithmetic to find the memory address of the first element of the sub-matrix. There is one more thing to consider. LDA denotes the distance between two consecutive elements of a column because LAPACK_ROW_MAJOR is given. With LAPACK_COL_MAJOR, LDA has to be the distance between two consecutive elements of a row. In short, while working on sub-matrices, LDA has to be the number of columns (row-major) or the number of rows (column-major) of the original matrix. LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR determines which one it is.
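
The indexing behind LDA can be sketched without LAPACK at all. The helpers below are hypothetical (at_row_major, at_col_major and sum_submatrix are names made up for this illustration) and assume a matrix stored in a flat array of doubles. They also show the pointer arithmetic needed when the sub-matrix does not start at the upper left corner.

```c
#include <stddef.h>

/* Row-major storage: element (i, j) lives at a[i * lda + j],
 * where lda is the number of columns of the full (parent) matrix. */
double at_row_major(const double *a, size_t lda, size_t i, size_t j)
{
    return a[i * lda + j];
}

/* Column-major storage: element (i, j) lives at a[j * lda + i],
 * where lda is the number of rows of the full (parent) matrix. */
double at_col_major(const double *a, size_t lda, size_t i, size_t j)
{
    return a[j * lda + i];
}

/* Sums a rows x cols sub-matrix whose upper-left corner is at (r0, c0)
 * of a row-major parent matrix with leading dimension lda. */
double sum_submatrix(const double *a, size_t lda,
                     size_t r0, size_t c0, size_t rows, size_t cols)
{
    const double *sub = a + r0 * lda + c0;   /* first element of the sub-matrix */
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += at_row_major(sub, lda, i, j);
    return sum;
}
```

For the 20 x 20 example, sum_submatrix(a, 20, 0, 0, 5, 6) visits exactly the elements of A(1:5, 1:6). Passing a + r0 * 20 + c0 together with an LDA of 20 is the same trick that would be used to hand a sub-matrix starting elsewhere to a LAPACKE function.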