**Controls:** D-Pad: **[Arrows / WASD]** - A: **[J]** - B: **[K]** - Menu: **[U]** - Home: **[I]**

Enjoy games at full speed with sound and lights on the Gamebuino META!

Emulator by aoneill

In this tutorial we will learn how to display a picture on the screen after an "affine transformation" including scale, shear and rotation. Different optimization techniques are used to speed up the display. Such an effect can be used in a game as a splash screen or as a rotating play field.

Download and install the .bin file to see live the different steps of the implementation (just press A/B to switch between steps).

This first part is just about importing the picture into the sketch and display how many milliseconds are necessary for a draw with the `drawImage`

function of the Gamebuino library.

__Import the picture__

First of all we need a picture to display. I built the following picture based on graphics of the Gamebuino web-site (so it can be used for marketing, if anyone wishes, and there is no discussion about copyright!), but you can use any picture you want. To use the full set of optimizations presented in this tutorial, the picture needs to be a square of **256x256** pixels, but the technique can be adapted to other picture sizes (preferably powers of 2).

Rename the picture as "Picture.bmp" and then use the image-to-code converter to get a long text defining a data array and an Image object. The conversion may take some seconds to complete due to the size of the picture.

Create a new sketch (or use an existing one if you know what you are doing), and insert the code. Then add the classical Gamebuino code, and display the picture using `drawImage`

to see if everything is correctly setup.

The code should look like this (the line with `PictureData`

is truncated on purpose, otherwise it would take the whole page):

#include <Gamebuino-Meta.h> const uint16_t PictureData[] = {256,256,1, 1, 0, 0, 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0..... [snip] Image Picture = Image(PictureData); void setup() { // Starts Gamebuino library and sets target frame rate gb.begin(); gb.setFrameRate(50); } void loop() { while(!gb.update()); gb.display.drawImage(0, 0, Picture); }

Please note that the `setFrameRate`

command sets already the target frame rate, which will be reached at the end of the tutorial.

At this point, executing the sketch should display the top-left part of the picture.

**Measure the duration of the draw**

We are looking for performance in this tutorial, so we need to measure how long it takes to draw the picture, and hence assess the effect of each optimization. This measurement is not necessary for the zoom/rotation effect, it is implemented only for didactic purposes. You can jump to the next chapter if you want.

To measure the time we use the function millis() of the Arduino library, which returns the number of milliseconds since the start of the current program.

Insert the measurement of the start time just under the `while(!gb.update());`

line:

// Gets the time at start of the computation unsigned long startTime = millis();

And insert the computation and display of the duration at the end of the `loop()`

function:

// Computes and displays duration unsigned long duration = millis()-startTime; gb.display.clearTextVars(); gb.display.setColor(WHITE, BLACK); gb.display.print(duration); gb.display.print(" ms");

The number is in milliseconds, drawn on a non-transparent background for readability. Now the setup is completed, you should see something like this on the screen:

Time to jump to the next part and discover the draw concepts.

This step presents the general idea of the draw technique and a first, non-optimized implementation.

__General idea__

To display a picture to the screen with arbitrary zoom and rotation, two families of algorithms can be used:

- Method 1: For each pixel of the source picture, we determine the position and size of this pixel on the screen and draw the pixel at this position
- Method 2: For each pixel of the screen, we determine the color of the pixel in the source image and draw it at the current position on the screen

The method 1 has some drawbacks:

- In case the image covers a few pixels on the screen, the algorithm will anyway look at each pixel in the source picture and often draw several times at the same pixel position on the screen
- If the picture is drawn with a zoom, then gaps will be visible because one pixel in the source picture would need to be drawn as a rectangle on the screen.

You got it, we will use the method 2.

The basic idea here is to start from a point at given coordinates within the picture, and to jump to the next point by adding constant increments to the coordinates.

More precisely, on the following diagram, the algorithm is represented for a screen of 6x5 pixels:

We start at point A and draw the color of A at (0,0) on the screen. Then, for the second point we jump to the next column from A to B and draw the color of B on the screen at (1,0). We do the same on the first row until we reach C, in turn drawn at (4,0) on the screen. Now to go to the next row, we come back to A and jump to D, and so on.

Please note that we have 3 co-existing coordinate systems in the algorithm: the screen, the source picture and the "virtual" picture we want to draw. To be clear in this tutorial we will name the coordinates:

- "column" and "row" in the virtual picture, so AB is the column increment and AD is the row increment
- Uppercase X and Y for the coordinates within the source picture

For commonality with the existing screen coordinate system, the other coordinate systems have the same orientation, i.e. origin at the top-left, X axis pointing to the right and Y axis pointing to the bottom.

The origin, column and row increments have two X and Y components in the 2D source picture, that we will name as following:

`originX`

and`originY`

for the origin, point A on the diagram`columnIncrementX`

and`columnIncrementY`

for the columns, in the diagram this vector is approximately equal to*(3.2,1.2)*`rowIncrementX`

and`rowIncrementY`

for the columns, in the diagram this vector is approximately equal to*(-1.0,2.1)*

Last thing to note: the increments have fractional values, not integer values. This is necessary to keep a good precision while drawing. In the next section, these increments will be encoded using the `float`

type.

**Implementation with float variables**

Let's implement the algorithm. First define constant values for the origin and increments in the `loop()`

function, just before the line with `startTime = millis()`

. The constant values are given here as an example to display a portion of the source picture:

// Coordinates of the first point to draw float originX=10; float originY=10; // Column increment float columnIncrementX = 3.2; float columnIncrementY = 1.2; // Row increment float rowIncrementX = -1.0; float rowIncrementY = 2.1;

Then **replace** the line `gb.display.drawImage(0, 0, Picture);`

by the following lines to implement the complete algorithm with floats. The variables `startX`

and `startY`

contain the coordinates of the first point of a row, `X`

and `Y`

the coordinates of the current point within the source picture and `px`

and `py`

the coordinates of the current point on the screen:

// Inits the counters and performs loop float startX=originX, startY=originY, X, Y; for(int py=0; py<64; py++) { X=startX; Y=startY; Color background = gb.createColor(0, 0, py*4); for(int px=0; px<80; px++) { int intx = (int)X; int inty = (int)Y; if(intx<0 || intx>255 || inty<0 || inty>255) gb.display.drawPixel(px, py, background); else gb.display.drawPixel(px, py, Picture.getPixelColor(intx, inty));`X+=columnIncrementX; Y+=columnIncrementY; } startX+=rowIncrementX; startY+=rowIncrementY; }</pre>`

First, as you can see, we simply use the functions

`drawPixel`

and`getPixelColor`

to transfer the color of the pixel from the source picture to the screen (more on that in the next part).Second, before getting the pixel color from the source picture, the float coordinates are converted to integers using a cast

`(int)X`

, and the resulting position is then checked to ensure that it is in the source picture. If not, a simple shade is displayed as background color. As a remark, if we do not check the position, then at some point a fatal error is raised because the`getPixelColor`

function tries to pick a pixel in some random memory area that is not accessible to the processor.Time to execute and have a look at the generated picture, which should look like this:

## Part 3: Animation and first optimizations

At this point we are able to draw the picture according to constant increments. In this part we will add animation and start to optimize.

AnimationWell, now we can draw the picture, it is not very quick but it works (trust me, it will get faster in the next parts). We know that we can modify the origin and the increments vectors to change the viewpoint of the observer, but how?

First of all, we want to center the draw on the center of the source picture. The vectors are inter-dependent so we will need some basic equations (maths are useful after all!).

More precisely, when we draw the pixel at the center of the screen at (40,32), we should be at the position (128,128) in the source picture. According to the draw algorithm (look at the diagram), this point is reached on the row 32 (so after adding 32 times the row increment vector from the origin) and the column 40 (so after adding 40 times the column increment vector from the origin).

In the form of equations projected on the X and Y axes, it looks like:

*originX + 40*columnIncrementX + 32*rowIncrementX = 128**originY + 40*columnIncrementY + 32*rowIncrementY = 128*

The resolution to compute the origin from the increments is immediate:

*originX = 128 - 40*columnIncrementX - 32*rowIncrementX**originY = 128 - 40*columnIncrementY - 32*rowIncrementY*

Now we have the origin and need to compute the increments. We want a simple rotation, i.e. we have to define rotating increments vectors, and such that the column and row increment vectors are perpendicular.

Classically, the trigonometric functions sine and cosine are used for this purpose. These functions oscillate between *-1* and *1* with a period of *2*PI.*

*To draw a circle centered on the origin, if t is varying between 0 and 2*PI*, the successive points (x,y) on the circle of radius R are located at the coordinates:

*x = R*cos(t)**y = R*sin(t)*

The increments vectors will be computed exactly as a point on a circle (the vector from the center to the point on the circle is indeed rotating as *t* increases):

*columnIncrementX = R*cos(t)**columnIncrementY = R*sin(t)*

For the row increment vector, this is the same with an added angle of 90 degrees, i.e. *PI/2* in radians:

*rowIncrementX = R*cos(t + PI/2)**rowIncrementY = R*sin(t + PI/2)*

We will define a radius of 1 and use as variable varying with the time a new variable `angle`

incremented by *2*PI/100* at each frame, hence after 100 frames the cycle will be completed.

The basics are known, so let's go to the implementation.

First define the new **global** variable called `angle`

before the `loop()`

function:

// Increasing angle to set observer's position float angle=0.0;

The angle needs to be increased at each frame. Add these lines at the beginning of the `loop()`

function:

// Increments angle angle = angle + 2*3.1416/100;

**Re-write** the following lines such that the increments follow circles of radius 4 (the origin being computed as described above):

// Increments to go to next column (increment on x on drawn image) float columnIncrementX = 4*cos(angle); float columnIncrementY = 4*sin(angle); // Increments to go to next row (increment on y on drawn image) float rowIncrementX = 4*cos(angle+3.1416/2); float rowIncrementY = 4*sin(angle+3.1416/2); // Coordinates of the first point to draw float originX=128-40*columnIncrementX-32*rowIncrementX; float originY=128-40*columnIncrementY-32*rowIncrementY;

With these settings, the animation should look like this (by the way for the animated gifs of this tutorial, ezgif.com was used to scale the images 3 times and adapt the animation speed to be near what can be seen on the Gamebuino):

Finally, to make the animation a little more interesting, let's add an oscillating zoom and a 3D-like effect. We will define a new `zoom`

variable that oscillates with a frequency different than the one for the rotation (here we just use a sine function with the `angle`

divided by an arbitrary value). The variable `zoom`

is in turn used as a factor on the length of the row and column increments, such that it gives an effect of zoom in/out oscillation. In addition, we reduce the radius of the column increment to give a 3D isometric effect.

For the implementation, add the `zoom`

definition and **re-write** the lines defining the increments:

// Computes zoom level float zoom = 2.2 + 2*sin(angle/1.618);// Increments to go to next column (increment on x on drawn image) float columnIncrementX=zoom

2cos(angle); float columnIncrementY=zoom2sin(angle); // Increments to go to next row (increment on y on drawn image) float rowIncrementX=zoom3cos(angle+3.1416/2); float rowIncrementY=zoom3sin(angle+3.1416/2);

Do not hesitate to experiment a lot with these settings! You can change the factors or the functions to create different effects.

Now the animation should look like this:

Well, not bad after all, but still not very fast to draw. Optimizations will be introduced in the next section.

__Optimization of getPixelColor__

The idea of the first optimization is to replace the `getPixelColor`

function by a direct access to the memory. We already check that the coordinates are in the range *[0..255]*, so it is safe to pick the pixel color directly.

The data of the source picture is stored in the flash memory as a constant array `PictureData`

. It starts with 6 `uint16_t`

values that we do not need, after this header it is a one-dimensional array. A pixel at position *x,y* is stored at the offset *6+y+x*256* (the multiplication by 256 can be written as well as a bit shift of 8 bits).

So **re-write** the following line in the draw algorithm to replace the `getPixelColor`

function:

gb.display.drawPixel(px, py, (Color)PictureData[6+intx+(inty<<8)]);

The cast to `(Color)`

is necessary as a formal type conversion, there is no data conversion.

Once optimized, the result is not significantly quicker, let's implement another optimization.

__Optimization of drawPixel__

Here again, we want to replace a function by a direct access to the memory. Looking at the header file of Image, it appears that the picture data is contained in a public member of the `Image `

class called `_buffer`

.

This member has the type* uint16_*

`t*`

i.e. it is a pointer on an array of `Color`

pixels. Unlike the static `Image`

object, this array of pixels has no header (dimensions of the picture are placed in actual members of the class). The modification is then straightforward.**Re-write** the following lines within the draw algorithm to replace the two occurrences of `drawPixel`

:

if(intx<0 || intx>255 || inty<0 || inty>255) gb.display._buffer[p++]=(uint16_t)background; else gb.display._buffer[p++] = PictureData[6+intx+(inty<<8)];

Please note that this optimization is clearly to be put in the "**danger zone**". As far as I can see, the `_buffer`

member is not officially documented and may change without notice, leading to potential incompatibilities in future versions of the Gamebuino API (nevertheless a compiled .bin would work further). This is the price to pay for performance.

Once optimized, the result is a bit quicker:

In this part, we will change small parts of the draw algorithm to increase dramatically the speed.

__Simple 32-bits fixed-point arithmetic without bit shift__

The next optimization is more tricky. It relies on fixed-point arithmetic, which is a general-purpose technique to perform computations with fractional numbers, but without using the floating-point types `float`

or `double`

. The same technique is used, probably among many other creations, in Fractalino and in Raycaster.

Understanding the bits and bytes of this technique is not completely mandatory here, and this section much deeper than necessary for the optimization, __so you can directly go to the implementation in the next section if you wish__.

With fixed-point arithmetic, all fractional numbers are multiplied by a common factor and are stored as integers.

For example, in base 10, let's say that we want to add *a=1.3* and *b=2.6* taking *10* as our common factor. First convert *a* and *b* by multiplying them with the common factor: *A = a * 10 = 13 and the same for B = 26*. Once converted in fixed-point representation, we just add *A* and *B as **C = A + B = 13 + 26 = 39*. Finally, we convert the result back as *c = C/10 = 3.9*. Here it is easy to see that additions or subtractions of numbers multiplied by a same factor will lead to correct results, we apply just the distributive property.

In base 2 (binary), powers of 2 are used as common factors such that multiplications and divisions can be done as bit shifts. In our examples, we will not use bit shifts, preferring carefully selected access to bytes or words, which are equivalent to multiplying/dividing by factors of 256.

Namely, the processor of the Gamebuino can process 32-bits signed or unsigned integers, and it is always possible to access parts of the 32-bits word, either in two 16-bits words or in 4 bytes.

As an example, if we encode the value *1234.5678* on 32 bits with 16 bits after the dot, the factor is *2^16 = 65536 = 256*256.*

Then *1234.5678 * 65536 = 80908635.3408*. The last digits after the dot (*.3408*) will not be encoded, so we look at the value *80908635=*`0x04D2915B`

.

At this point it is very important to mention that the processor of the Gamebuino uses a little endian encoding. It means that our value is represented in memory in reverse order i.e. as the sequence `{ 0x5B, 0x91, 0xD2, 0x04 }`

.

The different views of the 32 bits in memory are represented below (unsigned values only):

----- ----- ----- ----- Hexadecimal | 5B | 91 | D2 | 04 | ----- ----- ----- ----- 4 bytes | 091 | 145 | 210 | 004 | ----------- ----------- 2 x 16 bits | 37211 | 1234 | ----------------------- 1 x 32 bits | 80908635 | -----------------------

Logically, we find the value 1234 (integer part of the value *1234.5678* we encoded) as the second 16-bits integer, and the last byte (here 004) is not zero because 1234 cannot be encoded as one byte (greater than 255). This will be useful to determine if a coordinate is still in the range *[0..255]* i.e. within the source picture.

The same kind of decomposition can be done for a __negative__ value. Let's look at a new example with the negative value *-123.456 i.e.* *-123.456 * 65536 = -8090812.416*. We round the value giving *-8090812 *= `0xFF848B44`

encoded as signed integer.

To give a complete overview (we won't look at all numbers!), the different interpretations of the 32 bits according to different data types, signed or not, are depicted below for `0xFF848B44`

encoded in little-endian:

------ ------ ------ ------ Hexadecimal | 44 | 8B | 84 | FF | ------ ------ ------ ------ as uint8_t | 068 | 139 | 132 | 255 | ------ ------ ------ ------ as int8_t | -188 | 117 | -124 | -001 | ------------- ------------- as uint16_t | 35652 | 65412 | ------------- ------------- as int16_t | -29884 | -124 | --------------------------- as int32_t | -8090812 | ---------------------------

The interpretation is complicated in this case. The main thing we need to check is the second `int16_t `

equal to -124. We encoded *-123.456*, so we could expect -123 here. This difference of 1 is strange at first sight but actually logical.

To try to understand, let's encode *0.0* and decrease it by *0.2* to obtain the encoded version of *-0.2, i.e.* *A *= `0x00000000`

, *0.2 * 65536 = 3276.8* so after rounding *B *= *32769 *= `0x00008001`

, then *A - B = -32769 = *`0xFFFF7FFF`

. We see that to encode *-0.2*, the most-significant byte of the integer part is equal to `0xFF `

= *-1* (second byte here, would be the third byte in little-endian), not zero although we encode *-0.2*. Intuitively it cannot be *0* because it would mean that *+0.1* and *-0.1* would be encoded with the same most-significant byte.

Actually it can be interpreted as following: when a value *W* is separated into its 2 parts *W = M*256 + L*, the most-significant part *M* is the next lower multiple (of 256), and the least-significant part *L* is added to it.

It is obvious with positive values, e.g. for *260* the next lower multiple is * 1*256* (so

`0x01`

) and the value `0x04`

, `0x0104`

).This is the same with negative values, e.g. for *-260*, the next lower multiple is *( -2)*256=*

`0xFE`

) and the value `0xFC`

) is added to it (so `0xFEFC`

). Strange but it works. To come back to our first example, the value `0xFF84`

=*(-1)*256+132=-124*. To re-construct the number correctly, we have to interpret the most-significant byte as signed and the least-significant ones as unsigned. The same to reconstruct the fractional number, we interpret the integer part as signed (most-significant) and the fractional part as unsigned: `0xFF848B44`

*/65536 = -124 + 35652/65536 = -123.45599* (precision was lost at the beginning while rounding).

Fixed-point arithmetic work as well for multiplication and division, but it is another story. The main limitation of the technique is of course that it will completely fail if the result of some operations exceeds the maximum value (here *-32768* or *+32767*), even temporarily.

__Implementation of the optimization with fixed-point numbers__

After the long deep-dive into fixed-point arithmetic of the previous section, let's summarize and remember that for fixed-point numbers on 32 bits, 16 bits after the dot:

- We define the factor
*F = 256*256 = 65536*(it corresponds to a shift of 16 bits due to the 16 bits after the dot) - A negative or positive float
*n*is encoded as fixed-point using`N=(int32_t)round(n*F)`

- Numbers encoded as fixed-point can be added or subtracted just like large integers, resulting in encoded numbers (actually all operations are possible with more complex formulas)
- An encoded number is decoded back to a float with
`n=(float)N/(float)F`

- To get the fractional or integer parts directly from an encoded number, thanks to our choice of 16 bits after the dot, we can just access the data differently, i.e. as 16-bits integer or bytes, instead of performing conversions or bit shifts
- When accessing different parts of the 32 bits, it is important to interpret the most-significant part as signed integer, and the least-significant part as unsigned integer, due to the encoding of negative values
- Data is encoded in "little-endian" on the Gamebuino, i.e. the number
`0x01234567`

is represented in memory as`{ 0x67, 0x45, 0x23, 0x01 }`

In C and C++, the data structure `union`

can be used to access the same memory location with different types.

Insert the following global type definition, which gives a full access as different types (not all types will be used):

union FixedPoint { int32_t asInt32; int16_t asInt16[2]; uint16_t asUInt16[2]; int8_t asInt8[4]; uint8_t asUInt8[4]; };

Then, **re-write** the complete draw code as following, first with the definition of the counters and then the conversion from floats:

// Defines the increments as FixedPoint types and convert values from floats FixedPoint columnIncrementFPX, columnIncrementFPY, rowIncrementFPX, rowIncrementFPY; columnIncrementFPX.asInt32 = (int32_t)round(columnIncrementX*65536.0); columnIncrementFPY.asInt32 = (int32_t)round(columnIncrementY*65536.0); rowIncrementFPX.asInt32 = (int32_t)round(rowIncrementX*65536.0); rowIncrementFPY.asInt32 = (int32_t)round(rowIncrementY*65536.0); // Inits the counters as FixedPoint types and converts values from floats FixedPoint startX, startY, X, Y; startX.asInt32 = (int32_t)round(originX*65536.0); startY.asInt32 = (int32_t)round(originY*65536.0);

Second, re-write the main loop as following:

// Performs loop int p=0; for(int py=0; py<64; py++) { X.asInt32 = startX.asInt32; Y.asInt32 = startY.asInt32; Color background = gb.createColor(0,0,py*4); for(int px=0; px<80; px++) { int intx = X.asInt16[1]; int inty = Y.asInt16[1]; if(intx<0 || intx>255 || inty<0 || inty>255) gb.display._buffer[p++] = (uint16_t)background; else gb.display._buffer[p++] = PictureData[6+intx+(inty<<8)];`X.asInt32 += columnIncrementFPX.asInt32; Y.asInt32 += columnIncrementFPY.asInt32; } startX.asInt32 += rowIncrementFPX.asInt32; startY.asInt32 += rowIncrementFPY.asInt32; }</pre>`

Note that:

- The coordinates within the picture,
`intx`

and`inty`

, are directly extracted from the integer parts (signed) of the current coordinates`X`

and`Y`

. - The addition of the increments are performed classically on 32 bits, adding the integer and fractional parts of the encoded numbers in one operation

The speed increase is astonishing:

__Infinite Picture__

A variant of the previous optimization can be done by changing the lines that define `intx`

and `inty`

, check the coordinates and write the pixels. Here we want to use a direct access to the data and simplify the current position check.

**Re-write** these lines as following:

if(X.asInt8[3]!=0 || Y.asInt8[3]!=0) gb.display._buffer[p++] = (uint16_t)background; else gb.display._buffer[p++] = PictureData[6+X.asUInt8[2]+(Y.asUInt8[2]<<8)];

Worth mentioning that we use directly the bytes of the integer part of the current position:

- There is no comparison any more of
`X`

and`Y`

, we just use the most-significant byte and check if it equals zero:`if(X.asInt8[3]!=0 || Y.asInt8[3]!=0)`

. This is only possible due to the choice of a picture of 256x256 pixels, if the most-significant byte equals zero the value if automatically positive and less than 256. - We use directly and only the least-significant byte, obviously in range
*[0..255]*, to compute the pixel position within the source picture.

The performance increase of this optimization is marginal, but it paves the way for another effect: as we now use directly an unsigned byte for each coordinate within the source picture, we are sure to be inside the picture so it is safe to remove the test. By doing so, due to the nature of the computation, it is like drawing the picture with coordinates modulo 256.

Comment out these lines as following:

//if(X.asInt8[3]!=0 || Y.asInt8[3]!=0) //gb.display._buffer[p++] = (uint16_t)background; //else

At the end, the result should look like an infinite tiling of the source picture:

I hope you enjoyed this tutorial, learned the basics of displaying pixels after a simple transformation, and how to optimize such a draw technique. The result is very fast and could be used out-of-the-box for a game, as long as not too much additional data is necessary (the picture itself uses half of the available flash memory)!.

The strict affine transformation is completed. Possible next steps now would be to:

- Apply the same technique on a tile-map instead of a bitmap such that more flash memory remains available. With carefully chosen tile size and map size, bit-shits can be used to build the location of the source pixel in a tile.
- Switch to high-definition using the high-definition display method by aoneill. At the end we are at about 3-4 milliseconds to draw. Optimistically a draw of 4 times the number of pixels in less than 20 milliseconds seems reachable.
- Add a variant of
`drawImage`

with integrated rotation and scale for any picture - Use 3D projection formulas and modify the increments at each drawn line to mimic the Mode 7 of the SNES. Actually, if you download and install the .bin file, there is a "hidden level" with 3D projection and animation (hopefully for a future tutorial).

That's all folks!