
Re: Display functions optimization

Posted: Sun Jan 18, 2015 8:54 pm
by jonnection
I want you all to be aware that there is another limiting factor that needs to be taken into consideration before any big optimization work is done, especially optimizations that will affect everything from bitmap bit order to screen rotation etc.

That factor is the response time of a twisted-nematic liquid crystal display (= the Nokia 5110 LCD), which according to a Fujitsu specsheet I have (all TN displays are similar in this respect) is around 60 ms. http://www.fujitsu.com/downloads/MICRO/fma/pdf/LCD_Backgrounder.pdf

A 60 ms response time corresponds to 1/0.06 s ≈ 16.7 Hz, or approximately 17 frames per second.

I have seen this also in practise: in "Isle of maniax" I am drawing white houses on top of the black horizon. The end result is that the black horizon is "showing through" the buildings. This is because the LCD response time (60 ms) means that the black pixels do not have time to "turn off" before they are supposed to be white. The result is a blurry mess and it doesn't look nice. You will not see this effect on the Simbuino emulator. Only on the real hardware.

What I am basically getting at here is that at 350 ns per call (a 16x16 drawbitmap with Myndale's optimized putpixel routine), you can draw over the Gamebuino screen (84x48 pixels) so many times that the speed of the routine does not have any practical meaning.

screen total pixels=84*48=4032
testbmp pixels = 16*16=256
paint whole screen once = pixels/bmppixels=15.75
to paint screen with 16x16 bitmaps takes = paintonce*350ns=0.00000551sec
fps limited by 16x16 bmp painting routine = 1s/topaintscreentakes=181405.89569161 fps

So, even with a 350 ns routine, you could achieve a theoretical 181,000 frames per second. Clearly, your LCD is not going to keep up with that. At this point (really, I am not kidding) whether you have a 350 ns or a 150 ns drawing routine doesn't make any difference. You won't be able to use that speed in a meaningful way.

These calculations all assume that Myndale's timing measurement in his demo is correct, which I think it is.

Re: Display functions optimization

Posted: Sun Jan 18, 2015 9:35 pm
by rodot
jonnection wrote: So, even with a 350 ns routine, you could achieve a theoretical 181,000 frames per second. Clearly, your LCD is not going to keep up with that. At this point (really, I am not kidding) whether you have a 350 ns or a 150 ns drawing routine doesn't make any difference. You won't be able to use that speed in a meaningful way.


Being able to draw several layers of parallax with different colors (black, white, gray) would require going over the screen several times per frame, although I agree that 150 or 350 ns doesn't really make any difference at this level (but it is still a very interesting topic about optimization). I would add that having "optimized" routines that don't use "swizzling" would be a good thing for the portability of the library and bitmaps to other screens... just in case ;)

Re: Display functions optimization

Posted: Sun Jan 18, 2015 11:48 pm
by Myndale
jonnection wrote: to paint screen with 16x16 bitmaps takes = paintonce*350ns=0.00000551sec

jonnection wrote: These calculations all assume that Myndale's timing measurement in his demo is correct, which I think it is.


Ah crap, I've been saying nanosecond when I meant microsecond. Sorry guys, my bad. The relative speed-up is the same, I've just been using the wrong nomenclature.

The correct time to paint a full screen is thus 0.0055125 seconds, which at 20 fps (a 50 ms frame) would net you about 9x overdraw per frame, not including library overhead. That's potentially going to have quite a significant impact on titles that could use those clock cycles for other things like physics or AI.

Re: Display functions optimization

Posted: Mon Jan 19, 2015 7:03 am
by rodot
Just a small bump about the Bitmap class I suggested on the previous page... is nobody interested in these more versatile Bitmap and AnimatedBitmap classes?

Re: Display functions optimization

Posted: Mon Jan 19, 2015 12:39 pm
by Myndale
I really like the idea in principle but I think the implementation shown is going to cause a lot of problems memory-wise. As far as I can tell every bitmap instance of the class shown is going to take 10 bytes, 10 bitmaps = 100 bytes...that in an environment where there's realistically only a few hundred bytes spare to begin with.

If it's the OOP and flexibility of adding extensions that you find appealing then it's entirely possible to store the bitmap class in PROGMEM as well. The down-side is users would have to access members via get() accessors, those accessors would require an extra PROGMEM read and of course the bitmaps themselves would still be read only. Still, if it solves your problems then something like this illustrates how it could work:

Code:
// flashy new bitmap class
struct Bitmap
{
  const uint8_t * raw;
  uint8_t extended_data;

  // accessors are const so they can be called on a const instance kept in PROGMEM
  inline uint8_t getWidth() const {return pgm_read_byte(getRawData());}
  inline uint8_t getHeight() const {return pgm_read_byte(getRawData()+1);}
  inline uint8_t getPixels() const {return pgm_read_byte(getRawData()+2);}   // first byte of pixel data
  inline const uint8_t * getRawData() const {return (const uint8_t *)pgm_read_word(&this->raw);}
  inline uint8_t getExtendedData() const {return pgm_read_byte(&this->extended_data);}
};

// raw bitmap data in current format, stored in PROGMEM (recent avr-gcc requires PROGMEM data to be const)
const uint8_t raw_bitmap[] PROGMEM = {8, 8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

// instance of new bitmap class, also stored entirely in PROGMEM and with extended data
const struct Bitmap bitmap PROGMEM = {raw_bitmap, 123};

void setup() {
  Serial.begin(9600);
  Serial.println(sizeof(bitmap));             // outputs -> 3 (2 bytes for raw ptr, 1 for extended data)
  Serial.println(bitmap.getWidth());          // outputs -> 8
  Serial.println(bitmap.getHeight());         // outputs -> 8
  Serial.println(bitmap.getExtendedData());   // outputs -> 123
}

void loop() {
}

Of course it's not entirely necessary to use the raw ptr the way I have here, I just did it like that to show that it could be done in a way that would maintain backward compatibility with the existing code, should you want to.

Re: Display functions optimization

Posted: Thu Jan 22, 2015 9:12 am
by cyberic
Great work everyone!

Is there an optimised drawBitmap function that we can use right now, keeping the same bitmap format?

Do you think there could be a special case when I want to 'draw' on the whole screen?
a kind of memcpy?

Thx

Re: Display functions optimization

Posted: Thu Jan 22, 2015 4:37 pm
by rodot
The second version Myndale posted is compatible with current bitmaps and is 6x faster, but it only supports the black color and doesn't check the screen borders (so you can inadvertently write outside the screen buffer, which will lead to unstable behavior).
If someone adds border and color checks I will replace the library's default drawBitmap with it :)
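One cheap way to get a border check without touching the fast inner loop is to test the bitmap's bounding box once per call. The following is only a sketch under stated assumptions: `fitsOnScreen` is a hypothetical helper (not part of the Gamebuino library), the screen size is the 84x48 quoted earlier in this thread, and the width is rounded up to a multiple of 8 on the assumption that the fast routine may address whole bytes of 8 pixels.

```cpp
#include <cstdint>

// Screen size of the Nokia 5110 LCD as given in this thread (84x48 pixels).
constexpr int16_t kScreenWidth = 84;
constexpr int16_t kScreenHeight = 48;

// Hypothetical helper: returns true when a w x h bitmap placed at (x, y)
// lies entirely on screen, so a routine without per-pixel clipping cannot
// index outside the screen buffer.
bool fitsOnScreen(int16_t x, int16_t y, uint8_t w, uint8_t h) {
  // Assumption: the fast routine may address whole bytes (8 columns each),
  // so the check conservatively uses the width rounded up to a multiple of 8.
  const int16_t paddedWidth = ((w + 7) / 8) * 8;
  return x >= 0 && y >= 0 &&
         x + paddedWidth <= kScreenWidth &&
         y + h <= kScreenHeight;
}
```

A caller could use the fast routine when `fitsOnScreen` returns true and fall back to the existing `gb.display.drawBitmap` otherwise, so only bitmaps that overlap an edge pay for the slow path.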

Re: Display functions optimization

Posted: Fri Jan 23, 2015 12:01 am
by Myndale
Try this one instead, it's 230uS and also supports WHITE and INVERT. I've also added another version that's 170 bytes smaller but ~20uS slower. I was writing this with the intention of converting it to assembly but a quick check of the LST file reveals the compiler is pretty much doing what I was going to do anyway.

At some point I'll do another version to handle all the other cases i.e. clipping, flipped etc. It won't be 230uS but should still be much faster than the existing function.

Code:
#include <SPI.h>
#include <Gamebuino.h>
Gamebuino gb;

const byte sprite[] PROGMEM = {
  16, 16, 0x1f,0xf8,0x1f,0xf8,0x1f,0xfc,0x1f,0xff,0x1f,0xff,0xf,0xff,0xf,0xff,0x7,0xff,0x87,0xff,0x3,0xff,0x1,0xff,0x0,0x7f,0x2,0x1f,0x0,0x0,0x0,0x0,0x40,0x0,};

void setup(){
  gb.begin();
}

void loop(){
  long start, finish;
  int drawTime;

  if(gb.update()){
    const int count = 101;  // this has to be odd so we can see INVERT

    start = millis();
    for (int i=0; i<count; i++)
      gb.display.drawBitmap(1, 31, sprite);
    finish = millis();
    drawTime = 1000L*(finish-start)/count;   // milliseconds for count draws -> microseconds per draw
    gb.display.print(F("drawBitmap: "));
    gb.display.print(drawTime);
    gb.display.println(F("us"));
     
    start = millis();
    for (int i=0; i<count; i++)
      drawBitmapUnrolled(17, 31, sprite, BLACK);
    finish = millis();
    drawTime = 1000L*(finish-start)/count;
    gb.display.print(F("unrolled: "));
    gb.display.print(drawTime);
    gb.display.println(F("us"));
     
    start = millis();
    for (int i=0; i<count; i++)
      drawBitmapUnrolled2(33, 31, sprite, BLACK);
    finish = millis();
    drawTime = 1000L*(finish-start)/count;
    gb.display.print(F("unrolled2: "));
    gb.display.print(drawTime);
    gb.display.println(F("us"));
  }
}

/* 782 bytes, 230uS */
void drawBitmapUnrolled(int8_t x, int8_t y, const uint8_t *bitmap, const uint8_t color) {
  int8_t h = pgm_read_byte(bitmap + 1);
  const int8_t byteWidth = (pgm_read_byte(bitmap) + 7) >> 3;
  bitmap += 2;   
  uint8_t * screen_line = gb.display.getBuffer() + (y / 8) * LCDWIDTH_NOROT + x;
 
  if (color == BLACK)
  {
    uint8_t mask = _BV(y & 7);
    while (h--)
    {
      uint8_t * ptr = screen_line;
      uint8_t i = byteWidth;
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] |= mask;
        if (pixels & 0x40) ptr[1] |= mask;
        if (pixels & 0x20) ptr[2] |= mask;
        if (pixels & 0x10) ptr[3] |= mask;
        if (pixels & 0x08) ptr[4] |= mask;
        if (pixels & 0x04) ptr[5] |= mask;
        if (pixels & 0x02) ptr[6] |= mask;
        if (pixels & 0x01) ptr[7] |= mask;
        ptr += 8;
      }
      y++;
      if (!(y & 7))
        screen_line += LCDWIDTH_NOROT;
      mask = (mask & 0x80) ? 1 : (mask<<1);
    }
  }
 
  else if (color == WHITE)
  {
    uint8_t mask = ~_BV(y & 7);
    while (h--)
    {
      uint8_t * ptr = screen_line;
      uint8_t i = byteWidth;
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] &= mask;
        if (pixels & 0x40) ptr[1] &= mask;
        if (pixels & 0x20) ptr[2] &= mask;
        if (pixels & 0x10) ptr[3] &= mask;
        if (pixels & 0x08) ptr[4] &= mask;
        if (pixels & 0x04) ptr[5] &= mask;
        if (pixels & 0x02) ptr[6] &= mask;
        if (pixels & 0x01) ptr[7] &= mask;
        ptr += 8;
      }
      y++;
      if (!(y & 7))
        screen_line += LCDWIDTH_NOROT;
      mask = (mask & 0x80) ? (mask<<1)+1 : 0xfe;
    }
  }
 
  else  // invert
  {
    uint8_t mask = _BV(y & 7);
    while (h--)
    {
      uint8_t * ptr = screen_line;
      uint8_t i = byteWidth;
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] ^= mask;
        if (pixels & 0x40) ptr[1] ^= mask;
        if (pixels & 0x20) ptr[2] ^= mask;
        if (pixels & 0x10) ptr[3] ^= mask;
        if (pixels & 0x08) ptr[4] ^= mask;
        if (pixels & 0x04) ptr[5] ^= mask;
        if (pixels & 0x02) ptr[6] ^= mask;
        if (pixels & 0x01) ptr[7] ^= mask;
        ptr += 8;
      }
      y++;
      if (!(y & 7))
        screen_line += LCDWIDTH_NOROT;
      mask = (mask & 0x80) ? 1 : (mask<<1);
    }
  }
 
}

/* 612 bytes, 250uS */
void drawBitmapUnrolled2(int8_t x, int8_t y, const uint8_t *bitmap, const uint8_t color) {
  int8_t h = pgm_read_byte(bitmap + 1);
  const int8_t byteWidth = (pgm_read_byte(bitmap) + 7) >> 3;
  bitmap += 2;   
  uint8_t * screen_line = gb.display.getBuffer() + (y / 8) * LCDWIDTH_NOROT + x;
  uint8_t mask = _BV(y & 7);
  if (color == WHITE)
    mask = ~mask;
  while (h--)
  {
    uint8_t * ptr = screen_line;
    uint8_t i = byteWidth;
    if (color == BLACK)
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] |= mask;
        if (pixels & 0x40) ptr[1] |= mask;
        if (pixels & 0x20) ptr[2] |= mask;
        if (pixels & 0x10) ptr[3] |= mask;
        if (pixels & 0x08) ptr[4] |= mask;
        if (pixels & 0x04) ptr[5] |= mask;
        if (pixels & 0x02) ptr[6] |= mask;
        if (pixels & 0x01) ptr[7] |= mask;
        ptr += 8;
      }
    else if (color == WHITE)
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] &= mask;
        if (pixels & 0x40) ptr[1] &= mask;
        if (pixels & 0x20) ptr[2] &= mask;
        if (pixels & 0x10) ptr[3] &= mask;
        if (pixels & 0x08) ptr[4] &= mask;
        if (pixels & 0x04) ptr[5] &= mask;
        if (pixels & 0x02) ptr[6] &= mask;
        if (pixels & 0x01) ptr[7] &= mask;
        ptr += 8;
      }
    else // invert
      while (i--)
      {
        const uint8_t pixels = pgm_read_byte(bitmap++);
        if (pixels & 0x80) ptr[0] ^= mask;
        if (pixels & 0x40) ptr[1] ^= mask;
        if (pixels & 0x20) ptr[2] ^= mask;
        if (pixels & 0x10) ptr[3] ^= mask;
        if (pixels & 0x08) ptr[4] ^= mask;
        if (pixels & 0x04) ptr[5] ^= mask;
        if (pixels & 0x02) ptr[6] ^= mask;
        if (pixels & 0x01) ptr[7] ^= mask;
        ptr += 8;
      }
    y++;
    if (!(y & 7))
      screen_line += LCDWIDTH_NOROT;
    if (color == WHITE)
      mask = (mask & 0x80) ? (mask<<1)+1 : 0xfe;
    else
      mask = (mask & 0x80) ? 1 : (mask<<1);
  }
}
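
For anyone who wants to experiment with clipping off-device, the buffer addressing both routines rely on can be sketched in plain host-side C++. This is not the library's code: `plotClipped`, the `Color` enum and its values are hypothetical, and the sketch assumes `LCDWIDTH_NOROT` is 84 with one bit per pixel (8 vertical pixels per byte), as the index `(y / 8) * LCDWIDTH_NOROT + x` and the mask `_BV(y & 7)` above suggest.

```cpp
#include <cstdint>

constexpr int kWidth = 84;                          // assumed LCDWIDTH_NOROT
constexpr int kHeight = 48;                         // LCD height in pixels
constexpr int kBufferSize = kWidth * kHeight / 8;   // 504 bytes, 1 bit per pixel

// Hypothetical color codes for this sketch only.
enum Color : uint8_t { kWhite = 0, kBlack = 1, kInvert = 2 };

// Applies one pixel to a buffer laid out as byte = (y / 8) * width + x,
// bit = y & 7. Returns false and writes nothing when (x, y) is off screen.
bool plotClipped(uint8_t* buffer, int x, int y, Color color) {
  if (x < 0 || x >= kWidth || y < 0 || y >= kHeight) return false;
  uint8_t& cell = buffer[(y / 8) * kWidth + x];
  const uint8_t mask = static_cast<uint8_t>(1u << (y & 7));
  switch (color) {
    case kBlack:  cell |= mask; break;                        // set the bit
    case kWhite:  cell &= static_cast<uint8_t>(~mask); break; // clear the bit
    case kInvert: cell ^= mask; break;                        // toggle the bit
  }
  return true;
}
```

A per-pixel check like this is far slower than the unrolled loops, so it is best read as a reference for testing a clipped version against, not as a replacement.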

Re: Display functions optimization

Posted: Fri Jan 23, 2015 10:41 am
by cyberic
Thx Myndale!

Re: Display functions optimization

Posted: Sat Jan 24, 2015 8:38 am
by rodot
Thanks Myndale :)
Would checking that the pixels are drawn inside the screen significantly affect performance? Because the case where your bitmap overlaps the edge of the screen is pretty common, in my opinion it's a must-have feature... It's a shame, but I didn't manage to implement it properly with your optimized version. I'm sure it would be a matter of minutes for you to implement it :P