Sat Aug 08, 2015 12:15 am
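// Baseline: OR an 8x8 sprite into the frame buffer (84 bytes per 8-pixel row).
void ultraDraw(byte data[], char x, char y){
    uint8_t* buf = gb.display.getBuffer();
    for(char i = 0; i < 8; i++){
        if (y>=0) // guards only the upper write; the lower write always runs
            *(buf + i+x + (y/8) * 84) |= data[i] << (y%8);
        *(buf + i+x + ((y+8)/8) * 84) |= data[i] >> (8-y%8);
    }
}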
// Same as above, with the divisions by 8 written as shifts.
void ultraDraw(byte data[], char x, char y){
    uint8_t* buf = gb.display.getBuffer();
    for(char i = 0; i < 8; i++){
        if (y>=0)
            *((y>>3) * 84 + buf + i+x ) |= data[i] << (y%8);
        *(((y+8)>>3) * 84 + buf + i+x ) |= data[i] >> (8-y%8);
    }
}
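// Row offset via mask and shift: ((y&0xF8)>>1) * 21 equals (y>>3) * 84 for y >= 0.
// x is folded into the pointer, which is advanced once per column.
void ultraDraw2(byte data[], char x, char y){
    uint8_t* buf = gb.display.getBuffer() + x;
    for(char i = 0; i < 8; i++){
        if (y>=0)
            *((((y)&0xF8)>>1) * 21 + buf) |= data[i] << (y%8);
        *((((y + 8)&0xF8)>>1) * 21 + buf) |= data[i] >> (8-y%8);
        buf++;
    }
}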
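The mask-and-shift row offset used from ultraDraw2 onwards can be checked off-device. This is only a minimal sketch for a desktop compiler: the helper names (rowOffsetDiv, rowOffsetMask, offsetsAgree) are made up for illustration, and it only covers on-screen y (0 to 47), not negative y.

```cpp
#include <cassert>
#include <cstdint>

// Row offset as written in the first ultraDraw: one 84-byte row per 8 pixels of y.
static int rowOffsetDiv(int y) {
    return (y / 8) * 84;
}

// Row offset as written in ultraDraw2 and later: mask, shift, multiply by 21.
// (y & 0xF8) >> 1 is (y >> 3) * 4, and 4 * 21 = 84.
static int rowOffsetMask(int y) {
    assert(y >= 0);  // the identity is only checked here for non-negative y
    return ((y & 0xF8) >> 1) * 21;
}

// Returns true if both forms agree for every on-screen y (0..47).
static bool offsetsAgree() {
    for (int y = 0; y < 48; ++y) {
        if (rowOffsetDiv(y) != rowOffsetMask(y)) return false;
    }
    return true;
}
```

For example, y = 17 lands in row 2, so both forms give 2 * 84 = 168.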
// Both row pointers are computed once, outside the loop.
void ultraDraw3(byte data[], char x, char y){
    uint8_t* buf = ((y&0xF8)>>1) * 21 + gb.display.getBuffer() + x;
    uint8_t* buf2 = buf + 84;
    for(char i = 0; i < 8;i++){
        if(y>=0){
            *(buf) |= data[i] << (y%8);
            buf++;
        }
        *(buf2) |= data[i] >> (8-y%8);
        buf2++;
    }
}
// Inline AVR assembly version: X and Y point at the two buffer rows, Z at the sprite data.
void ultraDraw4(byte data[], char x, char y){
    uint8_t* buf = ((y&0xF8)>>1) * 21 + gb.display.getBuffer() + x;
    asm volatile(
        "ldi R16,8\n\t"
        "cpi %[rotnumber],0\n\t"
        "breq LoopAligned\n"
        "LoopStart:\n\t"
        "ld R17,Z+\n\t"
        "eor R18,R18\n\t"
        "mov R19,%[rotnumber]\n\t"
        "LoopShift:\n\t" // carry is clear here: cpi against 0 clears it, each rol R18 shifts out a 0, and dec leaves carry untouched
        "rol R17\n\t"
        "rol R18\n\t"
        "dec R19\n\t"
        "brne LoopShift\n\t"
        "ld R19,X\n\t"
        "or R19,R17\n\t"
        "st X+,R19\n\t"
        "ld R19,Y\n\t"
        "or R19,R18\n\t"
        "st Y+,R19\n\t"
        "dec R16\n\t"
        "brne LoopStart\n\t"
        "rjmp End\n"
        "LoopAligned:\n\t"
        "ld R17,Z+\n\t"
        "ld R18,X\n\t"
        "or R18,R17\n\t"
        "st X+,R18\n\t"
        "dec R16\n\t"
        "brne LoopAligned\n"
        "End:\n"
        ::"x" (buf),"y" (buf + 84),"z" (data),[rotnumber] "r" (y%8):"r16","r17","r18","r19");
}
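To check that the pointer variant still draws the same pixels as the indexed one, the two can be compared on a desktop compiler. The following is a rough sketch, not the Gamebuino code itself: a plain array stands in for gb.display.getBuffer(), it is given one spare row so the lower write at y = 40 stays in bounds, only y >= 0 is covered (so the y >= 0 guards are left out), and the names drawIndexed, drawPointers and variantsMatch are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the 84x48 frame buffer (6 rows) plus one spare row (assumption).
static const int kWidth = 84;
static const int kBufSize = kWidth * 7;

// Indexed variant, following the first ultraDraw above.
static void drawIndexed(uint8_t* buf, const uint8_t data[8], int x, int y) {
    assert(y >= 0 && y <= 40);
    for (int i = 0; i < 8; i++) {
        buf[i + x + (y / 8) * kWidth] |= uint8_t(data[i] << (y % 8));
        buf[i + x + ((y + 8) / 8) * kWidth] |= uint8_t(data[i] >> (8 - y % 8));
    }
}

// Two-pointer variant, following ultraDraw3 above.
static void drawPointers(uint8_t* buf, const uint8_t data[8], int x, int y) {
    assert(y >= 0 && y <= 40);
    uint8_t* top = buf + ((y & 0xF8) >> 1) * 21 + x;
    uint8_t* bottom = top + kWidth;
    for (int i = 0; i < 8; i++) {
        *top++ |= uint8_t(data[i] << (y % 8));
        *bottom++ |= uint8_t(data[i] >> (8 - y % 8));
    }
}

// Returns true if both variants produce identical buffers at every tested position.
static bool variantsMatch(const uint8_t data[8]) {
    uint8_t a[kBufSize];
    uint8_t b[kBufSize];
    for (int y = 0; y <= 40; y++) {
        for (int x = 0; x <= kWidth - 8; x++) {
            std::memset(a, 0, sizeof a);
            std::memset(b, 0, sizeof b);
            drawIndexed(a, data, x, y);
            drawPointers(b, data, x, y);
            if (std::memcmp(a, b, sizeof a) != 0) return false;
        }
    }
    return true;
}
```

The same harness could be pointed at any other variant with a matching signature before timing it on the device.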
Sat Aug 08, 2015 12:38 am
Sat Aug 08, 2015 6:48 am
Drakker wrote: Great job!
Quick thing you might want to test. With the default library, the more black pixels there are in the sprite, the longer it takes to draw it (not really surprising). So you might want to test your timings with an 8x8 black square and other mixes of transparency/white and black. That will give you a more accurate window for the time required to draw a sprite using your improved drawing function.
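A test set along those lines could look roughly like this. It is only a sketch for a desktop compiler under a few assumptions: the sprite names, blackPixels, timeDraw and the stand-in drawOr are all made up for illustration, and on the device the std::chrono calls would be swapped for Arduino's micros() and drawOr for the function actually being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Test sprites with different fill levels (8 columns of 8 pixels each).
static const uint8_t kEmpty[8]   = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
static const uint8_t kChecker[8] = {0xAA, 0x55, 0xAA, 0x55, 0xAA, 0x55, 0xAA, 0x55};
static const uint8_t kBlack[8]   = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

// Counts the set (black) pixels in an 8x8 sprite.
static int blackPixels(const uint8_t data[8]) {
    int count = 0;
    for (int i = 0; i < 8; i++) {
        for (int bit = 0; bit < 8; bit++) {
            count += (data[i] >> bit) & 1;
        }
    }
    return count;
}

// Hypothetical draw signature operating on a plain buffer.
using DrawFn = void (*)(uint8_t* buf, const uint8_t data[8], int x, int y);

// Stand-in draw routine (two-pointer OR, y >= 0 only).
static void drawOr(uint8_t* buf, const uint8_t data[8], int x, int y) {
    uint8_t* top = buf + (y >> 3) * 84 + x;
    uint8_t* bottom = top + 84;
    for (int i = 0; i < 8; i++) {
        *top++ |= uint8_t(data[i] << (y & 7));
        *bottom++ |= uint8_t(data[i] >> (8 - (y & 7)));
    }
}

// Times `reps` calls of `draw` at varying positions; returns elapsed nanoseconds.
static long long timeDraw(DrawFn draw, const uint8_t data[8], int reps) {
    assert(reps > 0);
    static uint8_t buf[84 * 7];  // 6 screen rows plus one spare row
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        draw(buf, data, r % 77, r % 41);  // x in 0..76, y in 0..40
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Running timeDraw once per sprite gives a spread between the emptiest and the fullest case rather than a single number.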
Sat Aug 08, 2015 9:27 am
Sat Aug 08, 2015 10:41 am
Sat Aug 08, 2015 12:13 pm
Sat Aug 08, 2015 7:46 pm
Sat Aug 08, 2015 8:40 pm
jonnection wrote: @sorunome
The default optimization setting for avr-gcc in the Arduino IDE is -Os (optimize for size). You can find it in the compiler flags in platform.txt, if I remember it right. -O2 or -O3 will give bigger but faster code.
Edit, to clarify for everyone: this setting controls how C is compiled to asm. It will not affect your inline asm, and your asm will be quicker anyhow. In any case, the difference between the compiler output and inline asm should not be 2x. The compiler is doing a lousy job if that is the case.
Sun Aug 09, 2015 5:03 am
void __attribute__((optimize("O2"))) whateverfunction(unsigned char data) {
// Function code
}
Sun Aug 09, 2015 9:31 am
jonnection wrote: Yeah. I just thought I'd bring up the topic since optimization is what's being discussed here. At least it's useful to know that the Arduino IDE default setting is -Os (optimize for size).
You are 100% correct, -O2 will blow up the program size.
Setting optimization flags can also be done for a single function:
void __attribute__((optimize("O2"))) whateverfunction(unsigned char data) {
// Function code
}
The point is that the C compiler should be clever enough to put (edit: and keep!!!) variables in the same registers for the duration of the function, just as you have done with asm.
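As a filled-in version of that skeleton, here is a minimal sketch. The function orSprite and the OPT_O2 macro are made up for illustration; the attribute itself is GCC-specific, so the macro expands to nothing on other compilers. Whether the compiler really keeps p and shift in registers depends on the target and flags, so it is worth checking the generated asm rather than assuming it.

```cpp
#include <cassert>
#include <cstdint>

// GCC-only: request -O2 for a single function, leaving the rest of the build at -Os.
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_O2 __attribute__((optimize("O2")))
#else
#define OPT_O2
#endif

// ORs 8 shifted sprite columns into one buffer row.
// The pointer and the shift amount live in locals, which gives the compiler
// the chance to keep them in registers for the whole loop.
void OPT_O2 orSprite(uint8_t* row, const uint8_t data[8], uint8_t shift) {
    assert(shift < 8);
    uint8_t* p = row;
    for (uint8_t i = 0; i < 8; i++) {
        *p++ |= uint8_t(data[i] << shift);
    }
}
```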