Option to use much faster Hardware SPI instead of shiftOut #13

ericfont · 2021-02-23T13:34:41Z

Measuring on my 16 Mhz Trinket Pro with a MAX7219, a full 8-digit clear takes only 124 to 132 microseconds now with Hardware SPI, instead of 1656 to 1664 microseconds with shiftOut. That is over 13x speedup.

to initiate with hardware spi, use the same constructor but append an additional parameter of useHardwareSPI = true, otherwise that parameter will default to false. useHardwareSPI will ignores the pass dinPin and clkPin parameters, and instead set those internally to USE_HW_SPI which is defined as 255 (which is the same definition used in AdaFruit's DotStar library: https://github.com/adafruit/Adafruit_DotStar/blob/master/Adafruit_DotStar.cpp#L47).

Unix only understands forward slash for directories. Windows understands both forward slash and backward slash.

If create DigitLedDisplay with the optional parameter useHardwareSPI set true, then the write() function uses the much faster SPI.transfer16() function instead of the much slower shiftOut().

ericfont · 2021-02-23T13:37:34Z

let me know if you would like a different interface for enabling hardware SPI or any other changes.

Also, note it may be faster to avoid the if statement in write(), and to maybe combined the writing of all digits into one big SPI.beginTransaction()-SPI.endTransaction() block.

Also it might be faster to create the SPISettings object only once and reuse it, since it will be the same for every call to SPI.beginTransaction();

Note, the SPI header file is: https://github.com/arduino/ArduinoCore-avr/blob/master/libraries/SPI/src/SPI.h, and it seems like there is some overhead in those calls.

Considering the MAX chip is 10 Mhz, then the fastest time to perform a clear of 8 digits with 16 bits for each digit would be exactly 8 * 16 / 10 = 12.8 microseconds, so I think there is still opportunity for improvement of another 10x more than what I'm getting now (unless I'm calculating wrong).

I would also say, it might make more sense to have a #define USE_SPI so all the if statements can go away at compile time.

ericfont · 2021-02-23T14:31:05Z

Out of curiosity, prestoring the SPISettings doesn't really save any noticable time.

However, if I unroll the clear loop and take begin & end transaction calls outside of the main loop, then the speed increases from 132 microseconds to only 100 microseconds:

void DigitLedDisplay::clear() {
	    SPI.beginTransaction(MAXIM_SPISettings);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x100);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x200);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x300);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x400);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x500);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x600);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x700);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x800);
	    digitalWrite(CS_PIN, HIGH);
	    SPI.endTransaction();
}

I've found that most of the time gets spent setting the CS_PIN to HIGH and then LOW. If I remove those digitalWrites, then the speed goes down to 32 microseconds, which is much closer to the theoretical maximum of 12.8 microseconds. But of course those digitalWrites can't be taken out because they're needed for the Maxim to latch the shifted data.

ericfont · 2021-02-23T14:43:16Z

And looking at the specs on the Maxim's serial transfer:

It seems that it would not be possible to do a SPI.transfer(buffer, size) call to speed up the whole transfer of 8 digits, without generating a special hardware clock that rose and fell exactly on every 16th SPI clock. (Maybe there is a way to set CS_PIN as a PWM output that has the duty cycle carefully controlled such that it goes high then low on every 16th SPI clock cycle only.)

ericfont · 2021-02-23T15:48:17Z

There also appears to be some opportunity for optimization by writing a custom pipelined SPI transfer function based off of https://github.com/arduino/ArduinoCore-avr/blob/master/libraries/SPI/src/SPI.h#L220:

  inline static uint16_t transfer16(uint16_t data) {
    union { uint16_t val; struct { uint8_t lsb; uint8_t msb; }; } in, out;
    in.val = data;
    if (!(SPCR & _BV(DORD))) {
      SPDR = in.msb;
      asm volatile("nop"); // See transfer(uint8_t) function
      while (!(SPSR & _BV(SPIF))) ;
      out.msb = SPDR;
      SPDR = in.lsb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.lsb = SPDR;
    } else {
      SPDR = in.lsb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.lsb = SPDR;
      SPDR = in.msb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.msb = SPDR;
    }
    return out.val;
  }

It seems that the computations for figuring out the next Seven Segment byte to send can be done during the while (!(SPSR & _BV(SPIF))) ; spin loop which is polling a SPI Status register bit (bitmask "SPIF") to detect when the transmission is complete. That time would be a good time to do the conversion to decimal and from decimal to Seven Segment code.

Also there is probably no need to be reading from the SPDR to receive data from the slave since the seven segment display is only useful as an output display, not as input.

Anyway, I think I might implement all that as a separate optimized library.

ericfont added 2 commits February 23, 2021 03:54

Fixes build for unix

1854d88

Unix only understands forward slash for directories. Windows understands both forward slash and backward slash.

Fast Hardware SPI option

0b76c7f

If create DigitLedDisplay with the optional parameter useHardwareSPI set true, then the write() function uses the much faster SPI.transfer16() function instead of the much slower shiftOut().

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to use much faster Hardware SPI instead of shiftOut #13

Option to use much faster Hardware SPI instead of shiftOut #13

ericfont commented Feb 23, 2021

ericfont commented Feb 23, 2021 •

edited

Loading

ericfont commented Feb 23, 2021

ericfont commented Feb 23, 2021 •

edited

Loading

ericfont commented Feb 23, 2021

Option to use much faster Hardware SPI instead of shiftOut #13

Are you sure you want to change the base?

Option to use much faster Hardware SPI instead of shiftOut #13

Conversation

ericfont commented Feb 23, 2021

ericfont commented Feb 23, 2021 • edited Loading

ericfont commented Feb 23, 2021

ericfont commented Feb 23, 2021 • edited Loading

ericfont commented Feb 23, 2021

ericfont commented Feb 23, 2021 •

edited

Loading

ericfont commented Feb 23, 2021 •

edited

Loading