Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to use much faster Hardware SPI instead of shiftOut #13

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ericfont
Copy link

Measuring on my 16 Mhz Trinket Pro with a MAX7219, a full 8-digit clear takes only 124 to 132 microseconds now with Hardware SPI, instead of 1656 to 1664 microseconds with shiftOut. That is over 13x speedup.

to initiate with hardware spi, use the same constructor but append an additional parameter of useHardwareSPI = true, otherwise that parameter will default to false. useHardwareSPI will ignores the pass dinPin and clkPin parameters, and instead set those internally to USE_HW_SPI which is defined as 255 (which is the same definition used in AdaFruit's DotStar library: https://github.com/adafruit/Adafruit_DotStar/blob/master/Adafruit_DotStar.cpp#L47).

Unix only understands forward slash for directories.  Windows understands both forward slash and backward slash.
If create DigitLedDisplay with the optional parameter useHardwareSPI set true, then the write() function uses the much faster SPI.transfer16() function instead of the much slower shiftOut().
@ericfont
Copy link
Author

ericfont commented Feb 23, 2021

let me know if you would like a different interface for enabling hardware SPI or any other changes.

Also, note it may be faster to avoid the if statement in write(), and to maybe combined the writing of all digits into one big SPI.beginTransaction()-SPI.endTransaction() block.

Also it might be faster to create the SPISettings object only once and reuse it, since it will be the same for every call to SPI.beginTransaction();

Note, the SPI header file is: https://github.com/arduino/ArduinoCore-avr/blob/master/libraries/SPI/src/SPI.h, and it seems like there is some overhead in those calls.

Considering the MAX chip is 10 Mhz, then the fastest time to perform a clear of 8 digits with 16 bits for each digit would be exactly 8 * 16 / 10 = 12.8 microseconds, so I think there is still opportunity for improvement of another 10x more than what I'm getting now (unless I'm calculating wrong).

I would also say, it might make more sense to have a #define USE_SPI so all the if statements can go away at compile time.

@ericfont
Copy link
Author

Out of curiosity, prestoring the SPISettings doesn't really save any noticable time.

However, if I unroll the clear loop and take begin & end transaction calls outside of the main loop, then the speed increases from 132 microseconds to only 100 microseconds:

void DigitLedDisplay::clear() {
	    SPI.beginTransaction(MAXIM_SPISettings);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x100);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x200);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x300);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x400);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x500);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x600);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x700);
	    digitalWrite(CS_PIN, HIGH);
	    digitalWrite(CS_PIN, LOW);
	    SPI.transfer16(0x800);
	    digitalWrite(CS_PIN, HIGH);
	    SPI.endTransaction();
}

I've found that most of the time gets spent setting the CS_PIN to HIGH and then LOW. If I remove those digitalWrites, then the speed goes down to 32 microseconds, which is much closer to the theoretical maximum of 12.8 microseconds. But of course those digitalWrites can't be taken out because they're needed for the Maxim to latch the shifted data.

@ericfont
Copy link
Author

ericfont commented Feb 23, 2021

And looking at the specs on the Maxim's serial transfer:

image

It seems that it would not be possible to do a SPI.transfer(buffer, size) call to speed up the whole transfer of 8 digits, without generating a special hardware clock that rose and fell exactly on every 16th SPI clock. (Maybe there is a way to set CS_PIN as a PWM output that has the duty cycle carefully controlled such that it goes high then low on every 16th SPI clock cycle only.)

@ericfont
Copy link
Author

There also appears to be some opportunity for optimization by writing a custom pipelined SPI transfer function based off of https://github.com/arduino/ArduinoCore-avr/blob/master/libraries/SPI/src/SPI.h#L220:

  inline static uint16_t transfer16(uint16_t data) {
    union { uint16_t val; struct { uint8_t lsb; uint8_t msb; }; } in, out;
    in.val = data;
    if (!(SPCR & _BV(DORD))) {
      SPDR = in.msb;
      asm volatile("nop"); // See transfer(uint8_t) function
      while (!(SPSR & _BV(SPIF))) ;
      out.msb = SPDR;
      SPDR = in.lsb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.lsb = SPDR;
    } else {
      SPDR = in.lsb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.lsb = SPDR;
      SPDR = in.msb;
      asm volatile("nop");
      while (!(SPSR & _BV(SPIF))) ;
      out.msb = SPDR;
    }
    return out.val;
  }

It seems that the computations for figuring out the next Seven Segment byte to send can be done during the while (!(SPSR & _BV(SPIF))) ; spin loop which is polling a SPI Status register bit (bitmask "SPIF") to detect when the transmission is complete. That time would be a good time to do the conversion to decimal and from decimal to Seven Segment code.

Also there is probably no need to be reading from the SPDR to receive data from the slave since the seven segment display is only useful as an output display, not as input.

Anyway, I think I might implement all that as a separate optimized library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant