Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HidController: loop until no more messages are available on poll #2970

Merged
merged 18 commits into from
Oct 4, 2020

Conversation

Be-ing
Copy link
Contributor

@Be-ing Be-ing commented Jul 27, 2020

Otherwise, it seems messages build up in a queue which produces
a dramatic lag on Linux when moving faders quickly. This
regression was introduced between Mixxx 2.2 and 2.3 beta, likely
by switching HidController to nonblocking polling in PR #2179.

@codecat previously tested this on Windows. @codecat if you could double check with this branch, that would be helpful.

@codecat
Copy link
Contributor

codecat commented Jul 27, 2020

Are there pre-built binaries for PR's, or do I have to build it myself?

@Be-ing
Copy link
Contributor Author

Be-ing commented Jul 27, 2020

You can use the AppVeyor artifact to test PR builds.

@codecat
Copy link
Contributor

codecat commented Jul 27, 2020

Thanks, I will be able to test it in a few hours.

@Be-ing Be-ing added this to the 2.3.0 milestone Jul 27, 2020
@codecat
Copy link
Contributor

codecat commented Jul 27, 2020

Build on Appveyor right now seems to have broken my HID controller. Only the "initial" update is handled, any changes to knobs and sliders aren't happening in Mixxx. Restarting the HID script in the settings puts the sliders and knobs to what they actually are on the controller.

So it seems like only the first packet is being received and handled.

@Be-ing
Copy link
Contributor Author

Be-ing commented Jul 31, 2020

Hmm, I don't know how to explain that or work around it. @codecat do you have any ideas why that is happening now despite that you reported this approach worked before? What controller are you using again?

@codecat
Copy link
Contributor

codecat commented Jul 31, 2020

I am using a Gemini GMX controller. It's very spammy when it comes to HID messages, so debugging it is quite a pain.. 😂

Looking at the code, I think your poll() function will never actually return true, perhaps that could be the problem? I'm not sure if the return value of poll() is ever checked, but it should probably return true if it handled any data at all.

Edit: Here's my HID controller mapping: https://github.com/codecat/mixxx-gemini-gmx

@Be-ing
Copy link
Contributor Author

Be-ing commented Jul 31, 2020

I set a breakpoint on the return true; line in HidController::poll and it does reach that.

@Be-ing
Copy link
Contributor Author

Be-ing commented Jul 31, 2020

This is a shot in the dark, but maybe try 297537a without 1ac6500?

@codecat
Copy link
Contributor

codecat commented Jul 31, 2020

Oh, I guess hid_read can return 0 too! My bad. I'll have to check AppVeyor for builds with the individual commits, I can test it a bit later today.

@Be-ing
Copy link
Contributor Author

Be-ing commented Jul 31, 2020

AppVeyor didn't make a build 297537a because I pushed the next commit before that build finished.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 1, 2020

Build on Appveyor right now seems to have broken my HID controller. Only the "initial" update is handled, any changes to knobs and sliders aren't happening in Mixxx. Restarting the HID script in the settings puts the sliders and knobs to what they actually are on the controller.

I tested the AppVeyor build on my friend's computer running Windows 8 with a NI Traktor Kontrol S2 Mk2 and cannot reproduce this.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 1, 2020

@codecat I have a guess what might be happening. IIRC you've mentioned before that your controller sends HID messages very frequently. Perhaps the HidController::poll loop never returns because of this and blocks the Qt event loop so even if the script is executed, its changes to Mixxx ControlObjects are not propagated. @daschuer does that seem plausible?

If that's what is happening, perhaps we could track the execution time of HidController::poll and exit the loop in case it doesn't return in time.

@daschuer
Copy link
Member

daschuer commented Aug 2, 2020

That can be a band-aid, but does imho not really solve the problem.

Do wa have a way to visualize the issue?

If the controller emmit more messages Mixxx can handle they will pile up and overflow at a different buffer.
A better solution is to parse and discard uninteresting messages as early as possible.

Do we know if there are unwanted messages, or messages that we need only in a lower rate?

Are there filter facilities somewhere in the hid stack?

@codecat
Copy link
Contributor

codecat commented Aug 2, 2020

Might it be an idea to handle this at the HID script level? eg. some kind of option you can set in the Javascript that will change this behavior. I think that could work since this might be different for all kinds of controllers.

Mine is incredibly spammy, for example, but a "proper" controller might only send messages when something really changed.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

@codecat is your controller spamming identical consecutive messages when nothing changes? If so, those would be easy to filter out.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

That can be a band-aid, but does imho not really solve the problem.

I am not sure I have actually identified the problem, that is just a guess.

@codecat
Copy link
Contributor

codecat commented Aug 2, 2020

@codecat is your controller spamming identical consecutive messages when nothing changes? If so, those would be easy to filter out.

Yes, exactly that. It writes the state of everything, all the time. I imagine filtering out identical messages can be problematic though, what if a HID device sends a message "I pressed button X", filtering duplicates would ignore any double presses of button X. (Disclaimer: I'm not familiar with any other HID devices, perhaps this doesn't happen in reality, I'm not sure)

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

I added a check to filter redundant messages in dccade5. @codecat please give that a try. Unfortunately it requires a performance penalty by making a deep copy of the QByteArray whenever a non-redundant message is received. It might be possible to implement a similar check in your script so not every HID controller has that performance penalty.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

@codecat please also try implementing a check for redundant incoming data in your script's incomingData function (before doing any other processing of the data) with the AppVeyor build of 297537a.

@daschuer
Copy link
Member

daschuer commented Aug 2, 2020

if a HID device sends a message "I pressed button X",

I guess the controller sends a press and a release event, else every repeated message would be a new press.

Is the state send as a single blob? If not the filter from the last commit will not work.

@codecat: Do you have a record from on or two cycles of the controller?

Are you sure "everything" is send during the cycle? I can imagine that some controls are updated in a cycle and for instant the jog-wheel is send spontaneous. Please verify.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

HID reports the state of multiple components simultaneously. Sometimes devices report everything in one packet, but many split it across a few different packets. AFAIK it is unusually for a controller to send HID data constantly regardless of the state of the controller.

@codecat
Copy link
Contributor

codecat commented Aug 2, 2020

I would imagine that. Yes, even if I don't touch anything on my controller (it's sitting there idle, just plugged in) I receive an endless supply of identical HID packets.

When I first started working with this controller I wrote a quick tool to dump the packets it receives to see how it works;

image

This is the entire state, and the packet is always the same if it's idle.

@codecat
Copy link
Contributor

codecat commented Aug 2, 2020

Just tried the build from dccade5 and my controller works with that fix.

@Be-ing Out of curiosity, why do you want me to try it with that build and a js redundancy check? I'm able to do it, but I'm not sure what the purpose would be.

@Be-ing
Copy link
Contributor Author

Be-ing commented Aug 2, 2020

Okay, glad we identified the issue. I don't think dccade5 is a great solution because it requires a performance penalty for every HID controller by making a deep copy of the incoming data only for the purpose of checking that the next packet isn't redundant. For most controllers this is not necessary, so I think it would be better to check for redundant data in JS if that works.

@codecat
Copy link
Contributor

codecat commented Aug 2, 2020

Ah, gotcha. I don't think that would work though, since the C++ code would still be looping either way (although I guess slightly faster if the js does the check instead of processing the packet?)

@daschuer
Copy link
Member

No, we need a good documentation of the implications of this change as source code comment.

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@daschuer
Copy link
Member

daschuer commented Oct 1, 2020

@uklotzde: merge?

Copy link
Contributor

@uklotzde uklotzde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, late to the party.

Just some comments to improve the code quality.

Trace process("HidController process packet");
QByteArray outData(reinterpret_cast<char*>(m_pPollData), result);
receive(outData, mixxx::Time::elapsed());
int result = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result does not need to be defined outside the loop. An infinite for loop with explicit return points would be much easier to follow.

Setting result to 1 before the loop is also conceptually incorrect and an ugly hack. The value represents the number of received bytes. But we haven't received anything yet and the value 1 is arbitrary, anticipating some assumptions of code in another (though near) context.

unsigned char* pCurrentBuffer = m_pPollData[m_iPollingBufferIndex];

result = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
if (result == -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use early returns and avoid all the nesting:

  • result < 0 -> return false
  • result == 0 -> return true
  • ...DEBUG_ASSERT(result > 0) and process the received package

Copy link
Contributor Author

@Be-ing Be-ing Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this obviates the need for the hacky int result = 1 too: 2343bbf

// There is no safety net for this because it has not been demonstrated to be
// a problem in practice.
while (result > 0) {
// Rotate between two buffers so the memcmp below does not require deep copying to another buffer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rotate between two buffers ..." -> "Cycle between disjunct input buffers ..."

"two" depends on the value of the constant

@Be-ing
Copy link
Contributor Author

Be-ing commented Oct 4, 2020

ping

unsigned char* pCurrentBuffer = m_pPollData[m_iPollingBufferIndex];

int bytesRead = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
if (bytesRead == -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs -1 is the only negative value that we need to handle. Nevertheless, I would check for bytesRead < 0 and add a DEBUG_ASSERT(bytesRead == -1) in this if branch. All possible values should be handled.

Copy link
Contributor Author

@Be-ing Be-ing Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done dff0202

Ready for merge? This critical bug fix has already been waiting more than 2 months.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 4, 2020

Thank you for your patience. LGTM

@uklotzde uklotzde merged commit dc30a52 into mixxxdj:2.3 Oct 4, 2020
@Be-ing Be-ing deleted the hid_polling branch October 5, 2020 03:08
@ywwg
Copy link
Member

ywwg commented Oct 7, 2020

I'm afraid this seems to break my traktor s3 mapping. Button pushes sometimes work, and sometimes don't. sometimes they don't appear in the debug output and nothing happens, sometimes they don't appear at all. reverting to before this PR fixes the issue. Happy to help with debugging

@ywwg
Copy link
Member

ywwg commented Oct 7, 2020

(faders seem to work ok, it's buttons that are misbehaving)

@Be-ing
Copy link
Contributor Author

Be-ing commented Oct 7, 2020

I have a few suggestions to start debugging:

  1. Try commenting out this line.
  2. Implement some very simple proof-of-concept JS for handling the button without the hid-packet-parser.js library to check that isn't the problem.
  3. Does it matter if you press and release the button very quickly or hold the button down?

@ywwg
Copy link
Member

ywwg commented Oct 8, 2020

The bug is that the traktor S3 constantly spams 0-length messages, but the code flips the buffers every iteration whether a message is read or not. This makes it essentially random whether a message will get loaded into one buffer or the other. If I push a button more than once, there is a high likelihood that the two buffers will contain the same content, and the comparison will succeed and the message will be ignored. Here's my fix:

        unsigned char* pPreviousBuffer = m_pPollData[m_iPollingBufferIndex];
        const int nextBufIndex = (m_iPollingBufferIndex + 1) % kNumBuffers;
        unsigned char* pCurrentBuffer = m_pPollData[nextBufIndex];

        int bytesRead = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
        if (bytesRead < 0) {
            // -1 is the only error value according to hidapi documentation.
            DEBUG_ASSERT(bytesRead == -1);
            return false;
        } else if (bytesRead == 0) {
            return true;
        }
        m_iPollingBufferIndex = nextBufIndex;

i.e., we should not increment m_iPollingBufferIndex for failed/empty reads

@Be-ing
Copy link
Contributor Author

Be-ing commented Oct 8, 2020

Good catch. Could you open a PR for the 2.3 branch with that fix?

@uklotzde
Copy link
Contributor

uklotzde commented Oct 8, 2020

This is not a fix. There is an essential issue with the double buffering code!! We forgot to record and compare the actual length of the received buffers! Otherwise, you are comparing bytes from different read operations in the past, which are considered uninitialized.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 8, 2020

If the size of the payload is strictly limited to 255 bytes the first byte of a 1 + 255 byte buffer could be used for encoding the length in-place.

@ywwg
Copy link
Member

ywwg commented Oct 8, 2020

which are considered uninitialized.

I don't think so -- the pointer assignments mean that pPreviousBuffer and pCurrentBuffer are always pointing to valid data inside m_pPollData

@ywwg
Copy link
Member

ywwg commented Oct 8, 2020

oh I see what you mean, we don't know how much of the buffer to compare! good catch

@ywwg
Copy link
Member

ywwg commented Oct 8, 2020

I don't really have time to make a PR to fix this myself, sorry!

(I lied, PR shortly)

@ywwg
Copy link
Member

ywwg commented Oct 8, 2020

we could also just memset the buffers to zero before the read

@daschuer
Copy link
Member

daschuer commented Oct 9, 2020

Storing the buffer length as first byte seems to be more perfomant.

// This assumes that the redundant packets all use the same report ID. In practice we
// have not encountered any controllers that send redundant packets with different report
// IDs. If any such devices exist, this may be changed to use a separate buffer to store
// the last packet for each report ID.
Copy link
Member

@JoergAtGithub JoergAtGithub Nov 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Be-ing I found an issue with the following assumption, while working on #3317:
There seems to be a bug in the Windows implementation of hid_read. It always returns the number of bytes of the largest input report. While hid_get_input_report returns exact the number of bytes that the report should have.
I filed a bug report for hidapi ( libusb/hidapi#210 ) and hope for clarification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean this always loops infinitely??

Copy link
Contributor Author

@Be-ing Be-ing Nov 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume hid_read is correctly returning 0 when all packets have been read so there is no infinite loop otherwise I presume you or @codecat would have noticed this earlier. Can you confirm this?

What happens when reading the smaller packet past its true size? Are the bytes after that random garbage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No infinite loop, it fills the remaining bytes with garbage (looks like data from the bigger report). I expect that this comparisition will be triggered by this garbage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's not as bad as an infinite loop. I suppose you could temporarily hack around in your controller script until the hidapi bug is fixed upstream. If you want to work on the hidapi bug, we could merge a fix in our bundled version of hidapi if you open a pull request upstream.

As noted in the comment, this comparison checking if the current buffer is identical to the previous buffer will not evaluate to true if the device uses multiple report IDs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not such a controller. I stumbled over this, because I used hid_get_input_report, which behaves different (correct) than the existing code with hid_read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants