HidController: loop until no more messages are available on poll #2970

Be-ing · 2020-07-27T13:17:46Z

Otherwise, it seems messages build up in a queue which produces
a dramatic lag on Linux when moving faders quickly. This
regression was introduced between Mixxx 2.2 and 2.3 beta, likely
by switching HidController to nonblocking polling in PR #2179.

@codecat previously tested this on Windows. @codecat if you could double check with this branch, that would be helpful.

codecat · 2020-07-27T13:23:49Z

Are there pre-built binaries for PRs, or do I have to build it myself?

Be-ing · 2020-07-27T13:27:04Z

You can use the AppVeyor artifact to test PR builds.

codecat · 2020-07-27T13:45:10Z

Thanks, I will be able to test it in a few hours.

codecat · 2020-07-27T18:21:20Z

Build on Appveyor right now seems to have broken my HID controller. Only the "initial" update is handled, any changes to knobs and sliders aren't happening in Mixxx. Restarting the HID script in the settings puts the sliders and knobs to what they actually are on the controller.

So it seems like only the first packet is being received and handled.

Be-ing · 2020-07-31T14:46:32Z

Hmm, I don't know how to explain that or work around it. @codecat do you have any ideas why that is happening now, even though you reported that this approach worked before? What controller are you using again?

codecat · 2020-07-31T14:50:40Z

I am using a Gemini GMX controller. It's very spammy when it comes to HID messages, so debugging it is quite a pain.. 😂

Looking at the code, I think your poll() function will never actually return true, perhaps that could be the problem? I'm not sure if the return value of poll() is ever checked, but it should probably return true if it handled any data at all.

Edit: Here's my HID controller mapping: https://github.com/codecat/mixxx-gemini-gmx

Be-ing · 2020-07-31T14:57:59Z

I set a breakpoint on the return true; line in HidController::poll and it does reach that.

Be-ing · 2020-07-31T15:07:21Z

This is a shot in the dark, but maybe try 297537a without 1ac6500?

codecat · 2020-07-31T15:20:12Z

Oh, I guess hid_read can return 0 too! My bad. I'll have to check AppVeyor for builds with the individual commits, I can test it a bit later today.

Be-ing · 2020-07-31T16:13:00Z

AppVeyor didn't make a build of 297537a because I pushed the next commit before that build finished.

Be-ing · 2020-08-01T23:28:05Z

> Build on Appveyor right now seems to have broken my HID controller. Only the "initial" update is handled, any changes to knobs and sliders aren't happening in Mixxx. Restarting the HID script in the settings puts the sliders and knobs to what they actually are on the controller.

I tested the AppVeyor build on my friend's computer running Windows 8 with a NI Traktor Kontrol S2 Mk2 and cannot reproduce this.

Be-ing · 2020-08-01T23:50:40Z

@codecat I have a guess what might be happening. IIRC you've mentioned before that your controller sends HID messages very frequently. Perhaps the HidController::poll loop never returns because of this and blocks the Qt event loop so even if the script is executed, its changes to Mixxx ControlObjects are not propagated. @daschuer does that seem plausible?

If that's what is happening, perhaps we could track the execution time of HidController::poll and exit the loop in case it doesn't return in time.
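
A rough sketch of what such a time-bounded loop could look like, written against the C++ standard library only. `pollWithBudget`, `ReadFn` and the callback parameters are hypothetical names for illustration, not Mixxx or hidapi APIs; the read callback is assumed to follow the hid_read convention (bytes read, 0 when nothing is waiting, -1 on error).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical stand-in for hid_read(): returns the number of bytes read,
// 0 when no packet is waiting, or -1 on error.
using ReadFn = std::function<int()>;

// Drain pending packets, but stop once `budget` has elapsed so that a device
// which never stops sending cannot starve the caller's event loop.
// Returns false only on a read error.
inline bool pollWithBudget(const ReadFn& readPacket,
        const std::function<void(int)>& handlePacket,
        std::chrono::microseconds budget) {
    assert(readPacket && handlePacket);
    const auto deadline = std::chrono::steady_clock::now() + budget;
    while (std::chrono::steady_clock::now() < deadline) {
        const int bytesRead = readPacket();
        if (bytesRead < 0) {
            return false; // read error
        }
        if (bytesRead == 0) {
            return true; // queue drained
        }
        handlePacket(bytesRead);
    }
    // Budget exhausted: any remaining packets wait for the next poll.
    return true;
}
```

This only caps the time spent per poll; packets that keep arriving faster than they are handled would still accumulate elsewhere.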

daschuer · 2020-08-02T10:05:31Z

That can be a band-aid, but does imho not really solve the problem.

Do we have a way to visualize the issue?

If the controller emits more messages than Mixxx can handle, they will pile up and overflow in a different buffer.
A better solution is to parse and discard uninteresting messages as early as possible.

Do we know if there are unwanted messages, or messages that we need only at a lower rate?

Are there filter facilities somewhere in the HID stack?

codecat · 2020-08-02T10:12:49Z

Might it be an idea to handle this at the HID script level? E.g. some kind of option you can set in the JavaScript that will change this behavior. I think that could work since this might be different for all kinds of controllers.

Mine is incredibly spammy, for example, but a "proper" controller might only send messages when something really changed.

Be-ing · 2020-08-02T15:54:46Z

@codecat is your controller spamming identical consecutive messages when nothing changes? If so, those would be easy to filter out.

Be-ing · 2020-08-02T15:55:20Z

> That can be a band-aid, but does imho not really solve the problem.

I am not sure I have actually identified the problem, that is just a guess.

codecat · 2020-08-02T16:31:23Z

> @codecat is your controller spamming identical consecutive messages when nothing changes? If so, those would be easy to filter out.

Yes, exactly that. It writes the state of everything, all the time. I imagine filtering out identical messages can be problematic though, what if a HID device sends a message "I pressed button X", filtering duplicates would ignore any double presses of button X. (Disclaimer: I'm not familiar with any other HID devices, perhaps this doesn't happen in reality, I'm not sure)

Be-ing · 2020-08-02T16:33:19Z

I added a check to filter redundant messages in dccade5. @codecat please give that a try. Unfortunately it incurs a performance penalty by making a deep copy of the QByteArray whenever a non-redundant message is received. It might be possible to implement a similar check in your script so not every HID controller has that performance penalty.
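
For illustration, a minimal sketch of this kind of check using the C++ standard library instead of Qt. `RedundantPacketFilter` and `accept` are hypothetical names and this is not the code from dccade5; the `assign` call stands in for the deep copy mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Remembers the last packet that was accepted and rejects exact repeats.
class RedundantPacketFilter {
  public:
    // Returns true if the packet differs from the previously accepted one
    // and should therefore be processed.
    bool accept(const unsigned char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        if (m_hasLast && size == m_last.size() &&
                (size == 0 || std::memcmp(data, m_last.data(), size) == 0)) {
            return false; // identical to the previous packet
        }
        // This copy is the per-packet cost paid for every non-redundant message.
        m_last.assign(data, data + size);
        m_hasLast = true;
        return true;
    }

  private:
    std::vector<unsigned char> m_last;
    bool m_hasLast = false;
};
```

The copy happens only when a packet is accepted, so a device that repeats its idle state pays for the comparison but not for the copy.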

Be-ing · 2020-08-02T16:39:12Z

@codecat please also try implementing a check for redundant incoming data in your script's incomingData function (before doing any other processing of the data) with the AppVeyor build of 297537a.

daschuer · 2020-08-02T18:28:12Z

> if a HID device sends a message "I pressed button X",

I guess the controller sends a press and a release event, else every repeated message would be a new press.

Is the state sent as a single blob? If not, the filter from the last commit will not work.

@codecat: Do you have a recording of one or two cycles of the controller?

Are you sure "everything" is sent during the cycle? I can imagine that some controls are updated in a cycle and, for instance, the jog wheel is sent spontaneously. Please verify.

Be-ing · 2020-08-02T18:36:20Z

HID reports the state of multiple components simultaneously. Sometimes devices report everything in one packet, but many split it across a few different packets. AFAIK it is unusual for a controller to send HID data constantly regardless of the state of the controller.

codecat · 2020-08-02T18:42:45Z

I would imagine that. Yes, even if I don't touch anything on my controller (it's sitting there idle, just plugged in) I receive an endless supply of identical HID packets.

When I first started working with this controller I wrote a quick tool to dump the packets it receives to see how it works;

This is the entire state, and the packet is always the same if it's idle.

codecat · 2020-08-02T23:10:57Z

Just tried the build from dccade5 and my controller works with that fix.

@Be-ing Out of curiosity, why do you want me to try it with that build and a js redundancy check? I'm able to do it, but I'm not sure what the purpose would be.

Be-ing · 2020-08-02T23:19:44Z

Okay, glad we identified the issue. I don't think dccade5 is a great solution because it imposes a performance penalty on every HID controller by making a deep copy of the incoming data only for the purpose of checking that the next packet isn't redundant. For most controllers this is not necessary, so I think it would be better to check for redundant data in JS if that works.

codecat · 2020-08-02T23:25:43Z

Ah, gotcha. I don't think that would work though, since the C++ code would still be looping either way (although I guess slightly faster if the js does the check instead of processing the packet?)

daschuer · 2020-09-26T08:32:39Z

No, we need good documentation of the implications of this change as a source code comment.

daschuer

LGTM, thank you.

daschuer · 2020-10-01T05:27:18Z

@uklotzde: merge?

uklotzde

Sorry, late to the party.

Just some comments to improve the code quality.

src/controllers/controllermanager.h

uklotzde · 2020-10-01T09:35:21Z

src/controllers/hid/hidcontroller.cpp

-        Trace process("HidController process packet");
-        QByteArray outData(reinterpret_cast<char*>(m_pPollData), result);
-        receive(outData, mixxx::Time::elapsed());
+    int result = 1;


result does not need to be defined outside the loop. An infinite for loop with explicit return points would be much easier to follow.

Setting result to 1 before the loop is also conceptually incorrect and an ugly hack. The value represents the number of received bytes. But we haven't received anything yet and the value 1 is arbitrary, anticipating some assumptions of code in another (though near) context.

uklotzde · 2020-10-01T09:37:12Z

src/controllers/hid/hidcontroller.cpp

+        unsigned char* pCurrentBuffer = m_pPollData[m_iPollingBufferIndex];
+
+        result = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
+        if (result == -1) {


I would use early returns and avoid all the nesting:

result < 0 -> return false

result == 0 -> return true

...DEBUG_ASSERT(result > 0) and process the received package

Sure, this obviates the need for the hacky int result = 1 too: 2343bbf

uklotzde · 2020-10-01T09:47:02Z

src/controllers/hid/hidcontroller.cpp

+    // There is no safety net for this because it has not been demonstrated to be
+    // a problem in practice.
+    while (result > 0) {
+        // Rotate between two buffers so the memcmp below does not require deep copying to another buffer.


"Rotate between two buffers ..." -> "Cycle between disjunct input buffers ..."

"two" depends on the value of the constant

Be-ing · 2020-10-04T12:02:39Z

ping

uklotzde · 2020-10-04T12:22:26Z

src/controllers/hid/hidcontroller.cpp

+        unsigned char* pCurrentBuffer = m_pPollData[m_iPollingBufferIndex];
+
+        int bytesRead = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
+        if (bytesRead == -1) {


According to the docs -1 is the only negative value that we need to handle. Nevertheless, I would check for bytesRead < 0 and add a DEBUG_ASSERT(bytesRead == -1) in this if branch. All possible values should be handled.

done dff0202

Ready for merge? This critical bug fix has already been waiting more than 2 months.

uklotzde · 2020-10-04T20:26:52Z

Thank you for your patience. LGTM

ywwg · 2020-10-07T19:14:47Z

I'm afraid this seems to break my traktor s3 mapping. Button pushes sometimes work, and sometimes don't. sometimes they don't appear in the debug output and nothing happens, sometimes they don't appear at all. reverting to before this PR fixes the issue. Happy to help with debugging

ywwg · 2020-10-07T19:15:20Z

(faders seem to work ok, it's buttons that are misbehaving)

Be-ing · 2020-10-07T21:12:52Z

I have a few suggestions to start debugging:

Try commenting out this line.
Implement some very simple proof-of-concept JS for handling the button without the hid-packet-parser.js library to check that isn't the problem.
Does it matter if you press and release the button very quickly or hold the button down?

ywwg · 2020-10-08T03:28:27Z

The bug is that the traktor S3 constantly spams 0-length messages, but the code flips the buffers every iteration whether a message is read or not. This makes it essentially random whether a message will get loaded into one buffer or the other. If I push a button more than once, there is a high likelihood that the two buffers will contain the same content, and the comparison will succeed and the message will be ignored. Here's my fix:

        unsigned char* pPreviousBuffer = m_pPollData[m_iPollingBufferIndex];
        const int nextBufIndex = (m_iPollingBufferIndex + 1) % kNumBuffers;
        unsigned char* pCurrentBuffer = m_pPollData[nextBufIndex];

        int bytesRead = hid_read(m_pHidDevice, pCurrentBuffer, kBufferSize);
        if (bytesRead < 0) {
            // -1 is the only error value according to hidapi documentation.
            DEBUG_ASSERT(bytesRead == -1);
            return false;
        } else if (bytesRead == 0) {
            return true;
        }
        m_iPollingBufferIndex = nextBufIndex;

i.e., we should not increment m_iPollingBufferIndex for failed/empty reads
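
To make the failure mode concrete, here is a small self-contained simulation. `FakeDevice` and `Poller` are hypothetical stand-ins, not Mixxx classes, and the "flip on every read" branch is a reconstruction from the description above rather than a copy of the merged code. All packets in the sketch have the same length, so it does not address the separate question of how many bytes should be compared.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

constexpr int kNumBuffers = 2;
constexpr std::size_t kBufferSize = 4;
using Packet = std::vector<unsigned char>;

// Hypothetical stand-in for a HID device with a queue of pending packets.
struct FakeDevice {
    std::deque<Packet> pending;

    // Mimics a nonblocking read: returns the number of bytes copied,
    // or 0 if nothing is queued.
    int read(unsigned char* out, std::size_t maxSize) {
        if (pending.empty()) {
            return 0;
        }
        const Packet packet = pending.front();
        pending.pop_front();
        const std::size_t n = std::min(packet.size(), maxSize);
        std::copy_n(packet.begin(), n, out);
        return static_cast<int>(n);
    }
};

class Poller {
  public:
    explicit Poller(bool advanceOnlyOnData)
            : m_advanceOnlyOnData(advanceOnlyOnData) {
    }

    // Drains the device and appends every non-redundant packet to `delivered`.
    void poll(FakeDevice& device, std::vector<Packet>& delivered) {
        for (;;) {
            unsigned char* pPrevious = m_buffers[m_index].data();
            const int nextIndex = (m_index + 1) % kNumBuffers;
            unsigned char* pCurrent = m_buffers[nextIndex].data();
            if (!m_advanceOnlyOnData) {
                m_index = nextIndex; // flip even if the read turns out empty
            }
            const int bytesRead = device.read(pCurrent, kBufferSize);
            assert(bytesRead <= static_cast<int>(kBufferSize));
            if (bytesRead <= 0) {
                return;
            }
            if (m_advanceOnlyOnData) {
                m_index = nextIndex; // flip only after a successful read
            }
            if (std::memcmp(pCurrent, pPrevious, kBufferSize) == 0) {
                continue; // treated as a repeat of the previous packet
            }
            delivered.emplace_back(pCurrent, pCurrent + bytesRead);
        }
    }

  private:
    bool m_advanceOnlyOnData;
    int m_index = 0;
    std::array<std::array<unsigned char, kBufferSize>, kNumBuffers> m_buffers{};
};
```

Feeding a press and a release in one poll, then another press in the next poll, the flip-on-every-read variant compares the second press against the stale first press and drops it, while the variant that advances only on data delivers all three packets.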

Be-ing · 2020-10-08T06:44:57Z

Good catch. Could you open a PR for the 2.3 branch with that fix?

uklotzde · 2020-10-08T06:59:16Z

This is not a fix. There is an essential issue with the double buffering code!! We forgot to record and compare the actual length of the received buffers! Otherwise, you are comparing bytes from different read operations in the past, which are considered uninitialized.

uklotzde · 2020-10-08T07:34:08Z

If the size of the payload is strictly limited to 255 bytes the first byte of a 1 + 255 byte buffer could be used for encoding the length in-place.
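
A minimal sketch of that in-place length encoding, assuming payloads never exceed 255 bytes. `LengthPrefixedBuffer`, `store` and `samePacket` are hypothetical names for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kMaxPayload = 255;

// Byte 0 holds the payload length, bytes 1..255 hold the payload itself.
using LengthPrefixedBuffer = std::array<unsigned char, 1 + kMaxPayload>;

// Writes a payload into the buffer and records its length in byte 0.
// Bytes beyond the new payload keep whatever an earlier packet left there.
inline void store(LengthPrefixedBuffer& buffer,
        const unsigned char* payload,
        std::size_t size) {
    assert(size <= kMaxPayload);
    buffer[0] = static_cast<unsigned char>(size);
    if (size > 0) {
        std::memcpy(buffer.data() + 1, payload, size);
    }
}

// Two buffers hold the same packet if the lengths match and the first
// `length` payload bytes match. Comparing 1 + length bytes from the start
// covers both conditions, because a length mismatch shows up in byte 0.
inline bool samePacket(const LengthPrefixedBuffer& a,
        const LengthPrefixedBuffer& b) {
    return std::memcmp(a.data(), b.data(), 1 + static_cast<std::size_t>(a[0])) == 0;
}
```

Because the comparison stops at the recorded length, stale bytes from an earlier, longer packet no longer influence the result.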

ywwg · 2020-10-08T14:01:05Z

> which are considered uninitialized.

I don't think so -- the pointer assignments mean that pPreviousBuffer and pCurrentBuffer are always pointing to valid data inside m_pPollData

ywwg · 2020-10-08T14:03:01Z

oh I see what you mean, we don't know how much of the buffer to compare! good catch

ywwg · 2020-10-08T14:03:26Z

I don't really have time to make a PR to fix this myself, sorry!

(I lied, PR shortly)

ywwg · 2020-10-08T15:19:15Z

we could also just memset the buffers to zero before the read

daschuer · 2020-10-09T05:22:21Z

Storing the buffer length as the first byte seems to be more performant.

JoergAtGithub · 2020-11-15T14:55:37Z

src/controllers/hid/hidcontroller.cpp

+        // This assumes that the redundant packets all use the same report ID. In practice we
+        // have not encountered any controllers that send redundant packets with different report
+        // IDs. If any such devices exist, this may be changed to use a separate buffer to store
+        // the last packet for each report ID.


@Be-ing I found an issue with the following assumption, while working on #3317:
There seems to be a bug in the Windows implementation of hid_read: it always returns the number of bytes of the largest input report, whereas hid_get_input_report returns exactly the number of bytes that the report should have.
I filed a bug report for hidapi ( libusb/hidapi#210 ) and hope for clarification.

Does that mean this always loops infinitely??

I presume hid_read is correctly returning 0 when all packets have been read so there is no infinite loop otherwise I presume you or @codecat would have noticed this earlier. Can you confirm this?

What happens when reading the smaller packet past its true size? Are the bytes after that random garbage?

No infinite loop; it fills the remaining bytes with garbage (looks like data from the bigger report). I expect that this comparison will be triggered by this garbage.

Okay, that's not as bad as an infinite loop. I suppose you could temporarily hack around in your controller script until the hidapi bug is fixed upstream. If you want to work on the hidapi bug, we could merge a fix in our bundled version of hidapi if you open a pull request upstream.

As noted in the comment, this comparison checking if the current buffer is identical to the previous buffer will not evaluate to true if the device uses multiple report IDs.

I don't have such a controller. I stumbled over this because I used hid_get_input_report, which behaves differently (correctly) compared with the existing code using hid_read.
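
The alternative mentioned in the code comment, keeping the last packet separately for each report ID, could look roughly like the sketch below. `PerReportFilter` is a hypothetical name, and the sketch assumes the device uses numbered reports so that the first byte of each packet is the report ID.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Remembers the last packet seen for each report ID and rejects exact
// repeats, even when packets with different report IDs are interleaved.
class PerReportFilter {
  public:
    // Returns true if the packet differs from the last packet that carried
    // the same report ID. Assumes data[0] is the report ID.
    bool accept(const unsigned char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        if (size == 0) {
            return false; // nothing to process
        }
        const unsigned char reportId = data[0];
        std::vector<unsigned char> packet(data, data + size);
        const auto it = m_last.find(reportId);
        if (it != m_last.end() && it->second == packet) {
            return false; // redundant for this report ID
        }
        m_last[reportId] = std::move(packet);
        return true;
    }

  private:
    std::map<unsigned char, std::vector<unsigned char>> m_last;
};
```

This copies every incoming packet, so it trades the copy-free buffer cycling of the merged code for correctness on devices that interleave report IDs.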

Be-ing added the major bug label Jul 27, 2020

Be-ing force-pushed the hid_polling branch from 1d3dadf to 297537a Compare July 27, 2020 13:18

Be-ing mentioned this pull request Jul 27, 2020

ControllerManager: poll every 1ms on Linux #2966

Closed

Be-ing added this to the 2.3.0 milestone Jul 27, 2020

HidController: add more comments

447a432

daschuer approved these changes Oct 1, 2020

uklotzde reviewed Oct 1, 2020

Be-ing force-pushed the hid_polling branch from 8e7e402 to 9b0d6b2 Compare October 1, 2020 21:03

Be-ing added 2 commits October 1, 2020 16:06

HidController: reorganize polling loop code

2343bbf

HidController: remove reference to constant from comment

018f1e2

Be-ing force-pushed the hid_polling branch from 9b0d6b2 to 018f1e2 Compare October 1, 2020 21:06

uklotzde reviewed Oct 4, 2020

HidController: handle all possible return values from hidapi

dff0202

uklotzde approved these changes Oct 4, 2020

uklotzde merged commit dc30a52 into mixxxdj:2.3 Oct 4, 2020

Be-ing deleted the hid_polling branch October 5, 2020 03:08

JoergAtGithub reviewed Nov 15, 2020
