Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demonstrate a test framework bug #2242

Closed
wants to merge 1 commit into from

Conversation

jdyer1
Copy link
Contributor

@jdyer1 jdyer1 commented Feb 2, 2024

There seems to be a subtle bug in our test framework where it is sometimes possible for an unauthorized user to add document(s). This happens when both the client and server run in the same JVM and CloudSolrClient is set up to use Parallel Updates on the client (as with MiniSolrCloudCluster). This is a potential test-only bug, and to my knowledge no existing tests are affected. Yet if left unsolved, this may cause misleading test results in the future.

CloudSolrClient sets up a MDCAwareThreadPoolExecutor to perform parallel updates on the client JVM. Prior to executing a parallel update, ExecutorUtil#setServerThreadFlag is set on the Client JVM's thread(s) from the pool. Such a flag normally would not matter, because this happens on the client JVM and there this flag has no meaning. But our unit tests can run both client and server in the same JVM. This confuses PKIAuthenticationPlugin (server-side) which sees it as an inter-node request.

Although this can allow an unauthorized user to add a document, it does happen on commit. CloudSolrClient does not use parallel updates to request a commit and these always fail when appropriate. This could lead to an unreliable test case that passes or fails only when run slowly enough for an autocommit to finish before comparing documents.

This PR contains a (demonstration-only) failing unit test. See TestAuthorizationWhenClientInSameJvm. This PR adds a flag to MiniSolrCloudCluster to let tests disable parallel updates on the client. If this is deemed an adequate solution, it would be best to default this flag to false with a comment warning of the downsides.

This issue touches on our efforts to reduce the size of the SolrJ client. It seems to me that because we also use SolrJ for inter-node requests that we've perhaps introduced code on this client library that rightfully belong in the server only. Specifically, it seems dubious that we include ExecutorUtil in the client library.

- an update request is sometimes treated as an inter-node request, bypassing authorization plugins
Copy link

github-actions bot commented Apr 7, 2024

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Apr 7, 2024
@janhoy
Copy link
Contributor

janhoy commented Apr 7, 2024

Thanks for being proactive on this. I think too it is scary that SolrJ is hybrid client/server code that risks containing server specific code and dependencies. Your workaround of skipping parallel updates for auth tests is ok to support and document. It sounds like a harder task to re-architect the inter-node traffic auth to be less "magic".

@jdyer1
Copy link
Contributor Author

jdyer1 commented Apr 8, 2024

Thank you for looking at this @janhoy. My belief here is that if server-specific code in ExecutorUtil was moved out of SolrJ, this would completely solve the problem. In particular I would like to propose moving the isServerPool ThreadLocal to a server-side class. This would mean that MDCAwareThreadPoolExecutor also would need to have both server-side and client-side versions.

@github-actions github-actions bot removed the stale PR not updated in 60 days label Apr 11, 2024
Copy link

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Jun 12, 2024
@epugh
Copy link
Contributor

epugh commented Jun 12, 2024

@jdyer1 I'd love to see auth testing used a lot more across Solr and this seems like the kind of thing that if not fixed would trip people up in the future. What can I do to help? Your proposal seems reasonable, but maybe an email to the dev list to get more eyes on it?

@jdyer1
Copy link
Contributor Author

jdyer1 commented Jun 13, 2024

@epugh As mentioned earlier in the PR, we can paper over this by disabling Parallel Updates for auth tests. But the real problem in my opinion is we have allowed SolrJ to become a server-side library that clients also use. I would like to move MDCAwareThreadPoolExecutor into a server-side jar and replace it with a client-specific Executor in SolrJ.

As for raising this on the Dev list, I always thought anyone interested in participating in decisions like this would also be subscribed to PR updates. If you or others think my proposal here has merit, I can try and work up a PR for it so it can be evaluated further.

@epugh
Copy link
Contributor

epugh commented Jun 14, 2024

They should be subscribed to the PR, but sometimes emailing dev@ get's some more eyes, especialy if you aren't sure which direction to proceed or even if you should proceed. Your reasoning makes sense to me but again I'm not a deep Java person, so the ramifications of moving MDCAwareThreadPoolExecutor are beyond me ;-).

@github-actions github-actions bot removed the stale PR not updated in 60 days label Jun 19, 2024
Copy link

This PR has had no activity for 60 days and is now labeled as stale. Any new activity or converting it to draft will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Aug 18, 2024
Copy link

This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.

@dsmiley
Copy link
Contributor

dsmiley commented Dec 12, 2024

My belief here is that if server-specific code in ExecutorUtil was moved out of SolrJ, this would completely solve the problem. In particular I would like to propose moving the isServerPool ThreadLocal to a server-side class.

+1

But I don't want lots of code duplication for MDCAwareThreadPoolExecutor. Maybe the isServerPool determination should be redesigned. Namely, The creator thread of a MDCAwareThreadPoolExecutor; if it's a "server thread", then so is the thread pool and thus its threads likewise. And startup CoreContainer as a server thread too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-stale Closed after being stale for 60 days stale PR not updated in 60 days test-framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants