Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Polars ThreadPool #1927

Merged
merged 1 commit into from
Nov 29, 2021
Merged

Use Polars ThreadPool #1927

merged 1 commit into from
Nov 29, 2021

Conversation

ibENPC
Copy link
Contributor

@ibENPC ibENPC commented Nov 29, 2021

As mentioned here: #1925 (comment)

@ritchie46
Copy link
Member

🙌

@ritchie46 ritchie46 merged commit 6dd9ded into pola-rs:master Nov 29, 2021
@alippai
Copy link

alippai commented Nov 29, 2021

Is there a flag / option to prevent implicit (accidental) pool creation in Rayon?

@ritchie46
Copy link
Member

Don't know.. It would be beneficial indeed.

@alippai
Copy link

alippai commented Nov 29, 2021

Maybe calling https://docs.rs/rayon/1.0.0/rayon/struct.ThreadPoolBuilder.html#method.build_global at the tests teardown would help this (it fails if global threadpool exists)

@ibENPC
Copy link
Contributor Author

ibENPC commented Nov 29, 2021

Maybe calling https://docs.rs/rayon/1.0.0/rayon/struct.ThreadPoolBuilder.html#method.build_global at the tests teardown would help this (it fails if global threadpool exists)

It's a smart idea.

However it requires to run tests that shouldn't be run (in PR for example) for the moment because cargo miri would complain about the Rayon remaining thread.
We could use comments in order to make the tests available if needed but not triggered in general, I don't know.

That's a little bit tricky 🤪

@alippai
Copy link

alippai commented Nov 29, 2021

This might be even better. Drop the polars threadpool in teardown and if miri complains after the test, the global threadpool was initialized. I'm not sure what issues miri has right now, but if it's about lingering threads after test run, then this might work.

@ibENPC
Copy link
Contributor Author

ibENPC commented Nov 29, 2021

How to drop the Polars ThreadPool ?

@alippai
Copy link

alippai commented Nov 29, 2021

I didn't use it before, but maybe https://docs.rs/rayon/1.5.1/rayon/struct.ThreadPool.html#impl-Drop?

When the ThreadPool is dropped, that’s a signal for the threads it manages to terminate, they will complete executing any remaining work that you have spawned, and automatically terminate.

@ibENPC
Copy link
Contributor Author

ibENPC commented Nov 29, 2021

From what I understand, the ThreadPool is declared static as it can be seen here:

pub static ref POOL: ThreadPool = ThreadPoolBuilder::new()

Therefore it can't be dropped.

I don't know if it is possible to change that in Polars and be able to manipulate the ThreadPool for the tests with those tricks while keeping an optimized behavior for the usual crate usage.

@alippai
Copy link

alippai commented Nov 29, 2021

Would it help if instead of using the .build() we'd create the registry separately like:

lazy_static! {
    pub static ref REG: Arc<Registry> = Arc::new(Registry::new(ThreadPoolBuilder::new()
        .num_threads(
            std::env::var("POLARS_MAX_THREADS")
                .map(|s| s.parse::<usize>().expect("integer"))
                .unwrap_or_else(|_| num_cpus::get())
        ))?);
}

and then we'd be able to create the POOL as ThreadPool = ThreadPool { REG }? That way REG.terminate() would be accessible. I'm just guessing here as I'm not a Rust programmer. What do you think?

@ibENPC
Copy link
Contributor Author

ibENPC commented Nov 29, 2021

It is difficult for me to understand how or why it would work. I don't think so but I might be wrong.

Do not hesitate to test the code yourself to achieve a complete implementation of your idea, the Rust compiler will help you.

@alippai
Copy link

alippai commented Nov 29, 2021

@ibENPC you are right, Registry is private, we can't access it. I've added rayon-rs/rayon#903, if they find it useful, we may use it in the future. Running POOL.terminate() at the end of the test makes miri happy.
I didn't dig deeper, maybe the tests using POOL might need to run sequentially (similar to this issue rust-lang/rust#47506)

@ibENPC
Copy link
Contributor Author

ibENPC commented Nov 30, 2021

@alippai thank you for the work, the problem is clearer.

If we run the tests sequentially we might find a solution by using an unsafe modification of POOL or another unsafe method. Otherwise it seems to imply a random game with data races 🤪

@ritchie46 Is it a problem to use -Zmiri-disable-leaks as an option to cargo miri for the checks ? Because there are many functions where we can use Rayon, and the miri test as is forces only 2 distinct choices:

  • Avoiding to parallelize --> best potential performance not achieved.
  • Avoiding to run tests --> best potential robustness not achieved.

I can open an issue if you think that the discussion is worth it.

@ritchie46
Copy link
Member

IMO it doesn't matter that miri doesn't run for all tests. If it can do that one day, great. But I don't want to reduce perf, so we can run a test.

@ibENPC ibENPC mentioned this pull request Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants