ARROW-4025: [Python] Avoid pthread_once in ThreadPool #3177
Conversation
cpp/src/arrow/util/thread-pool.cc
Outdated
@@ -294,9 +294,14 @@ std::shared_ptr<ThreadPool> ThreadPool::MakeCpuThreadPool() {
  return pool;
}

ThreadPool* arrow_global_thread_pool = nullptr;
This may not be needed any more, I'll check that.
(this turns out not to be needed after all)
To test this, the manylinux1 base image needs to be rebuilt (in …) and then wheels can be generated with …
Yuck. We don't want to bring in Boost just to work around TensorFlow's standard violations. If you think this should be solved, I encourage you to find another way.
cpp/src/arrow/util/thread-pool.cc
Outdated
ThreadPool* GetCpuThreadPool() {
  static std::shared_ptr<ThreadPool> singleton = ThreadPool::MakeCpuThreadPool();
  return singleton.get();
  if (!arrow_global_thread_pool) {
This is not thread-safe. There is a reason why I used a static singleton, and that reason is the thread safety of its initialization.
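For illustration, a minimal sketch of the difference (Pool is a stand-in for the real ThreadPool; this relies on C++11's guarantee that a function-local static is initialized exactly once, even under concurrent first calls):

```cpp
#include <memory>

struct Pool {
  // stand-in for arrow::ThreadPool
};

// Thread-safe: C++11 guarantees the function-local static is initialized
// exactly once, even if several threads hit the first call concurrently.
Pool* GetPoolSafe() {
  static std::shared_ptr<Pool> singleton = std::make_shared<Pool>();
  return singleton.get();
}

// Racy: two threads can both observe nullptr and both allocate, leaking one
// Pool; the unsynchronized pointer write is also a data race.
Pool* g_pool = nullptr;
Pool* GetPoolRacy() {
  if (g_pool == nullptr) {
    g_pool = new Pool();
  }
  return g_pool;
}
```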
I think we can even demote it to a unique_ptr, since the object is never re-assigned.
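A sketch of that demotion, assuming MakeCpuThreadPool were changed to hand back sole ownership (today it returns a shared_ptr, so this is not a drop-in change):

```cpp
ThreadPool* GetCpuThreadPool() {
  // unique_ptr suffices because the singleton is created once and never
  // reassigned or shared; the function-local static keeps initialization
  // thread-safe. Assumes MakeCpuThreadPool() returns a unique_ptr.
  static std::unique_ptr<ThreadPool> singleton = ThreadPool::MakeCpuThreadPool();
  return singleton.get();
}
```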
Yeah, you are both right (this was just to try out whether removing the static shared_ptr makes a difference).
cpp/src/arrow/util/thread-pool.cc
Outdated
  static std::shared_ptr<ThreadPool> singleton = ThreadPool::MakeCpuThreadPool();
  return singleton.get();
  if (!arrow_global_thread_pool) {
    arrow_global_thread_pool = new ThreadPool();
Creates a memory leak (object is never destroyed, Valgrind will complain).
I'm also -0/-1 on eschewing C++11 future/thread -- that seems hardly sustainable. I think we need to engage the TF / PyTorch communities on this. It feels like they swiped the credit card, but sent the bill to our house.
It might mean that we have to create some kind of a pip wheel to hoist a modern C++ standard library into the environment (for PyTorch / TF to depend on) so they can avoid all this static linking insanity, similar to what's being done in conda now.
Thanks for the feedback @wesm @pitrou. Here are my thoughts:

This problem really needs to be fixed. If we don't fix it, doing deep learning with Arrow via TensorFlow or PyTorch will be severely hindered, and that's a non-starter, certainly for Ray, but I think also for the adoption of pyarrow in general. The workarounds we have had so far work in some cases, but they sometimes fail, and we obviously cannot afford that.

Making TensorFlow and PyTorch manylinux1-compatible won't happen in the near future; it has been discussed several times and is non-trivial because of the need for GPU support and the NVIDIA toolchain. And even if it were fixed tomorrow, older versions of TensorFlow would still be in use for various reasons. So I fear this needs to be fixed on our side. Here are the solutions I can think of:

At least now that we know the problem comes from std::call_once and the futures, we have a way of attacking it :)

Concerning 2), I have a sketch in this PR now. It's not ready yet and there are probably bugs at the moment, but it may give you an idea of how this would look. It would entail the following, more limited API, which nevertheless covers the use cases of the ThreadPool we have at the moment: only functions …

Please let me know your thoughts / if you have further ideas on how to fix it. I'm willing to spend the time to make whatever approach we decide on work, but it really needs to be fixed :)

Best,
I just took a look... Do you realize the thread pool can be called from several threads? If you wait for the thread pool to be idle, you may be waiting much longer than necessary.
What if some other problem comes up? What if loading Gandiva (which is going to bundle LLVM) side by side with TensorFlow also causes crashes?
It may indeed. Though I'm afraid that depends on the exact libstdc++ version TensorFlow builds against?
I understand the sentiment, but it seems the only sane way to fix this on our side is to ask people to use conda packages, not wheels. If wheels are broken, just tell users instead of piling up fragile workarounds in the Arrow codebase. Actually, I would even go as far as asking distutils-sig if the non-conformant wheels can be removed from PyPI. PyPI has no policy for this currently, but it is an obvious detriment to the whole ecosystem that people are allowed to upload rogue wheels and other packages have to deal with the fallout.
By the way, it seems manylinux2010 is almost there now: pypa/manylinux#179 (i.e. it's in pip master, but not any pip releases yet, AFAICT).
cpp/src/arrow/util/memory.h
Outdated
@@ -33,7 +33,10 @@ uint8_t* pointer_logical_and(const uint8_t* address, uintptr_t bits) {

// This function is just for avoiding MinGW-w64 32bit crash.
// See also: https://sourceforge.net/p/mingw-w64/bugs/767/
void* wrap_memcpy(void* dst, const void* src, size_t n) { return memcpy(dst, src, n); }
Status wrap_memcpy(void* dst, const void* src, size_t n) {
I wouldn't wrap this with a different return type. There's a non-negligible chance that gcc can't apply the single-instruction inline-memcpy optimization. This can severely affect performance in code that does memcpy in loops.
I'd also add the static inline keywords.
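A sketch of the shape being suggested (keeping memcpy's own signature and adding static inline, so the compiler can still treat the call as a plain memcpy):

```cpp
#include <cstddef>
#include <cstring>

// Returning memcpy's result and keeping its signature leaves the compiler
// free to inline the copy (often a single instruction for small sizes),
// which switching the return type to Status could prevent.
static inline void* wrap_memcpy(void* dst, const void* src, std::size_t n) {
  return std::memcpy(dst, src, n);
}
```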
cpp/src/arrow/util/thread-pool.cc
Outdated
@@ -145,6 +145,11 @@ Status ThreadPool::Shutdown(bool wait) {
  return Status::OK();
}

void ThreadPool::Wait() {
The typical nomenclature for this is Join.
cpp/src/arrow/util/thread-pool.h
Outdated
}

void Wait();
I'd also expose the WaitFor variant with chrono (see https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for).
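For reference, a minimal sketch of what such a WaitFor could look like (member names are hypothetical; it assumes the pool already tracks outstanding tasks under a mutex and condition variable):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

class ThreadPoolWaitSketch {
 public:
  // Block until all submitted tasks have finished, or until the timeout
  // elapses. Returns true if the pool became idle within the timeout.
  template <class Rep, class Period>
  bool WaitFor(std::chrono::duration<Rep, Period> timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_idle_.wait_for(lock, timeout, [this] { return pending_tasks_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_idle_;
  int pending_tasks_ = 0;  // decremented by workers, which notify cv_idle_ at zero
};
```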
cpp/src/plasma/client.cc
Outdated
@@ -650,11 +650,12 @@ Status PlasmaClient::Impl::List(ObjectTable* objects) {
  return ReadListReply(buffer.data(), buffer.size(), objects);
}

static void ComputeBlockHash(const unsigned char* data, int64_t nbytes, uint64_t* hash) {
static Status ComputeBlockHash(const unsigned char* data, int64_t nbytes, uint64_t* hash) {
Why?
cpp/src/arrow/util/thread-pool.h
Outdated

Status st = SpawnReal(detail::packaged_task_wrapper<Result>(std::move(task)));
// Submit a callable for execution.
void Submit(std::function<Status()> func) {
I wouldn't enforce that every function must return Status; a thread pool shouldn't care about the logical data. The original code didn't enforce this; what's returned is only whether submission failed.
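A sketch of that separation (names are hypothetical): the Status returned by Submit reflects only whether the task could be queued, while the callable itself may return anything, or nothing:

```cpp
#include <functional>
#include <utility>

// Stand-in for arrow::Status.
struct Status {
  static Status OK() { return Status(); }
  bool ok() const { return true; }
};

class PoolSketch {
 public:
  // The callable's own result (if any) is the caller's business; the pool
  // only reports whether the task could be queued.
  template <typename Function>
  Status Submit(Function&& func) {
    return SpawnReal(std::function<void()>(std::forward<Function>(func)));
  }

 private:
  Status SpawnReal(std::function<void()> task) {
    // ... push task onto the queue; return non-OK if the pool is shut down ...
    (void)task;
    return Status::OK();
  }
};
```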
// std::call_once(utf8_initialized, internal::InitializeLargeTable);
// How should we do this? Maybe use
// https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Nifty_Counter
internal::InitializeLargeTable();
Note that you can refactor the utf8 stuff into a simple struct that allocates the array on the heap. Then you use the same static + unique_ptr singleton technique and you'll get rid of the once_flag.
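A minimal sketch of that refactor (names and sizes are hypothetical; the real table and its initialization live in the Arrow utf8 utilities):

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the large UTF-8 state table.
struct LargeTable {
  std::vector<uint16_t> entries;
  LargeTable() : entries(9 * 256, 0) {
    // ... fill the table exactly as InitializeLargeTable() does today ...
  }
};

// First use builds the table; the thread-safe initialization of the
// function-local static replaces the std::call_once / once_flag pair, and
// the heap allocation keeps the big array out of static storage.
const LargeTable& GetLargeTable() {
  static std::unique_ptr<LargeTable> table(new LargeTable());
  return *table;
}
```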
Thanks again for the further feedback! It seems to me changing the ThreadPool API is also suboptimal. I'd suggest the following solution:
Then we can leave it to the user which option they prefer. One sane default might be that if Boost is vendored (like in our wheels), boost::future will be used. This will fix the problem not only for Ray but also for vanilla pyarrow wheels. In the future (if manylinux2010 and/or TensorFlow and PyTorch solve the issue), we can remove the flag and just default to std::future.
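A sketch of how such a switch might look; ARROW_USE_BOOST_FUTURE is a made-up flag name, and the real wiring would live in CMake:

```cpp
// Hypothetical compile-time switch between future implementations.
#ifdef ARROW_USE_BOOST_FUTURE
// Depending on the Boost version, boost::future may require defining
// BOOST_THREAD_PROVIDES_FUTURE (or a recent BOOST_THREAD_VERSION).
#include <boost/thread/future.hpp>
template <typename T>
using ArrowFuture = boost::future<T>;
#else
#include <future>
template <typename T>
using ArrowFuture = std::future<T>;
#endif
```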
I don't think it's reasonable for TF / PyTorch to build with ad hoc C++ toolchains and linker flags and expect the rest of the Python community to conform to their decisions with zero consultation. They need to engage other projects on these toolchain issues -- they have the money and the people to put in effort to be compatible with the rest of the ecosystem, so we need to hold them accountable. There aren't that many players in this ecosystem, and we will not allow ourselves to be pushed around. I sympathize with the issues you are having, but we cannot treat these other open source projects as immutable entities. Let's talk with the development teams and come up with a solution so we can all use C++ in a consistent way and have working applications.
For the avoidance of ambiguity, @pcmoritz, I'm expecting you and @robertnishihara to take the lead on engaging with the TF / PyTorch communities on this. Please let us know if we can help.
Somehow we need to create some catalyst to cause the TF and PyTorch developers to take more responsibility for the toolchain problem they have created; this might be one way. Either way we need to escalate the issue and make a lot more people aware of what is going on.
Aside: pyarrow is being installed more frequently than PyTorch and about 1/4 as often as TF on PyPI.
Do you want me to concoct a small workaround using purely atomics that would respect the Google style guide?
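For what it's worth, a minimal sketch of an atomics-only lazy initialization (double-checked compare-and-swap; Pool is a stand-in, and it assumes constructing and discarding a losing candidate is acceptable):

```cpp
#include <atomic>

struct Pool {
  // stand-in for the real thread pool
};

std::atomic<Pool*> g_pool{nullptr};

Pool* GetPool() {
  Pool* existing = g_pool.load(std::memory_order_acquire);
  if (existing != nullptr) {
    return existing;
  }
  // Build a candidate and try to install it atomically; if another thread
  // wins the race, drop ours and use the installed one. No call_once or
  // pthread_once is involved.
  Pool* candidate = new Pool();
  if (g_pool.compare_exchange_strong(existing, candidate,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
    return candidate;
  }
  delete candidate;
  return existing;
}
```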
I don't think it's worth investing any more effort in this until we open a dialogue about the toolchain issues with the relevant developer communities.
Thanks for all the views! I'll send a post to the TensorFlow and PyTorch lists to make sure this issue is raised prominently. Hopefully the community can agree on a way forward that works for everyone. At least we should make sure that there will be compatibility with manylinux2010 in the future. I'll also prepare a patch for Ray 0.6.1 to make sure this works in the short term for our users.
@wesm thanks a lot for pushing to start a discussion and dialogue around these issues. PyTorch certainly isn't immutable and we don't expect to be treated that way.
Just FYI, PyTorch hasn't statically linked the C++ standard library since January this year -- we've been trying our best to work within the necessary constraints we have (in terms of CUDA and C++11) while supporting as old a Linux distro as we possibly can.
Closing this pending some guidance about how to proceed on creating compatible wheels |
This seems to fix the TensorFlow/PyTorch crash for me.
Some disassembly (thanks for the hint @pitrou) revealed that pthread_once is called by the C++11 future library. Replacing it with Boost seems to fix it (but unfortunately pulls in a number of other Boost dependencies like atomic, chrono, and date_time; I wonder if that is avoidable).
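For anyone who wants to double-check this, a tiny program one could compile and disassemble (e.g. with objdump -d) to look for the pthread_once / __gthread_once reference that std::future pulls in on libstdc++; this is only a sketch, and what actually shows up depends on the standard library and compiler flags:

```cpp
#include <future>
#include <iostream>

int main() {
  // On libstdc++, fulfilling a promise goes through std::call_once on the
  // shared state, which is where the pthread_once dependency comes from.
  std::promise<int> p;
  std::future<int> f = p.get_future();
  p.set_value(42);
  std::cout << f.get() << std::endl;
  return 0;
}
```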