Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible problem with Atom processors and conda-forge builds #67

Open
greglandrum opened this issue Feb 16, 2021 · 10 comments
Open

Possible problem with Atom processors and conda-forge builds #67

greglandrum opened this issue Feb 16, 2021 · 10 comments

Comments

@greglandrum
Copy link
Contributor

This issue filed in the RDKit repo seems to be connected with the conda-forge builds on Atom processors:
rdkit/rdkit#3814

I don't have access to an Atom processor to test on and I'm wondering if any of you has seen something like this before.

@jaimergp
Copy link
Member

jaimergp commented Feb 16, 2021

Hm, this opened a little can of worms, I am afraid.

The gist is that we shouldn't be optimizing with AVX unless RDKit gracefully falls back to a different implementation at runtime. If this cannot happen, we need to rebuild the packages without AVX, which might incur in a performance penalty for those users that could benefit from them.

What do you think @greglandrum? Is there an easy way RDKit could handle that or do we need to rebuild?

In the future, archspec-enabled packages will help handle different builds with and without AVX, but this is not ready yet.

@chrisburr
Copy link
Member

The issue isn't with AVX, it's POPCNT (which often comes with SSE4.x). This package is explicitly enabling popcnt[1] which isn't supported by the CPU shown in the issue. The conda-forge recipe should be changed to use -DRDK_OPTIMIZE_POPCNT=OFF by default (it can be renabled once the archspec stuff is usable).

[1] https://github.com/rdkit/rdkit/blob/bc4d5478478542045a0f89e28a5da8b6d1f0e5d0/CMakeLists.txt#L68-L76
[2] https://ark.intel.com/content/www/us/en/ark/products/58917/intel-atom-processor-n2800-1m-cache-1-86-ghz.html#AdvancedTechnologies

@greglandrum
Copy link
Contributor Author

Assuming that the problem @gokhantahil is actually due to the use of an AVX instruction (which we don't know for sure):
I don't think there's anything we can do on the RDKit side.
The example which is causing the problem is in the SMILES parser, and neither that nor anything it calls intentionally uses architecture-specific code. It could be something in boost::dynamic_bitset, but that's just a guess.

As for what should be done on the conda-forge side: I'm really not sure. I guess this isn't a problem which comes up a lot (this is the first report I can remember of this type of problem), but at the same time I think conda-forge is intended to be a generally useful distribution... so maybe it doesn't make sense to be aggressive with the compile flags and to disable RDK_OPTIMIZE_POPCNT for all platforms?

@mcs07
Copy link
Contributor

mcs07 commented Feb 16, 2021

It seems a shame to cause a performance hit to the vast majority of users, given how widespread popcnt support is? I fall on the side of the conda-forge package catering to the most widespread use case, rather than the lowest common denominator. Especially seeing as we've lasted this long without issues and it seems archspec is coming soon to solve the problem?

@jaimergp
Copy link
Member

What about providing a popcnt-less version for now that is deprioritized with track_features? The default would still be the one with popcnt, but affected users could do conda install rdkit=*=*nopopcnt?

@jaimergp
Copy link
Member

jaimergp commented Feb 16, 2021

We need the variant with and without popcnt anyway. When archspec comes the variant selection would be automatic, but that's it.

@greglandrum
Copy link
Contributor Author

@jaimergp : do you have any sense as to whether or not this is likely to be resolvable?

@jaimergp
Copy link
Member

jaimergp commented Jun 9, 2021

The infrastructure for archspec variants is still being worked on. We could fix our issue with popcnt but I am unsure if that's worth it given the reduced impact. How is the PyPI release being compiled? Does it include pocpnt?

Basically, this requires a couple hours of work and I am currently super busy, but if this is one of those issues that are important but not urgent, I will tackle it eventually!

@greglandrum
Copy link
Contributor Author

I don't think it's particularly urgent. I asked because it came up again this week in rdkit/rdkit#4219 with someone using a really old Intel processor.
I'm not sure supporting 14 year old CPUs is that critical, but on the other side, it might be interesting to see if enabling modern CPU features could improve performance.

@jaimergp
Copy link
Member

jaimergp commented Jun 9, 2021

In that case I'd suggest we wait for an official archspec based solution instead of handcrafting something here on the conda-forge channel. The rdkit channel could benefit from more esoteric variants if needed in the meantime; I don't have bandwidth for that but I can provide pointers on how to start!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants