Cryptonight variant 2 support #160

Merged
merged 3 commits into from
Sep 15, 2018

SChernykh
Contributor

@SChernykh SChernykh commented Sep 13, 2018

  • Added a new per-thread parameter "unroll_factor", which can be set to 1, 2, 4, 8, 16, 32 or 64; the default is 8.
  • The CNv2 OpenCL code is in a separate kernel because it uses 1 KB more local memory, which can hurt the performance of other variants.
  • Fixed a bug where "comp_mode" was always 1, no matter what was set in config.json.

Sample thread setting for Radeon RX 560 which gave me the best performance:

"threads": [
    {
        "index": 0,
        "intensity": 1024,
        "worksize": 32,
        "strided_index": 2,
        "mem_chunk": 2,
        "unroll_factor": 16,
        "comp_mode": false,
        "affine_to_cpu": false
    }
],

Performance was the same as in my previous GPU tests.
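
For readers who want to sanity-check a thread entry like the one above before starting the miner, here is a minimal sketch of the "unroll_factor" rule described in this PR: only 1, 2, 4, 8, 16, 32 or 64 are accepted, and 8 is used when the key is absent. The function name `resolve_unroll_factor` is hypothetical, and raising an error on a bad value is an assumption; the thread does not say how the miner itself reacts to an invalid value.

```python
# Values listed in the PR description; the default is also taken from it.
ALLOWED_UNROLL_FACTORS = (1, 2, 4, 8, 16, 32, 64)
DEFAULT_UNROLL_FACTOR = 8


def resolve_unroll_factor(thread: dict) -> int:
    """Return the unroll factor for one entry of the "threads" array.

    Hypothetical helper: rejecting bad values with ValueError is an
    assumption, not the miner's documented behaviour.
    """
    value = thread.get("unroll_factor", DEFAULT_UNROLL_FACTOR)
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or value not in ALLOWED_UNROLL_FACTORS:
        raise ValueError(
            f"unroll_factor must be one of {ALLOWED_UNROLL_FACTORS}, got {value!r}"
        )
    return value


# The RX 560 sample from this comment, as a Python dict.
rx560_thread = {
    "index": 0,
    "intensity": 1024,
    "worksize": 32,
    "strided_index": 2,
    "mem_chunk": 2,
    "unroll_factor": 16,
    "comp_mode": False,
    "affine_to_cpu": False,
}

print(resolve_unroll_factor(rx560_thread))
```

A thread entry without the key falls back to the default of 8 in this sketch.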

@SChernykh
Contributor Author

It turned out to be easier than I thought - it took only a few hours to port my OpenCL code. I only tested it on Windows with my RX 560. Command-line parameter setting was not tested; I don't know xmrig well enough to test that.

@xmrig
Owner

xmrig commented Sep 13, 2018

What about strided_index? We can't use a global variant in OclCache::load() because it breaks the ability to switch variants at runtime, so the new code should not depend on -DSTRIDED_INDEX= if it is not supported.

Don't worry about the command-line parameter, I will fix it.
Thank you.

@SChernykh
Contributor Author

Ah, I see. I need to think about how to rewrite it properly then.

@SChernykh
Contributor Author

Wow, I've added strided index support to V2 and got better performance with strided_index = 2 and mem_chunk = 2 (64 bytes), but strided_index = 1 kills V2 performance because 16 bytes is too small a granularity for V2. I'll test it some more and then submit.

Best setting is strided_index=2 and mem_chunk=2.
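
To make the granularity figures in this comment concrete, here is a small sketch that maps the two settings to a chunk size in bytes. The only values stated in this thread are 16 bytes for strided_index = 1 and 64 bytes for strided_index = 2 with mem_chunk = 2; the general formula `16 << mem_chunk` is an assumption that fits that single data point, and the function name is hypothetical.

```python
ELEMENT_BYTES = 16  # granularity quoted above for strided_index = 1


def chunk_granularity_bytes(strided_index: int, mem_chunk: int = 2) -> int:
    """Contiguous bytes per chunk for a given striding mode.

    Assumption: with strided_index = 2 a chunk holds 2**mem_chunk elements of
    16 bytes each. The thread confirms only the mem_chunk = 2 case (64 bytes).
    """
    if strided_index == 1:
        return ELEMENT_BYTES
    if strided_index == 2:
        if mem_chunk < 0:
            raise ValueError("mem_chunk must be non-negative")
        return ELEMENT_BYTES << mem_chunk
    raise ValueError("this thread gives no granularity for that strided_index")


for mode, chunk in [(1, 2), (2, 2)]:
    size = chunk_granularity_bytes(mode, chunk)
    print(f"strided_index={mode}, mem_chunk={chunk}: {size} bytes")
```

Under the same assumption, larger mem_chunk values grow the chunk quickly (mem_chunk = 4 would be 256 bytes), so it is worth measuring rather than guessing when moving away from mem_chunk = 2.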
@Bathmat

Bathmat commented Sep 14, 2018

@SChernykh Attempting to run CNv2 with dual threads and strided_index=1 results in very poor performance and/or fails. On my RX470/480 GPUs, setting mem_chunk=4 gave slightly worse performance, and mem_chunk=8 was very poor/failed.

Therefore, I would recommend strided_index=2, mem_chunk=2 as the default config.

Additionally, if the default worksize setting is 8, then I would recommend setting the default unroll_factor to 4. Seems to help slightly with performance (1% maybe?)

@xmrig xmrig merged commit 2d49675 into xmrig:dev Sep 15, 2018
@xmrig
Owner

xmrig commented Sep 15, 2018

@SChernykh Merged, thank you. A new PR for default options/better autoconfig is welcome.

xmrig added a commit that referenced this pull request Sep 21, 2018