Problem: gen-tx don't run in parallel for single node #1645

yihuang · 2024-10-17T04:18:03Z

Solution:

Use the multiprocessing library to generate test transactions in parallel.

Summary by CodeRabbit

New Features
- Enhanced transaction generation process with parallelization for improved performance.
- New function to split ranges into equal parts for better job distribution.
Bug Fixes
- Resolved various issues related to transaction validation and multisig accounts.
- Improved handling of acknowledgment processes and governance parameters.
Documentation
- Updated CHANGELOG.md to reflect recent changes and improvements.

Signed-off-by: yihuang <huang@crypto.com>

codecov · 2024-10-17T04:23:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 17.87%. Comparing base (3b38bcc) to head (84eb4e6).
Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1645   +/-   ##
=======================================
  Coverage   17.87%   17.87%           
=======================================
  Files          72       72           
  Lines        5170     5170           
=======================================
  Hits          924      924           
  Misses       4123     4123           
  Partials      123      123


coderabbitai · 2024-10-17T07:17:22Z

Walkthrough

The pull request updates the CHANGELOG.md to document enhancements in transaction generation and various bug fixes. It introduces parallel transaction generation in transaction.py, utilizing multiprocessing for efficiency. Additionally, a new utility function for splitting ranges is added in utils.py. The changes collectively aim to improve performance, enhance testing capabilities, and fix critical issues in the system.

Changes

| File Path | Change Summary |
| --- | --- |
| CHANGELOG.md | Updated with new entry for parallel test transactions and improvements in `v1.4.0-rc1`, including bug fixes. |
| testground/benchmark/benchmark/transaction.py | Refactored `gen` function for parallel transaction generation; added `Job` named tuple for job handling. |
| testground/benchmark/benchmark/utils.py | Added `split(a: int, n: int)` function to split a range into `n` parts. |

Possibly related PRs

Problem: testground test case not fast enough #1495: This PR addresses performance issues in testground, which may relate to the parallel transaction generation mentioned in the main PR.
Problem: testground infra is not easy to setup #1504: Changes in this PR involve refactoring transaction generation and funding, which connects to the enhancements in transaction handling in the main PR.
Problem: configs are baked in testground #1561: This PR introduces a new parameter for controlling concurrency in transaction sending, which aligns with the parallel transaction generation improvements in the main PR.
Problem: test transactions can't be saved and reused #1575: The restructuring of transaction handling in this PR could relate to the parallel transaction generation capabilities discussed in the main PR.
Problem: multi-threading tx sending not efficient #1587: This PR focuses on improving the efficiency of transaction sending, which is relevant to the enhancements in transaction generation in the main PR.
Problem: test tx generation can't be run in parallel #1596: The changes in this PR regarding transaction generation could be related to the parallelization improvements in the main PR.
Problem: get unnecessary block result when only need header #1600: This PR updates the changelog to reflect improvements in the ethermint component, which is relevant to the overall enhancements documented in the main PR.
Problem: no compatible pebble db supported #1606: The updates regarding PebbleDB support may relate to the overall improvements in functionality and performance in the main PR.

Suggested reviewers

mmsqe
calvinaco

🐇 "In fields of code where bunnies play,
New transactions hop in a parallel way.
With jobs in chunks, they swiftly align,
Bugs fixed and features, all looking fine!
So let’s celebrate with a joyful cheer,
For a robust system, we hold so dear!" 🐇

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)

testground/benchmark/benchmark/utils.py (1)
176-181: LGTM! Consider a more specific function name.

The split function is well-implemented and efficiently splits a range into n parts. The logic is correct and handles uneven divisions properly. The docstring and type hints improve readability and maintainability.

Consider renaming the function to something more specific, like split_range or partition_range, to avoid potential confusion with the built-in split method for strings and to better describe its purpose.
-def split(a: int, n: int):
+def split_range(a: int, n: int):
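For readers without the diff at hand, here is a minimal sketch of what such a range-splitting helper could look like. It is an illustration, not the code merged in this PR: the name `split_range` follows the rename suggested above, and the half-open `(start, end)` chunk format and the `ValueError` guard are assumptions.

```python
def split_range(a: int, n: int) -> list:
    """Split range(0, a) into n contiguous half-open (start, end) chunks.

    Chunk sizes differ by at most one; the first (a % n) chunks get the
    extra element. Assumed behaviour, not taken from the PR itself.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k, m = divmod(a, n)  # k = base chunk size, m = number of larger chunks
    return [(i * k + min(i, m), (i + 1) * k + min(i + 1, m)) for i in range(n)]
```

With `a=10` and `n=3` this yields `[(0, 4), (4, 7), (7, 10)]`. When `n` exceeds `a`, the trailing chunks are empty ranges such as `(2, 2)`, which a caller can pass to `range(*chunk)` without special handling.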
CHANGELOG.md (1)
7-8: LGTM! Consider fixing the PR link format.

The addition of parallel test transaction generation for single nodes is a valuable improvement that should enhance testing efficiency.

Consider updating the format of the second entry to match the first one:
-* (testground)[#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.
+* [#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.
This will make the PR link consistent with the other entries in the changelog.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~7-~7: Possible missing comma found.
Context: ...-chain/cronos/pull/1645) Gen test tx in parallel even in single node. * (testground)[#16...

(AI_HYDRA_LEO_MISSING_COMMA)
testground/benchmark/benchmark/transaction.py (2)
68-69: Use the logging module for thread-safe output in multiprocessing

Using print statements within multiprocessing code can result in jumbled or out-of-order console output due to concurrent access to stdout. The logging module is thread-safe and provides better control over log messages.

Consider replacing print with the logging module:
+    import logging
+
+    # Configure logging at the beginning of your script or module
+    logging.basicConfig(level=logging.INFO)
+
     for acct, txs in zip(accounts, acct_txs):
         for nonce in range(job.num_txs):
             txs.append(acct.sign_transaction(job.create_tx(nonce)).rawTransaction.hex())
             total += 1
             if total % 1000 == 0:
-                print("generated", total, "txs for node", job.global_seq)
+                logging.info(f"Generated {total} txs for node {job.global_seq}")
This change ensures that log messages are properly managed and thread-safe during multiprocessing.
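A slightly fuller sketch of the same idea follows, under stated assumptions: the logger name `gen_tx` and the helpers `configure_logging` and `report_progress` are hypothetical and do not appear in the PR. Note that the standard `logging` module is thread-safe but does not by itself serialize writes across separate processes, so the format below includes `%(processName)s` to make each line attributable to its worker.

```python
import logging

logger = logging.getLogger("gen_tx")  # hypothetical logger name


def configure_logging(level: int = logging.INFO) -> None:
    """Call once per process; the process name tags each line with its worker."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(processName)s %(levelname)s %(message)s",
    )


def report_progress(total: int, global_seq: int, every: int = 1000) -> bool:
    """Log one progress line each time `total` reaches a multiple of `every`.

    Returns True when a line was logged, which keeps the helper easy to test.
    """
    if total % every == 0:
        logger.info("generated %d txs for node %d", total, global_seq)
        return True
    return False
```

Passing the values as arguments rather than through an f-string defers string formatting until a handler actually emits the record.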

1-5: Remove unused imports if not necessary

The imports asyncio and aiohttp are used for asynchronous operations later in the file, but please ensure that all imported modules are necessary for this script.

If any imports are not used, consider removing them to keep the code clean.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3b38bcc and 8fd2ae7.

📒 Files selected for processing (3)

CHANGELOG.md (1 hunks)
testground/benchmark/benchmark/transaction.py (3 hunks)
testground/benchmark/benchmark/utils.py (1 hunks)

🔇 Additional comments (1)

testground/benchmark/benchmark/utils.py (1)
176-181: Verify the usage of the new function in the codebase.

The new split function looks good, but it's important to ensure it's being used correctly throughout the project.

Let's check for its usage:

This will help us understand how and where the function is being used, ensuring it's implemented correctly in other parts of the codebase.

✅ Verification successful

Adjusting the search to target .py files explicitly.

The usage of the new split function has been verified and is correctly utilized in the codebase.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Search for usage of the split function in Python files.

# Test: Look for import statements or direct usage of the split function
rg --type python -e "from.*utils.*import.*split" -e "utils\.split\(" -e "split\([^\"']" -C 3
Length of output: 127

Script:
#!/bin/bash
# Description: Search for usage of the split function in Python files by targeting .py extensions.

# Test: Look for import statements or direct usage of the split function
rg -g "*.py" -e "from\s+.*utils\s+import\s+split" -e "utils\.split\(" -e "split\([^\"']" -C 3
Length of output: 2769

testground/benchmark/benchmark/transaction.py

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

testground/benchmark/benchmark/transaction.py (1)
74-89: LGTM with a minor suggestion: Parallel transaction generation

The refactoring of the gen function to use multiprocessing is well-implemented and should significantly improve performance for transaction generation. The approach of splitting work into chunks based on CPU count is appropriate.

One minor suggestion:
When combining results using itertools.zip_longest, you might introduce None values if job results have different lengths. Consider filtering out these None values:
all_txs = []
for txs in itertools.zip_longest(*itertools.chain(*acct_txs)):
    all_txs.extend(filter(None, txs))
This change ensures that only valid transactions are added to all_txs.
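The round-robin merge described here can be tried in isolation. The sketch below is illustrative only: `interleave` is a hypothetical helper and it takes an already flattened list of per-account transaction lists. It tests `is not None` instead of using `filter(None, ...)`, because `filter(None, ...)` would also drop any other falsy value, such as an empty string.

```python
import itertools


def interleave(per_account_txs):
    """Merge per-account tx lists round-robin, skipping zip_longest padding."""
    merged = []
    for group in itertools.zip_longest(*per_account_txs):
        # zip_longest pads shorter lists with None; drop only that padding.
        merged.extend(tx for tx in group if tx is not None)
    return merged
```

For the input `[["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]` the result is `["a0", "b0", "c0", "a1", "c1", "a2"]`.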

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8fd2ae7 and 84eb4e6.

📒 Files selected for processing (1)

testground/benchmark/benchmark/transaction.py (3 hunks)

🔇 Additional comments (3)

testground/benchmark/benchmark/transaction.py (3)

2-5: LGTM: New imports for parallel processing

The added imports (itertools, multiprocessing, os, and namedtuple from collections) are appropriate for implementing parallel transaction generation. These additions align well with the PR's objective of enabling parallel execution for gen-tx.

55-57: LGTM: Well-structured Job namedtuple

The Job namedtuple is well-defined with all necessary fields for parallel transaction generation. This structure will help in organizing and passing job-related data efficiently between processes.
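As a rough illustration of passing named-tuple jobs to worker processes, consider the sketch below. It is not the PR's code: the `chunk` field, `fake_sign`, `do_job`, and `run_jobs` are assumptions made for the example, and real signing is replaced by a placeholder string. Only `num_txs` and `global_seq` are field names visible in the review excerpts.

```python
import multiprocessing
from collections import namedtuple

# Hypothetical field layout for illustration.
Job = namedtuple("Job", ["chunk", "global_seq", "num_txs"])


def fake_sign(global_seq, account, nonce):
    """Stand-in for real transaction signing; returns a placeholder string."""
    return f"tx-{global_seq}-{account}-{nonce}"


def do_job(job):
    """Build one tx list per account in job.chunk, a half-open (start, end) range."""
    start, end = job.chunk
    return [
        [fake_sign(job.global_seq, acct, nonce) for nonce in range(job.num_txs)]
        for acct in range(start, end)
    ]


def run_jobs(jobs, workers=1):
    """Run jobs serially when workers <= 1, otherwise in a process pool."""
    if workers <= 1:
        return [do_job(job) for job in jobs]
    with multiprocessing.Pool(workers) as pool:
        # pool.map pickles each Job; namedtuples of plain values pickle cleanly.
        return pool.map(do_job, jobs)
```

When `workers` is greater than 1, call `run_jobs` from under an `if __name__ == "__main__":` guard, since platforms that start workers with the spawn method re-import the main module.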

Line range hint 1-91: Overall: Excellent implementation of parallel transaction generation

The changes made to this file successfully implement parallel transaction generation, which aligns perfectly with the PR's objective. Key points:

The use of multiprocessing should significantly improve performance, especially for large numbers of transactions.

The code structure is clean and maintainable, with clear separation of concerns between job definition, execution, and result aggregation.

The core logic of transaction creation remains intact, minimizing the risk of introducing new bugs.

These changes should result in a substantial performance improvement for the gen-tx process when running on a single node with multiple CPU cores.

To further validate the improvements:

This script will help verify that the parallel implementation indeed provides a performance boost.

testground/benchmark/benchmark/transaction.py

Problem: gen-tx don't run in parallel for single node

c1911b1

Solution: - use multiprocessing library to do parallel tx gen

yihuang requested a review from a team as a code owner October 17, 2024 04:18

yihuang requested review from JayT106 and thomas-nguy and removed request for a team October 17, 2024 04:18

yihuang commented Oct 17, 2024

Update CHANGELOG.md

a4750a2

Signed-off-by: yihuang <huang@crypto.com>

yihuang requested a review from mmsqe October 17, 2024 04:21

mmsqe approved these changes Oct 17, 2024

yihuang added this pull request to the merge queue Oct 17, 2024

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Oct 17, 2024

Merge branch 'main' into parallel-gen-tx

8fd2ae7

Signed-off-by: yihuang <huang@crypto.com>

yihuang enabled auto-merge October 17, 2024 07:17

coderabbitai bot reviewed Oct 17, 2024

yihuang added this pull request to the merge queue Oct 17, 2024

yihuang removed this pull request from the merge queue due to a manual request Oct 17, 2024

cleanup

84eb4e6

coderabbitai bot reviewed Oct 17, 2024

yihuang added this pull request to the merge queue Oct 17, 2024

Merged via the queue into crypto-org-chain:main with commit f3746f6 Oct 17, 2024
35 checks passed

This was referenced Oct 17, 2024

Problem: node can't shutdown by signal #1647

Merged

Problem: single validator benchmark can't run natively #1649

Merged

Problem: benchmark don't support batch tx #1650

Merged
