Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix seed with multiple ranks #4479

Merged
merged 8 commits into from
Dec 25, 2024
Merged

Conversation

njzjz
Copy link
Member

@njzjz njzjz commented Dec 19, 2024

Fix #4440.

Summary by CodeRabbit

  • New Features

    • Enhanced seed handling to support both single integers and lists for improved randomness in distributed training.
    • Added logging for neighbor statistics calculation during training.
  • Bug Fixes

    • Improved error handling in data loading processes to ensure robustness.
  • Documentation

    • Updated documentation for methods related to seed and batch size management for clarity.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

deepmd/pt/utils/dataloader.py Show resolved Hide resolved
Copy link
Contributor

coderabbitai bot commented Dec 19, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces modifications to seed handling across multiple files in the DeePMD-kit codebase. The primary change involves updating the random seed generation mechanism to improve safety and randomness in distributed training scenarios. Instead of simply adding the rank to the seed, the new implementation uses a list containing the rank and a modified seed value. This change is consistently applied across TensorFlow, PyTorch, and other entrypoint files to ensure more robust random number generation.

Changes

File Change Summary
deepmd/pd/entrypoints/main.py Modified prepare_trainer_input_single to use a list for rank_seed
deepmd/pt/entrypoints/main.py Updated prepare_trainer_input_single with new rank_seed calculation
deepmd/pt/utils/dataloader.py Enhanced setup_seed to handle list/tuple seeds, improved batch size handling
deepmd/tf/entrypoints/train.py Modified seed initialization logic, added logging for neighbor statistics
deepmd/utils/random.py Updated seed function signature to accept integer or list of integers
deepmd/pd/utils/dataloader.py Updated setup_seed to accept list/tuple seeds and added mix_entropy function

Assessment against linked issues

Objective Addressed Explanation
Fix unsafe seed generation in parallel training [#4440]
Prevent overlapping worker seeds

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
deepmd/utils/random.py (1)

Line range hint 59-65: Update docstring to reflect new parameter type

The docstring should be updated to document that val can now be either an integer or a list of integers.

 def seed(val: Optional[Union[int, list[int]]] = None) -> None:
     """Seed the generator.
 
     Parameters
     ----------
-    val : int
+    val : Optional[Union[int, list[int]]]
         Seed.
     """
     _RANDOM_GENERATOR.seed(val)
deepmd/pt/utils/dataloader.py (1)

53-57: Simplify code using ternary operator

The if-else block can be simplified using a ternary operator.

-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed
🧰 Tools
🪛 Ruff (0.8.2)

53-56: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


54-54: Undefined name mix_entropy

(F821)

deepmd/pd/entrypoints/main.py (1)

126-126: Improved seed handling for distributed training

The new seed calculation [rank, seed % (2**32)] is a robust improvement that:

  • Prevents potential seed collisions between ranks that could occur with simple addition
  • Ensures reproducibility while maintaining unique seeds per rank
  • Guards against integer overflow with modulo operation

Consider documenting this seed handling pattern in the codebase's distributed training guide to ensure consistent usage across different implementations.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0914e1 and 124f432.

📒 Files selected for processing (5)
  • deepmd/pd/entrypoints/main.py (1 hunks)
  • deepmd/pt/entrypoints/main.py (1 hunks)
  • deepmd/pt/utils/dataloader.py (1 hunks)
  • deepmd/tf/entrypoints/train.py (1 hunks)
  • deepmd/utils/random.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/dataloader.py

53-56: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


54-54: Undefined name mix_entropy

(F821)

🔇 Additional comments (3)
deepmd/tf/entrypoints/train.py (1)

224-226: LGTM: Improved seed handling for distributed training

The change ensures unique seeds across different ranks by combining rank and seed into a list. This prevents potential seed collisions in distributed training scenarios.

deepmd/pt/entrypoints/main.py (2)

141-141: LGTM: Consistent seed handling across backends

The implementation matches the PaddlePaddle version, ensuring consistent seed handling behavior across different backends. This is crucial for reproducibility in distributed training scenarios.


141-141: Verify consistent seed handling across codebase

Let's verify that all seed handling implementations across the codebase follow this new pattern.

✅ Verification successful

Based on the search results, I can now generate the final response:

Seed handling is consistent across implementations

The seed handling implementation is consistent between PyTorch and PaddlePaddle implementations:

  • Both deepmd/pt/entrypoints/main.py and deepmd/pd/entrypoints/main.py use identical seed handling:
    rank_seed = [rank, seed % (2**32)] if seed is not None else None
  • The seed is correctly passed to DpLoaderSet in both implementations with the same pattern
  • No other locations in the codebase require similar rank-based seed handling updates
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for other potential seed handling implementations that might need updates

# Look for seed-related patterns in Python files
echo "Searching for seed handling patterns..."
rg -l "seed.*=.*rank" --type py

# Look for DpLoaderSet usage with seed parameter
echo "Searching for DpLoaderSet instantiations..."
rg "DpLoaderSet\(.*seed.*\)" --type py

Length of output: 320


Script:

#!/bin/bash
# Let's examine the seed handling in both files to verify consistency

echo "Checking seed handling in PyTorch implementation:"
rg -A 3 -B 3 "seed.*=.*rank" deepmd/pt/entrypoints/main.py

echo -e "\nChecking seed handling in PaddlePaddle implementation:"
rg -A 3 -B 3 "seed.*=.*rank" deepmd/pd/entrypoints/main.py

# Let's also check for any other seed-related patterns
echo -e "\nChecking other seed-related patterns:"
rg "seed" --type py -C 2

Length of output: 74170

deepmd/pt/utils/dataloader.py Show resolved Hide resolved
@njzjz njzjz linked an issue Dec 19, 2024 that may be closed by this pull request
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/pt/utils/dataloader.py (2)

56-61: Add input validation for seed parameter

While the seed handling logic is correct, consider adding validation for None or invalid types to make the function more robust.

 def setup_seed(seed) -> None:
+    if seed is None:
+        return
     if isinstance(seed, (list, tuple)):
         mixed_seed = mix_entropy(seed)
     else:
+        if not isinstance(seed, (int, np.integer)):
+            raise TypeError(f"Seed must be an integer or a collection of integers, got {type(seed)}")
         mixed_seed = seed
     torch.manual_seed(mixed_seed)
     torch.cuda.manual_seed_all(mixed_seed)
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


56-59: Simplify code using ternary operator

The if-else block can be simplified using a ternary operator for better readability.

-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 124f432 and cf6e6e2.

📒 Files selected for processing (1)
  • deepmd/pt/utils/dataloader.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/dataloader.py

56-59: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

🔇 Additional comments (2)
deepmd/pt/utils/dataloader.py (2)

39-41: LGTM: Clean import addition

The import of mix_entropy is properly placed and necessary for the enhanced seed handling functionality.


56-61: Verify seed consistency across distributed processes

The seed handling changes look good, but let's verify the consistency across distributed processes.

✅ Verification successful

Based on the search results, I can now generate the final response:

Seed handling is properly implemented for distributed training

The changes ensure consistent random number generation across distributed processes by:

  1. Using mix_entropy to handle seed lists/tuples in distributed scenarios
  2. Setting both CPU and CUDA seeds consistently via torch.manual_seed and torch.cuda.manual_seed_all
  3. Properly integrating with PyTorch's distributed sampler which handles seed synchronization across ranks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions or inconsistencies in seed handling across distributed processes

# Search for other seed-related code in the codebase
echo "Checking for other seed-related code:"
rg "setup_seed|seed\s*=|\.seed\(" --type python

# Search for distributed training related code that might interact with seed
echo "Checking distributed training code that might affect seed:"
rg "DistributedSampler|dist\.is_initialized|get_rank" --type python -A 3

Length of output: 398


Script:

#!/bin/bash
# Let's try without file type filtering since python extension isn't recognized
echo "Checking for other seed-related code:"
rg "setup_seed|seed\s*=|\.seed\("

echo -e "\nChecking distributed training code that might affect seed:"
rg "DistributedSampler|dist\.is_initialized|get_rank"

# Also check for any other random state management
echo -e "\nChecking for random state management:"
rg "random\.(seed|get_state|set_state)|numpy\.(random|seed)|torch\.(manual_seed|cuda\.manual_seed)"

Length of output: 52957

🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/pd/utils/dataloader.py (1)

33-33: Consider consolidating common utilities

The import of mix_entropy from PyTorch utils (deepmd.pt.utils.utils) into a PaddlePaddle module suggests shared functionality. Consider moving common utilities to a framework-agnostic location (e.g., deepmd.utils.common) to improve maintainability.

Also applies to: 40-42

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf6e6e2 and 63dc3d3.

📒 Files selected for processing (1)
  • deepmd/pd/utils/dataloader.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pd/utils/dataloader.py

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

🔇 Additional comments (1)
deepmd/pd/utils/dataloader.py (1)

56-63: Seed handling implementation looks good

The implementation properly handles both scalar and composite seeds, ensuring deterministic behavior across PaddlePaddle, cuDNN, and DeePMD components.

Consider these style improvements:

-    if isinstance(seed, (list, tuple)):
-        mixed_seed = mix_entropy(seed)
-    else:
-        mixed_seed = seed
+    mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed
-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"

Let's verify the seed handling consistency across frameworks:

✅ Verification successful

Seed handling is consistent across frameworks with proper deterministic behavior

The verification confirms that seed handling is implemented consistently across frameworks:

  • PaddlePaddle implementation (deepmd/pd/utils/dataloader.py):

    • Uses mix_entropy for composite seeds
    • Sets paddle.seed and dp_random.seed
    • Enables cuDNN deterministic mode
  • PyTorch implementation (deepmd/pt/utils/dataloader.py):

    • Identical logic for composite seeds
    • Sets torch.manual_seed, torch.cuda.manual_seed_all, and dp_random.seed
    • Enables cuDNN deterministic mode

Both implementations follow the same pattern and ensure deterministic behavior across all components.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check seed handling implementation across frameworks
echo "Checking seed handling in TensorFlow, PyTorch, and PaddlePaddle implementations..."

# Search for setup_seed function implementations
echo "=== Setup Seed Implementations ==="
rg -A 10 "def setup_seed" --type py

# Search for seed-related function calls
echo "=== Seed Usage Patterns ==="
rg "setup_seed|mix_entropy|dp_random.seed" --type py

Length of output: 3541

🧰 Tools
🪛 Ruff (0.8.2)

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
deepmd/pd/utils/dataloader.py (4)

39-44: Consider implications of cross-module dependency

The import of mix_entropy from the PyTorch utilities (deepmd.pt.utils) into a Paddle module creates a cross-framework dependency. While this might be intentional for code reuse, it could make maintenance more challenging.

Consider moving shared utilities like mix_entropy to a common module (e.g., deepmd.utils) to maintain better separation between framework-specific code.


58-65: Add type hints and update docstring

The function now accepts both scalar and sequence seeds, but this isn't documented. Consider adding type hints and updating the docstring.

-def setup_seed(seed):
+def setup_seed(seed: Union[int, list, tuple]) -> None:
+    """Set up random seeds for reproducibility.
+    
+    Parameters
+    ----------
+    seed : Union[int, list, tuple]
+        Random seed. If a sequence is provided, it will be mixed using entropy
+        mixing to generate a single seed, useful for distributed training.
+    """
🧰 Tools
🪛 Ruff (0.8.2)

59-62: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


64-64: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)


63-64: Document the environment variable

The FLAGS_cudnn_deterministic environment variable's purpose and impact should be documented.

Consider adding a comment explaining why this flag is set and its performance implications:

     paddle.seed(mixed_seed)
+    # Enable deterministic cudnn algorithms for reproducibility
+    # Note: This may impact performance
     os.environ["FLAGS_cudnn_deterministic"] = "True"
🧰 Tools
🪛 Ruff (0.8.2)

64-64: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)


64-64: Follow paddle's environment variable convention

The environment variable should be capitalized to follow paddle's convention.

-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"
🧰 Tools
🪛 Ruff (0.8.2)

64-64: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63dc3d3 and c2884d0.

📒 Files selected for processing (1)
  • deepmd/pd/utils/dataloader.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pd/utils/dataloader.py

59-62: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


64-64: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

🔇 Additional comments (1)
deepmd/pd/utils/dataloader.py (1)

65-65: Verify seed propagation in distributed training

The seed propagation to dp_random needs verification in distributed training scenarios.

✅ Verification successful

Let me gather more information about distributed training implementation to verify the seed propagation.


Based on the analysis of the shell script outputs, I can now provide a final response regarding the seed propagation in distributed training:

Seed propagation is properly handled in distributed scenarios

The codebase shows proper seed handling across different distributed training implementations:

  • TensorFlow (Horovod): Seeds are properly set with rank awareness via dp_random.seed([run_opt.my_rank, seed]) in distributed mode
  • PyTorch (DDP): Seeds are comprehensively set through torch.manual_seed, torch.cuda.manual_seed_all, and dp_random.seed
  • PaddlePaddle: Seeds are properly set through paddle.seed and dp_random.seed

Each framework ensures deterministic behavior by:

  • Using rank-aware seeding in distributed mode
  • Setting framework-specific RNG seeds
  • Enabling deterministic operations (e.g., cudnn.deterministic = True)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if dp_random seed is used consistently across the codebase

# Search for dp_random.seed usage
echo "Checking dp_random.seed usage:"
rg "dp_random\.seed"

# Search for potential random number generation
echo "\nChecking random number generation:"
rg "random\.(rand|randn|choice|shuffle)"

Length of output: 2843


Script:

#!/bin/bash
# Check distributed training implementation and seed handling

# Search for distributed training related code
echo "Checking distributed training implementation:"
rg -A 5 "(DistributedDataParallel|distributed\.launch|distributed\.init|horovod)"

# Search for rank or world_size related code near dp_random
echo "\nChecking rank handling near dp_random:"
rg -B 3 -A 3 "dp_random" --context-separator="---"

# Search for seed handling in training files
echo "\nChecking seed handling in training files:"
fd "train.*\.py" --exec rg -l "seed|random"

Length of output: 28728

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/pd/utils/dataloader.py (2)

39-42: Consider consolidating shared utilities

The import of mix_entropy from the PyTorch utilities (deepmd.pt) into the Paddle module suggests shared functionality. Consider moving shared utilities to a common module to maintain better separation of concerns.


62-62: Use uppercase for environment variable name

Following Python conventions, environment variables should be uppercase.

-    os.environ["FLAGS_cudnn_deterministic"] = "True"
+    os.environ["FLAGS_CUDNN_DETERMINISTIC"] = "True"
🧰 Tools
🪛 Ruff (0.8.2)

62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2884d0 and 2d4a8e9.

📒 Files selected for processing (1)
  • deepmd/pd/utils/dataloader.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pd/utils/dataloader.py

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

🔇 Additional comments (1)
deepmd/pd/utils/dataloader.py (1)

56-63: Verify seed consistency across distributed processes

The implementation looks correct for handling both single and collection seeds. However, we should verify that this provides consistent results across distributed processes.

✅ Verification successful

Seed handling implementation is correct and consistent across distributed processes

The verification shows that:

  1. Both PyTorch and Paddle implementations use identical mix_entropy functions for handling seed collections
  2. The seed handling is framework-agnostic and properly integrated with distributed training
  3. The implementation maintains consistency by:
    • Using mixed seed for framework-specific RNG (paddle.seed/torch.manual_seed)
    • Using original seed for dp_random
    • Setting deterministic behavior flags
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other seed handling implementations to ensure consistency
# Look for other setup_seed implementations
rg -A 5 "def setup_seed" 

# Look for other uses of mix_entropy
rg "mix_entropy\(" 

Length of output: 1166


Script:

#!/bin/bash
# Check mix_entropy implementation in both pd and pt utils
ast-grep --pattern 'def mix_entropy($$$)'

# Check for any distributed training related seed handling
rg -B 2 -A 5 "distributed|world_size|local_rank" deepmd/pd/utils/dataloader.py deepmd/pt/utils/dataloader.py

Length of output: 3903


Script:

#!/bin/bash
# Check mix_entropy implementation in both utils files
cat deepmd/pd/utils/utils.py deepmd/pt/utils/utils.py | rg -A 10 "def mix_entropy"

# Check if there's any rank-specific seed handling
rg -B 2 -A 5 "get_rank|local_rank.*seed" deepmd/pd/utils/dataloader.py deepmd/pt/utils/dataloader.py

Length of output: 3229

🧰 Tools
🪛 Ruff (0.8.2)

57-60: Use ternary operator mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed instead of if-else-block

Replace if-else-block with mixed_seed = mix_entropy(seed) if isinstance(seed, (list, tuple)) else seed

(SIM108)


62-62: Use capitalized environment variable FLAGS_CUDNN_DETERMINISTIC instead of FLAGS_cudnn_deterministic

Replace FLAGS_cudnn_deterministic with FLAGS_CUDNN_DETERMINISTIC

(SIM112)

Copy link

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (c0914e1) to head (2d4a8e9).
Report is 9 commits behind head on devel.

Files with missing lines Patch % Lines
deepmd/tf/entrypoints/train.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4479   +/-   ##
=======================================
  Coverage   84.41%   84.42%           
=======================================
  Files         670      670           
  Lines       62147    62158   +11     
  Branches     3487     3486    -1     
=======================================
+ Hits        52464    52474   +10     
- Misses       8556     8558    +2     
+ Partials     1127     1126    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@njzjz njzjz added this to the v3.0.1 milestone Dec 21, 2024
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Dec 25, 2024
@njzjz njzjz removed this from the v3.0.1 milestone Dec 25, 2024
Merged via the queue into deepmodeling:devel with commit f8605ee Dec 25, 2024
60 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Dec 26, 2024
Fix a typo introduced in #4479, which will cause an error if torch is
not installed.

![image](https://github.com/user-attachments/assets/3f9b0955-ec2b-46ac-9efa-1ed4b4e023eb)

cc @njzjz 

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Updated the import path for the `mix_entropy` function to reflect a
reorganization of utility functions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Chun Cai <amoycaic@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] seed is unsafe in TF parallel training
3 participants