
Align runtime arguments with run, serve, bench, and perplexity #820

Merged
merged 3 commits into containers:main from the arg-work branch
Feb 14, 2025

Conversation

@cgruver cgruver commented Feb 14, 2025

Removed the global argument --gpu

Aligned the runtime argument --ngl with the subcommands bench, perplexity, run, serve

Added arguments to the subcommand perplexity to align it with common arguments of run and serve

Summary by Sourcery

Remove the global --gpu argument and introduce new runtime arguments for the perplexity subcommand, aligning it with the run and serve subcommands. Standardize the --ngl argument across all relevant subcommands. Update the documentation to reflect these changes.

Enhancements:

  • Remove the global --gpu argument.
  • Standardize the --ngl runtime argument across bench, perplexity, run, and serve subcommands.
  • Align the perplexity subcommand with run and serve by adding common runtime arguments.

Documentation:

  • Update documentation for perplexity, run, serve, and bench commands to reflect the addition of new runtime arguments and removal of the global --gpu argument.
  • Document new runtime arguments, including --authfile, --ctx-size, --device, --name, --network, --ngl, --privileged, --seed, --temp, and --tls-verify.
  • Clarify the device specification for the --device argument in the documentation for run and serve commands.
  • Correct a typo in the device specification description for the run command documentation.

sourcery-ai bot commented Feb 14, 2025

Reviewer's Guide by Sourcery

This pull request refactors command-line arguments to improve consistency and flexibility. The global --gpu argument was removed, and the --ngl argument was moved to the run, serve, bench, and perplexity subcommands. The perplexity subcommand was updated to align with the arguments of run and serve. GPU handling was updated to rely on environment variables. Documentation was updated to reflect the changes.

Updated class diagram for CLI argument parsing

classDiagram
    class ArgumentParser {
        +add_argument(...)
    }
    class Subparsers {
        +add_parser(name, help)
    }
    class PerplexityParser {
        +MODEL
        +func
    }
    class BenchParser {
        +MODEL
        +func
        +--ngl
        +--network
    }
    class RunParser {
        +--ctx-size
        +--device
        +-n --name
        +--network
        +--privileged
        +--seed
        +--temp
        +--ngl
    }

    ArgumentParser *-- Subparsers : subparsers
    Subparsers -- PerplexityParser : "perplexity"
    Subparsers -- BenchParser : "bench"
    Subparsers -- RunParser : "run"

    note for RunParser "Added --ngl"
    note for BenchParser "Added --ngl"
    note for RunParser "Added common runtime arguments"

File-Level Changes

Change Details Files
The global --gpu argument was removed, and the --ngl argument was moved to subcommands.
  • Removed the global --gpu argument.
  • Moved the --ngl argument from the global scope to the run, serve, bench, and perplexity subcommands.
ramalama/cli.py
docs/ramalama.1.md
The perplexity subcommand was updated to align with the arguments of run and serve.
  • Added common runtime arguments to the perplexity subcommand.
  • Updated the help message for the perplexity command to include common runtime arguments.
ramalama/cli.py
docs/ramalama-perplexity.1.md
GPU handling was updated to rely on environment variables.
  • The code now checks for the presence of environment variables such as HIP_VISIBLE_DEVICES, ASAHI_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, and INTEL_VISIBLE_DEVICES to determine GPU availability.
  • The --gpu flag is no longer used.
ramalama/model.py
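The environment-variable check described above can be sketched as follows. This is a hypothetical illustration of the approach, not the actual code in ramalama/model.py; the variable names are the ones listed in this PR, but the detection order and return shape are assumptions.

```python
import os

# GPU-selecting environment variables named in this PR (hypothetical order).
GPU_ENV_VARS = (
    "HIP_VISIBLE_DEVICES",    # AMD ROCm
    "ASAHI_VISIBLE_DEVICES",  # Apple Silicon on Asahi Linux
    "CUDA_VISIBLE_DEVICES",   # NVIDIA CUDA
    "INTEL_VISIBLE_DEVICES",  # Intel GPUs
)

def detect_gpu_env():
    """Return (variable, value) for the first GPU env var that is set
    and non-empty, or None if no GPU selection is present."""
    for var in GPU_ENV_VARS:
        value = os.environ.get(var)
        if value:
            return var, value
    return None
```

With this scheme, users select a GPU the same way they would for the underlying runtime (e.g. `CUDA_VISIBLE_DEVICES=0 ramalama run ...`), which is what replaces the removed --gpu flag.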
Documentation was updated to reflect the changes.
  • Removed the documentation for the global --ngl and --gpu options from the main help message.
  • Documented the --ngl option for the run, serve, bench, and perplexity subcommands.
  • Corrected a typo in the device specification description.
docs/ramalama.1.md
docs/ramalama-run.1.md
docs/ramalama-serve.1.md
docs/ramalama-bench.1.md


@cgruver cgruver marked this pull request as draft February 14, 2025 16:10
@sourcery-ai sourcery-ai bot left a comment


Hey @cgruver - I've reviewed your changes - here's some feedback:

Overall Comments:

  • It might be helpful to add a brief explanation of why the global --gpu argument was removed.
  • Consider adding a test case to verify the functionality of the new --ngl argument across different subcommands.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

docs/ramalama-perplexity.1.md (review comment resolved)
cgruver commented Feb 14, 2025

This is a draft. I have not been able to fully test yet.

cgruver commented Feb 14, 2025

@ericcurtin or @rhatdan I'd appreciate a sanity check on this.

After I land and get home I plan to run this change through live testing.

All the CI checks passed, but since I moved args around, I'm not sure whether the test coverage exercises perplexity or bench.

perplexity looked like it might have been missing some of the args to modify its behavior.

rhatdan commented Feb 14, 2025

LGTM, but you need a rebase.

Commits (3):
  • Signed-off-by: Charro Gruver <cgruver@redhat.com>
  • …consistency. Signed-off-by: Charro Gruver <cgruver@redhat.com>
  • Signed-off-by: Charro Gruver <cgruver@redhat.com>
@cgruver cgruver marked this pull request as ready for review February 14, 2025 20:00
@sourcery-ai sourcery-ai bot left a comment


Hey @cgruver - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a section to the documentation explaining how GPU selection is handled via environment variables now.
  • It might be helpful to provide a migration guide or a note for users who relied on the removed --gpu flag.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


Comment on lines +31 to +32
#### **--ctx-size**, **-c**
size of the prompt context (default: 2048, 0 = loaded from model)

suggestion: Clarify the meaning of context size.

The documentation should explain what "prompt context" refers to and how it affects the model's behavior.

Suggested change (keeps the existing lines and appends an explanatory paragraph):

#### **--ctx-size**, **-c**
size of the prompt context (default: 2048, 0 = loaded from model)
The prompt context refers to the maximum amount of input text (measured in tokens) that the model uses to generate responses. A larger context allows the model to consider more preceding text, potentially improving the coherence and relevance of its output, but it may also increase memory usage and processing time.


This is a good suggestion if we want to add this in this PR, it's well worded


Thank you for your feedback, we will generate more comments like this in the future.

@@ -186,20 +186,6 @@ def configure_arguments(parser):
help="""run RamaLama using the specified container engine.
The RAMALAMA_CONTAINER_ENGINE environment variable modifies default behaviour.""",
)
parser.add_argument(

issue (complexity): Consider extracting the common CLI argument "--ngl" into a helper function to reduce code duplication.

Consider extracting the common CLI argument (e.g. "--ngl") into a helper function to avoid duplication and ensure consistency. For example:

```python
def add_ngl_argument(parser, config):
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )
```

Then update the subcommand and _run parser definitions by replacing the duplicated code with a call to the helper:

```python
def bench_parser(subparsers):
    parser = subparsers.add_parser("bench", aliases=["benchmark"], help="benchmark specified AI Model")
    parser.add_argument("--network", type=str, default="none", help="set the network mode for the container")
    add_ngl_argument(parser, config)
    parser.add_argument("MODEL")
    parser.set_defaults(func=bench_cli)

def _run(parser):
    parser.add_argument("--authfile", help="path of the authentication file")
    parser.add_argument("-c", "--ctx-size", dest="context", default=config.get('ctx_size', 2048), help="size of the prompt context (0 = loaded from model)")
    # other arguments...
    add_ngl_argument(parser, config)
    # remaining definitions...
```

This maintains full functionality while reducing complexity and easing future maintenance.
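As a quick sanity check of the helper pattern suggested above, the following standalone sketch wires the same --ngl argument into several subparsers. The `config` dict and subcommand set are stand-ins for illustration; they are not the actual ramalama code.

```python
import argparse

# Stand-in for ramalama's config lookup (hypothetical, illustration only).
config = {"ngl": -1}

def add_ngl_argument(parser, config):
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers(dest="subcommand")
# Attach the shared argument to each aligned subcommand exactly once.
for name in ("run", "serve", "bench", "perplexity"):
    sub = subparsers.add_parser(name)
    add_ngl_argument(sub, config)
    sub.add_argument("MODEL")

args = parser.parse_args(["bench", "--ngl", "99", "tiny-model"])
print(args.subcommand, args.ngl)  # → bench 99
```

Because every subcommand goes through the same helper, the flag name, type, default, and help text cannot drift apart between run, serve, bench, and perplexity.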

@rhatdan rhatdan merged commit b8e779c into containers:main Feb 14, 2025
16 checks passed
@cgruver cgruver deleted the arg-work branch February 14, 2025 21:20