Align runtime arguments with run, serve, bench, and perplexity #820
Conversation
Reviewer's Guide by Sourcery

This pull request refactors command-line arguments to improve consistency and flexibility. The global `--gpu` argument is removed, and the `--ngl` runtime argument is standardized across the relevant subcommands.

Updated class diagram for CLI argument parsing

```mermaid
classDiagram
    class ArgumentParser {
        +add_argument(...)
    }
    class Subparsers {
        +add_parser(name, help)
    }
    class PerplexityParser {
        +MODEL
        +func
    }
    class BenchParser {
        +MODEL
        +func
        +--ngl
        +--network
    }
    class RunParser {
        +--ctx-size
        +--device
        +-n --name
        +--network
        +--privileged
        +--seed
        +--temp
        +--ngl
    }
    ArgumentParser *-- Subparsers : subparsers
    Subparsers -- PerplexityParser : "perplexity"
    Subparsers -- BenchParser : "bench"
    Subparsers -- RunParser : "run"
    note for RunParser "Added --ngl"
    note for BenchParser "Added --ngl"
    note for RunParser "Added common runtime arguments"
```
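The diagram maps onto `argparse` roughly as follows. This is a minimal sketch, not the project's actual code; the defaults, help strings, and argument kwargs here are assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers(dest="command")

# perplexity: positional MODEL only (plus common runtime args in the real code)
perplexity = subparsers.add_parser("perplexity", help="calculate perplexity for specified AI Model")
perplexity.add_argument("MODEL")

# bench: adds --network and the shared --ngl
bench = subparsers.add_parser("bench", help="benchmark specified AI Model")
bench.add_argument("--network", type=str, default="none")
bench.add_argument("--ngl", type=int, default=-1)
bench.add_argument("MODEL")

# run: carries the common runtime arguments shown in the diagram
run = subparsers.add_parser("run", help="run specified AI Model")
run.add_argument("--ctx-size", dest="context", default=2048)
run.add_argument("--device", action="append")
run.add_argument("--network", type=str, default="none")
run.add_argument("--ngl", type=int, default=-1)
run.add_argument("-n", "--name", dest="name")
run.add_argument("MODEL")

args = parser.parse_args(["run", "--ngl", "33", "tinyllama"])
print(args.command, args.ngl)  # → run 33
```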
Hey @cgruver - I've reviewed your changes - here's some feedback:
Overall Comments:
- It might be helpful to add a brief explanation of why the global `--gpu` argument was removed.
- Consider adding a test case to verify the functionality of the new `--ngl` argument across different subcommands.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This is a draft. I have not been able to fully test yet.
@ericcurtin or @rhatdan I'd appreciate a sanity check on this. After I land and get home I plan to run this change through live testing. All the CI checks passed, but since I moved args around I'm not sure if the test coverage works for
LGTM, but you need a rebase.
Signed-off-by: Charro Gruver <cgruver@redhat.com>
…consistency. Signed-off-by: Charro Gruver <cgruver@redhat.com>
Signed-off-by: Charro Gruver <cgruver@redhat.com>
Hey @cgruver - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a section to the documentation explaining how GPU selection is handled via environment variables now.
- It might be helpful to provide a migration guide or a note for users who relied on the removed `--gpu` flag.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
#### **--ctx-size**, **-c**
size of the prompt context (default: 2048, 0 = loaded from model)
suggestion: Clarify the meaning of context size.
The documentation should explain what "prompt context" refers to and how it affects the model's behavior.
Suggested change:

> #### **--ctx-size**, **-c**
> size of the prompt context (default: 2048, 0 = loaded from model)
>
> The prompt context refers to the maximum amount of input text (measured in tokens) that the model uses to generate responses. A larger context allows the model to consider more preceding text, potentially improving the coherence and relevance of its output, but it may also increase memory usage and processing time.
This is a good suggestion if we want to add this in this PR, it's well worded
Thank you for your feedback, we will generate more comments like this in the future.
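To make the suggested explanation concrete, here is an illustrative sketch of how a context budget constrains input. The function name is hypothetical and the whitespace tokenization is a deliberate simplification; real inference engines count subword tokens, not words:

```python
def within_context(prompt: str, ctx_size: int = 2048) -> bool:
    """Illustrative only: checks whether a prompt fits in the context budget."""
    # Per the option's docs, ctx_size=0 means "use the size loaded from the model",
    # so we treat it as no explicit limit here.
    if ctx_size == 0:
        return True
    # Naive word count stands in for a real tokenizer's subword token count.
    return len(prompt.split()) <= ctx_size

print(within_context("hello world", ctx_size=2048))  # → True
print(within_context("a " * 3000, ctx_size=2048))    # → False
```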
@@ -186,20 +186,6 @@ def configure_arguments(parser): | |||
help="""run RamaLama using the specified container engine. | |||
The RAMALAMA_CONTAINER_ENGINE environment variable modifies default behaviour.""", | |||
) | |||
parser.add_argument( |
issue (complexity): Consider extracting the common CLI argument "--ngl" into a helper function to reduce code duplication.
Consider extracting the common CLI argument (e.g. "--ngl") into a helper function to avoid duplication and ensure consistency. For example:
```python
def add_ngl_argument(parser, config):
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )
```

Then update the subcommand and `_run` parser definitions by replacing the duplicated code with a call to the helper:

```python
def bench_parser(subparsers):
    parser = subparsers.add_parser("bench", aliases=["benchmark"], help="benchmark specified AI Model")
    parser.add_argument("--network", type=str, default="none", help="set the network mode for the container")
    add_ngl_argument(parser, config)
    parser.add_argument("MODEL")
    parser.set_defaults(func=bench_cli)

def _run(parser):
    parser.add_argument("--authfile", help="path of the authentication file")
    parser.add_argument(
        "-c", "--ctx-size",
        dest="context",
        default=config.get('ctx_size', 2048),
        help="size of the prompt context (0 = loaded from model)",
    )
    # other arguments...
    add_ngl_argument(parser, config)
    # remaining definitions...
```

This maintains full functionality while reducing complexity and easing future maintenance.
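Picking up the earlier review suggestion to test `--ngl` across subcommands: a self-contained sketch of such a check might look like the following. The `config` dict and parser wiring are assumptions standing in for the project's real objects:

```python
import argparse

# Hypothetical stand-in for the project's config object.
config = {"ngl": -1}

def add_ngl_argument(parser, config):
    # One shared definition keeps --ngl identical across subcommands.
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers(dest="command")
for name in ("bench", "perplexity", "run", "serve"):
    sub = subparsers.add_parser(name)
    add_ngl_argument(sub, config)
    sub.add_argument("MODEL")

# The flag should parse the same way under every subcommand.
for name in ("bench", "perplexity", "run", "serve"):
    args = parser.parse_args([name, "--ngl", "20", "tinyllama"])
    assert args.ngl == 20 and args.command == name
print("ok")  # → ok
```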
- Removed the global argument `--gpu`
- Aligned the runtime argument `--ngl` with the subcommands `bench`, `perplexity`, `run`, `serve`
- Added arguments to the subcommand `perplexity` to align it with the common arguments of `run` and `serve`
Summary by Sourcery

Remove the global `--gpu` argument and introduce new runtime arguments for the `perplexity` subcommand, aligning it with the `run` and `serve` subcommands. Standardize the `--ngl` argument across all relevant subcommands. Update the documentation to reflect these changes.

Enhancements:
- Removed the global `--gpu` argument.
- Standardized the `--ngl` runtime argument across the `bench`, `perplexity`, `run`, and `serve` subcommands.
- Aligned the `perplexity` subcommand with `run` and `serve` by adding common runtime arguments.

Documentation:
- Updated the documentation for the `perplexity`, `run`, `serve`, and `bench` commands to reflect the addition of new runtime arguments and removal of the global `--gpu` argument.
- Documented the `--authfile`, `--ctx-size`, `--device`, `--name`, `--network`, `--ngl`, `--privileged`, `--seed`, `--temp`, and `--tls-verify` arguments.
- Documented the `--device` argument for the `run` and `serve` commands.
- Updated the `run` command documentation.