Align runtime arguments with run, serve, bench, and perplexity #820
Conversation
Reviewer's Guide by Sourcery

This pull request refactors command-line arguments to improve consistency and flexibility. The global `--gpu` argument is removed, and the `--ngl` runtime argument is standardized across the relevant subcommands.

Updated class diagram for CLI argument parsing

```mermaid
classDiagram
    class ArgumentParser {
        +add_argument(...)
    }
    class Subparsers {
        +add_parser(name, help)
    }
    class PerplexityParser {
        +MODEL
        +func
    }
    class BenchParser {
        +MODEL
        +func
        +--ngl
        +--network
    }
    class RunParser {
        +--ctx-size
        +--device
        +-n --name
        +--network
        +--privileged
        +--seed
        +--temp
        +--ngl
    }
    ArgumentParser *-- Subparsers : subparsers
    Subparsers -- PerplexityParser : "perplexity"
    Subparsers -- BenchParser : "bench"
    Subparsers -- RunParser : "run"
    note for RunParser "Added --ngl"
    note for BenchParser "Added --ngl"
    note for RunParser "Added common runtime arguments"
```
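The diagram maps onto `argparse` roughly as follows. This is a minimal sketch, not the project's actual code; the defaults, help strings, and argument kwargs here are assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers(dest="command")

# perplexity: positional MODEL only (plus common runtime args in the real code)
perplexity = subparsers.add_parser("perplexity", help="calculate perplexity for specified AI Model")
perplexity.add_argument("MODEL")

# bench: adds --network and the shared --ngl
bench = subparsers.add_parser("bench", help="benchmark specified AI Model")
bench.add_argument("--network", type=str, default="none")
bench.add_argument("--ngl", type=int, default=-1)
bench.add_argument("MODEL")

# run: carries the common runtime arguments shown in the diagram
run = subparsers.add_parser("run", help="run specified AI Model")
run.add_argument("--ctx-size", dest="context", default=2048)
run.add_argument("--device", action="append")
run.add_argument("--network", type=str, default="none")
run.add_argument("--ngl", type=int, default=-1)
run.add_argument("-n", "--name", dest="name")
run.add_argument("MODEL")

args = parser.parse_args(["run", "--ngl", "33", "tinyllama"])
print(args.command, args.ngl)  # → run 33
```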
Hey @cgruver - I've reviewed your changes - here's some feedback:
Overall Comments:
- It might be helpful to add a brief explanation of why the global `--gpu` argument was removed.
- Consider adding a test case to verify the functionality of the new `--ngl` argument across different subcommands.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This is a draft. I have not been able to fully test yet.
@ericcurtin or @rhatdan I'd appreciate a sanity check on this. After I land and get home I plan to run this change through live testing. All the CI checks passed, but since I moved args around I'm not sure if the test coverage works for
LGTM, but you need a rebase.
Signed-off-by: Charro Gruver <cgruver@redhat.com>
…consistency. Signed-off-by: Charro Gruver <cgruver@redhat.com>
Signed-off-by: Charro Gruver <cgruver@redhat.com>
Hey @cgruver - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a section to the documentation explaining how GPU selection is handled via environment variables now.
- It might be helpful to provide a migration guide or a note for users who relied on the removed `--gpu` flag.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
#### **--ctx-size**, **-c**
size of the prompt context (default: 2048, 0 = loaded from model)
suggestion: Clarify the meaning of context size.
The documentation should explain what "prompt context" refers to and how it affects the model's behavior.
Suggested change:

> #### **--ctx-size**, **-c**
> size of the prompt context (default: 2048, 0 = loaded from model)
>
> The prompt context refers to the maximum amount of input text (measured in tokens) that the model uses to generate responses. A larger context allows the model to consider more preceding text, potentially improving the coherence and relevance of its output, but it may also increase memory usage and processing time.
This is a good suggestion if we want to add this in this PR, it's well worded
Thank you for your feedback, we will generate more comments like this in the future.
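To make the suggested explanation concrete, here is an illustrative sketch of how a context budget constrains input. The function name is hypothetical and the whitespace tokenization is a deliberate simplification; real inference engines count subword tokens, not words:

```python
def within_context(prompt: str, ctx_size: int = 2048) -> bool:
    """Illustrative only: checks whether a prompt fits in the context budget."""
    # Per the option's docs, ctx_size=0 means "use the size loaded from the model",
    # so we treat it as no explicit limit here.
    if ctx_size == 0:
        return True
    # Naive word count stands in for a real tokenizer's subword token count.
    return len(prompt.split()) <= ctx_size

print(within_context("hello world", ctx_size=2048))  # → True
print(within_context("a " * 3000, ctx_size=2048))    # → False
```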
@@ -186,20 +186,6 @@ def configure_arguments(parser): | |||
help="""run RamaLama using the specified container engine. | |||
The RAMALAMA_CONTAINER_ENGINE environment variable modifies default behaviour.""", | |||
) | |||
parser.add_argument( |
issue (complexity): Consider extracting the common CLI argument "--ngl" into a helper function to reduce code duplication.
Consider extracting the common CLI argument (e.g. "--ngl") into a helper function to avoid duplication and ensure consistency. For example:
```python
def add_ngl_argument(parser, config):
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )
```

Then update the subcommand and `_run` parser definitions by replacing the duplicated code with a call to the helper:

```python
def bench_parser(subparsers):
    parser = subparsers.add_parser("bench", aliases=["benchmark"], help="benchmark specified AI Model")
    parser.add_argument("--network", type=str, default="none", help="set the network mode for the container")
    add_ngl_argument(parser, config)
    parser.add_argument("MODEL")
    parser.set_defaults(func=bench_cli)

def _run(parser):
    parser.add_argument("--authfile", help="path of the authentication file")
    parser.add_argument(
        "-c", "--ctx-size",
        dest="context",
        default=config.get('ctx_size', 2048),
        help="size of the prompt context (0 = loaded from model)",
    )
    # other arguments...
    add_ngl_argument(parser, config)
    # remaining definitions...
```

This maintains full functionality while reducing complexity and easing future maintenance.
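Picking up the earlier review suggestion to test `--ngl` across subcommands: a self-contained sketch of such a check might look like the following. The `config` dict and parser wiring are assumptions standing in for the project's real objects:

```python
import argparse

# Hypothetical stand-in for the project's config object.
config = {"ngl": -1}

def add_ngl_argument(parser, config):
    # One shared definition keeps --ngl identical across subcommands.
    parser.add_argument(
        "--ngl",
        dest="ngl",
        type=int,
        default=config.get("ngl", -1),
        help="Number of layers to offload to the gpu, if available",
    )

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers(dest="command")
for name in ("bench", "perplexity", "run", "serve"):
    sub = subparsers.add_parser(name)
    add_ngl_argument(sub, config)
    sub.add_argument("MODEL")

# The flag should parse the same way under every subcommand.
for name in ("bench", "perplexity", "run", "serve"):
    args = parser.parse_args([name, "--ngl", "20", "tinyllama"])
    assert args.ngl == 20 and args.command == name
print("ok")  # → ok
```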
- Removed the global argument `--gpu`
- Aligned the runtime argument `--ngl` with the subcommands `bench`, `perplexity`, `run`, `serve`
- Added arguments to the subcommand `perplexity` to align it with the common arguments of `run` and `serve`
Summary by Sourcery

Remove the global `--gpu` argument and introduce new runtime arguments for the `perplexity` subcommand, aligning it with the `run` and `serve` subcommands. Standardize the `--ngl` argument across all relevant subcommands. Update the documentation to reflect these changes.

Enhancements:
- Removed the global `--gpu` argument.
- Standardized the `--ngl` runtime argument across the `bench`, `perplexity`, `run`, and `serve` subcommands.
- Aligned the `perplexity` subcommand with `run` and `serve` by adding common runtime arguments.

Documentation:
- Updated the documentation for the `perplexity`, `run`, `serve`, and `bench` commands to reflect the addition of new runtime arguments and removal of the global `--gpu` argument.
- Documented the `--authfile`, `--ctx-size`, `--device`, `--name`, `--network`, `--ngl`, `--privileged`, `--seed`, `--temp`, and `--tls-verify` arguments.
- Documented the `--device` argument for the `run` and `serve` commands.
- Updated the `run` command documentation.