
Enhancement: Support for HuggingFace Self-Served TGI Serving with Updated Dependencies and Improved AI Handler #472

Closed
okotek wants to merge 4 commits from the ok/support_tgi branch

Conversation

okotek
Contributor

@okotek okotek commented Nov 23, 2023

Type

Enhancement


Description

This PR introduces support for HuggingFace self-served TGI serving and makes several enhancements to the AI Handler. The main changes include (a consolidated sketch follows the list):

  • Added a new attribute api_base to the AIHandler class.
  • Modified the chat_completion method to include api_base in the acompletion call.
  • Adjusted the chat_completion method to add prefixes and suffixes to system and user messages based on the configuration.
  • Handled the case where the model is not found in the MAX_TOKENS dictionary in get_max_tokens function.
  • Updated the openai version from 0.27.8 to 1.3.5 and litellm version from 0.12.5 to 1.7.1 in the requirements.txt file.
  • Moved the TryAgain import from openai.error to tenacity.
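
A consolidated sketch of how these pieces fit together is shown below. It is assembled from the code snippets quoted later in this thread rather than taken verbatim from the diff; the import paths and surrounding details of AiHandler are best-effort assumptions:

import litellm
import openai
from litellm import acompletion
from tenacity import TryAgain  # previously imported from openai.error

from pr_agent.config_loader import get_settings


class AiHandler:
    def __init__(self):
        self.api_base = None  # new attribute for self-served TGI endpoints
        if get_settings().get("OPENAI.KEY", None):
            openai.api_key = get_settings().openai.key
            litellm.openai_key = get_settings().openai.key
        if get_settings().get("HUGGINGFACE.KEY", None):
            litellm.huggingface_key = get_settings().huggingface.key
        # Route requests to a self-served TGI endpoint when configured
        if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
            litellm.api_base = get_settings().huggingface.api_base
            self.api_base = get_settings().huggingface.api_base

    async def chat_completion(self, model: str, system: str, user: str, temperature: float = 0.2):
        # Wrap messages with the configured model-specific prefixes/suffixes
        system = get_settings().get("CONFIG.MODEL_SYSTEM_PREFIX", "") + system + \
                 get_settings().get("CONFIG.MODEL_SYSTEM_SUFFIX", "")
        user = get_settings().get("CONFIG.MODEL_USER_PREFIX", "") + user + \
               get_settings().get("CONFIG.MODEL_USER_SUFFIX", "")
        messages = [{"role": "system", "content": system},
                    {"role": "user", "content": user}]
        # Pass api_base through to litellm for self-served TGI
        response = await acompletion(
            model=model,
            messages=messages,
            temperature=temperature,
            api_base=self.api_base,
        )
        return (response["choices"][0]["message"]["content"],
                response["choices"][0]["finish_reason"])

Under this wiring, pointing HUGGINGFACE.API_BASE at a running TGI server and selecting a huggingface model in the config should be enough to route completions to it.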

PR changes walkthrough

Relevant files:

Enhancement (1 file)

    pr_agent/algo/ai_handler.py (+38/-17)

    The AIHandler class has been updated to include a new
    attribute api_base. The chat_completion method has been
    modified to include api_base in the acompletion call and
    to add prefixes and suffixes to system and user messages
    based on the configuration. The TryAgain import has been
    moved from openai.error to tenacity.

Bug fix (1 file)

    pr_agent/algo/utils.py (+1/-1)

    The get_max_tokens function has been updated to handle the
    case where the model is not found in the MAX_TOKENS
    dictionary.

Dependencies (1 file)

    requirements.txt (+2/-2)

    The openai version has been updated from 0.27.8 to 1.3.5
    and the litellm version from 0.12.5 to 1.7.1.

Contributor

github-actions bot commented Nov 23, 2023

PR Analysis

(review updated until commit 5d9c767)

  • 🎯 Main theme: Enhancement of AI Handler to support HuggingFace self-served TGI serving and update dependencies
  • 📝 PR summary: This PR introduces support for HuggingFace self-served TGI serving in the AI Handler. It adds a new attribute api_base to the AIHandler class and modifies the chat_completion method to include api_base in the acompletion call. It also updates the openai and litellm versions in the requirements.txt file and moves the TryAgain import from openai.error to tenacity.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ⏱️ Estimated effort to review [1-5]: 3, because the PR introduces significant changes to the AI Handler and updates dependencies, which requires a careful review to ensure compatibility and correct functionality.
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is well-structured and the changes are clear. However, it would be beneficial to include tests for the new functionality introduced, particularly the changes in the chat_completion method and the handling of the api_base attribute. This would help ensure the new functionality works as expected and does not introduce regressions. (A test sketch follows at the end of this feedback section.)

  • 🤖 Code feedback:
    Relevant file: pr_agent/algo/ai_handler.py
    Suggestion: Consider handling the case where 'huggingface' is not in get_settings().config.model when setting litellm.api_base and self.api_base. This could prevent potential errors if the HUGGINGFACE.API_BASE setting is present but the model is not 'huggingface'. [important]
    Relevant line: if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
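
    One way the suggested guard might look, as a sketch (the warning text and fallback behavior are assumptions, not part of the PR):

    settings = get_settings()
    api_base = settings.get("HUGGINGFACE.API_BASE", None)
    if api_base:
        if 'huggingface' in settings.config.model:
            litellm.api_base = api_base
            self.api_base = api_base
        else:
            # Assumed behavior: warn instead of silently setting a mismatched api_base
            get_logger().warning(
                "HUGGINGFACE.API_BASE is set but the configured model is not a "
                "huggingface model; ignoring the custom api_base."
            )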

    Relevant file: pr_agent/algo/ai_handler.py
    Suggestion: It would be beneficial to handle the case where the api_base attribute is None in the chat_completion method. This could prevent potential errors when calling acompletion with api_base=None. [important]
    Relevant line: api_base=self.api_base,
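
    A minimal sketch of how this could be handled, assuming a conditionally built keyword-argument dict is acceptable (the name completion_kwargs is illustrative):

    # Only forward api_base when it has actually been configured
    completion_kwargs = dict(model=model, messages=messages, temperature=temperature)
    if self.api_base is not None:
        completion_kwargs["api_base"] = self.api_base
    response = await acompletion(**completion_kwargs)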

    Relevant file: pr_agent/algo/utils.py
    Suggestion: Consider handling the case where the model is not found in the MAX_TOKENS dictionary more explicitly. For example, you could log a warning message when the default value is used. This could help with debugging if unexpected behavior occurs due to the model not being found in the dictionary. [medium]
    Relevant line: max_tokens_model = MAX_TOKENS.get(model, 4000)

    Relevant file: requirements.txt
    Suggestion: Ensure that the updated versions of openai and litellm are compatible with the rest of the codebase and do not introduce breaking changes. It would be beneficial to perform thorough testing after updating these dependencies. [important]
    Relevant line: openai==1.3.5
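
Picking up the general suggestion above about adding tests: a rough pytest sketch of a first test for the api_base wiring is shown below. The fake settings helper, model name, and endpoint URL are hypothetical, and it assumes the AiHandler constructor only reads the settings shown in this PR:

from typing import Optional
from unittest.mock import MagicMock, patch


def fake_settings(model: str, api_base: Optional[str] = None) -> MagicMock:
    # Minimal stand-in for the settings object: supports both
    # settings.get("SECTION.KEY", default) and attribute-style access
    s = MagicMock()
    s.get.side_effect = lambda key, default=None: (
        api_base if key == "HUGGINGFACE.API_BASE" else default
    )
    s.config.model = model
    s.huggingface.api_base = api_base
    return s


@patch("pr_agent.algo.ai_handler.get_settings")
def test_api_base_set_for_huggingface_model(mock_get_settings):
    mock_get_settings.return_value = fake_settings(
        model="huggingface/some-model",    # hypothetical model name
        api_base="http://localhost:8080",  # hypothetical TGI endpoint
    )
    from pr_agent.algo.ai_handler import AiHandler

    handler = AiHandler()
    assert handler.api_base == "http://localhost:8080"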

@mrT23
Collaborator

mrT23 commented Nov 26, 2023

/improve --pr_code_suggestions.summarize=true

Contributor

PR Code Suggestions

💡 Suggestion:

Avoid hardcoding values directly in the code. Instead, consider defining them as constants at the top of your file or in a separate config file. This will make it easier to manage and modify these values in the future.

File: pr_agent/algo/ai_handler.py (74-79)

Example code:

Existing code:

@retry(
    exceptions=(APIError, TryAgain, AttributeError, RateLimitError),
    tries=OPENAI_RETRIES,
    delay=2,
    backoff=2,
    jitter=(1, 3),
)

Improved code:

OPENAI_RETRY_DELAY = 2
OPENAI_RETRY_BACKOFF = 2
OPENAI_RETRY_JITTER = (1, 3)

@retry(
    exceptions=(APIError, TryAgain, AttributeError, RateLimitError),
    tries=OPENAI_RETRIES,
    delay=OPENAI_RETRY_DELAY,
    backoff=OPENAI_RETRY_BACKOFF,
    jitter=OPENAI_RETRY_JITTER,
)

💡 Suggestion:

It's a good practice to use f-strings for formatting strings in Python. It's more readable and efficient than the traditional string formatting methods.

File: pr_agent/algo/ai_handler.py (135-142)

Example code:

Existing code:

get_logger().info(
    "AI response",
    response=resp,
    messages=messages,
    finish_reason=finish_reason,
    model=model,
    usage=usage,
)

Improved code:

get_logger().info(
    f"AI response: {resp}, messages: {messages}, finish_reason: {finish_reason}, model: {model}, usage: {usage}"
)

💡 Suggestion:

It's better to use the dict.get() method instead of dict[key] for dictionary access. This will prevent KeyError exceptions if the key is not present in the dictionary.

File: pr_agent/algo/utils.py (373-373)

Example code:

Existing code:

max_tokens_model = MAX_TOKENS[model]

Improved code:

max_tokens_model = MAX_TOKENS.get(model, 4000)

💡 Suggestion:

It's recommended to pin the versions of your dependencies. This can help to avoid unexpected breaks due to updates in the dependencies.

File: requirements.txt (25-25)

Example code:

Existing code:

litellm @ git+https://github.com/Codium-ai/litellm.git

Improved code:

litellm @ git+https://github.com/Codium-ai/litellm.git@<commit-hash>

@GadiZimerman
Collaborator

/add_docs

Contributor

github-actions bot commented Dec 1, 2023

Persistent review updated to latest commit 5d9c767

@Codium-ai Codium-ai deleted a comment from github-actions bot Dec 1, 2023
@CodiumAI-Agent CodiumAI-Agent changed the title from "Changed to support HuggingFace self served TGI serving" to "Support for HuggingFace self served TGI serving" Dec 3, 2023
@CodiumAI-Agent CodiumAI-Agent added the enhancement (New feature or request) label Dec 3, 2023
@github-actions github-actions bot changed the title to "Enhancement: Support for HuggingFace self served TGI serving" Dec 3, 2023
@mrT23 mrT23 changed the title to "Enhancement: Support for HuggingFace self served TGI serving with updated dependencies" Dec 4, 2023
@mrT23 mrT23 changed the title to "Enhancement: Support for HuggingFace Self-Served TGI Serving with Updated Dependencies and Improved AI Handler" Dec 4, 2023
@mrT23
Collaborator

mrT23 commented Dec 4, 2023

PR Description updated to latest commit (5d9c767)

1 similar comment
@mrT23
Collaborator

mrT23 commented Dec 4, 2023

PR Description updated to latest commit (5d9c767)

@Codium-ai Codium-ai deleted 6 comments from github-actions bot Dec 4, 2023
@mrT23
Collaborator

mrT23 commented Dec 17, 2023

Persistent review updated to latest commit 5d9c767

1 similar comment
@mrT23
Collaborator

mrT23 commented Dec 17, 2023

Persistent review updated to latest commit 5d9c767

@mrT23
Collaborator

mrT23 commented Dec 17, 2023

PR Code Suggestions

💡 Suggestion:

Instead of using the get_settings() function multiple times, consider storing the settings in a variable and reusing it.

File: pr_agent/algo/ai_handler.py (30-60)

Example code:

Existing code:

if get_settings().get("OPENAI.KEY", None):
    openai.api_key = get_settings().openai.key
    litellm.openai_key = get_settings().openai.key
if get_settings().get("HUGGINGFACE.KEY", None):
    litellm.huggingface_key = get_settings().huggingface.key
if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
    litellm.api_base = get_settings().huggingface.api_base
    self.api_base = get_settings().huggingface.api_base

Improved code:

settings = get_settings()
if settings.get("OPENAI.KEY", None):
    openai.api_key = settings.openai.key
    litellm.openai_key = settings.openai.key
if settings.get("HUGGINGFACE.KEY", None):
    litellm.huggingface_key = settings.huggingface.key
if settings.get("HUGGINGFACE.API_BASE", None) and 'huggingface' in settings.config.model:
    litellm.api_base = settings.huggingface.api_base
    self.api_base = settings.huggingface.api_base

💡 Suggestion:

Instead of concatenating strings using '+', consider using f-strings for better readability.

File: pr_agent/algo/ai_handler.py (112-115)

Example code:

Existing code:

system = get_settings().get("CONFIG.MODEL_SYSTEM_PREFIX", "") + system + \
         get_settings().get("CONFIG.MODEL_SYSTEM_SUFFIX", "")
user = get_settings().get("CONFIG.MODEL_USER_PREFIX", "") + user + \
       get_settings().get("CONFIG.MODEL_USER_SUFFIX", "")

Improved code:

system = f"{get_settings().get('CONFIG.MODEL_SYSTEM_PREFIX', '')}{system}{get_settings().get('CONFIG.MODEL_SYSTEM_SUFFIX', '')}"
user = f"{get_settings().get('CONFIG.MODEL_USER_PREFIX', '')}{user}{get_settings().get('CONFIG.MODEL_USER_SUFFIX', '')}"

💡 Suggestion:

Instead of using the get_settings() function multiple times, consider storing the settings in a variable and reusing it.

File: pr_agent/algo/utils.py (378-380)

Example code:

Existing code:

settings = get_settings()
max_tokens_model = MAX_TOKENS.get(model, 4000)
if settings.config.max_model_tokens:
    max_tokens_model = min(settings.config.max_model_tokens, max_tokens_model)

Improved code:

settings = get_settings()
max_tokens_model = MAX_TOKENS.get(model, 4000)
if settings.config.max_model_tokens:
    max_tokens_model = min(settings.config.max_model_tokens, max_tokens_model)

💡 Suggestion:

Keep the dependencies in the requirements.txt file sorted alphabetically. This makes it easier to find a dependency and see its version at a glance.

File: requirements.txt (5-25)

Example code:

Existing code:

openai==1.3.5
...
litellm==1.7.1

Improved code:

litellm==1.7.1
...
openai==1.3.5

@mrT23
Collaborator

mrT23 commented Dec 20, 2023

PR Description updated to latest commit (5d9c767)

@Codium-ai Codium-ai deleted a comment from github-actions bot Dec 23, 2023
@mrT23
Collaborator

mrT23 commented Dec 24, 2023

PR Code Suggestions

💡 [enhancement]

It seems that the api_base attribute is only used in the chat_completion method. Instead of storing it as an instance variable, you could retrieve it directly from the settings within the chat_completion method. This would make the code more concise and easier to understand.

File: pr_agent/algo/ai_handler.py (29-60)

Example code:

Existing code:

self.api_base = None
...
if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
    litellm.api_base = get_settings().huggingface.api_base
    self.api_base = get_settings().huggingface.api_base

Improved code:

if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
    litellm.api_base = get_settings().huggingface.api_base
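
For completeness, the chat_completion body would then resolve the value on demand, roughly like this (a sketch of the proposed direction, not code from the PR):

api_base = None
if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
    api_base = get_settings().huggingface.api_base
response = await acompletion(
    model=model,
    messages=messages,
    temperature=temperature,
    api_base=api_base,
)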

💡 [maintainability]

The chat_completion method is quite long and does a lot of different things. Consider breaking it down into smaller helper methods to improve readability and maintainability.

File: pr_agent/algo/ai_handler.py (81-147)

Example code:

Existing code:

async def chat_completion(
    self, model: str, system: str, user: str, temperature: float = 0.2
):
    ...

Improved code:

async def chat_completion(
    self, model: str, system: str, user: str, temperature: float = 0.2
):
    system, user = self._add_prefix_suffix(system, user)
    messages = self._create_messages(system, user)
    response = await self._get_completion_response(model, messages, temperature)
    self._log_response(response, messages, model)
    return response["choices"][0]["message"]["content"], response["choices"][0]["finish_reason"]
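
As an illustration of the first helper, _add_prefix_suffix might reuse the prefix/suffix logic quoted earlier in this thread (the helper names are the ones proposed above, not code from the PR):

def _add_prefix_suffix(self, system: str, user: str):
    # Wrap the system and user messages with the configured prefixes/suffixes
    settings = get_settings()
    system = (settings.get("CONFIG.MODEL_SYSTEM_PREFIX", "")
              + system
              + settings.get("CONFIG.MODEL_SYSTEM_SUFFIX", ""))
    user = (settings.get("CONFIG.MODEL_USER_PREFIX", "")
            + user
            + settings.get("CONFIG.MODEL_USER_SUFFIX", ""))
    return system, user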

💡 [enhancement]

The get_max_tokens function could be improved by handling the case where the model is not found in the MAX_TOKENS dictionary more gracefully. Instead of defaulting to 4000, consider raising an exception or logging a warning.

File: pr_agent/algo/utils.py (378-378)

Example code:

Existing code:

max_tokens_model = MAX_TOKENS.get(model, 4000)

Improved code:

if model not in MAX_TOKENS:
    get_logger().warning(f"Model {model} not found in MAX_TOKENS dictionary. Defaulting to 4000 tokens.")
max_tokens_model = MAX_TOKENS.get(model, 4000)

@Codium-ai Codium-ai deleted a comment from CodiumAI-Agent Dec 24, 2023
@mrT23
Collaborator

mrT23 commented Dec 24, 2023

/improve

Contributor

PR Code Suggestions

💡 Suggestion:

Instead of using the get_settings() function multiple times, consider storing the settings in a variable and reusing it.

File: pr_agent/algo/ai_handler.py (30-60)

Example code:

Existing code:

if get_settings().get("OPENAI.KEY", None):
    openai.api_key = get_settings().openai.key
    litellm.openai_key = get_settings().openai.key
if get_settings().get("HUGGINGFACE.KEY", None):
    litellm.huggingface_key = get_settings().huggingface.key
if get_settings().get("HUGGINGFACE.API_BASE", None) and 'huggingface' in get_settings().config.model:
    litellm.api_base = get_settings().huggingface.api_base
    self.api_base = get_settings().huggingface.api_base

Improved code:

settings = get_settings()
if settings.get("OPENAI.KEY", None):
    openai.api_key = settings.openai.key
    litellm.openai_key = settings.openai.key
if settings.get("HUGGINGFACE.KEY", None):
    litellm.huggingface_key = settings.huggingface.key
if settings.get("HUGGINGFACE.API_BASE", None) and 'huggingface' in settings.config.model:
    litellm.api_base = settings.huggingface.api_base
    self.api_base = settings.huggingface.api_base

💡 Suggestion:

Consider using f-strings for string concatenation for better readability.

File: pr_agent/algo/ai_handler.py (112-115)

Example code:

Existing code:

system = get_settings().get("CONFIG.MODEL_SYSTEM_PREFIX", "") + system + \
         get_settings().get("CONFIG.MODEL_SYSTEM_SUFFIX", "")
user = get_settings().get("CONFIG.MODEL_USER_PREFIX", "") + user + \
       get_settings().get("CONFIG.MODEL_USER_SUFFIX", "")

Improved code:

system = f"{get_settings().get('CONFIG.MODEL_SYSTEM_PREFIX', '')}{system}{get_settings().get('CONFIG.MODEL_SYSTEM_SUFFIX', '')}"
user = f"{get_settings().get('CONFIG.MODEL_USER_PREFIX', '')}{user}{get_settings().get('CONFIG.MODEL_USER_SUFFIX', '')}"

💡 Suggestion:

Consider using the dict.get() method with a default value to avoid a KeyError if the model is not found in the MAX_TOKENS dictionary.

File: pr_agent/algo/utils.py (378-378)

Example code:

Existing code:

max_tokens_model = MAX_TOKENS[model]

Improved code:

max_tokens_model = MAX_TOKENS.get(model, 4000)

💡 Suggestion:

Consider maintaining the order of dependencies in the requirements file. This can help in identifying the changes made to the dependencies more easily.

File: requirements.txt (5-25)

Example code:

Existing code:

openai==1.3.5
...
litellm==1.7.1

Improved code:

openai==1.3.5
litellm==1.7.1
...

Comment on lines +131 to 133
except Exception as e:
    get_logger().error("Unknown error during OpenAI inference: ", e)
    raise TryAgain from e
Collaborator


Suggestion: Consider using a more specific exception instead of the generic Exception in the chat_completion method. This will make the error handling more precise and prevent catching and handling unexpected exceptions that might need different handling logic. [best practice]

Suggested change
-except Exception as e:
-    get_logger().error("Unknown error during OpenAI inference: ", e)
-    raise TryAgain from e
+except SpecificException as e:  # Replace SpecificException with the specific exception you want to catch
+    get_logger().error("Specific error during OpenAI inference: ", e)
+    raise TryAgain from e

Comment on lines +74 to +80
@retry(
    exceptions=(APIError, TryAgain, AttributeError, RateLimitError),
    tries=OPENAI_RETRIES,
    delay=2,
    backoff=2,
    jitter=(1, 3),
)
Collaborator


Suggestion: It's a good practice to avoid using magic numbers directly in the code. The numbers used in the retry decorator in the chat_completion method (like 2, 1, 3) could be replaced with well-named constants to improve code readability and maintainability. [best practice]

Suggested change
-@retry(
-    exceptions=(APIError, TryAgain, AttributeError, RateLimitError),
-    tries=OPENAI_RETRIES,
-    delay=2,
-    backoff=2,
-    jitter=(1, 3),
-)
+DELAY = 2
+BACKOFF = 2
+JITTER = (1, 3)
+
+@retry(
+    exceptions=(APIError, TryAgain, AttributeError, RateLimitError),
+    tries=OPENAI_RETRIES,
+    delay=DELAY,
+    backoff=BACKOFF,
+    jitter=JITTER,
+)

@@ -375,7 +375,7 @@ def get_user_labels(current_labels: List[str] = None):

 def get_max_tokens(model):
     settings = get_settings()
-    max_tokens_model = MAX_TOKENS[model]
+    max_tokens_model = MAX_TOKENS.get(model, 4000)
Collaborator


Suggestion: Consider adding a default value for MAX_TOKENS in the get_max_tokens function. This will make the function more robust in case the MAX_TOKENS dictionary doesn't contain the model as a key. [enhancement]

Suggested change
-    max_tokens_model = MAX_TOKENS.get(model, 4000)
+    DEFAULT_MAX_TOKENS = 4000
+    max_tokens_model = MAX_TOKENS.get(model, DEFAULT_MAX_TOKENS)

requirements.txt Outdated
@@ -23,3 +22,4 @@ pinecone-client
 pinecone-datasets @ git+https://github.com/mrT23/pinecone-datasets.git@main
 loguru==0.7.2
 google-cloud-aiplatform==1.35.0
+litellm==1.7.1
Collaborator


Suggestion: It's a good practice to pin the versions of your dependencies to ensure your project does not break due to updates in the dependencies. Consider pinning the version of litellm in your requirements.txt file. [best practice]

Suggested change
-litellm==1.7.1
+litellm==1.7.1  # Pin the version to ensure compatibility

@benlaniado

/analyze

Contributor

PR Analysis

Here is a components analysis of the PR, as well as a change summary for each component:

Relevant files / Components:

ai_handler.py (2 components changed)

    __init__ (method of AiHandler) (+4/-3)

    Changes summary:

    The __init__ method in the PR made the following changes:

    • Added the initialization of self.api_base to None.
    • Modified the condition for setting litellm.api_base and self.api_base
      based on the presence of "HUGGINGFACE.API_BASE" in the configuration
      file and the presence of "huggingface" in the model configuration.
    • Removed an unnecessary line of code.

    chat_completion (function) (+24/-11)

    Changes summary:

    The chat_completion function underwent several changes. The function
    signature was modified to include a new parameter temperature.
    Additionally, the system and user messages were modified by adding
    prefixes and suffixes. The acompletion function call was also modified
    to include the api_base parameter. Some exception handling was updated
    to remove the Timeout exception and include the RateLimitError
    exception.

utils.py (1 component changed)

    get_max_tokens (function) (+1/-1)

    Changes summary:

    The PR modified the get_max_tokens function by changing the way the
    max_tokens_model variable is assigned. Instead of directly accessing the
    MAX_TOKENS dictionary with the model key, the function now uses the get
    method with a default value of 4000. This change allows for a fallback
    value in case the model key is not found in the dictionary.

@okotek okotek closed this Jan 8, 2024
@okotek okotek deleted the ok/support_tgi branch January 8, 2024 09:06
Contributor

@barnett-yuxiang barnett-yuxiang left a comment


PTAL
