[Feature] Support for Azure AI Studio #779
Conversation
Some questions about how best to implement this
def _generator(self, prompt, temperature: float):
    # Initial parts of this straight up copied from OpenAIChatEngine

    # The next loop (or one like it) appears in several places,
Thoughts on this?
This is a straight-up copy of what is in _open_ai.py, but there are almost-but-not-quite-identical versions elsewhere. Chopping up the internal 'guidance state string' into individual chat messages feels like a task that every chat model will need - and hence should be centrally provided. Individual engines can then turn that format into whatever they precisely need.
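A minimal sketch of how such a shared helper might look, assuming the ChatML-style role tags quoted later in this conversation (the function name, regex, and end-of-turn marker are illustrative, not actual guidance internals):

import re

# Hypothetical shared helper: chop a role-tagged prompt string into chat
# messages. The <|im_start|>/<|im_end|> markers are an assumption.
ROLE_BLOCK = re.compile(
    r"<\|im_start\|>(system|user|assistant)\n(.*?)<\|im_end\|>\n?",
    re.DOTALL,
)

def split_state_into_messages(state: str) -> list[dict]:
    return [
        {"role": role, "content": content}
        for role, content in ROLE_BLOCK.findall(state)
    ]

Individual engines could then post-process that list (merging or renaming roles, say) before building their request payloads.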
def __init__(
    self,
    *,
    tokenizer,
I never try setting the tokeniser, and it appears that it eventually defaults to GPT2. I don't quite see why a remote model like this would even need it.
OK, so theoretically for token healing. However, I have a feeling that trying to figure out what tokeniser to use will be an exercise in fragility.
self._reset_shared_data(prompt[:pos], temperature)

# Use cache only when temperature is 0
if temperature == 0:
This caching logic is a bit of a concern - at least for models where T=0 doesn't actually guarantee determinism. And in general, it means that a bunch of our tests might not quite be doing what we think they're doing, because they may just be hitting the cache.
The more I think about it, the less I like the idea of a disk-based cache. In some ways it's worse on the OpenAI side, where both AzureOpenAI and OpenAI will wind up sharing the same cache.
How much speed-up does it really give, compared to the Heisenbug potential it represents?
I think most models have more reliable temp=0 determinism now, but agree that sharing between AzureOpenAI and OpenAI is perhaps problematic (though in theory there shouldn't be differences between the two APIs?). I do think caching is a nice feature to have in general, as production workflows often see repeated inputs, and reusing those results saves time and money.
I have added a clear_cache argument to the constructor. I think I should add this to the OpenAI side as well, in a separate PR.
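A sketch of how the constructor flag and the temperature-gated cache could fit together (illustrative only; the real engine class, cache setup, and request method differ):

import diskcache

class _RemoteChatEngineSketch:
    """Illustrative stand-in for the real engine class."""

    def __init__(self, *, cache_dir: str, clear_cache: bool = False):
        self._cache = diskcache.Cache(cache_dir)
        if clear_cache:
            self._cache.clear()

    def _call_model(self, prompt: str, temperature: float) -> str:
        # Only consult the cache when sampling is (nominally) deterministic.
        if temperature == 0 and prompt in self._cache:
            return self._cache[prompt]
        result = self._request_completion(prompt, temperature)
        if temperature == 0:
            self._cache[prompt] = result
        return result

    def _request_completion(self, prompt: str, temperature: float) -> str:
        raise NotImplementedError  # the actual HTTP call would live here

Keying the cache directory per model/endpoint, rather than sharing one between AzureOpenAI and OpenAI, would also address the sharing concern raised above.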
guidance/models/_azureai_studio.py
Outdated
req = urllib.request.Request(self._endpoint, body, headers)

response = urllib.request.urlopen(req)
result = json.loads(response.read())
The logic around result is another thing which might be model-specific.
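If different deployments do return different shapes, one option is to isolate the extraction behind a small helper (a sketch; the payload shapes below are assumptions about what various Azure AI Studio deployments might return, not verified behaviour):

def _extract_text(result: dict) -> str:
    # OpenAI-style chat payload (assumed for most chat deployments)
    if "choices" in result:
        choice = result["choices"][0]
        if "message" in choice:
            return choice["message"]["content"]
        return choice.get("text", "")
    # Some serverless deployments may return a bare output field (assumption)
    if "output" in result:
        return result["output"]
    raise ValueError(f"Unrecognised response shape: {sorted(result)}")

Model-specific subclasses could then override just this helper rather than the whole generator.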
Codecov Report
Attention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   55.42%   62.05%   +6.63%
==========================================
  Files          55       56       +1
  Lines        4076     4159      +83
==========================================
+ Hits         2259     2581     +322
+ Misses       1817     1578     -239

☔ View full report in Codecov by Sentry.
tests/models/test_azureai_studio.py
Outdated
assert isinstance(lm, models.AzureAIStudioChat)
lm.engine.cache.clear()

# No "system" role for Mistral?
This makes me unhappy
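One possible workaround (a sketch only, not necessarily what this PR does) is to fold a leading system message into the first user turn for deployments that reject the "system" role:

def fold_system_message(messages: list[dict]) -> list[dict]:
    """Merge a leading system message into the first user message.

    Hypothetical helper; assumes messages are dicts with "role"/"content".
    """
    if not messages or messages[0]["role"] != "system":
        return messages
    system, *rest = messages
    if rest and rest[0]["role"] == "user":
        merged = dict(rest[0], content=system["content"] + "\n\n" + rest[0]["content"])
        return [merged, *rest[1:]]
    return [{"role": "user", "content": system["content"]}, *rest]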
@@ -58,13 +58,23 @@ jobs:
          python -c "import torch; assert torch.cuda.is_available()"
      - name: Test with pytest
        env:
          # Configure endpoints
          # Configure endpoints for Azure OpenAI
          AZUREAI_CHAT_ENDPOINT: ${{ secrets.AZUREAI_CHAT_ENDPOINT }}
We should probably change the 'ENDPOINT' variables here to be vars rather than secrets.
          AZUREAI_CHAT_ENDPOINT: ${{ secrets.AZUREAI_CHAT_ENDPOINT }}
          AZUREAI_CHAT_KEY: ${{ secrets.AZUREAI_CHAT_KEY }}
          AZUREAI_CHAT_MODEL: ${{ secrets.AZUREAI_CHAT_MODEL }}
          AZUREAI_COMPLETION_ENDPOINT: ${{ secrets.AZUREAI_COMPLETION_ENDPOINT }}
          AZUREAI_COMPLETION_KEY: ${{ secrets.AZUREAI_COMPLETION_KEY }}
          AZUREAI_COMPLETION_MODEL: ${{ secrets.AZUREAI_COMPLETION_MODEL }}
          # Configure endpoints for Azure AI Studio
          AZURE_AI_STUDIO_PHI3_ENDPOINT: ${{ vars.AZURE_AI_STUDIO_PHI3_ENDPOINT }}
@Harsha-Nori we will need to work together on getting these into the repository
("system", b"<|im_start|>system\n"), | ||
("user", b"<|im_start|>user\n"), | ||
("assistant", b"<|im_start|>assistant\n"), |
Do AzureAI models uniformly use the same role tags? I don't think we can hard-code a check for these start_bytes in this class.
These aren't coming from the model, surely? They're coming from guidance as it sends its current prompt back into this class, to be formatted and forwarded on to the model. Or... have I managed to miss another action-at-a-distance part of guidance?
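If the tags do vary across deployments, the start bytes could be made a parameter rather than a hard-coded table (a sketch; the class and parameter names are illustrative, and the defaults are just the ChatML-style tags quoted above):

DEFAULT_ROLE_STARTS = {
    "system": b"<|im_start|>system\n",
    "user": b"<|im_start|>user\n",
    "assistant": b"<|im_start|>assistant\n",
}

class RoleStartMatcher:
    """Illustrative helper that maps a chunk of prompt bytes to a role."""

    def __init__(self, role_starts: dict | None = None):
        self._role_starts = dict(role_starts or DEFAULT_ROLE_STARTS)

    def match(self, data: bytes) -> str | None:
        for role, start in self._role_starts.items():
            if data.startswith(start):
                return role
        return None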
tests/models/test_model.py
Outdated
print(f"{dir(lm.engine.tokenizer)=}") | ||
assert hasattr(lm.engine.tokenizer,"sp_model") |
We don't want to have this assert for every model -- not every tokenizer is sentencepiece-based.
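A less implementation-specific check could assert the round-trip behaviour the tests actually care about (a sketch; the byte-level encode/decode interface assumed here may not match the real tokenizer wrapper):

# Sketch: test behaviour rather than the presence of sp_model.
tokenizer = lm.engine.tokenizer
token_ids = tokenizer.encode(b"hello world")   # assumed byte-level interface
assert tokenizer.decode(token_ids) == b"hello world"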
Add support for models deployed in Azure AI Studio. This has been done by combining the code for the OpenAI models with the sample code provided by Azure AI Studio.
Since there are a bunch of common test cases which need to run against multiple models, start refactoring those a bit as well (and hook the Azure OpenAI tests into this). This isn't using the same mechanism as the testing of local models, since we won't run into trouble fitting multiple LLMs onto a single machine.
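One natural shape for the shared remote-model tests (a sketch of the idea, not necessarily the refactoring in this PR; the fixture name and environment variables are illustrative) is to parametrise over the configured deployments and skip the ones that are not set up:

import os

import pytest

# Hypothetical mapping from a test id to the environment variable holding
# its endpoint; unset deployments are skipped rather than failed.
REMOTE_MODELS = {
    "azure_openai_chat": "AZUREAI_CHAT_ENDPOINT",
    "azureai_studio_phi3": "AZURE_AI_STUDIO_PHI3_ENDPOINT",
}

@pytest.fixture(params=sorted(REMOTE_MODELS))
def remote_endpoint(request):
    endpoint = os.getenv(REMOTE_MODELS[request.param])
    if not endpoint:
        pytest.skip(f"{request.param} is not configured in this environment")
    return request.param, endpoint

def test_endpoint_is_https(remote_endpoint):
    name, endpoint = remote_endpoint
    assert endpoint.startswith("https://"), f"{name} endpoint should be https"

The real shared tests would build the corresponding guidance model inside the fixture and run the common generation checks against it; because these are remote deployments, there is no need to worry about fitting several models into local memory at once.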