-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completed Newbie Project #292
base: master
Are you sure you want to change the base?
Conversation
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
8604658 | Triggered | Generic High Entropy Secret | 8f873f3 | backend/ohq/vector_db.py | View secret |
8604658 | Triggered | Generic High Entropy Secret | 8f873f3 | backend/ohq/vector_db.py | View secret |
8604658 | Triggered | Generic High Entropy Secret | 82b8c3e | backend/ohq/vector_db.py | View secret |
8604658 | Triggered | Generic High Entropy Secret | 82b8c3e | backend/ohq/vector_db.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
backend/ohq/vector_db.py
Outdated
openai.api_key = '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' | ||
encoding = tiktoken.get_encoding("cl100k_base") | ||
|
||
pinecone.init(api_key='2a81b230-ba5d-4290-a3ac-b60e6ad17a66', environment='gcp-starter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should put all the API keys in the .env
file and get the secrets back via os.environ.get(API_KEY)
. Strongly recommend against committing keys without using .env
and without adding .env
into .gitignore
when we are committing into a public repo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a cleaner way to do this would be to also set up the keys inside the django.conf
object. You can look at the sms.py to see how it was setup but essential, the steps are:
- Go to
officehoursqueue/settings/base.py
and define your environment secret keys there (follow the TWILIO example and set up the keys there)and it would look like thisPINECONE_API_KEY=os.environ.get("PINECONE_API_KEY")
- Now you can just import the global settings object like this:
from django.conf import settings
anywhere within the codebase or look atohq/sms.py
. - Get the secret keys by calling
settings.PINECONE_API_KEY
and load it as needed
@@ -9,6 +9,8 @@ frontend/yarn-error.log | |||
__pycache__/ | |||
*.pyc | |||
|
|||
.env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -411,3 +411,24 @@ class Announcement(models.Model): | |||
author = models.ForeignKey(User, related_name="announcements", on_delete=models.CASCADE) | |||
time_updated = models.DateTimeField(auto_now=True) | |||
course = models.ForeignKey(Course, related_name="announcements", on_delete=models.CASCADE) | |||
|
|||
|
|||
class VectorDB(models.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also add a __str__
method to this model to format the name of the VectorDB objects within django admin. So perhaps:
def __str__(self):
return f"{self.course}: {self.name} Vector DB"
backend/ohq/permissions.py
Outdated
return False | ||
|
||
if view.action == "retrieve": | ||
return membership.is_ta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would only mean that TAs roles can have permission to retrieve but not Leadership roles (like Head TA or Professor), so should be:
return membership.is_ta or membership.is_leadership
to have all course staff have access
backend/ohq/permissions.py
Outdated
return False | ||
|
||
if view.action == "retrieve": | ||
return membership.is_ta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in the DocumentCreatePermission class! Include head tas and professor in retrieve permisson!
backend/ohq/permissions.py
Outdated
return membership.is_ta | ||
|
||
# Head TAs+ can make changes | ||
if view.action in ["retrieve", "create", "destroy", "update", "partial_update"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in the DocumentCreatePermission class! Remove retrieve if retrieve permission is already defined above!
@ethanwang-04 Overall good job on this PR (and cool stuff) and I left a few comments here and there. I would leave over some more comments as I see more possible fixes! |
backend/ohq/vector_db.py
Outdated
def search_index(term, top_k=5): | ||
embedded_term = embed_vectors(term) | ||
|
||
pinecone.init(api_key='2a81b230-ba5d-4290-a3ac-b60e6ad17a66', environment='gcp-starter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as the one about secret keys. Use the global django.conf settings object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to remove this
backend/ohq/vector_db.py
Outdated
def search_index_with_metadata(term, metadata, top_k=5): | ||
embedded_term = embed_vectors(term) | ||
|
||
pinecone.init(api_key='2a81b230-ba5d-4290-a3ac-b60e6ad17a66', environment='gcp-starter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as the one about secret keys. Use the global django.conf
settings object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
backend/ohq/vector_db.py
Outdated
|
||
results = [] | ||
for match in matches: | ||
print(match) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to remove print statements from production code!
backend/ohq/vector_db.py
Outdated
results = [] | ||
for match in matches: | ||
print(match) | ||
print("this is not metadata query") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to remove print statements from production code!
…r upload to Pinecone with metadata. Also implemented vector retrieval from Pinecone.
32c3566
to
8f873f3
Compare
|
||
class DocumentCreateView(generics.CreateAPIView): | ||
|
||
# permission_classes = [DocumentCreatePermission | IsSuperuser] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncomment this line to allow permissions to apply
from ohq.models import Document | ||
|
||
openai.api_key = settings.OPENAI_API_KEY | ||
pinecone.init(api_key=settings.PINECONE_API_KEY, environment='gcp-starter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think u should also include 'gcp-starter' within the global settings object as well
Encouraged writing some tests to make sure vector databases models works correctly, especially with parsing different kinds of documents into chunks |
Implemented document file upload, parsing, text extraction, and vector upload to Pinecone with metadata. Also implemented vector retrieval from Pinecone. @trangiabach