-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move code samples to master branch #3271
Conversation
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started" | ||
"/main/data/bikes.json" | ||
) | ||
response = requests.get(URL, timeout=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this data should be included here instead of being pulled from another repo that can change at any moment. That'd drop the requests
dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akx Thanks for the review - very much appreciated! I did wonder myself if it would be better to add the JSON to the code. However, I think the tutorial gets the data from a URL to illustrate the kind of task the user might want to do in real life. Also, the file on the Github repo was deliberately created for this tutorial, so it shouldn't change unless we want it to. I'd say we should leave it like this for now but I'll find out if we've got any feedback about this page and see if the users have any problems with the HTTP request.
doctests/search_vss.py
Outdated
# Optional: convert the table to Markdown using Pandas | ||
queries_table = pd.DataFrame(results_list) | ||
queries_table.sort_values( | ||
by=["query", "score"], ascending=[True, False], inplace=True | ||
) | ||
queries_table["query"] = queries_table.groupby("query")["query"].transform( | ||
lambda x: [x.iloc[0]] + [""] * (len(x) - 1) | ||
) | ||
queries_table["description"] = queries_table["description"].apply( | ||
lambda x: (x[:497] + "...") if len(x) > 500 else x | ||
) | ||
queries_table.to_markdown(index=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is "optional" and nothing is actually even done with the markdown return value, this should probably be removed so pandas
and tabulate
aren't required to run the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akx Again, I think it's trying to show the user a real world example. The code sample doesn't use the Markdown explicitly but the output Markdown table is actually shown in the tutorial page. I think the doc text on the page could probably use some improvement but that's a separate PR on the docs repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_markdown()
returns the table as a string, so this would just do a lot of work to dataframe-ify things, group them, transform them... and then throw the result away. I really don't think it's a great idea to spend CI CPU cycles (and contribute to our planet burning in a minor way) for that.
I assume these are somehow extracted from something like a notebook session, where the value of the last expression might actually be shown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've added a return value to the function and then the main code prints that out when the function is called. The doc page shows the Markdown table as rendered, but it looks like it only shows the first query currently (this is an issue with the doc page, not the code sample - I'll fix that separately).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I'm not following. Why bother with that either? Is there something asserting on the output?
If there is, if the idea of these being here (and e.g. being run as part of CI?) is to test that "typical" Redis-utilizing code works, then it would break if pandas
or tabulate
decide to change their (human-oriented) output.
If there isn't, I don't see a point in doing the work to human-format data that no one will care about.
(On that note, running sentence-transformers
in CI also sounds like a bad idea, since it will have to download the transformers models, then run embedding computation on CPU, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be misunderstanding what this code is for :-) The code samples in this folder are included in the docs (eg, https://redis.io/docs/latest/develop/get-started/vector-database/). Their purpose is to explain to the user how to do stuff with Redis and to supply bits of code for copy/paste. The reason for putting them in the client repo as tests is to check that the sample code works against the current version of the client, not to act as real world example tests for the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akx , these tests were living in a separate branch until now, this is a first step to put them as close as possible to the code they are documenting. Right now there is no CI in this repo for them, I guess we can add something in other PRs. The purpose so far is educational, and we want to decrease maintenance effort. I'll merge this as it is.
Move the code samples used for documentation into the master branch, so they sit next to the code they document.
Move the code samples used for documentation into the master branch, so they sit next to the code they document.
Move the code samples used for documentation into the master branch, so they sit next to the code they document.
Move the code samples used for documentation into the master branch, so they sit next to the code they document.
Pull Request check-list
Please make sure to review and check all of these items:
NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.
Description of change
The code samples for the main documentation pages are kept in this repo but they were previously in a separate branch (emb-examples). The plan is to move them to the master branch and run tests on them so they get checked frequently.