prepdocs: Documents are skipped on redeployment #1779

Open
1 of 4 tasks
phoevos opened this issue Jul 2, 2024 · 2 comments
Labels: open issue (A validated issue that should be tackled. Comment if you'd like it assigned to you.)

Comments


phoevos commented Jul 2, 2024

Due to the change introduced here, prepdocs generates a local .md5 file containing the MD5 hash of each file it uploads. Whenever the prepdocs script is re-run, the stored hash is compared against the file's current hash, and the file is skipped if it hasn't changed.

This can be an issue when redeploying in a new environment (e.g. running azd up inside the same local dir, but for a new target deployment); although the index is empty, the script detects the local .md5 files and, consequently, skips the upload.
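To make the failure mode concrete, here is a minimal sketch of a local hash check of this kind. It is not the actual prepdocs implementation: the function name `is_unchanged` and the sidecar naming (`<file>.md5` next to the document) are assumptions made for illustration. The point is that the decision depends only on local files, so it cannot tell whether the remote index still contains the document.

```python
import hashlib
from pathlib import Path


def is_unchanged(path: Path) -> bool:
    """Return True if `path` matches the hash stored in its sidecar .md5 file.

    Hypothetical helper, not the real prepdocs code. As a side effect it
    writes (or refreshes) the sidecar whenever the hash is missing or stale.
    """
    current_hash = hashlib.md5(path.read_bytes()).hexdigest()
    # Assumed naming scheme: "doc.pdf" -> "doc.pdf.md5" in the same directory.
    sidecar = path.with_name(path.name + ".md5")
    if sidecar.exists() and sidecar.read_text(encoding="utf-8").strip() == current_hash:
        # Only local state was consulted; the remote index is never queried.
        return True
    sidecar.write_text(current_hash, encoding="utf-8")
    return False
```

With a check like this, the first run reports the file as changed and records its hash; every later run reports it as unchanged, even if the search index was deleted and recreated in between.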

This issue is for a: (mark with an x)

  • bug report
  • feature request
  • documentation issue or request
  • regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. azd up
  2. azd down
  3. azd up
  4. The newly deployed AI Search service is empty

Expected/desired behavior

I understand this was meant to be an optimisation, but the hash files alone can't be used to determine whether the documents exist on the remote. A simple workaround I've implemented locally (that doesn't involve querying the index) would be to add a new flag (e.g. --forceupload) to enable ignoring the local .md5 files; at the very least, enabling the optimisation should be configurable.
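A rough sketch of how such a flag could be wired in is shown below. Only the flag name `--forceupload` comes from this issue; the parser layout, the `files` argument, and the `should_upload` helper are hypothetical stand-ins rather than the real prepdocs code.

```python
import argparse
import hashlib
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the prepdocs CLI; only --forceupload is from the issue.
    parser = argparse.ArgumentParser(description="Upload documents to the search index")
    parser.add_argument("files", nargs="+", type=Path, help="Documents to process")
    parser.add_argument(
        "--forceupload",
        action="store_true",
        help="Upload even when a matching local .md5 file exists",
    )
    return parser


def should_upload(path: Path, force: bool) -> bool:
    """Decide whether to upload `path`, refreshing its sidecar .md5 either way."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    sidecar = path.with_name(path.name + ".md5")  # assumed naming scheme
    unchanged = sidecar.exists() and sidecar.read_text(encoding="utf-8").strip() == digest
    sidecar.write_text(digest, encoding="utf-8")
    # The flag bypasses the local optimisation without deleting the hash files.
    return force or not unchanged
```

A caller would parse the arguments with `build_parser().parse_args()` and pass `args.forceupload` to `should_upload` for each file. Defaulting the flag to off keeps the existing optimisation for repeated runs against the same deployment.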

Alternatively, the docs could include a warning regarding this scenario and a simple cleanup command, e.g.

find data -type f -name "*.md5" -delete
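On systems where `find` is not available (for example a plain Windows shell), the same cleanup could be done with a few lines of Python. This is a sketch only; `delete_md5_files` is a hypothetical helper name, and `data` is the directory used in the command above.

```python
from pathlib import Path


def delete_md5_files(root: Path) -> int:
    """Delete every regular file ending in .md5 under `root`; return how many were removed."""
    removed = 0
    # Materialise the matches first so deletion does not interfere with the directory walk.
    for sidecar in list(root.rglob("*.md5")):
        if sidecar.is_file():
            sidecar.unlink()
            removed += 1
    return removed
```

Calling `delete_md5_files(Path("data"))` before re-running the deployment should have the same effect as the `find` command.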

azd version?

azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)
phoevos added a commit to ucl-contoso-chat/ucl-openai-search that referenced this issue Jul 2, 2024
Add a new CLI flag ('--forceupload') to prepdocs.py that will force the
upload of the document to the index, even if the '.md5' file is present
locally.

Refs Azure-Samples#1779

Signed-off-by: Phoevos Kalemkeris <phoevos@protonmail.com>

phoevos commented Jul 9, 2024

Bumped into this relevant PR, looks like it was abandoned:

pamelafox (Collaborator) commented

Yes, that's the relevant PR. See my most recent comment on improvements needed to it. We'd still like to incorporate that change, with the improvements I mentioned. I haven't had time to make them myself, but if you do, I'm happy to review an updated version.

pamelafox added the "open issue" label Jul 10, 2024