Due to the change introduced here, prepdocs generates an .md5 file locally with an MD5 hash of each file that gets uploaded. Whenever the prepdocs script is re-run, that hash is checked against the current hash and the file is skipped if it hasn't changed.
This can be an issue when redeploying in a new environment (e.g. running azd up inside the same local dir, but for a new target deployment); although the index is empty, the script detects the local .md5 files and, consequently, skips the upload.
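To make the failure mode concrete, here is a minimal sketch of the kind of check described above. It is not the actual prepdocs code: the function names and the `<file>.md5` sidecar naming are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of the file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def should_skip(path: Path) -> bool:
    """Hypothetical skip check based only on a local '<file>.md5' sidecar.

    Returns True when the stored hash matches the current one. Otherwise it
    writes (or updates) the sidecar and returns False.
    """
    sidecar = path.with_name(path.name + ".md5")  # assumed naming scheme
    current = file_md5(path)
    if sidecar.exists() and sidecar.read_text(encoding="utf-8").strip() == current:
        # Decision is made from local state alone; the remote index is never consulted.
        return True
    sidecar.write_text(current, encoding="utf-8")
    return False
```

With a check like this, the second `azd up` still finds the sidecar files left by the first run, so every document is skipped even though the new index has nothing in it.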
This issue is for a: (mark with an x)
bug report
feature request
documentation issue or request
regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
azd up
azd down
azd up
The newly deployed AI Search service is empty
Expected/desired behavior
I understand this was meant to be an optimisation, but the hash files alone can't be used to determine whether the documents exist on the remote. A simple workaround I've implemented locally (that doesn't involve querying the index) would be to add a new flag (e.g. --forceupload) to enable ignoring the local .md5 files; at the very least, enabling the optimisation should be configurable.
Alternatively, the docs could include a warning regarding this scenario and a simple cleanup command, e.g.
find data -type f -name "*.md5" -delete
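For illustration, both options could look roughly like the sketch below. It assumes a `--forceupload` flag as proposed above and the same `<file>.md5` sidecar naming. `build_parser`, `needs_upload`, and `delete_md5_files` are hypothetical helpers, not functions from prepdocs.

```python
import argparse
import hashlib
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser with the proposed (hypothetical) --forceupload flag."""
    parser = argparse.ArgumentParser(description="Sketch of an upload script")
    parser.add_argument(
        "--forceupload",
        action="store_true",
        help="Ignore local .md5 files and upload every document",
    )
    return parser


def needs_upload(path: Path, force: bool) -> bool:
    """Upload when forced, or when no matching local hash file exists."""
    if force:
        return True
    sidecar = path.with_name(path.name + ".md5")  # assumed naming scheme
    current = hashlib.md5(path.read_bytes()).hexdigest()
    return not (sidecar.exists() and sidecar.read_text(encoding="utf-8").strip() == current)


def delete_md5_files(root: Path) -> int:
    """Python equivalent of: find <root> -type f -name "*.md5" -delete"""
    removed = 0
    for candidate in list(root.rglob("*.md5")):  # materialize before deleting
        if candidate.is_file():
            candidate.unlink()
            removed += 1
    return removed
```

Keeping the flag off by default preserves the optimisation for normal re-runs, while a fresh deployment from the same directory can opt out of it explicitly.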
azd version?
azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)
phoevos added a commit to ucl-contoso-chat/ucl-openai-search that referenced this issue on Jul 2, 2024
Add a new CLI flag ('--forceupload') to prepdocs.py that will force the
upload of the document to the index, even if the '.md5' file is present
locally.
Refs Azure-Samples#1779
Signed-off-by: Phoevos Kalemkeris <phoevos@protonmail.com>
Yes, that's the relevant PR. See my most recent comment on improvements needed to it. We'd still like to incorporate that change, with the improvements I mentioned. I haven't had time to make them myself, but if you do, I'm happy to review an updated version.
pamelafox added the "open issue" label (A validated issue that should be tackled. Comment if you'd like it assigned to you.) on Jul 10, 2024