-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
do not index large case property values #30009
Conversation
I did not include a migration for this to be automatically applied on deploy, as I was not sure what our recommended process was. I think that is safe and easy, but so is running it manually, and if we do have to rollback, the revert would be cleaner without a migration. Open to suggestions about what others prefer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will still be stored in the document, so will be retrieved when a case matches another query
Have you confirmed this with testing?
I did not include a migration for this to be automatically applied on deploy, as I was not sure what our recommended process was. I think that is safe and easy, but so is running it manually, and if we do have to rollback, the revert would be cleaner without a migration. Open to suggestions about what others prefer
In the past we've always done these kind of changes manually. I could see an argument for doing it automatically for non-destructive changes.
@@ -34,7 +34,8 @@ | |||
"exact": { | |||
"index": "not_analyzed", | |||
"type": "string", | |||
"null_value": "" | |||
"null_value": "", | |||
"ignore_above": 8191 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you also need it on the main value
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so. Specifically the error message referenced this field, not case properties as a whole Document contains at least one immense term in field="case_properties.value.exact"
, which makes sense because each field is indexed separately rather than as a whole, so it's specifically this string one that is having the issue, but as with the other question above, I will make sure to confirm on staging.
Confirmed that I can save both 10k character properties (greater than the setting, less than the lucene limit) and 40k character properties (above the lucene limit), and it appears in the doc shown in I am offline after tomorrow, so I am not going to deploy this, although I think it is good to go if someone else would like to push it forward. I think the sooner we deploy this the better, since in the meantime, relevant cases will not be updated in this index. |
I'll roll this out today |
Update on prod successful. |
@calellowitz I think this deserves a commcare-cloud changelog entry |
Summary
Followup from https://dimagi-dev.atlassian.net/browse/SAAS-12367
This change will cause the case search index not to index case property values that are over 8191 characters. They will still be stored in the document, so will be retrieved when a case matches another query, but cannot be searched. This is due to a limit in lucene that indexed fields cannot exceed 32766 bytes. Given how case search is used (equality searching only), this is extremely unlikely to affect actual usage, since it would require someone inputting 8000 character exact matches in the application.
8192 was chosen because it is the max length allowed if all characters were 4 byte utf-8 characters, as recommended by ES.
ES docs on the setting https://www.elastic.co/guide/en/elasticsearch/reference/2.4/ignore-above.html
Safety Assurance
Safety story
This is a simple settings change that will not affect any existing data, and can be applied with zero downtime.
Rollback instructions
To rollback this change, the PR must be reverted and then
cchq <env> django-manage update_es_mapping case_search
must be run on all environments on which it was deployed.