KV Data Loading: Logging #56
Comments
Yes, logic that processes requests after they are decrypted requires certain protections. Logic unrelated to request processing is considered "safe", and its logs/metrics can be exported as-is. By the way, for data loading failures we're interested to hear what you think the requirements are for handling row failures: other than skipping the row and logging/recording a metric, do you expect other error handling behaviors, such as only committing a whole file (or a group of rows in the file) if all rows are successfully read?
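To make those two behaviors concrete, here is a minimal sketch of how they could be expressed as a per-data-set option. The names (`RowErrorPolicy`, `DataLoadingOptions`) are purely illustrative and not part of any existing server configuration:

```cpp
// Hypothetical sketch only -- these names do not exist in the server.
#include <cstdint>

// How the loader reacts when a row in a data file fails to parse or apply.
enum class RowErrorPolicy : uint8_t {
  kSkipAndLog,     // Skip the bad row, record it, keep loading the file.
  kFailWholeFile,  // Reject the entire file if any row fails (atomic).
};

// Per-data-set loading options an ad tech could choose.
struct DataLoadingOptions {
  RowErrorPolicy row_error_policy = RowErrorPolicy::kSkipAndLog;
  // Abort the file anyway past this many failures; -1 means no cap.
  int64_t max_failed_rows = -1;
};
```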
@peiwenhu interesting question indeed. I'd say there's no one right answer for a generic tool: allowing skipping and logging of bad rows will likely be important, and some configurability seems warranted. Without logging it takes quite the Jedi to find bad rows. Given this data is onboarded by the ad tech, telling them which rows were rejected seems safe to me. For stopping the entire file, or some other type of atomicity, you can definitely see both cases (i.e. for some data sets, if a row is bad you want to move on and not stop the train; for others it's really important to apply a changeset atomically). In theory, if you only support skip, clients can adjust to that, with some costs. I'm also going to ping some other experts here, @swapnilpandit and @truemike, and others once I find their handles.
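As a rough illustration of the skip-and-log versus atomic trade-off, a load loop might look something like the sketch below. All names are hypothetical; atomic semantics assume the `apply_row` callback only stages rows, which are made visible only when the file is committed:

```cpp
// Hypothetical sketch of a file load loop: skip-and-log by default,
// or all-or-nothing when `atomic` is set.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct FailedRow {
  int64_t row_number;
  std::string reason;  // Parse/validation error for this row.
};

struct LoadResult {
  bool committed = false;              // Whether the file's changes were kept.
  int64_t rows_ok = 0;
  std::vector<FailedRow> failed_rows;  // Returned so they can be reported.
};

// `apply_row` stages one row and returns false (filling `reason`) on failure.
LoadResult LoadFile(
    const std::vector<std::string>& rows, bool atomic, int64_t max_failed_rows,
    const std::function<bool(const std::string&, std::string*)>& apply_row) {
  LoadResult result;
  for (int64_t i = 0; i < static_cast<int64_t>(rows.size()); ++i) {
    std::string reason;
    if (apply_row(rows[i], &reason)) {
      ++result.rows_ok;
      continue;
    }
    result.failed_rows.push_back({i, reason});
    if (atomic) return result;  // First bad row rejects the whole file.
    if (max_failed_rows >= 0 &&
        static_cast<int64_t>(result.failed_rows.size()) > max_failed_rows) {
      return result;  // Too many bad rows even in skip-and-log mode.
    }
  }
  // Good rows are kept; in atomic mode we only get here if all rows applied.
  result.committed = true;
  return result;
}
```

Supporting only skip-and-log keeps the loader simple, but clients that need changeset atomicity would then have to emulate it themselves (e.g. by validating files before upload), which is the cost mentioned above.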
For production use, will the application do full/normal logging specifically for data loading? I would think this would be OK from a privacy perspective, since the entity loading the data is the ad tech, so they can't gain any new information from data load metrics. From an operational perspective, data loads will sometimes fail for odd reasons, and you'll want to get the "failed rows", failure reasons, etc.
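As a sketch of what such data-loading logs could carry, here is one possible shape: a summary line per file plus one line per rejected row with its row number and failure reason. The helper name, log format, and example file name are all assumptions, not the server's actual logging:

```cpp
// Hypothetical reporting helper -- not the server's real logging API.
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Since the data being loaded is onboarded by the ad tech, the row numbers
// and failure reasons are their own data and can be exported as-is.
void LogLoadReport(
    const std::string& file_name, bool committed, int64_t rows_ok,
    const std::vector<std::pair<int64_t, std::string>>& failed_rows,
    std::ostream& out) {
  out << "data_load file=" << file_name << " committed=" << std::boolalpha
      << committed << " rows_ok=" << rows_ok
      << " rows_failed=" << failed_rows.size() << '\n';
  for (const auto& [row, reason] : failed_rows) {
    out << "data_load_row_error file=" << file_name << " row=" << row
        << " reason=\"" << reason << "\"\n";
  }
}

int main() {
  // Example: a load that skipped two malformed rows (file name is made up).
  LogLoadReport("delta_2023_10_01.csv", /*committed=*/true, /*rows_ok=*/998,
                {{12, "missing value column"}, {47, "key exceeds max length"}},
                std::cerr);
}
```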