Check interval range to avoid cases where year is inappropriately entered #16945
Comments
@asdf2014, we already support validation of intervals:
Do you want to just filter out such records (which is already supported as listed above) or also raise an alert when an out-of-range record is encountered?
Hi @kfaraz, Apache Druid certainly supports checking data dates. This proposal is about checking at the task payload level, because we have encountered errors in how intervals were filled out on the business side, which led to reading a large amount of data from HDFS. It is not the same level of checking as what you mentioned 😅
I see, thanks for the clarification, @asdf2014. So you want to add a validation on the input time interval while persisting the task payload itself. That said, it does make sense for an admin to allow users to perform only valid actions. To that effect, the admin could specify a property called say cc: @abhishekagarwal87, what are your thoughts on such validations?
Description
Apache Druid should support checking the interval range to catch cases where the year is entered incorrectly.
Specifically, when dealing with time data, incorrect years are sometimes entered due to typos or other reasons, for example 20240 instead of 2024. Such years can lead to significant deviations in data processing and analysis results, affecting the accuracy and reliability of the data.
To avoid such situations, we plan to add an interval range check feature in Apache Druid. This feature will allow users to set a reasonable range for years, such as from the year 2000 to 2100. During data input and processing, the system will automatically check whether the year falls within this range. If a year outside this range is detected, the system will issue a warning or error message, prompting the user to make corrections.
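As a rough illustration of the check described above, here is a minimal sketch in Java. It is not Druid code: the class `IntervalRangeCheck` and its methods are hypothetical names chosen for this example, and it uses `java.time.LocalDate` from the standard library rather than the interval types Druid uses internally. The 2000 to 2100 bounds come from the example in the description and would presumably be configurable.

```java
import java.time.LocalDate;

/**
 * Hypothetical helper (not part of Druid): rejects intervals whose
 * start or end year falls outside an admin-chosen range.
 */
final class IntervalRangeCheck {
    private final int minYear;
    private final int maxYear;

    IntervalRangeCheck(int minYear, int maxYear) {
        if (minYear > maxYear) {
            throw new IllegalArgumentException("minYear must not exceed maxYear");
        }
        this.minYear = minYear;
        this.maxYear = maxYear;
    }

    /** True when start is not after end and both years lie in [minYear, maxYear]. */
    boolean isValid(LocalDate start, LocalDate end) {
        return !start.isAfter(end) && inRange(start.getYear()) && inRange(end.getYear());
    }

    /** Throws with a descriptive message so the caller can surface it to the user. */
    void validate(LocalDate start, LocalDate end) {
        if (!isValid(start, end)) {
            throw new IllegalArgumentException(
                "Interval " + start + "/" + end
                + " is invalid: endpoints must be ordered and fall within years "
                + minYear + " to " + maxYear);
        }
    }

    private boolean inRange(int year) {
        return year >= minYear && year <= maxYear;
    }
}
```

With bounds of 2000 and 2100, an interval ending in the year 20240 fails `isValid`, while one that lies entirely in 2024 passes. Whether a violation should produce a warning or a hard error is left open in the description; this sketch shows only the error path via `validate`.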
The implementation of this new feature will include the following steps:
With this interval range check in place, incorrect year entries are caught before they cause data issues, which improves the accuracy and reliability of data processing and helps ensure that users' decisions are based on accurate data.