-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ieduplicates: datetime ID format #146
Comments
Thanks, Matteo! I encountered something similar, as noted in issue #103, but couldn't replicate it later. Since we've agreed that testing duplicates in time vars should be included in HFCs, we should find a way to fix this. |
We will solve this by testing it the idvar specified is a date or time formatted variable. If not, then the command will behave as before. If it is date or time, then a new variable is created with the time display format saved as a string. This variable will be used to check for duplicates etc. We will add a new option called generate() that will take the string that is the name of the variable created. This option is required with the idvar is a time or date variable, and not allowed when it is not. This will bring the users attention to the fact that we are creating and running the command on string copy of the variable used. |
Treat as regular dup and let user solve with iecompdup
This is the fix to the orignal issue. Unique vars may no longer be time vars. Remove precision option. Merge report to oringal data only on unique vars
simplify, document better, better naming conventions of tempvars
naming conventions idvar, typo in srgumentvars, use missing() when possible, clearer tempfile names
I ended up solving this differently and then ended up doing a pretty significant re-write of the command. The solution to the original issue was to only merge the report on unique vars and not allowing them to be time vars. This fits the use case of testing for duplicates in starttime that I did not have in mind when this command was first written. To the user the only change is this new requirement on the unique vars and that the option One other change is that the command does no longer automatically drop observations that are duplicates in all variables in the data set. These cases are now treated as a regular duplicate that the user will have to chose how to solve. Most of the other re-writes was just to update this command based on experiences and best practices we have developed while working on ietoolkit. Nothing of this should change anything to how the user interact with this command. These re-writes will make future updates easier, especially to someone other than me. |
I have tested the new version of the command on old do-files to check backward compatibility. All results were replicated without errors. |
Issue will be closed on once update is published on SSC |
When checking duplicates using
starttime
as ID, with the aim of capturing cases when an enumerator re-submits a survey with a different ID (starttime
would be the same, but the ID would be different), the command gives an error sayingI think this issue is related to the datetime format in Stata vs. Excel: I solved the issue by generating a new
starttime
float-type variable before runningieduplicates
, and then I use the latter one as ID.Any alternative way?
The text was updated successfully, but these errors were encountered: