Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ieduplicates: datetime ID format #146

Closed
MRuzzante opened this issue Aug 30, 2018 · 5 comments
Closed

ieduplicates: datetime ID format #146

MRuzzante opened this issue Aug 30, 2018 · 5 comments
Assignees
Labels
minor bug Bug unlikely to lead to incorrect analysis resolved but not yet published Issue is fixed, but not yet published on SSC

Comments

@MRuzzante
Copy link
Contributor

When checking duplicates using starttime as ID, with the aim of capturing cases when an enumerator re-submits a survey with a different ID (starttime would be the same, but the ID would be different), the command gives an error saying

"One or several observations in the Excel report are no longer found in the data set".

I think this issue is related to the datetime format in Stata vs. Excel: I solved the issue by generating a new starttime float-type variable before running ieduplicates, and then I use the latter one as ID.

Any alternative way?

@luizaandrade
Copy link
Collaborator

Thanks, Matteo! I encountered something similar, as noted in issue #103, but couldn't replicate it later. Since we've agreed that testing duplicates in time vars should be included in HFCs, we should find a way to fix this.

@luizaandrade luizaandrade added the minor bug Bug unlikely to lead to incorrect analysis label Aug 31, 2018
@kbjarkefur kbjarkefur self-assigned this Sep 28, 2018
@kbjarkefur
Copy link
Contributor

We will solve this by testing it the idvar specified is a date or time formatted variable. If not, then the command will behave as before. If it is date or time, then a new variable is created with the time display format saved as a string. This variable will be used to check for duplicates etc.

We will add a new option called generate() that will take the string that is the name of the variable created. This option is required with the idvar is a time or date variable, and not allowed when it is not. This will bring the users attention to the fact that we are creating and running the command on string copy of the variable used.

kbjarkefur added a commit that referenced this issue Oct 12, 2018
Treat as regular dup and let user solve with iecompdup
kbjarkefur added a commit that referenced this issue Oct 12, 2018
This is the fix to the orignal issue. Unique vars may no longer be time vars.  Remove precision option. Merge report to oringal data only on unique vars
kbjarkefur added a commit that referenced this issue Oct 12, 2018
simplify, document better, better naming conventions of tempvars
kbjarkefur added a commit that referenced this issue Oct 12, 2018
naming conventions idvar, typo in srgumentvars, use missing() when possible, clearer tempfile names
@kbjarkefur
Copy link
Contributor

I ended up solving this differently and then ended up doing a pretty significant re-write of the command.

The solution to the original issue was to only merge the report on unique vars and not allowing them to be time vars. This fits the use case of testing for duplicates in starttime that I did not have in mind when this command was first written. To the user the only change is this new requirement on the unique vars and that the option minprecision() is removed. This was an already existing work around to the original issue, but no-one found it so it cannot have been intuitive, and it was just work around and not a solution.

One other change is that the command does no longer automatically drop observations that are duplicates in all variables in the data set. These cases are now treated as a regular duplicate that the user will have to chose how to solve.

Most of the other re-writes was just to update this command based on experiences and best practices we have developed while working on ietoolkit. Nothing of this should change anything to how the user interact with this command. These re-writes will make future updates easier, especially to someone other than me.

kbjarkefur added a commit that referenced this issue Oct 12, 2018
@kbjarkefur kbjarkefur added the resolved but not yet published Issue is fixed, but not yet published on SSC label Oct 12, 2018
@kbjarkefur kbjarkefur mentioned this issue Oct 15, 2018
kbjarkefur added a commit that referenced this issue Oct 15, 2018
@luizaandrade
Copy link
Collaborator

I have tested the new version of the command on old do-files to check backward compatibility. All results were replicated without errors.

@kbjarkefur
Copy link
Contributor

Issue will be closed on once update is published on SSC

kbjarkefur added a commit that referenced this issue Oct 22, 2018
Version 6.0 - merge from Develop

Addressing issue #135, , #137, #139, #141, #142. #145, #146, #153. #158 and partially addressing #152.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor bug Bug unlikely to lead to incorrect analysis resolved but not yet published Issue is fixed, but not yet published on SSC
Projects
None yet
Development

No branches or pull requests

3 participants