ieduplicates: datetime ID format #146

MRuzzante · 2018-08-30T20:17:32Z

I am checking for duplicates using starttime as the ID variable, with the aim of capturing cases where an enumerator re-submits a survey under a different ID (starttime would be the same, but the ID would differ). The command fails with an error saying

"One or several observations in the Excel report are no longer found in the data set".

I think this issue is related to the datetime format in Stata vs. Excel. I worked around it by generating a new float-type copy of starttime before running ieduplicates and then using that copy as the ID.

Any alternative way?
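The thread does not pin down the exact mechanism, so the following is only a minimal sketch of one way a datetime ID can fail an exact match after a round trip through Excel. It assumes the value is held as milliseconds since 01jan1960 (the Stata %tc convention) and as a fractional day count on the Excel side. The helper names are hypothetical and are not part of ieduplicates.

```python
from datetime import datetime

STATA_EPOCH = datetime(1960, 1, 1)     # Stata %tc values count milliseconds from here
EXCEL_EPOCH = datetime(1899, 12, 30)   # effective day 0 of Excel's 1900 date system
MS_PER_DAY = 86_400_000
OFFSET_DAYS = (STATA_EPOCH - EXCEL_EPOCH).days


def to_stata_ms(dt):
    """Whole milliseconds between the Stata epoch and dt."""
    delta = dt - STATA_EPOCH
    return delta.days * MS_PER_DAY + delta.seconds * 1000 + delta.microseconds // 1000


def stata_ms_to_excel_serial(ms):
    """Fractional days since the Excel epoch (a floating-point number)."""
    return ms / MS_PER_DAY + OFFSET_DAYS


def excel_serial_to_stata_ms(serial):
    """Back to milliseconds; the result is a float and may carry rounding error."""
    return (serial - OFFSET_DAYS) * MS_PER_DAY


start = datetime(2018, 8, 30, 20, 17, 32)
ms = to_stata_ms(start)
back = excel_serial_to_stata_ms(stata_ms_to_excel_serial(ms))

# Exact equality after the round trip is not guaranteed in floating point.
print(ms, back, back == ms)
# Rounding to whole milliseconds before comparing removes that fragility.
print(round(back) == ms)
```

The point of the sketch is only that matching observations on a raw datetime value is fragile, which is why matching on another variable is the safer design.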

The text was updated successfully, but these errors were encountered:

luizaandrade · 2018-08-31T10:39:51Z

Thanks, Matteo! I encountered something similar, as noted in issue #103, but couldn't replicate it later. Since we've agreed that testing duplicates in time vars should be included in HFCs, we should find a way to fix this.

kbjarkefur · 2018-09-28T20:58:26Z

We will solve this by testing if the idvar specified is a date or time formatted variable. If not, then the command will behave as before. If it is a date or time variable, then a new variable is created with the time display format saved as a string. This variable will be used to check for duplicates etc.

We will add a new option called generate() that will take the string that is the name of the variable created. This option is required when the idvar is a time or date variable, and not allowed when it is not. This will bring the user's attention to the fact that we are creating and running the command on a string copy of the variable used.
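As a rough illustration of the idea described above (checking duplicates on a string copy of the time variable), here is a minimal sketch. It is not the command's implementation: the function name, the field names `key` and `starttime`, and the display format are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def duplicate_groups(records, time_key="starttime", fmt="%Y-%m-%d %H:%M:%S"):
    """Group submission keys by the string rendering of their start time.

    Returns only the groups that contain more than one submission.
    """
    groups = defaultdict(list)
    for rec in records:
        # The string copy, not the underlying numeric value, is what gets compared.
        groups[rec[time_key].strftime(fmt)].append(rec["key"])
    return {stamp: keys for stamp, keys in groups.items() if len(keys) > 1}


records = [
    {"key": "a", "starttime": datetime(2018, 8, 30, 20, 17, 32)},
    {"key": "b", "starttime": datetime(2018, 8, 30, 20, 17, 32)},  # re-submission
    {"key": "c", "starttime": datetime(2018, 8, 31, 10, 39, 51)},
]
print(duplicate_groups(records))
```

Comparing strings sidesteps numeric precision, at the cost of treating any two times that display identically as the same value.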

kbjarkefur · 2018-10-12T15:35:07Z

I ended up solving this differently and then ended up doing a pretty significant re-write of the command.

The solution to the original issue was to merge the report only on the unique vars and to no longer allow them to be time vars. This fits the use case of testing for duplicates in starttime, which I did not have in mind when this command was first written. To the user, the only changes are this new requirement on the unique vars and the removal of the option minprecision(). That option was an already existing workaround for the original issue, but no one found it, so it cannot have been intuitive, and it was only a workaround, not a solution.
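A minimal sketch of what merging the report only on a unique variable can look like, assuming each observation carries a non-time unique variable. The function name and the field names `key` and `action` are hypothetical and do not reflect the command's actual code.

```python
def merge_report(data_rows, report_rows, unique_var="key"):
    """Attach report decisions to data rows, matching on the unique variable only.

    The datetime ID is never used as a merge key, so small differences in how
    it was stored or displayed cannot break the match.
    """
    data_keys = {row[unique_var] for row in data_rows}
    missing = [r[unique_var] for r in report_rows if r[unique_var] not in data_keys]
    if missing:
        # Mirrors the situation in which report rows are no longer found in the data.
        raise ValueError(f"report rows not found in the data set: {missing}")

    report_by_key = {r[unique_var]: r for r in report_rows}
    merged = []
    for row in data_rows:
        decision = report_by_key.get(row[unique_var], {})
        merged.append({**row, "action": decision.get("action")})
    return merged


data = [
    {"key": "a", "starttime": 1851279452000},
    {"key": "b", "starttime": 1851279452000},
]
report = [{"key": "b", "action": "drop"}]
print(merge_report(data, report))
```

Because the match uses only the unique variable, the report still lines up with the data even when the datetime ID would not compare equal after a round trip.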

One other change is that the command no longer automatically drops observations that are duplicates in all variables in the data set. These cases are now treated as regular duplicates that the user will have to choose how to solve.

Most of the other re-writes were just to update this command based on experiences and best practices we have developed while working on ietoolkit. None of this should change how the user interacts with this command. These re-writes will make future updates easier, especially for someone other than me.

luizaandrade · 2018-10-15T17:28:27Z

I have tested the new version of the command on old do-files to check backward compatibility. All results were replicated without errors.

kbjarkefur · 2018-10-15T17:29:25Z

The issue will be closed once the update is published on SSC.

luizaandrade added the "minor bug" label (Bug unlikely to lead to incorrect analysis) Aug 31, 2018

kbjarkefur self-assigned this Sep 28, 2018

kbjarkefur added a commit that referenced this issue Oct 12, 2018

ieduplicates - better string/num prectice #146

13929e3

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - dont worry about alldup #146

d5e86fb

Treat as regular dup and let user solve with iecompdup

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - more coherent intro section #146

6ee4610

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - fix to #146 - merge only on uniquevars

5ee864e

This is the fix to the original issue. Unique vars may no longer be time vars. Remove precision option. Merge report to original data only on unique vars

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - re-write report input test section #146

69b988f

simplify, document better, better naming conventions of tempvars

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - naming conventions and missing best practices #146

cc561fa

naming conventions idvar, typo in srgumentvars, use missing() when possible, clearer tempfile names

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - comments and sec numbering - #146

bc43793

kbjarkefur added a commit that referenced this issue Oct 12, 2018

iedup - issue #146 helpfile update

40409b7

kbjarkefur added the "resolved but not yet published" label (Issue is fixed, but not yet published on SSC) Oct 12, 2018

kbjarkefur mentioned this issue Oct 15, 2018

Issue #146 #154

Merged

kbjarkefur added a commit that referenced this issue Oct 15, 2018

Merge pull request #154 from worldbank/issue-146

5be71e3

Issue #146

kbjarkefur added a commit that referenced this issue Oct 22, 2018

Merge pull request #164 from worldbank/develop

e4ee5f4

Version 6.0 - merge from Develop Addressing issue #135, , #137, #139, #141, #142. #145, #146, #153. #158 and partially addressing #152.

kbjarkefur closed this as completed Oct 29, 2018