Make ts copy support sparse ids #381

forus · 2019-01-09T10:28:56Z

To be able to load current test_data.

This PR removes the requirement to the source data to specify ids in accordance with row number (row - 2 when we count with the header).

SIDE CHANGES:

Use long instead of integer type to represent source data ids (aka index)

TODO:

add a check that neither source id (aka index), nor result db id are not null. Add this check on lookup and before setting a new value.
add check for duplicates. Smth like:

if (indexToPatientNum.containsKey(patientIndex)) {
	throw new IllegalStateException("Patient with patient_num=${patientIndex} has bee already inserted")
}

gijskant · 2019-01-09T10:39:13Z

Where does this change come from? It was explicitly designed to use subsequent numbers, e.g., to prevent mistakes, like missing data records.
I'm missing a rationale and a ticket that describes the issue that this resolves.

forus · 2019-01-09T17:28:12Z

@gijskant There is TMT-242 issue.

This changes assist uploading studies with negative identifiers in test_data/studies folder with transmart-copy and intended to come next after this PR https://github.com/thehyve/transmart-core/pull/382/files#diff-704dbcd2e8f8b25cc213b29a52a9f1c1R3. Of course, we could write a single-use script to regenerate ids of that studies instead. But I am not convinced that following a strict order of ids in the source file would help to avoid any practical mistakes. And I imagine it could be useful to make ts-copy work on sparse ids as soon as fk columns point to valid ids .e.g. Think of pg COPY dump of database table data to tsv file.

gijskant · 2019-01-09T23:18:09Z

Think about checking for duplicates.
Sorry, but I really don't see a use for sparse id's here. These numbers are supposed to be only an index. Let's not give people the impression that they can load whatever id's they like. Id's are generated by the system for you.

forus · 2019-01-10T11:15:02Z

@gijskant Thank you for duplicates example.
checking for duplicates could and has to be done with the same amount of efforts for map-based approach as well of course.

if (indexToPatientNum.containsKey(patientIndex)) {
	throw new IllegalStateException("Patient with patient_num=${patientIndex} has already been inserted")
}

I've added it to TODO list of this PR.

Sorry, but I just don't understand what do we gain from having this constraint.

forus requested a review from gijskant January 9, 2019 10:29

forus force-pushed the map_custom_ids branch from 2c0931a to e849a9c Compare January 9, 2019 10:34

Make ts copy support arbitrary ids

0106137

forus force-pushed the map_custom_ids branch from e849a9c to 0106137 Compare January 9, 2019 16:38

gijskant force-pushed the dev branch from 05f3131 to e03d378 Compare June 5, 2020 20:19

ewelinagr force-pushed the dev branch 3 times, most recently from 93183e7 to b651751 Compare August 16, 2022 07:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make ts copy support sparse ids #381

Make ts copy support sparse ids #381

forus commented Jan 9, 2019 •

edited

Loading

gijskant commented Jan 9, 2019 •

edited

Loading

forus commented Jan 9, 2019

gijskant commented Jan 9, 2019

forus commented Jan 10, 2019

Make ts copy support sparse ids #381

Are you sure you want to change the base?

Make ts copy support sparse ids #381

Conversation

forus commented Jan 9, 2019 • edited Loading

gijskant commented Jan 9, 2019 • edited Loading

forus commented Jan 9, 2019

gijskant commented Jan 9, 2019

forus commented Jan 10, 2019

forus commented Jan 9, 2019 •

edited

Loading

gijskant commented Jan 9, 2019 •

edited

Loading