Skip to content
This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

Make ts copy support sparse ids #381

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from
Open

Make ts copy support sparse ids #381

wants to merge 1 commit into from

Conversation

forus
Copy link
Contributor

@forus forus commented Jan 9, 2019

To be able to load current test_data.

This PR removes the requirement to the source data to specify ids in accordance with row number (row - 2 when we count with the header).

SIDE CHANGES:

  • Use long instead of integer type to represent source data ids (aka index)

TODO:

  • add a check that neither source id (aka index), nor result db id are not null. Add this check on lookup and before setting a new value.
  • add check for duplicates. Smth like:
if (indexToPatientNum.containsKey(patientIndex)) {
	throw new IllegalStateException("Patient with patient_num=${patientIndex} has bee already inserted")
}

@forus forus requested a review from gijskant January 9, 2019 10:29
@gijskant
Copy link
Contributor

gijskant commented Jan 9, 2019

Where does this change come from? It was explicitly designed to use subsequent numbers, e.g., to prevent mistakes, like missing data records.
I'm missing a rationale and a ticket that describes the issue that this resolves.

@forus
Copy link
Contributor Author

forus commented Jan 9, 2019

@gijskant There is TMT-242 issue.

This changes assist uploading studies with negative identifiers in test_data/studies folder with transmart-copy and intended to come next after this PR https://github.com/thehyve/transmart-core/pull/382/files#diff-704dbcd2e8f8b25cc213b29a52a9f1c1R3. Of course, we could write a single-use script to regenerate ids of that studies instead. But I am not convinced that following a strict order of ids in the source file would help to avoid any practical mistakes. And I imagine it could be useful to make ts-copy work on sparse ids as soon as fk columns point to valid ids .e.g. Think of pg COPY dump of database table data to tsv file.

@gijskant
Copy link
Contributor

gijskant commented Jan 9, 2019

Think about checking for duplicates.
Sorry, but I really don't see a use for sparse id's here. These numbers are supposed to be only an index. Let's not give people the impression that they can load whatever id's they like. Id's are generated by the system for you.

@forus
Copy link
Contributor Author

forus commented Jan 10, 2019

@gijskant Thank you for duplicates example.
checking for duplicates could and has to be done with the same amount of efforts for map-based approach as well of course.

if (indexToPatientNum.containsKey(patientIndex)) {
	throw new IllegalStateException("Patient with patient_num=${patientIndex} has already been inserted")
}

I've added it to TODO list of this PR.

Sorry, but I just don't understand what do we gain from having this constraint.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants