-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all values for pos in vcfRecord are strings #97
Comments
Hi @cmungall I think there is no bug here. The Phenopacket Schema defines VcfRecord as following: message VcfRecord {
string genome_assembly = 1;
string chrom = 2;
uint64 pos = 3;
string id = 4;
string ref = 5;
string alt = 6;
string qual = 7;
string filter = 8;
string info = 9;
} Note This is also why the JSON-schema of So, it is a bit odd, but I think |
I thought we agreed the documentation was the SoT? It clearly states positions are integers. IMO this is really odd. I'd strongly recommend merging this PR and before the current behavior becomes enshrined. For the protobuf users (surely a minority? Everyone uses json/jsonschema/pydantic) we can just have a minimal shim layer. |
Another option is to use |
@cmungall |
It’s a tough one… my belief is that it’s better to take the short term
churn now for the better long term result but I may be in the minority here…
…On Tue, May 28, 2024 at 4:59 AM Peter Robinson ***@***.***> wrote:
@cmungall <https://github.com/cmungall>
thanks for finding this. I am not sure what to do since the software is
now pretty entrenched and it is working in many places.
In any case, the place to fix it, if we want to, is not here, it would be
upstream in the library code! I think that given the software that we are
now using for about 5 other projects, this is probably something we might
need to live with, or is there some easy solution that will not cause other
issues? @ielis <https://github.com/ielis>
—
Reply to this email directly, view it on GitHub
<#97 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMMOKX72WU6FMRY4P44PLZERBPVAVCNFSM6AAAAABILT6462VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZUGY4TKMJTGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I am not worried too much about our use cases - we can update our projects. I worry more about the public use and I think we should do what is allowed by the semantic versioning in order to fix errors and bugs. It is important to decide what is the SoT, I get it wrong all the time. Then, regardless of choosing RTD or protobuf, we have plenty issues that should be fixed, some of them waiting for resolution for almost 2 years. We should find a way how to "touch" the schema, this is what I perceive as a very important issue. Nothing in the software world is static. Nothing. In this particular case, I think that fixing the issue will be a "breaking" change. Either update the RTD to
To the best of my knowledge, In other words, running this: package org.phenopackets.schema.v2;
import com.google.protobuf.util.JsonFormat;
import org.ga4gh.vrsatile.v1.VcfRecord;
import org.junit.jupiter.api.Test;
public class VcfRecordPosTest {
@Test
public void printPos() throws Exception {
VcfRecord record = VcfRecord.newBuilder()
.setChrom("chr1")
.setPos(123_456)
.setRef("C")
.setAlt("G")
.build();
String json = JsonFormat.printer().print(record);
System.err.println(json);
}
} produces this: {
"chrom": "chr1",
"pos": "123456",
"ref": "C",
"alt": "G"
} |
------------------------------
Another option is to use uint32. This means we can't ever use phenopackets
for plant genomes, but I think that's fine.
To the best of my knowledge, uint32 is serialized to JSON in the same way
as uint32 - as a str. I ran a unit test in Java, checked the output, and
observed a str despite the Protobuf documentation
<https://protobuf.dev/programming-guides/proto3/#json> advertising that
int32, fixed32, uint32 are mapped to a number.
Thanks for checking, that is unfortunate
… Message ID: <monarch-initiative/phenopacket-store/issues/97/2135083968@
github.com>
|
We discussed this at length two years ago -- I think I saw the same thing when I was working on the JSON Schema. Our decision was basically that while it is not elegant, there is zero chance that anybody will actually treat the position as a string and so it is not a biggie. |
pos should be an int:
https://phenopacket-schema.readthedocs.io/en/latest/variant.html#vcfrecord
But all the values here are strings
(found by checking using linkml-validate)
E.g
The text was updated successfully, but these errors were encountered: