Skip to content
This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Do we need the GA prefix on all objects? #87

Closed
cassiedoll opened this issue Jul 2, 2014 · 37 comments
Closed

Do we need the GA prefix on all objects? #87

cassiedoll opened this issue Jul 2, 2014 · 37 comments

Comments

@cassiedoll
Copy link
Member

All of our Avro files are namespaced under ga4gh, so a read alignment is currently org.ga4gh.GAReadAlignment

So the GA seems a little redundant from a packaging perspective, and is making all of the code/docs a bit harder to read (and type :)

What does everyone think about removing GA from everything?
We'd then have org.ga4gh.ReadAlignment

@massie
Copy link
Member

massie commented Jul 2, 2014

One thing to consider. Having a prefix on each object reduces the chance of namespace collisions.

Developers will likely build on more than just the GA4GH libraries and use others as well. Those other libraries may have an object called e.g. ReadAlignment. In that case, you will need to differentiate between the two using the full-path.

@fnothaft
Copy link
Contributor

fnothaft commented Jul 2, 2014

-0, the advantage of removing the GA prefix is that you remove redundancy and simplify the class name, but the disadvantage is that you can run into overlapping class names from multiple packages the implement similar functionality (e.g., net.sourceforge.samtools.SAMRecord vs. org.bdgenomics.formats.ADAMRecord).

@cassiedoll
Copy link
Member Author

I see. In proto you have to fully qualify anything not in the same package. So you'd have to type out org.ga4gh.x no matter what.

@fnothaft
Copy link
Contributor

fnothaft commented Jul 2, 2014

It depends on how you're importing it in each language; if you imported the records in C++, you'd just namespace them, or in Scala you'd just rename on import:

import org.ga4gh.{AlignedRead => GAAlignedRead}
import someone.elses.api.{AlignedRead => OtherAlignedRead}

It's not an insurmountable issue, but it's a bit frustrating to have your interop code be inconsistent with the rest of your codebase because you need to use full namespace paths/etc to refer to classes.

@cassiedoll
Copy link
Member Author

btw - some of the API providers will still be outputting json - will you be able to load those into your Avro objects easily?

@fnothaft
Copy link
Contributor

fnothaft commented Jul 2, 2014

@cassiedoll I believe you can create Avro directly from JSON in both Java and C++. I'm not sure about the other avro language bindings.

@cassiedoll
Copy link
Member Author

excellent - thanks frank!

@vadimzalunin
Copy link

-1 because of 1) auto-generated code and 2) name collisions.

@jeromekelleher
Copy link
Contributor

Avoiding namespace collisions is important, but wouldn't this be relatively rare? If the common case we expect is not to have conflicting namespaces in end-user code, then it seems a bit heavy-handed to force the GA prefix on everyone. How hard is it to work around namespace conflicts in any case? In Python, it would be easy to sort out using "import X as Y" or whatever.

@adamnovak
Copy link
Contributor

IMHO adding prefixes to the names when there is a namespace collision will
probably hinder code readability. It's relatively hard to grasp that what's
a Record everywhere else is a GARecord in this particular file, and it's
worse if some files use Record for our record and some other files in the
same project use Record for the conflicting record type.

On Mon, Jul 7, 2014 at 1:30 AM, Jerome Kelleher notifications@github.com
wrote:

Avoiding namespace collisions is important, but wouldn't this be
relatively rare? If the common case we expect is not to have conflicting
namespaces in end-user code, then it seems a bit heavy-handed to force the
GA prefix on everyone. How hard is it to work around namespace conflicts in
any case? In Python, it would be easy to sort out using "import X as Y" or
whatever.


Reply to this email directly or view it on GitHub
#87 (comment).

@cassiedoll
Copy link
Member Author

Maybe I'm just too used to Google's style - we never prefix anything :)

@pgrosu
Copy link
Contributor

pgrosu commented Jul 7, 2014

IMHO collisions should rarely happen with well-written code. In any case, there is @namespace() if one would like specify the namespace. Just my $0.02 :)

@fnothaft
Copy link
Contributor

fnothaft commented Jul 7, 2014

IMHO collisions should rarely happen with well-written code. In any case, there is @namespace() if one would like specify the namespace. Just my $0.02 :)

@pgrosu conceptually, I agree. However, if we're exporting data into/out of the API, we may need to interoperate with other libraries that also use the *Record name (e.g., net.sf.samtools.SAMRecord, org.bdgenomics.format.avro.ADAMRecord).

@pgrosu
Copy link
Contributor

pgrosu commented Jul 7, 2014

Thanks @fnothaft for the clarification - now that makes more sense regarding the direction we want to head toward.

@jeromekelleher
Copy link
Contributor

@fnothaft wouldn't it be better to fully qualify the class references in the (surely relatively rare?) cases when there is a namespace conflict, than to force all code to use a prefix? I think the common case should be as easy and intuitive as possible - I certainly found the GA prefixes confusing and unnecessary when I first looked at the API. Perhaps I'm misunderstanding what the common case will be though?

@pd3
Copy link

pd3 commented Jul 8, 2014

@jeromekelleher +1

@heuermh
Copy link
Contributor

heuermh commented Jul 16, 2014

For what it is worth, I am also in favor of this (although I don't have voting authority); see e.g. pull request #94.

@dglazer
Copy link
Member

dglazer commented Jul 16, 2014

+1 -- I think the benefits of a more readable API for everyone outweigh the costs of occasionally, in some programming contexts, needing to do explicit namespacing.

@benedictpaten
Copy link
Contributor

+1 -- agree benefits (far) outweigh the costs. This is not C, we have well
defined namespaces!

On Wed, Jul 16, 2014 at 6:28 PM, David Glazer notifications@github.com
wrote:

+1 -- I think the benefits of a more readable API for everyone outweigh
the costs of occasionally, in some programming contexts, needing to do
explicit namespacing.


Reply to this email directly or view it on GitHub
#87 (comment).

@cassiedoll cassiedoll added this to the vNext milestone Jul 23, 2014
@cassiedoll
Copy link
Member Author

We have no consensus here, so I'm closing this issue.
(We can always revisit it later if we like)

@jeromekelleher
Copy link
Contributor

After discussing this with @fnothaft, @massie and @vadimzalunin it seems we might be a bit closer to consensus on this - perhaps we could have another vote? It would be good to discuss in person in San Diego if we are still divided on the topic.

I'm +1 for removing the prefix.

@richarddurbin
Copy link
Contributor

I would be OK with removing the prefix if that is the consensus. So a cautious +1.

Richard

On 15 Oct 2014, at 01:41, Jerome Kelleher notifications@github.com wrote:

After discussing this with @fnothaft, @massie and @vadimzalunin it seems we might be a bit closer to consensus on this - perhaps we could have another vote? It would be good to discuss in person in San Diego if we are still divided on the topic.

I'm +1 for removing the prefix.


Reply to this email directly or view it on GitHub.

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

@pgrosu
Copy link
Contributor

pgrosu commented Oct 15, 2014

@jeromekelleher - Wait, you convinced @fnothaft and @massie regarding this? :) So if we have multiple libraries with the same object names, then how do we differentiate between them?

@lh3
Copy link
Member

lh3 commented Oct 15, 2014

I mildly prefer to keep the GA prefix when Avro itself does not have a concept of namespace (if I am right). When I read code, the prefix immediately tells me the source of the APIs, which to me is useful. Nonetheless, I don't mind if the consensus is to remove the prefix. -0

@diekhans
Copy link
Contributor

Conceptually, I like removing the prefix. There is a concerned
about how this will impact code generation for languages that
don't support namespaces, such as C.

@mbaudis
Copy link
Member

mbaudis commented Oct 15, 2014

There is some advantage for keeping it in metadata, where the GA will unanimously point to our interpretation of value range in this context (see the lengthy GAsex related exchanges)

I understand that this is different for reads/file formats, where the contributions are meant even more general (GA or not).

Michael

On 15 Oct 2014, at 15:54, Heng Li notifications@github.com wrote:

I mildly prefer to keep the GA prefix when Avro itself does not have a concept of namespace (if I am right). When I read code, the prefix immediately tells me the source of the APIs, which to me is useful. Nonetheless, I don't mind if the consensus is to remove the prefix. -0


Reply to this email directly or view it on GitHub.

@fnothaft
Copy link
Contributor

+1

@pgrosu

@jeromekelleher - Wait, you convinced @fnothaft and @massie regarding this? :) So if we have multiple libraries with the same object names, then how do we differentiate between them?

We've de-prefixed ADAM over the past few months: bigdatagenomics/adam#418, bigdatagenomics/adam#342...

@delagoya
Copy link
Contributor

+1 for removing most prefixes, but I do agree that some of the metadata records can stand to keep it so we stay out of developing and clashing with Ontologies.

@vadimzalunin
Copy link

0.1 and 0.5 proved prefixes less useful the I anticipated before, so changing mine -1 to +0.

@jeromekelleher
Copy link
Contributor

It seems that a lot of people are in favour of removing the GA prefix, and no one is strongly against the idea. The prefix is certainly useful in some situations, but I think that our overriding concern should be for end-users writing client code. In this case, we need to ensure that the API is as easy as possible to learn and understand, and the GA prefix is a major obstacle to this in my view.

@mbaudis and @delagoya - I don't quite follow why metadata is different. Why is org.ga4gh.GASex better than org.ga4gh.Sex (for example)?

@pgrosu
Copy link
Contributor

pgrosu commented Oct 22, 2014

@jeromekelleher, I think it comes down to the idea that if a set of all possible attributes for a class in GA4GH becomes a superset - and its has specific properties in the way it was defined - where the majority of users might only use a subset of the same class. Then the superset sort of becomes its own class.

I think this discussion ( https://github.com/cassiedoll/schemas/commit/51cd6ece5c83fc8076959a3be997190303718170#commitcomment-7663742 ) - as one of several that @mbaudis alluded to - helps to illustrate that. I think with some classes, we're working on first defining the class, attributes and their interpretation (including their relation to other classes), in order to collect all the variations under those definitions, which strives to behave as sets of controlled vocabularies.

@mbaudis
Copy link
Member

mbaudis commented Oct 22, 2014

@pgrosu Well said. While the domain points to the technical definition of the attribute, the prefix makes it more obvious that the contained values may need some consideration.

That said, the "GA" here is of a more psychological nature and technically not necessary if the context is provided through the org.ga4gh domain. I am not against removing it; there always will be trade-offs ... And we still can change the names of the attributes (org.ga4gh.geneticSexAsWeSeeIt) ;-)

@delagoya
Copy link
Contributor

+1
Things that can be misunderstood as "GA4GH is developing their own Ontology" should be prefixed to make it clear that this we are not attempting to do so.

On Oct 22, 2014, at 3:57 AM, Paul Grosu notifications@github.com wrote:

@jeromekelleher, I think it comes down to the idea that if a set of all possible attributes for a class in GA4GH becomes a superset - and its has specific properties in the way it was defined - where the majority of users might only use a subset of the same class. Then the superset sort of becomes its own class.

I think this discussion ( cassiedoll@51cd6ec#commitcomment-7663742 ) - as one of several that @mbaudis alluded to - helps to illustrate that. I think with some classes, we're working on first defining the class, attributes and their interpretation (including their relation to other classes), in order to collect all the variations under those definitions, which strives to behave as sets of controlled vocabularies.


Reply to this email directly or view it on GitHub.

@pgrosu
Copy link
Contributor

pgrosu commented Oct 22, 2014

+1, @mbaudis, Thank you and I agree :)

+1, @delagoya, I think we should be fine as a long as we mention that the focus of GA4GH is to enable a standard for sharing sequencing and associated clinical information (meta)data, which requires accommodating a large number of possibilities.

@jeromekelleher
Copy link
Contributor

@mbaudis, @delagoya, OK, this seems sensible. Perhaps we should make a PR without all the GAs and see how things look? We can add some back if it looks like we are trying to be too all-encompassing with the definitions.

This would be a pretty disruptive change, so perhaps we should wait until some of the outstanding PRs are merged?

@cassiedoll
Copy link
Member Author

Let's definitely make a PR - it looks like everybody is now on board. I think we're always going to have conflicting PRs, so we don't have to really wait. Whoever has free time can pick this up :)

@adamnovak
Copy link
Contributor

Closing this since a fix has been merged.

dcolligan pushed a commit to dcolligan/ga4gh-schemas that referenced this issue Jul 20, 2016
Removed duplicate test, changed behavior of another to look for empty list in response.
dcolligan pushed a commit to dcolligan/ga4gh-schemas that referenced this issue Jul 20, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests