Testing the schema with data on Mesa at Google #131
Paul, Mesa is closed source.
Max

On Sun, Aug 24, 2014 at 1:20 PM, Paul Grosu notifications@github.com

Maxim Mikheev M.D. Ph.D.
www.BioDatomics.com
Hi Max,

I understand, and thank you for the alternatives, but I'm not sure that should preclude us, especially if we gain other benefits. Note that BigQuery is also closed source, yet we have an implementation of it for Google Genomics here: https://github.com/googlegenomics/bigquery-examples Thus, using a platform's capabilities through a service does not require access to its source code.

All I am saying is that we have a better platform already in operation and ready to go for large-scale storage and analysis. There seem to be advantages to such a closed platform, whose components are implemented in C/C++ rather than Java and which benefits from other optimizations (e.g. Colossus). Think of Google Caffeine, which is based on Percolator and replaced MapReduce-based processing because it allowed more efficient processing for Google's indexing system. In fact, Google Pregel can be very helpful for the variant analysis step where for…

Since we are all working together as a team, everyone brings their own expertise, which is what makes this project great. At least for me, the source code of the system we use is not that critical. If the service can accept and correctly process a schema for updating the keys or data for storage and processing, then I see no downside. If a new platform exists with added benefits over the current implementations, and it does not impact production negatively, then I say let's try it out. Otherwise we keep tweaking the schema because of the limitations of older technologies, which might restrict some of the analysis possibilities down the line.

Paul
@pgrosu - this kind of thing probably isn't a good fit for ga4gh. As Max said, there are plenty of open source alternatives for this use case if we decide we need this kind of solution. In this repo (ga4gh/schemas), though, we don't actually need any large-scale backend, since this is just an API definition and not an implementation. Because of that, I'm closing this issue.
@cassiedoll - I understand, no problem :)
Hi Cassie, David, et al,
I just read the paper on Google's Mesa, available at the following links:
http://research.google.com/pubs/pub42851.html
http://research.google.com/pubs/archive/42851.pdf
I noticed it has several advantages, such as online schema changes and queries that apply functions over sets of values, which can run in near real time. Another advantage is that it provides petascale data warehousing with ACID properties for transactions. Applying functions over sets could be especially useful for the many-to-many relationships we have in our schema.
Mesa seems to have some advantages over Megastore, Spanner, and F1 that we could try to leverage.
I was wondering whether we could test the schema with data in a development area of Mesa.
Thank you,
Paul