#9? Put big data where and how you can compute on it rather than moving complete raw datasets around. #39
Comments
Can I assign this paragraph to you, @dlebauer? I think it's a good issue. |
I might also frame this as 'how important is the movement of data'. This is kind of touched on in #32, but this issue needs more detail. This might also be another place to talk about data access via an API. |
I'll take it. (In reply to Edmund Hart, Fri, Apr 10, 2015 at 12:01 AM) |
Wondering if we can come up with a pithy title for this rule that doesn't include the "size matters" stuff. Some suggestions:
Well, those aren't very good, but ... |
I agree that "size matters" isn't the best title, but the section goes beyond just moving data, so I think the first and third suggestions are too narrow in scope. I'll think about it. |
How about "data volume has consequences"? The people who worked on the Sloan Digital Sky Survey used to say that (at least, as of 10 years ago) there wasn't much that could beat the bandwidth of a FedEx truck filled with tapes or hard drives. I always find that amusing, and I wonder if we can work it in somehow. |
When I was a student we found that the bandwidth of sending a CD through the post was very favourable; these days USB sticks can hold huge volumes of data, but it's an interesting calculation. |
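For anyone who wants to run that calculation, here is a rough sketch in Python. The numbers (a 1 TB drive, 24 hours in the post, a 100 Mbit/s link) are made-up assumptions for illustration, not measurements from this thread, and the function names are hypothetical.

```python
def effective_bandwidth_mbps(payload_bytes: float, transit_seconds: float) -> float:
    """Average throughput, in megabits per second, of physically shipping a payload."""
    if transit_seconds <= 0:
        raise ValueError("transit_seconds must be positive")
    return payload_bytes * 8 / transit_seconds / 1e6


def transfer_hours(payload_bytes: float, link_mbps: float) -> float:
    """Hours needed to push the same payload over a network link of link_mbps."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return payload_bytes * 8 / (link_mbps * 1e6) / 3600


# Assumed scenario: a 1 TB (1e12 bytes) drive that takes 24 hours to arrive.
posted = effective_bandwidth_mbps(1e12, 24 * 3600)
# Assumed alternative: a sustained 100 Mbit/s network link.
network = transfer_hours(1e12, 100)

print(f"Posted drive: about {posted:.1f} Mbit/s on average")
print(f"100 Mbit/s link: about {network:.1f} hours for the same data")
```

Under these assumptions the two options come out roughly even, which is the point of the anecdote: once volumes reach terabytes, the post is competitive with many network links.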
@PBarmby I like that one and like the FedEx truck/post analogies. Maybe even bring up "sneakernet" |
Are there any citations we could use for the sneakernet or FedEx stories?
|
"station wagon full of tapes" is Tanenbaum according to Wikipedia: https://en.wikipedia.org/wiki/Sneakernet#Non-fiction I've probably read that text, but I can't verify the quotation now. |
And the authoritative source, xkcd! There are also some refs in the reference section of the Wikipedia article that @drj11 linked. |
@dlebauer do you need help with this one? I can start on it if need be. |
@PBarmby sorry, I forgot to flesh this one out. It would be great if you want to work on this; otherwise I can work on it next week. |
Covers proximity / mounting and architecture. Don't hog fast, active storage if slower storage is sufficient: scan for untouched files and move them to longer-term storage.
Don't move data around if you can help it. If you must, use appropriate tools. Store local 'cached' copies (e.g. via a knitr argument) instead of writing scripts that always download archived data. Download again only if the data have changed.
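A minimal sketch of the caching idea, in Python rather than knitr. The helper names (`cached_fetch`, `sha256_of`) and the optional checksum check are assumptions made for illustration; a real workflow might use ETags, timestamps, or a dedicated data-management tool instead.

```python
import hashlib
import shutil
from pathlib import Path
from urllib.request import urlopen


def _download(url: str, dest: Path) -> None:
    """Stream the remote file to disk without holding it all in memory."""
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files are handled safely."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_fetch(url, cache_path, expected_sha256=None, download=_download) -> Path:
    """Return a local copy of url, downloading only when it is missing or stale.

    expected_sha256 is a hypothetical, optional checksum published by the
    archive; if given and the local copy does not match, it is re-downloaded.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        if expected_sha256 is None or sha256_of(cache_path) == expected_sha256:
            return cache_path  # local copy is good: no data moved
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    download(url, cache_path)
    return cache_path
```

An analysis script would then call `cached_fetch(url, "data/raw/file.csv")` at the top instead of downloading unconditionally, so repeated runs reuse the local copy.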
Do subsetting server-side and compute in the database (e.g. dplyr lazy evaluation) where possible.
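As a rough illustration of computing in the database, here is a Python sketch using the standard-library `sqlite3` module in place of dplyr. The `measurements` table, its columns, and the `mean_by_site` helper are hypothetical; the point is only that the filter and aggregate run inside the database, so just the small summary is transferred.

```python
import sqlite3


def mean_by_site(conn: sqlite3.Connection, min_year: int):
    """Filter and aggregate inside the database; only the summary rows come back."""
    query = """
        SELECT site, AVG(value)
        FROM measurements
        WHERE year >= ?
        GROUP BY site
        ORDER BY site
    """
    return conn.execute(query, (min_year,)).fetchall()


# Hypothetical toy data standing in for a large remote table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (site TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("a", 2013, 1.0),
        ("a", 2014, 3.0),
        ("a", 2015, 5.0),
        ("b", 2014, 10.0),
        ("b", 2015, 20.0),
    ],
)

summary = mean_by_site(conn, 2014)
print(summary)
```

The alternative, `SELECT *` followed by filtering in the analysis script, would move every row across the connection before discarding most of them.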