ideas for best practices with binary files or large files like images #35
Replies: 6 comments 30 replies
@Tom-van-Woudenberg did I capture it all?
This is a difficult one. I think the main problem can be postponed/mostly avoided with some good practices. However, it does require incorporating all these "good practices" in everyone's way of working. These will probably get you most of the way:
And for reducing the Git history: do any development on the book in branches and use PRs with squash merges. This will reduce how many commits (and probably also how many changes to binary files) end up in the history.
Not yet 😅
@ibcmrocha, you'll like this ;)
@BSchilperoort, @moorepants, @rlanzafame, I had some very nice talks with the DCC (digital competency center) people about this today. I'll forward the info they send me once I have it, but the main conclusions:
Other insights:
@timonidema, opinions from your side here?
Tom and I have been brainstorming how to set up our MUDE book better now that we are moving it from TUD GitLab to GitHub. It is a chance to reverse some bad practices, like dumping huge binary assets such as image files in the repo, and to find a better way to store large files, like a zip containing a notebook, CSV or other data for an assignment (e.g., downloaded from a page using the download link replacer).
There are two main reasons for addressing this:
For reference, the MUDE book build is currently around 350 MB (unzipped) and includes about 169 MB of figures, 84 MB of ipynb files (including cell outputs), 27 MB of HTML and 2 MB of md files (yes, I know this doesn't add up, but you get the idea... the rest is probably files like wheels for the thebe/pyodide feature, py files, etc.).
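For anyone who wants a similar breakdown for their own book, here is a minimal sketch that totals file sizes per extension under a directory. The function names and the example path are just placeholders for illustration, not how the numbers above were produced.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(root):
    """Return total bytes per file extension under root, largest first."""
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are lower-cased so .PNG and .png are counted together;
            # files without a suffix are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            totals[ext] += path.stat().st_size
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def report(root):
    """Print one line per extension, sizes in MB (1 MB = 1e6 bytes)."""
    for ext, nbytes in size_by_extension(root).items():
        print(f"{ext:>10}  {nbytes / 1e6:8.1f} MB")
```

Calling something like `report("book/_build/html")` (the path is an assumption, use wherever your build ends up) gives a quick view of which file types dominate.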
Some obvious things to consider are:
While these things are relatively straightforward, they also present a burden to newcomers to book-building, as they are already trying to learn about Git, Python, Sphinx, Jupyter Book, etc, etc, etc. Item no. 3 has the added disadvantage of making build time for a big book like MUDE very long.
So far this is what we have considered:
Git Large File Storage: this could work nicely if you don't plan on using the images in more than one place (e.g., book 2 uses an image from book 1, then book 1 changes it without telling book 2).
Using a second GitHub repo adds an extra step, but is easy to do. One thing to consider is that there are limits to how often an image can be requested, see here... I don't understand it all, but it seems like once you have a few hundred students working on the same set of files at once, you might have issues, and this is definitely something we have in MUDE.
jsDelivr seems like a totally free and reliable way to serve "big" files (less than 20 MB) that are already in a GitHub repo, and would prevent GitHub from getting angry if our websites start to get too many requests (see here... I don't understand it all, but I worry that 300 students loading images from the same repo page will eventually lead to issues). So jsDelivr is cool because you can use a jsDelivr URL for files instead of a GitHub one, with predictable URLs. I found this page and set up an example, it's insanely easy:
So the idea here would be that this is a good solution for images that don't change (and that we don't want to track in Git), but jsDelivr versioning and updating may also mean changes are slow to show up.
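To make the "predictable URLs" point concrete, here is a rough sketch of how a jsDelivr URL for a file in a public GitHub repo can be built, following jsDelivr's `https://cdn.jsdelivr.net/gh/user/repo@version/file` pattern. The helper names and the user/repo values in the usage note are made up for illustration. The raw-URL conversion assumes the branch or tag name contains no slashes.

```python
from urllib.parse import quote, unquote, urlsplit


def jsdelivr_url(user, repo, path, ref=None):
    """Build a jsDelivr URL for a file in a public GitHub repo.

    ref is an optional branch, tag or commit hash; without it the
    "@ref" part is simply left out of the URL.
    """
    repo_part = f"{repo}@{ref}" if ref else repo
    # quote() keeps "/" as-is and percent-encodes spaces and similar characters.
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo_part}/{quote(path.lstrip('/'))}"


def from_raw_github(raw_url):
    """Convert a raw.githubusercontent.com URL to the jsDelivr equivalent.

    Assumes the form /user/repo/ref/path with no slashes inside ref.
    """
    parts = urlsplit(raw_url)
    if parts.netloc != "raw.githubusercontent.com":
        raise ValueError("not a raw.githubusercontent.com URL")
    user, repo, ref, path = parts.path.lstrip("/").split("/", 3)
    # Decode first so an already-encoded path is not encoded twice.
    return jsdelivr_url(user, repo, unquote(path), ref)
```

For example, `jsdelivr_url("some-user", "some-repo", "figures/plot.png", "v1.0.0")` gives a URL pinned to that tag. Pinning `ref` to a tag or commit hash is the usual way to get a URL whose content stays fixed, which fits the "images that don't change" use case.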
In WWJD fashion we know that Jason uses DreamHost DreamObjects to store assets, see README for policy. We are hesitant to adopt this because it is another platform to manage, and for the case where we want to create shared edu resources that are easy to find and reuse, it's more difficult to collaborate. GitHub may be a better option because with PRs, contributions can be done the same way as contributing to a book. And a tool like jsDelivr solves the problem that GitHub is not intended to be a CDN (even though for our edu purposes it would be uncommon to exceed the request limits).
So at the moment we are thinking of doing this: