Repo size issue: consider using git submodules #1554
Please see #1050, where we go into the repo size in detail. Here are some summary points:
This adds up to an annoyingly inescapable conclusion: rewriting the history for this project won't reduce its size, and neither will submodules. The only thing that will reduce the size is moving to a clean repo that does not carry this project's long history. Going to close this as addressed, but happy to continue chatting if you disagree or have other ideas.
Thanks for your answer.
One more thing (since I've also just PR-ed something)... You state "this repo should never contain source code for any frameworks". I've read your contribution guidelines a few times and I completely missed that point (it might just be me, of course).
What he meant there is that a given test should only contain the source code required to spin up that implementation, but NOT the framework's source code.
Ouch! Ok yes... thanks @msmith-techempower |
Sure thing. I think we had examples of issues like CakePHP (which is quite large) being included entirely in the test's directory instead of being treated as a requirement of the test itself. Today, we have a system where dependencies can be specified, downloaded, and built in an environment automatically, costing only a few KB in the repo, so that is preferred.
Yeah totally... I didn't even think of adding framework code, that's why I didn't get it at all :) |
Ah yeah, poor clarity on our part. For what it's worth, one "hack" is to use the
How much easier and more pleasant it would be to maintain framework benchmarks if each lived in its own repo. Just imagine: you develop your benchmark as a normal app in its own repo, using Travis and some kind of microservice provided by TechEmpower to validate that the app works as expected. No need to install Ruby, Vagrant, or VirtualBox, or to figure out why VirtualBox 4.10 doesn't mount directories, etc. That's it. Then TechEmpower's main script could clone the repos and run the apps to measure whatever you want. But one repo for everything? That's terrible whether framework sources are included or not.
That's actually exactly what our main script does 😄. I recommend you read the thread above this point to get some clarifications on a few points you raised, and invite you to comment on #1050 |
@hamiltont Here is a random PR that adds Goji. An app that could be tested in a minute takes 39 minutes. Is it really a sensible solution to rerun the tests of 129 apps when only 1 is affected by a change? And every test installs all available databases just in case some of them will be used (though only one is required per test). Is my understanding correct?
It's the best option we have, but feel free to prove me wrong with a PR!

Running ~129 instead of 1 is a Travis-CI limitation, and one that we have discussed with Travis many times (both via GitHub and telephone conferences). They are being incredibly generous with their computing time, so we're in no position to complain. There are a bunch of long threads in our issues (filter by travis) where we discuss how we integrate with Travis and potential improvements. FYI, once we enabled Travis, pull request merge time went from months to days, so while it may seem slow, it has been a massive improvement up to now. If Travis ever enables the use of sudo in their Docker-based execution environment, you can expect another order-of-magnitude improvement in build execution times, as we won't have to wait for the unneeded VMs to launch.

While it is technically possible for someone to use the Travis requests API and submit each job independently, that is 1) a pretty decent undertaking, and no one has stepped up, and 2) not really the way the requests API is intended to be used; it would likely break a lot of nice things like the badges on GitHub, GitHub linking to Travis builds, and perhaps even the history page on travis-ci.

You are correct that we install all databases for each test; see here for a location that could be updated, and a PR would be great! It might save up to ~45 seconds per test that actually needs to run, so in the worst case (which currently consumes ~21 hours) of needing to test every framework, that would recover 45s × 125 ≈ 90 minutes.
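As an aside, the "only rerun what changed" idea can be approximated even inside a single repo by diffing the pushed commits against the base branch and keeping just the touched framework directories. A minimal sketch, assuming a hypothetical `frameworks/<language>/<framework>` layout (an illustration, not necessarily the repo's actual structure or script):

```shell
#!/bin/sh
# Print the framework directories touched between two commits, so a CI
# run could test only those. The frameworks/<language>/<framework>
# layout is assumed for illustration.
changed_frameworks() {
  base="$1"; head="$2"
  # --name-only lists the changed file paths; cut keeps the first three
  # path components (frameworks/<language>/<framework>).
  git diff --name-only "$base" "$head" -- frameworks/ \
    | cut -d/ -f1-3 \
    | sort -u
}
```

A build step could then call something like `changed_frameworks origin/master HEAD` and loop over the output; a change that only touches the README would produce no output and the suite could be skipped entirely.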
@hamiltont I'll try to illustrate what I meant in my first message. Very likely I'm missing something, but I'm really curious whether the following approach could work for your project:
```yaml
install:
  # The script below installs PostgreSQL and makes all necessary
  # data available as environment variables.
  - curl https://techempower.github.io/scripts/PREPARE_POSTGRES.sh | sh
  # Now $PGQL_NAME, $PGQL_PASS, $PGQL_PORT, etc. environment
  # variables are available and may be used by the app.
  - ./this/app's/BUILD.sh # Script that describes the building process of the app.
  - ./this/app's/START_DAEMON.sh # Script for running the app.
  # The line below tests the app and at the end kills it.
  - curl https://techempower.github.io/scripts/START_TESTS.sh | sh
```
This approach looks the most natural to me. There's no need to do anything unusual with the Travis API, and it's possible to test only the things that changed. What is the motivation for using a single repo for all apps instead, and fighting the Travis limitations that come with it?
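For what it's worth, the shared `START_TESTS.sh` step in a scheme like this could be as small as a readiness poll plus a smoke request. A minimal sketch; the function name and the port/path/retries contract are invented for illustration, not an actual TechEmpower interface:

```shell
#!/bin/sh
# Poll the app until it answers, then fail the build on a bad response.
# The port, path, and retry count are assumptions about the contract
# between START_DAEMON.sh and this script.
wait_and_check() {
  port="$1"; path="$2"; tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # -f makes curl exit non-zero on HTTP errors (4xx/5xx).
    if curl -fsS "http://localhost:${port}${path}" >/dev/null 2>&1; then
      echo "app responded OK on port $port"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "app never came up on port $port" >&2
  return 1
}
```

The real script would presumably also validate the response body and kill the daemon afterwards; the point is that the per-app repo only needs to expose a build script and a start script.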
@alkchr This is a really interesting idea. Previously the TFB repo was pretty unorganized, so this would have been impossible, but now it's more realistic. I would probably choose to have one. Here are my thoughts:
My main concern is the core toolset issue. We've consistently had to make changes that end up affecting frameworks in unexpected ways, so automated testing of the entire set of frameworks is critical. Input welcome... @msmith-techempower @LadyMozzarella, thoughts?

PS - I'm well aware that our Travis-CI setup today is having serious issues testing anything. There are bugs due to many small changes (we depend on tens of external projects just to get Travis-CI up and running for a basic test), and no one seems to have had the time to put in a PR.
Oh, @bhauer too. I forgot who to tag - it's been a while ;-) |
This is a long thread that I honestly ignored at first, since submodules have been annoying to me in the past, so please excuse me for trying to summarize and probably getting it wrong. It sounds like what is being suggested here is similar to something we did internally with our Gemini framework recently: update the current project to be

I am definitely not the guy to ping on this matter, and probably Brian is going to balk at it as well. I think I would love to get the input of... crap. I thought Michael Hixson had a GitHub account on here, but either he does not or his handle is different than I am expecting. Regardless, I shall simply send him an email directly.
What we did with Gemini involved Maven modules, not Git submodules. (They are sort of related, in that we used Git submodules in Gemini-based projects to point at Gemini. We had a bad time with Git submodules. Then they became totally unnecessary once we Maven'd it up and started hosting actual versioned releases of Gemini internally.) For the framework benchmarks, Git submodules might be a good idea. It makes sense. How much Git history would we lose? Any?
I'm thinking the frameworks part of your repo could contain 'just' submodules, one per language, so that people could clone/push only the part that includes the language they need instead of the whole thing (which is now almost 180 MB).
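To make the per-language idea concrete, here is a runnable local sketch of that layout (all repo names and paths are hypothetical; TFB is a single repo today). A contributor initializes only the submodule they care about, so their clone never downloads the other languages' objects:

```shell
#!/bin/sh
set -e
# Local demo of a per-language split stitched together with submodules.
# All repo names and paths here are hypothetical.
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for a per-language repo (e.g. a Go-only benchmarks repo).
git init -q go-repo
( cd go-repo \
  && git config user.email demo@example.com \
  && git config user.name demo \
  && echo goji > framework.txt \
  && git add -A && git commit -qm "Go frameworks" )

# The umbrella repo references it as a submodule under frameworks/Go.
git init -q umbrella
cd umbrella
git config user.email demo@example.com
git config user.name demo
echo TFB > README.md
git add README.md && git commit -qm init
# protocol.file.allow is only needed because this demo uses local paths.
git -c protocol.file.allow=always submodule --quiet add "$tmp/go-repo" frameworks/Go
git commit -qm "add Go submodule"

# A fresh clone carries only a pointer (gitlink) per language until the
# contributor explicitly initializes the submodule they need.
cd "$tmp"
git clone -q umbrella contributor-clone
cd contributor-clone
git -c protocol.file.allow=always submodule --quiet update --init frameworks/Go
cat frameworks/Go/framework.txt
```

The umbrella repo itself stays tiny because each submodule entry is a single commit pointer, not the language's history; the trade-off, as noted above, is that submodules add workflow friction for contributors.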