Design and create a test suite #55
Hi, I have been following this project for some time since its release. Great work.

But with support for several Git features coming in, are there any plans or ideas for testing these features? I would like to work on this.

For Manim-based tests, we can look at how ManimCE tests its own code: https://docs.manim.community/en/stable/contributing/testing.html and https://github.com/ManimCommunity/manim/tree/main/tests

Hi @initialcommit-io, is there any test design you have in mind?

Some of my ideas:

Once done, we can add a GitHub Actions pipeline to ensure continuous integration.

I am working on this; let me know if there are any existing plans for it.

Once again, thank you for this wonderful project.

Comments
I was about to create an issue to track this @abhijithnraj. This is a good point to start integrating tests NOW. @initialcommit-io even I was skeptical while adding support for the
I didn't know how to test the images that will be generated but this looks promising.
Exactly. This will ensure great test coverage and using tools like

Python version is already an issue with the new |
@abhijithnraj Thank you for creating this issue and for your proposed contributions! Yes, it's clear we need to start on this now, as @abhijitnathwani brought up some breaking changes across Python versions that we had not been testing for. For context - I didn't have any tests going into the project because I wanted to gauge user interest before spending too much time on it, but now that we know the project has a solid user base, we need to prioritize a robust test suite for the reasons you both mentioned. @abhijithnraj I don't have any specific test design in mind, but I agree with all of your points, and good find on the Manim test option. Maybe a good place to start is if you want to put together a PR with a simplified version of an initial test suite that covers a single aspect of the codebase - probably

Then we can all discuss the PR and see if it will work for the whole codebase/program. |
@abhijitnathwani @abhijithnraj FYI I meant to mention - I will be working on a small tool that can generate dummy Git repos with specific numbers of commits / branch structures / etc. I am thinking we can use this in functional tests to create the structure we need for testing certain commands, like merge/rebase, etc. |
@initialcommit-io Yes, that can help with generating test data. |
@abhijithnraj Do you have a timeline for an initial version of the test suite that provides coverage for the |
@abhijithnraj @abhijitnathwani @paketb0te Here is the first version of a tool I created to generate dummy Git repos: https://github.com/initialcommit-com/git-dummy I'm thinking we can use this in our functional tests. |
@initialcommit-io I am studying the graphical test cases for Manim. I think I can raise the first PR in at most 2 days. We will add more test cases once we finalize the framework in the PR. Meanwhile I will check out git-dummy. Currently, for testing, I wrote a simple script to generate some commits. |
@abhijithnraj Sounds great. FYI I added a |
@abhijithnraj @abhijitnathwani @paketb0te Hi again - last update for a bit on git-dummy: I added flags

I also added

It's not much, but I think at this point git-dummy has enough so that we can use it for basic functional test coverage, to set up the automated scenarios/repos we need for the tests. I am also planning to add hooks for git-dummy into Git-Sim, with something like a |
@abhijithnraj Hi again! Just wanted to check in and see if this is still something you were working on? |
Hi, Still working on it. |
Hi @abhijithnraj @abhijitnathwani @paketb0te ! Hope you all are doing great =) Just a note I released a significant refactor in git-sim v0.2.6 which overhauls the way it parses the commit history and adds several important new options/flags. Git-Sim now properly traces parent/child relationships, which means it can display more accurate and complex graph structures, with multiple branches and the relationships between them. See the release notes for more details: https://github.com/initialcommit-com/git-sim/releases/tag/v0.2.6 I figured I'd mention it here since it might affect the test suite, depending on how it was being written. Let me know if you get the chance to try it out. It's likely there are some bugs since we still don't have the test suite set up yet, but I really wanted to get this released since I feel it's an important milestone for the project, will make the tool much more useful for users, and will make future development smoother. |
Hey @initialcommit-io |
Hi @abhijithnraj, @abhijitnathwani. This is such a cool project! Do either of you have the work you've done public at all? I saw your forks, but don't see any additional tests. I would like to jump in, but don't want to step on your toes if you're about to share some progress. |
Hey @ehmatthes |
Sure, here's a quick take on how I'd start this. Because this is still an evolving project, I'd focus on end-to-end tests and not worry too much about unit tests. I think I can write a demo test suite this week that would do the following:
pytest handles this kind of setup really nicely. If everything passes, you never even see the temp dir. If the tests fail, you can drop into that temp dir and see what happened. In that temp dir you can then run the command manually, and do a number of other troubleshooting actions. If that sounds good to you all, I should be able to share a demo by the end of the week. |
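As a rough illustration of the setup described above, a pytest fixture along these lines could build a throwaway repo in a temp dir (the fixture name tmp_repo and the exact git commands are assumptions for the sketch, not the project's actual code):

import subprocess

import pytest


@pytest.fixture(scope="session")
def tmp_repo(tmp_path_factory):
    """Create a small throwaway Git repo in a pytest-managed temp dir."""
    repo_dir = tmp_path_factory.mktemp("sample_repo")

    def git(*args):
        # Run a git command inside the temp repo and fail loudly if it errors.
        subprocess.run(["git", *args], cwd=repo_dir, check=True, capture_output=True)

    git("init")
    git("config", "user.email", "tester@example.com")  # placeholder identity
    git("config", "user.name", "Tester")
    for i in range(3):
        # Empty commits keep the setup simple; no files need to be written.
        git("commit", "--allow-empty", "-m", f"Commit {i}")

    return repo_dir

If a test fails, the repo is left behind under pytest's temp directory, so you can cd into it and rerun git-sim by hand, which matches the troubleshooting workflow described above.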
@ehmatthes This sounds great. I completely agree that end-to-end tests make more sense for the project at this point, to provide baseline insurance that code changes won't break functionality. I think the tests should catch 2 types of errors caused by code changes (let me know if you have other ideas):
The nice thing about comparing to a reference image is that it is comprehensive - the image captures all elements drawn by git-sim, including commits, arrows, text, etc. So any unintended change to those would cause a test failure. The other benefit is that the default git-sim output is an image file, and this corresponds to the final frame of git-sim video output if the user uses the

One downside of comparing images might be performance, so I'm curious how that turns out. Another consideration is that reference files will need to be updated/maintained as functionality changes over time. For setting up the Git repos to use in each test case, I recommend (again :D) using git-dummy for reproducibility and flexibility. Certain test cases require a Git repo to be in a certain state, and the ability to generate a consistent repo to meet those criteria is why I created git-dummy. Ideally the dummy repos would just be created/cleaned up as part of the test cases. If you run into a scenario that git-dummy can't currently generate a repo for, let me know and I can add that functionality into git-dummy. Let us know if you have any other thoughts! Looking forward to seeing what you put together as a demo 😄 |
Yes, this matches my thinking as well.
I have done this in another project, and performance was not a significant issue. For end-to-end tests you have to generate the images that the project depends on. Once they're generated, the comparison itself does not take long. One really nice thing about this approach is that it makes it really straightforward to update the reference files. When you run a test, you can pass a flag to stop after the first failed test. You can then drop into the test's temp dir and look at the generated image that doesn't match the reference file. If this image is correct, you can simply copy it into the reference file folder, and the test will pass. You don't have to generate the reference file in a separate process; running the tests and failing becomes a way to keep the test suite up to date. The test suite becomes much more than just a pass/fail check; it shows you exactly what the current code would do for an end user.
Does git-dummy generate the same hashes each time it's run? If not, that seems like an issue for testing. |
Great point. The hashes would screw that up, since they depend on the timestamp of the commit, which would be regenerated each time git-dummy is executed. Git-Sim actually has a global flag called

Another option is that I can create a new global option for Git-Sim called

Alternatively, we might be able to update git-dummy to use hardcoded, expected timestamps via environment variables.

Do you have any thoughts/preferences? |
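For what it's worth, Git itself lets you pin commit metadata through environment variables, so deterministic hashes are possible without changing git-sim at all - a sketch, with arbitrary example values:

import os
import subprocess

# Fixing author/committer identity and timestamps makes the commit hashes
# depend only on the repo contents, parents, and messages, so they are reproducible.
FIXED_GIT_ENV = {
    **os.environ,
    "GIT_AUTHOR_NAME": "git-sim tests",
    "GIT_AUTHOR_EMAIL": "tests@example.com",
    "GIT_AUTHOR_DATE": "2023-01-01 00:00:00 +0000",
    "GIT_COMMITTER_NAME": "git-sim tests",
    "GIT_COMMITTER_EMAIL": "tests@example.com",
    "GIT_COMMITTER_DATE": "2023-01-01 00:00:00 +0000",
}


def make_commit(repo_dir, message):
    """Create an empty commit whose hash is fully determined by its inputs."""
    subprocess.run(
        ["git", "commit", "--allow-empty", "-m", message],
        cwd=repo_dir,
        env=FIXED_GIT_ENV,
        check=True,
        capture_output=True,
    )

The same environment could presumably be passed through to git-dummy if it shells out to git under the hood.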
I don't think any of this is an issue. I think it's possible to create a single git repo to use in all tests. There will be new git-sim commands run, but not new git commands run. So git-dummy might help to build the initial test suite, but it doesn't need to be used on each test run. |
Hmm, we could do that, but using a single repo to cover all test cases could get increasingly complex as more and more test cases need to be covered. To eventually get full scenario coverage we may need to jump around in the Git repo to do stuff like checkouts/switches/merges/rebases from different starting states. There are also certain scenarios captured by git-sim logic that are based on other factors, like the number of commits in the repo / on the active branch. For example, in a repo with less than 5 commits, there is special logic to handle the display. My thinking with git-dummy is that it would be used to create a tailor-made repo for each test case with the desired number of commits, branches, merges, HEAD location, etc, and then just clean it up after each test run. This guarantees a minimally complex repo that suits the specific goals of each test case. Of course it does add time since git-dummy would need to repeatedly create/clean up the reference repos, but it performs pretty fast and has the benefit that the git-dummy command for each test case could be hard coded into the test cases, instead of having to ship a test repo around with the codebase. Maybe there is a happy medium where we can use git-dummy to create a handful of test repos that are each suited to subsets of test cases, instead of generating and cleaning up a new repo for each test case... Thoughts? |
Is there any way to prevent the image from opening automatically when calling something like I tried |
Never mind, I just found it: |
Yes! Use the You can use in conjunction with |
Okay, I think the test for |
Awesome - I actually switched to PC last year, but I do have a Mac as well that I can try it on... Feel free to submit a PR to the dev branch and I'll take a look. |
I did not run into any file permission issues during development, and I don't see any now. I just modified test_log():

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    ...
    print("fp_generated:", fp_generated)
    assert compare_images(fp_generated, fp_reference)

Here's the output of running just this test on macOS:

(.venv)$ pytest -s -k "test_log"
...
fp_generated: /private/var/folders/gk/y2n2jsfj23g864pdr38rv4ch0000gn/T/pytest-of-eric/pytest-414/sample_repo0/sample_repo/git-sim_media/sample_repo/images/git-sim-log_07-04-23_15-33-49.png

And here's the same output on my Windows VM:

(.venv)> pytest -s -k "test_log"
...
fp_generated: C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-07-22.png

I get similar output in a Git Bash shell on my Windows VM. What output do you get for the path to the generated image file? The path to the generated file is being read from the output of the

> cd C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0
> git-sim log
Output image location: C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-11-38.jpg
> git-sim -d --output-only-path --img-format=png log
C:\Users\eric\AppData\Local\Temp\pytest-of-eric\pytest-71\sample_repo0\git-sim_media\sample_repo0\images\git-sim-log_07-04-23_16-16-55.png

The first command here shows that

I would be curious to know if git-sim is generating different output in your environment, or if the test code is not parsing the output properly in your environment. If

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    ...
    print("output:", output.stdout.decode().strip())
    print("fp_generated:", fp_generated)
    assert compare_images(fp_generated, fp_reference)

|
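For anyone following along who hasn't seen the test code: get_cmd_parts() isn't shown in this thread, but based on the commands above it presumably does something like the following (this is an assumption about its behavior, not the actual helper):

import shlex


def get_cmd_parts(raw_cmd):
    """Split a raw command like "git-sim log" into the list subprocess.run()
    expects, inserting the global flags used by the tests (assumed behavior)."""
    parts = shlex.split(raw_cmd)
    # -d: don't auto-open the image; --output-only-path: print only the path;
    # --img-format=png: match the PNG reference files.
    global_flags = ["-d", "--output-only-path", "--img-format=png"]
    return [parts[0], *global_flags, *parts[1:]]

With that, get_cmd_parts("git-sim log") yields ["git-sim", "-d", "--output-only-path", "--img-format=png", "log"], and the test reads the generated path straight from stdout.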
Sorry about that. It looks like the changes I was making to git-sim and trying to test caused an error that led to the output not being generated at all, hence the confusion. After tweaking my change, the tests now run, but still fail due to a bad ratio diff. I did confirm it is using the Windows reference files to compare, but it looks like the fonts used differ between my output and the Windows reference files. I think this happened to you at one point as well, but we thought it was across OS's. This font difference pushes the ratio_diff up to 5%. Attaching images of the git-sim log Windows reference file and my output showing the differing fonts (see below). On my system it uses Monospace Bold, which is the font that git-sim is configured to use for all output. I assume the reference files were generated on a system where Monospace was not available as a system font (or maybe just not in the correct bold weight), so it used the default system font as a fallback. This could get hairy as devs try to test across different systems... I wonder if we could get around this by adding a global |
I think the most reliable fix for this would be to bundle a specific font with the test suite. I added a .ttf file to the test suite, and modified

To experiment with different fonts in the

class GitSimBaseCommand(m.MovingCameraScene):
    def __init__(self):
        super().__init__()
        self.init_repo()
        self.fontColor = m.BLACK if settings.light_mode else m.WHITE
        self.font = "Monospace"
        self.drawnCommits = {}
        ...

With this change, all tests still pass. If I set

To use a bundled font, I had to make the following change to

def construct(self):
    with m.register_font("/Users/eric/projects/git-sim/tests/e2e_tests/ProggyClean.ttf"):
        self.font = "ProggyCleanTT"
        if not settings.stdout and not settings.output_only_path and not settings.quiet:
            print(
                f"{settings.INFO_STRING} {type(self).__name__.lower()}{' --all' if self.all_subcommand else ''}{' -n ' + str(self.n) if self.n_subcommand else ''}"
            )
        self.show_intro()
        ...

This made fairly ugly output, but I believe the output would be identical on all systems:

I don't think I'd bother using a tiny font like this. I was experimenting with a tiny font, and that's one that showed up in a quick search. Also, I'm guessing you wouldn't have to add a context manager to every command's |
Thanks for all the deets! Yes, using Courier New in

The quickest and least invasive solution for now seems to be adding a global flag for the user to specify the font, and then hardcoding that as Courier New from the tests' raw command. This will have the added benefit of letting the user customize the font. I will try this out and update here.
And oh yes - this would be great! |
Ok - this is implemented on the dev branch. I added a

Wanna try it out and lemme know if it works for you? |
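For illustration, the tests would then pin the font in the raw command, roughly like this - the flag spelling here is an assumption, so check the actual git-sim CLI help for the real option name:

# Hypothetical: hardcode the font in the command under test so that generated
# images match the reference files regardless of each system's default fonts.
# The exact flag name is an assumption -- check the actual CLI help.
raw_cmd = 'git-sim --font "Courier New" log'
cmd_parts = get_cmd_parts(raw_cmd)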
Okay, the dev branch works for me. I did have to rebuild my virtual environment. That wasn't obvious until I tried to run a git-sim command manually. I removed the Windows-specific files, and the code that modified the reference file path to look for a Windows-specific file. I also tried using a bundled font at the test-only level:

def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    raw_cmd = "git-sim log"
    cmd_parts = get_cmd_parts(raw_cmd)
    os.chdir(tmp_repo)
    with m.register_font("/Users/eric/projects/git-sim/tests/e2e_tests/ProggyClean.ttf"):
        output = subprocess.run(cmd_parts, capture_output=True)
        fp_generated = Path(output.stdout.decode().strip())
    ...

This didn't work, and I wasn't too surprised by that. I believe |
I made a PR so you can pull these small improvements if you want. Feel free to reject the PR and incorporate those changes whenever appropriate. |
Sure I'll merge the PR, but can you make it to the dev branch instead of main? Looks like it's including 2 of the commits I made since those don't exist on main yet... But those should go away if you change it to the dev branch. |
I'm newer to committing to others' projects, so some of my Git habits are unclear. I forgot to run Black before submitting the PR, so now the PR would include three commits:

9e0b88 (HEAD -> dev, update_tests) Use Black formatting conventions.
6d0582 Remove Windows-specific test files.
65050d Additional assertions about filepaths that are generated during testing.
7c9c32 (upstream/dev) Update e2e test reference files for macOS

Should I submit a PR with these 3 commits? Do you want me to squash them? Do you squash them when you merge? |
Thanks for running Black - this is my first time using it, as suggested by other contributors. On my Mac I was able to get Black to run automatically when exiting Vim (via a Vim plugin), but I had issues setting that up on my current Windows machine, so now it's manual again and I often forget to run it myself too 😸. You can just submit the PR to the dev branch with the 3 commits, and I'll squash and merge them in the GitHub interface. |
Sure thing, I'm trying to use Black more consistently in my own work as well. I resubmitted the PR. |
Merged it and added tests for the remaining non-networked subcommands. I regenerated all the reference images on my Windows machine and then tested on my Mac, where 12/18 tests passed and the other 6 failed due to bad ratio diff. Most of these were barely above the threshold, but the biggest difference was about 0.0126. I'm thinking this is just due to how the image files are generated on different platforms (regardless of font). Anyway, I'm thinking we can just boost the acceptable ratio diff from 0.0075 to 0.015 to take into account these subtle differences. |
(the updated reference files are pushed up in the dev branch if you want to test on both your systems to compare) |
I got the same results on my Mac. The highest ratio came from the log command, but I opened the generated image and the reference image and couldn't see any difference at all. I think a ratio of 0.015 sounds perfectly reasonable. I'm going to be a little embarrassed, but mostly curious, if someone who's more familiar with working with images comes along and has a much simpler way to test image output. The test suite is slower now, as expected when running 6 times as many tests. It takes 80s to run the test suite on my system. I was tempted to focus on the

Spent 3.334 seconds running test_add.
Spent 0.085 seconds in compare_images().
.Spent 4.103 seconds running test_branch.
Spent 0.087 seconds in compare_images().
.Spent 5.484 seconds running test_checkout.
Spent 0.088 seconds in compare_images().
.Spent 5.732 seconds running test_cherrypick.
Spent 0.088 seconds in compare_images().
.Spent 3.235 seconds running test_clean.
Spent 0.085 seconds in compare_images().
.Spent 3.205 seconds running test_commit.
Spent 0.086 seconds in compare_images().
.Spent 3.837 seconds running test_log.
Spent 0.091 seconds in compare_images().
.Spent 5.776 seconds running test_merge.
Spent 0.088 seconds in compare_images().
.Spent 3.617 seconds running test_mv.
Spent 0.086 seconds in compare_images().
.Spent 6.985 seconds running test_rebase.
Spent 0.090 seconds in compare_images().
.Spent 4.160 seconds running test_reset.
Spent 0.086 seconds in compare_images().
.Spent 3.301 seconds running test_restore.
Spent 0.086 seconds in compare_images().
.Spent 3.348 seconds running test_revert.
Spent 0.086 seconds in compare_images().
.Spent 3.518 seconds running test_rm.
Spent 0.086 seconds in compare_images().
.Spent 3.328 seconds running test_stash.
Spent 0.086 seconds in compare_images().
.Spent 3.205 seconds running test_status.
Spent 0.085 seconds in compare_images().
.Spent 5.336 seconds running test_switch.
Spent 0.089 seconds in compare_images().
.Spent 4.065 seconds running test_tag.
Spent 0.091 seconds in compare_images().

If you want to profile the current suite on your system, it's on my profile_tests branch. There are a couple ways to speed up tests... |
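In case anyone wants to reproduce timing output like the above, an autouse fixture in conftest.py is one way to do it - a rough sketch (the message format is just illustrative):

import time

import pytest


@pytest.fixture(autouse=True)
def time_test(request):
    """Print how long each test takes; run pytest with -s to see the output."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"Spent {elapsed:.3f} seconds running {request.node.name}.")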
Parallel tests

The main way to speed up tests is to run them in parallel. pytest has a really good plugin called pytest-xdist which handles all of the overhead of distributing tests for many test suites. It works for our suite. Install

Non-parallel output:

$ pytest
============== test session starts ===========================
platform darwin -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/eric/projects/git-sim
plugins: xdist-3.3.1
collected 18 items
tests/e2e_tests/test_core_commands.py ..................[100%]
============== 18 passed in 79.52s (0:01:19) =================

Parallel output:

$ pytest -n auto
============== test session starts ===========================
platform darwin -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/eric/projects/git-sim
plugins: xdist-3.3.1
10 workers [18 items]
.................. [100%]
============== 18 passed in 22.69s ===========================

The |
Running selected tests

You can organize the test suite into multiple files. For example you could pick a set of core commands you always want to test, and put those in test_core_commands.py. Then other tests go in test_noncore_commands.py. You'd run

pytest has a way of marking tests, so you could just mark some tests as core tests. Add the decorator |
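A minimal sketch of the marker approach (the marker name core is just an example, and it would also need to be registered under markers in the pytest config to avoid warnings):

import pytest


@pytest.mark.core  # example marker name; run only these tests with: pytest -m core
def test_log(tmp_repo):
    """Test a simple `git-sim log` command."""
    ...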
Sweet! I'll bump that up.
Uhhh..... this is awesome!!! On my PC this speeds things up from 45s to 14s. I think it's a good compromise to gain that kind of speedup by running in parallel, and if something fails to rerun in series to get the output. Random note on parallelism - I spent a lot of time using the multiprocessing module to try to get git-sim to render individual animations in parallel (by rendering independent segments of each video in parallel) to speed up performance, as I feel that Manim's performance is a major limitation of git-sim. Unfortunately I couldn't get it working in the end as I had issues passing data (specifically file handles/references) in between processes. It was pretty weird behavior and I was kinda surprised I couldn't get it working - seemed so close. Maybe one day as Python's multiprocessing and pickling tools get more robust...
Good to know! |
I have one more thought at the moment, not about performance. You can set custom messages to display when a test fails. For example you're seeing "bad pixel ratio: " at times in the output, but if there are multiple test failures you kind of have to hunt for which tests had which ratios. You can write code like this:

msg = f"{path_gen} image comparison failed: {ratio_diff}"
assert ratio_diff < 0.015, msg

Then you'd see that information for failing tests even without using the

A lot of the decisions like this just come down to how you want to use your test suite. What kind of information do you want to see in all test runs? What kind of information do you want to see only when looking at results in detail? |
Funny you mention that - as a part of the threshold update I just committed I updated that very same printed message to specify the subcommand that failed:
😸 |
Do you mind if I ask about the specs on your system? I think I have a reasonably well-specced system, but your tests are twice as fast without parallel, and still significantly faster when run in parallel. I wonder if there's anything about my setup that's slowing things down for me.
I've got other things going on for a while so I'm going to stick to a focus on testing for the time being, but I would love to look into this at some point.
You might give the assert message a try, because I think this is the only remaining reason to use the |
I upgraded my PC's CPU to the 13600k late last year, so things run a lot faster on my PC now than on my Intel Mac.
Oh definitely - I'm grateful for just the testing help; this has been incredibly good for the project. Happy to share what I've done on the performance front at some point if you get some extra time!
Oh gotcha, I'll take a look - I missed that assert statement in your note. |
To be clear, that would move the assertion from the individual test functions to the compare_images() function. That seems like a slightly better structure anyway. I'll go quiet for a while, but if you ever reply somewhere here and don't hear from me, feel free to reach out directly over email. I think I'm going to write a newsletter series on Git at some point, and I'll definitely use git-sim for the visualizations for that series. |
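For anyone skimming later, the refactor being described would look roughly like this - a sketch only, since the real compare_images() may compute its ratio differently (the pixel-based calculation below is an assumption; the 0.015 threshold comes from the discussion above):

from PIL import Image, ImageChops


def compare_images(path_gen, path_ref, threshold=0.015):
    """Assert that a generated image is close enough to its reference image."""
    img_gen = Image.open(path_gen).convert("RGB")
    img_ref = Image.open(path_ref).convert("RGB")

    # Fraction of pixels that differ at all between the two images.
    diff = ImageChops.difference(img_gen, img_ref)
    differing = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    ratio_diff = differing / (diff.width * diff.height)

    msg = f"{path_gen} image comparison failed: {ratio_diff}"
    assert ratio_diff < threshold, msg

With the assertion living here, each failing test reports its own path and ratio, and the individual test functions just call compare_images(fp_generated, fp_reference) directly.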
One more quick thing, you might want to remove the stub file test.py from the top level. It's not causing any problems, but with pytest in use it's just taking up space and it might confuse someone looking at the project. |
That's awesome. Please drop me a note when it's ready - would love to take a look.
Oh yeah - I thought about that but decided to leave it as a stub for unit tests. But you're right, now that we have the |
Closing this since related work to refine the test suite will be handled thru #99 |