
Is concurrent execution on one machine supported? #231

Closed
ghost opened this issue Nov 22, 2016 · 27 comments

Comments

@ghost

ghost commented Nov 22, 2016

I'd like to concurrently run citgm on a couple of different node builds on the same machine. Is this possible without one run messing up the other(s)?

@ghost
Author

ghost commented Nov 22, 2016

..running several now, so seems to be.. thx

@ghost ghost closed this as completed Nov 22, 2016
@ghost
Author

ghost commented Nov 22, 2016

..but looking closer, I'm seeing strange results, like modules not installing, or their tests just being skipped.. I'm not yet familiar enough with all of citgm's nondeterminism (which is an awful thing to have during testing) to know why I'm seeing what I am..

does anyone know if a globally installed citgm can safely run concurrently?

@ghost ghost reopened this Nov 22, 2016
@gdams
Member

gdams commented Nov 22, 2016

What OS are you using? Can you send an example of the failures and the command you are running?

@ghost
Author

ghost commented Nov 22, 2016

uname -a
Linux druid 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I've been re-running serially to rule that out, so I can see whether my build of node is blowing chunks. The consistency I'm seeing in the results leads me to think concurrent execution doesn't quite work, but I'm not positive on that.

I will come back a little later and specifically run citgm concurrently to see what issues, if any, that actually creates. However, if you already know citgm uses some global state, like storing a list of work-in-progress in a file or something, then we'll already know it's probably not a good idea to run concurrently.

@richardlau
Member

richardlau commented Nov 22, 2016

If any modules contain native code, running concurrently will probably run into nodejs/node-gyp#1054.
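
For what it's worth, one way around that particular race might be to give each concurrent run its own node-gyp devdir and npm cache. This is only a sketch, not something citgm does for you, and it assumes node-gyp still honors the documented npm_config_* environment variables:

    # Workaround sketch (not a citgm feature): point each concurrent run at its
    # own node-gyp devdir and npm cache so native rebuilds don't race on ~/.node-gyp.
    # Directory names are arbitrary.

    # terminal 1
    export npm_config_devdir="$HOME/.node-gyp-run1"
    export npm_config_cache="$HOME/.npm-cache-run1"
    citgm-all -m | tee citgm.run1.md

    # terminal 2
    export npm_config_devdir="$HOME/.node-gyp-run2"
    export npm_config_cache="$HOME/.npm-cache-run2"
    citgm-all -m | tee citgm.run2.md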

@MylesBorins
Contributor

This is not a design consideration of citgm. Closing.

If you have a concise proposal as to how this could be fixed please open another issue to discuss prior to implementation.

@ghost
Author

ghost commented Nov 22, 2016

I'm getting the feeling I'm bothering you guys. This issue wasn't at all meant to be any kind of proposal, but was simply a question. I'm sorry, I'll stop asking questions... just new to citgm.

@ghost
Author

ghost commented Nov 23, 2016

@thealphanerd I'm requesting your permission to post questions on this repo.

Can you please help me understand how I must phrase questions, and what constraints they have as to content, such that they are not immediately closed?

@ghost
Author

ghost commented Dec 8, 2016

@thealphanerd It would have been kind of you to reference your related PR #144

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Thank you kindly, Myles, for replying and apologizing; no worries. I know you're at least aware of my efforts around helping fix module symlinking. I was having to run citgm-all four times: v7.2.0 with NODE_PRESERVE_SYMLINKS=0 and 1, and then v7.2.0-sjw with NODE_SUPPORT_SYMLINKS=0 and 1.. and it was just hurting a little having to wait, which was the impetus for this issue. I'm happy to do anything I can to help, even if it's staying out of the way (fwiw, I'm continuing to work on my style of communication on this medium, and appreciate your tolerance).

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Scaling a single running instance of citgm-all will help :).

I had success by creating an additional user account, citgm1, and then using it with nvm to completely isolate the environments (node/npm/node-gyp) while still using the same version of node. I was able to concurrently run two processes of citgm-all successfully.

Only one module differed while running under the citgm1 user account: a yeoman-generator test was trying to access a Gruntfile.js and threw EACCES on some permission problem somewhere. I tried exploring a little bit, but haven't gotten back to uncovering the underlying cause. I did, however, confirm it was something related to user permissions, and not the two processes clobbering each other in some way.

It might be worthwhile to explore integrating nvm, or implementing something similar in citgm with respect to environment isolation.
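
As a rough sketch of what that isolation could look like without the second user account (directory names are arbitrary, and it assumes nvm with node 7.2.0 has already been installed into each throwaway HOME):

    # Sketch only: give each concurrent citgm-all run its own HOME so ~/.npm,
    # ~/.node-gyp, ~/.nvm and any global installs never collide.
    mkdir -p /tmp/citgm-env1 /tmp/citgm-env2

    # terminal 1
    HOME=/tmp/citgm-env1 bash -c '
      export NVM_DIR="$HOME/.nvm"
      . "$NVM_DIR/nvm.sh"        # assumes nvm was installed into this HOME
      nvm use 7.2.0              # same node version, separate install
      npm install -g citgm
      citgm-all -m | tee "$HOME/citgm.1.md"
    '

    # terminal 2: identical, but with HOME=/tmp/citgm-env2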

@ghost
Author

ghost commented Dec 8, 2016

As an aside, my goal was just to use all the cores while testing (I was running on a 32-core server). A parallelized version of citgm-all alleviates the need to run two or more of them concurrently :).

In my fantasy world, citgm would be provided the module to install/test, and the versions of node and npm to use. It would then create an isolated environment, exercise the module, then store the results keyed by hashes of the node executable and the module package, plus a date-time. It would have registered itself over a well-known port with citgm-all, which would be responsible for coordinating things and providing the needed info to the listening citgm instances. With this conceptual approach, if I was in a hurry, I could stand up two 32-core servers and have all testing done in about as long as it takes the longest test to run (probably under a minute). Also, by keying results by node + package hashes, it would be easy to compare results between any two versions at any time, and to not re-run (unless asked to) when unnecessary, but still automatically re-run when a newer version of the module or node was being exercised.... but then again I'm a dreamer :)
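
Purely as an illustration of the hash-keyed storage part (nothing here is a citgm feature; the layout and the lodash example are invented):

    # Sketch: key a result by sha1(node executable) + sha1(module tarball) + timestamp.
    node_sha=$(sha1sum "$(command -v node)" | cut -c1-12)
    tarball=$(npm pack lodash 2>/dev/null | tail -n1)      # any module under test
    pkg_sha=$(sha1sum "$tarball" | cut -c1-12)
    stamp=$(date -u +%Y%m%dT%H%M%SZ)

    result_dir="results/$node_sha/$pkg_sha/$stamp"
    mkdir -p "$result_dir"
    citgm lodash | tee "$result_dir/output.log"

    # Comparing two node builds for the same module is then just a diff between
    # two result directories that share the same $pkg_sha.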

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Forgive my miscommunication. The described implementation is not the point; what it is implementing is. To further elaborate and expand on your comment:

The ideal is that a test-coordinator has a list of registered workers that can only run a single atomic test at a time, where each test is distinguished by a cpu, os-version, node-version, npm-version, and module-version, and the results are keyed and stored by those things (and timestamp) in a shared store. The workers would have registered themselves by cpu and os-version. The coordinator would be given a list of one or more test configurations to run, and would just go make it happen based on what workers had registered themselves.
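
Purely to illustrate the shape I mean, here is what a worker registration and a single test configuration might look like; every field name and value below is invented for the example:

    # Illustration only: a worker registers what it is, the coordinator hands it
    # a test configuration; results get stored under the keys described above.
    cat > worker-registration.json <<'EOF'
    {
      "worker": "druid",
      "cpu": "x86_64",
      "os": "Ubuntu 14.04.5 LTS"
    }
    EOF

    cat > test-config.json <<'EOF'
    {
      "node": "v7.2.0-sjw",
      "npm": "3.x",
      "module": "spdy-transport@<version>",
      "resultKey": "sha1(node) + sha1(module tarball) + timestamp"
    }
    EOF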

I suggested one way to isolate a worker on a single machine. Using docker is just another way to box an environment, but I would still think some level of participation would be required by citgm as a worker, and citgm-all as a coordinator, to manage the specifics of how tests are specified and run and how their results are collected.

Might you educate me a little, as I'm ignorant in many ways, as to how you see that concept implemented entirely as "the responsibility of the runtime not the module"? Are my presumptions correct, that by "runtime" you mean "cpu/os/node", and by "module" you mean "citgm/npm/mod-under-test"?

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Thank you

@ghost
Author

ghost commented Dec 8, 2016

...I am a little confused. Wouldn't #144 then also be more appropriate for a complex CI? How is what it's doing fundamentally different?

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

I see. My original proposition did not include things related to platform-specific sandboxes; I only added that from your comment:

since we run citgm on ci on many platforms it would prove difficult to have a reliable solution for all runtime.

So is my understanding correct that the only characteristic distinguishing what I originally suggested from #144 is being able to specify a different version of node to use for testing?

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

I can certainly do that, but I wouldn't want to duplicate something fundamentally being taken care of by #144, so I was just trying to understand the distinguishing characteristics.

Would you agree then, from everything you've described, that the only things a PR might offer on top of #144 are:

  • Being able to explicitly specify the version of node to test, rather than the one resolved from PATH, and
  • Storing the results in a way a little more convenient to then review and compare?

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Context to understand the impetus of this discussion:

  • Implemented changes to node v7.2.0, represented by node v7.2.0-sjw
  • Was required by @ sam-github to use citgm to show no regressions.
  • Test Environment:
    • Azure west region 32 core 64 bit vm
    • uname: Linux 3.4.0+ x86_64
    • lsb_release: Ubuntu 14.04.5 LTS
  • Ran citgm-all -m | tee citgm.1.v7.2.0-sjw.md with PATH resolving node to v7.2.0-sjw.
  • citgm.1.v7.2.0-sjw.md contained failures that upon review appeared unrelated to changes.
  • Had no choice but to baseline v7.2.0 via running citgm-all -m | tee citgm.1.v7.2.0.md with PATH resolving node to v7.2.0
  • Comparing citgm.1.v7.2.0-sjw.md to citgm.1.v7.2.0.md showed one additional failure in v7.2.0: one test in spdy-transport failed when attempting to access a network port.
  • Ran citgm spdy-transport with PATH resolving to v7.2.0, all tests passed.
  • Re-ran citgm-all -m | tee citgm.2.v7.2.0.md, and one module failed, because a download from github failed due to a network issue (unknown as to which side)
  • Re-ran citgm-all -m | tee citgm.3.v7.2.0.md, and the pass/fail results were equivalent to citgm.1.v7.2.0-sjw.md

Initial conclusions from above experience:

  • citgm-all does not scale
  • citgm-all itself runs with the version of node in PATH
  • citgm-all generates different output from the same input & is sensitive to environment; i.e. non-deterministic
  • citgm-all OOB downloads the latest versions of test modules from github.com, creating potential for implicit variance in test results
  • Comparing results between two different versions of node for a given module@version is tedious

Thoughts on addressing issues to mitigate non-determinism, speed up locating cause of regressions:

  • Parallelize citgm-all
  • Be explicit with the version of node-under-test, rather than resolving from PATH (see the sketch after this list)
  • Ensure citgm-all itself does not run on the version of node-under-test
  • Store and index test results by sha1(node) + sha1(module@version) + timestamp
    • Provides means to quickly compare results between two versions of node for a given module@version
    • Enables running tests only when necessary (when bits change, or explicitly requested)
    • Auto-download of the latest module@version then creates distinguishable results rather than implicit variance
    • Specifics of test-output-encoding and storage engine entirely irrelevant
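
For the node-under-test bullet, a stopgap that works today is simply pinning PATH per run (the install path below is an arbitrary example):

    # Stopgap sketch: be explicit about the node under test by prepending its bin
    # directory to PATH for this run only, instead of relying on global resolution.
    PATH=/opt/node-v7.2.0-sjw/bin:$PATH citgm-all -m | tee citgm.1.v7.2.0-sjw.md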

@MylesBorins
Contributor

MylesBorins commented Dec 8, 2016 via email

@ghost
Author

ghost commented Dec 8, 2016

Not wanting to waste your time.

Please respect that I described real challenges & problems in using your tool, and that I'm seeking to understand how they can best be addressed.

This will be my last comment on this issue.

I will repeat them here, excluding my thoughts on how they might be addressed (i.e. I'm NOT stating any type of feature request!!!!), and instead ask of you:

  • explain how the issues should be dealt with, or
  • request you point me to someone who knows how citgm works, or
  • allow me to repost in a new issue so others may respond:

Conclusions from using citgm:

  • citgm-all generates different output from the same input & is sensitive to environment; i.e. non-deterministic
  • citgm-all does not scale
  • citgm-all itself runs with the version of node in PATH
  • citgm-all OOB downloads the latest versions of test modules from github.com, creating potential for implicit variance in test results
  • Comparing results between two different versions of node for a given module@version is tedious

Conclusions arrived at via these explicit steps taken in using citgm:

  • Implemented changes to node v7.2.0, represented by node v7.2.0-sjw
  • Was required by @ sam-github to use citgm to show no regressions.
  • Test Environment:
    • Azure west region 32 core 64 bit vm
    • uname: Linux 3.4.0+ x86_64
    • lsb_release: Ubuntu 14.04.5 LTS
  • Ran citgm-all -m | tee citgm.1.v7.2.0-sjw.md with PATH resolving node to v7.2.0-sjw.
  • citgm.1.v7.2.0-sjw.md contained failures that upon review appeared unrelated to changes.
  • Had no choice but to baseline v7.2.0 via running citgm-all -m | tee citgm.1.v7.2.0.md with PATH resolving node to v7.2.0
  • Comparing citgm.1.v7.2.0-sjw.md to citgm.1.v7.2.0.md showed one additional failure in v7.2.0: one test in spdy-transport failed when attempting to access a network port.
  • Ran citgm spdy-transport with PATH resolving to v7.2.0, all tests passed.
  • Re-ran citgm-all -m | tee citgm.2.v7.2.0.md, and one module failed, because a download from github failed due to a network issue (unknown as to which side)
  • Re-ran citgm-all -m | tee citgm.3.v7.2.0.md, and the pass/fail results were equivalent to citgm.1.v7.2.0-sjw.md
