Integrate API key to inform dev of errors caused by server reload #392
Server restart

Detecting that there isn't a problem with the server:

Setup

- Create a project: https://docassemble.org/docs/api.html#playground_post_project
- Install code: https://docassemble.org/docs/api.html#playground_install (do allow the default restart). It doesn't currently seem to allow pulling from GitHub, so we'll have to think about that.
- Delete a project: https://docassemble.org/docs/api.html#playground_delete_project
- Instructions for making an API key: https://docassemble.org/docs/api.html#manage_api |
Create a project

Generated by insomnia:

```js
const http = require("https");
const options = {
"method": "POST",
"hostname": "apps-dev.suffolklitlab.org",
"port": null,
"path": "/api/playground/project?key=myKey&=",
"headers": {
"cookie": "session=aSessionID",
"Content-Type": "multipart/form-data; boundary=---011000010111000001101001",
"Content-Length": "0"
}
};
const req = http.request(options, function (res) {
const chunks = [];
res.on("data", function (chunk) { chunks.push(chunk); });
res.on("end", function () {
const body = Buffer.concat(chunks);
console.log(body.toString());
});
});
req.write("-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"project\"\r\n\r\nAPIProject7\r\n-----011000010111000001101001--\r\n");
req.end();
```

My untested version

Maybe needs

```js
const http = require("https");
const options = {
"method": "POST",
"hostname": "apps-dev.suffolklitlab.org",
"path": "/api/playground/project?key=myKey&=projectName",
};
const req = http.request(options, function (res) {
const chunks = [];
res.on("data", function (chunk) { chunks.push(chunk); });
res.on("end", function () {
const body = Buffer.concat(chunks);
console.log(body.toString());
});
});
req.end();
```

Delete a project

Insomnia didn't work for some reason. The curl it generated looked weird. Why does it want to keep making cookies?

My assumption

The same except with

Detect server status

It doesn't actually detect server status. It detects the package update status, and I'm not even sure if that means a project package update. It might be enough that it detects whether the server responds at all, but that ping requires a package update code and I'm not sure how to get that code if it's not a project package update. |
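Regarding the "same except with DELETE" assumption above, here is a minimal, untested sketch. It assumes the playground_delete_project endpoint really is the same path with the DELETE method and a `project` parameter; the hostname, key, and project name are placeholders:

```js
// Sketch only: delete a Playground project.
// Assumes DELETE /api/playground/project with key and project parameters,
// per the playground_delete_project docs linked above. Values are placeholders.
const https = require("https");

const options = {
  method: "DELETE",
  hostname: "apps-dev.suffolklitlab.org",
  path: "/api/playground/project?key=myKey&project=APIProject7",
};

const req = https.request(options, (res) => {
  const chunks = [];
  res.on("data", (chunk) => chunks.push(chunk));
  res.on("end", () => {
    console.log(res.statusCode, Buffer.concat(chunks).toString());
  });
});
req.end();
```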
You can get the user id this way, so we won't need that info either: https://docassemble.org/docs/api.html#user_retrieve
Get the returned JSON's |
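If it helps, a minimal sketch of that call, assuming the user info comes back as JSON with an `id` field as the user_retrieve docs describe (server URL and key are placeholders):

```js
// Sketch only: get the API owner's user info and read the id out of the returned JSON.
// Assumes GET /api/user per the user_retrieve docs linked above; values are placeholders.
const https = require("https");

const options = {
  method: "GET",
  hostname: "apps-dev.suffolklitlab.org",
  path: "/api/user?key=myKey",
};

const req = https.request(options, (res) => {
  const chunks = [];
  res.on("data", (chunk) => chunks.push(chunk));
  res.on("end", () => {
    const user = JSON.parse(Buffer.concat(chunks).toString());
    console.log(user.id); // the developer's user id
  });
});
req.end();
```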
The API is now available! https://docassemble.org/docs/changelog.html
We can start working on v4 API stuff. |
Actually, we can't do this until our server updates to v1.3.9, which really shouldn't happen till the work week starts. |
I already said this :P |
No longer blocked; now working on integration. There's been another adjustment: the requester (us) no longer needs to guess at the right address to send based on whether the repo is private or whether GitHub is integrated for the da account. I'll start moving forward where I can, but further testing will be needed when we implement that. |
Pull a repo

Note: as of 1.3.10, the url format shouldn't matter. It can be ssh or non-ssh. https://docassemble.org/docs/api.html#playground_pull

public/owner/no ssh/no integration

```
curl --insecure --request POST --url https://apps-dev.suffolklitlab.org/api/playground_pull -d key=someKey -d project=APIPull1 -d github_url=https://github.com/plocket/docassemble-ALAutomatedTestingTests -d branch=157_feedback
```

Right result: `{ "task_id": "cynAqWnbeJedvwckhJSewESK" }`, repo and branch correctly pulled

public/owner/try ssh, but have no integration

```
curl --insecure --request POST --url https://apps-dev.suffolklitlab.org/api/playground_pull -d key=someKey -d project=APIPull1 -d github_url=git@github.com:plocket/docassemble-ALAutomatedTestingTests.git -d branch=main
```

Right result: "Pull process encountered an error: error running git clone. Cloning into 'docassemble-ALAutomatedTestingTests'...\nHost key verification failed.\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"

public/owner/has integration/da email DOES match github email

??

public/owner/integrated/da email doesn't match github email

```
curl --insecure --request POST --url https://apps-dev.suffolklitlab.org/api/playground_pull -d key=someKey -d project=APIPull1 -d github_url=git@github.com:plocket/docassemble-ALAutomatedTestingTests.git -d branch=main
```

Right result: `{ "task_id": "WlbCYHcSmYNUMzSdLOmtnIIR" }` and code was pulled in

private/owner/non-ssh url/no integration

??
|
HTTP libraries

Edit: I don't want to spend a ton of time searching, so I'll start with axios, but if others have thoughts, we can probably switch without too much pain.

request:
node-fetch:
axios:

Here's a list with no reviews whatsoever from Aug 2019: request/request#3143 |
All the examples in axios use qs (https://www.npmjs.com/package/qs) for their |
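For example, a rough, untested sketch of what the playground_pull request above might look like with axios + qs (server URL, key, project, and branch are placeholders):

```js
// Sketch only: form-encoded POST with axios + qs, mirroring the curl playground_pull
// examples above. Untested; all values are placeholders.
const axios = require("axios");
const qs = require("qs");

async function pull_repo() {
  const body = qs.stringify({
    key: "myKey",
    project: "APIPull1",
    github_url: "https://github.com/plocket/docassemble-ALAutomatedTestingTests",
    branch: "main",
  });
  const response = await axios.post(
    "https://apps-dev.suffolklitlab.org/api/playground_pull",
    body,
    { headers: { "Content-Type": "application/x-www-form-urlencoded" } }
  );
  console.log(response.data); // e.g. { task_id: "..." }
}

pull_repo().catch(console.error);
```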
I'd appreciate a hand with a current API-handling behavior dilemma. I've got three functions:

The first two use

There are lots of things that call
Which of those seem most compelling? Are there any other reasons to lean one way vs. another? For reference, see a sample of the code. [This question might become more relevant as we use the API to test the interview itself and detect the difference between page-load issues and server-response issues. Should all the api interface functions just rely on their own timeouts? Is there another reason for them to time out other than a busy server?] |
This doesn't make it clear what the difference between these two is. Both are interfaces (an already overloaded word IMO). It looks like dai just does

The core question here that isn't explained is why
I can understand that. Do you have thoughts on how I can make that more clear? "docassemble api" is trying to stick as close to axios as possible. It's not doing anything other than making the requests. "docassemble api interface" is supposed to manage various request operations in ways that help get more complex things done, like waiting for a busy server, creating unique Project names, etc. Will post another response to the other question in a bit. |
This is making me think that maybe there shouldn't be an
I used getting the id because it doesn't cause a restart and there isn't a specific api endpoint for just that. Shall we add an endpoint? The API docs have a section on polling the server that only talks about doing it for tasks that restart the server and therefore have |
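For reference, a rough sketch of the task-based polling that docs section describes. It assumes the /api/restart_status endpoint with a `task_id` parameter and a JSON `status` field, and the server URL and key are placeholders; it doesn't solve the no-task_id case being asked about here:

```js
// Sketch only: poll for completion of a task_id returned by e.g. playground_pull.
// Assumes GET /api/restart_status with key and task_id, returning JSON like
// { "status": "working" | "completed" | "unknown" }. All values are placeholders.
const axios = require("axios");

async function wait_for_task(task_id, { interval_ms = 2000, timeout_ms = 120000 } = {}) {
  const deadline = Date.now() + timeout_ms;
  while (Date.now() < deadline) {
    try {
      const response = await axios.get(
        "https://apps-dev.suffolklitlab.org/api/restart_status",
        { params: { key: "myKey", task_id }, timeout: 5000 }
      );
      if (response.data && response.data.status === "completed") { return response.data; }
    } catch (error) {
      // A connection error here may just mean the server is mid-restart; keep waiting.
    }
    await new Promise((resolve) => setTimeout(resolve, interval_ms));
  }
  throw new Error(`Task ${task_id} did not complete within ${timeout_ms / 1000}s`);
}
```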
[Separate from the discussion above:]
[Answer, potentially: To me, the answer based on the discussion below is that we can do without polling and just use a timeout. If a user finds they have trouble with it, though, I hope we can find this discussion again...] [Prior discussion:] The docassemble API recommends that:
The following conversation aimed to clarify the polling suggestion:
A:
Q:
A:
|
Another separate question: Could we modify the api so we can start an interview and get its url so we can then go to it: https://docassemble.org/docs/api.html#session_new. The url is currently not returned. That way we won't need to get the developer id. We'll also then have access to the interview through the API. I'm not sure how that could be helpful currently, but there may come a time... |
Why do we want to poll the server during tests?

This is to differentiate between a long page load and a busy server. I'll explain the complications later.

First, it'll only be partially effective, and there's no way to avoid that. The server could become unresponsive just after we test it or become responsive just after we test it. We can test before trying to load a page, during the loading of the page, and after the page times out on loading, but all of those will have this problem. That said, we can take a shot at making these false failures more rare. Do we want to?

How about just testing a page load multiple times? That can result in more false failures and longer test times.

False failure scenario: The developer sets a custom timeout of 1 second per step. They continue to the next page. Our code gives the page 3 tries (3 seconds total), but the server is going to take 5 more seconds to finish restarting. The test will fail because the server didn't have enough time to restart.

Tests can take longer: I know some people don't find this to be a problem, but I definitely would find it frustrating.

Scenario 1: Suppose the developer knows some big files will take a long time to load. They give the test 4 min. to load a page. The server is busy, but finishes 5 seconds after we start timing the page load. The page fails to load in those first 4 min. because it got stuck (which I've experienced before), and the retry can take up to another 4 min. That's 8 min. If we had been polling the server, it would have taken 4 min. and 5 seconds.

Scenario 2: If they set the 4 min. timeout for the whole scenario, not just one step, then this situation could happen: The page would have taken 1 sec. to load. The server is busy, but finishes 5 seconds after the page tries to load. The page doesn't load in that first 4 min. window. It then tries again and loads in 1 second. That's 4 min. and 1 second instead of 6 seconds. |
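To make the trade-off concrete, here's an illustrative sketch only, not ALKiln's actual code; the function names, the /api/user ping, and the 30-second wait are all made up:

```js
// Illustrative only: distinguish a slow page from a busy server by pinging the API
// when a page load times out. Names, endpoint, and timings are made up for this sketch.
const axios = require("axios");

async function server_is_responding(server_url, key) {
  try {
    // Any cheap authenticated GET would do; /api/user is used as a stand-in here.
    const response = await axios.get(`${server_url}/api/user`, { params: { key }, timeout: 5000 });
    return response.status === 200;
  } catch (error) {
    return false;
  }
}

async function load_page_or_wait(load_page, server_url, key) {
  try {
    return await load_page(); // e.g. a puppeteer page.goto() wrapped in its own timeout
  } catch (error) {
    if (await server_is_responding(server_url, key)) { throw error; } // server is fine: real failure
    // Server looks busy or mid-restart: wait (or poll) and then retry once instead of failing.
    await new Promise((resolve) => setTimeout(resolve, 30 * 1000));
    return await load_page();
  }
}
```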
[I'm rethinking this. We could probably make a page to let the repeat user give us all the fields on one page.] |
Random event. Unintentional server restart muddled things up a bit. |
There's a lot here, both as notes in the issue and as changes in the files. Some of it is just changing a folder name, but this might take a real-time discussion depending on how much folks know about these API requests.

This addresses #392, specifically the setup and takedown steps. The next part, helping manage the tests themselves, will probably be a lot smaller. I hope I haven't missed anything important, but this is a big move. I can try to redo this in a more step-by-step fashion, making a PR for each step. I have some ideas how, so do let me know if that would help. I just needed to see how it would all fit together.

This also addresses a couple of other issues, and I'll try to track them all down and add them here. (Addresses #438, session_vars.) (Addresses #432, better Project name, for v4, I think - title of docassemble Project. See [this line of code](https://github.com/SuffolkLITLab/ALKiln/blob/5782821f6de94335b6cf715e91986cccce4639a1/lib/utils/session_vars.js#L97).) It made sense considering the other changes I was making.

I'm very unsure about how I'm handling `session_vars` (which are meant to eventually replace `env_vars`). ~It's a lot of getting things for stuff that's basically static, but there are some in there that need to be gotten, and I thought it would be more consistent.~ See the comment below about at least part of why `session_vars` is all `.get...()`. Maybe it needs to be broken up further somehow, though. Open to suggestions.

In the future, `da_i.get_dev_id()` will be needed to get the interview address. Right now its use is a bit questionable. That discussion has started in #392. `da_i.throw_an_error_if_server_is_not_responding()` is still more problematic, but it's not currently being used by anything. It was before and it will be again, but it'll have time to get worked out later, I think.

-------

* Start developing API calls
  - Add user_vars file to eventually remove env_vars file
  - Other than pull and checking task status, calls appear working
  - Install axios and qs
* Fix api path for server status update, increase pull default timeout, add some questions
* Change user_vars to session
* Finish changing user_vars to session
* Implement Project creation through API
* Add pulling through the API
* Implements wait_for_server_to_restart(). Adds:
  - log.js for logging ALKiln-specific messages
  - time.js to wait for a timeout
  A lot of re-arranging. Still not sure how to organize logs vs. throws, etc.
* Implemented delete project
* Delete puppeteer-utils.js, remove spurious `while` for creating a project (it should probably be handled in project creation) and add some comments
* Clean up a bit
* Add developer API key to workflow
* Replace env_var with session in env_vars.js where possible
* Switch `session` to `session_vars` because session has too much baggage
* Move request complexity into its own file
* Get more info about interview url (for errors)
* Restore scope.js, error resolved
* Comment out unused funcs, move creation loop into da interface and simplify it a bit. Add various comments
* Add pull attempts to run again if the server is busy
* Implement pull retries if server is busy
* Move wait for pull to complete into da interface. Also:
  - Rename some functions
  - Trust the API, allowing simplification of if statements
  - Adjusted log messages
* Rearrange locations of files
* Add test for correct API key, adjusted logs
* Fix behavior for an invalid API key
* Trust server check timeout instead of loop in most cases, ensure dai.delete() too waits for server to be free. Logs and comments
* Adjust names, move long timeouts directly into the relevant functions instead of going through the server check. See #392 (comment)
* Add and correct comments/descriptions
* Add API key to action.yml
* Fix invalid project name allowed
* Remove dev console log
* Update messages, improve log.js, improve some names, as noted and also with some of my own changes. Also, had to move the dev __delete function into REST to handle deleting with axios, because I apparently changed the actual `delete` functions to use only the name of the current project, so it's no longer possible to hand one in
* Concat log string instead of using loop

Co-authored-by: rpigneri-vol <78454056+rpigneri-vol@users.noreply.github.com>

* Make SERVER_URL safe and other review fixes

Co-authored-by: rpigneri-vol <78454056+rpigneri-vol@users.noreply.github.com>
Close #40

* add conditional for name with more than 4 parts
* create test for more than 4 part name warning
* fix typo in tests
* fix repeated code
* fix spacing and typo errors
* fix grammar in error
* fix grammar in testing
* log changes in CHANGELOG
* Address #392, use API to help with server functionality (#442)

Co-authored-by: plocket <52798256+plocket@users.noreply.github.com>
Co-authored-by: rpigneri-vol <78454056+rpigneri-vol@users.noreply.github.com>
Addresses #392, closes #438. [As part of this change, env_vars is no longer needed, so it's deleted, and checks for `session_vars` props needed to be moved around as this gets the ID in a different way.]

* Use api to get Playground ID instead of ID env var. #392.
* Update tests and fix code for session var validation
* Add clarifying comment about env.BRANCH_NAME

Co-authored-by: Bryce Willey <Bryce.Steven.Willey@gmail.com>

* Remove outdated comment
* Update lib/utils/session_vars.js

Co-authored-by: Bryce Willey <Bryce.Steven.Willey@gmail.com>
For errors caused by another package reloading the server and thus causing test failure: What if we do some constant async polling of the server and keep that status in |
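A very rough sketch of that idea, illustrative only; the shared-status shape, the plain root-URL ping, and the interval are all assumptions:

```js
// Sketch only: constantly poll the server in the background and keep the latest
// known status in a shared variable that test steps can consult. All names and
// timings here are made up for illustration.
const axios = require("axios");

const server_status = { responding: true, last_checked: null };

function start_polling(server_url, interval_ms = 10000) {
  const timer = setInterval(async () => {
    try {
      // Just checks whether the server responds at all.
      await axios.get(server_url, { timeout: 5000 });
      server_status.responding = true;
    } catch (error) {
      server_status.responding = false;
    }
    server_status.last_checked = new Date();
  }, interval_ms);
  return () => clearInterval(timer); // call this to stop polling when tests finish
}

module.exports = { server_status, start_polling };
```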
Consider removing this as an item for milestone "v4". I now think the goal of this issue doesn't match the milestone. The real breaking change for users was converting to using the API. This particular server reload issue can be icing on the cake that might be more possible now that we can use the API. |
Webhook idea moved to discussion #523 |
Our decision from #523:
|
Summary:
This is from brainstorms in

/* ===== Brainstorms =====
Store the promise in state and then await that promise in here
Who should retry? If this is a story table, retrying shouldn't happen in here
Otherwise, this in here should retry.
Except! If we reload the page when the dev is using individual steps to set
values, the page's values will be blank. There's no way to look back over the
previous steps on this page.
Would it be crazy to just restart the test?
Depends on the test, but probably not
Is it even possible to restart the test?
I don't know at all [later: yes, it is, with --retry n]
Raising a re-tryable error and then detect that in cucumber to somehow
trigger restarting the same scenario?
How about just failing the test and telling them why and letting them rerun manually?
That can be a useful temporary improvement, but I'd rather not teach devs to
ignore test failures. It's better than opaque failures, though.
TODO: Also have to do this when first loading an interview or going to a
url with a link check, though the latter isn't yet implemented, so we can
skip that for now.
MVP possibly: Add the server reload error message to the report, but also
add --retry 1 to the node script _and_ wait for the server to reload.
Sadly, --retry 1 will rerun actually failing tests too.
https://github.com/cucumber/cucumber-js/blob/main/docs/retry.md
==========================
*/ |
So. The result we get when the test fails because of a reload [and then passes on the second try] is that the report gets both a failed scenario and a passed scenario. In addition, the test passes, but the cucumber error still shows up. I see this as a problem because it can be confusing. Suppose two tests fail: one because of a server reload and one legitimately. A person might look at the report and think that both tests are failing legitimately and try to solve both problems. I do add a report message to the reload-failure Scenario, so maybe that's enough? I'm not sure. |
…efully (#570)

* Start tracking server status
* Creates untested way to wait for server to reload
* Confirm server reload check is triggering and working by implementing roughly in story table step. A bit of a mess as I need to keep confirming its working state as I go forward.
* Broken. Add waiting for server response in some places. See more... Currently, the story table page id loop rejection doesn't error. Waiting for server response also seems to not error on rejection. Need to throw errors instead? This is probably why the try/catch block wasn't working for triggering the check (earlier on). Also, puppeteer will time out before the page id story table loop times out, leaving the process unexited until the server waiting and page id loop finish. How do we exit those immediately when necessary? Or will it not be necessary when everything else is lined up?
* Add throw in a couple places, make functions async in some places
* Abstract throwing server reload error, add brainstorms
* Solve multiple promises being created, prep for custom value
* Clean up and add to comments
* Solve browser closing too early
* Handle reload error for initial page load, abstract throwing server reload errors. Initial page load handling is untested because that's hard to do locally. We may have to figure out some other way to test. Remote testing for this probably won't work either, as the timing would be impossible - we'd have to start running setup on both tests and hope that the server timeout didn't interfere with the other setup, just with the page load once the tests had started. Maybe make two local repos and set one up and then time the test and new setup so they overlap? I'm not sure.
* Clarify message about server reloading
* Change server tracker from recursive to `while` loop
* Discover and mend the immediate reload oversight, which was the "continue" Step using `wrapPromiseWithTimeout()` to ensure preventing an infinite loop. It was causing a timeout too early, independently from the reload timeout. The problem now is where else needs to deal with it, where to put the reload complexity, and how to make it clear where that's happening since it's so far removed from a lot of the action. Also, is it only a problem when elements are tapped, or other times as well? For example, when downloading documents. Maybe that counts as tapping an element.
* Simplify timeout catching
* Fix server sign in not erroring properly on server reload
* Handle tapping element when reloading. I think... Async is always hard to test. Basically, cleaned up where the page was being closed and where the browser was being created. Not sure if `sign_in` needs to create a browser now, or a new page, etc., but I think it's enough that the cleanup is handled consistently.
* Clean up old page closing code
* Pass tests for tapping var button and sign out at end of step
* Test sign out, continuing two ways, reload on both attempts, and reload completes during first test.
* Comment out local-only tests
* Remove unnecessary test
* Simplify (I hope) creation of clickWait
* Clean up some cosmetics found in review
* Fix wrong var name used

Co-authored-by: Bryce Willey <Bryce.Steven.Willey@gmail.com>

* Ensure full time is waited between server reload checks

Co-authored-by: Bryce Willey <Bryce.Steven.Willey@gmail.com>
Closed by #570 |
Problem: When someone on the server edits a module, pulls in a repo with a module, or does various other things, the server restarts. That takes a while and causes test timeout errors - false failures [for other tests - they can time out]. In fact, a running test can itself cause a server reload timeout error.
Solution: There is a way to ping the API to see if the server is reloading/busy. Use that to detect when retrying is more appropriate than failing. This needs to be done wherever the test is going to have to wait for a page to load.
This will bump us to v4. We could take extra care to avoid breaking v3, but the complication introduced by that doesn't seem worth it.
We expect this change to take a while because of all the associated tasks and limitations. For example, we currently don't know how to use the API at all, so this is going to involve learning that as well.
Things to implement before this will be ready for release:
Things to do after implementation:
I think, though, that folks using this framework all use organizations (it's probably just us) and we will only need to adjust the organization secrets to make this work, so we can do it all in one go. [The packages that have repo secrets that override the GitHub secrets may have a problem. There are some of those for suffolk that are using the dev server instead of the test server. That said, when we update their script versions, admins can also check their secrets.]
See #389 (comment), "Option 2".