Testing Improvements #602

Closed
brittainhard opened this issue Aug 3, 2017 · 9 comments

Comments

@brittainhard
Contributor

I think it would be good to have a discussion about testing strategies that go beyond unit tests and mocking functions. One goal for issue #543 was to add a mocked test, much of which we already have in the MockCompute object.

I think we should also talk about creating unit tests that run the actual services and test the output. These would only be run locally, since they would require the taxpuf file.

Issue #600 is one we can get done quickly, and the issue described in #601 also needs to be dealt with.

@martinholmer @MattHJensen @hdoupe @PeterDSteinberg if you have any suggestions I think this is a good place to discuss them.

@hdoupe
Collaborator

hdoupe commented Aug 4, 2017

I would like to create a test that feeds a reform (or a series of reforms) through the interface that alters all of the parameters and validates that the inputs are processed as expected. However, the process of inputting over a hundred parameters into taxbrain takes several hours and is painfully tedious.

I've been trying to think of a way to come as close as possible to doing this without spending several hours punching values into the interface. Here are some of my ideas:

  1. If you input values directly into the interface, then the page is saved at a URL like this: https://www.ospc.org/taxbrain/edit/1000/?start_year=2015. But if you feed a JSON file into taxbrain, then an input page is not created and saved. So, what if we could automate the loading of a JSON file into taxbrain, save the parameters, and then have a page similar to the one that is created when we input the parameters manually?

  2. The HTML source code is not available in the production app, but it is available in the test app. What if we used something like selenium to programmatically input the parameters into the taxbrain interface page?

  3. Pages like the one linked in (1) are saved in the taxbrain_taxsaveinputs table. What if we wrote a script that takes a reform as input, reads the parameters, and directly adds a row to taxbrain_taxsaveinputs? Then we would have a link similar to the one referenced in (1), and we could go to that link and run the parameters through the taxbrain model.

The idea is to figure out a way to simulate as closely as possible the act of feeding user inputs through taxbrain and checking the results without actually doing it. Of course, I'm open to other ideas and the three that I listed may not be very good. This is something that we could start thinking about.

The next step would be to create a keyword-to-dropq dictionary like the one created in the celery_tasks.dropq_task function. This dictionary is the expected output from feeding the reform through the interface. If the actual output and expected output are exactly the same (down to the type of the objects), then the dropq results would be identical. We could test this in the same way as my code snippet here.
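As a rough sketch of that comparison (the function below is hypothetical and just stands in for however we end up capturing the actual and expected keyword dictionaries):

    # Hypothetical sketch: compare the keyword dict actually handed to dropq
    # with the dict we expect from reading the JSON reform file directly.
    def assert_dropq_kwargs_match(actual, expected):
        assert set(actual.keys()) == set(expected.keys())
        for key, expected_value in expected.items():
            actual_value = actual[key]
            # Values must match exactly, down to the type of each object,
            # for the dropq results to be guaranteed identical.
            assert actual_value == expected_value, key
            assert type(actual_value) is type(expected_value), key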

@brittainhard @PeterDSteinberg @MattHJensen @martinholmer

@martinholmer
Contributor

@hdoupe said in TaxBrain issue #602:

I would like to create a test that feeds a reform (or a series of reforms) through the interface that alters all of the parameters and validates that the inputs are processed as expected. However, the process of inputting over a hundred parameters into taxbrain takes several hours and is painfully tedious.

I've been trying to think of a way to come as close as possible to doing this without spending several hours punching values into the interface. Here are some of my ideas:

  1. If you input values directly into the interface, then the page is saved at a URL like this: https://www.ospc.org/taxbrain/edit/1000/?start_year=2015. But if you feed a JSON file into taxbrain, then an input page is not created and saved. So, what if we could automate the loading of a JSON file into taxbrain, save the parameters, and then have a page similar to the one that is created when we input the parameters manually?

  2. The HTML source code is not available in the production app, but it is available in the test app. What if we used something like selenium to programmatically input the parameters into the taxbrain interface page?

  3. Pages like the one linked in (1) are saved in the taxbrain_taxsaveinputs table. What if we wrote a script that takes a reform as input, reads the parameters, and directly adds a row to taxbrain_taxsaveinputs? Then we would have a link similar to the one referenced in (1), and we could go to that link and run the parameters through the taxbrain model.

The idea is to figure out a way to simulate as closely as possible the act of feeding user inputs through taxbrain and checking the results without actually doing it. Of course, I'm open to other ideas and the three that I listed may not be very good. This is something that we could start thinking about.

The next step would be to create a keyword-to-dropq dictionary like the one created in the celery_tasks.dropq_task function. This dictionary is the expected output from feeding the reform through the interface. If the actual output and expected output are exactly the same (down to the type of the objects), then the dropq results would be identical. We could test this in the same way as my code snippet here.

Automated testing of the TaxBrain GUI interface is definitely an important goal. So, thank you for raising this issue for discussion.

I don't know much about the internal workings of TaxBrain, but what I do know suggests that for the automated testing to be comprehensive, it needs to replicate the TaxBrain user experience using the GUI interface. I don't know enough to say whether your approaches 1 and 3 would accomplish that replication. But clearly your approach 2 does replicate the TaxBrain user experience. So, the rest of my response to your questions focuses on your second approach:

What if we used something like selenium to programmatically input the parameters into the taxbrain interface page?

First, having the HTML source code is not essential for implementing this approach (but having it might make the implementation easier).

Second, we already have source code that implements your second approach. That source code was used extensively to test the TaxBrain GUI interface during late 2016, especially the reform-delay feature using the * character. It looks as if the last version of Tax-Calculator to include that selenium testing capability was the 0.7.3 release. The test code is located in the taxcalc/taxbrain directory of that release. That location was always meant to be temporary until TaxBrain built up a more comprehensive test suite.

One strategy you could consider is to download the source code for release 0.7.3 and then copy the contents of the taxcalc/taxbrain directory to a new directory in the webapp-public repository that is going to hold your new automated testing code. Then try it out without changing any code. It probably won't work out of the box, but trying to make it work will provide a relatively fast introduction to the pros and cons of your approach 2. This strategy is certainly faster than starting from scratch.
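For reference, the general shape of that selenium-based code is something like the following bare-bones sketch (not the actual 0.7.3 scripts; the field name and button id below are made-up placeholders):

    # Bare-bones selenium sketch of approach 2; "II_em_0" and "tax-submit"
    # are placeholder element names, not the real TaxBrain form fields.
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://localhost:8000/taxbrain/?start_year=2015")
    # Type a value into one parameter box and submit the form.
    driver.find_element_by_name("II_em_0").send_keys("5000")
    driver.find_element_by_id("tax-submit").click()
    # ...wait for the results page to load and scrape the output tables...
    driver.quit()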

If you have any questions about the old taxcalc/taxbrain code, I'll be happy to respond.

@hdoupe
Collaborator

hdoupe commented Aug 4, 2017

@martinholmer said

I don't know much about the internal workings of TaxBrain, but what I do know suggests that for the automated testing to be comprehensive, it needs to replicate the TaxBrain user experience using the GUI interface. I don't know enough to say whether your approaches 1 and 3 would accomplish that replication.

I had the same thoughts on this.

Second, we already have source code that implements your second approach. That source code was used extensively to test the TaxBrain GUI interface during late 2016, especially the reform-delay feature using the * character. It looks as if the last version of Tax-Calculator to include that selenium testing capability was the 0.7.3 release. The test code is located in the taxcalc/taxbrain directory of that release. That location was always meant to be temporary until TaxBrain built up a more comprehensive test suite.

One strategy you could consider is to download the source code for release 0.7.3 and then copy the contents of the taxcalc/taxbrain directory to a new directory in the webapp-public repository that is going to hold your new automated testing code. Then try it out without changing any code. It probably won't work out of the box, but trying to make it work will provide a relatively fast introduction to the pros and cons of your approach 2. This strategy is certainly faster than starting from scratch.

That's great. Thanks for pointing this out. I'll look into this approach. And, yes, this will be much better than reinventing the wheel.

Thank you for your feedback.

@talumbau
Member

talumbau commented Aug 4, 2017

@hdoupe said:

The idea is to figure out a way to simulate as closely as possible the act of feeding user inputs through taxbrain and checking the results without actually doing it. Of course, I'm open to other ideas and the three that I listed may not be very good. This is something that we could start thinking about.

I see where you are going and this is a good idea. There is a "tool" that is missing from your proverbial toolbox that will help you achieve this goal. First I will explain what happens when someone does a TaxBrain submission:

The user goes to /taxbrain and the TaxBrain screen loads. They are given a "session ID" by the site and this is kept in the browser session as long as the tab stays open. The user fills out all desired boxes and hits the submit button. This generates a specific type of message from the user's browser sent to ospc.org using the HTTP protocol. The message is called a "POST". This HTTP POST contains all of the data that the user filled out in the forms, along with the session ID. The website is designed to run specific code when it receives such a POST message. This code does all of the work of storing the user's input as a row in the taxbrain_taxsaveinputs table (a Model "instance" in Django terms), submitting the job to the worker nodes, waiting for the results, etc.

All of this can, of course, be done programmatically via a script instead of typing the data in the boxes displayed in the browser. In other words, one can write Python code (or code in many other languages) that loads the /taxbrain page, but instead of displaying it in a browser, just keeps the data in a data structure. Then, one can construct an HTTP POST message with the desired parameter values (really just key-value pairs where the keys are the parameter names and values are what the user would type in the boxes) and the session ID, and then submit this POST to the site.

You can find out more about how all of this works by searching online for something like "HTTP introduction POST form submission".

A good Python package that is useful for these purposes is called requests. You can get it with conda install requests.
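To make that concrete, here is a rough sketch with requests (the parameter field and CSRF handling are illustrative only, not the exact form schema TaxBrain expects):

    # Rough sketch of submitting TaxBrain inputs with requests; "II_em" is a
    # placeholder field name and the CSRF handling is illustrative.
    import requests

    base = "http://localhost:8000/taxbrain/"
    session = requests.Session()
    # GET the page first so the session picks up cookies, including the CSRF token.
    session.get(base + "?start_year=2015")
    data = {
        "csrfmiddlewaretoken": session.cookies.get("csrftoken", ""),
        "start_year": "2015",
        "has_errors": "False",
        "II_em": "5000",
    }
    response = session.post(base, data=data,
                            headers={"Referer": base},
                            allow_redirects=False)
    # A 302 redirect to something like /taxbrain/<id>/ means the inputs were accepted.
    print(response.status_code, response.headers.get("Location"))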

@martinholmer
Contributor

martinholmer commented Aug 4, 2017

@talumbau explained in response to @hdoupe's questions on approaches to TaxBrain testing:

I see where you are going and this is a good idea. There is a "tool" that is missing from your proverbial toolbox that will help you achieve this goal. First I will explain what happens when someone does a TaxBrain submission:

[very useful description]

All of this can, of course, be done programmatically via a script instead of typing the data in the boxes displayed in the browser. In other words, one can write Python code (or code in many other languages) that loads the /taxbrain page, but instead of displaying it in a browser, just keeps the data in a data structure. Then, one can construct an HTTP POST message with the desired parameter values (really just key-value pairs where the keys are the parameter names and values are what the user would type in the boxes) and the session ID, and then submit this POST to the site.

You can find out more about how all of this works by searching online for something like "HTTP introduction POST form submission".

A good Python package that is useful for these purposes is called requests. You can get it with conda install requests.

@hdoupe, I think the approach being suggested here by @talumbau is more promising than the selenium approach (because selenium can be touchy when working with a webpage). Just sending a POST form submission to the TaxBrain server is much easier and more reliable. However, it does seem to leave untested the first steps of submitting a job to the TaxBrain server. I don't know enough about what happens between the time a TaxBrain user clicks on the Show Me the Results button and the time the POST request arrives at the TaxBrain server.

@hdoupe
Collaborator

hdoupe commented Aug 7, 2017

@talumbau said

The user goes to /taxbrain and the TaxBrain screen loads. They are given a "session ID" by the site and this is kept in the browser session as long as the tab stays open. The user fills out all desired boxes and hits the submit button. This generates a specific type of message from the user's browser sent to ospc.org using the HTTP protocol. The message is called a "POST". This HTTP POST contains all of the data that the user filled out in the forms, along with the session ID. The website is designed to run specific code when it receives such a POST message. This code does all of the work of storing the user's input as a row in the taxbrain_taxsaveinputs table (a Model "instance" in Django terms), submitting the job to the worker nodes, waiting for the results, etc.

Thank you for your response. I'm new to web development and have been operating under the assumption that some black-box magic occurred between submitting the form and the data being received by the functions in taxbrain/views.py. Your response clarifies what is actually going on here.

@martinholmer said

@hdoupe, I think the approach being suggested here by @talumbau is more promising than the selenium approach (because selenium can be touchy when working with a webpage). Just sending a POST form submission to the TaxBrain server is much easier and more reliable.

I agree with this.

However, it does seem to leave untested the first steps of submitting a job to the TaxBrain server. I don't know enough about what happens between the time a TaxBrain user clicks on the Show Me the Results button and the time the POST request arrives at the TaxBrain server.

I also do not know how much we would be leaving untested, and I will look into this. @talumbau @brittainhard @PeterDSteinberg, we would appreciate your thoughts on this.

@brittainhard
Contributor Author

@martinholmer @hdoupe Django is really great in that it provides a Client object that can handle GET/POST requests and database object creation. Documentation is here: https://docs.djangoproject.com/en/1.11/topics/testing/

When you run the test suite, it usually creates a test database, which is then destroyed after the tests are done. You also get a mocked Client object that will handle any requests, so that you can test responses in isolation. Here is one such example:

    # (requires: from django.core.files.uploadedfile import SimpleUploadedFile)
    def test_taxbrain_file_post_reform_and_assumptions(self):
        # Wrap the reform and assumptions JSON text (module-level fixtures)
        # as in-memory uploaded files.
        tc_file = SimpleUploadedFile("test_reform.json", reform_text)
        tc_file2 = SimpleUploadedFile("test_assumptions.json", assumptions_text)
        data = {u'docfile': tc_file,
                u'assumpfile': tc_file2,
                u'has_errors': [u'False'],
                u'start_year': unicode(START_YEAR),
                'csrfmiddlewaretoken': 'abc123'}

        # POST the files to the file-upload endpoint via the Django test client.
        response = self.client.post('/taxbrain/file/', data)
        # Check that a redirect happens
        self.assertEqual(response.status_code, 302)
        # Go to the results page
        link_idx = response.url[:-1].rfind('/')
        self.assertTrue(response.url[:link_idx + 1].endswith("taxbrain/"))

The Django test client is also a sort of black box. In theory this should submit a dropq request, but it doesn't. It returns a mocked response object when your POST data does not cause any errors. I haven't investigated this test to see whether it also created the TaxSaveInputs object, but I suspect it didn't.
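One way to check that last point would be to add an assertion like this at the end of the test (the import path is a guess at the app layout, so treat it as a sketch):

    # Sketch: verify the POST created a row in taxbrain_taxsaveinputs.
    # The import path below is an assumption about the app layout.
    from webapp.apps.taxbrain.models import TaxSaveInputs

    # ...at the end of the test method above:
    self.assertEqual(TaxSaveInputs.objects.count(), 1)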

@hdoupe
Collaborator

hdoupe commented Aug 7, 2017

@brittainhard Thanks, I'll look into this. I was able to use the requests library that @talumbau mentioned to submit a form on my local machine, but I haven't figured out how to get the results. I'll open another issue about this so that I don't clutter up this feed while I'm figuring out how all of this POST and GET stuff works.
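In the meantime, here is a rough polling sketch for the results step (it assumes the results page returns a non-200 status, or a placeholder, until the dropq job finishes, which I still need to confirm):

    # Rough sketch: poll the results URL until the page is ready or we time out.
    # Assumes a non-200 response while the dropq job is still running.
    import time

    def wait_for_results(session, results_url, timeout=600, interval=5):
        deadline = time.time() + timeout
        while time.time() < deadline:
            resp = session.get(results_url)
            if resp.status_code == 200:
                return resp
            time.sleep(interval)
        raise RuntimeError("timed out waiting for results at %s" % results_url)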

@martinholmer
Contributor

It would seem that issue #602, which asked for help in testing the TaxBrain GUI input logic, has generated much activity by @hdoupe. Thanks for all the work on this, Hank. Several bugs have been identified in issues, and unit tests have been added to document each of those bugs. Also, an initial version of a systematic testing framework has been developed in pull request #627 and is being reviewed by Continuum Analytics staff.

So, it would seem as if #602 has been very effective at generating a discussion of how best to approach systematic TaxBrain input testing. And I assume that discussion will continue in comments on pull request #627.
