Refactor commands to actually follow the Command pattern #743

vringar · 2020-09-11T12:55:58Z

The command pattern describes how to encapsulate behaviour, and the state that behaviour needs, in a class that is opaque to the caller.
My suggestion is as follows:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/BrowserManager.py#L502-L504
gets replaced by

 command.execute(driver, browser_settings, browser_params, manager_params, extension_socket)

This is enabled by removing the split between declaration of actions and their implementation.
For example the GetCommand turns into:

import time

from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class GetCommand(BaseCommand):
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep

    def __repr__(self):
        return "GetCommand({},{})".format(self.url, self.sleep)

    def execute(self, webdriver, browser_settings, browser_params, manager_params, extension_socket: clientsocket):
        """
        goes to <url> using the given <webdriver> instance
        """

        tab_restart_browser(webdriver)

        if extension_socket is not None:
            # visit_id is not set in __init__; it has to be assigned before execute() runs
            extension_socket.send(self.visit_id)

        # Execute a get through selenium
        try:
            webdriver.get(self.url)
        except TimeoutException:
            pass

        # Sleep after get returns
        time.sleep(self.sleep)

        # Close modal dialog if exists
        try:
            WebDriverWait(webdriver, .5).until(EC.alert_is_present())
            alert = webdriver.switch_to.alert
            alert.dismiss()
            time.sleep(1)
        except (TimeoutException, WebDriverException):
            pass

        close_other_windows(webdriver)

        if browser_params['bot_mitigation']:
            bot_mitigation(webdriver)
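
The issue does not spell out what BaseCommand looks like. A minimal sketch of one possible shape, assuming the execute() signature from the proposal above. FakeDriver and SimpleGetCommand are hypothetical names that only exist to make the example runnable without Selenium:

```python
from abc import ABC, abstractmethod


class BaseCommand(ABC):
    """Hypothetical base class: state lives on the instance, behaviour in execute()."""

    @abstractmethod
    def execute(self, webdriver, browser_settings, browser_params,
                manager_params, extension_socket):
        """Run the command against the given browser."""


class FakeDriver:
    """Stand-in for a Selenium webdriver; records the URLs it is asked to load."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


class SimpleGetCommand(BaseCommand):
    """Stripped-down GetCommand: only the page load, no sleeping or cleanup."""

    def __init__(self, url):
        self.url = url

    def __repr__(self):
        return "SimpleGetCommand({})".format(self.url)

    def execute(self, webdriver, browser_settings, browser_params,
                manager_params, extension_socket):
        webdriver.get(self.url)


driver = FakeDriver()
command = SimpleGetCommand("https://example.com")
# The caller needs no knowledge of the command's type or fields.
command.execute(driver, None, {}, {}, None)
```

Making execute() abstract means a subclass that forgets to implement it fails at construction time rather than when the browser process first tries to run it.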

We could then also shrink the CommandSequence by removing
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/CommandSequence.py#L64-L69
and replacing it with a generalized add_command(self, command).
We'd also add a validate method to check that the sequence contains a get or browse before any other command.
This method would be called when a CommandSequence gets submitted to the TaskManager, where we would also add the InitializeCommand and the FinalizeCommand.
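
A rough sketch of how add_command and validate could fit together. It reads "a get or browse before every other command" as "the first command must be a get or a browse", which is an interpretation, and the command classes here are bare stand-ins rather than the real OpenWPM ones:

```python
class BaseCommand:
    """Hypothetical minimal base class for this sketch."""


class GetCommand(BaseCommand):
    def __init__(self, url, sleep=0):
        self.url = url
        self.sleep = sleep


class BrowseCommand(BaseCommand):
    def __init__(self, url, num_links=0):
        self.url = url
        self.num_links = num_links


class SaveScreenshotCommand(BaseCommand):
    def __init__(self, suffix=""):
        self.suffix = suffix


class CommandSequence:
    def __init__(self, url):
        self.url = url
        self.commands = []

    def add_command(self, command):
        # One generic entry point instead of one method per command type.
        if not isinstance(command, BaseCommand):
            raise TypeError("expected a BaseCommand, got {!r}".format(command))
        self.commands.append(command)

    def validate(self):
        """Raise ValueError unless the sequence starts with a get or browse."""
        if not self.commands:
            raise ValueError("CommandSequence is empty")
        if not isinstance(self.commands[0], (GetCommand, BrowseCommand)):
            raise ValueError("the first command must be a get or a browse")


good = CommandSequence("https://example.com")
good.add_command(GetCommand("https://example.com", sleep=3))
good.add_command(SaveScreenshotCommand())
good.validate()  # passes silently

bad = CommandSequence("https://example.com")
bad.add_command(SaveScreenshotCommand())
try:
    bad.validate()
    bad_rejected = False
except ValueError:
    bad_rejected = True
```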

This would simplify adding a new command to:

1. Create a class MyCommand that derives from BaseCommand.
2. Implement execute() (and maybe __repr__ and eventually name()).
3. Add it with command_sequence.add_command(MyCommand()).

And we could remove the RunCustomFunctionCommand as it would be easier to just tell the user to create a new class instead.
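
To illustrate what "create a new class instead" might look like for something that is a custom function today, here is a small sketch. RecordTitleCommand, FakeDriver and the minimal BaseCommand are all hypothetical; only the execute() signature comes from the proposal:

```python
class BaseCommand:
    """Hypothetical minimal base class; the real one would live in OpenWPM."""

    def execute(self, webdriver, browser_settings, browser_params,
                manager_params, extension_socket):
        raise NotImplementedError


class RecordTitleCommand(BaseCommand):
    """Hypothetical user-defined command: stores the current page title."""

    def __init__(self, results):
        # State that a custom function would have received as extra arguments.
        self.results = results

    def __repr__(self):
        return "RecordTitleCommand()"

    def execute(self, webdriver, browser_settings, browser_params,
                manager_params, extension_socket):
        self.results.append(webdriver.title)


class FakeDriver:
    """Stand-in for a Selenium webdriver with a fixed page title."""

    title = "Example Domain"


titles = []
RecordTitleCommand(titles).execute(FakeDriver(), None, {}, {}, None)
```

Because commands are executed in the browser process, a real command would have to write its results to storage rather than to an in-memory list like the one used here.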


vringar · 2020-09-11T14:07:24Z

This issue is pretty well documented and I'm willing to mentor anybody that shows interest in working on this.

englehardt · 2020-09-11T21:10:33Z

This sounds great to me. So GetCommand would be fully defined in browser_commands.py? Perhaps we can do some renaming of the Commands directory files, such that a caller could do something like from commands.browser import GetCommand, and similarly for the current profile commands.

Allowing us to replace custom functions with commands that follow the same pattern is also a huge win.

cyruskarsan · 2020-09-13T05:27:50Z

Hi, I am interested in resolving this issue as my first bug. Where/how should I begin?

vringar · 2020-09-13T10:00:05Z

@cyruskarsan Thanks for showing interest in this issue.
This is a pretty major rewrite, so please expect that this will take some time.
Assuming you've never worked with OpenWPM before, I'd suggest first just running demo.py and then having a look at the following two pieces of code.

Everything related to creating and storing Commands happens in the CommandSequence:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/CommandSequence.py#L64-L69
And the execution of the Commands begins here:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/BrowserManager.py#L502-L504

Follow the code paths down from the second bit of code to see everything that you'll need to change.

I'd suggest not concerning yourself too much with how the CommandSequence gets from here
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/TaskManager.py#L572
to the BrowserManager because that path is quite convoluted and doesn't need to be changed as part of this issue.

cyruskarsan · 2020-09-13T17:04:43Z

@vringar So I took a look at demo.py and got a high level understanding of what you guys are doing. From a high level, it appears that you have modified firefox to log a customized set of variables in an effort to search for leaked information/bad practices and run multiple browsers in parallel. I ran demo.py locally using python3 on Ubuntu 20.04. Do I need to run it using docker or is docker for people not using Ubuntu?

To view the sqlite data, is there a certain software you use? Should I be worried about that?

From my understanding, you would like to make it easier for researchers to create new commands, and you have asked to replicate the functionality of get in command_sequence.py in browser_manager.py by creating a getCommand class?

To clarify, this code you wrote could be implemented to replace the get command, correct?

class GetCommand(BaseCommand):
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep

    def __repr__(self):
        return "GetCommand({},{})".format(self.url, self.sleep)

    def execute(self, webdriver, browser_settings, browser_params, manager_params, extension_socket: clientsocket):
        """
        goes to <url> using the given <webdriver> instance
        """

        tab_restart_browser(webdriver)

        if extension_socket is not None:
            extension_socket.send(self.visit_id)

        # Execute a get through selenium
        try:
            webdriver.get(self.url)
        except TimeoutException:
            pass

        # Sleep after get returns
        time.sleep(self.sleep)

        # Close modal dialog if exists
        try:
            WebDriverWait(webdriver, .5).until(EC.alert_is_present())
            alert = webdriver.switch_to.alert
            alert.dismiss()
            time.sleep(1)
        except (TimeoutException, WebDriverException):
            pass

        close_other_windows(webdriver)

        if browser_params['bot_mitigation']:
            bot_mitigation(webdriver)

vringar · 2020-09-14T12:16:46Z

@cyruskarsan Conceptually correct. The Open Web Privacy Measurements project aims to enable researchers to collect data on website behaviour, so that privacy researchers can focus on their research and not waste time building yet another crawler. It does so by installing an extension in Firefox that can be configured to collect a variety of data points and stream them back to the platform (aka the Python code). The platform also allows you to run multiple FF instances in parallel and supports different storage backends.

If you can run the code natively then I'd encourage you to do so. We mostly use Docker for cloud crawls, where we deploy on a GCP Kubernetes cluster.

You can use any SQL client you like for interacting with the Database. I use DBeaver, but mostly out of habit. I don't see a reason to be concerned about a database. (I'm sorry if I'm misunderstanding your question here)

My intention was to consolidate all the separate pieces of implementations (the type, the construction point and the behaviour) into a single place and not have them be spread out through the entire code base, but the fact that this eases the development of new commands for researchers is a very welcome benefit.

To illustrate my point on how disjointed our current architecture is, I wrote up all the places that are relevant to a singular command.

The functionality of the GetCommand (which already exists as a type) is currently implemented here:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/Commands/browser_commands.py#L108-L140

The current flow of commands is something like this:

1. The GetCommand gets put into a CommandSequence by calling .get() on it.
2. The CommandSequence gets submitted to the TaskManager through execute_command_sequence, which then eventually iterates over the CommandSequence here:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/TaskManager.py#L435-L443
where the commands get put into the BrowserManager.command_queue.
3. The command_queue gets read in another process by the BrowserManager
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/BrowserManager.py#L502-L504
where we then reach the command_executor, where the GetCommand object is deconstructed and its parts are passed as the parameters of the get_website function
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/Commands/command_executor.py#L15-L19

So to add a single parameter to a GetCommand you'd have to touch four files:

CommandSequence.py
command_executor.py
Types.py
browser_commands.py

And after my proposed changes you should only have to modify browser_commands.py, whereas Types.py and command_executor.py shouldn't exist anymore.
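
The difference between the two layouts can be sketched side by side. This is a simplified illustration, not the actual OpenWPM code: the Old/New class names and the `log` list (which stands in for the browser) are made up, and only get_website is a name taken from the description above:

```python
# Current layout, simplified: the type only carries data ...
class OldGetCommand:
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep


# ... the behaviour lives in a free function in another file ...
def get_website(url, sleep, log):
    log.append("get {} (sleep {})".format(url, sleep))


# ... and an executor unpacks one into the other, one branch per command type.
def execute_command(command, log):
    if isinstance(command, OldGetCommand):
        get_website(url=command.url, sleep=command.sleep, log=log)
    else:
        raise ValueError("unknown command: {!r}".format(command))


# Proposed layout: data and behaviour sit in one class, so a new parameter
# only touches this class.
class NewGetCommand:
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep

    def execute(self, log):
        log.append("get {} (sleep {})".format(self.url, self.sleep))


old_log, new_log = [], []
execute_command(OldGetCommand("https://example.com", 5), old_log)
NewGetCommand("https://example.com", 5).execute(new_log)
```

Both paths produce the same effect; the second one simply has no executor branch and no separate type definition to keep in sync.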

vringar · 2020-09-14T12:19:12Z

Would you be more comfortable if I made the required changes to BrowserManager.py and CommandSequence.py and converted GetCommand, and you then replicated those changes for all other commands?

cyruskarsan · 2020-09-14T18:06:26Z

@vringar Thanks for your succinct explanation. I think I understand the issue now. In its current implementation, the execution of tasks (such as the GetCommand) is disjoint and fragmented across different files. My task is to consolidate the implementation of these commands into the browser_commands.py file.

To begin, I will take you up on your offer to do the first changes to GetCommand.

With regard to my SQL question, I was just asking to determine if the program was actually running correctly locally or if I was just seeing terminal output. I will look into DBeaver. Thank you

vringar · 2020-09-17T09:07:04Z

Hey @cyruskarsan,
I've created the branch CommandRefactoring, feel free to create multiple PRs against this branch. I'll merge anything that passes our current tests.
I haven't changed the CommandSequence yet, but I might get on that eventually.

vringar · 2020-09-30T15:28:33Z

@cyruskarsan I've finally brought the branch into a state where all tests are passing.
Does this example help you or do you need any other assistance?

cyruskarsan · 2020-10-04T18:31:54Z

@vringar I made an attempt at refactoring save_screenshot but when I attempt to push to the branch, git gives me this error:
remote: Permission to mozilla/OpenWPM.git denied to cyruskarsan.
fatal: unable to access 'https://github.com/mozilla/OpenWPM.git/': The requested URL returned error: 403

How can I push my changes to the branch?

vringar · 2020-10-04T19:42:20Z

@cyruskarsan You don't have push access to this repository, so you'll have to create a Pull Request (PR) instead.
What you'll need to do is:

1. Fork this repository to have your own copy of it on GitHub. You can do this by using the fork button in the top right corner of the screen.
2. Add this fork as a remote to your local repository. For that you'll need the URL of your fork, so go to the green download code button in your fork and copy the link.
Then add it as a remote by typing
git remote add private <URL goes here>
3. Now push to your private fork by writing
git push private CommandRefactoring
4. You should now see your code in your private copy in the branch CommandRefactoring.
5. You can now create a PR from your branch CommandRefactoring to the Mozilla branch CommandRefactoring and then I can accept and merge your changes.

Once you have made further changes you'll only need to repeat steps 3-5.

cyruskarsan · 2020-10-16T04:06:23Z

@vringar Does stitch_screenshot_parts(visit_id, browser_id, manager_params) need to be refactored?
I don't see it in command_executor.py nor types.py

vringar · 2020-10-16T08:54:59Z

This method seems to only be called from screenshot_full_page so I'd suggest making it a method on the ScreenshotFullPageCommand.
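
A sketch of what that move could look like. The arguments of stitch_screenshot_parts are taken from the question above; everything else (the constructor taking visit_id and browser_id, the "screenshot_path" key, the file naming) is an assumption for illustration, and the actual capturing and stitching is left out:

```python
import os


class ScreenshotFullPageCommand:
    """Sketch only: capturing and stitching the actual images is omitted."""

    def __init__(self, suffix, visit_id, browser_id):
        self.suffix = suffix
        self.visit_id = visit_id
        self.browser_id = browser_id

    def execute(self, webdriver, browser_settings, browser_params,
                manager_params, extension_socket):
        # ... capture the individual screenshot parts with webdriver here ...
        return self._stitch_screenshot_parts(manager_params)

    def _stitch_screenshot_parts(self, manager_params):
        # visit_id and browser_id come from the instance, so only
        # manager_params still has to be passed in.
        # Hypothetical naming; returns the path the stitched image would get.
        filename = "{}-{}-{}.png".format(
            self.visit_id, self.browser_id, self.suffix)
        return os.path.join(manager_params["screenshot_path"], filename)


command = ScreenshotFullPageCommand("full", visit_id=1, browser_id=2)
path = command.execute(None, None, {}, {"screenshot_path": "screenshots"}, None)
```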

vringar · 2020-10-29T19:59:31Z

Copying @cyruskarsan's question from #770 here so it doesn't get lost:

Are DumpProfCommand, RunCustomFunctionCommand, and ShutdownCommand the only commands left to refactor? If so, these didn't appear in browser_commands.py, so would I just add the execute function to each of these classes in browser_commands.py?

DumpProfCommand exists here and can be turned into a command there
We want to get rid of RunCustomFunctionCommand and instead create an append_command method on CommandSequence so that users can more easily write Commands on their own
ShutdownCommand currently doesn't have a function beyond being a signal here, so maybe we should rename it to ShutdownSignal and move it into the BrowserManager.py
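
The idea of a ShutdownSignal that is only a marker, next to commands that carry behaviour, could look roughly like this. It is a simplified sketch: a thread and queue.Queue stand in for the separate browser process and its command_queue, and PrintCommand and browser_loop are hypothetical names:

```python
import queue
import threading


class ShutdownSignal:
    """Marker object: carries no behaviour, only tells the loop to stop."""


class PrintCommand:
    """Hypothetical command used only for this sketch."""

    def __init__(self, message):
        self.message = message

    def execute(self, output):
        output.append(self.message)


def browser_loop(command_queue, output):
    # Simplified stand-in for the loop that reads the command_queue.
    while True:
        command = command_queue.get()
        if isinstance(command, ShutdownSignal):
            break
        command.execute(output)


command_queue = queue.Queue()
output = []
worker = threading.Thread(target=browser_loop, args=(command_queue, output))
worker.start()

command_queue.put(PrintCommand("first"))
command_queue.put(PrintCommand("second"))
command_queue.put(ShutdownSignal())
worker.join()
```

Keeping the signal out of the command hierarchy means the loop has exactly one special case, and everything else on the queue is handled through execute().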

vringar · 2020-10-29T20:16:42Z

@cyruskarsan I've just added append_command myself. So consider running git pull origin CommandRefactoring before continuing your work.
Once you have removed RunCustomFunctionCommand some tests will need updating. Do you want to do that or do you want to leave that to me?

cyruskarsan · 2020-10-31T16:13:00Z

@vringar I've never created pytests before but I would be interested in learning.

I removed the RunCustomFunctionCommand from the code and noticed that append_command is not in Command_Sequence.py. Does it need to be there like the other commands or no?

vringar · 2020-10-31T16:23:09Z

@cyruskarsan Did you run git pull origin CommandRefactoring?
Then you should hopefully see append_command

cyruskarsan · 2020-10-31T17:54:33Z

@vringar You are right, it was there. I just didn't see it :)

vringar added the labels discussion, enhancement (Not a bug or a feature request) and good-first-bug (Bugs that are good for a first-time committer to tackle) on Sep 11, 2020

vringar mentioned this issue on Sep 17, 2020 in Command refactoring #750 (Merged)

vringar removed the labels good-first-bug and discussion on Sep 28, 2020

vringar assigned cyruskarsan on Sep 28, 2020

Metropass mentioned this issue on Sep 30, 2020 in Fixes #233, removed built-in extensions #754 (Merged)

vringar closed this as completed in #750 on Jan 9, 2021

vringar mentioned this issue on Dec 16, 2021 in Firefox installation fails for older versions #964 (Open)
