Refactor commands to actually follow the Command pattern #743
Comments
This issue is pretty well documented and I'm willing to mentor anybody that shows interest in working on this. |
This sounds great to me. Allowing us to replace custom functions with commands that follow the same pattern is also a huge win. |
Hi, I am interested in resolving this issue as my first bug. Where/how should I begin? |
@cyruskarsan Thanks for showing interest in this issue.
Follow the code paths down from the second bit of code to see everything that you'll need to change. I'd suggest not concerning yourself too much with how the CommandSequence gets from here |
@vringar So I took a look at To view the SQLite data, is there a certain software you use? Should I be worried about that? From my understanding, you would like to make it easier for researchers to create new commands and you have asked to replicate the functionality of To clarify, this code you wrote could be implemented to replace the
import time

from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# BaseCommand, clientsocket, tab_restart_browser, close_other_windows and
# bot_mitigation are assumed to come from the project's own modules.

class GetCommand(BaseCommand):
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep

    def __repr__(self):
        return "GetCommand({},{})".format(self.url, self.sleep)

    def execute(self, webdriver, browser_settings, browser_params, manager_params, extension_socket: clientsocket):
        """
        goes to <url> using the given <webdriver> instance
        """
        tab_restart_browser(webdriver)
        if extension_socket is not None:
            # visit_id is assumed to be set on the command before execution
            extension_socket.send(self.visit_id)
        # Execute a get through selenium
        try:
            webdriver.get(self.url)
        except TimeoutException:
            pass
        # Sleep after get returns
        time.sleep(self.sleep)
        # Close modal dialog if exists
        try:
            WebDriverWait(webdriver, .5).until(EC.alert_is_present())
            alert = webdriver.switch_to.alert
            alert.dismiss()
            time.sleep(1)
        except (TimeoutException, WebDriverException):
            pass
        close_other_windows(webdriver)
        if browser_params['bot_mitigation']:
            bot_mitigation(webdriver) |
@cyruskarsan Conceptually correct.
The Open Web Privacy Measurements project aims to enable researchers to collect data on website behaviour, so that privacy researchers can focus on their research and not waste time building yet another crawler. It does so by installing a Firefox extension that can be configured to collect a variety of data points and stream them back to the platform (aka the Python code). The platform also allows you to run multiple FF instances in parallel and supports different storage backends.
If you can run the code natively then I'd encourage you to do so. We mostly use Docker for cloud crawls, where we deploy on a GCP Kubernetes cluster.
You can use any SQL client you like for interacting with the database. I use DBeaver, but mostly out of habit. I don't see a reason to be concerned about a database. (I'm sorry if I'm misunderstanding your question here.)
My intention was to consolidate all the separate pieces of an implementation (the type, the construction point and the behaviour) into a single place instead of having them spread throughout the entire code base, but the fact that this eases the development of new commands for researchers is a very welcome benefit.
To illustrate how disjointed our current architecture is, I wrote up all the places that are relevant to a single command. The functionality of the The current flow of commands is something like this:
So to add a single parameter to a
And after my proposed changes you should only have to modify |
Would you be more comfortable if I made the required changes to BrowserManager.py and CommandSequence.py and converted GetCommand myself, so that you could then replicate those changes for all other commands? |
@vringar Thanks for your succinct explanation. I think I understand the issue now. In its current implementation, the execution of tasks (such as the GetCommand) is disjointed and fragmented across different files. My task is to consolidate the implementation of these commands into the To begin, I will take you up on your offer to do the first changes to GetCommand. With regard to my SQL question, I was just asking to determine whether the program was actually running correctly locally or whether I was just seeing terminal output. I will look into DBeaver. Thank you |
Hey @cyruskarsan, |
@cyruskarsan I've finally brought the branch into a state where all tests are passing. |
@vringar I made an attempt at refactoring How can I push my changes to the branch? |
@cyruskarsan You don't have push access to this repository, so you'll have to create a Pull Request (PR) instead.
Once you have made further changes you'll only need to repeat the steps 3-5. |
@vringar Does |
This method seems to only be called from |
Copying @cyruskarsan's question from #770 here so it doesn't get lost:
|
@cyruskarsan I've just added |
@vringar I've never created pytests before but I would be interested in learning. I removed the |
@cyruskarsan Did you run |
@vringar You are right, it was there. I just didn't see it :) |
The command pattern describes how to encapsulate behaviour, together with the state that behaviour needs, in a class that is opaque to the caller.
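As a rough illustration of that idea, here is a minimal sketch of the pattern on its own. The names VisitCommand, NoteCommand, run_all and the dict-based context are made up for this example and are not OpenWPM code; the point is only that the caller invokes execute() without ever checking which command it holds.

```python
from abc import ABC, abstractmethod


class BaseCommand(ABC):
    """Hypothetical base class: a command bundles its state with its behaviour."""

    @abstractmethod
    def execute(self, context: dict) -> None:
        """Run the command against a shared context (a stand-in for the webdriver)."""


class VisitCommand(BaseCommand):
    def __init__(self, url: str) -> None:
        self.url = url  # state needed by execute()

    def execute(self, context: dict) -> None:
        context["visited"].append(self.url)


class NoteCommand(BaseCommand):
    def __init__(self, text: str) -> None:
        self.text = text

    def execute(self, context: dict) -> None:
        context["notes"].append(self.text)


def run_all(commands, context):
    # The caller never inspects the command type; it only calls execute().
    for command in commands:
        command.execute(context)
```

Adding a third kind of command would then not require touching run_all at all.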
My suggestion is as follows:
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/BrowserManager.py#L502-L504
gets replaced by
This is enabled by removing the split between declaration of actions and their implementation.
For example the GetCommand turns into:
We could then also shrink the CommandSequence by removing
https://github.com/mozilla/OpenWPM/blob/1ad7d5a41f6c0e87173bcfa8838e894d373b5b48/automation/CommandSequence.py#L64-L69
and replacing it with a generalized add_command(self, command).
We'd also add a validate method, to check that we contain a get or browse before every other command.
This method would be called when a CommandSequence gets submitted to the TaskManager, where we would also add the InitializeCommand and the FinalizeCommand.
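A minimal sketch of what a generalized add_command plus validate could look like follows. It is not the project's implementation: the command classes here are empty stand-ins, SaveScreenshotCommand is used only as an example of "any other command", and validate encodes one possible reading of the rule, namely that the first command in the sequence must be a get or a browse.

```python
class BaseCommand:
    """Stand-in for the proposed base class (hypothetical for this sketch)."""


class GetCommand(BaseCommand):
    def __init__(self, url, sleep):
        self.url = url
        self.sleep = sleep


class BrowseCommand(BaseCommand):
    def __init__(self, url, num_links, sleep):
        self.url = url
        self.num_links = num_links
        self.sleep = sleep


class SaveScreenshotCommand(BaseCommand):
    def __init__(self, suffix):
        self.suffix = suffix


class CommandSequence:
    def __init__(self, url):
        self.url = url
        self.commands = []

    def add_command(self, command):
        # One generic entry point instead of one method per command type.
        if not isinstance(command, BaseCommand):
            raise TypeError("expected a BaseCommand, got {!r}".format(command))
        self.commands.append(command)

    def validate(self):
        # Assumed rule: the sequence must start with a get or a browse.
        if not self.commands:
            raise ValueError("CommandSequence is empty")
        if not isinstance(self.commands[0], (GetCommand, BrowseCommand)):
            raise ValueError("a get or browse must come before every other command")
```

Under this reading, the TaskManager would call validate() once on submission, before wrapping the sequence with the initialize and finalize commands.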
This would simplify adding a new command to:
- Creating a class MyCommand that derives from BaseCommand.
- Implementing execute() (and maybe __repr__ and eventually name()).
- Calling command_sequence.add_command(MyCommand()).
And we could remove the RunCustomFunctionCommand, as it would be easier to just tell the user to create a new class instead.
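To make that concrete, here is a sketch of logic a user might otherwise pass in as a custom function, written as a command class instead. ScrollToCommand is a hypothetical name, the BaseCommand shown is a stand-in, and the execute signature is only assumed from the discussion above; execute_script is the regular Selenium WebDriver method.

```python
class BaseCommand:
    """Stand-in for the proposed base class (hypothetical for this sketch)."""

    def execute(self, webdriver, browser_params, manager_params, extension_socket):
        raise NotImplementedError


class ScrollToCommand(BaseCommand):
    """Hypothetical user-defined command: scroll the page to a vertical offset."""

    def __init__(self, y):
        self.y = y  # all state the command needs lives on the instance

    def __repr__(self):
        return "ScrollToCommand({})".format(self.y)

    def execute(self, webdriver, browser_params, manager_params, extension_socket):
        # Pass the offset as a script argument rather than formatting it into the JS.
        webdriver.execute_script("window.scrollTo(0, arguments[0]);", self.y)
```

With add_command in place as proposed, using it would come down to command_sequence.add_command(ScrollToCommand(500)).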