-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pacer): Link pacer sessions to proxies #4192
Conversation
6cebd2b
to
4579385
Compare
This commit updates the `ProxyPacerSession` class to enable selection of a proxy connection string from the application settings.
This commit updates the `log_into_pacer` method to return a tuple containing the user's cookie and the proxy address used for login (if applicable). This improvement provides more context about the login session, facilitating further actions requiring both cookies and potential proxy information.
This commit tweaks the `get_or_cache_pacer_cookies` function to handle the new return format of `log_into_pacer`. It ensures compatibility with both the updated function and any existing cached data that might be in the old format.
… cookie handling - Updates tasks logic to accommodate the new format for PACER user cookies. - Updates function signature to reflect the change in cookie data format.
- Adapts update_docket_info_iquery task to handle the updated output format of get_or_cache_pacer_cookies. - Enhances task functionality by passing the proxy parameter to the ProxyPacerSession class
a1c1746
to
693552d
Compare
Semgrep found 4 Avoid using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you've got an effective solution here, and that you've thought carefully about how to roll it out without queued tasks failing. Nice.
I think I would change something though. Having to work with tuples and pass them around is never much fun. They're kind of ugly, it's hard to remember what order the tuple is in ("Was it cookies then proxy or proxy then cookies?"), and so forth.
Since I think the values we need are getting added to the PACER session object anyway, did you consider just passing that around? Even log_into_pacer
could just return s
, I think?
cl/lib/pacer_session.py
Outdated
return cookies | ||
if isinstance(cookies_data, tuple): | ||
return cookies_data | ||
return cookies_data, settings.EGRESS_PROXY_HOST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's drop this variable in the code now so we don't have to think about it again. I see you're doing backwards compat here, but it'd be better to just hardcode EGRESS_PROXY_HOSTS[0]
, say. Let's also make a comment about how this code can be removed the next time anybody is in this area and that it's only here for transitioning.
2954d38
to
b4cc6df
Compare
b4cc6df
to
5a47599
Compare
@@ -7,7 +7,7 @@ | |||
from cl.corpus_importer.tasks import add_tags, get_pacer_doc_by_rd | |||
from cl.lib.celery_utils import CeleryThrottle | |||
from cl.lib.command_utils import VerboseCommand, logger | |||
from cl.lib.pacer_session import ProxyPacerSession | |||
from cl.lib.pacer_session import ProxyPacerSession, SessionData | |||
from cl.lib.scorched_utils import ExtraSolrInterface | |||
from cl.lib.search_utils import build_main_query_from_query_string | |||
from cl.scrapers.tasks import extract_recap_pdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using environment variables inside django app. Use django-environ
as it a better alternative for deployment.
@@ -14,7 +14,7 @@ | |||
) | |||
from cl.lib.celery_utils import CeleryThrottle | |||
from cl.lib.command_utils import VerboseCommand, logger | |||
from cl.lib.pacer_session import ProxyPacerSession | |||
from cl.lib.pacer_session import ProxyPacerSession, SessionData | |||
from cl.lib.scorched_utils import ExtraSolrInterface | |||
from cl.lib.search_utils import build_main_query_from_query_string | |||
from cl.recap.tasks import process_recap_attachment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using environment variables inside django app. Use django-environ
as it a better alternative for deployment.
Semgrep found 5
Avoid using |
81cd87d
to
e34f3d5
Compare
@mlissner I've implemented your suggestions. I've removed the Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! This is a lot easier to reason about. Thanks for making the little helper class.
Can you make an issue with details about deployment and the follow up tasks? E.g.:
- Need the variables set before deployment
- Need to remove the old proxies after a few hours or maybe a day
- Need to remove shim code after a few hours or maybe a day
Anything else?
This is approved, but can you work with Ramiro to make sure that we've got the variables and k8s deployments in place before we merge and deploy? |
Sure! |
@mlissner Ramiro confirmed the environment variable was set up correctly and that CL can successfully reach the newly deployed proxies. Let's merge this PR. |
Auto-merge is set. Thanks all! |
@ERosendo Now that this has been live for a couple weeks, is there code in here we want to refactor to simplify things? |
@mlissner Yes, We used several ternary if statements to manage legacy cached data. I believe we can safely remove this code. Here are examples: courtlistener/cl/recap/tasks.py Lines 1658 to 1660 in 5716284
courtlistener/cl/recap/tasks.py Lines 2416 to 2421 in 5716284
courtlistener/cl/lib/pacer_session.py Lines 165 to 169 in 5716284
|
This PR introduces a new approach to caching PACER sessions and updates the code of the ProxyPacerSession class to pick one of the proxy connection strings from the settings.
Key Changes:
Introduces a new environment variable (EGRESS_PROXY_HOSTS) to manage a list of proxies.
Implements new logic in the ProxyPacerSession to choose a proxy connection string from the settings, enabling dynamic proxy selection.
Updates the
log_into_pacer
method to return a tuple containing both the user's cookie and the used proxy address used for loginTweaks the
get_or_cache_pacer_cookies
to handle both the new return format and potentially existing cached data in the old format, ensuring backward compatibility.Updates tasks from
corpus_importer
andrecap
modules to accommodate the new PACER user cookie format.Adapts commands in the
corpus_importer
module that use cookies to handle the new format.Introduces new tests for pacer session utils.
This PR closes #4087