Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing loader imports #580

Merged
merged 4 commits into from
Nov 30, 2021
Merged

Conversation

JeffAshton
Copy link
Contributor

@JeffAshton JeffAshton commented Nov 30, 2021

Description

Fixing RulesLoader/FileRulesLoader to load and maintain cache of imports (import_rules) correctly.

Checklist

  • I have reviewed the contributing guidelines.
  • I have included unit tests for my changes or additions.
  • I have successfully run make test-docker with my changes.
  • I have manually tested all relevant modes of the change in this PR.
  • I have updated the documentation.
  • I have updated the changelog.

@@ -212,7 +212,7 @@ def get_import_rule(self, rule):
Retrieve the name of the rule to import.
:param dict rule: Rule dict
:return: rule name that will all `get_yaml` to retrieve the yaml of the rule
:rtype: str
:rtype: str or List[str]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As defined by schema:

@@ -596,7 +614,7 @@ def get_import_rule(self, rule):
Allow for relative paths to the import rule.
:param dict rule:
:return: Path the import rule
:rtype: str
:rtype: List[str]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned value, expanded_imports, is always a List[str].

assert rules_loader.import_rules[leaf_path] == [
branch_path,
]
# re-load the rule a couple times to ensure import_rules cache is updated correctly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best visualized with side by side:

image

assert rules_loader.import_rules[water_path] == [
oxygen_path,
]
# re-load the rule a couple times to ensure import_rules cache is updated correctly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best visualized with side by side:

image

'symbol': 'O',
}
assert sorted(rules_loader.import_rules.keys()) == [
oxygen_path,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert sorted(rules_loader.import_rules.keys()) == [
water_path,
]
assert rules_loader.import_rules[water_path] == [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache no represents correct set of import paths:

# read the set of import paths from the rule
import_paths = self.get_import_rule(rule)
if type(import_paths) is str:
import_paths = [import_paths]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bi-product of how they were using the += operator on the files_to_import list before:

files_to_import += self.get_import_rule(rule)

Example:

list = []

list += [ 'a', 'b' ]
print( list )

list += 'c'
print( list )

Outputs:

['a', 'b']
['a', 'b', 'c']


if 'import' not in rule:
# clear import_rules cache for current path
self.import_rules.pop(current_path, None)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what scenario will this self.import_rules.pop statement actually pop something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the import is removed from a file. Added test case to cover this:

ef06201#diff-68599694211dc95012d74a98b54fd5f439eddb0d4ce257ba1a35a1772f243069R617

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now. It's great to have all this new test coverage, thanks for adding it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping this resolves the issue I'm seeing:

image

After rule updates, the yaml scanner explodes with warnings:

{
    "name": "py.warnings",
    "levelno": 30,
    "levelname": "WARNING",
    "pathname": "/usr/local/lib/python3.10/warnings.py",
    "filename": "warnings.py",
    "module": "warnings",
    "lineno": 109,
    "funcName": "_showwarnmsg",
    "created": 1638273031.6421432,
    "asctime": "2021-11-30 11:50:31,642",
    "msecs": 642.1432495117188,
    "relativeCreated": 492821989.7737503,
    "thread": 139802450061056,
    "threadName": "ThreadPoolExecutor-0_6",
    "process": 47,
    "message": "/usr/local/lib/python3.10/site-packages/yaml/scanner.py:286: ResourceWarning: unclosed <ssl.SSLSocket fd=386, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.31.38.167', 49134), raddr=('54.87.161.2', 443)>\n  for level in list(self.possible_simple_keys):\n"
}

With enough rule updates the CPU ends up pegged on computing sha1 hashes of the imports.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you automated your rule updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We commit our rules to a git repository and I run cron job inside the docker container to update them.

@jertel jertel merged commit c0fd81f into jertel:master Nov 30, 2021
@JeffAshton JeffAshton deleted the fixing-loader-imports branch November 30, 2021 14:36
@ghost ghost mentioned this pull request Jan 6, 2023
6 tasks
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants