Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Towards better inference: bits → nibbles #3808

Open
wants to merge 177 commits into
base: main
Choose a base branch
from
Open

Conversation

originalsouth
Copy link
Contributor

@originalsouth originalsouth commented Nov 6, 2024

Changes

Bits → Nibbles

Issue link

N/A

Demo

T.B.D.

QA notes

Test all ported bits/new nibbles from front-end:

  • disallowed_csp_hostnames (with/without config)
  • check_cve_2021_41773
  • domain_owner_verification
  • expiring_certificate
  • ask_url_params_to_ignore
  • website_discovery
  • missing_certificate
  • cipher_classification
  • default_findingtype_risk
  • spf_discovery
  • max_url_length_config (with config for testing only)
  • oois_in_headers (with/without config)
  • ask_port_specification
  • ask_disallowed_domains
  • check_hsts_header (with/without config)
  • url_classification
  • missing_spf

Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • Tickets have been created for newly discovered issues.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have informed others of any required .env changes files if required and changed the .env-dist accordingly.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

originalsouth and others added 30 commits August 27, 2024 09:31
…uler from recreating already deleted oois trhough affirmations
…cheduler_from_reacreating_already_deleted_oois_through_affirmations' into feature/nibbles
@originalsouth originalsouth marked this pull request as ready for review December 18, 2024 09:42
@originalsouth originalsouth requested a review from a team as a code owner December 18, 2024 09:42
@originalsouth originalsouth added the nibbles Everything nibble related label Dec 18, 2024
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
45.7% Coverage on New Code (required ≥ 80%)
9.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
Contributor

@Donnype Donnype left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I didn't check all the queries or dove too deep into some parts (such as run/infer and the whole going down the inference tree), but at least for the queries I am happy to see the integration tests.

There is for me one big caveat: the nibble runner, and in particular its stateful properties, are now in charge of managing enabled/disabled nibbles and selecting them after a query. This will not work in practice over application lifetime.

Some less urging comments are about the number of methods/endpoints and how flexible they should/could be.

Please have a look! I could definitely re-review this to go into more detail for other parts later on.

self.method,
f"[{','.join(sorted([str(param) for param in self.parameters_references]))}]"
if self.parameters_references is not None
else "Null",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "None" would be more consistent with other instances where fields in the natural key might be None

self.__class__.__name__,
self.origin_type.value,
self.method,
f"[{','.join(sorted([str(param) for param in self.parameters_references]))}]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the has of the parameters instead? Since there could be an arbitrary amount

@@ -249,6 +262,30 @@ def deserialize(cls, data: dict[str, Any]) -> OOI:
stripped["user_id"] = user_id
return object_cls.model_validate(stripped)

@classmethod
def objectify(cls, t: list | Any, obj: dict | list | set | Any) -> tuple | frozenset | Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "hydrate"? This deserves a comment explaining what the method accepts and returns, as I doubt Any is the best we can do seeing this method only being called once? Also the parameter t and variables tt and o could be more explicit about what they might hold. For the main if/elif branches you could specify the scenario we are in.


import structlog
from pydantic import BaseModel
from xxhash import xxh3_128_hexdigest as xxh3 # INFO: xxh3_64_hexdigest is faster but hash more collision probabilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pun intended?

@@ -582,3 +582,73 @@ def migrate_origins(
session.add((OperationType.DELETE, origin.id, valid_time))

session.commit() # The save-delete order is important to avoid garbage collection of the results


@router.get("/nibbles/list", tags=["nibbles"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency:

Suggested change
@router.get("/nibbles/list", tags=["nibbles"])
@router.get("/nibbles", tags=["nibbles"])

assert self.repository.objectify(int, set((3 * "9 ").split())) == frozenset([9])
assert self.repository.objectify(int, {"1", "2", "5"}) == frozenset([1, 2, 5])

assert self.repository.objectify(str, "potato") == "potato"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given all these examples: should we rename to something like parse_recursively_as(data, parse_type: type)?

"potato2": "king-edward",
}

assert self.repository.objectify([str, int], ["seven", "11"]) == tuple(["seven", 11])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this gets a bit tricky. Some fields/query results might be strings that should remain strings even if they represent an integer? I.e. perhaps some of this parsing should happen outside the query funtion by users that know whether to expect a string of integer?

@@ -13,6 +13,7 @@ class OriginType(Enum):
OBSERVATION = "observation"
INFERENCE = "inference"
AFFIRMATION = "affirmation"
NIBBLET = "nibblet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After Nibble and Nibbler, perhaps we should reduce the introduced naming by making this:

Suggested change
NIBBLET = "nibblet"
NIBBLET = "nibble"

And renaming the variables to nibble_origin


STATIC_IP = ".".join((4 * "1 ").split())


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these tests create objects in XTDB and then query them using nibble queries to assert the behavior? Then I'm very pleased to see this!

if os.environ.get("CI") != "1":
pytest.skip("Needs XTDB multinode container.", allow_module_level=True)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bits nibbles Everything nibble related octopoes Issues related to octopoes
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

5 participants