Introduce IDs for each script/category #262

Open
6 tasks
undergroundwires opened this issue Sep 27, 2023 · 10 comments
Labels
enhancement New feature or request

@undergroundwires
Owner

undergroundwires commented Sep 27, 2023

TL;DR: Please check the proposed solution. I'm looking for community feedback on the proposed ID format and the overall approach.

Problem description

The absence of IDs for scripts and categories is blocking:

To be able to implement these, we need to assign IDs for each script and category.

Requirements for IDs

The IDs should be:

  • Simple to generate.
  • Unique across each collection (i.e., each operating system).
  • Concise, ensuring they do not excessively increase import/export metadata or clutter URLs.

Proposed solution

Adopt the approach of generating a GUID (e.g., 27e7b119-6fdb-447f-91e1-99ecf94d9f34) and extracting the first segment (prior to the first dash, e.g., 27e7b119).

So every script and category will have an ID in the format 27e7b119.
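
The generation step described above can be sketched in a few lines. This is only an illustration in Python, not code from privacy.sexy, and the function name `generate_short_id` is hypothetical:

```python
import uuid


def generate_short_id() -> str:
    """Return the first segment of a random GUID (8 lowercase hex characters).

    Hypothetical helper for illustration; not part of privacy.sexy.
    """
    guid = str(uuid.uuid4())   # e.g. "27e7b119-6fdb-447f-91e1-99ecf94d9f34"
    return guid.split("-")[0]  # keep only the part before the first dash


if __name__ == "__main__":
    print(generate_short_id())
```

The first segment of a version 4 GUID consists of 8 random hex characters, so each ID carries 32 bits of randomness.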

Alternatives considered

  • Avoiding IDs: Initially attempted to avoid using IDs for ease of contribution to privacy.sexy via collection files, but this now poses a significant blocker.
  • Hierarchical IDs: As suggested in privacy.sexy/#59#issuecomment-1732556213, this option imposes excessive mental load and undesirably tight coupling of scripts to their parent categories.
  • Numerical IDs: Challenging for maintaining and generating new IDs.

TODO

  • Develop a CLI tool for generating IDs for every script/category lacking them.
  • Integrate an ID field within the compiler and parser.
  • Modify the parser to require an ID field for every script/category, accompanied by a contributor-friendly error message.
  • Introduce logic to validate the uniqueness of each ID within its respective collection.
  • Assign IDs to all existing scripts/categories.
  • Update relevant documentation, like extend scripts, to incorporate descriptions of the ID field.
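
The uniqueness check from the list above could look roughly like the following. It is a sketch under the assumption that the IDs of one collection are available as a flat list of strings; the function names and the error wording are hypothetical:

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_ids(ids: Iterable[str]) -> List[str]:
    """Return every ID that occurs more than once, sorted for stable messages."""
    counts = Counter(ids)
    return sorted(id_ for id_, count in counts.items() if count > 1)


def validate_unique_ids(collection_name: str, ids: Iterable[str]) -> None:
    """Raise a contributor-friendly error if any ID repeats within a collection."""
    duplicates = find_duplicate_ids(ids)
    if duplicates:
        raise ValueError(
            f"Duplicate IDs in collection '{collection_name}': "
            f"{', '.join(duplicates)}. "
            "Each script/category ID must be unique within its collection."
        )


if __name__ == "__main__":
    validate_unique_ids("windows", ["27e7b119", "9d8b6dd9"])  # passes silently
```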
@undergroundwires undergroundwires added the enhancement New feature or request label Sep 27, 2023
@undergroundwires undergroundwires added this to the 0.13.0 milestone Sep 27, 2023
@undergroundwires undergroundwires added the help wanted Extra attention is needed label Sep 28, 2023
@neube3

neube3 commented Oct 1, 2023

LGTM!

One potential caveat, though: using only the first segment increases the chance of collisions down the line, though I don't know enough to say how significant that chance would be. I also don't have a good idea for preventing this (a bad idea would be to give the CLI tool the list of currently used IDs and have it generate a new GUID whenever the first segment of a fresh one collides with an existing one).
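
To put a rough number on the collision concern: the first GUID segment is 8 hex characters, i.e. 32 random bits, so the standard birthday approximation applies. The sketch below estimates the chance of at least one collision and also shows the regenerate-on-collision idea mentioned above. Both function names are hypothetical and nothing here comes from the privacy.sexy code base:

```python
import math
import uuid
from typing import Set


def collision_probability(id_count: int, bits: int = 32) -> float:
    """Birthday approximation: chance of at least one collision among random IDs."""
    space = 2 ** bits
    return 1.0 - math.exp(-id_count * (id_count - 1) / (2 * space))


def generate_unique_short_id(used_ids: Set[str]) -> str:
    """Regenerate until the first GUID segment is not among the used IDs."""
    while True:
        candidate = str(uuid.uuid4()).split("-")[0]
        if candidate not in used_ids:
            return candidate


if __name__ == "__main__":
    for count in (1_000, 10_000):
        print(count, f"{collision_probability(count):.4%}")
```

Under this approximation, 1,000 IDs in one collection give a collision chance of about 0.01%, and 10,000 IDs give about 1.2%. A uniqueness check at parse time would catch the rare collision either way.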

@neube3

neube3 commented Feb 4, 2024

Just snowballing here - since the hierarchical structure is bad (categories are a bit whimsical, can change names, etc.) and GUIDs are a proposed solution - what if we could have a list of GUIDs but in the script itself give each feature an array of category tags? This way you're not limited to only one category. I hope that's potentially useful ;).

@undergroundwires
Owner Author

Can you elaborate @neube3?

Do I understand correctly that you're suggesting that we keep categories but introduce one more level of taxonomy through tags? You're right that the hierarchical categories can never be perfect and one script could often be categorized inside a few groups. So we add tag support and tag scripts/categories as another level of categorization? And assign IDs to these tags too?

@neube3

neube3 commented Feb 16, 2024

Can you elaborate @neube3?

Do I understand correctly that you're suggesting that we keep categories but introduce one more level of taxonomy through tags? You're right that the hierarchical categories can never be perfect and one script could often be categorized inside a few groups. So we add tag support and tag scripts/categories as another level of categorization? And assign IDs to these tags too?

I knew I was right when I put it up for discussion - I didn’t even think about IDs for tags, but that’s exactly the type of enhancement I expected from a discussion :)! Tag IDs sound great - I’m almost always for separating the key and label (doing multilingual work, you learn to appreciate keeping IDs separate from labels). As a side note - I have no clue about any plans for translating the scripts, but tags could have language versions (as a property, I guess?), and since the ID would stay the same through label and language changes, the whole system should be both accessible and easily shareable.

As for the tags, we could either:

  1. Get rid of the categories altogether and use tags only (it might be slightly confusing, but hopefully only in the transitory period from the legacy thinking mode - tags are just a superior superset for hierarchy; also: see more below). Don’t really use the hierarchy for anything but display, use tags internally.
  2. Rename tags to categories and drop the hierarchical aspect (might be confusing, but makes the id-ing aspect somewhat easier since there would be one less thing to worry about)
  3. Keep categories as legacy, including/not including their hierarchy and add tags separately. Don’t really use the hierarchy for anything but display, use tags internally.

Whichever we choose, we should allow users to search by tag; there are also fun little additional things you can do with a tag structure that you cannot do in a hierarchy. The most basic is one script with multiple tags ("Is this script a security or a speedup measure?" - now you don’t have to guess!), a further one is a tag cloud (a pleasing visual representation of tags by their relative count), and finally the best thing ever: searching by tags, e.g. "All speedup scripts in one place".

Last, but not least, tags can include/reference other tags (non-hierarchically, but we could always choose to include only such subsets).
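
A small sketch of how the tag ideas above could fit together: stable tag IDs with separate (translatable) labels, several tags per script, search by tag, and counts for a tag cloud. All data and names below are hypothetical and only illustrate the structure:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hypothetical data: tag IDs stay stable while labels can change or be translated.
TAG_LABELS: Dict[str, Dict[str, str]] = {
    "t-sec": {"en": "Security"},
    "t-speed": {"en": "Speedup"},
}
# One script can carry several tags.
SCRIPT_TAGS: Dict[str, List[str]] = {
    "27e7b119": ["t-sec", "t-speed"],
    "9d8b6dd9": ["t-speed"],
}


def build_tag_index(script_tags: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each tag ID to the set of script IDs that carry it (search by tag)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for script_id, tag_ids in script_tags.items():
        for tag_id in tag_ids:
            index[tag_id].add(script_id)
    return dict(index)


def tag_counts(script_tags: Dict[str, List[str]]) -> Counter:
    """Count scripts per tag, e.g. as input for a tag cloud."""
    return Counter(tag for tags in script_tags.values() for tag in tags)


if __name__ == "__main__":
    index = build_tag_index(SCRIPT_TAGS)
    print(TAG_LABELS["t-speed"]["en"], sorted(index["t-speed"]))
```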

@Kerobyte

human readable ids are nice ribbit
https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a

@undergroundwires
Owner Author

FYI, ID support is implemented. I'm merging the necessary refactorings as part of patches and will add the main code in the next release. In the previous refactoring c138f74, I added a concept called Executable.

An Executable represents a Category or a Script that can be uniquely identified (a.k.a. id) within a collection such as windows/linux/macos (a.k.a. collection). I was initially thinking of designing executables this way:

{
  "key": {
    "collection": "windows",
    "id": "27e7b119"
  }
}

However, the blog post you linked, @Kerobyte, inspired me to add type: "script" and type: "category" to the key field:

{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}

I quote this:

Querying every single table to find one ID is extremely inefficient, so we need a better method. One way could be to require an additional “type” parameter.

With the previous design I had in mind, all executables in a collection must be iterated until the matching one is found, and only then can its type be read. With type inside the key, the type can act as a secondary partition key and remove the need to iterate over executables of the other type. The only issue with this approach is that the type (category/script) must be known when doing queries. This is not an issue for the API (#262) or import/export (#126), and the type can be included in URLs for permalink support (#49).
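
The lookup argument can be illustrated with a small sketch. It is written in Python for brevity and is not the privacy.sexy implementation; the class names `ExecutableKey` and `ExecutableIndex` are hypothetical. With the type in the key, a lookup goes straight to one partition instead of scanning every executable in the collection:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExecutableKey:
    """Composite key: collection > executable type > executable ID."""
    collection: str  # e.g. "windows"
    type: str        # "script" or "category"
    id: str          # e.g. "27e7b119"


class ExecutableIndex:
    """Hypothetical store partitioned by collection, then by type."""

    def __init__(self) -> None:
        # collection -> type -> id -> executable name
        self._partitions: Dict[str, Dict[str, Dict[str, str]]] = {}

    def add(self, key: ExecutableKey, name: str) -> None:
        by_type = self._partitions.setdefault(key.collection, {})
        by_type.setdefault(key.type, {})[key.id] = name

    def find(self, key: ExecutableKey) -> Optional[str]:
        # Three dictionary lookups; executables of the other type are never touched.
        return self._partitions.get(key.collection, {}).get(key.type, {}).get(key.id)


if __name__ == "__main__":
    index = ExecutableIndex()
    index.add(ExecutableKey("windows", "script", "27e7b119"), "Example script")
    print(index.find(ExecutableKey("windows", "script", "27e7b119")))
```

The trade-off is the one noted above: the caller has to know the type up front, because the same ID under a different type is a different key.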

The article argues that having an object like this "complicates the API with no additional gain". I do not agree: the gain is that it's much more explicit and human-readable. So I will keep this kind of complex key/ID structure in the code.

However, we'll need to serialize this object to use in public API (#126) and to store selections (#59). And in that case, I guess we can use this format:

{collection}/{type_plural}/{id}

For example: windows/categories/27e7b119, macos/scripts/9d8b6dd9.

I think this URI format is much clearer than what the article suggests with underscores ([prefix]_[random_string]). The URI format explicitly tells which part of the ID is more generic than the other, in a REST-like manner => collection > executable type > executable ID.
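
A minimal sketch of serializing and parsing this format, assuming the type segment is always the plural form as the {collection}/{type_plural}/{id} template indicates. The function names and the plural mapping are hypothetical:

```python
from typing import Tuple

# "category" has an irregular plural, so an explicit mapping is safer than adding "s".
PLURALS = {"script": "scripts", "category": "categories"}
SINGULARS = {plural: singular for singular, plural in PLURALS.items()}


def serialize_key(collection: str, executable_type: str, executable_id: str) -> str:
    """Build "{collection}/{type_plural}/{id}" from the parts of a composite key."""
    return f"{collection}/{PLURALS[executable_type]}/{executable_id}"


def parse_key(serialized: str) -> Tuple[str, str, str]:
    """Split a serialized key back into (collection, type, id)."""
    parts = serialized.split("/")
    if len(parts) != 3 or parts[1] not in SINGULARS:
        raise ValueError(f"Invalid executable key: '{serialized}'")
    collection, type_plural, executable_id = parts
    return collection, SINGULARS[type_plural], executable_id


if __name__ == "__main__":
    print(serialize_key("macos", "script", "9d8b6dd9"))
    print(parse_key("windows/categories/27e7b119"))
```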

We're making the last decisions. I'd like to hear your feedback on this.

(@Marc05 feel free to join designing this if you're around)


TLDR

Use a composite/aggregate key for executables (category/script), composed of the following properties:

{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}

Serialize it in URI format like this: windows/scripts/27e7b119.

New decision: Add type (i.e. "category" or "script") as part of the composite key.

@undergroundwires undergroundwires pinned this issue Jun 15, 2024
@neube3

neube3 commented Jun 15, 2024

Not gonna lie - I didn't grok everything, but your solution LGTM!

Also, the article basically mentions Hungarian notation. Which is not bad, per se, but not new, either.

Semi-connected rant below, feel free to skip if you're only here for the opinion on the solution:
And the author thinks that having a userID as a number is bad, because if you have huge holes in your API, then some malicious actors could use their ID to guess others' IDs and then do something nefarious.
Well... have they tried not having a huge hole in their APIs? "Need to know" and "Chinese walls" are basic concepts of security and if someone posits such a strawman as an intro point in favour of their theory/solution then I automatically grow doubly sceptical. If you have such holes in the API, then numbered userIDs are the least of your problems.

As for the "oh, I'm customer 50, your operation isn't big and therefore doesn't feel that great anymore" argument - it would be trivial to just add a random(10000,100000) to the ID and check for collisions - another strawman.

@Marc05
Contributor

Marc05 commented Jun 15, 2024

Ship it! :)

It probably goes without saying to keep in mind some safety checks around categories; it sounds like there's potential for infinite loops since they can reference each other.
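
One possible shape for such a safety check is a depth-first search that reports the first reference cycle it finds. The sketch assumes references are available as a mapping from an ID to the IDs it references; that representation and the function name `find_cycle` are hypothetical:

```python
from typing import Dict, List, Optional


def find_cycle(references: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of IDs (first == last), or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = in_progress
        path.append(node)
        for target in references.get(node, []):
            target_state = state.get(target, unvisited)
            if target_state == in_progress:
                # The target is already on the current path: a cycle is closed.
                return path[path.index(target):] + [target]
            if target_state == unvisited:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in references:
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

Running this at parse time would turn a potential infinite loop into an early error that names the offending IDs.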

Good progress!

@undergroundwires
Owner Author

@neube3 I agree. It's security by obscurity, and a very weak one. Enumeration is mitigated by proper authentication and network controls, not by adding prefixes to IDs.

Thanks for the input everyone.


This will be the final design:

image

Support in the application will be added in upcoming patch release(s). The collection files will be modified in the feature release.

GPT-4 agrees that the executable type (whether it is a script or a category) should be part of the key/ID when my prompt asked it to optimize for the easiest future changes and maximum maintainability:

Incorporating executableType into the composite ID for privacy.sexy could provide benefits, especially considering the goal of maximizing maintainability and allowing for easy future updates and categorization.

Pros of Including executableType in ID:

  • Clear Distinction: Differentiates between types of executables (e.g., scripts vs categories) directly in the ID. This clarity can streamline processing logic and data retrieval by quickly identifying the type without parsing additional data structures.
  • Scalability: Facilitates adding new executable types in the future without reworking the ID structure or parsing mechanisms.
  • Simplified Data Relations: Enhances the ability to manage relationships and dependencies between different types of data. For instance, in a scenario where scripts and categories might have different attributes or behaviors, IDs with type information simplify filtering and operations specific to types.
  • Enhanced Search and Organization: Having type information within the ID can make indexing and searching through a large number of items more efficient.

Cons of Including executableType in ID:

  • Increased Complexity: The IDs will be longer and potentially more cumbersome to handle in code, especially for manual debugging or during development.
  • Potential Redundancy: If the type is rarely needed for practical purposes or if systems already manage type information efficiently through other means, this addition could be superfluous.
  • Modification Overhead: Any change in the categorization or type naming would require updates to the IDs, possibly affecting historical data or requiring migration steps.

Recommendation:
Given the focus on maintainability and adaptability for future changes, including executableType in the composite ID appears beneficial. It provides a more robust and flexible structure for managing different entities within the system and adapts more readily to future expansions or modifications in data organization. This approach is especially valuable in environments where different types of executables might have distinct handling, permissions, or processing paths.

@ltguillaume

ltguillaume commented Jun 17, 2024

If I might suggest another piece of metadata for each executable: the privacy.sexy version/build the specific script got introduced in, or a "revision number".

In the future, people will want to load the script they previously created as a template in order to create an updated version of it. It would then be important to be able to quickly see all the newly added executables since the template script was created and add the desired ones from such a filtered overview.

It would also make sense to change this version metadata for executables if new breaking changes have been made or if new incompatibility information or other warnings were added, so that people can reconsider these executables upon rebuilding their script.
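
As a rough illustration of the suggestion, the sketch below filters executables by the version they were introduced (or last revised) in. The `introduced_in` mapping and both function names are hypothetical; versions are assumed to be dot-separated integers such as 0.13.0:

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "0.13.1" into (0, 13, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def added_since(introduced_in: Dict[str, str], template_version: str) -> List[str]:
    """Return IDs of executables introduced or revised after the template's version."""
    baseline = parse_version(template_version)
    return sorted(
        executable_id
        for executable_id, version in introduced_in.items()
        if parse_version(version) > baseline
    )


if __name__ == "__main__":
    # Hypothetical data: executable ID -> version it was introduced or revised in.
    executables = {"27e7b119": "0.12.0", "9d8b6dd9": "0.13.1"}
    print(added_since(executables, "0.13.0"))
```

Comparing integer tuples rather than raw strings keeps 0.9.0 ordered before 0.13.0.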

undergroundwires added a commit that referenced this issue Jul 10, 2024
This commit unifies the concepts of executables having same ID
structure. It paves the way for more complex ID structure and using IDs
in collection files as part of new ID solution (#262). Using string IDs
also leads to more expressive test code.

This commit also refactors the rest of the code to adapt to the changes.

This commit:

- Separate concerns from entities for data access (in repositories) and
  executables. Executables use `Identifiable` meanwhile repositories use
  `RepositoryEntity`.
- Refactor unnecessary generic parameters for entities and ids,
  enforcing string type everywhere.
- Changes numeric IDs to string IDs for categories to unify the
  retrieval and construction for executables, using pseudo-ids (their
  names) just like scripts.
- Remove `BaseEntity` for simplicity.
- Simplify usage and construction of executable objects.
  Move factories responsible for creation of category/scripts to domain
  layer. No longer export `CollectionCategory` and
  `CollectionScript`.
- Use named types for string IDs for better differentiation of different
  ID contexts in code.
undergroundwires added a commit that referenced this issue Aug 3, 2024
This commit unifies executable ID structure across categories and
scripts, paving the way for more complex ID solutions for #262.
It also refactors related code to adapt to the changes.

Key changes:

- Change numeric IDs to string IDs for categories
- Use named types for string IDs to improve code clarity
- Add unit tests to verify ID uniqueness

Other supporting changes:

- Separate concerns in entities for data access and executables by using
  separate abstractions (`Identifiable` and `RepositoryEntity`)
- Simplify usage and construction of entities.
- Remove `BaseEntity` for simplicity.
- Move creation of categories/scripts to domain layer
- Refactor CategoryCollection for better validation logic isolation
- Rename some categories to keep the names (used as pseudo-IDs) unique
  on Windows.
@undergroundwires undergroundwires unpinned this issue Aug 27, 2024

5 participants