-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow packwerk to scan for packages inside of engines #216
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this on Gusto's codebase and it didn't produce any change to the number of inspected files or the time to run bin/packwerk check
.
bin/packwerk update-deprecations
also produces no diff.
It's my understanding based on this and based on the principles of this change that this should continue to work as normal for systems whose application structure does not have multiple rails app and engines at the root.
Before
time bin/packwerk check
Running via Spring preloader in process 35737
📦 Packwerk is inspecting 40922 files
# ...
📦 Finished in 18.11 seconds
No offenses detected
No stale violations detected
real 0m21.756s
After
Running via Spring preloader in process 70028
📦 Packwerk is inspecting 40922 files
# ...
📦 Finished in 26.05 seconds
No offenses detected
No stale violations detected
real 0m29.417s
The good news is this didn't produce changes to the number of files inspected or any changes to violations.
One thing I did notice is that it appears to increase runtime significantly (~30% increase in runtime). I didn't get to investigating this more thoroughly, but I wonder if it's because there is a lot files to glob out and scan? I'm really surprised it would add 8 seconds like this though, but it appears to be happening consistently across runs.
I'd love to dig in a bit deeper on the performance implications. Separately, I realize that this is actually a behavioral change for user, and in theory could be a breaking change. I wonder if (A) to make it non-breaking and (B) unblock you to use this in your application without us digging into the performance implications we can have this be off by default and able to be turned on with a configuration in packwerk.yml
. I imagine something like include_external_rails_engines
(there is probably a better name here). I use "external" to distinguish setups that have unbuilt rails engines located within the rails app.
|
||
all_paths | ||
.transform_keys { |path| Pathname.new(path).expand_path } | ||
.select { |path| path.fnmatch(rails_root_match.to_s) } # path needs to be in application directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this served a purpose before. I suppose that for the most part, autoload paths are always going to be under rails root, which is why this may have been redundant by that implicit fact. I don't know if you can configure Rails to identify autoload paths outside of the application directory. You probably can, but I don't know why we wouldn't want packwerk to consider those, so at the surface this seems fine to me, especially since it doesn't change how things work in Gusto's configuration of packwerk (and as long as it doesn't for Shopify's either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we introduced this line to keep packwerk from inferring constants to be defined in gems. Not sure it's still necessary as we might now be doing that filtering somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we introduced this line to keep packwerk from inferring constants to be defined in gems
I think that filtering is also happening with the line right below it, but I'll double check testing against our mono repo.
That being said, I do wonder if maybe changing the behavior so packwerk does infer constants defined in gems could be a feature not a bug? I can imagine a use case where a gem author publishes a rails engine, with packages defined inside of it, and they intend for that to message to consumers what the public API of the gem is, vs the internals you shouldn't be relying on. I'm not sure if this use case has been explicitly considered and rejected by packwerk maintainers, or just hadn't been considered yet, and ya'll chose to filter out constants from gems for performance or other reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that filtering is also happening with the line right below it, but I'll double check testing against our mono repo.
Just closing this loop, testing this function with our own repo, which has both locally defined engines, and engines from ruby gems via bundler, I'm only seeing our locally defined engine paths show show up
I have some suspicions as to where this is coming from. I suspect I'm generating effectively overlapping globs, and ruby is scanning some directories multiple times as a result. There's probably a way to filter out those.
I'm cool with a new configuration key. I'll add it to PR shortly |
We'd have to come up with a convention for non-application package names. Since package name and path from application root are the same in packwerk, we don't have a convention for the name of a package that lives outside of the application root. E.g. the package |
I'd love it if packwerk could be used to detect gem dependencies someday. I was excited about this because it could help folks keep their dependencies on other gems down, which ultimately makes packwerk more valuable from the perspective of a tool to help folks extract a system into another service (i.e. you need to explicitly list all your gem dependencies to do so). That being said as @exterm said we have a handful of questions to answer, but could be cool for us to start with a design brainstorm in a discussion somewhere. |
Just a heads up, the new configuration key is a little trickier than I realized, just because building a package At this point I think it's probably worth it to refactor these self methods into a new |
I think that sounds reasonable to me @AndrewSwerlick , though may have some more feedback when I see the diff. Really appreciate you continuing to push through on this, and please let me know if you feel stuck or frustrated with anything and I'd be happy to pair with you again on zoom. |
8219642
to
fa46a24
Compare
@alexevanczuk I've added the configuration setting. It's made the PR a little more gnarly, so I cleaned up the history so you can see the refactoring changes in one commit, and then the introducing of the new capabilities in the second. I still need to update the documentation to check off all the boxes. |
I have a suspicion there will be some interaction between our two PRs: #218 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this seems sensible to me, and at this point mostly wanting @rafaelfranca 's take on it.
There has also recently been some other client requests around changes in the way packwerk identifies and pulls in load paths: #218
@rafaelfranca it would be great to find some time (over slack or zoom) to chat more about the type of ways clients might want packwerk to find and pull in load paths and see if there's a way we can systematically enable some more flexibility!
@@ -1,4 +1,4 @@ | |||
# typed: true | |||
# typed: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why'd we need to downgrade the typed-ness of this file? Could we keep it at true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to use T.unsafe(...)
to wrap something that is challenging to type check if it can keep us at a higher typed level, although better to type check it if we can.
case template | ||
when :minimal | ||
set_load_paths_for_minimal_template | ||
when :skeleton | ||
set_load_paths_for_skeleton_template | ||
when :external_packages | ||
set_load_paths_for_external_packages_template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This + the helper methods looks interesting -- I'm not too familiar with the Rails happy path for setting and knowing about Rails engines. This looks fine but curious if @rafaelfranca has any feedback on setting up an engine stub for test.
@alexevanczuk I think I addressed all the feedback. I know there are some other ongoing discussion, so let me know if there's anything else I can do related to this PR. |
Hey @alexevanczuk, I just wanted to bump this and see if it's still something we can explore getting merged in. I know there's been some conversations in other PRs that may make some of this obsolete, so I'm open to making changes as necessary, but wanted to check in and see where you all stood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me @AndrewSwerlick. @rafaelfranca let me know if you have any concerns prior to merging and releasing
@rafaelfranca do you happen to know why CI is not running for this PR? |
@AndrewSwerlick Could you try pushing another commit to this branch (and then perhaps reverting it) to see if it kicks off CI? |
@gmcgibbon Would you be able to look at this one too? |
def engine_paths_to_scan | ||
bundle_path_match = Bundler.bundle_path.join("**") | ||
|
||
Rails.application.railties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't depend on railties in the gemspec so I'd rather not depend on rails constants here. I think this also assumes all local engines are packages. Couldn't you just glob for package.yml files instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't depend on railties in the gemspec so I'd rather not depend on rails constants here
I'll address this in your second comment below about paths because I think they're kind of related
I think this also assumes all local engines are packages
That's not exactly accurate. It assumes any local engine might contain packages, but not that the engine itself is a package. The engine doesn't have to contain packages either. It's just adding all the paths to the list of paths that later get scanned for via glob for a package.yml
file.
lib/packwerk/configuration.rb
Outdated
:custom_associations, | ||
:config_path, | ||
:cache_directory, | ||
:packages_outside_of_app_dir_enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of this name. Maybe instead we could specify a root path (or list of paths) to scan for packages from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did create a prototype that did something like that and proposed a potential solution in #206. It wasn't exactly the same, I was proposing creating load path aliases instead, but the general feedback as I understood it was that we didn't want to expand to the permanent configuration api.
The idea behind this configuration key is that it's meant to be a kind of feature flag, that would be replaced by an opt-out, and then a full removal after this change had been out in the wild and safely tested by a few people.
If we want to expand the permanent configuration api, then a key like external_package_paths
work for our particular use case, but it has some downsides in that it could lead to confusion. Ultimately, these external package paths will only work if they're file structure follows zeitwerk. I'm not sure how easy it is to validate that in advance, or provide useful warning messages to the users as to why package rules aren't being enforced.
This is ultimately the advantage of railsties based approach, in that the engines are guaranteed to follow that structure, so it allows for a zero config, reliable approach, at the downside of creating another dependency
I didn't realize rails wasn't already an explicit dependency though, so I agree if we stick to a rails tie approach, the proper rails gem should be added to the gemspec.
|
||
def create_new_engine_at_path(path) | ||
Class.new(Rails::Engine) do | ||
T.unsafe(self).define_method(:root) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use T.bind(self, Class)
above this line so Sorbet understands the block is a Class. I think this might also be fixed if we just upgrade Sorbet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Moves arguments passed into the self methods on `PackageSet` into a new `PackagePaths` object. This makes it easier to add new arguments without alot of parameter drilling. This will support the next commit where we add the ability to conditionally find packages outside of the app dir
This commit modifies packwerk so that it can scan for packages defined in any engines loaded in a given rails application. This allows support for more complex repository structures where the rails app is not at the root of the repository, and where that app depends on engines defined in sibiling directories. To support this, I modified `PackageSet` so it also looks at `Rails.application.railties` to pull a list of engines, and their root directories, and include those in the paths scanned for packages. I also dropped the restrictions on load paths for constant resolution in `ApplicationLoadPaths`. The result is that packwerk can now find packages defined in these engines, and resolve constants in these engines. With this setup, packwerk still only scans files in the actual app and it's subdirectories for violations, but now it can resolve constants in the engines, and attach them to packages in the engine. For testing, I created a new fixture which models this sort of directory structure, and mocks an app with a rails engine.
Fixed some typing issues by using unsafe where necessary
10d053c
to
9baa416
Compare
I'm not sure exactly why this is needed because what you are describing here is how we use packwerk and it already work for us without modification. Do you have an example of what doesn't work? |
Sorry I didn't mean to generate noise on this. We had an internal hackathon recently exploring trying to get packwerk established in our ecosystem again, and I brought my fork back up to date to explore some approaches using it. I'll answer your question, but before I do, it's worth highlighting that I think I'm going to close out this PR because going forward we have a plan that will allow us to avoid some of the pain points that prompted this in the first place. The first thing that will probably help is if I rewrite the piece you quoted. It should really read
The specifics of our mono repo are such that we have a structure where we have a series of gems and apps as siblings at the root. The gems contain engines. The apps have Gemfiles that include the gems based on their symlink'd paths. Visually it looks something like this
This creates two problems based on my testing.
This PR fixes that problem by ensuring the engines load paths (which are absolute path values) are also scanned for both package and constant resolution. All that being said, even with this PR we had some challenges trying to use some of the other community tools (ie https://github.com/rubyatscale/packs, https://github.com/rubyatscale/danger-packwerk) because of this repo structure, and some other issues. As a result, we decided to temporarily move forward with the experimental rust based parser in https://github.com/alexevanczuk/packs, which dodges many of these problems but not relying on zietwerk conventions. We're revisiting our unique mono repo structure for a number of unrelated reasons, and in the future may end up with something that plays better with packwerk itself out of the box. So we don't really have use for these changes anymore. |
What are you trying to accomplish?
I'm trying to add support to packwerk for more complex repository structures, particularly mono-repos where multiple apps live in the same repo, and share code through engines that are defined in top level folders. Currently, packwerk cannot detect violations where an app is inappropriately using code defined in a package within an engine, because packwerk cannot discover engines that aren't defined below the
Rails.root
of the app.What approach did you choose and why?
To support this, I modified
PackageSet
so it also looks atRails.application.railties
to pull a list of engines, and their root directories, and include those in the paths scanned for packages. I also dropped the restrictions on load paths for constant resolution inApplicationLoadPaths
. The result is that packwerk can now find packages defined in these engines, and resolve constants in these engines.With this setup, packwerk still only scans files in the actual app and it's subdirectories for violations, but now it can resolve constants in the engines, and attach them to packages in the engine.
What should reviewers focus on?
I'm not confident in the approach I took to mocking and testing things to support the new tests I wrote, so particular focus there would be helpful. Also I'm curious if there are implications to the changes to
ApplicationLoadPaths
that I'm not considering.Type of Change
Additional Release Notes
Include any notes here to include in the release description. For example, if you selected "breaking change" above, leave notes on how users can transition to this version.
If no additional notes are necessary, delete this section or leave it unchanged.
Checklist