Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add @source not support #17255

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Add @source not support #17255

wants to merge 17 commits into from

Conversation

RobinMalfait
Copy link
Member

@RobinMalfait RobinMalfait commented Mar 17, 2025

This PR adds a new source detection feature: @source not "…". It can be used to exclude files specifically from your source configuration without having to think about creating a rule that matches all but the requested file:

@import "tailwindcss";
@source not "../src/my-tailwind-js-plugin.js";

While working on this feature, we noticed that there are multiple places with different heuristics we used to scan the file system. These are:

  • Auto source detection (so the default configuration or an @source "./my-dir")
  • Custom sources ( e.g. @source "./**/*.bin")
  • The code to detect updates on the file system

Because of the different heuristics, we were able to construct failing cases (e.g. when you create a new file into my-dir that would be thrown out by auto-source detection, it'd would actually be scanned). We were also leaving a lot of performance on the table as the file system is traversed multiple times for certain problems.

To resolve these issues, we're now unifying all of these systems into one ignore crate walker setup. We also implemented features like auto-source-detection and the not flag as additional gitignore rules only, avoid the need for a lot of custom code needed to make decisions.

High level, this is what happens after the now:

  • We collect all non-negative @source rules into a list of roots (that is the source directory for this rule) and optional globs (that is the actual rules for files in this file). For custom sources (i.e with a custom glob), we add an allowlist rule to the gitignore setup, so that we can be sure these files are always included.
  • For every negative @source rule, we create respective ignore rules.
  • Furthermore we have a custom filter that ensures files are only read if they have been changed since the last time they were read.

So, consider the following setup:

/* packages/web/src/index.css */
@import "tailwindcss";
@source "../../lib/ui/**/*.bin";
@source not "../../lib/ui/expensive.bin";

This creates a git ignore file that (simplified) looks like this:

# Auto-source rules
*.{exe,node,bin,…}
*.{css,scss,sass,…}
node_modules/
git/

# Custom sources can overwrite auto-source rules
!lib/ui/**/*.bin

# Negative rules
lib/ui/expensive.bin

We then use this information on top of your existing .gitignore setup to resolve files (i.e so if your .gitignore contains rules e.g. dist/ this line is going to be added before any of the rules lined out in the example above. This allows negative rules to allow-list your .gitignore rules.

To implement this, we're rely on the ignore crate but we had to make various changes, very specific, to it so we decided to fork the crate. All changes are prefixed with a // CHANGED: block but here are the most-important ones:

  • We added a way to add custom ignore rules that extend (rather than overwrite) your existing .gitignore rules
  • We updated the order in which files are resolved and made it so that more-specific files can allow-list more generic ignore rules.
  • We resolved various issues related to adding more than one base path to the traversal and ensured it works consistent for Linux, macOS, and Windows.

Behavioral changes

  1. Any custom glob defined via @source now wins over your .gitignore file and the auto-content rules.
  2. The node_modules and .git folders as well as the .gitignore file are now ignored by default (but can be overridden by an explicit @source rule).
  3. Introduced @source not "…" to negate any previous rules.
  4. Negative content rules in your legacy JavaScript configuration (e.g. content: ['!./src']) now work with v4.
  5. The order of @source definitions matter now, because you can technically include or negate previous rules. This is similar to your .gitingore file.
  6. Individual files scanned in the CLI are now also tested against all the rules.

Combining with other features

Note that the not flag is also already compatible with @source inline(…) added in an earlier commit:

@import "tailwindcss";
@source not inline("container");

Next steps

We are not finished yet with our quest to improve the @source directive. We still plan to address #16765 and resolve symlink-related issues.

Test plan

  • We added a bunch of oxide unit tests to ensure that the right files are scanned
  • We updated the existing integration tests with new @source not "…" specific examples and updated the existing tests to match the subtle behavior changes
  • We also added a new special tag [ci-all] that, when added to the description of a PR, causes the PR to run unit and integration tests on all operating systems.

[ci-all]

@RobinMalfait RobinMalfait force-pushed the feat/source-not branch 4 times, most recently from 41102c9 to 39588cf Compare March 18, 2025 11:36
@philipp-spiess philipp-spiess force-pushed the feat/source-not branch 12 times, most recently from d3700cc to 41e1ac2 Compare March 21, 2025 14:38
RobinMalfait and others added 10 commits March 21, 2025 18:11
This is a chunky commit which includes a refactor and all the necessary
changes. Sorry that it's a bit confusing to read because so many things
changed.

The big picture changes:

1. We were dealing with `GlobEntry` enums, but we split this up to
   better model what's going on.

   - `GlobEntry`: literally represents a `Glob`. We still return these
     so our frontends like `@tailwindcss/postcss` can still use them.

   - `SourceEntry`: in the `node` crate, a `SourceEntry` is what used to
     be a `GlobEntry` but also has a negated flag now. This represents a
     `@source not '…'` for example. This will map to a
     `PublicSourceEntry` inside of the `oxide` crate.

   - `PublicSourceEntry`: in the `oxide` crate, this is exactly the same
     structure as the `SourceEntry` in the `node` crate.

   - `SourceEntry`: this is an enum that better represents the domain we
     are dealing with.:

     ```rs
     pub enum SourceEntry {
       Auto { base: PathBuf },
       IgnoredAuto { base: PathBuf },
       Pattern { base: PathBuf, pattern: String },
       IgnoredPattern { base: PathBuf, pattern: String },
     }
     ```

2. One big realisation we had is that auto source detection and the
   ignored files/extensions/... _basically_ map to a giant set of
   `.gitignore` rules.

   This is also how we modeled it now. The `Scanner` will get a set of
   `sources` (`Vec<SourceEntry>`) and generates the necessary
   `.gitignore` rules.

   One nice benefit with this approach is that it unlocks a few features
   for free:

     1. No need to duplicate the rules in several spots anymore
        (scanning files initially, scanning files in subsequent calls,
        scanning specific files via `scanFiles()`). They all use the
        same "rules".
     2. No need to potentially do multiple file system walks anymore.
        All the rules are setup once, and we will be able to walk over
        the files and folders that adhere to the rules.

        Bonus points: _if_ you reach a folder that we know with 100%
        certainty is ignored by the gitignore rules, then we can skip
        that folder entirely.
     3. `@source` order matters, but it allows you to explicitly ignore
        an allowed file, or explicitly allow an ignored file.

3. We split the internal `Scanner` API, it essentially looks like this:

  1. The `Scanner::new(sources)` sets up the world and creates a
     `walk_builder` (the big engine behind all of this)
  2. `Scanner.scan_sources(…)` — the only thing it does is use the
     walk_builder and stores information based on the info we found:
     1. Collect `dirs`
     2. Collect `files`
     3. Collect `extensions`
  3. `Scanner.scan_content` — this is what the `@tailwindcss/cli` calls
     via `scanner.scanFiles(…)`. This also just registers the changed
     files and content into the `scanner`. For `ChangedContent::File(_)`
     we verify that the file is allowed by all the rules mentioned
     above.
  4. `Scanner.extract_candidates` — essentially take whatever is stored
     in `self.changed_content` and parse these files by extracting all
     the candidates. This is used by `scan_sources` and `scan_content`.

Co-authored-by: Philipp Spiess <hello@philippspiess.com>
Essentially we used to combine files and globs into a shared `paths`.
Now these are split again, and instead of using tuple syntax you can get
them from a `ScanResult`.

Additionally, we will _not_ add `**/*` (Auto Source Detection) by
default, you have to be explicit about this.
The gist of it is that we now are dealing with `SourceEntry` instead of
`GlobEntry`. A `SourceEntry` in the `node` crate maps to a
`PublicSourceEntry`. This makes it easy for JS APIs to pass in the
values.

A `PublicSourceEntry` looks like:

- `base` — the base path of the source
- `pattern` — the glob pattern
- `negated` — whether or not the pattern should be negated

Internally we will map this to a `SourceEntry`, this is a proper enum
that looks like this:
```rs
pub enum SourceEntry {
    Auto { base: PathBuf },
    IgnoredAuto { base: PathBuf },
    Pattern { base: PathBuf, pattern: String },
    IgnoredPattern { base: PathBuf, pattern: String },
}
```

Before we construct a Scanner, we will also make sure to optimize these
patterns. Some optimization steps:

1. Each `pattern` will be brace-expanded and a new `SourceEntry` will be
   created. This allows us to always deal with simple patterns.
2. The `base` of each `SourceEntry` will be canonicalized so we are
   dealing with the real paths and symlinks are resolved.
3. When patterns include static parts such as `/src/*.html`, then the
   static part (in this case `src`) will be moved to the `base`.
This isn't strictly necessary for this PR, but noticed missing values
when updating the `globs` -> `sources` and made TypeScript happy by
adding this.
These tests will reflect 2 big changes:
1. New tests in general
2. Tests where we used auto source detection and `.gitignore` files,
   will now favor the `@source '…'` rules over the `.gitignore` rules.
   Note: If you _don't_ want this behavior, then you can use `@source
   not '…'` to override rules.
We still emit `globs` for `@tailwindcss/postcss`, but we also added
better normalized globs based on your `@source` directives. We returned
them as a new property to stay backward compatible, but this also means
that the `@tailwindcss/cli` can make use of this.
Some people don't have `node_modules` in the `.gitignore` file so in
some scenario's this will always be scanned.

Another example is if you use the Vercel CLI to deploy your project
because `.gitignore` files are not pushed...

This is technically a breaking change, but now that we have `@source not
'…'` you can include files/folders from `node_modules` even though the
folder is ignored by default.
@RobinMalfait RobinMalfait marked this pull request as ready for review March 21, 2025 17:26
@RobinMalfait RobinMalfait requested a review from a team as a code owner March 21, 2025 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment