
[RFC] File naming strategy #872

Closed
devongovett opened this issue Feb 20, 2018 · 24 comments
Labels
💬 RFC Request For Comments
Milestone


@devongovett
Member

This is a meta issue to discuss a number of things that have come up about file naming in Parcel.

  1. Keep original filenames for HTML files: 🙋 Keep folder structure and filename #433, 🙋 how to keep html name #280, Keep folder structure and filename #557, [WIP] Keep file names #307
  2. Hashing assets based on file content for cache busting: 🙋 Use MD5 of File Content As Name To Bust Caches #717, File hash does not change after its content updates #188, 🙋 Add --rev option to build command for revved filenames #753, Add --rev option to allow for bundle revisions #756, WIP: Content-hash bundle names #829
  3. Putting assets in a separate folder from HTML: [RFC] Specify sub directory for all the generated assets for easy deploy and serve #233

I think we can come up with a cohesive file naming strategy that meets all of these needs.

  • We hash all assets based on file contents to produce filenames like index.a8b29e.js, except in the following cases (taken from the rules outlined by @Munter here). In those cases, we use the original filenames.
    • Any graph entry point (usually html)
    • Any asset linked to with an <a href>
    • Any asset linked to with a <meta http-equiv="refresh">
    • Service workers (must keep a consistent file name across builds)
    • humans.txt, robots.txt, .htaccess, favicon.ico
    • Cache manifests, RSS and Atom feeds
  • Place hashed assets (things not matched by above rules) in dist/assets e.g. dist/assets/index.a8b29e.js. This would be flattened as it is currently, so src/some/path/something.js would be placed in e.g. dist/assets/something.fd5se2.js.
  • Place non-hashed assets (things matched by the above rules) in the root, and create directories as needed to match the original paths. For example, if an HTML file were linked to from <a href="/some/path/something.html"> the output file would be dist/some/path/something.html.
  • We also support the -o or --out-file CLI option, which would override the default name for the entry file. If not provided, and the entry file matches main in package.json, use the package name.
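A minimal TypeScript sketch of how the naming rules above could be applied to a single asset. The AssetInfo shape and the outputPath, keepsOriginalName and contentHash helpers are hypothetical (not Parcel's API); the sketch also assumes MD5 truncated to six hex characters as the content hash and that source paths mirror the URL paths that link to them.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical asset description; not Parcel's real data model.
interface AssetInfo {
  sourcePath: string; // relative to the source root, e.g. "some/path/something.js"
  contents: string;
  isEntry: boolean; // graph entry point (usually HTML)
  linkedBy?: "a-href" | "meta-refresh" | "other";
  isServiceWorker?: boolean;
  isManifestOrFeed?: boolean; // cache manifest, RSS or Atom feed
}

// Well-known files that must keep their names.
const KEEP_NAME = new Set(["humans.txt", "robots.txt", ".htaccess", "favicon.ico"]);

function keepsOriginalName(a: AssetInfo): boolean {
  return (
    a.isEntry ||
    a.linkedBy === "a-href" ||
    a.linkedBy === "meta-refresh" ||
    a.isServiceWorker === true ||
    a.isManifestOrFeed === true ||
    KEEP_NAME.has(path.posix.basename(a.sourcePath))
  );
}

// Short content hash; the algorithm and length are assumptions.
function contentHash(contents: string): string {
  return createHash("md5").update(contents).digest("hex").slice(0, 6);
}

function outputPath(a: AssetInfo, outDir = "dist"): string {
  if (keepsOriginalName(a)) {
    // Non-hashed: keep the original directory structure under the output root.
    return path.posix.join(outDir, a.sourcePath);
  }
  // Hashed: flatten into <outDir>/assets and insert the hash before the extension.
  const ext = path.posix.extname(a.sourcePath);
  const stem = path.posix.basename(a.sourcePath, ext);
  return path.posix.join(outDir, "assets", `${stem}.${contentHash(a.contents)}${ext}`);
}
```

With these rules, a plain script such as some/path/something.js lands in dist/assets/something.<hash>.js, while an HTML file reached through an <a href> keeps dist/some/path/something.html.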

The only case where this breaks down is if the input path starts with /assets, which is the folder we're already using for static things. Not sure what to do about that: I guess we could try to generate a unique name for the assets folder or something. Open to suggestions here!

Otherwise, I think this strategy solves most of the issues listed above. Please let me know your feedback and make any suggestions you think would improve this strategy!

cc. @zeakd @songlipeng2003 @ssuman @Munter @benhutton @shanebo @leeching @gamebox @npup

@devongovett devongovett added the 💬 RFC Request For Comments label Feb 20, 2018
@jsiebern
Contributor

I think it's a great strategy, though I'd like a CLI option for not putting the assets in a subfolder! Is index._hash_.js based on the root file name or on the bundle file name?

@Munter

Munter commented Feb 21, 2018

I recommend not making up a unique name for the assets folder, at least not one that changes per build. The point of correct hashing is to get content-addressable URLs, so the assets directory has to be predictable and identical across runs so that caching headers can be configured as immutable for that path.

I'd just prepend an underscore or two, just to lessen the likelihood of name clashes by departing from nice human names: __assets

@Siyfion

Siyfion commented Feb 21, 2018

I think the strategy looks good. As for the assets folder, I agree with @Munter, perhaps just double underscore, _assets_ or some variant thereof. I think having a unique name for the assets folder could cause issues; for example when invalidating files on AWS CloudFront / CDNs, etc.

@benhutton

Why not just let it default to assets and be overridden by a command line option (like we do with out-dir)? You could specify another name for the folder, or no folder at all.

@npup

npup commented Feb 21, 2018 via email

@zeakd

zeakd commented Feb 21, 2018

Looks good! I think it is perfect for HTML. But is there some reason to collect all hashed files into an assets folder? I don't know cache control deeply, but IMO this Parcel-specific convention will end up needing more and more options.

I mean, how about mapping src/some/path/something.js to dist/some/path/something.fd5se2.js, and writing src/assets/index.js if you need an assets folder?

I think that with Parcel-specific conventions like this, Parcel will need more and more options for customization.
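A rough sketch of the alternative described here: mirror the source tree under the output directory and insert a content hash before the extension, so an assets folder exists only if the source has one. The function name hashedMirrorPath and the six-character MD5 prefix are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Map <srcRoot>/<dir>/<name><ext> to <outDir>/<dir>/<name>.<hash><ext>.
function hashedMirrorPath(srcRoot: string, outDir: string, file: string, contents: string): string {
  const rel = path.posix.relative(srcRoot, file); // e.g. "some/path/something.js"
  const dir = path.posix.dirname(rel); // "some/path" ("." for files at the root)
  const ext = path.posix.extname(rel); // ".js"
  const stem = path.posix.basename(rel, ext); // "something"
  const hash = createHash("md5").update(contents).digest("hex").slice(0, 6);
  return path.posix.join(outDir, dir, `${stem}.${hash}${ext}`);
}
```

The trade-off is that hashed files are no longer guaranteed to sit under one predictable prefix; they do only if the source tree puts them there.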

@devongovett
Member Author

devongovett commented Feb 21, 2018

That's certainly an option. We could just put all of the hashed assets in the root. It was requested in #233 and elsewhere to put static files in a separate folder though, so I was trying to accommodate that. I guess it makes it easier to separate things you might upload to a CDN from things you need to put on your webserver? Why did you need that @npup?

This option would look like:

dist/
├── index.html
├── something
│   └── about.html
└── index.a8b29e.js

Alternatively, we could make two roots: one for static assets, and one for HTML. So the output would be:

dist/
├── html
│   ├── index.html
│   └── something
│       └── about.html
└── static
    └── index.a8b29e.js

Then you could easily upload the html directory to your webserver, and static to your cdn.

@zeakd

zeakd commented Feb 22, 2018

@devongovett Parcel is useful for building static-folder sites like GitHub Pages. However, two roots by default would not work for a static-folder site, and it looks a little ugly with the --public-url option, e.g. PUBLICURL/../static/index.a6gy7d.js

@SmileyChris

SmileyChris commented Feb 22, 2018

I like leaving them in the root as the simple default.

Just provide a new option for the build directory for entry points (the non-asset files), defaulting to the same as --out-dir. Your second example, @devongovett, would be reproducible with --out-dir=dist/static --entry-point-dir=dist/html.

@zeakd --public-url is really only required for the hashed assets, since the others keep (and can point to) their relative location. So it doesn't need to look ugly with --public-url.
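A sketch of how the directory choice in this proposal could resolve. The --entry-point-dir option is only proposed in this thread, not an existing Parcel flag; the DirOptions shape and targetDir helper are hypothetical.

```typescript
// Hypothetical CLI options; entryPointDir mirrors the proposed --entry-point-dir.
interface DirOptions {
  outDir: string; // --out-dir
  entryPointDir?: string; // --entry-point-dir (proposed), defaults to outDir
}

// Entry points (HTML and other name-preserving files) go to entryPointDir;
// hashed assets always go to outDir.
function targetDir(isEntryPoint: boolean, opts: DirOptions): string {
  return isEntryPoint ? (opts.entryPointDir ?? opts.outDir) : opts.outDir;
}
```

With only --out-dir=dist everything lands in dist; with --out-dir=dist/static --entry-point-dir=dist/html the result is the two-root layout from the earlier comment.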

@Chathula

This feature is a must-have. +1 for this.

I like the folder structure that @devongovett mentioned here.

dist/
├── html
│   ├── index.html
│   └── something
│       └── about.html
└── static
    └── index.a8b29e.js

@Munter

Munter commented Feb 22, 2018

Please do not move URL-addressable and browser-navigable files into an HTML folder. It is crucial that these files keep the same folder structure from the web root as in your source directory. If you fail to do this, it has implications for how a web server has to be set up, which undermines the ability to use standard static hosting and servers.

It's important to put hashed assets into a specific folder with a predictable name, so that it is easy to configure cache headers for these immutable files without having to configure regex matches in your server.
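To illustrate why a predictable folder helps: with one fixed prefix, a server can choose the Cache-Control header with a plain prefix check instead of matching hash patterns. A minimal sketch, assuming the hashed files live under /__assets/ (the name suggested earlier in the thread; any fixed name works) and a one-year max-age; the cacheControlFor helper is hypothetical.

```typescript
// Fixed directory for content-hashed files (assumed name).
const HASHED_PREFIX = "/__assets/";

// Pick a Cache-Control value from the request path alone.
function cacheControlFor(urlPath: string): string {
  if (urlPath.startsWith(HASHED_PREFIX)) {
    // Content-addressed: a new build produces a new URL, so this one never changes.
    return "public, max-age=31536000, immutable";
  }
  // HTML and other name-preserving files must be revalidated to pick up new hashes.
  return "no-cache";
}
```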

@npup

npup commented Feb 22, 2018 via email

@swernerx

I would suggest using base62 instead of hex, as this leads to shorter hash IDs. This is especially powerful with a fast hash algorithm like xxhash. See also my little side project: https://github.com/sebastian-software/asset-hash
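A sketch of the length difference, using SHA-256 from Node's built-in crypto module in place of xxhash (which Node does not ship) and keeping the first 64 bits of the digest. Hex needs 16 characters for 64 bits, while base62 needs at most 11, since each base62 digit carries about 5.95 bits. The alphabet order and the toBase62/shortIds helpers are assumptions; BigInt literals need a TypeScript target of ES2020 or later.

```typescript
import { createHash } from "node:crypto";

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Treat the bytes as one big-endian unsigned integer and write it in base 62.
function toBase62(bytes: Uint8Array): string {
  let n = 0n;
  for (const b of bytes) n = (n << 8n) | BigInt(b);
  if (n === 0n) return "0";
  let out = "";
  while (n > 0n) {
    out = BASE62[Number(n % 62n)] + out;
    n /= 62n; // BigInt division truncates
  }
  return out;
}

// Two encodings of the same 64-bit prefix of a SHA-256 content hash.
function shortIds(contents: string): { hex: string; base62: string } {
  const digest = createHash("sha256").update(contents).digest().subarray(0, 8);
  return { hex: digest.toString("hex"), base62: toBase62(digest) };
}
```

Leading zeros are dropped, so base62 IDs can be shorter than 11 characters; pad them if fixed-width names are wanted.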

@jamiebuilds
Member

I think this goes along with multiple entry points (#189).

  • The only reason you'd care about the name of a file is if you need to be linking to it somehow (.html files become website URLs, .js files become importable modules)
  • If you need to link to it, that should be considered an "entry" point
  • All entry points should have unique names and should generate those names in the output
  • All other generated modules are implementation details, we can try to make the names nicer, but they are an implementation detail that can and will change and should not be relied upon.

@fu5ha
Contributor

fu5ha commented Feb 23, 2018

In order for hashes of file contents to mean anything for cache busting and/or versioning, those files (and therefore hashes) need to be the same when you build with the same input code. This is not currently the case, so I think something along the lines of #780 needs to be merged with or before this.
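A sketch of one way to keep a content hash stable across builds: hash the bundle's modules in a canonical order (sorted by ID) rather than in whatever order they finished processing. The ModuleOutput shape and bundleHash helper are hypothetical, and the sketch assumes module IDs are already stable between builds (for example derived from relative file paths rather than processing order); without that, no hashing scheme is reproducible.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-module build output.
interface ModuleOutput {
  id: string; // assumed stable across builds
  code: string;
}

// The hash depends only on the (id, code) pairs, not on the array order.
function bundleHash(modules: ModuleOutput[]): string {
  const sorted = [...modules].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const h = createHash("sha256");
  for (const m of sorted) {
    // NUL separators keep ("ab", "c") distinct from ("a", "bc").
    h.update(m.id).update("\0").update(m.code).update("\0");
  }
  return h.digest("hex").slice(0, 8);
}
```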

@bitkomponist

bitkomponist commented Feb 23, 2018

Would it be sensible to make the naming configurable if needed (e.g. via a .rc file), and in that case just resort to old-school GET parameter cache busting?
The .rc file could look like this (I'm thinking asset-type based):

{
  "html": "dist/[name].html",
  "css": "dist/css/[name].css",
  "js": "dist/js/[name].[package.version].js"
}
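A minimal sketch of how such a per-type config could be applied, assuming bracketed placeholders are replaced from a plain variable map and unknown placeholders are left untouched. The NamingConfig type and the expandTemplate and cacheBust helpers are hypothetical; cacheBust shows the query-string fallback for names that carry no hash.

```typescript
// Per-type naming rules, as in the .rc example (hypothetical format).
type NamingConfig = Record<string, string>;

// Replace [key] placeholders; leave unknown ones as-is.
function expandTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\[([^\]]+)\]/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
  );
}

// Query-string cache busting for names that carry no hash.
function cacheBust(url: string, version: string): string {
  return `${url}${url.includes("?") ? "&" : "?"}v=${encodeURIComponent(version)}`;
}

const config: NamingConfig = {
  html: "dist/[name].html",
  css: "dist/css/[name].css",
  js: "dist/js/[name].[package.version].js",
};
```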

@Munter

Munter commented Feb 23, 2018

Adding naming configuration should really not be needed. A good default is perfectly fine. Keep the name of the asset if there was one. If it's not a linkable entry point, inject a hash into the name to achieve content addressability.

The only possible use for naming configuration would be if you have your static assets served through an external proxy CDN, so you need to update the URLs in the parent asset from /assets/foo-hash.png to https://mycdn.host.com/assets/foo-hash.png, for example.
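A sketch of that rewrite, assuming a hypothetical withPublicUrl helper that joins a root-relative asset path onto a CDN base using the WHATWG URL class available globally in Node and browsers.

```typescript
// Join a root-relative asset URL onto a public (CDN) base URL.
function withPublicUrl(publicUrl: string, assetPath: string): string {
  const base = publicUrl.endsWith("/") ? publicUrl : publicUrl + "/";
  // Strip the leading slash so the path resolves under the base path, not the host root.
  return new URL(assetPath.replace(/^\/+/, ""), base).toString();
}
```

Only the hashed assets need this treatment; name-preserving files keep their relative locations.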

@ioss

ioss commented Mar 7, 2018

As @zeakd wrote:
"I mean, how about src/some/path/something.js just to dist/some/path/something.fd5se2.js? and write src/assets/index.js if you needs assets folder."

IMHO this rule is the best implicit "asset" folder configuration option (because it is somewhat explicit, but does not need any additional options).
Apart from being able to keep meaningful names for assets if they are used elsewhere (downloaded or might also be SEO relevant), it also helps a lot to backreference the original source.

@Munter

Munter commented Mar 8, 2018

@ioss not gathering the content-addressable files in a common directory makes it harder to configure a server to send an immutable cache header for them.

@devongovett devongovett added this to the v1.7.0 milestone Mar 9, 2018
@ioss

ioss commented Mar 9, 2018

@Munter I don't understand how the above "rule" from zeakd would prevent you from gathering files in a common directory (or a handful of them). Just put your assets in /src/assets/... and they would end up in /dist/assets/..., which gives you the original proposal.
The only difference is the path flattening, which shouldn't matter for the server's "immutable cache header" rule.

Conversely: should you (for whatever reason) not want to send immutable cache headers for some of the files, you'd have a hard time excluding them, especially as they would change their names every time their content changes.

devongovett added a commit that referenced this issue Mar 19, 2018
@devongovett
Member Author

Implemented this strategy in #1025. Please let me know what you think and help test it out!

@devongovett
Member Author

Closing since #1025 is merged. Please help test using the master branch - a release will hopefully come next week!

@yonimor

yonimor commented Oct 23, 2018

would it be sensible to make the naming configurable if needed (e.g. via a .rc file), and in that case just resort to old school get parameter cache busting?
the .rc file could look like this (im thinking asset type based):

{
  "html": "dist/[name].html",
  "css": "dist/css/[name].css",
  "js": "dist/js/[name].[package.version].js"
}

I strongly support using an (optional) configuration file here.
Folder and file structure have a few implications on projects, especially at scale.
For example, I may need to scan all image assets in my project to do some OCR (a true use case from a past job), where sorting images into folders may have positive performance implications (I'm talking thousands of images).
Please consider this solution.

@Munter

Munter commented Oct 24, 2018

@yonimor scan your source folder, not your build artefacts
