Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize and simplify the library itself (not the polyfills) #37

Open
2 tasks done
romainmenke opened this issue May 26, 2024 · 0 comments
Open
2 tasks done

Modernize and simplify the library itself (not the polyfills) #37

romainmenke opened this issue May 26, 2024 · 0 comments

Comments

@romainmenke
Copy link
Member

romainmenke commented May 26, 2024

The polyfill-library npm package is currently three things:

  1. a collection of polyfills
  2. a script to expose information about the collection
  3. an opinionated implementation of how to read files from disk

I personally think we should drop (3)

It makes these assumptions:

  • reading from disk is slow
  • memory is cheap and abundant
  • there is no cache in front of this library
  • caching is needed
  • it will be part of a long running process
  • reading from the filesystem will never error

I think these assumptions are only minimally helpful while also being naive and harmful.

Users of this library are much better positioned to implement filesystem reads within their specific context and constraints.

This does make it harder to use this library but only marginally so.


At the same time we should consider modernizing the source code of the library

  • publish as esm only
  • publish types
  • top level async await
  • cleanup dependencies
  • move all data to sqlite when sqlite support stabilizes
  • ...
romainmenke added a commit that referenced this issue Jun 30, 2024
romainmenke added a commit that referenced this issue Oct 8, 2024
see : #37

The current implementation of reading files and creating polyfill
bundles doesn't really make sense.

It is both streaming and cached.
These are not things you can/should combine.

Either you make something streaming because you are concerned about
memory usage.
Or you have a cache and try to serve directly from memory.

So the current implementation is needlessly complex while also being
slow and using too much memory.

The slowness is mostly the result of having too many cache misses and an
implementation that is way too complex for what is actually going on.

------------

This change:
- make it a real streaming implementation
- reduce the number of filesystem operations by combining the
`meta.json` files for all polyfills and lazily loading this in memory
- reduce the number of async operations, especially waiting on things
that are actually sync reads
- pre-compute the dependency graph during the build process and store
the toposort order in `meta.json`
- remove as many needless conversions from streams to strings and back
- make the `fs` functions user configurable

------------

Configurable `fs` functions:

```js
/**
 * Set the file system implementation to use.
 * The default implementation uses the Node.js `fs` module.
 *
 * @param {readFile} readFile - A function that reads a file and returns a promise.
 * @param {createReadStream} createReadStream - A function that creates a read stream.
 */
function setFs(readFile, createReadStream) {
	fsImpl = {
		readFile,
		createReadStream
	};
}
```

This is a replacement for those users who want caching or other
optimizations.

I don't want to add an explicit cache mechanic as I still believe that
an in memory cache implemented in JavaScript is dangerous when running
this as a hosted service.

It is too easy to request different polyfills one by one, thrashing both
the `LRU` cache layer and the garbage collector to eventually starve the
host of both memory and CPU.

Either you want to have an edge cache in front of a hosted service
(making in memory cache less relevant) or you want to use something like
sqlite to optimize filesystem reads.

-------

I don't consider the removal of the `LRU` cache a breaking change
because this was an internal implementation detail. But I am open to
other opinions here :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant