Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: represent module source as [u8] #152

Closed
wants to merge 3 commits into from
Closed

Conversation

lucacasonato
Copy link
Member

Previously module source was represented as String. This does not work for
non-textual formats like WASM. As such, module source is now represented as
[u8].

In JS, the module source is now represented as a Uint8Array.

Towards #150

lucacasonato added a commit to denoland/deno_ast that referenced this pull request Apr 28, 2022
`Arc<String>` to `Arc<str>` (double heap pointer -> single heap pointer)

Also enforces that BOM handling is a `deno_graph` concern. See
denoland/deno_graph#152
src/graph.rs Outdated
@@ -622,7 +622,7 @@ impl Module {
kind: ModuleKind,
parsed_source: ParsedSource,
) -> Self {
let source = parsed_source.source().text();
let source = parsed_source.source().text_str().as_bytes().to_vec();
Copy link
Member

@dsherret dsherret Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will increase our memory usage which might be significant for large files. For maybe_source, perhaps we should use an enum instead that is either Arc<str> or Arc<u8> and it shares the memory from parsed_source.source().text()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc<str> and Arc<[u8]> technically have the same underlying memory layout, so we could convert between the two by just transmuting (this is unsafe though). Very painful that this is really just a Rust type system limitation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change the internal property to non-public, for the instances when we don't have a parsed module (e.g. JSON modules) and store it as an Arc<[u8]> only when we don't have a parsed source. Then provide a pub fn source() -> Option<Arc<str>> and pub fn source_bytes() -> Option<Arc<[u8]>> which we generate on the fly?

lucacasonato added a commit to denoland/deno_ast that referenced this pull request Apr 28, 2022
`Arc<String>` to `Arc<str>` (double heap pointer -> single heap pointer)

Also enforces that BOM handling is a `deno_graph` concern. See
denoland/deno_graph#152
@lucacasonato lucacasonato force-pushed the string_to_u8 branch 3 times, most recently from 4b380c5 to 6a574a5 Compare May 20, 2022 12:39
Previously, if a specifier was imported more than once, with conflicting
assertions, we'd only really consider the final import. This is wrong:
conflicting assertions are an error in-themselves already. I think this
behaviour is still wrong, because if a module is imported with incorrect
assertions later in the graph after the module has already been loaded by
an import with a correct assertion, no error will be raised.
This is a pre-requisite for WASM imports that additionally allows us to
minimize clones when doing things like BOM stripping.
Previously module source was represented as `String`. This does not work
for non-textual formats like WASM. As such, module source is now
represented as `[u8]`.
EinsteinCCC pushed a commit to EinsteinCCC/deno_ast that referenced this pull request Oct 14, 2024
`Arc<String>` to `Arc<str>` (double heap pointer -> single heap pointer)

Also enforces that BOM handling is a `deno_graph` concern. See
denoland/deno_graph#152
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants