
Add "associated data" to support namespaces #29


Closed
kmcallister opened this issue Oct 18, 2014 · 3 comments

@kmcallister
Contributor

The conceptual model for Atom becomes

struct Atom<T = ()> {
    string: String,
    assoc: Option<T>,
}

Any atom where assoc.is_some() is interned in the dynamic table. html5ever and Servo will use this to store namespace and prefix (see #26), with None representing the HTML namespace and an empty prefix. This shrinks a qualified name to a single word, without any performance penalty to HTML names.
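A minimal sketch of that model, assuming a single interning table that hands out `u32` handles so that a name plus its namespace and prefix fits in one word. The `AtomKey`, `NsPrefix` and `Interner` names are hypothetical illustrations, not string-cache's API. Under the proposal, atoms with `assoc == None` would keep using the existing static/inline representation; this sketch interns everything for brevity.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical associated data: namespace URI plus prefix.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct NsPrefix {
    ns: String,
    prefix: String,
}

/// The conceptual atom from the proposal: the string plus optional data.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct AtomKey<T> {
    string: String,
    assoc: Option<T>,
}

/// Hypothetical dynamic table mapping each distinct key to a one-word handle.
struct Interner<T> {
    ids: HashMap<AtomKey<T>, u32>,
    entries: Vec<AtomKey<T>>,
}

impl<T: Clone + Eq + Hash> Interner<T> {
    fn new() -> Self {
        Interner { ids: HashMap::new(), entries: Vec::new() }
    }

    /// Returns the existing handle for `key`, or allocates a new one.
    fn intern(&mut self, key: AtomKey<T>) -> u32 {
        if let Some(&id) = self.ids.get(&key) {
            return id;
        }
        let id = self.entries.len() as u32;
        self.entries.push(key.clone());
        self.ids.insert(key, id);
        id
    }

    /// Recovers the string and associated data from a handle.
    fn resolve(&self, id: u32) -> &AtomKey<T> {
        &self.entries[id as usize]
    }
}

fn main() {
    let mut table: Interner<NsPrefix> = Interner::new();
    let xlink = NsPrefix {
        ns: "http://www.w3.org/1999/xlink".to_string(),
        prefix: "xlink".to_string(),
    };
    let href = table.intern(AtomKey { string: "href".to_string(), assoc: Some(xlink.clone()) });
    let again = table.intern(AtomKey { string: "href".to_string(), assoc: Some(xlink) });
    // Interning the same (string, assoc) pair twice yields the same handle.
    assert_eq!(href, again);
    println!("handle {} -> {:?}", href, table.resolve(href));
}
```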

One complication is that we need two different notions of equality on this associated data. An XML document can contain nodes which use different prefixes to produce the same expanded name (the same namespace URI and local name). We can't combine these in the interning table, because it's possible to read the original prefix out of the DOM. But when we're comparing atoms for equality we need to ignore the prefix.
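The two notions of equality can be sketched as follows, assuming the associated data is a hypothetical `NsPrefix { ns, prefix }` struct. The derived `Eq`/`Hash` (full identity, prefix included) is what the interning table would use, while a separate hypothetical method `same_expanded_name` compares only the namespace URI and local name.

```rust
use std::collections::HashSet;

/// Hypothetical associated data for an XML name: namespace URI plus prefix.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct NsPrefix {
    ns: String,
    prefix: String,
}

/// A name as stored in the interning table. The derived `Eq`/`Hash`
/// compare every field, prefix included, so `svg:rect` and `s:rect`
/// get separate entries and the original prefix stays recoverable.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct InternKey {
    local: String,
    assoc: NsPrefix,
}

impl InternKey {
    fn new(ns: &str, prefix: &str, local: &str) -> Self {
        InternKey {
            local: local.to_string(),
            assoc: NsPrefix { ns: ns.to_string(), prefix: prefix.to_string() },
        }
    }

    /// The second notion of equality: namespace URI and local name only,
    /// ignoring the prefix. This is what name comparisons need.
    fn same_expanded_name(&self, other: &InternKey) -> bool {
        self.local == other.local && self.assoc.ns == other.assoc.ns
    }
}

const SVG_NS: &str = "http://www.w3.org/2000/svg";

fn main() {
    let a = InternKey::new(SVG_NS, "svg", "rect");
    let b = InternKey::new(SVG_NS, "s", "rect");

    let mut table = HashSet::new();
    table.insert(a.clone());
    table.insert(b.clone());

    // Two table entries, but one expanded name.
    println!("entries: {}", table.len());
    println!("same expanded name: {}", a.same_expanded_name(&b));
}
```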

Also, the global interning table needs to be aware of the type T somehow. We could have a set of tables indexed by the type T, hopefully resolving the polymorphism at compile time. (Really, it's more like the entire crate should be parameterized on T.) In C++ I would use a static member variable in a class template, but I don't know of any analogous mechanism in Rust. (I checked: a static mut inside a generic function is shared by all instantiations rather than duplicated for each T.)
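A sketch of both the pitfall and one possible workaround. The Rust reference states that a `static` declared in a generic scope produces exactly one static item, which `shared_counter` shows. The workaround assumed here, a hypothetical `AssocData` trait whose per-type impls each declare their own `static` table, is closer to a C++ static member of a class template: generic code calls `T::table()` and the choice of table is resolved at monomorphization time. The trait, the marker types and `intern` are illustrations only, not part of string-cache.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// The pitfall: this `static` is NOT duplicated per type parameter.
/// Every instantiation of `shared_counter::<T>` returns the same counter.
fn shared_counter<T>() -> &'static AtomicUsize {
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    &COUNTER
}

/// Hypothetical workaround: each associated-data type names its own table.
trait AssocData {
    fn table() -> &'static Mutex<Vec<String>>;
}

/// Hypothetical marker types standing in for two kinds of associated data.
struct Namespaces;
struct OtherData;

impl AssocData for Namespaces {
    fn table() -> &'static Mutex<Vec<String>> {
        // Not in a generic scope, so this is a distinct static.
        static TABLE: Mutex<Vec<String>> = Mutex::new(Vec::new());
        &TABLE
    }
}

impl AssocData for OtherData {
    fn table() -> &'static Mutex<Vec<String>> {
        static TABLE: Mutex<Vec<String>> = Mutex::new(Vec::new());
        &TABLE
    }
}

/// Generic interning that picks the per-type table statically.
/// A linear scan keeps the sketch short; a real table would hash.
fn intern<T: AssocData>(s: &str) -> usize {
    let mut table = T::table().lock().unwrap();
    if let Some(i) = table.iter().position(|e| e == s) {
        return i;
    }
    table.push(s.to_string());
    table.len() - 1
}

fn main() {
    // Bumping the counter "for u8" is visible "for String" too.
    shared_counter::<u8>().fetch_add(1, Ordering::Relaxed);
    println!("shared: {}", shared_counter::<String>().load(Ordering::Relaxed));

    // The per-type tables really are separate.
    let a = intern::<Namespaces>("http://www.w3.org/1999/xhtml");
    let b = intern::<Namespaces>("http://www.w3.org/1999/xhtml");
    assert_eq!(a, b);
    assert!(!std::ptr::eq(Namespaces::table(), OtherData::table()));
}
```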

@kmcallister
Contributor Author

without any performance penalty to HTML names

This is true for parsing the HTML syntax, but it's not quite true within Servo. For example, we need to recognize that

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns:bleh="http://www.w3.org/1999/xhtml">
<bleh:img src="http://www.rust-lang.org/logos/rust-logo-256x256-blk.png" />
</html>

contains an HTMLImageElement. This is a problem because

match node.name() {
    atom!(img) => ...

will produce pretty complicated machine code. (And even getting the macro to work at all will be tricky.) A name that matches could be a static/inline atom or it could be a dynamic atom with any prefix or none. The best hope is to destructure node.name() first and handle the (unlikely) case where it's a dynamic atom in a totally separate cold path, but I don't know if we can convince LLVM to do that reliably.
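The "destructure first, cold path for dynamic atoms" idea might look like this sketch. The two-variant `Name` enum, `DynamicName`, and the static index `STATIC_IMG` are hypothetical. `#[cold]` and `#[inline(never)]` are real attributes, but they are only hints; as the comment above says, there is no guarantee LLVM lays the code out as intended.

```rust
const XHTML_NS: &str = "http://www.w3.org/1999/xhtml";
/// Hypothetical static-atom index for "img" (HTML namespace implied).
const STATIC_IMG: u32 = 7;

/// Hypothetical layout: common names are static atoms identified by a
/// small integer; anything carrying namespace/prefix data is dynamic.
#[derive(Debug)]
enum Name {
    Static(u32),
    Dynamic(Box<DynamicName>),
}

#[derive(Debug)]
struct DynamicName {
    ns: String,
    prefix: Option<String>,
    local: String,
}

/// Hot path: one integer comparison in the common static-atom case.
fn is_html_img(name: &Name) -> bool {
    match name {
        Name::Static(id) => *id == STATIC_IMG,
        Name::Dynamic(d) => is_html_img_slow(d),
    }
}

/// Cold path: a dynamic name is an HTML <img> if its namespace and local
/// name match, whatever its prefix (so `bleh:img` counts).
#[cold]
#[inline(never)]
fn is_html_img_slow(d: &DynamicName) -> bool {
    d.ns == XHTML_NS && d.local == "img"
}

fn main() {
    let names = vec![
        Name::Static(STATIC_IMG),
        Name::Dynamic(Box::new(DynamicName {
            ns: XHTML_NS.to_string(),
            prefix: Some("bleh".to_string()),
            local: "img".to_string(),
        })),
    ];
    for n in &names {
        println!("{:?} -> {}", n, is_html_img(n));
    }
}
```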

For this reason I think atoms should include the namespace URI, which is canonical and part of the atom's identity, but not the prefix, which is a detail about how that URI was obtained. html5ever will have its own "atom with prefix" type which Servo can also use.
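A sketch of that split, under the assumption that the atom carries the namespace URI and local name while a separate parser-side type carries the prefix. `NsAtom`, `PrefixedName` and `html_name` are hypothetical names. Because the prefix is not a field of `NsAtom`, the derived `Eq` already gives the prefix-insensitive comparison, and only one notion of equality is needed on the atom.

```rust
/// Hypothetical atom whose identity is namespace URI + local name.
/// The prefix is deliberately not a field, so derived `Eq` ignores it.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct NsAtom {
    ns: String,
    local: String,
}

/// Hypothetical "atom with prefix" on the parser side: keeps the prefix
/// for serialization and DOM APIs, but matching goes through `name`.
#[derive(Clone, Debug)]
struct PrefixedName {
    prefix: Option<String>,
    name: NsAtom,
}

impl PrefixedName {
    /// The qualified name as written in the source, e.g. "bleh:img".
    fn qualified(&self) -> String {
        match &self.prefix {
            Some(p) => format!("{}:{}", p, self.name.local),
            None => self.name.local.clone(),
        }
    }
}

const XHTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Builds a name in the XHTML namespace with an optional prefix.
fn html_name(prefix: Option<&str>, local: &str) -> PrefixedName {
    PrefixedName {
        prefix: prefix.map(String::from),
        name: NsAtom { ns: XHTML_NS.to_string(), local: local.to_string() },
    }
}

fn main() {
    let plain = html_name(None, "img");
    let bleh = html_name(Some("bleh"), "img");
    // Same atom, so a single equality check recognizes both as <img>.
    assert_eq!(plain.name, bleh.name);
    println!("{} and {} share one atom", plain.qualified(), bleh.qualified());
}
```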

A remaining problem is that the default namespace for attributes is "" while the default namespace for HTML tag names is "http://www.w3.org/1999/xhtml". So the meaning of assoc: None in Servo has to be contextual, which is not great.
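To make the contextual meaning of `None` concrete, here is a small sketch assuming a hypothetical `NameContext` enum and `resolve_ns` helper: the same `None` resolves to the XHTML namespace for an element name and to the empty namespace for an attribute name, so every consumer must know which kind of name it holds.

```rust
const XHTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Where a name appears; hypothetical, for illustration.
#[derive(Clone, Copy, Debug)]
enum NameContext {
    Element,
    Attribute,
}

/// `None` means "the default namespace", whose value depends on context.
fn resolve_ns(assoc: Option<&str>, ctx: NameContext) -> &str {
    match (assoc, ctx) {
        (Some(ns), _) => ns,
        (None, NameContext::Element) => XHTML_NS,
        (None, NameContext::Attribute) => "",
    }
}

fn main() {
    println!("{:?}", resolve_ns(None, NameContext::Element));
    println!("{:?}", resolve_ns(None, NameContext::Attribute));
}
```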

@Ms2ger
Contributor

Ms2ger commented Oct 19, 2014

I don't think this design makes sense conceptually for prefixes; prefixes are completely independent of the namespace outside the HTML parser.

@SimonSapin
Member

I’m also not convinced by this proposal, and we’ve been doing ok without it all this time. And with #178 string-cache is becoming a more general-purpose library that knows nothing about HTML or the DOM.

Closing as wontfix.
