Add "associated data" to support namespaces #29

kmcallister · 2014-10-18T20:14:18Z

The conceptual model for Atom becomes

struct Atom<T = ()> {
    string: String,
    assoc: Option<T>,
}

Any atom where assoc.is_some() is interned in the dynamic table. html5ever and Servo will use this to store namespace and prefix (see #26), with None representing the HTML namespace and an empty prefix. This shrinks a qualified name to a single word, without any performance penalty to HTML names.

One complication is that we need two different notions of equality on this associated data. An XML document can contain nodes which use different prefixes to produce the same qualified name. We can't combine these in the interning table, because it's possible to read the original prefix out of the DOM. But when we're comparing atoms for equality we need to ignore the prefix.
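A minimal sketch of the two notions of equality, under stated assumptions: `NsData`, `Key`, `Interner`, `same_entry` and the `Rc`-based storage are hypothetical names chosen for illustration, not string-cache's actual layout. The interning table keys on the full data (prefix included), while `PartialEq` on the atom ignores the prefix and treats `None` as the HTML namespace.

```rust
use std::collections::HashMap;
use std::rc::Rc;

pub const HTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Hypothetical associated data for a qualified name.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct NsData {
    pub namespace: String,
    pub prefix: Option<String>,
}

/// Interning key: the full identity, prefix included, so the original
/// prefix can still be read back out of the DOM.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
struct Key {
    string: String,
    assoc: Option<NsData>,
}

#[derive(Debug, Clone)]
pub struct Atom(Rc<Key>);

impl Atom {
    pub fn prefix(&self) -> Option<&str> {
        self.0.assoc.as_ref().and_then(|d| d.prefix.as_deref())
    }

    /// `assoc: None` stands for the HTML namespace, as in the model above.
    pub fn namespace(&self) -> &str {
        self.0.assoc.as_ref().map_or(HTML_NS, |d| d.namespace.as_str())
    }

    /// Table identity: true only if both atoms share one table entry.
    pub fn same_entry(&self, other: &Atom) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

/// Name equality: string and namespace are compared, the prefix is ignored.
impl PartialEq for Atom {
    fn eq(&self, other: &Atom) -> bool {
        self.0.string == other.0.string && self.namespace() == other.namespace()
    }
}

#[derive(Default)]
pub struct Interner {
    table: HashMap<Key, Rc<Key>>,
}

impl Interner {
    pub fn intern(&mut self, string: &str, assoc: Option<NsData>) -> Atom {
        let key = Key { string: string.to_owned(), assoc };
        let entry = self.table.entry(key.clone()).or_insert_with(|| Rc::new(key));
        Atom(Rc::clone(entry))
    }
}
```

With this split, `a:img` and `b:img` in the same namespace occupy two table entries yet compare equal, which also means equality can no longer be a single pointer comparison for dynamic atoms.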

Also, the global interning table needs to be aware of the type T somehow. We could have a set of tables indexed by the type T, hopefully resolving the polymorphism at compile time. (Really, it is more as if the entire crate were parametrized on T.) In C++ I would use a static member variable in a class template, but I don't know of any analogous mechanism in Rust. (I checked: a static mut inside a generic function is shared across all instantiations.)
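One way to get a set of tables indexed by T is a global registry keyed by `TypeId`. This is only a sketch under assumptions: `registry`, `intern` and `Table` are hypothetical names, the lookup is resolved at run time rather than at compile time as hoped above, and it relies on `std::sync::OnceLock`, which did not exist when this issue was filed.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical per-type interning table.
type Table<T> = HashMap<(String, T), Arc<(String, T)>>;

/// One global registry holding a separate, type-erased table per type T.
fn registry() -> &'static Mutex<HashMap<TypeId, Box<dyn Any + Send>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<TypeId, Box<dyn Any + Send>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Interns `(string, assoc)` in the table that belongs to type `T`.
pub fn intern<T>(string: &str, assoc: T) -> Arc<(String, T)>
where
    T: Clone + Eq + Hash + Send + Sync + 'static,
{
    let mut tables = registry().lock().unwrap();
    let table = tables
        .entry(TypeId::of::<T>())
        .or_insert_with(|| Box::new(Table::<T>::new()) as Box<dyn Any + Send>)
        .downcast_mut::<Table<T>>()
        .expect("the table stored under TypeId::of::<T>() is a Table<T>");
    let key = (string.to_owned(), assoc);
    Arc::clone(table.entry(key.clone()).or_insert_with(|| Arc::new(key)))
}
```

The extra map lookup and downcast on every call are the cost of not having a per-instantiation static; whether that is acceptable on a hot interning path is a separate question.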

kmcallister · 2014-10-18T22:26:39Z

> without any performance penalty to HTML names

This is true for parsing the HTML syntax, but it's not quite true within Servo. For example we need to recognize that

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns:bleh="http://www.w3.org/1999/xhtml">
<bleh:img src="http://www.rust-lang.org/logos/rust-logo-256x256-blk.png" />
</html>

contains an HTMLImageElement. This is a problem because

match node.name() {
    atom!(img) => ...,
    _ => ...,
}

will produce pretty complicated machine code. (And even getting the macro to work at all will be tricky.) A name that matches could be a static/inline atom or it could be a dynamic atom with any prefix or none. The best hope is to destructure node.name() first and handle the (unlikely) case where it's a dynamic atom in a totally separate cold path, but I don't know if we can convince LLVM to do that reliably.
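The "destructure first, cold path for dynamic atoms" idea could look roughly like the following. It is a sketch under assumptions: `Name`, `DynamicName`, the id `IMG` and both function names are hypothetical, and `#[cold]` with `#[inline(never)]` is only a hint to the optimizer, so this does not settle whether LLVM will reliably produce the desired code.

```rust
use std::rc::Rc;

/// Hypothetical id assigned to the static atom for "img".
pub const IMG: u32 = 7;
pub const HTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Hypothetical dynamic-table entry carrying namespace and prefix.
pub struct DynamicName {
    pub local: String,
    pub namespace: String,
    pub prefix: Option<String>,
}

pub enum Name {
    /// Static atom: a plain integer, cheap to compare.
    Static(u32),
    /// Dynamic atom: any prefix (or none) may be attached.
    Dynamic(Rc<DynamicName>),
}

/// Unlikely case, kept out of line. The prefix is ignored on purpose.
#[cold]
#[inline(never)]
fn dynamic_is_img(name: &DynamicName) -> bool {
    name.namespace == HTML_NS && name.local == "img"
}

/// Fast path: destructure first, compare the static id inline.
#[inline]
pub fn is_img(name: &Name) -> bool {
    match name {
        Name::Static(id) => *id == IMG,
        Name::Dynamic(d) => dynamic_is_img(d),
    }
}
```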

For this reason I think atoms should include the namespace URI, which is canonical and part of the atom's identity, but not the prefix, which is a detail about how that URI was obtained. html5ever will have its own "atom with prefix" type which Servo can also use.
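A sketch of that split, with hypothetical names (`PrefixedAtom` and `qualified` are illustrative, not html5ever's API) and plain `String` fields standing in for interned data. The namespace is always spelled out here, so the sketch does not model `assoc: None` as a shortcut for a default namespace.

```rust
/// The atom proper: the namespace URI is part of its identity.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Atom {
    pub string: String,
    pub namespace: String,
}

/// Hypothetical "atom with prefix" wrapper. The prefix is kept only so it
/// can be read back; it takes no part in comparing atoms.
#[derive(Debug, Clone)]
pub struct PrefixedAtom {
    pub prefix: Option<String>,
    pub atom: Atom,
}

impl PrefixedAtom {
    /// Returns the name as written in the source, e.g. "bleh:img" or "img".
    pub fn qualified(&self) -> String {
        match &self.prefix {
            Some(p) => format!("{}:{}", p, self.atom.string),
            None => self.atom.string.clone(),
        }
    }
}
```

Comparing the inner `atom` fields then makes `bleh:img` and `img` equal whenever both resolve to the same namespace URI.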

A remaining problem is that the default namespace for attributes is "" while the default namespace for HTML tag names is "http://www.w3.org/1999/xhtml". So the meaning of assoc: None in Servo has to be contextual, which is not great.

Ms2ger · 2014-10-19T09:30:08Z

I don't think this design makes sense conceptually for prefixes; prefixes are completely independent of the namespace outside the HTML parser.

SimonSapin · 2016-11-02T15:43:23Z

I’m also not convinced about this proposal, and we’ve been doing OK without it all this time. And with #178, string-cache is becoming a more general-purpose library that knows nothing about HTML or the DOM.

Closing as wontfix.

kmcallister added the performance label Oct 22, 2014

SimonSapin closed this as completed Nov 2, 2016
