See [the HTML spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#the-input-byte-stream) and the [WHATWG Encoding spec](http://encoding.spec.whatwg.org/). This also entails detecting `<meta charset=...>` and `<meta http-equiv="Content-Type">` declarations as we parse, since either can declare the document's encoding.
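
To make the scope concrete, here is a rough Python sketch of the order in which the encoding could be determined: byte order mark first, then any transport-level charset, then a scan of the first bytes for a `<meta>` declaration. The function name `sniff_encoding`, the `transport_charset` parameter, and the 1024-byte limit are assumptions made for illustration. The regexes are a deliberate simplification: the spec's prescan is a hand-written state machine that also skips comments, requires `http-equiv` alongside `content`, and resolves labels through the Encoding spec, none of which is done here.

```python
import re
from typing import Optional

# Byte order marks, checked before anything else.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Simplified patterns (illustrative only, not the spec's prescan algorithm).
META_TAG = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
CHARSET = re.compile(rb"""\bcharset\s*=\s*["']?\s*([A-Za-z0-9_.:\-]+)""", re.IGNORECASE)


def sniff_encoding(data: bytes,
                   transport_charset: Optional[str] = None,
                   limit: int = 1024) -> Optional[str]:
    """Return a lowercased encoding label, or None if nothing was declared."""
    # 1. A BOM wins over every other signal.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # 2. Charset supplied by the transport layer (e.g. an HTTP Content-Type header).
    if transport_charset:
        return transport_charset.strip().lower()

    # 3. Look for a <meta> declaration near the start of the byte stream.
    #    This matches both <meta charset=...> and a charset=... inside content="...".
    for tag in META_TAG.finditer(data[:limit]):
        found = CHARSET.search(tag.group(0))
        if found:
            return found.group(1).decode("ascii").lower()

    # 4. Nothing declared; the caller would fall back to a default.
    return None
```

A real implementation would still need to map the returned label to an actual encoding, and to handle a `<meta>` that the tree builder only encounters after parsing has already started with a tentative encoding.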