Remove InfallibleTokenizer (#102)
* Remove InfallibleTokenizer

InfallibleTokenizer and Tokenizer::infallible were added as shortcuts
for when the underlying input cannot fail, to make iteration slightly
more ergonomic.

In recent Rust versions (1.82 and later) it's possible to write
`for Ok(token) in ...` directly, because the `Err` case of an
uninhabited error type no longer requires a match arm. This makes the
shortcut unnecessary.
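The language feature the commit relies on can be sketched without the crate itself. The sketch below uses a hypothetical `tokens()` stand-in (not part of html5gum) for any iterator yielding `Result<_, Infallible>`, which is what the tokenizer produces over string input:

```rust
use std::convert::Infallible;

// Hypothetical stand-in for a tokenizer over string input: an iterator
// whose items are Result<_, Infallible>, so the error case cannot occur.
fn tokens() -> impl Iterator<Item = Result<&'static str, Infallible>> {
    ["<title>", "hello", "</title>"].into_iter().map(Ok)
}

fn main() {
    let mut out = String::new();
    // Since Rust 1.82, the `Ok` pattern alone is irrefutable here because
    // `Infallible` is uninhabited — no `.infallible()` adapter needed.
    for Ok(tok) in tokens() {
        out.push_str(tok);
    }
    assert_eq!(out, "<title>hello</title>");
}
```

On older toolchains the same loop would be a refutable-pattern error; the adapter being removed here existed precisely to paper over that.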

* remove impl

* fix

* fix doc

* fix benchmarks

* link pr
untitaker authored Oct 30, 2024
1 parent 0530ebd commit 1186517
Showing 8 changed files with 22 additions and 35 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,14 @@
+# 0.7.0
+
+- Removal of `Tokenizer.infallible()`. Use `for Ok(token) in Tokenizer::new()` instead. [PR 102](https://github.com/untitaker/html5gum/pull/102)
+- Add more convenience functions to `tree-builder` feature, equivalent to `html5ever::driver`. [PR 101](https://github.com/untitaker/html5gum/pull/101)
+
+# 0.6.1
+
+- Fix a bug where html5gum would interpret tags inside of `<script>`. [PR 98](https://github.com/untitaker/html5gum/pull/98)
+- Restructured the crate slightly, though there _should_ not be any breaking changes. [PR 99](https://github.com/untitaker/html5gum/pull/99)
+- Added a way to integrate with `scraper` crate and the `html5ever` tree builder, see `examples/scraper.rs`.
+
+# Before 0.6.1
+
+Who knows...
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -19,7 +19,8 @@ test-generator = "0.3.0"
 serde_bytes = "0.11.5"
 glob = "0.3.0"
 libtest-mimic = "0.8.1"
-iai = "0.1.1"
+# https://github.com/bheisler/iai/issues/34
+iai = { git = "https://github.com/sigaloid/iai", rev = "6c83e942" }
 # required for examples/scraper.rs
 scraper = "0.20.0"
 argh = "0.1.12"
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ use html5gum::{Tokenizer, Token};
 let html = "<title >hello world</title>";
 let mut new_html = String::new();
 
-for token in Tokenizer::new(html).infallible() {
+for Ok(token) in Tokenizer::new(html) {
     match token {
         Token::StartTag(tag) => {
             write!(new_html, "<{}>", String::from_utf8_lossy(&tag.name)).unwrap();
2 changes: 1 addition & 1 deletion benches/patterns.rs
@@ -4,7 +4,7 @@ use html5gum::Tokenizer;
 
 fn pattern(pattern: &str, i: usize) {
     let s: String = black_box((0..i).map(|_| pattern).collect());
-    for _ in Tokenizer::new(&s).infallible() {}
+    for Ok(_) in Tokenizer::new(&s) {}
 }
 
 macro_rules! pattern_tests {
5 changes: 2 additions & 3 deletions src/emitters/callback.rs
@@ -29,9 +29,8 @@
 //! });
 //!
 //! let input = r#"<h1><span class=hello>Hello</span> world!</h1>"#;
-//! let text_fragments = Tokenizer::new_with_emitter(input, emitter)
-//!     .infallible()
-//!     .collect::<Vec<_>>();
+//! let Ok(text_fragments) = Tokenizer::new_with_emitter(input, emitter)
+//!     .collect::<Result<Vec<_>, _>>();
 //!
 //! assert_eq!(text_fragments, vec![b"Hello".to_vec()]);
 //! ```
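The rewritten doc example above turns a fallible iterator into a single `Result` via `collect`. A minimal std-only sketch of that standard-library behavior, using `str::parse` in place of the tokenizer:

```rust
fn main() {
    // Collecting an iterator of Result<T, E> into Result<Vec<T>, E> yields
    // Ok with all values when no item errors, or the first Err otherwise.
    let ok: Result<Vec<u32>, _> = ["1", "2", "3"].iter().map(|s| s.parse::<u32>()).collect();
    assert_eq!(ok, Ok(vec![1, 2, 3]));

    // A single bad item short-circuits the whole collection into Err.
    let bad: Result<Vec<u32>, _> = ["1", "x"].iter().map(|s| s.parse::<u32>()).collect();
    assert!(bad.is_err());
}
```

When the error type is uninhabited, as with string input here, the resulting `Result` can then be destructured with a bare `let Ok(..)` pattern, which is what the new doctest does.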
2 changes: 1 addition & 1 deletion src/lib.rs
@@ -61,4 +61,4 @@ pub use error::Error;
 pub use htmlstring::HtmlString;
 pub use reader::{IoReader, Readable, Reader, StringReader};
 pub use state::State;
-pub use tokenizer::{InfallibleTokenizer, Tokenizer};
+pub use tokenizer::Tokenizer;
2 changes: 1 addition & 1 deletion src/reader.rs
@@ -126,7 +126,7 @@ impl<'a, R: 'a + Reader> Readable<'a> for R {
 /// let html = "<title >hello world</title>";
 /// let mut new_html = String::new();
 ///
-/// for token in Tokenizer::new(html).infallible() {
+/// for Ok(token) in Tokenizer::new(html) {
 ///     match token {
 ///         Token::StartTag(tag) => {
 ///             write!(new_html, "<{}>", String::from_utf8_lossy(&tag.name)).unwrap();
27 changes: 0 additions & 27 deletions src/tokenizer.rs
@@ -123,30 +123,3 @@ impl<R: Reader, E: Emitter> Iterator for Tokenizer<R, E> {
         }
     }
 }
-
-/// A kind of tokenizer that directly yields tokens when used as an iterator, so `Token` instead of
-/// `Result<Token, _>`.
-///
-/// This is the return value of [`Tokenizer::infallible`].
-#[derive(Debug)]
-pub struct InfallibleTokenizer<R: Reader<Error = Infallible>, E: Emitter>(Tokenizer<R, E>);
-
-impl<R: Reader<Error = Infallible>, E: Emitter> Tokenizer<R, E> {
-    /// Statically assert that this iterator is infallible.
-    ///
-    /// Call this to get rid of error handling when parsing HTML from strings.
-    pub fn infallible(self) -> InfallibleTokenizer<R, E> {
-        InfallibleTokenizer(self)
-    }
-}
-
-impl<R: Reader<Error = Infallible>, E: Emitter> Iterator for InfallibleTokenizer<R, E> {
-    type Item = E::Token;
-
-    fn next(&mut self) -> Option<Self::Item> {
-        match self.0.next()? {
-            Ok(token) => Some(token),
-            Err(e) => match e {},
-        }
-    }
-}
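The deleted adapter's `Err(e) => match e {}` arm is worth noting: matching on a value of an uninhabited type needs no arms, which statically proves the error branch is unreachable without `unwrap()` or `unreachable!()`. A self-contained sketch of the same trick, with a hypothetical helper name:

```rust
use std::convert::Infallible;

// Extracts the value from a Result whose error type is uninhabited.
// `match e {}` has no arms: Infallible has no values, so this compiles
// and the compiler knows the Err branch can never execute.
fn strip_err<T>(res: Result<T, Infallible>) -> T {
    match res {
        Ok(v) => v,
        Err(e) => match e {},
    }
}

fn main() {
    assert_eq!(strip_err(Ok::<_, Infallible>(7)), 7);
}
```

This pattern works on all stable Rust versions; the commit removes the adapter only because newer compilers now accept the shorter `for Ok(token) in ...` form directly.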
