Road to 1.0 #442

Closed
1 of 11 tasks
matthewmueller opened this issue Apr 8, 2014 · 19 comments

@matthewmueller
Member

Jotting down some quick thoughts on what I'd like to see to make 1.0 awesome:

Must haves:

  • Rename internal attributes to match their DOM keys (type => nodeType)
  • Cleaner code, with JSDoc block syntax
  • A smaller browser build for webworkers (that's compatible with all the major package managers)
  • A more defined list of API methods that are "core jQuery" to be ridiculously well-tested and match the jQuery API exactly.
  • An official way to stream with cheerio that doesn't add a lot of bloat / complexity. Maybe this is a completely separate repository. Maybe something along the lines of:
      var $ = cheerio.stream();
      req.pipe($('.signup').addClass('blah'));
  • More support for CSS3 selectors in CSSSelect
  • Clearer options. Optimize for what we actually use, possibly removing some options.
  • XML love. More tests, more support.
  • More forgiving parser for crazy angular things.

Nice to haves:

  • Remove lodash/underscore entirely
  • A website! And a logo.

What are your thoughts? Anyone else have anything they'd like to see?

/cc @fb55 @davidchambers @jugglinmike

@fb55
Member

fb55 commented Apr 8, 2014

First of all, the points related to my modules:

> More support for CSS3 selectors in CSSSelect

The only missing selectors right now should be positional jQuery extensions (:first, :last, :eq etc.).
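Unlike CSS pseudo-classes, these jQuery extensions filter the whole matched set by position rather than testing each element in isolation. A minimal sketch of that behavior, assuming the selector has already been split into a base selector plus one positional pseudo; `applyPositional` is a hypothetical helper, not CSSselect's API:

```javascript
// Hypothetical helper: applies a jQuery-style positional filter to an
// already-matched array of elements. Not part of CSSselect or cheerio.
function applyPositional(matches, pseudo, arg) {
  switch (pseudo) {
    case "first":
      return matches.slice(0, 1);
    case "last":
      return matches.slice(-1);
    case "eq": {
      // jQuery's :eq() also accepts negative indices, counting from the end.
      const n = Number(arg);
      if (!Number.isInteger(n)) return [];
      const i = n < 0 ? matches.length + n : n;
      return i >= 0 && i < matches.length ? [matches[i]] : [];
    }
    case "even": // zero-based, so indices 0, 2, 4, ...
      return matches.filter((_, i) => i % 2 === 0);
    case "odd":
      return matches.filter((_, i) => i % 2 === 1);
    default:
      throw new Error(`unsupported positional pseudo: :${pseudo}`);
  }
}
```

For example, `applyPositional(["a", "b", "c"], "eq", "-1")` returns `["c"]`.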

> A smaller browser build for webworkers (that's compatible with all the major package managers)

I think I'll move the parser & tokenizer part of htmlparser2 into their own module, so that neither the streaming stuff nor, e.g., the FeedHandler is bundled. cheerio would simply have to change some requires.

> More forgiving parser for crazy angular things.

That should already be fixed as well (e.g. brackets in attributes). Implementing the HTML5 tree-builder algorithm would be nice, though (but it requires more time than I currently have).

> Rename internal attributes to match their DOM keys (type => nodeType)

It's now possible to enable the DOM Level 1 API in domhandler (contributed by @jugglinmike). Generally speaking I prefer that API as well; a future major release of domhandler might switch to it completely.
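To illustrate the `type => nodeType` rename, here is a sketch that maps domhandler-style `type` strings onto the standard DOM `nodeType` constants. The mapping and the `DomLikeNode` class are assumptions for illustration, not domhandler's actual DOM 1 implementation; directive nodes (e.g. `<!DOCTYPE>`) are deliberately left out because how to map them is a design choice.

```javascript
// Standard DOM nodeType constants (values from the DOM Node interface).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;
const COMMENT_NODE = 8;

// Assumed mapping from domhandler-style `type` strings to DOM nodeType.
// <script> and <style> are ordinary elements as far as the DOM is concerned.
const NODE_TYPES = Object.freeze({
  tag: ELEMENT_NODE,
  script: ELEMENT_NODE,
  style: ELEMENT_NODE,
  text: TEXT_NODE,
  cdata: CDATA_SECTION_NODE,
  comment: COMMENT_NODE,
});

// Hypothetical node class: keeps the internal `type` key but also exposes
// the DOM-style `nodeType` name as a computed getter.
class DomLikeNode {
  constructor(type) {
    this.type = type;
  }

  get nodeType() {
    if (!Object.prototype.hasOwnProperty.call(NODE_TYPES, this.type)) {
      throw new Error(`no DOM nodeType mapping for type "${this.type}"`);
    }
    return NODE_TYPES[this.type];
  }
}

console.log(new DomLikeNode("tag").nodeType);     // 1
console.log(new DomLikeNode("comment").nodeType); // 8
```

Keeping `type` and adding `nodeType` as a getter is one way to migrate gradually, since existing code that reads `type` keeps working.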

@fb55
Member

fb55 commented Apr 8, 2014

The stream interface is a great idea, but I don't really know how it will fit into the existing structure of cheerio. I have some ideas, I'll write about it tomorrow.

Maybe it's worth honoring the npm philosophy (a bit) and moving some components into their own modules (e.g. the serialization stuff). domutils is also way too big; it would be nice to share some logic. The individual components would have their own README (which I prefer over inline documentation) and tests.

@matthewmueller How would you feel about creating a GitHub organization for cheerio and related modules?

fb55 closed this as completed Apr 8, 2014
fb55 reopened this Apr 8, 2014
@matthewmueller
Member Author

great idea, I've created the org: https://github.com/cheeriojs and made you guys owners.

@fb55
Member

fb55 commented Apr 8, 2014

(Closed it by accident, sorry.)

> Remove lodash/underscore entirely

jdalton published the individual functions of lodash as separate modules. We aren't using many functions, so replacing lodash with those modules might be worth considering.

@matthewmueller
Member Author

i think we can do this now, but it'd be cool to be able to override the API methods. So we could have a module that makes .each, .map, .select, etc. more like this: https://github.com/matthewmueller/array#selectfnstr

@matthewmueller
Member Author

I'd probably be fine with @jdalton's individual lodash functions; I really just don't like the _.isString type-checking that we currently do.

@fb55
Member

fb55 commented Apr 8, 2014

> I really just don't like the _.isString type-checking that we currently do.

Yup. If someone doesn't pass a primitive, it should be their problem. Wrapper objects are annoying as hell and shouldn't be used anyway.
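The difference at stake: lodash's `_.isString` also accepts `String` wrapper objects (`new String("x")`), while a plain `typeof` check accepts only primitives. A sketch of both checks without lodash; `isStringPrimitive` and `isStringLike` are illustrative names, and `isStringLike` only approximates what `_.isString` does.

```javascript
// Primitive-only check: what the thread proposes to rely on.
function isStringPrimitive(value) {
  return typeof value === "string";
}

// Approximates _.isString: also accepts String wrapper objects, detected
// here through Object.prototype.toString.
function isStringLike(value) {
  return (
    typeof value === "string" ||
    Object.prototype.toString.call(value) === "[object String]"
  );
}

const wrapped = new String("div"); // a wrapper object, rarely intentional

console.log(typeof wrapped);             // object
console.log(isStringPrimitive(wrapped)); // false
console.log(isStringLike(wrapped));      // true
console.log(isStringPrimitive("div"));   // true
```

A caller holding a wrapper can always unwrap it with `String(value)` or `value.valueOf()` before passing it in, which is why the primitive-only check is a reasonable contract.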

fb55 mentioned this issue Apr 8, 2014
@jdalton

jdalton commented Apr 8, 2014

> i think we can do this now, but it'd be cool to be able to override the API methods. So we could have a module that makes .each, .map, .select, etc. more like this:

Many Lo-Dash methods support "_.where"-style and "_.pluck"-style callback shorthands. So, for example, if you wanted to check whether all the elements were disabled, you could do:

_.every(elements, 'disabled');
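Roughly speaking, the shorthand works by turning a non-function callback into a predicate: a string becomes a property lookup ("_.pluck" style) and a plain object becomes a partial match ("_.where" style). A dependency-free sketch with hypothetical `toPredicate`, `every` and `filter` helpers (not lodash's internals); unlike lodash's `_.where`, the object match here is shallow, using strict equality per key.

```javascript
// Converts lodash-style callback shorthands into a real function.
function toPredicate(callback) {
  if (typeof callback === "function") return callback;
  if (typeof callback === "string") {
    // "_.pluck" style: read the named property.
    return (item) => item[callback];
  }
  if (callback && typeof callback === "object") {
    // "_.where" style (shallow): every listed key must be strictly equal.
    const keys = Object.keys(callback);
    return (item) => keys.every((k) => item[k] === callback[k]);
  }
  return (item) => item; // identity fallback
}

// Works on arrays and array-likes (such as a cheerio selection).
function every(collection, callback) {
  const pred = toPredicate(callback);
  return Array.prototype.every.call(collection, (item) => Boolean(pred(item)));
}

function filter(collection, callback) {
  const pred = toPredicate(callback);
  return Array.prototype.filter.call(collection, (item) => pred(item));
}

const elements = [
  { tagName: "input", disabled: true },
  { tagName: "button", disabled: true },
];
console.log(every(elements, "disabled"));                    // true
console.log(filter(elements, { tagName: "button" }).length); // 1
```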

@matthewmueller
Member Author

@jdalton oh that's awesome. I guess the big thing is making sure it's compatible with jQuery. But we could extend and just pass the values into those functions.

@fb55
Member

fb55 commented Apr 9, 2014

Okay, what I wanted to write about the stream API:

As before, $ will simply be a guard in front of the constructor (which probably inherits from node's Writable stream). .stream's only purpose is to encapsulate the options.

Chain methods such as addClass will simply return this; getter methods (e.g. .attr('foo')) will return a promise (alternatively, they could accept a callback argument).
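A minimal sketch of that split, assuming the selection records chained operations, applies them to each element as the parser matches it, and settles getters once parsing finishes. `StreamingSelection`, `push` and `finish` are hypothetical names, elements are plain objects, and no actual parsing happens here.

```javascript
// Hypothetical streaming selection: chain methods queue work and return
// `this`; getter methods return promises that settle once parsing is done.
class StreamingSelection {
  constructor() {
    this.elements = []; // matched elements seen so far
    this.ops = [];      // chained operations to apply to every match
    this.finished = new Promise((resolve) => {
      this.resolveFinished = resolve;
    });
  }

  // Called by the (hypothetical) parser whenever an element matches.
  push(element) {
    for (const op of this.ops) op(element);
    this.elements.push(element);
  }

  // Called by the parser at the end of the document.
  finish() {
    this.resolveFinished(this.elements);
  }

  // Chain method: applies to elements already seen and to later matches.
  addClass(name) {
    const op = (el) => {
      const classes = new Set((el.attribs.class || "").split(/\s+/).filter(Boolean));
      classes.add(name);
      el.attribs.class = [...classes].join(" ");
    };
    this.ops.push(op);
    this.elements.forEach(op);
    return this;
  }

  // Getter: like jQuery's .attr(name), reads the first element, but only
  // after parsing has finished, so it returns a promise.
  attr(name) {
    return this.finished.then((els) => (els.length > 0 ? els[0].attribs[name] : undefined));
  }
}

const selection = new StreamingSelection().addClass("blah");
selection.push({ name: "form", attribs: { class: "signup" } });
selection.attr("class").then((value) => console.log(value)); // logs: signup blah
selection.finish();
```

The callback alternative mentioned above would just be `attr(name, cb)` calling `cb` from the same `then`.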

I'm still not sure how to approach this for traversal methods:

  • They could be promises, which will resolve to cheerio objects (once parsing is done).
  • They could also be object streams (firing as soon as an element is matched).

The second solution is much nicer internally, especially as it honors the streaming idea. It should also be possible to implement both variants (by having a then method on the stream).

Operations that return only a fixed-size set of elements (.first, .get(n)) could also be resolved earlier.
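One way to get both traversal variants is an object-mode stream that is also thenable: consumers can listen for each match as it arrives, or `await` it for the full set, while a fixed-size operation like `.first()` settles on the first match. A sketch built on Node's `stream.Readable`; `MatchStream`, `addMatch`, `finishMatching` and `first` are hypothetical names, and the parser that would call them is assumed.

```javascript
const { Readable } = require("stream");

// Hypothetical: an object-mode stream of matched elements that is also
// thenable, so `await matches` yields the full set once parsing ends.
class MatchStream extends Readable {
  constructor() {
    super({ objectMode: true, read() {} }); // producer pushes; nothing to pull
    this.matches = [];
    this.firstWaiters = [];
    this.parsingDone = false;
    this.all = new Promise((resolve) => {
      this.resolveAll = resolve;
    });
  }

  // Called by the (hypothetical) parser when an element matches.
  addMatch(element) {
    this.matches.push(element);
    this.push(element); // streaming consumers see it right away
    for (const resolve of this.firstWaiters) resolve(element);
    this.firstWaiters = [];
  }

  // Called by the parser at the end of the input.
  finishMatching() {
    this.parsingDone = true;
    this.push(null); // ends the readable side
    for (const resolve of this.firstWaiters) resolve(undefined);
    this.firstWaiters = [];
    this.resolveAll(this.matches);
  }

  // Fixed-size result: settles on the first match, without waiting for the end.
  first() {
    if (this.matches.length > 0) return Promise.resolve(this.matches[0]);
    if (this.parsingDone) return Promise.resolve(undefined);
    return new Promise((resolve) => this.firstWaiters.push(resolve));
  }

  // Makes the stream awaitable: resolves to every match once parsing is done.
  then(onFulfilled, onRejected) {
    return this.all.then(onFulfilled, onRejected);
  }
}

const items = new MatchStream();
items.on("data", (el) => console.log("matched", el.name)); // per element
items.first().then((el) => console.log("first", el.name)); // settles early
items.then((all) => console.log("total", all.length));     // after the end
items.addMatch({ name: "li" });
items.addMatch({ name: "li" });
items.finishMatching();
```

Resolving `all` from the producer side, instead of waiting for the stream's `end` event, keeps the promise independent of whether anyone actually consumes the stream.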

It would be nice to support CSS 3's (nth-)last-(child|of-type) and friends. To do this, cheerio actually has to parse the selector (or use CSSwhat), find out which parts are safe to evaluate when entering a node (tag name, #id, [attr] etc.), when leaving a node (:has, :contains) or when leaving the parent (:last-child), then apply the appropriate traversal and repeat for the next part. Traversal can be done using the implementations of .parent etc.
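The classification step could look roughly like this: each simple selector is assigned the earliest point in the parse at which it can be decided, and a compound selector takes the latest phase of its parts. The token shape (`{ type, name }`, only loosely modeled on what a selector parser such as CSSwhat produces) and the phase names are assumptions; the sketch ignores combinators and selectors nested inside `:not()` and similar.

```javascript
// When a simple selector can be decided during a streaming parse.
const Phase = Object.freeze({
  ENTER: "enter-node",          // known from the opening tag alone
  LEAVE: "leave-node",          // needs the element's full contents
  LEAVE_PARENT: "leave-parent", // needs to know the following siblings
});
const ORDER = [Phase.ENTER, Phase.LEAVE, Phase.LEAVE_PARENT];

// Pseudo-classes that inspect an element's descendants or text.
const NEEDS_CONTENT = new Set(["has", "contains", "empty"]);
// Pseudo-classes that depend on siblings after the element.
const NEEDS_LATER_SIBLINGS = new Set([
  "last-child", "last-of-type", "only-child", "only-of-type",
  "nth-last-child", "nth-last-of-type",
]);

function phaseOf(token) {
  if (token.type === "pseudo") {
    if (NEEDS_LATER_SIBLINGS.has(token.name)) return Phase.LEAVE_PARENT;
    if (NEEDS_CONTENT.has(token.name)) return Phase.LEAVE;
  }
  // Tag names, #id, .class, [attr] and backward-looking pseudos such as
  // :first-child are decidable when the node is entered.
  return Phase.ENTER;
}

// A compound selector (e.g. li:last-child) is decided at its latest phase.
function phaseOfCompound(tokens) {
  const latest = tokens.reduce((max, t) => Math.max(max, ORDER.indexOf(phaseOf(t))), 0);
  return ORDER[latest];
}

console.log(phaseOfCompound([{ type: "tag", name: "li" }]));
console.log(phaseOfCompound([{ type: "tag", name: "li" }, { type: "pseudo", name: "last-child" }]));
```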

As a simplification, only leaving a node could fire an event, containing the node and its level (the position in the stack). It's then possible to subscribe to done events of lower levels (used for traversal). E.g. .parent (used by >) would receive elements, subscribe to their parent level's done event, then return them.
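A sketch of that simplification using Node's `EventEmitter`: closing a node emits `done` with the node and its level, plus a per-level `done:<level>` event, and a `.parent`-like operator waits on the level above. The `open`/`close` calls stand in for a tokenizer's callbacks; `LevelTracker` and `streamParents` are hypothetical names.

```javascript
const { EventEmitter } = require("events");

class LevelTracker extends EventEmitter {
  constructor() {
    super();
    this.setMaxListeners(0); // many elements may wait on the same level
    this.stack = [];         // currently open nodes; index = level
  }

  // Stand-in for the tokenizer's "open tag" callback.
  open(name) {
    const parent = this.stack[this.stack.length - 1] || null;
    const node = { name, parent, children: [] };
    if (parent) parent.children.push(node);
    this.stack.push(node);
  }

  // Stand-in for the "close tag" callback: only leaving a node fires events.
  close() {
    const level = this.stack.length - 1;
    const node = this.stack.pop();
    this.emit("done", node, level);
    this.emit(`done:${level}`, node);
  }
}

// `.parent`-style traversal: for each matching node, wait until its parent
// level finishes, then report the (now complete) parent once. The next
// done:<level - 1> after a child closes is necessarily its parent, because
// the parent is still open at that level.
function streamParents(tracker, matches, onParent) {
  const reported = new Set();
  tracker.on("done", (node, level) => {
    if (level === 0 || !matches(node)) return;
    tracker.once(`done:${level - 1}`, (parent) => {
      if (!reported.has(parent)) {
        reported.add(parent);
        onParent(parent);
      }
    });
  });
}

const tracker = new LevelTracker();
const parents = [];
streamParents(tracker, (n) => n.name === "li", (p) => parents.push(p.name));
// Simulates <ul><li></li><li></li></ul><ol><li></li></ol>
tracker.open("ul"); tracker.open("li"); tracker.close();
tracker.open("li"); tracker.close(); tracker.close();
tracker.open("ol"); tracker.open("li"); tracker.close(); tracker.close();
console.log(parents); // [ 'ul', 'ol' ]
```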

Whenever a user-provided function has to be executed, it might be best to delay execution until parsing is done. A special case might be passing a generator function, which would result in co-like behavior (probably only relevant for .each). This requires keeping track of processed elements, though.
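The co-like behavior amounts to driving a generator: each value it yields (e.g. a promise for the next element) is awaited and fed back in via `next()`, and rejections are thrown back in via `throw()`. A minimal runner in the spirit of the `co` library; `runGenerator` and the `nextElement` source are hypothetical, and only promises and plain values are handled.

```javascript
// Minimal co-style runner: resolves each yielded value and resumes the
// generator with the result (or throws the rejection back into it).
function runGenerator(genFn, ...args) {
  const gen = genFn(...args);
  return new Promise((resolve, reject) => {
    function step(method, input) {
      let result;
      try {
        result = gen[method](input);
      } catch (err) {
        reject(err); // the generator threw and did not catch it
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (value) => step("next", value),
        (err) => step("throw", err)
      );
    }
    step("next", undefined);
  });
}

// Hypothetical element source: hands out elements asynchronously, as a
// streaming parser might, then undefined at the end.
const queue = [{ name: "a" }, { name: "b" }];
const nextElement = () => Promise.resolve(queue.shift());

runGenerator(function* () {
  const seen = [];
  let el;
  while ((el = yield nextElement()) !== undefined) {
    seen.push(el.name); // the user's per-element work, i.e. the .each body
  }
  return seen;
}).then((names) => console.log(names)); // [ 'a', 'b' ]
```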

Additionally, it might be nice to add some of the streaming methods from trumpet.

@jugglinmike
Member

Just checked off "Rename internal attributes to match their DOM keys (type => nodeType)" since #561 has landed.

fb55 mentioned this issue Dec 31, 2014
@ianstormtaylor

Any word on 1.0? Seems weird that this module is so widely depended on but it's still in the 0.x range which makes managing it as a dependency harder 😢

@matthewmueller
Member Author

haha, i'm not sure these are going to get done without some new love at this point. as far as i can tell cheerio is quite stable though.

you guys want to bump it?

@fb55
Member

fb55 commented Jul 20, 2016

IMHO #863 would be a good place to bump it, as people will have to actually check their code. Or should we release v2 at that point?

jugglinmike mentioned this issue Feb 26, 2018
@andykais

andykais commented Dec 6, 2018

I see 1.0.0-rc.2 is the latest version now (congrats!). Has streaming been added? I see an old PR #637 which implements streaming, but other than that, there's no mention of it in the #1145 PR, and none in the docs.

@UziTech

UziTech commented Sep 11, 2019

Is this still the todo list before v1.0.0 is stable? I would like to help it get over the bump.

@matthewmueller
Member Author

matthewmueller commented Sep 12, 2019

This is woefully out-of-date (2014!!)

Streaming didn't make it in 1.0.0. I think it's tricky but doable. The main issue is that you're now working with partial DOM trees. Unfortunately nobody spent time on it. I'd welcome any attempts to spec it out though!

@UziTech

UziTech commented Sep 12, 2019

@matthewmueller Is there a way I can help? It seems like a lot of open issues are irrelevant. What needs to be completed to release v1.0.0?

@matthewmueller
Member Author

7 participants