Skip to content

Conversation

@raylua2566
Copy link

@raylua2566 raylua2566 commented Jul 4, 2017

Feature: Wrap a text node in to current or parent node
新特征: 封装一个纯文本到一个当前或者父节点里
TODO:  ... et. not handle yet, create a ElementNode function to
handle them?
TODO: 一些特殊字符还没有处理, 考虑创建一个ElementNode实例方法处理特殊字符?

Feature: Wrap a text node in to current or parent node
新特征: 封装一个纯文本到一个当前或者父节点里
TODO:  ... et. not handle yet, create a ElementNode function to
handle them?
TODO: 一些特殊字符还没有处理, 考虑创建一个ElementNode实例方法处理特殊字符?
@msva
Copy link
Owner

msva commented Jul 4, 2017

Could you, please:

  1. describe the use-case for this?
  2. provide a test code, that will show this usecase (and so fail on current master, but will work after your PR)?
  3. strip Chinese comments (keeping only english ones)?

Well, 3rd one is pretty cosmetic and have no consequences on how library works,
but I'm asking about first two, because I am missing the end purpose (real-world example, when it would be useful) of that changes.

@raylua2566
Copy link
Author

raylua2566 commented Jul 4, 2017

Please forgive my grammar mistakes.

From README.md

Limitations

  • Textnodes are no separate tree elements; in local root = htmlparser.parse("<p>line1<br />line2</p>"), root.nodes[1]:getcontent() is "line1<br />line2", while root.nodes[1].nodes[1].name is "br"

Now

  • My PR will wrap the Text line1 and line2 into a node named text, after that while root.nodes[1].nodes[1].name is "text", and #root.nodes[1].nodes is 3

real-world example, when it would be useful

  • xpath has a func text() to get literals text . Do not you want it?
  • Get ordered tags within a a tag contains <text> and <img>.

Why test case failed?

  • Test case file "tst/init.lua" that function test_order() case will not true at line 292, because the text 1 performance for <text>1</text> implicitly, the same as texts 2 3 ...10. so the :not(n)'s result is 14 instead of 4

Discussion

  • the origin str contains <text>some text</text> ^_^

Copy link
Owner

@msva msva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping?

textcontent = string.gsub(textcontent, "&nbsp;", '')
if textcontent ~= '' then
index = index + 1
local textTag = ElementNode:new(index, 'text', node, descend, textstart+1, textend)
Copy link
Owner

@msva msva Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, that it would be better to rename it to something that will not collide.

What if user will use parser on <text>moo</text>?

On the other hand, we already have _text. So, I guess, it can be something like _textonly, or something like that.

Although, maybe better way would be to move current _text to _content, and make this to be a _text, but it will brake the API 😿

Which variant is more acceptable for you to implement?

Which variant would be more acceptable for you to implement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _content is suitable for _text.
_text is a subset of _content, what do you think?

@msva
Copy link
Owner

msva commented Jun 9, 2019

Ping?

Sorry for long disappearing, I was having a lot of personal issues :-/

Let's discuss further implementation of that idea?

@msva
Copy link
Owner

msva commented Aug 25, 2021

I'll close it for now, since it is incomptible with current code base now.

If you (or someby else) want to continue work on that - feel free to open new PR.

@msva msva closed this Aug 25, 2021
@remysucre remysucre mentioned this pull request Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants