Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CssSelector] Replace "name()" by "local-name()" to make it works in Firefox (XPath) #21011

Closed
wants to merge 5 commits into from

Conversation

BR0kEN-
Copy link

@BR0kEN- BR0kEN- commented Dec 21, 2016

Q A
Branch? 2.7
Bug fix? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #21004
License MIT

@stof
Copy link
Member

stof commented Dec 21, 2016

This needs to be check in detail to ensure it does not break support for namespaced names.

@BR0kEN-
Copy link
Author

BR0kEN- commented Dec 23, 2016

@stof, I've added tests for that.

@nicolas-grekas nicolas-grekas added this to the 2.7 milestone Dec 26, 2016
@xabbuh
Copy link
Member

xabbuh commented Dec 28, 2016

Do we really need the attribute check? What about using //div instead?

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

@xabbuh, it is interesting for me as well. What do you think?

@xabbuh
Copy link
Member

xabbuh commented Jan 2, 2017

@BR0kEN- I don't know. We should investigate why it was implemented this way initially in the Python lib the CssSelector is a port of.

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

@xabbuh What I'm proposing now is to change the name() by local-name() (merge this PR) and create a separate issue for further investigation/work. Do you have any objections to not do this and continue here?

@xabbuh
Copy link
Member

xabbuh commented Jan 2, 2017

We will still have to investigate why local-name() name() (sorry for the confusion here) was used initially (might also be caused by the Python lib). So we may still end up with the same things to do.

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

@xabbuh I've not found any usage of local-name() by git log -p src/Symfony/Component/CssSelector/XPath/XPathExpr.php in 2.7 branch. Did I do something wrong?

@stof
Copy link
Member

stof commented Jan 2, 2017

@xabbuh your comment is the wrong way. local-name() is what this PR introduces.

We use name() currently because this is what is used in the Python lib too (and we ported their logic)

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

If so, then this may be valuable: scrapy/cssselect#18 (comment)

@stof
Copy link
Member

stof commented Jan 2, 2017

@BR0kEN- yes it is this one

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

@stof, @xabbuh So I took that lib, added the test from this PR and run it. Results you can see on screenshots - cssselect uses //div instead of //*[name() = 'div'].

test_cssselect-local_name
test_cssselect-local_name-result

@BR0kEN-
Copy link
Author

BR0kEN- commented Jan 2, 2017

We can decide to continue with //*[local-name() = 'div'] - an easier way, or rework a lot pieces of code and implement //div. What is preferable? As for me - the first one, because it less critical change.

@xabbuh
Copy link
Member

xabbuh commented Jan 2, 2017

I agree once we are sure that this does not break any edge cases we did not consider yet.

@stof
Copy link
Member

stof commented Jan 3, 2017

@xabbuh I think we should work on updating our component to the newer version of cssselect for 3.3. This would fix a bunch of other cases too (much better support for pseudo elements for instance)

@nicolas-grekas
Copy link
Member

Given this is staling, should we consider it for master? Or how can we get confidence here?

@nicolas-grekas nicolas-grekas added the Good first issue Ideal for your first contribution! (some Symfony experience may be required) label Feb 16, 2017
@hhamon
Copy link
Contributor

hhamon commented Feb 21, 2017

I did some experiments with your PR and it seems your patch doesn't provide a real fix. The unit test you've added to perform a translated XPath query from a CSS expression against an XML document that includes namespaces is a false positive. See the code below of the unit test:

class TranslatorTest extends TestCase
{
    /** @dataProvider getXmlNamespaceTestData */
    public function testXmlNamespace($css, $value, $xpath, $namespace)
    {
        $translator = new Translator();
        $document = new \SimpleXMLElement(file_get_contents(__DIR__.'/Fixtures/namespace.xml'));

        $this->assertSame($xpath, $translator->cssToXPath($css, ''));

        foreach ($document->xpath($xpath) as $element) {
            $this->assertSame($namespace, $element->getNamespaces());
            $this->assertSame($value, (string) $element);
        }
    }

    public function getXmlNamespaceTestData()
    {
        return array(
            array('#table1 td:nth-child(2)', 'Bananas 1', "*[@id = 'table1']/descendant-or-self::*/*[local-name() = 'td' and (position() = 2)]", array('' => 'http://www.w3.org/TR/html4')),
            array('#table2 > tr > td:nth-child(1)', 'Apples 2', "*[@id = 'table2']/tr/*[local-name() = 'td' and (position() = 1)]", array('fruits' => 'http://www.w3schools.com/fruits')),
        );
    }
}

And the XML document your try to parse:

<?xml version="1.0" encoding="UTF-8"?>

<root xmlns:fruits="http://www.w3schools.com/fruits">
  <table id="table1" xmlns="http://www.w3.org/TR/html4">
    <tr>
      <td>Apples 1</td>
      <td>Bananas 1</td>
    </tr>
  </table>

  <fruits:table id="table2">
    <fruits:tr>
      <fruits:td>Apples 2</fruits:td>
      <fruits:td>Bananas 2</fruits:td>
    </fruits:tr>
  </fruits:table>
</root>

When the unit tests runs, it successfully validates all assertions because the XPath expression successfully matches a non namespaced element.

However, the unit test doesn't really pass with the second data set because it only validates the first assertion before the foreach loop. The $document->xpath($xpath) statement in the foreach loop returns an empty array. The generated XPath expression, whether it uses name() or local-name() function, doesn't match the namespace XML element. This is why this unit test with the second data set is a false positive.

To me, it seems the main problem is that the Translator class is not able to generate XPath queries able to match namespaced elements. So in the end, whether we use name() or local-name() functions, it works the same because namespaced elements are not well supported.

@hhamon
Copy link
Contributor

hhamon commented Feb 21, 2017

I tried the following expressions:

[OK]
*[@id = 'table1']/descendant-or-self::*/*[local-name() = 'td' and (position() = 2)]
//*[@id = 'table1']/descendant-or-self::*/*[local-name() = 'td' and (position() = 2)]
//*[@id = 'table2']/descendant-or-self::*/*[local-name() = 'td' and (position() = 1)]
//*[@id = 'table2']/descendant-or-self::*/*[name() = 'fruits:td' and (position() = 1)]

[KO]
*[@id = 'table2']/tr/*[local-name() = 'td' and (position() = 1)]

@BR0kEN-
Copy link
Author

BR0kEN- commented Feb 23, 2017

@hhamon, impressive findings. Thanks. Didn't know that namespaced queries aren't supported at all. For second case XPath should look something like this //*[@id = 'table2']/*[local-name() = 'tr']/*[local-name() = 'td' and (position() = 1)] (without namespace validation).

@fabpot
Copy link
Member

fabpot commented Mar 22, 2017

Closing as this PR cannot be merged as is. Issue is still open if someone wants to take over.

@fabpot fabpot closed this Mar 22, 2017
@fabpot
Copy link
Member

fabpot commented Mar 22, 2017

Closing as this PR cannot be merged as is. Issue is still open if someone wants to take over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug CssSelector Good first issue Ideal for your first contribution! (some Symfony experience may be required) Status: Needs Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants