Skip to content

Parse formatting inside HTML lists #1239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 13, 2018

Conversation

samimussbach
Copy link

@samimussbach samimussbach commented Jan 6, 2018

Description

As a follow up of #1215 and #945: The current fix swallows texts in formatted lists when list item does not contain solely text formatted all the same.
This fix renders all texts.

Fixes # (issue)
#945

Checklist:

  • I have run composer check and no errors were reported
  • The new code is covered by unit tests
  • I have update the documentation to describe the changes

Edit: I modified tests because it shows another issue that is not part of this fix
Running compooser check leads to
Script ./vendor/bin/php-cs-fixer fix --ansi --dry-run --diff handling the check event returned with error code 8

@troosan
Copy link
Contributor

troosan commented Jan 7, 2018

@samimussbach wouldn't the following do the trick?

    private static function parseListItem($node, $element, &$styles, $data)
    {
        $cNodes = $node->childNodes;
        if (!empty($cNodes)) {
            $text = trim($node->textContent);
            $element->addListItem($text, $data['listdepth'], $styles['font'], $styles['list'], $styles['paragraph']);
        }
    }

@troosan
Copy link
Contributor

troosan commented Jan 7, 2018

other solution

    private static function getTextRecursively(\DOMNode $node)
    {
        $text = '';
        $cNodes = $node->childNodes;
        if (!empty($cNodes)) {
            foreach ($cNodes as $cNode) {
                if ($cNode->nodeName == '#text') {
                    $text .= ' ' . trim($cNode->nodeValue);
                } else {
                    $text .= ' ' . self::getTextRecursively($cNode);
                }
            }
        }

        return trim($text);
    }

@samimussbach
Copy link
Author

The latter would work, indeed. Your first suggestion has some problems with whitespacing but the second works with our test cases.
Do you add a commit or should I (I think you may as it is your fix)?

When we go that way: wouldn't it be possible to parse styles as well in getTextRecursively and get #1215 tackled as well?

@troosan
Copy link
Contributor

troosan commented Jan 8, 2018

@samimussbach actually, here is even better, it parses the styling inside the items:

    private static function parseListItem($node, $element, &$styles, $data)
    {
        $cNodes = $node->childNodes;
        if (!empty($cNodes)) {
            $listRun = $element->addListItemRun($data['listdepth'], $styles['list'], $styles['paragraph']);
            foreach ($cNodes as $cNode) {
                Html::parseNode($cNode, $listRun, $styles['list'], $data['listdepth']);
            }
        }
    }

@samimussbach
Copy link
Author

Yes, very nice! This is what I did not manage to do. This works with our real cases. Test is failing but this is due to the crude structure of docx, test should be updated to match the real case. Bravo, #1215 is solved as well as far as I see.

@troosan
Copy link
Contributor

troosan commented Jan 8, 2018

There is still an issue though when you have multiple lists, the numbering is not restarting.

@samimussbach
Copy link
Author

Indeed. If you could fix this as well this would be superb as it hits us as well. See #508 and #10 for the numbering restart issue

@samimussbach
Copy link
Author

@troosan would you push your solution to this PR? You should get the credit but if you want to I could commit it as well.
I think this could be merged even if the numbering restart issue remains as it solved quite a lot by itself and does not break anything (as far as I see). What do you think?

@troosan
Copy link
Contributor

troosan commented Jan 11, 2018

yes, I will merge it. I was just waiting to have it properly coved by some unit tests, almost done.

@samimussbach
Copy link
Author

I just found a case to consider:

Cannot add TextRun in ListItemRun. in [...]phpword/src/PhpWord/Element/AbstractContainer.php:229 Stack trace: #0 [...]/phpword/src/PhpWord/Element/AbstractContainer.php(129): PhpOffice\PhpWord\Element\AbstractContainer->checkValidity('TextRun') #1 [internal function]: PhpOffice\PhpWord\Element\AbstractContainer->addElement('TextRun', Array) #2 [...]/phpword/src/PhpWord/Element/AbstractContainer.php(111): call_user_func_array(Array, Array) #3 [...]/phpword/src/PhpWord/Shared/Html.php(208): PhpOffice\PhpWord\Element\AbstractContainer->__call('addtextrun', Array) #4 [internal function]: PhpOffice\PhpWord\Shared\Html::parseParagraph(Object(DOMElement), Object(PhpOffice\PhpWord\Element\ListItemRun), Array) #5 [...]/phpword/src/PhpWord/Shared/Html.php(158): call_user_func_array(Array, Array) #6 [...]/phpword/src/PhpWord/Shared/Html.php(397): PhpOffice\PhpWord\Shared\Html::parseNode(Object(DOMElement), Object(PhpOffice\PhpWord\Element\ListItemRun), Array, 0) #7 [internal function]: PhpOffice\PhpWord\Shared\Html::parseListItem(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell), Array, Array) #8 [...]/phpword/src/PhpWord/Shared/Html.php(158): call_user_func_array(Array, Array) #9 [...]/phpword/src/PhpWord/Shared/Html.php(190): PhpOffice\PhpWord\Shared\Html::parseNode(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell), Array, Array) #10 [...]/phpword/src/PhpWord/Shared/Html.php(172): PhpOffice\PhpWord\Shared\Html::parseChildNodes(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell), Array, Array) #11 [...]/phpword/src/PhpWord/Shared/Html.php(190): PhpOffice\PhpWord\Shared\Html::parseNode(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell), Array, Array) #12 [...]/phpword/src/PhpWord/Shared/Html.php(172): PhpOffice\PhpWord\Shared\Html::parseChildNodes(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell), Array, Array) #13 [...]/phpword/src/PhpWord/Shared/Html.php(68): PhpOffice\PhpWord\Shared\Html::parseNode(Object(DOMElement), Object(PhpOffice\PhpWord\Element\Cell)) #14 /opt/Development/dplan/dplan/demosplan/DemosPlanAssessmentTableBundle/Logic/AssessmentTableServiceOutput.php(1294): PhpOffice\PhpWord\Shared\Html::addHtml(Object(PhpOffice\PhpWord\Element\Cell), '<body><p>Test</...')

The markup is

  <p>Test</p>
  <ul>
    <li>
      <p>Item 1</p>
    </li>
  </ul>

Which is (sad but true) reallife markup generated by TinyMCE

@troosan
Copy link
Contributor

troosan commented Jan 12, 2018

very sad indeed :-S

@troosan troosan changed the title Fix missing text in formatted lists Parse formatting inside HTML lists Jan 12, 2018
@troosan troosan merged commit 4c68ebb into PHPOffice:develop Jan 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants