Japanese or Chinese language UTF-8 not supported in doc reading #1817

Closed
Tarun-developer opened this issue Feb 4, 2020 · 2 comments · Fixed by #2664
Labels: Bug Report, MS-DOC (Word 97), Status: Waiting for feedback

Comments


Tarun-developer commented Feb 4, 2020

Japanese and Chinese text (UTF-8) is not read correctly from .doc files.
Example 1: にっぽんこく、にほんこ
Example 2: 監警會決定不在司法覆核裁決前發布首份階段性報告
Text in these languages comes out garbled when it is read from the .doc format.

Steps to Reproduce

function phpReader($source) {
    // Load the Word 97 (.doc) file with the MsDoc reader.
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($source, 'MsDoc');
    $text = ExtractText($phpWord);
    return $text;
}
function ExtractText($obj, $nested = 0) {
    $txt = "";
    if (method_exists($obj, 'getSections')) {
        foreach ($obj->getSections() as $section) {
            $txt .= " " . ExtractText($section, $nested + 1);
        }
    } else if (method_exists($obj, 'getElements')) {
        foreach ($obj->getElements() as $element) {
            $txt .= " " . ExtractText($element, $nested + 1);
        }
    } else if (method_exists($obj, 'getText')) {
        $txt .= $obj->getText();
    } else if (method_exists($obj, 'getRows')) {
        foreach ($obj->getRows() as $row) {
            $txt .= " " . ExtractText($row, $nested + 1);
        }
    } else if (method_exists($obj, 'getCells')) {
        foreach ($obj->getCells() as $cell) {
            $txt .= " " . ExtractText($cell, $nested + 1);
        }
    } else if (get_class($obj) != "PhpOffice\PhpWord\Element\TextBreak") {
        $txt .= "(" . get_class($obj) . ")"; # unknown object, you need to add it
    }
    return $txt;
}

Example 1 response is "k0c0}0�0S0 O k { � S "
Example 2 response is " ãvf��gzl�[ N(WøSÕl��8hÁ �z"
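
A note on the garbled output: the start of the example 1 response ("k0c0}0", then an unprintable byte, then "0S0") matches the UTF-16LE bytes of にっぽんこ shown one byte at a time, which suggests the text is being returned without being decoded from UTF-16LE. The sketch below only illustrates that diagnosis. It assumes the mbstring extension is available, and the helper names `asciiView` and `decodeUtf16le` are hypothetical, not part of PHPWord. It is not the fix applied in PR #2664.

```php
<?php
// Show raw bytes as printable ASCII; every other byte becomes '?'.
function asciiView(string $bytes): string
{
    return preg_replace('/[^\x20-\x7E]/', '?', $bytes) ?? '';
}

// Decode a UTF-16LE byte string into UTF-8.
function decodeUtf16le(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// Start of example 1 from the report, encoded as UTF-16LE.
$utf16le = mb_convert_encoding('にっぽんこ', 'UTF-16LE', 'UTF-8');

echo asciiView($utf16le), PHP_EOL;     // k0c0}0?0S0
echo decodeUtf16le($utf16le), PHP_EOL; // にっぽんこ
```

If the same pattern holds for a real file, decoding the affected text runs as UTF-16LE before using them should recover the original characters.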

  • PHPWord Version: 2020

The $phpWord object dump for example 2 looks like this:

```

PhpOffice\PhpWord\PhpWord Object
(
    [sections:PhpOffice\PhpWord\PhpWord:private] => Array
        (
            [0] => PhpOffice\PhpWord\Element\Section Object
                (
                    [container:protected] => Section
                    [style:PhpOffice\PhpWord\Element\Section:private] => PhpOffice\PhpWord\Style\Section Object
                        (
                            [orientation:PhpOffice\PhpWord\Style\Section:private] => portrait
                            [paper:PhpOffice\PhpWord\Style\Section:private] => PhpOffice\PhpWord\Style\Paper Object
                                (
                                    [sizes:PhpOffice\PhpWord\Style\Paper:private] => Array
                                        (
                                            [A3] => Array
                                                (
                                                    [0] => 297
                                                    [1] => 420
                                                    [2] => mm
                                                )

                                            [A4] => Array
                                                (
                                                    [0] => 210
                                                    [1] => 297
                                                    [2] => mm
                                                )

                                            [A5] => Array
                                                (
                                                    [0] => 148
                                                    [1] => 210
                                                    [2] => mm
                                                )

                                            [B5] => Array
                                                (
                                                    [0] => 176
                                                    [1] => 250
                                                    [2] => mm
                                                )

                                            [Folio] => Array
                                                (
                                                    [0] => 8.5
                                                    [1] => 13
                                                    [2] => in
                                                )

                                            [Legal] => Array
                                                (
                                                    [0] => 8.5
                                                    [1] => 14
                                                    [2] => in
                                                )

                                            [Letter] => Array
                                                (
                                                    [0] => 8.5
                                                    [1] => 11
                                                    [2] => in
                                                )

                                        )

                                    [size:PhpOffice\PhpWord\Style\Paper:private] => A4
                                    [width:PhpOffice\PhpWord\Style\Paper:private] => 11905.511811024
                                    [height:PhpOffice\PhpWord\Style\Paper:private] => 16837.795275591
                                    [styleName:protected] => 
                                    [index:protected] => 
                                    [aliases:protected] => Array
                                        (
                                        )

                                    [isAuto:PhpOffice\PhpWord\Style\AbstractStyle:private] => 
                                )

                            [pageSizeW:PhpOffice\PhpWord\Style\Section:private] => 11905.511811024
                            [pageSizeH:PhpOffice\PhpWord\Style\Section:private] => 16837.795275591
                            [marginTop:PhpOffice\PhpWord\Style\Section:private] => 1440
                            [marginLeft:PhpOffice\PhpWord\Style\Section:private] => 1440
                            [marginRight:PhpOffice\PhpWord\Style\Section:private] => 1440
                            [marginBottom:PhpOffice\PhpWord\Style\Section:private] => 1440
                            [gutter:PhpOffice\PhpWord\Style\Section:private] => 0
                            [headerHeight:PhpOffice\PhpWord\Style\Section:private] => 720
                            [footerHeight:PhpOffice\PhpWord\Style\Section:private] => 720
                            [pageNumberingStart:PhpOffice\PhpWord\Style\Section:private] => 
                            [colsNum:PhpOffice\PhpWord\Style\Section:private] => 1
                            [colsSpace:PhpOffice\PhpWord\Style\Section:private] => 720
                            [breakType:PhpOffice\PhpWord\Style\Section:private] => 
                            [lineNumbering:PhpOffice\PhpWord\Style\Section:private] => 
                            [vAlign:PhpOffice\PhpWord\Style\Section:private] => 
                            [borderTopSize:protected] => 
                            [borderTopColor:protected] => 
                            [borderTopStyle:protected] => 
                            [borderLeftSize:protected] => 
                            [borderLeftColor:protected] => 
                            [borderLeftStyle:protected] => 
                            [borderRightSize:protected] => 
                            [borderRightColor:protected] => 
                            [borderRightStyle:protected] => 
                            [borderBottomSize:protected] => 
                            [borderBottomColor:protected] => 
                            [borderBottomStyle:protected] => 
                            [styleName:protected] => 
                            [index:protected] => 
                            [aliases:protected] => Array
                                (
                                )

                            [isAuto:PhpOffice\PhpWord\Style\AbstractStyle:private] => 
                        )

                    [headers:PhpOffice\PhpWord\Element\Section:private] => Array
                        (
                        )

                    [footers:PhpOffice\PhpWord\Element\Section:private] => Array
                        (
                        )

                    [footnoteProperties:PhpOffice\PhpWord\Element\Section:private] => 
                    [elements:protected] => Array
                        (
                            [0] => PhpOffice\PhpWord\Element\Text Object
                                (
                                    [text:protected] => ãvf��gzl�[
                                    [fontStyle:protected] => PhpOffice\PhpWord\Style\Font Object
                                        (
                                            [aliases:protected] => Array
                                                (
                                                    [line-height] => lineHeight
                                                    [letter-spacing] => spacing
                                                )

                                            [type:PhpOffice\PhpWord\Style\Font:private] => text
                                            [name:PhpOffice\PhpWord\Style\Font:private] => Songti SC
                                            [hint:PhpOffice\PhpWord\Style\Font:private] => 
                                            [size:PhpOffice\PhpWord\Style\Font:private] => 
                                            [color:PhpOffice\PhpWord\Style\Font:private] => 000000
                                            [bold:PhpOffice\PhpWord\Style\Font:private] => 1
                                            [italic:PhpOffice\PhpWord\Style\Font:private] => 
                                            [underline:PhpOffice\PhpWord\Style\Font:private] => none
                                            [superScript:PhpOffice\PhpWord\Style\Font:private] => 
                                            [subScript:PhpOffice\PhpWord\Style\Font:private] => 
                                            [strikethrough:PhpOffice\PhpWord\Style\Font:private] => 
                                            [doubleStrikethrough:PhpOffice\PhpWord\Style\Font:private] => 
                                            [smallCaps:PhpOffice\PhpWord\Style\Font:private] => 
                                            [allCaps:PhpOffice\PhpWord\Style\Font:private] => 
                                            [fgColor:PhpOffice\PhpWord\Style\Font:private] => 
                                            [scale:PhpOffice\PhpWord\Style\Font:private] => 
                                            [spacing:PhpOffice\PhpWord\Style\Font:private] => 
                                            [kerning:PhpOffice\PhpWord\Style\Font:private] => 
                                            [paragraph:PhpOffice\PhpWord\Style\Font:private] => 





```

Progi1984 (Member) commented

@Tarun-developer Hi, could you please send me a file that reproduces this case so I can analyze it?

Progi1984 added the "Status: Waiting for feedback" label Aug 30, 2024
Progi1984 self-assigned this Aug 30, 2024
Progi1984 (Member) commented

@Tarun-developer

This issue has been fixed by a maintainer in PR #2664. You can support him by sponsoring him through GitHub Sponsors.
