Can't read file .doc (MS Word 97) with utf-8 characters #1454

Closed
1 task done
nhatquang20 opened this issue Sep 3, 2018 · 6 comments · Fixed by #2664
Labels: MS-DOC (Word 97), Status: Waiting for feedback

Comments

@nhatquang20

nhatquang20 commented Sep 3, 2018

This is:

  • a bug report

My .doc file contains the following text: است که در زمان ساخت داده هائی در آن

I use this code to read the file:

```php
// $source is the path to the .doc file.
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source, 'MsDoc');

foreach ($phpWord->getSections() as $section) {
    foreach ($section->getElements() as $element) {
        // Not every element type exposes getText() (tables, for example).
        if (method_exists($element, 'getText')) {
            echo $element->getText();
            exit; // stop after the first text element
        }
    }
}
```

It prints: '�3�� ©�G� /�1� 2�E�'�F� 3�'�.�

When I dump the value, I get: "'\x063\x06*\x06 ©\x06G\x06 /\x061\x06 2\x06E\x06'\x06F\x06 3\x06'\x06.\x06*\x06 "

How can I read the text correctly and save it to a database?
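
A note on the dump: each letter appears as a low byte followed by `\x06`, which matches the UTF-16LE encoding of code points in the Arabic block (U+06xx). The following is a minimal sketch of a caller-side conversion under that assumption; it is not necessarily how the library itself handles the problem. `utf16leToUtf8` is a hypothetical helper, not part of PHPWord, and it relies on the mbstring extension. The spaces in the dump show up as single bytes, so the full string may not be clean UTF-16LE; the sketch therefore decodes only the first word.

```php
<?php
// Hypothetical helper (not part of PHPWord): reinterpret UTF-16LE bytes as UTF-8.
// Requires the mbstring extension.
function utf16leToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// First word of the dump, written with explicit two-digit hex escapes:
// 0x27 0x06, 0x33 0x06, 0x2A 0x06 => U+0627, U+0633, U+062A.
$firstWord = "\x27\x06\x33\x06\x2A\x06";

echo utf16leToUtf8($firstWord), PHP_EOL;
```

If the assumption holds, this prints the first word of the sample text (است), which suggests the reader is returning raw 16-bit text without converting it to UTF-8.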

@Progi1984
Member

@nhatquang20 Could you provide a sample file for testing, please? Thank you.

Progi1984 self-assigned this Oct 4, 2018
@LuongTranNguyen

Hi Progi1984,

Is this issue fixed yet? I use PHPOffice and have the same problem.

@KitKanWong

Same problem here.

@nevergone

I tested v0.17.0 and it has the same problem.

Sample text for testing (text with Hungarian special characters):

árvíztűrő tükörfúrógép
ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP

Progi1984 added the "Status: Waiting for feedback" label Aug 30, 2024
@Progi1984
Member

@KitKanWong @nevergone @LuongTranNguyen Hi, could you please send me a file that reproduces this case, for analysis?

@Progi1984
Member

@nhatquang20 @KitKanWong @nevergone @LuongTranNguyen

This issue has been fixed by a maintainer in PR #2664. You can support him by sponsoring him through GitHub Sponsors.
