Error: ValueError: Unable to avoid copy while creating an array as requested. #414

Open
Putin0922 opened this issue Aug 13, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@Putin0922

Description of the bug

I am using version 0.7.0b1. When running a test, I get ValueError: Unable to avoid copy while creating an array as requested. The full output is as follows:
2024-08-13 16:23:06.329 | ERROR | magic_pdf.tools.cli:parse_doc:69 - Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Traceback (most recent call last):

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 20, in detect_lang
lang_upper = detect_language(text)
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function detect_language at 0x0000029B9965FEB0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language
lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
│ │ └ True
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function detect at 0x0000029B99974B80>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect
labels, scores = model.predict(text)
│ │ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
│ └ <function _FastText.predict at 0x0000029B9967CEE0>
└ <fasttext.FastText._FastText object at 0x0000029BBE2B9D50>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 221, in predict
text = check(text)
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function _FastText.predict.<locals>.check at 0x0000029BBE2CFC70>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 208, in check
raise ValueError(

ValueError: predict processes one line at a time (remove '\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
│ │ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
│ └ <code object <module> at 0x0000029B954C79F0, file "D:\Anaconda3\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 1>
└ <function _run_code at 0x0000029B954B0CA0>

File "D:\Anaconda3\envs\MinerU\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
│ └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\Anaconda3\envs\MinerU\Scri...
└ <code object <module> at 0x0000029B954C79F0, file "D:\Anaconda3\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 1>

File "D:\Anaconda3\envs\MinerU\Scripts\magic-pdf.exe\__main__.py", line 7, in <module>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
│ │ │ └ {}
│ │ └ ()
│ └ <function BaseCommand.main at 0x0000029B9708FAC0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
│ │ └ <click.core.Context object at 0x0000029B955187F0>
│ └ <function Command.invoke at 0x0000029B970A45E0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
│ │ │ │ │ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'}
│ │ │ │ └ <click.core.Context object at 0x0000029B955187F0>
│ │ │ └ <function cli at 0x0000029BBE2CF490>
│ │ └
│ └ <function Context.invoke at 0x0000029B9708F2E0>
└ <click.core.Context object at 0x0000029B955187F0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\click\core.py", line 783, in invoke
return __callback(*args, **kwargs)
│ └ {'path': 'C:\Users\dengg\Desktop\test', 'output_dir': 'C:\Users\dengg\Desktop\test_out', 'method': 'auto'}
└ ()

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 73, in cli
parse_doc(doc_path)
│ └ WindowsPath('C:/Users/dengg/Desktop/test/Optical characteristics and energy transfer analysis of Dy3+-Pr3+ ions doped in CeF3...
└ <function cli.<locals>.parse_doc at 0x0000029B9550F250>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\cli.py", line 60, in parse_doc
do_parse(
└ <function do_parse at 0x0000029BBE2CE8C0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\tools\common.py", line 61, in do_parse
pipe.pipe_classify()
│ └ <function UNIPipe.pipe_classify at 0x0000029BBE2CE5F0>
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 25, in pipe_classify
self.pdf_type = AbsPipe.classify(self.pdf_bytes)
│ │ │ │ │ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
│ │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510>
│ │ │ └ <staticmethod(<function AbsPipe.classify at 0x0000029B9BD66170>)>
│ │ └ <class 'magic_pdf.pipe.AbsPipe.AbsPipe'>
│ └ ''
└ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x0000029BBE2B9510>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\pipe\AbsPipe.py", line 63, in classify
pdf_meta = pdf_meta_scan(pdf_bytes)
│ └ b'%PDF-1.7\r%\x80\x84\x88\x8c\x90\x94\x98\x9c\xa0\xa4\xa8\xac\xb0\xb4\xb8\xbc\xc0\xc4\xc8\xcc\xd0\xd4\xd8\xdc\xe0\xe4\xe8\xec...
└ <function pdf_meta_scan at 0x0000029B9BD65630>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 337, in pdf_meta_scan
text_language = get_language(doc)
│ └ Document('', <memory, doc# 1>)
└ <function get_language at 0x0000029B9BD65510>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\filter\pdf_meta_scan.py", line 289, in get_language
page_language = detect_lang(text_block)
│ └ 'Journal of Luminescence 270 (2024) 120542\nAvailable online 8 March 2024\n0022-2313/© 2024 Elsevier B.V. All rights reserved...
└ <function detect_lang at 0x0000029B9965F910>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\magic_pdf\libs\language.py", line 23, in detect_lang
lang_upper = detect_language(html_no_ctrl_chars)
│ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful...
└ <function detect_language at 0x0000029B9965FEB0>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\__init__.py", line 23, in detect_language
lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
│ │ └ True
│ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful...
└ <function detect at 0x0000029B99974B80>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fast_langdetect\ft_detect\infer.py", line 81, in detect
labels, scores = model.predict(text)
│ │ └ 'Journal of Luminescence 270 (2024) 120542Available online 8 March 20240022-2313/© 2024 Elsevier B.V. All rights reserved.Ful...
│ └ <function _FastText.predict at 0x0000029B9967CEE0>
└ <fasttext.FastText._FastText object at 0x0000029BBE2B9D50>

File "D:\Anaconda3\envs\MinerU\lib\site-packages\fasttext\FastText.py", line 228, in predict
return labels, np.array(probs, copy=False)
│ │ │ └ (0.9080705046653748,)
│ │ └
│ └ <module 'numpy' from 'D:\Anaconda3\envs\MinerU\lib\site-packages\numpy\__init__.py'>
└ ('__label__en',)

ValueError: Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Could anyone help me figure this out?

How to reproduce the bug

As shown in the error description above.

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cpu
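The traceback above contains two chained exceptions. Judging from the frames, the first ValueError ("predict processes one line at a time") comes from fasttext rejecting text that contains '\n'; magic_pdf's detect_lang appears to catch it and retry inside the except block with the control characters stripped (the html_no_ctrl_chars argument has no newlines). That retry is what reaches np.array(probs, copy=False), so the second ValueError is the actual failure. The sketch below reproduces only this control flow; fake_predict, detect_lang_sketch, and the control-character regex are hypothetical stand-ins, not MinerU or fasttext code.

```python
import re


def fake_predict(text: str) -> tuple[tuple[str, ...], tuple[float, ...]]:
    """Hypothetical stand-in for fasttext's predict: rejects multi-line input."""
    if "\n" in text:
        raise ValueError("predict processes one line at a time (remove '\\n')")
    return ("__label__en",), (0.9080705046653748,)


def detect_lang_sketch(text: str) -> str:
    """Hypothetical sketch of the try/retry pattern visible in the traceback."""
    try:
        labels, _ = fake_predict(text)
    except ValueError:
        # Retry with control characters (including '\n') removed. Any error
        # raised here is reported "during handling of the above exception".
        no_ctrl = re.sub(r"[\x00-\x1f\x7f]", "", text)
        labels, _ = fake_predict(no_ctrl)
    return labels[0].replace("__label__", "").upper()


print(detect_lang_sketch("Journal of Luminescence\nAvailable online"))  # prints: EN
```

In the real run, the retry raises the NumPy error instead of returning a label, which is why it is reported as a second, chained exception.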

@Putin0922 Putin0922 added the bug Something isn't working label Aug 13, 2024
@myhloli
Collaborator

myhloli commented Aug 13, 2024

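This project is not compatible with NumPy 2.x; you need to install a 1.x version. A normal installation handles the dependency versions automatically, so please follow the steps in the README.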
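As a minimal sketch of the maintainer's point, assuming you want to fail fast before running magic-pdf, a runtime guard can compare NumPy's major version against 2. The helper names numpy_major and numpy_is_1x are hypothetical and not part of magic-pdf.

```python
import numpy as np


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def numpy_is_1x(version: str) -> bool:
    """True if the given NumPy version string is 1.x (or older)."""
    return numpy_major(version) < 2


# Hypothetical fail-fast check against the NumPy this interpreter imports.
if not numpy_is_1x(np.__version__):
    print(f"NumPy {np.__version__} is imported; this project expects numpy<2")
```

Pinning the dependency at install time (for example, a numpy<2 requirement for pip) is the usual fix; a guard like this only turns the failure into an earlier, clearer message.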

@Putin0922
Author

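I did install it following the README steps. After resolving a problem with the fairscale module, I got an error about a missing fvcore.transforms module. After installing fvcore via conda, the ValueError above appeared. Is there any way to fix this?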

@Putin0922
Author

I just checked: the numpy version in the MinerU environment is 1.26.4, not 2.x, and the error above still occurs.

@myhloli
Collaborator

myhloli commented Aug 13, 2024

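> I did install it following the README steps. After resolving a problem with the fairscale module, I got an error about a missing fvcore.transforms module. After installing fvcore via conda, the ValueError above appeared. Is there any way to fix this?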

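A normal installation should not be missing this many dependencies. Also, using conda to install any of the dependencies is strongly discouraged; all of the project's dependencies should be installed via pip.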

@myhloli
Collaborator

myhloli commented Aug 13, 2024

> I just checked: the numpy version in the MinerU environment is 1.26.4, not 2.x, and the error above still occurs.

The cause of this error is clearly NumPy 2.x; version 1.26.4 does not trigger it.

@Putin0922
Author

(base) PS C:\Users\dengg> conda activate MinerU
(MinerU) PS C:\Users\dengg> conda list numpy

# packages in environment at D:\Anaconda3\envs\MinerU:
#
# Name                    Version                   Build  Channel
numpy                     1.26.4                   pypi_0    pypi
(MinerU) PS C:\Users\dengg> magic-pdf -p C:\Users\dengg\Desktop\test -o C:\Users\dengg\Desktop\test_out -m auto
2024-08-13 17:19:38.772 | ERROR | magic_pdf.tools.cli:parse_doc:69 - Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

@Putin0922
Author

Above is the output of the test I just ran; please take a look. The numpy version really is 1.26.4.

@myhloli
Collaborator

myhloli commented Aug 13, 2024

> If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
> For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

The error message itself makes this clear: NumPy 1.x cannot produce this message.
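The quoted hint is NumPy's own guidance: in NumPy 2.x, copy=False means "never copy" and raises when a copy is unavoidable, while in 1.x it only meant "avoid a copy if possible". A minimal sketch that shows the difference on whichever NumPy is installed, using the same kind of input as the traceback (a one-element tuple of probabilities):

```python
import numpy as np

probs = (0.9080705046653748,)  # same shape of input as in the traceback
major = int(np.__version__.split(".")[0])

try:
    np.array(probs, copy=False)
    outcome = "ok"  # NumPy 1.x: copies silently when needed
except ValueError:
    # NumPy 2.x: a tuple cannot become an ndarray without copying,
    # so copy=False raises "Unable to avoid copy ...".
    outcome = "ValueError"

print(f"NumPy {np.__version__}: {outcome}")
```

If this prints ValueError, the NumPy module that the interpreter actually imports is 2.x, whatever a package listing reports.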

@Putin0922
Author

But my environment shows numpy version 1.26.4. What could be causing this?

@myhloli
Collaborator

myhloli commented Aug 13, 2024

Shouldn't you check the numpy version with pip list rather than conda list?

@Putin0922
Author

Checking with pip list also shows numpy 1.26.4.
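pip list and conda list report installed-package metadata. A quick cross-check, assuming it is run with the MinerU environment's own Python (the same interpreter that runs magic-pdf), is to ask Python which NumPy module it actually imports and from where. A mismatch between these values and the listings would be one possible explanation for the confusion here; the thread itself does not confirm this.

```python
import importlib.metadata

import numpy

# Version and location of the module this interpreter actually imports.
print("imported version :", numpy.__version__)
print("imported from    :", numpy.__file__)
# Version recorded in the installed distribution's metadata.
print("metadata version :", importlib.metadata.version("numpy"))
```

The traceback above shows numpy being loaded from D:\Anaconda3\envs\MinerU\lib\site-packages\numpy, so that is the path to compare against.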

@myhloli
Collaborator

myhloli commented Aug 13, 2024

How about creating a new conda environment and going through the whole process again from scratch?

@Putin0922
Author

OK, I'll try that and come back if I run into any problems.

@xiahuadong1981

In FastText.py, find the implementation of the predict method and locate this line:
return labels, np.array(probs, copy=False)
Replace it with:
return labels, np.asarray(probs)
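This workaround applies the replacement that NumPy's own error message suggests. np.asarray copies only when needed and never raises for that reason, so the patched line behaves the same on NumPy 1.x and 2.x. A minimal sketch of the patched return value, with predict_return as a hypothetical stand-in for the end of fasttext's predict:

```python
import numpy as np


def predict_return(labels: tuple, probs: tuple):
    """Hypothetical helper mirroring the patched return statement in FastText.py."""
    # np.asarray converts the tuple, copying only when needed; unlike
    # np.array(..., copy=False) on NumPy 2.x, it never raises for a copy.
    return labels, np.asarray(probs)


labels, scores = predict_return(("__label__en",), (0.9080705046653748,))
print(labels, scores)
```

The edit is local to the environment where fasttext is installed; the alternative discussed earlier in the thread is keeping NumPy below 2.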
