Guide to building the program on CentOS #236

Closed
fcityyyyy opened this issue Nov 20, 2023 · 21 comments
Labels
documentation Improvements or additions to documentation

Comments

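@fcityyyyy

Following the build instructions in the source tree, I built the main program, ElectronJS, first.
I downloaded and installed the latest Chrome on CentOS; the command google-chrome-stable -version reports Google Chrome 119.0.6045.159.

As instructed, I also copied everything under /opt/google/chrome/ into the ElectronJS folder and renamed the copy to chrome_linux64.

I also downloaded the matching version of chromedriver_linux64 and placed it in chrome_linux64.

Both npm install and npm install @electron-forge/cli -g completed successfully (I switched to the taobao registry; during installation npm reported that python3 was required, so I installed Python 3.8.15, after which the commands succeeded).

However, the final step, npm run start_direct, always fails.

Running it as root reports: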

easy-spider@0.3.5 start_direct
electron .

[1120/000559.944607:FATAL:electron_main_delegate.cc(294)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/electron exited with signal SIGTRAP

After switching to a regular user, it reports:

easy-spider@0.3.5 start_direct
electron .

[13824:1120/000541.120354:FATAL:setuid_sandbox_host.cc(158)] The SUID sandbox helper binary was found, but is not configured correctly. Rather than run without sandboxing I'm aborting now. You need to make sure that /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/chrome-sandbox is owned by root and has mode 4755.
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/electron/dist/electron exited with signal SIGTRAP

Could you please help me figure out what went wrong? Thank you very much!
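
The second error message states the requirement directly: chrome-sandbox must be owned by root and have mode 4755. A minimal Python sketch of that check follows. The helper names sandbox_problems and check_sandbox are hypothetical and are not part of EasySpider or Electron; the path to inspect would be the one printed in the error.

```python
import os
import stat


def sandbox_problems(uid, mode):
    """Return a list of reasons why a chrome-sandbox file fails the check.

    uid and mode are the st_uid and st_mode fields of an os.stat() result.
    An empty list means the file is owned by root and has mode 4755.
    """
    problems = []
    if uid != 0:
        problems.append("owner is uid %d, expected root (uid 0)" % uid)
    # S_IMODE keeps only the permission bits, including the setuid bit.
    perms = stat.S_IMODE(mode)
    if perms != 0o4755:
        problems.append("mode is %o, expected 4755" % perms)
    return problems


def check_sandbox(path):
    """Hypothetical convenience wrapper: inspect the file at path."""
    st = os.stat(path)
    return sandbox_problems(st.st_uid, st.st_mode)
```

If the list is not empty, the remedy the error message asks for is to have root change the owner of the file to root (chown) and set its mode to 4755 (chmod).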

@NaiboWang
Owner

@fcityyyyy
Author

OK, I'll take a look first. Thank you very much for the reply.

@fcityyyyy
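Author

Following the instructions, I fixed the permissions. npm run start_direct now brings the main program up,

Snipaste_2023-11-21 18-27-213

and I can browse tasks,

Snipaste_2023-11-21 18-58-662

but clicking "Design task" produces the following error: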

GET A MESSAGE: { type: 0, message: { id: 1 } }
set socket_start
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(Use electron --trace-warnings ... to show where the warning was created)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:18384) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 1)
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
(node:18384) UnhandledPromiseRejectionWarning: Error: spawn /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 EACCES
at /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/node_modules/selenium-webdriver/remote/index.js:260:24
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
(node:18384) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 4)

GET A MESSAGE: { type: 0, message: { id: 2 } }
set socket_flowchart

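Could you take another look at what is going wrong here?

Running the Chrome browser on its own works fine,
Snipaste_2023-11-21 17-14-125

Also, after npm run start_direct brings the main program up, the following errors appear in the console; I'm not sure whether they matter.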
[user1@cent11 ElectronJS]$ npm run start_direct

easy-spider@0.3.5 start_direct
electron .

Server has started.
server_address: http://localhost:8074
x64
/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64 /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chrome /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/execute.sh
linux
A JavaScript error occurred in the main process
Uncaught Exception:
Error: EACCES: permission denied, open 'info.log'
[18384:1121/111727.823623:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.823658:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.846200:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[18384:1121/111727.912438:ERROR:bus.cc(399)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")

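Many thanks for all of the above.

@NaiboWang
Owner

The error UnhandledPromiseRejectionWarning: Error: spawn [...] EACCES usually points to two main problems:

Permission problem: EACCES means "permission denied". Either the chromedriver_linux64 binary does not have execute permission set, or the user running the Electron application lacks the necessary permissions.

Unhandled promise rejection: a promise in your code was rejected, and the rejection was not caught by a .catch handler or by a try/catch block inside an async function.

To resolve these problems, follow the steps below:

Resolving the EACCES error
Ensure execute permission: make sure the chromedriver_linux64 file has execute permission. You can set it by running the following command in a terminal:

   chmod +x /mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/chrome_linux64/chromedriver_linux64

Check ownership: verify that the current user has permission to access the file. If not, use chown or sudo to change the owner or to give the current user access to the file.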
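
The same fix can be sketched in Python for cases where it needs to be scripted. This is only an illustration: ensure_executable is a hypothetical helper, and it sets the execute bit for user, group and others, which is what chmod +x does under a typical umask.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for user, group and others.

    Returns True if the current user can execute the file afterwards.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != current:
        # Only the file's owner (or root) may change its mode;
        # for anyone else os.chmod raises PermissionError.
        os.chmod(path, wanted)
    return os.access(path, os.X_OK)
```

A PermissionError from os.chmod corresponds to the ownership case above, where chown or sudo is needed first.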

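Resolving the unhandled promise rejection
Check every promise in the code: look for promises that could trigger the UnhandledPromiseRejectionWarning. For each promise or asynchronous operation, make sure there is proper error handling, such as a .catch handler or a surrounding try/catch.

   someAsyncFunction()
       .then((result) => {
           // handle the result
       })
       .catch((error) => {
           // handle the error
           console.error(error);
       });

Or inside an async function:

   async function asyncCall() {
       try {
           let result = await someAsyncFunction();
           // handle the result
       } catch (error) {
           // handle the error
           console.error(error);
       }
   }

Make sure every asynchronous task in the application is managed and has its errors caught, so that none of them produces an unhandled promise rejection warning.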

@fcityyyyy
Author

OK, thank you very much. I'll check my setup against this.

@fcityyyyy
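Author

Following the reply, I added execute permission to chromedriver_linux64 and that fixed it. The main program runs, and clicking "Design new task" now lets me design a task.
Thank you very much,

Snipaste_2023-11-22 51-35-582

Following the build instructions, I moved on to building the execution-stage program.
I ran pip3 install -r requirements.txt, and everything reported success.
The first run of python3 easyspider_executestage.py --id [1] complained that the lxml module could not be found.
pip3 list confirmed that it really was not installed in my environment,
so I installed it with pip3 install lxml, after which pip3 list showed the package.
Running python3 easyspider_executestage.py --id [1] again
produced the following output: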

[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [1]

Configurations:
+------------------+------+-----------------------+
| Key | Type | Value |
+------------------+------+-----------------------+
| id | list | [1] |
| saved_file_name | str | |
| user_data | bool | False |
| config_folder | str | |
| config_file_name | str | config.json |
| read_type | str | remote |
| headless | bool | False |
| server_address | str | http://localhost:8074 |
| version | str | 0.3.5 |
+------------------+------+-----------------------+

linux ('64bit', 'ELF')
Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS

Absolute_user_data_folder: D:\Documents\Projects\EasySpider\ElectronJS\user_data

<selenium.webdriver.chrome.options.Options object at 0x7fb099ac03a0>
id: 1
Save Name for task ID 1 is: 2023_11_22_20_57_20_236066
任务ID 1 的保存文件名为: 2023_11_22_20_57_20_236066
remote

Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
Task Name: 中国知网
任务名称: 中国知网
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 71, in start
    self.process = subprocess.Popen(cmd, env=self.env,
  File "/usr/local/python3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/python3/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '../ElectronJS/chrome_win64/chromedriver_win64.exe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "easyspider_executestage.py", line 1395, in <module>
    browser_t = MyChrome(
  File "/mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/myChrome.py", line 25, in __init__
    super().__init__(*args, **kwargs) # call the parent class's __init__
  File "/usr/local/python3/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "/usr/local/python3/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 89, in __init__
    self.service.start()
  File "/usr/local/python3/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 81, in start
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver_win64.exe' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home

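The error seems to say FileNotFoundError: [Errno 2] No such file or directory: '../ElectronJS/chrome_win64/chromedriver_win64.exe', meaning the file chromedriver_win64.exe was not found. But this is a Linux environment, so it should be looking for chromedriver_linux64.

Did I run something incorrectly?

Could you take another look? Thank you very much.

@NaiboWang
Owner

Just edit the line in the code that contains '../ElectronJS/chrome_win64/chromedriver_win64.exe' and change the path to your Linux chromedriver path.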
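
Instead of hard-coding one path, the choice could depend on the platform. The sketch below is an assumption about how that might look, not the code in easyspider_executestage.py: chromedriver_path and DRIVER_PATHS are hypothetical names, the Windows path is the one from the traceback, and the Linux path is assumed by analogy with the chrome_linux64 folder described earlier in this thread.

```python
import sys

# Relative chromedriver locations. The Windows entry is quoted from the
# traceback; the Linux entry is an assumption based on this thread.
DRIVER_PATHS = {
    "win32": "../ElectronJS/chrome_win64/chromedriver_win64.exe",
    "linux": "../ElectronJS/chrome_linux64/chromedriver_linux64",
}


def chromedriver_path(platform_name=None):
    """Return the chromedriver path for a sys.platform value."""
    if platform_name is None:
        platform_name = sys.platform
    for prefix, path in DRIVER_PATHS.items():
        if platform_name.startswith(prefix):
            return path
    # Platforms not discussed in this thread are deliberately not guessed.
    raise RuntimeError("no chromedriver path known for " + platform_name)
```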

@fcityyyyy
Author

OK, thank you very much. I'll check my setup against this.

@fcityyyyy
Author

Following your reply, I changed the chrome and chromedriver names and paths in easyspider_executestage.py,

Snipaste_2023-11-23 30-14-295

Now running python3 easyspider_executestage.py --id [1]
brings up a browser window like this:
Snipaste_2023-11-23 30-36-678

But the console still reports that a file cannot be found:
[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [1]

Configurations:
+------------------+------+-----------------------+
| Key | Type | Value |
+------------------+------+-----------------------+
| id | list | [1] |
| saved_file_name | str | |
| user_data | bool | False |
| config_folder | str | |
| config_file_name | str | config.json |
| read_type | str | remote |
| headless | bool | False |
| server_address | str | http://localhost:8074 |
| version | str | 0.3.5 |
+------------------+------+-----------------------+

Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
linux ('64bit', 'ELF')
Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS

Absolute_user_data_folder: D:\Documents\Projects\EasySpider\ElectronJS\user_data

<selenium.webdriver.chrome.options.Options object at 0x7f36023b13a0>
id: 1
Save Name for task ID 1 is: 2023_11_23_08_25_29_830135
任务ID 1 的保存文件名为: 2023_11_23_08_25_29_830135
remote
Task Name: 中国知网
任务名称: 中国知网
Traceback (most recent call last):
  File "easyspider_executestage.py", line 1404, in <module>
    thread = BrowserThread(browser_t, i, service,
  File "easyspider_executestage.py", line 63, in __init__
    with open(stealth_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '../ElectronJS/chrome_linux64/stealth.min.js'

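I checked, and that directory really does not contain this js file, but I don't know where to get it.
Could you take another look? Thank you very much.


I also wondered whether packaging directly into the main program would sidestep this problem. Following the build instructions,
I ran generateExecutable_Linux64.sh, which failed with:

[user1@cent11 ExecuteStage]$ ./generateExecutable_Linux64.sh
rm: cannot remove 'build': No such file or directory
rm: cannot remove 'dist': No such file or directory
./generateExecutable_Linux64.sh: line 3: pyinstaller: command not found
rm: cannot remove '../ElectronJS/chrome_linux64/easyspider_executestage': No such file or directory
cp: cannot stat 'dist/easyspider_executestage': No such file or directory

Could you take a look at this as well? Thank you very much.

@NaiboWang
Owner

The ElectronJS folder contains this file; just copy it into the directory in question.

The packaging script you mention is for Ubuntu; the two cannot be mixed.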
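
For the missing stealth.min.js, the copy could be scripted as below. This is a small sketch only: copy_if_missing is a hypothetical helper, and the directories passed to it are whatever the ElectronJS and chrome_linux64 folders are on the machine in question.

```python
import os
import shutil


def copy_if_missing(src_dir, dst_dir, name="stealth.min.js"):
    """Copy name from src_dir into dst_dir unless it is already there.

    Returns True if a copy was made, False if the file already existed.
    """
    dst = os.path.join(dst_dir, name)
    if os.path.exists(dst):
        return False
    # Raises FileNotFoundError if the source file does not exist.
    shutil.copy2(os.path.join(src_dir, name), dst)
    return True
```

Run from the ExecuteStage directory, a call such as copy_if_missing("../ElectronJS", "../ElectronJS/chrome_linux64") would match the path in the traceback.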

@fcityyyyy
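Author

OK, I'll copy it over and see.

Also, if the packaging script is meant for Ubuntu, how should it be modified for CentOS?
The generateExecutable_Linux64.sh packaging script looks like this:
rm -r build
rm -r dist
pyinstaller -F --icon=favicon.ico easyspider_executestage.py
rm ../ElectronJS/chrome_linux64/easyspider_executestage
cp dist/easyspider_executestage ../ElectronJS/chrome_linux64/easyspider_executestage

Apart from the third line, these are all commands that delete or copy files, so I don't know where to start changing it.
Could you give me some more guidance? Thank you very much.

@NaiboWang
Owner

Packaging is not needed as long as the program runs. If you really must package it, the script can be used without changes.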

@fcityyyyy
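Author

After copying stealth.min.js into chrome_linux64, I can design and save tasks normally,

Snipaste_2023-11-24 43-40-675
Snipaste_2023-11-24 43-59-947

but when I click to invoke a task,
Snipaste_2023-11-24 44-29-783

an error says that execute.sh cannot be found,
Snipaste_2023-11-24 43-09-730

Following the earlier advice, I looked in the ElectronJS directory but could not find this file either; I only found execute_macos.sh and execute.bat.

I tried adapting the execute_macos.sh file,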

#!/bin/bash

echo "Executing EasySpider on MacOS"

./easyspider_executestage $1 $2 $3 $4 $5 $6 $7 $8 $9

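but then found that the easyspider_executestage file does not exist either. According to the build instructions, it appears to be produced by building and packaging the execution stage.

I tried running the packaging command,

[user1@cent11 ExecuteStage]$ ./generateExecutable_Linux64.sh
rm: cannot remove 'build': No such file or directory
rm: cannot remove 'dist': No such file or directory
./generateExecutable_Linux64.sh: line 3: pyinstaller: command not found
rm: cannot remove '../ElectronJS/chrome_linux64/easyspider_executestage': No such file or directory
cp: cannot stat 'dist/easyspider_executestage': No such file or directory

It still fails with the errors above, and I do in fact want to package this and deploy it on a server.

Could you take another look at where my problem lies? Thank you very much!

@NaiboWang
Owner

@fcityyyyy
Author

OK, I'll give it a try. Thank you very much.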

@fcityyyyy
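Author

Following the recommended approach, I searched for the two files and copied them into the corresponding directory, but it did not work. I then looked at execute.sh and found that the path to the executable was wrong,
so I changed its contents to:
#!/bin/bash
./easyspider_executestage $1 $2 $3 $4 $5 $6 $7 $8 $9
Invoking the task still did not work: the main program did not respond, no browser window appeared, and no data was recorded.
Snipaste_2023-11-28 29-05-331

I then wondered whether the execution-stage program has to be built and packaged on CentOS after all, so I went back to the generateExecutable_Linux64.sh script to track down the problem. It turned out that pyinstaller could not be found. After giving the absolute path to pyinstaller in the script, and then fixing a complaint about Python 3 needing the --enable-shared build option, the packaging succeeded.
Snipaste_2023-11-28 29-48-611

The easyspider_executestage file in the dist directory was also copied to chrome_linux64 automatically.
I then ran the task again, and it still did not work. I designed a new task and ran that, and it did not work either.
Snipaste_2023-11-28 40-23-706

I also tried running python3 easyspider_executestage.py --id [2] from the ExecuteStage directory, and changed the data file location in config.json, but that did not work either. The output is below, and no data file was generated in the directory.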
[user1@cent11 ExecuteStage]$ python3 easyspider_executestage.py --id [2]

Configurations:
+------------------+------+-----------------------+
| Key | Type | Value |
+------------------+------+-----------------------+
| id | list | [2] |
| saved_file_name | str | |
| user_data | bool | False |
| config_folder | str | |
| config_file_name | str | config.json |
| read_type | str | remote |
| headless | bool | False |
| server_address | str | http://localhost:8074 |
| version | str | 0.3.5 |
+------------------+------+-----------------------+

linux ('64bit', 'ELF')
Finding chromedriver in EasySpider /mysofts/crawler/EasySpider-0.3.5-c/ExecuteStage/ElectronJS

Absolute_user_data_folder: /home/user1/crawler_data

<selenium.webdriver.chrome.options.Options object at 0x7f072c6863a0>
id: 2
Save Name for task ID 2 is: 2023_11_28_12_19_34_045771
任务ID 2 的保存文件名为: 2023_11_28_12_19_34_045771
remote

Cannot automatically check new version, please use the following command to check whether a new version avaliable and upgrade by pip:
pip index versions commandline_config
pip install commandline --upgrade
Traceback (most recent call last):
  File "easyspider_executestage.py", line 1362, in <module>
    print("Task Name:", service["name"])
KeyError: 'name'


I no longer know where to start with this. Could you take another look? Thank you very much.

@NaiboWang
Owner

NaiboWang commented Nov 28, 2023

See: #239

@fcityyyyy
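Author

OK, I'll take a look and compare.

@fcityyyyy
Author

I had indeed mixed up the ID of the task to execute: my execution_instances folder only contains 0.json and 1.json.

python3 easyspider_executestage.py --id [0]
Once the correct value was passed, it worked: the data is scraped and is also shown in the console.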
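
A quick way to avoid passing an ID that does not exist is to list the files in execution_instances first. The sketch below assumes, as in this case, that the files are named N.json; available_task_ids is a hypothetical helper and not part of EasySpider.

```python
import os


def available_task_ids(folder):
    """Return the sorted integer IDs of the N.json files in folder."""
    ids = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        # Accept only names such as "0.json" or "12.json".
        if ext == ".json" and stem.isascii() and stem.isdigit():
            ids.append(int(stem))
    return sorted(ids)
```

With only 0.json and 1.json present, the result does not contain 2, which matches the failing --id [2] run above.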

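Running it from the command line with ./chrome_linux64/easyspider_executestage --id '[0]' --user_data 0 --server_address http://localhost:8074 --config_folder "/mysofts/crawler/EasySpider-0.3.5-c/ElectronJS/" --headless 0 --read_type remote --config_file_name config.json --saved_file_name
also scrapes the data.

I'm really happy. Thank you very much for your guidance and help.


The only thing that still does not work is clicking [Direct Local Execution (本地直接执行)] on the task page: nothing happens, and the console shows no errors, only the normal messages: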

GET A MESSAGE: { type: 5, message: { id: 2, user_data_folder: '', execute_type: 0 } }
{ id: 2, user_data_folder: '', execute_type: 0 }

GET A MESSAGE: { type: 5, message: { id: 2, user_data_folder: '', execute_type: 0 } }
{ id: 2, user_data_folder: '', execute_type: 0 }
0.json
1.json
2.json

GET A MESSAGE: { type: 5, message: { id: 3, user_data_folder: '', execute_type: 1 } }
{ id: 3, user_data_folder: '', execute_type: 1 }

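No data appears in the data directory either.

Could this be related to my opening the application through X11 forwarding? Designing tasks works and they save fine, so I don't understand why running them fails.
Could you take another look? Thank you very much!

@NaiboWang NaiboWang added the documentation Improvements or additions to documentation label Nov 29, 2023
@NaiboWang NaiboWang changed the title from "After building the main program ElectronJS on CentOS, running it always reports FATAL:setuid_sandbox_host.cc(158)]" to "Guide to building the main program on CentOS" Nov 29, 2023
@NaiboWang
Owner

NaiboWang commented Nov 29, 2023

Direct local execution depends on the chrome_linux64/execute.sh file in the directory and has nothing to do with the task-design flow; at its core it is still a script invoked from the command line. I have not tested it on CentOS myself. The core code is at lines 76-78 and 341-347 of main.js in the ElectronJS folder. You can debug it yourself; if that does not work out, just run tasks from the command line: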

driverPath = path.join(__dirname, "chrome_linux64/chromedriver_linux64");
chromeBinaryPath = path.join(__dirname, "chrome_linux64/chrome");
execute_path = path.join(__dirname, "chrome_linux64/execute.sh");
let spawn = require("child_process").spawn;
if (process.platform != "darwin" && msg.message.execute_type == 1 && msg.message.id != -1) {
    let child_process = spawn(execute_path, parameters);
    child_process.stdout.on('data', function (data) {
        console.log(data.toString());
    });
}
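
Before debugging main.js, it may help to confirm that every file this code path touches exists and is executable. The sketch below is an assumption about what is worth checking, based only on the file names in the excerpt above and in the execute.sh contents quoted earlier; preflight and REQUIRED are hypothetical names.

```python
import os

# File names taken from the main.js excerpt and from the execute.sh
# contents quoted in this thread, all inside ElectronJS/chrome_linux64.
REQUIRED = ["chromedriver_linux64", "chrome", "execute.sh",
            "easyspider_executestage"]


def preflight(chrome_dir, names=REQUIRED):
    """Map each file name to 'ok', 'missing' or 'not executable'."""
    report = {}
    for name in names:
        path = os.path.join(chrome_dir, name)
        if not os.path.isfile(path):
            report[name] = "missing"
        elif not os.access(path, os.X_OK):
            # spawn() fails with EACCES on files like this.
            report[name] = "not executable"
        else:
            report[name] = "ok"
    return report
```

Any entry other than 'ok' points at the same kind of permission or missing-file problem that came up earlier in this thread.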

@NaiboWang NaiboWang changed the title from "Guide to building the main program on CentOS" to "Guide to building the program on CentOS" Nov 29, 2023
@fcityyyyy
Author

OK, understood. I'll try again. Thank you very much.
