MacOS with two execute stage version

Naibo_Mac_M2 2023-12-23 15:44:38 +08:00
parent 476cec0537
commit 2fb6283063
8 changed files with 74 additions and 27 deletions


@ -2,24 +2,46 @@ Due to the complex security settings of MacOS, the issue of being unable to open
https://github.com/NaiboWang/EasySpider/wiki/MacOS-Guide
For the Arm version, if it shows "the package is damaged", you need to use the following command to modify the package attributes:
The main steps are as follows:
- Design phase - Apple Arm chip version of MacOS
For the Arm version, if it shows "The package is damaged", do the following to run EasySpider:
1. Open a Terminal window.
2. Change to the EasySpider software directory, for example:
cd ~/Downloads/EasySpider_MacOS
3. In the EasySpider directory, run the following command to clear the extended attributes of the app package:
xattr -c <your EasySpider.app file path>
xattr -cr <your EasySpider.app file path>
For example:
xattr -cr /Users/your_username/Downloads/EasySpider_MacOS/EasySpider.app
Then try to open it again.
xattr -c EasySpider.app
If an error like the one below occurs when running the xattr command, you can ignore it. Once the command finishes, you can open the software:
You can now open and use the software.
xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
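The xattr step above can also be scripted. The sketch below is a hypothetical helper, not part of EasySpider. It builds the `xattr -cr` command for a given bundle path and treats only lines containing the `[Errno 13] Permission denied` message quoted above as ignorable. It assumes xattr writes its errors to stderr, and `clear_attributes` only works on macOS, where the `xattr` tool exists.

```python
import subprocess
from pathlib import Path

# Marker for the harmless error quoted above; any stderr line containing it is ignored.
IGNORABLE_MARKER = "[Errno 13] Permission denied"


def build_xattr_command(app_path: str) -> list[str]:
    """Build the argv that clears extended attributes on an app bundle."""
    # -c clears all extended attributes, -r applies this recursively inside the bundle.
    return ["xattr", "-cr", str(Path(app_path).expanduser())]


def unexpected_errors(stderr_text: str) -> list[str]:
    """Return non-empty stderr lines other than the ignorable permission error."""
    return [
        line
        for line in stderr_text.splitlines()
        if line.strip() and IGNORABLE_MARKER not in line
    ]


def clear_attributes(app_path: str) -> bool:
    """Run xattr (macOS only); return True if only ignorable errors were reported."""
    # Assumption: xattr reports its errors on stderr.
    result = subprocess.run(
        build_xattr_command(app_path), capture_output=True, text=True
    )
    return not unexpected_errors(result.stderr)
```

On a Mac, a call such as `clear_attributes("~/Downloads/EasySpider_MacOS/EasySpider.app")` would then report whether anything other than the known permission warning went wrong.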
File access permission must be granted, but microphone permission is not needed at all. The author is not sure why microphone access would be requested, so it can be denied.
If an error similar to the one below occurs while a task is running, it can also be ignored:
- Design phase - Intel chip version of MacOS
Traceback (most recent call last):
File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
1. Due to MacOS's security policy, the system does not allow EasySpider to run the first time it is opened, and prompts you to move it to the Trash. At this point, click "Cancel".
2. Then open System Settings -> Security & Privacy.
3. Click "Open Anyway" (if you can't see it, scroll to the bottom).
Now, you can design tasks as you would in other operating systems.
- Execution phase
The steps are the same as in the design phase of the Intel version. The first time you run the 'easyspider_executestage' program, set "Always Allow" in System Settings -> Security & Privacy, re-run the "./easyspider_executestage EID" command, and click "Open Anyway" to run the task.
If an error similar to the following occurs while a task is running, it can be ignored:
Traceback (most recent call last):
File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
File access permission must be granted, but microphone permission is not needed at all. The author is not sure why microphone access is requested, so you can deny it.
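To separate that harmless traceback from real failures when running tasks from a script, you could strip it out of the executor's stderr. The following is a minimal sketch under stated assumptions. `run_task` and `strip_tracker_noise` are hypothetical names. The regular expression only matches the exact traceback text quoted above, in either its multi-line or single-line form, and allows any `/mp-...` name. The executor path and the single EID argument follow the "./easyspider_executestage EID" command described above.

```python
import re
import subprocess

# Matches the harmless resource_tracker traceback quoted above; the /mp-... name varies per run.
TRACKER_TRACEBACK = re.compile(
    r"Traceback \(most recent call last\):\s*"
    r'File "multiprocessing/resource_tracker\.py", line \d+, in main\s*'
    r"KeyError: '/mp-[^']+'\s*"
)


def strip_tracker_noise(stderr_text: str) -> str:
    """Remove every occurrence of the harmless traceback and return what remains."""
    return TRACKER_TRACEBACK.sub("", stderr_text).strip()


def run_task(eid: int, executor: str = "./easyspider_executestage") -> str:
    """Run one task by EID (hypothetical wrapper) and return stderr minus the known noise."""
    result = subprocess.run([executor, str(eid)], capture_output=True, text=True)
    return strip_tracker_noise(result.stderr)
```

An empty string from `run_task` means the only stderr output was the ignorable traceback. Anything left over deserves a closer look.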


@ -0,0 +1 @@
{"id":306,"name":"XML Example","url":"https://www.chinanews.com.cn/rss/scroll-news.xml","links":"https://www.chinanews.com.cn/rss/scroll-news.xml","create_time":"2023-12-23 10:47:31","update_time":"2023-12-23 11:07:16","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.chinanews.com.cn/rss/scroll-news.xml","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.chinanews.com.cn/rss/scroll-news.xml","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.chinanews.com.cn/rss/scroll-news.xml"}],"outputParameters":[{"id":0,"name":"自定义参数_1","desc":"","type":"text","recordASField":1,"exampleValue":"自定义值"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.chinanews.com.cn/rss/scroll-news.xml","links":"https://www.chinanews.com.cn/rss/scroll-news.xml","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"自定义参数_1","desc":"","iframe":false,"extractType":0,"relativeXPath":"","recordASField":1,"allXPaths":[],"exampleValues":[{"num":0,"value":"自定义值"}],"default":"","beforeJS":"1","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"paraType":"text","splitLine":0}]}},{"id":2,"index":3,"parentId":0,"type":1,"option":8,"title":"循环 - 不固定元素列表","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"/rss/channel/item","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}},{"id":-1,"index":4,"parentId":0,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":false,"name":"自定义参数_1","desc":"","iframe":false,"extractType":0,"relativeXPath":"/rss/channel/item","recordASField":1,"allXPaths":[],"exampleValues":[{"num":0,"value":"自定义值"}],"default":"","beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"paraType":"text","splitLine":0}]}}]}
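The loop node in this task iterates over the XPath `/rss/channel/item`. To illustrate roughly what that path selects in an RSS 2.0 feed, the sketch below uses Python's standard `xml.etree.ElementTree` on a small made-up feed. It is not how EasySpider itself evaluates the path (the executor imports lxml). The feed content and the `rss_item_titles` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written RSS document for illustration only (not real chinanews.com.cn data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>https://example.com/1</link></item>
    <item><title>Second</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def rss_item_titles(xml_text: str) -> list[str]:
    """Return the <title> of every element matched by /rss/channel/item."""
    root = ET.fromstring(xml_text)  # root is the <rss> element itself
    # "channel/item" relative to <rss> selects the same nodes as the absolute /rss/channel/item
    return [item.findtext("title", default="") for item in root.findall("channel/item")]
```

Each matched `<item>` corresponds to one loop iteration. The feed-level `<title>` is not matched because it sits directly under `<channel>`.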

File diff suppressed because one or more lines are too long


@ -0,0 +1 @@
{"id":308,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-23 14:21:24","update_time":"2023-12-23 14:23:36","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":1,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"手机"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://shouji.jd.com/"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]","//a[contains(., '手机')]","/html/body/div[last()-6]/div/div[last()-4]/div/div[last()-2]/div/div/div/div[last()-1]/div[last()-12]/a[last()-1]"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":8,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"手机"}],"unique_index":"m5moh4pro4rlqhoa60d","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://shouji.jd.com/"}],"unique_index":"m5moh4pro4rlqhoa60d","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}
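Both task files store their workflow as a flat `graph` list. In both example files, each node's `sequence` appears to list the `index` values of its children. For example, the XML task's root has `"sequence":[1,3]` and its loop node (index 3) has `"sequence":[2]`. That reading is an assumption inferred from these two files, not documented behaviour. Under it, the hypothetical helpers below load a task and print the node titles in execution order, indenting nodes that sit inside a loop.

```python
import json


def load_task(path: str) -> dict:
    """Read an EasySpider task file (UTF-8 JSON, as in the examples above)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def walk_task(task: dict) -> list[str]:
    """Return node titles depth-first from the root node (index 0)."""
    # Assumption: "sequence" holds child *index* values, not ids.
    by_index = {node["index"]: node for node in task["graph"]}
    order: list[str] = []

    def visit(index: int, depth: int) -> None:
        node = by_index[index]
        if index != 0:  # the root node itself is not a user-visible step
            order.append("  " * (depth - 1) + node["title"])
        for child in node.get("sequence", []):
            visit(child, depth + 1)

    visit(0, 0)
    return order
```

Under this reading, a node that no `sequence` references is never visited. In the XML example, the node with `"id":-1` at index 4 is one such node.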


@ -1,23 +1,46 @@
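Due to the complex security settings of MacOS, the software shows an "unverified developer" warning and refuses to open the first time it is launched. Please refer to the following video to see how to open the software and run tasks on the MacOS version: https://www.bilibili.com/video/BV1E34y137fT/
For the Arm version, if it shows "The package is damaged", you need to use the following command to modify the package attributes:
The main steps are as follows:
xattr -cr <your EasySpider.app file path>
- Design phase - Apple Arm chip version of MacOS
For the Arm version, if it shows "The package is damaged", do the following to run EasySpider:
1. Open the system Terminal window.
2. Change to the EasySpider software directory:
cd ~/Downloads/EasySpider_MacOS
3. In the EasySpider directory, use the following command to modify the package attributes:
xattr -c <your EasySpider.app file path>
For example:
xattr -cr /Users/your_username/Downloads/EasySpider_MacOS/EasySpider.app
xattr -c EasySpider.app
Then try to open it again.
You can now open the software.
If an error like the one below occurs when running the xattr command, you can ignore it. Once the command finishes, you can open the software: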
xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
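- Design phase - Intel chip version of MacOS
File access permission must be granted. Microphone permission is not used at all, and the author is not sure why it is requested, so you can deny it.
1. Due to MacOS's security policy, the system does not allow EasySpider to run the first time it is opened, and prompts you to move it to the Trash. At this point, click "Cancel".
If an error similar to the one below occurs while a task is running, it can also be ignored:
2. Then open System Preferences -> Security & Privacy.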
Traceback (most recent call last):
File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
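3. Click "Open Anyway" (if you can't see it, scroll to the bottom).
Now you can design tasks just as you would on other operating systems.
- Execution phase
The steps are the same as in the design phase of the Intel version. The first time you run the 'easyspider_executestage' program, set "Always Allow" in System Preferences -> Security & Privacy, re-run the "./easyspider_executestage EID" command, and click "Open Anyway" to run the task.
If an error similar to the following occurs while a task is running, it can be ignored:
Traceback (most recent call last):
File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
File access permission must be granted, but microphone permission is not needed at all. The author is not sure why microphone access is requested, so you can deny it.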

Binary file not shown.

Binary file not shown.


@ -53,7 +53,6 @@ try:
except:
    print("OCR识别无法在当前环境下使用ddddocr库缺失请使用完整版执行器easyspider_executestage_full来运行需要OCR识别的任务。")
    print("OCR recognition cannot be used in the current environment (ddddocr library is missing), please use the executor with ddddocr 'easyspider_executestage_full' to run the task which requires OCR recognition.")
    time.sleep(2)
from urllib.parse import urljoin
from lxml import etree, html
try:
@ -61,7 +60,7 @@ try:
except:
    print("数据去重无法在当前环境下使用pandas库缺失请使用完整版执行器easyspider_executestage_full来运行需要去重的任务。")
    print("Data deduplication cannot be used in the current environment (pandas library is missing), please use the executor with pandas 'easyspider_executestage_full' to run the task which requires data deduplication.")
    time.sleep(2)
    time.sleep(1)
# import numpy
# import pytesseract
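Both hunks repeat the same pattern. They try to import an optional library (ddddocr, pandas) and print a warning pointing to `easyspider_executestage_full` if it is missing. The sketch below factors that pattern into a helper. The helper name `optional_import` is hypothetical and this is not the executor's actual code. It uses `importlib.import_module` and catches `ImportError` instead of a bare `except:`, so unrelated errors are not silently swallowed.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str, feature: str) -> Optional[ModuleType]:
    """Import an optional dependency, or warn and return None if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass; other exceptions still propagate
        print(
            f"{feature} cannot be used in the current environment ({name} library is missing), "
            "please use the executor 'easyspider_executestage_full' to run the task."
        )
        return None


# Usage mirroring the two hunks above (library names taken from the diff):
# ddddocr = optional_import("ddddocr", "OCR recognition")
# pd = optional_import("pandas", "Data deduplication")
```

Code that needs the feature can then check `if ddddocr is None:` at the point of use, rather than relying on a module-level flag.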