mirror of https://github.com/NaiboWang/EasySpider.git
synced 2025-04-16 16:26:56 +08:00
MacOS with two execute stage version
parent 476cec0537
commit 2fb6283063
@@ -2,24 +2,46 @@ Due to the complex security settings of MacOS, the issue of being unable to open

https://github.com/NaiboWang/EasySpider/wiki/MacOS-Guide

For the Arm version, if it shows "the package is damaged", you need to use the following command to modify the package attributes:
The main steps are as follows:

- Design phase - Apple Arm chip version of MacOS

1. For the Arm version, if it shows "The package is damaged", you need to do the following to run EasySpider:

2. Open the terminal command line window.

3. Switch to the EasySpider software directory, such as:

cd ~/Downloads/EasySpider_MacOS

4. In the EasySpider directory, use the following command to modify the software package attributes:

xattr -c YourEasySpider.appFilePath

xattr -cr Your EasySpider.app file path

For example:

xattr -cr /Users/your_username/Downloads/EasySpider_MacOS/EasySpider.app

Then try to open it again.
xattr -c EasySpider.app

When executing the xattr command, if an error like the one below occurs, you can ignore it. After the execution is finished, you can open the software:
You can now open and use the software.

xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
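
The Arm steps above come down to running `xattr` against the app bundle. The following is a minimal Python sketch of that step, not part of EasySpider: the helper names `build_xattr_command` and `clear_app_attributes` and the `recursive` switch are hypothetical, and the path is the example location used in the steps. The README shows both `-c` and `-cr`, so the sketch lets the caller pick.

```python
import subprocess
import sys
from pathlib import Path


def build_xattr_command(app_path, recursive=False):
    """Return the argv list that clears extended attributes on app_path."""
    # -c clears all attributes; -cr additionally recurses into the bundle.
    flags = "-cr" if recursive else "-c"
    return ["xattr", flags, str(Path(app_path).expanduser())]


def clear_app_attributes(app_path, recursive=False):
    """Run xattr on macOS; on other platforms only report the command."""
    command = build_xattr_command(app_path, recursive)
    if sys.platform != "darwin":
        print("Not macOS, would run:", " ".join(command))
        return None
    # check=False: the README says "Permission denied" errors can be ignored.
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Example location from the steps above; adjust to where the app lives.
    clear_app_attributes("~/Downloads/EasySpider_MacOS/EasySpider.app")
```

Because the exit code is returned rather than raised, a `Permission denied` message on a nested file does not stop the caller from opening the app afterwards.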

File access permissions must be granted, but microphone permissions are not needed at all. The author is also unclear why microphone access would be requested, so it can be refused.

During the execution of tasks, if an error similar to the one below occurs, it can also be ignored:
- Design phase - Intel chip version of MacOS

Traceback (most recent call last):
  File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
1. Due to MacOS's security policy, the system does not allow EasySpider to run when it is first opened, and it will prompt you to move it to the trash. At this point, you need to click "Cancel".

2. Then open System Settings -> Security & Privacy.

3. Click "Open Anyway" (if you can't see it, scroll to the bottom).

Now, you can design tasks as you would in other operating systems.


- Execution phase

The procedure is the same as in the design phase of the Intel version. When you run the 'easyspider_executestage' program for the first time, set "Always Allow" in System Settings -> Security & Privacy, re-run the "./easyspider_executestage EID" command, and then click "Open Anyway" to run the task.

During the execution of the task, if an error similar to the following occurs, it can be ignored:

Traceback (most recent call last):
  File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'

File access permissions must be granted, but microphone permissions are not needed at all. The author is also unclear why microphone access permissions would be requested, so you can refuse.
.temp_to_pub/EasySpider_MacOS/Sample Tasks/306.json
Normal file
@@ -0,0 +1 @@
{"id":306,"name":"XML Example","url":"https://www.chinanews.com.cn/rss/scroll-news.xml","links":"https://www.chinanews.com.cn/rss/scroll-news.xml","create_time":"2023-12-23 10:47:31","update_time":"2023-12-23 11:07:16","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.chinanews.com.cn/rss/scroll-news.xml","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.chinanews.com.cn/rss/scroll-news.xml","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.chinanews.com.cn/rss/scroll-news.xml"}],"outputParameters":[{"id":0,"name":"自定义参数_1","desc":"","type":"text","recordASField":1,"exampleValue":"自定义值"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.chinanews.com.cn/rss/scroll-news.xml","links":"https://www.chinanews.com.cn/rss/scroll-news.xml","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"before
JS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"自定义参数_1","desc":"","iframe":false,"extractType":0,"relativeXPath":"","recordASField":1,"allXPaths":[],"exampleValues":[{"num":0,"value":"自定义值"}],"default":"","beforeJS":"1","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"paraType":"text","splitLine":0}]}},{"id":2,"index":3,"parentId":0,"type":1,"option":8,"title":"循环 - 不固定元素列表","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"/rss/channel/item","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}},{"id":-1,"index":4,"parentId":0,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":false,"name":"自定义参数_1","desc":"","iframe":false,"extractType":0,"relativeXPath":"/rss/channel/item","recordASField":1,"allXPaths":[],"exampleValues":[{"num":0,"value":"自定义值"}],"default":"","beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"paraType":"text","splitLine":0}]}}]}
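
The task file above stores its workflow as a flat `graph` array whose nodes point at each other through `sequence`. The sketch below prints that structure as an indented outline. It rests on an assumption read off the sample rather than on documentation: each `sequence` entry is a position in the `graph` array. `SAMPLE_TASK` is a trimmed, hypothetical stand-in that keeps only the fields used, with node titles translated for readability.

```python
import json

# Trimmed stand-in for a task file such as 306.json (assumption: only the
# "graph", "title" and "sequence" fields matter for this outline).
SAMPLE_TASK = json.loads("""
{
  "id": 306,
  "name": "XML Example",
  "graph": [
    {"index": 0, "title": "root", "sequence": [1, 3]},
    {"index": 1, "title": "open page", "sequence": []},
    {"index": 2, "title": "extract data", "sequence": []},
    {"index": 3, "title": "loop", "sequence": [2]}
  ]
}
""")


def walk(graph, index=0, depth=0):
    """Yield (depth, title) pairs by following each node's "sequence" list.

    Assumption: entries in "sequence" are positions in the "graph" array.
    """
    node = graph[index]
    yield depth, node["title"]
    for child_index in node["sequence"]:
        yield from walk(graph, child_index, depth + 1)


for depth, title in walk(SAMPLE_TASK["graph"]):
    print("  " * depth + title)
```

Nodes that no `sequence` refers to, such as the entry with `"id":-1` in the file above, are simply never visited by this traversal.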
.temp_to_pub/EasySpider_MacOS/Sample Tasks/307.json
Normal file
File diff suppressed because one or more lines are too long
.temp_to_pub/EasySpider_MacOS/Sample Tasks/308.json
Normal file
@@ -0,0 +1 @@
{"id":308,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-23 14:21:24","update_time":"2023-12-23 14:23:36","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":1,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"手机"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://shouji.jd.com/"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","befo
reJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]","//a[contains(., '手机')]","/html/body/div[last()-6]/div/div[last()-4]/div/div[last()-2]/div/div/div/div[last()-1]/div[last()-12]/a[last()-1]"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":8,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"手机"}],"unique_index":"m5moh4pro4rlqhoa60d","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://shouji.jd.com/"}],"unique_index":"m5moh4pro4rlqhoa60d","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}
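
A task file like the one above declares its output columns under `outputParameters`. As a rough illustration, the sketch below lists those columns with their example values. `SAMPLE_TASK` is a trimmed, hypothetical copy holding only the fields used, and `output_columns` is an illustrative helper, not an EasySpider function.

```python
import json

# Trimmed stand-in for 308.json; names and example values are copied from
# the task file, everything else is omitted.
SAMPLE_TASK = json.loads("""
{
  "id": 308,
  "url": "https://www.jd.com",
  "outputFormat": "xlsx",
  "outputParameters": [
    {"id": 0, "name": "参数1_链接文本", "type": "text", "exampleValue": "手机"},
    {"id": 1, "name": "参数2_链接地址", "type": "text", "exampleValue": "https://shouji.jd.com/"}
  ]
}
""")


def output_columns(task):
    """Return (name, example value) pairs in the order the task declares them."""
    return [(p["name"], p["exampleValue"]) for p in task["outputParameters"]]


for name, example in output_columns(SAMPLE_TASK):
    print(f"{name}: {example}")
```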
@@ -1,23 +1,46 @@
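Due to the complex security settings of MacOS, the software reports an unverified developer and refuses to open on first launch. Please refer to the following video to see how to open the software and run tasks on MacOS: https://www.bilibili.com/video/BV1E34y137fT/

For the Arm version, if it shows "the package is damaged", you need to use the following command to modify the package attributes:
The main steps are as follows:

xattr -cr Your EasySpider.app file path
- Design phase - Apple Arm chip version of MacOS

For the Arm version, if it shows "The package is damaged", you need to do the following to run EasySpider:

1. Open the system terminal command line window.

2. Switch to the EasySpider software directory, such as:

cd ~/Downloads/EasySpider_MacOS

3. In the EasySpider directory, use the following command to modify the software package attributes:

xattr -c Your EasySpider.app file path

For example:

xattr -cr /Users/your_username/Downloads/EasySpider_MacOS/EasySpider.app
xattr -c EasySpider.app

Then try to open it again.
You can now open the software.

When executing the xattr command, if an error like the one below occurs, you can ignore it. After the execution is finished, you can open the software:

xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
- Design phase - Intel chip version of MacOS

File access permissions must be granted, but the microphone permission is not used at all. The author is also unclear why the microphone would be needed, so it can be refused.
1. Due to MacOS's security policy, the system does not allow EasySpider to run when it is first opened, and it will prompt you to move it to the trash. At this point, you need to click "Cancel".

During the execution of tasks, if an error similar to the one below occurs, it can also be ignored:
2. Then open System Preferences -> Security & Privacy.

Traceback (most recent call last):
  File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'
3. Click "Open Anyway" (if you can't see it, scroll to the bottom).

Now, you can design tasks as you would in other operating systems.



- Execution phase

The procedure is the same as in the design phase of the Intel version. When you run the 'easyspider_executestage' program for the first time, set "Always Allow" in System Preferences -> Security & Privacy, re-run the "./easyspider_executestage EID" command, and then click "Open Anyway" to run the task.

During the execution of the task, if an error similar to the following occurs, it can be ignored:

Traceback (most recent call last):
  File "multiprocessing/resource_tracker.py", line 209, in main
KeyError: '/mp-5dxyey7c'

File access permissions must be granted, but microphone permissions are not needed at all. The author is also unclear why microphone access permissions would be requested, so you can refuse.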
Binary file not shown.
Binary file not shown.
@@ -53,7 +53,6 @@ try:
except:
    print("OCR识别无法在当前环境下使用(ddddocr库缺失),请使用完整版执行器easyspider_executestage_full来运行需要OCR识别的任务。")
    print("OCR recognition cannot be used in the current environment (ddddocr library is missing), please use the executor with ddddocr 'easyspider_executestage_full' to run the task which requires OCR recognition.")
    time.sleep(2)
from urllib.parse import urljoin
from lxml import etree, html
try:
@@ -61,7 +60,7 @@ try:
except:
    print("数据去重无法在当前环境下使用(pandas库缺失),请使用完整版执行器easyspider_executestage_full来运行需要去重的任务。")
    print("Data deduplication cannot be used in the current environment (pandas library is missing), please use the executor with pandas 'easyspider_executestage_full' to run the task which requires data deduplication.")
    time.sleep(2)
    time.sleep(1)

# import numpy
# import pytesseract
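
The two hunks above follow the same pattern: try to import an optional library, print a notice if it is missing, pause briefly, and carry on. One way to express that pattern once is sketched below. It is an illustration under assumptions, not the executor's code: `optional_import` and its parameters are hypothetical names, and it catches `ImportError` where the original uses a bare `except:`.

```python
import importlib
import time


def optional_import(module_name, feature, pause_seconds=0):
    """Return the imported module, or None after printing a notice."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        print(f"{feature} cannot be used in the current environment "
              f"({module_name} library is missing).")
        # Short pause so the notice stays visible, as in the hunks above.
        time.sleep(pause_seconds)
        return None


# Either name is None when the library is not installed.
ddddocr = optional_import("ddddocr", "OCR recognition")
pd = optional_import("pandas", "Data deduplication")
```

Catching only `ImportError` lets unrelated failures surface. If a library can fail at import time for other reasons, the broader handler in the original may be the safer choice.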