mirror of
https://github.com/NaiboWang/EasySpider.git
synced 2025-04-23 01:29:20 +08:00
Linux shell command update
parent 2cb9d799ef
commit b746556d71
2
ElectronJS/.gitignore
vendored
@ -10,7 +10,5 @@ user_data/
 Data/
 Chrome/
 execution_instances/*
-EasySpider_en.crx
-EasySpider_zh.crx
 .DS_Store
 npminstall-debug.log
BIN
ElectronJS/EasySpider_en.crx
Normal file
Binary file not shown.
BIN
ElectronJS/EasySpider_zh.crx
Normal file
Binary file not shown.
@ -296,7 +296,7 @@
     } else if(OSInfo.version == 'win32' && OSInfo.bit == 'ia32'){
         app.$data.command = "./EasySpider/resources/app/chrome_win32/easyspider_executestage.exe --id [" + app.$data.ID.toString() + "] --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
     } else if(OSInfo.version == 'linux'){
-        app.$data.command = "./EasySpider/resources/app/chrome_linux64/easyspider_executestage --id [" + app.$data.ID.toString() + "] --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
+        app.$data.command = "./EasySpider/resources/app/chrome_linux64/easyspider_executestage --id '[" + app.$data.ID.toString() + "]' --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
     } else if(OSInfo.version == 'darwin'){
         app.$data.command = "./easyspider_executestage --id [" + app.$data.ID.toString() + "] --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
     }
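The Linux branch now wraps the `--id` value in single quotes. The commit does not say why; a plausible reason (an assumption) is that an unquoted `[82]` is a bracket pattern in POSIX-style shells, so it can be replaced by a matching file name in the current directory, and some shells such as zsh report an error by default when nothing matches. The sketch below shows one way to build the same command with a general quoting helper. `shellQuote`, `buildLinuxCommand`, and the server address are hypothetical and are not taken from EasySpider.

```javascript
// Hypothetical helper: wrap a value in single quotes for a POSIX-style shell,
// escaping any embedded single quote as '\''.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical builder mirroring the shape of the command string in the diff.
// The executable path and flag names are copied from the diff; the function
// and parameter names are illustrative only.
function buildLinuxCommand(id, withUserData, serverAddress) {
  return [
    "./EasySpider/resources/app/chrome_linux64/easyspider_executestage",
    "--id", shellQuote("[" + id + "]"),
    "--user_data", withUserData ? "1" : "0",
    "--server_address", serverAddress,
  ].join(" ");
}

// The address below is a placeholder, not a documented default.
console.log(buildLinuxCommand(82, false, "http://localhost:8074"));
```

Quoting through a helper rather than hard-coding the quote characters keeps the command safe if the argument ever contains spaces or quotes, which the literal `'[` ... `]'` form in the diff does not need to handle for numeric IDs.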
1
ElectronJS/tasks/82.json
Normal file
File diff suppressed because one or more lines are too long
@ -0,0 +1,52 @@
## Update Instruction

1. Advanced Operations:

   - Custom scripts can be executed in the workflow, including JavaScript commands run in the browser and scripts invoked at the operating-system level. The command's return value can be captured and recorded, which greatly expands the range of possible operations.
   - Before and after each operation, you can specify a JavaScript command that runs against the currently located element.

2. Custom scripts are also supported in conditions and loop conditions. Whether the script's return value is true decides the outcome of the condition or loop check, which makes tasks far more flexible. Loops can now be exited with a break, and custom operations can manipulate elements inside the loop.

3. Multiple XPath expressions are generated at once for the user to choose from, and the XPath Helper extension is pre-installed for XPath debugging.

4. Added extraction of an element's background image URL, the current page title, and the current page URL.

5. Added the ability to save screenshots of elements or entire web pages. This feature works best in headless mode.

6. Added image downloading.

7. Added OCR recognition of elements. To use this feature, the Tesseract library must be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html

8. The return value of JavaScript code executed on an element can be extracted directly, enabling things such as regular expression matching and reading an element's background color.

9. Added the ability to switch dropdown options and to extract the selected value and text of a dropdown.

10. Significantly expanded the in-app hints and explanations to make the software easier to use, including how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items.

11. Added instructions on how to execute tasks from the command line.

12. Added headless mode configuration, allowing the software to run without a browser interface.

13. Fixed Chinese paths not being recognized correctly in user-configured browser mode.

14. Fixed the program freezing when a conditional branch had no unconditional branch.

15. Fixed the input box freezing after saving a task.

16. Added an option to set the maximum page-load waiting time in the "Open Page" and "Click Element" operations.

17. Added the ability to move the mouse to an element.

18. A prompt is now displayed when an element cannot be found.

19. Fixed the webpage scrolling bug.

20. The task name is initialized to the page title on the first visit.

21. Added version update prompts.

22. Added publisher information as requested.

23. Updated Chrome to version 113.
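Item 4 above mentions extracting an element's background image URL. In a browser, the computed `background-image` value usually has the form `url("...")`, so the URL has to be pulled out of that wrapper. The following is a minimal standalone sketch of that step; `backgroundImageUrl` is a hypothetical name, and this is not how EasySpider itself necessarily implements the feature.

```javascript
// Extract the URL from a CSS background-image value such as
// url("https://example.com/a.png"). Returns null when the value contains
// no url(...) part, for example "none".
function backgroundImageUrl(cssValue) {
  const match = /url\(\s*(['"]?)(.*?)\1\s*\)/.exec(cssValue);
  return match ? match[2] : null;
}

// In a browser the input would typically come from
// getComputedStyle(element).backgroundImage; here plain strings are used so
// the sketch runs anywhere.
console.log(backgroundImageUrl('url("https://example.com/a.png")'));
console.log(backgroundImageUrl("none"));
```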
63
Releases/EasySpider_linux_amd64_Ubuntu/V0.3.1 新特性.txt
Normal file
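@ -0,0 +1,63 @@
If downloads are slow, consider the download links hosted in mainland China: [Download links in mainland China](https://github.com/NaiboWang/EasySpider/releases/download/v0.3.0/Download_Link_Address_in_China_Mainland.txt).

### Watching the new-feature walkthrough videos is strongly recommended

Videos covering the latest features have been uploaded to Bilibili. They are very useful and worth watching.

[Important: Custom condition checks using the return value of a JS command inside a loop item - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)

[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)

[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)

[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)

[How to take screenshots of elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)

[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)

Note: the `.json` files in the task folder of v0.3.1 are incompatible with all earlier versions. Please redesign your tasks in v0.3.1.
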
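The compatibility note above can be checked mechanically before importing a task file. In this commit, the task files added for v0.3.1 carry a `"version":"0.3.1"` field, while the older task files being deleted have no `version` key at all. The sketch below relies only on that observation; `checkTaskCompatibility` is a hypothetical helper, and whether EasySpider performs a similar check itself is not stated in the source.

```javascript
// Version string taken from the "version" field of the v0.3.1 sample tasks.
const SUPPORTED_VERSION = "0.3.1";

// Hypothetical helper: parse a task file's text and report whether it looks
// like a v0.3.1 task. Throws if the text is not valid JSON.
function checkTaskCompatibility(taskJsonText) {
  const task = JSON.parse(taskJsonText);
  if (task.version === undefined) {
    return { compatible: false, reason: "no version field (likely created before v0.3.1)" };
  }
  if (task.version !== SUPPORTED_VERSION) {
    return { compatible: false, reason: "version " + task.version + " differs from " + SUPPORTED_VERSION };
  }
  return { compatible: true, reason: "" };
}

console.log(checkTaskCompatibility('{"id":1,"name":"old task"}'));
console.log(checkTaskCompatibility('{"id":0,"version":"0.3.1"}'));
```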
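## Update Notes

1. Advanced operations:

- **Custom scripts can be executed** in the task workflow, including **JavaScript commands run in the browser** and **script calls at the operating-system level**. The **command's return value can be captured and recorded**, which greatly expands what a task can do.



- Before and after every operation, you can specify a JavaScript command that runs against the currently located element.

<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>

2. **Judgment conditions and loop conditions** can now also **execute custom scripts**. Whether the script's return value is true decides the outcome of the condition or loop check, which likewise makes tasks far more flexible. Loops can now be exited with a break from code, and custom operations can manipulate elements inside the loop.



<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>

3. Multiple XPath expressions can be generated at once for the user to choose from, and the **XPath Helper extension is pre-installed** for debugging XPath.

4. Added extraction of an element's background image URL, the current page title, and the current page URL.

5. Added saving of element screenshots. Use this to capture a single element or the entire web page (works better together with headless mode).

6. Added image downloading.

7. Added OCR recognition of elements (to use this feature, first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501))

8. The return value of JavaScript code executed on an element can be extracted directly, enabling things such as regular expression matching and reading an element's background color.

9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.

<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>



10. Greatly expanded the usage hints and explanations to make the software easier to use (for example, how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items).

11. When running a task, a hint now explains how to execute tasks from the command line: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).



12. Added a parallel multi-instance mode.

13. Added headless mode, that is, a configuration for running without a browser interface.

14. Fixed Chinese paths not being recognized correctly in user-configured browser mode.

15. Fixed a freeze when a conditional branch had no unconditional branch.

16. Fixed the input box freezing after saving a task.

17. The "Open Page" and "Click Element" operations now let you set the maximum page-load waiting time.

18. Added the ability to move the mouse to an element.

19. A prompt is shown when an element cannot be found.

20. Fixed the webpage scrolling bug.

21. Added an operation for adding new data-extraction fields.

22. The task name is initialized to the title of the page on the first visit.

23. Added version update prompts.

24. Added publisher information as requested.

25. Updated Chrome to version 113.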
File diff suppressed because one or more lines are too long
@ -1 +0,0 @@
{"id":1,"name":"知乎_登录后采集","url":"https://www.zhihu.com","links":"https://www.zhihu.com","containJudge":false,"desc":"https://www.zhihu.com\n使用带用户配置的浏览器模式来先手工登录后保存信息,再接着执行。","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"string","exampleValue":"历史上有哪些通过“正当手段”干出不正当事的人物?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","scrollType":0,"scrollCount":0}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"scrollType":0,"scrollCount":0,"loopType":2,"pathList":"//*[contains(@class, \"css-0\")]/div[2]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[3]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[4]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[5]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[6]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[7]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[8]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[9]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[10]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[11]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, 
\"css-0\")]/div[12]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[13]/div[1]/div[1]/div[1]/h2[1]/div[1]","textList":"","exitCount":0,"historyWait":2}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"paras":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","relativeXPath":"","exampleValues":[{"num":0,"value":"历史上有哪些通过“正当手段”干出不正当事的人物?"},{"num":1,"value":"新加坡有哪些不好的地方?"},{"num":2,"value":"孙悟空可以秒杀山村老尸那样的厉鬼吗?"},{"num":3,"value":"为什么渐渐厌倦玩《原神》了?"},{"num":4,"value":"历史上有哪些著名的考古乌龙事件?"},{"num":5,"value":"苹果公司为什么能把用户调教得这么好?"},{"num":6,"value":"哪个瞬间让你发现了世界的bug?"},{"num":7,"value":"假如中国的院士,想为亲属谋体制内的工作,难度大吗?为什么?"},{"num":8,"value":"你一直珍藏的视频是哪个?"},{"num":9,"value":"如何评价《原神》角色艾莉丝?"},{"num":10,"value":"索罗斯如何做空的英镑、泰铢?为什么做空香港失败了?"},{"num":11,"value":"如何在婚前认清并杜绝王力宏这种男人?"}],"default":""}],"loopType":2}}]}
@ -1 +0,0 @@
{"id":2,"name":"知乎_登录后采集","url":"https://www.zhihu.com","links":"https://www.zhihu.com","containJudge":false,"desc":"https://www.zhihu.com\n使用带用户配置的浏览器模式来先手工登录后保存信息,再接着执行。","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"string","exampleValue":"历史上有哪些通过“正当手段”干出不正当事的人物?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","scrollType":0,"scrollCount":0}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"scrollType":0,"scrollCount":0,"loopType":2,"pathList":"//*[contains(@class, \"css-0\")]/div[2]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[3]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[4]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[5]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[6]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[7]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[8]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[9]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[10]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[11]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, 
\"css-0\")]/div[12]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[13]/div[1]/div[1]/div[1]/h2[1]/div[1]","textList":"","exitCount":0,"historyWait":2}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"paras":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","relativeXPath":"","exampleValues":[{"num":0,"value":"历史上有哪些通过“正当手段”干出不正当事的人物?"},{"num":1,"value":"新加坡有哪些不好的地方?"},{"num":2,"value":"孙悟空可以秒杀山村老尸那样的厉鬼吗?"},{"num":3,"value":"为什么渐渐厌倦玩《原神》了?"},{"num":4,"value":"历史上有哪些著名的考古乌龙事件?"},{"num":5,"value":"苹果公司为什么能把用户调教得这么好?"},{"num":6,"value":"哪个瞬间让你发现了世界的bug?"},{"num":7,"value":"假如中国的院士,想为亲属谋体制内的工作,难度大吗?为什么?"},{"num":8,"value":"你一直珍藏的视频是哪个?"},{"num":9,"value":"如何评价《原神》角色艾莉丝?"},{"num":10,"value":"索罗斯如何做空的英镑、泰铢?为什么做空香港失败了?"},{"num":11,"value":"如何在婚前认清并杜绝王力宏这种男人?"}],"default":""}],"loopType":2}}]}
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -1,12 +1,17 @@
Open a Linux terminal in this folder and enter the following command to run the software:
./easy-spider.sh
Note: do not close the terminal while the software is running.

To open EasySpider, open your terminal and type:
./easy-spider.sh
EasySpider will then start. Do not close the terminal while EasySpider is running.

Official Site: https://github.com/NaiboWang/EasySpider

Tasks can be imported from other machines: just put the .json files from the other machine's tasks folder into the tasks folder of this directory. Execution instance files can be imported the same way by copying the .json files in the execution_instances folder. Note that the .json files in both folders must be named with a number greater than 0.
You are welcome to recommend this software to your friends.

This version is for Windows 10 x64 and above.

If you see a white screen when opening EasySpider, please wait up to 20 seconds.

Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp

This software is definitely not a trojan or virus! If antivirus software such as Windows Defender mistakenly flags it as a virus, please restore it, or open "EasySpider.bat" to run the software instead.

Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
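The naming rule for imported files can be expressed as a small check. This sketch reads the rule literally (a positive integer followed by `.json`) and additionally rejects leading zeros, which is an assumption; `isImportableTaskFile` is a hypothetical helper, not part of EasySpider.

```javascript
// Hypothetical check: true for names like "1.json" or "82.json",
// false for "0.json", "task.json", or names with leading zeros.
function isImportableTaskFile(fileName) {
  return /^[1-9][0-9]*\.json$/.test(fileName);
}

for (const name of ["82.json", "0.json", "task.json", "7.JSON"]) {
  console.log(name, isImportableTaskFile(name));
}
```

This commit also adds `tasks/0.json`, so the rule as written in the README may not be strictly enforced by the software.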
1
Releases/EasySpider_linux_amd64_Ubuntu/tasks/0.json
Normal file
@ -0,0 +1 @@
{"id":0,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/27/2023, 6:15:47 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_页面标题","desc":"","type":"string","exampleValue":"京东全球版-专业的综合网上购物商城"},{"id":1,"name":"参数2_图片页面网址","desc":"","type":"string","exampleValue":"https://global.jd.com/"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"paras":[{"nodeType":0,"contentType":6,"relative":false,"name":"参数1_页面标题","desc":"","extractType":0,"relativeXPath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]","allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]","//div[contains(., '')]","//DIV[@class='slick-slide slick-active 
slick-current']"],"exampleValues":[{"num":0,"value":"京东全球版-专业的综合网上购物商城"}],"default":"","beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0},{"nodeType":0,"contentType":5,"relative":false,"name":"参数2_图片页面网址","desc":"","extractType":0,"relativeXPath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/div[1]/a[1]/img[1]","allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/div[1]/a[1]/img[1]","//img[contains(., '')]"],"exampleValues":[{"num":0,"value":"https://global.jd.com/"}],"default":"","beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}]}},{"id":3,"index":3,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[4],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":4,"index":4,"parentId":3,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}
1
Releases/EasySpider_linux_amd64_Ubuntu/tasks/1.json
Normal file
@ -0,0 +1 @@
{"id":1,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/27/2023, 6:15:15 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_图片地址","desc":"","type":"string","exampleValue":"//m.360buyimg.com/babel/s1125x600_jfs/t1/156011/19/36990/85599/646c850aF5e22eaa0/87641bfb5cf707ba.jpg!q70.dpg"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div/div[1]/div[1]/a[1]/img[1]","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]","//img[contains(., 
'')]"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"paras":[{"nodeType":4,"contentType":0,"relative":true,"name":"参数1_图片地址","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"//m.360buyimg.com/babel/s1125x600_jfs/t1/156011/19/36990/85599/646c850aF5e22eaa0/87641bfb5cf707ba.jpg!q70.dpg"},{"num":1,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/194401/20/32669/76553/64142a96F7733e6ad/cf2727848c86cf45.jpg!q70.dpg"},{"num":2,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/157323/27/24475/67142/646dee40F69bc6df5/fe4249a7d6dab710.jpg!q70.dpg"},{"num":3,"value":"//m.360buyimg.com/babel/s710x370_jfs/t1/197659/30/31344/62825/640fd751F694963ed/a6e1ac2e5c27f160.jpg!q70.dpg"},{"num":4,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/194401/20/32669/76553/64142a96F7733e6ad/cf2727848c86cf45.jpg!q70.dpg"},{"num":5,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/157323/27/24475/67142/646dee40F69bc6df5/fe4249a7d6dab710.jpg!q70.dpg"},{"num":6,"value":"//m.360buyimg.com/babel/s710x370_jfs/t1/197659/30/31344/62825/640fd751F694963ed/a6e1ac2e5c27f160.jpg!q70.dpg"},{"num":7,"value":"//m.360buyimg.com/babel/s1125x600_jfs/t1/156011/19/36990/85599/646c850aF5e22eaa0/87641bfb5cf707ba.jpg!q70.dpg"},{"num":8,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/194401/20/32669/76553/64142a96F7733e6ad/cf2727848c86cf45.jpg!q70.dpg"}],"default":"","beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":1}],"loopType":1}}]}
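The task files above store the workflow as a flat `graph` array in which each node lists its children in `sequence`. Judging from the sample files, the numbers in `sequence` appear to be positions in the array (the node's `index`), not its `id`; that reading is an inference, not documented behavior. Under that assumption, a workflow outline can be printed with a short depth-first walk. `outlineTask` and the trimmed sample data are illustrative only.

```javascript
// Walk a task's "graph" array depth-first, following each node's "sequence"
// list. Assumption: entries in "sequence" are array positions ("index").
function outlineTask(graph, index = 0, depth = 0, lines = []) {
  const node = graph[index];
  lines.push("  ".repeat(depth) + node.title);
  for (const childIndex of node.sequence) {
    outlineTask(graph, childIndex, depth + 1, lines);
  }
  return lines;
}

// Trimmed-down version of the structure in tasks/0.json (parameters omitted).
const sampleGraph = [
  { index: 0, title: "root", sequence: [1, 2, 3] },
  { index: 1, title: "打开网页", sequence: [] },   // "Open Page"
  { index: 2, title: "提取数据", sequence: [] },   // "Extract Data"
  { index: 3, title: "循环", sequence: [4] },      // "Loop"
  { index: 4, title: "移动到元素", sequence: [] }, // "Move to Element"
];

console.log(outlineTask(sampleGraph).join("\n"));
```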
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -1 +0,0 @@
{"id": 20, "name": "Bilibili\u7c89\u4e1d", "url": "https://space.bilibili.com/291929894/fans/fans", "links": "https://space.bilibili.com/291929894/fans/fans", "containJudge": false, "desc": "https://space.bilibili.com/291929894/fans/fans", "inputParameters": [{"id": 0, "name": "urlList_0", "nodeId": 1, "nodeName": "Open Page", "value": "https://space.bilibili.com/291929894/fans/fans", "desc": "List of URLs to be collected, separated by \\n for multiple lines", "type": "string", "exampleValue": "https://space.bilibili.com/291929894/fans/fans"}, {"id": 1, "name": "loopTimes_Loop_1", "nodeId": 2, "nodeName": "Loop", "desc": "Number of loop executions, 0 means unlimited loops (until element not found)", "type": "int", "exampleValue": 0, "value": 0}], "outputParameters": [{"id": 0, "name": "\u53c2\u65701_\u6587\u672c", "desc": "", "type": "string", "exampleValue": "\u5bf9\u65b9\u7b54\u590d5"}], "graph": [{"index": 0, "id": 0, "parentId": 0, "type": -1, "option": 0, "title": "root", "sequence": [1, 2], "parameters": {"history": 1, "tabIndex": 0, "useLoop": false, "xpath": "", "wait": 0}, "isInLoop": false}, {"id": 1, "index": 1, "parentId": 0, "type": 0, "option": 1, "title": "Open Page", "sequence": [], "isInLoop": false, "position": 0, "parameters": {"useLoop": false, "xpath": "", "wait": 0, "url": "https://space.bilibili.com/291929894/fans/fans", "links": "https://space.bilibili.com/291929894/fans/fans", "scrollType": 0, "scrollCount": 0}}, {"id": 2, "index": 2, "parentId": 0, "type": 1, "option": 8, "title": "Loop", "sequence": [4], "isInLoop": false, "position": 1, "parameters": {"history": 4, "tabIndex": -1, "useLoop": false, "xpath": "//a[contains(text(),\"\u4e0b\u4e00\u9875\")]", "wait": 0, "scrollType": 0, "scrollCount": 0, "loopType": 0, "pathList": "", "textList": "", "exitCount": 0, "historyWait": 2}}, {"id": -1, "index": 3, "parentId": 2, "type": 0, "option": 2, "title": "Click Element", "sequence": [], "isInLoop": true, "position": 1, "parameters": 
{"history": 4, "tabIndex": -1, "useLoop": true, "xpath": "//*[@id=\"page-follows\"]/div[1]/div[2]/div[2]/div[2]/ul[2]/li[7]", "wait": 1, "scrollType": 0, "scrollCount": 0, "paras": [], "loopType": 0}}, {"id": 3, "index": 4, "parentId": 2, "type": 1, "option": 8, "title": "Loop", "sequence": [5], "isInLoop": true, "position": 0, "parameters": {"history": 4, "tabIndex": -1, "useLoop": false, "xpath": "/html/body/div[2]/div[4]/div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/ul[1]/li/div[2]/a[1]/span[1]", "wait": 0, "scrollType": 0, "scrollCount": 0, "loopType": 1, "pathList": "", "textList": "", "exitCount": 0, "historyWait": 2}}, {"id": 4, "index": 5, "parentId": 3, "type": 0, "option": 3, "title": "Extract Data", "sequence": [], "isInLoop": true, "position": 0, "parameters": {"history": 4, "tabIndex": -1, "useLoop": false, "xpath": "", "wait": 0, "paras": [{"nodeType": 0, "contentType": 0, "relative": true, "name": "\u53c2\u65701_\u6587\u672c", "desc": "", "relativeXPath": "", "exampleValues": [{"num": 0, "value": "\u5bf9\u65b9\u7b54\u590d5"}], "default": ""}], "loopType": 1}}]}
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -1 +0,0 @@
{"id":32,"name":"知乎_登录后采集","url":"https://www.zhihu.com","links":"https://www.zhihu.com","containJudge":false,"desc":"https://www.zhihu.com\n使用带用户配置的浏览器模式来先手工登录后保存信息,再接着执行。","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"string","exampleValue":"历史上有哪些通过“正当手段”干出不正当事的人物?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","scrollType":0,"scrollCount":0}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"scrollType":0,"scrollCount":0,"loopType":2,"pathList":"//*[contains(@class, \"css-0\")]/div[2]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[3]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[4]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[5]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[6]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[7]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[8]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[9]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[10]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[11]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, 
\"css-0\")]/div[12]/div[1]/div[1]/div[1]/h2[1]/div[1]\n//*[contains(@class, \"css-0\")]/div[13]/div[1]/div[1]/div[1]/h2[1]/div[1]","textList":"","exitCount":0,"historyWait":2}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","wait":0,"paras":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","relativeXPath":"","exampleValues":[{"num":0,"value":"历史上有哪些通过“正当手段”干出不正当事的人物?"},{"num":1,"value":"新加坡有哪些不好的地方?"},{"num":2,"value":"孙悟空可以秒杀山村老尸那样的厉鬼吗?"},{"num":3,"value":"为什么渐渐厌倦玩《原神》了?"},{"num":4,"value":"历史上有哪些著名的考古乌龙事件?"},{"num":5,"value":"苹果公司为什么能把用户调教得这么好?"},{"num":6,"value":"哪个瞬间让你发现了世界的bug?"},{"num":7,"value":"假如中国的院士,想为亲属谋体制内的工作,难度大吗?为什么?"},{"num":8,"value":"你一直珍藏的视频是哪个?"},{"num":9,"value":"如何评价《原神》角色艾莉丝?"},{"num":10,"value":"索罗斯如何做空的英镑、泰铢?为什么做空香港失败了?"},{"num":11,"value":"如何在婚前认清并杜绝王力宏这种男人?"}],"default":""}],"loopType":2}}]}
@ -1 +0,0 @@
{"id":33,"name":"JD","url":"https://www.jd.com","links":"https://www.jd.com","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"},{"id":1,"name":"inputText_1","nodeName":"输入文字","nodeId":2,"desc":"要输入的文本,如京东搜索框输入:电脑","type":"string","exampleValue":"123","value":"123"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"url":"https://www.jd.com","links":"https://www.jd.com","scrollType":0,"scrollCount":0}},{"id":2,"index":2,"parentId":0,"type":0,"option":4,"title":"输入文字","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[@id=\"key\"]","wait":0,"value":"123"}},{"id":3,"index":3,"parentId":0,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[@id=\"search-btn\"]/i[1]","wait":0,"scrollType":0,"scrollCount":0,"paras":[]}}]}
@ -1 +0,0 @@
{"id":34,"name":"新web采集任务","url":"https://www.jd.com","links":"https://www.jd.com","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"url":"https://www.jd.com","links":"https://www.jd.com","scrollType":0,"scrollCount":0}}]}
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
18
Releases/EasySpider_linux_amd64_Ubuntu/软件使用说明.txt
Normal file
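@ -0,0 +1,18 @@
You are welcome to share this software with anyone who may need it!

Open a Linux terminal in this folder and enter the following command to run the software:
./easy-spider.sh
Note: do not close the terminal while the software is running.

Official site: https://github.com/NaiboWang/EasySpider

Supports Windows 10 x64 and above.

If you see a white screen on startup, wait up to 20 seconds and the interface will appear.

Video tutorial: https://www.bilibili.com/video/BV1Fk4y1L7xX/

This software is definitely not a Trojan or virus! If antivirus software such as Windows Defender mistakenly flags it as a virus, restore it, or open "EasySpider.bat" to run the software instead.

Tasks can be imported from other machines: just put the .json files from the other machine's tasks folder into the tasks folder of this directory. Execution instance files can be imported the same way by copying the .json files in the execution_instances folder. Note that the .json files in both folders must be named with a number greater than 0.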