Update Readmes for pub

naibo 2023-12-27 17:41:28 +08:00
parent cff8ae5b93
commit 66918e347c
9 changed files with 45 additions and 677 deletions


@ -4,82 +4,14 @@ Then EasySpider will be opened, and don't close the terminal when running EasySp
Official Site: https://www.easyspider.net
Feel free to recommend this software to your friends.
Feel free to recommend this software to your friends and star our GitHub repository!
This version is for Ubuntu 20.04, Debian, Deepin x64 and above.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
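The import step described above is a plain file copy. A minimal Python sketch follows, assuming the `tasks` folder layout mentioned in this README. The function names and the example paths are hypothetical, and the rule "named with a number greater than 0" is interpreted here as a file stem made only of ASCII digits with a value above zero.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def is_supported_name(path: Path) -> bool:
    """Return True for .json files whose stem is an integer greater than 0."""
    stem = path.stem
    return (
        path.suffix == ".json"
        and stem.isascii()
        and stem.isdigit()
        and int(stem) > 0
    )


def import_task_files(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy supported .json files from source_dir into target_dir.

    Returns the names of the copied files in sorted order. Files that
    already exist in target_dir are overwritten (an assumption of this
    sketch, not documented behaviour).
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.glob("*.json")):
        if is_supported_name(path):
            shutil.copy2(path, target_dir / path.name)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Hypothetical locations: a copy of another machine's folder and the
    # "tasks" folder next to this README.
    source = Path("other_machine/tasks")
    if source.is_dir():
        print(import_task_files(source, Path("tasks")))
```

The same function can be pointed at the `execution_instances` folders to import execution instance files.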
======Version Update Instruction======
For new features in versions later than v0.3.2, see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. The select child elements operation can now delete fields, and deleted fields are unmarked in the browser in real time.
2. Selecting child elements now offers a selection mode: choose only the child elements present in all blocks, or the child elements that match those of the first selected block.
3. In the text input and open page options, the most recently extracted value of a field can be used as a variable for text input, written as Field["field_name"].
4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening; it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
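Item 3 above describes using an extracted field value as a variable via Field["field_name"]. How EasySpider performs the substitution is not shown in this README; the following is only a sketch of the idea in Python. The function name `fill_fields`, the example URL, and the choice to leave unknown fields untouched are all assumptions.

```python
from __future__ import annotations

import re

# Matches placeholders of the form Field["field_name"].
FIELD_PATTERN = re.compile(r'Field\["([^"\]]+)"\]')


def fill_fields(template: str, extracted: dict[str, str]) -> str:
    """Replace each Field["name"] placeholder with its extracted value.

    Placeholders whose field has no extracted value are left unchanged.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return extracted.get(name, match.group(0))

    return FIELD_PATTERN.sub(substitute, template)


url = fill_fields(
    'https://example.com/search?q=Field["keyword"]',
    {"keyword": "spider"},
)
print(url)  # https://example.com/search?q=spider
```

Using a replacement function rather than a replacement string keeps backslashes in extracted values from being interpreted as regex escapes.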
-----v0.3.1-----
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in judgment conditions and loop conditions: whether the script's return value is true decides the outcome of the check, which greatly increases task flexibility. Loops can now be exited from code with a break setting, and custom operations can manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly, including instructions on handling iframe tags, explanations of the parameters of each option, and guidance on modifying the XPath of loop items.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added publisher information, as requested.
23. Updated Chrome version to 113.


@ -1,4 +1,4 @@
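Feel free to recommend this software to more friends who need it!
Feel free to recommend this software to more friends who need it and star our GitHub repository!
Open a Linux terminal in this folder and enter the following command to run the software: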
./easy-spider.sh
@ -8,99 +8,10 @@
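Supports Ubuntu 20.04, Debian, Deepin x64 and above.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
Tasks can be imported from other machines: simply place the .json files from the other machine's tasks folder into the tasks folder of this directory. Likewise, execution instance files can be imported by copying the .json files from the execution_instances folder. Note that in both folders only .json files named with a number greater than 0 are supported.
======Version Update Notes======
For update notes on versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases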
-----v0.3.2-----
## Update Notes
1. The select child elements operation can delete fields, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
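2. Selecting child elements now has a selection mode: you can select only the child elements present in all blocks, or the child elements in all blocks that match those of the first selected block.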
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
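3. In the text input and open page options, the most recently extracted value of a field can be used **as a variable** for text input, written as `Field["field_name"]`.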
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
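4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where a blank screen could appear for about 10 seconds after opening, so the software can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Updated the extraction logic to save locally once every 10 records collected.
9. When modifying a task, the default anchor position is after the last operation in the task flow.
10. Updated Chrome to version 114.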
-----V0.3.1-----
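If downloads are slow, consider the download links for mainland China: [Download links for mainland China](https://github.com/NaiboWang/EasySpider/releases/download/v0.3.0/Download_Link_Address_in_China_Mainland.txt).
### We strongly recommend watching the new feature walkthrough videos
Videos covering the latest version's features have been uploaded to Bilibili. The new videos are very useful and we recommend watching them.
[(Important) Custom condition checks using the return value of JS commands inside loop items - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to take screenshots of elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder for v0.3.1 are not compatible with any earlier version; please redesign your tasks for v0.3.1.
## Update Notes
1. Custom operations:
- **Custom scripts can be executed** in the task flow, including **executing JavaScript commands** in the browser and **invoking scripts at the operating system level**; the **command's return value can be obtained and recorded**, greatly expanding what can be done.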
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a piece of JavaScript to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
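2. **Judgment conditions and loop conditions** now also support **executing custom scripts**: whether the script's return value is true decides the condition check or the loop check, which again greatly increases task flexibility. Loops gain a setting to break via code, and custom operations can operate on elements inside the loop.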
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
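3. Multiple XPath expressions can be generated at once for the user to choose from, and the **XPath Helper extension is pre-installed** for debugging XPath.
4. Added extraction of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to capture an element or the entire web page (works better together with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. To use this feature, first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular expression matching and getting an element's background color.
9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.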
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
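10. Greatly expanded usage hints and explanations to make the software easier to use, such as how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items.
11. When executing a task, a hint on how to run tasks from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).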
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
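12. Added a parallel multi-instance mode.
13. Added headless mode, i.e. a configuration for running without a browser interface.
14. Fixed Chinese paths not being recognized correctly in user-configured browser mode.
15. Fixed a freeze when a conditional branch had no unconditional branch.
16. Fixed the input box freezing after saving a task.
17. The open page and click element operations gain a setting for the maximum page load wait time.
18. Added moving the mouse to an element.
19. A prompt is shown when an element cannot be found.
20. Fixed the web page scrolling bug.
21. Added an operation for adding a new extracted data field.
22. The task name is initialized to the title of the first page visited.
23. Added version update prompts.
24. Added publisher information, as requested.
25. Updated Chrome to version 113.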


@ -1,87 +1,17 @@
Official Site: https://www.easyspider.net
Feel free to recommend this software to your friends.
Feel free to recommend this software to your friends and star our GitHub repository!
This version is for macOS and runs on all chips, including Intel (such as Core i7) and Arm (such as M1). It supports macOS 11.x and above.
If your macOS version is 10.x or below, please download EasySpider v0.2.0.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
You can import tasks from another machine by placing the .json files from that machine's tasks folder into the /Users/your user name/Library/Application Support/EasySpider/tasks folder on this machine. Similarly, execution instance files can be imported by copying the .json files from the execution_instances folder. Please note that in both folders only .json files named with a number greater than 0 are supported.
If you need to press p on the keyboard to pause and resume the execution of a task, you need to grant the program keyboard monitoring permission.
======Version Update Instruction======
For new features in versions later than v0.3.2, see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. The select child elements operation can now delete fields, and deleted fields are unmarked in the browser in real time.
2. Selecting child elements now offers a selection mode: choose only the child elements present in all blocks, or the child elements that match those of the first selected block.
3. In the text input and open page options, the most recently extracted value of a field can be used as a variable for text input, written as Field["field_name"].
4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening; it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
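Item 8 above says collected records are saved locally every 10 records. The README does not say how EasySpider implements this or which output format it uses; the following Python sketch only shows the general batching pattern. The CSV output, the function name `collect_and_save`, and the final flush of a partial batch are assumptions.

```python
import csv
from pathlib import Path

BATCH_SIZE = 10  # the README states a local save every 10 records


def collect_and_save(records, output_path: Path, batch_size: int = BATCH_SIZE) -> int:
    """Append records to a CSV file in batches and return the flush count."""
    buffer = []
    flushes = 0

    def flush() -> None:
        nonlocal flushes
        # Append mode, so an interrupted run keeps what was already saved.
        with output_path.open("a", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(buffer)
        buffer.clear()
        flushes += 1

    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            flush()
    if buffer:  # write the final, partial batch
        flush()
    return flushes
```

With 25 records and the default batch size, the file is written three times: two full batches of 10 and one final batch of 5.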
-----V0.3.1-----
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in judgment conditions and loop conditions: whether the script's return value is true decides the outcome of the check, which greatly increases task flexibility. Loops can now be exited from code with a break setting, and custom operations can manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly, including instructions on handling iframe tags, explanations of the parameters of each option, and guidance on modifying the XPath of loop items.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added publisher information, as requested.
23. Updated Chrome version to 113.
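Item 1 of the v0.3.1 list mentions invoking scripts at the operating system level and recording the command's return value. How EasySpider does this internally is not shown in this README. The sketch below illustrates the general idea with Python's standard `subprocess` module; the function name `run_system_command` and the 30-second timeout are assumptions for illustration.

```python
from __future__ import annotations

import subprocess
import sys


def run_system_command(command: list[str], timeout: float = 30.0) -> dict:
    """Run a command and return its exit code and captured output."""
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # report non-zero exit codes instead of raising
    )
    return {
        "exit_code": completed.returncode,
        "output": completed.stdout.strip(),
        "error": completed.stderr.strip(),
    }


# The current Python interpreter is used as the command so that the
# example does not depend on any particular shell being installed.
result = run_system_command([sys.executable, "-c", "print('hello from a script')"])
print(result["output"])  # hello from a script
```

Passing the command as a list avoids shell quoting issues; the captured `output` string is what a workflow would store as the recorded value.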


@ -1,4 +1,4 @@
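Feel free to recommend this software to more friends who need it!
Feel free to recommend this software to more friends who need it and star our GitHub repository!
Official Site: https://www.easyspider.cn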
@ -6,104 +6,12 @@
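For macOS 10.x, please download and use v0.2.0.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
Tasks can be imported from other machines: simply place the .json files from the other machine's tasks folder into the /Users/your user name/Library/Application Support/EasySpider/tasks folder. Likewise, execution instance files can be imported by copying the .json files from the execution_instances folder. Note that in both folders only .json files named with a number greater than 0 are supported.
If you need to press the p key to pause and resume task execution, you need to grant the program keyboard monitoring permission.
======Version Update Notes======
For update notes on versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases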
-----v0.3.2-----
## Update Notes
1. The select child elements operation can delete fields, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
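2. Selecting child elements now has a selection mode: you can select only the child elements present in all blocks, or the child elements in all blocks that match those of the first selected block.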
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
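3. In the text input and open page options, the most recently extracted value of a field can be used **as a variable** for text input, written as `Field["field_name"]`.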
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
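4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where a blank screen could appear for about 10 seconds after opening, so the software can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Updated the extraction logic to save locally once every 10 records collected.
9. When modifying a task, the default anchor position is after the last operation in the task flow.
10. Updated Chrome to version 114.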
------V0.3.1------
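If downloads are slow, consider the download links for mainland China: [Download links for mainland China](https://github.com/NaiboWang/EasySpider/releases/download/v0.3.0/Download_Link_Address_in_China_Mainland.txt).
### We strongly recommend watching the new feature walkthrough videos
Videos covering the latest version's features have been uploaded to Bilibili. The new videos are very useful and we recommend watching them.
[(Important) Custom condition checks using the return value of JS commands inside loop items - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to take screenshots of elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder for v0.3.1 are not compatible with any earlier version; please redesign your tasks for v0.3.1.
## Update Notes
1. Custom operations:
- **Custom scripts can be executed** in the task flow, including **executing JavaScript commands** in the browser and **invoking scripts at the operating system level**; the **command's return value can be obtained and recorded**, greatly expanding what can be done.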
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a piece of JavaScript to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
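2. **Judgment conditions and loop conditions** now also support **executing custom scripts**: whether the script's return value is true decides the condition check or the loop check, which again greatly increases task flexibility. Loops gain a setting to break via code, and custom operations can operate on elements inside the loop.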
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
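3. Multiple XPath expressions can be generated at once for the user to choose from, and the **XPath Helper extension is pre-installed** for debugging XPath.
4. Added extraction of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to capture an element or the entire web page (works better together with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. To use this feature, first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular expression matching and getting an element's background color.
9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.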
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
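10. Greatly expanded usage hints and explanations to make the software easier to use, such as how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items.
11. When executing a task, a hint on how to run tasks from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).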
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
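12. Added a parallel multi-instance mode.
13. Added headless mode, i.e. a configuration for running without a browser interface.
14. Fixed Chinese paths not being recognized correctly in user-configured browser mode.
15. Fixed a freeze when a conditional branch had no unconditional branch.
16. Fixed the input box freezing after saving a task.
17. The open page and click element operations gain a setting for the maximum page load wait time.
18. Added moving the mouse to an element.
19. A prompt is shown when an element cannot be found.
20. Fixed the web page scrolling bug.
21. Added an operation for adding a new extracted data field.
22. The task name is initialized to the title of the first page visited.
23. Added version update prompts.
24. Added publisher information, as requested.
25. Updated Chrome to version 113.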


@ -1,86 +1,15 @@
Official Site: https://www.easyspider.net
Feel free to recommend this software to your friends.
Feel free to recommend this software to your friends and star our GitHub repository!
This version is for Windows 7 and above, including both 32-bit and 64-bit versions. Please note that the Chrome browser in this version will always remain at version 109 and will not update with Chrome updates (for compatibility with Windows 7). Therefore, if you want to use the latest version of the Chrome browser for data scraping, please run the x64 version of EasySpider on Windows 10 x64 or higher systems. No release supports Windows Server 2012 and below; on those systems you need to compile EasySpider yourself.
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
This software is not a trojan or virus. If antivirus software such as Windows Defender mistakenly flags it as a virus, please restore it, or open "EasySpider.bat" to run the software instead.
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
======Version New Features======
For new features in versions later than v0.3.2, see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. The select child elements operation can now delete fields, and deleted fields are unmarked in the browser in real time.
2. Selecting child elements now offers a selection mode: choose only the child elements present in all blocks, or the child elements that match those of the first selected block.
3. In the text input and open page options, the most recently extracted value of a field can be used as a variable for text input, written as Field["field_name"].
4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening; it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
-----v0.3.1-----
## Update Instruction
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in judgment conditions and loop conditions: whether the script's return value is true decides the outcome of the check, which greatly increases task flexibility. Loops can now be exited from code with a break setting, and custom operations can manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly, including instructions on handling iframe tags, explanations of the parameters of each option, and guidance on modifying the XPath of loop items.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added publisher information, as requested.
23. Updated Chrome version to 113.
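Item 2 of the v0.3.1 list describes loops whose continuation depends on the return value of a custom script, with an option to break out of the loop. The sketch below shows that control flow in Python. It is not EasySpider's implementation: `run_loop`, `should_break`, and `operate` are hypothetical names, and plain Python callables stand in for the custom scripts.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_loop(
    items: Iterable[str],
    should_break: Callable[[str], bool],
    operate: Callable[[str], str],
) -> list[str]:
    """Apply operate to each item until should_break returns True."""
    results = []
    for item in items:
        # The custom condition is evaluated against the current loop item.
        if should_break(item):
            break
        results.append(operate(item))
    return results


rows = ["page 1", "page 2", "no more results", "page 3"]
collected = run_loop(rows, lambda text: "no more" in text, str.upper)
print(collected)  # ['PAGE 1', 'PAGE 2']
```

The loop stops at the first item for which the condition is true, so "page 3" is never processed.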
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.


@ -1,102 +1,15 @@
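Feel free to recommend this software to more friends who need it!
Feel free to recommend this software to more friends who need it and star our GitHub repository!
Official Site: https://www.easyspider.cn
Supports Windows 7 and above, including 32-bit and 64-bit systems. Note that the Chrome browser in this version always stays at version 109 and does not update along with Chrome (for compatibility with Windows 7). Therefore, if you want to collect data with the latest Chrome browser, please run the x64 version of EasySpider on Windows 10 x64 or higher. No release supports Windows Server 2012 and below; on those systems you need to compile and run it yourself.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
This software is definitely not a trojan or virus. If antivirus software such as Windows Defender mistakes it for a virus, please restore it, or open "EasySpider.bat" to run our software.
Tasks can be imported from other machines: simply place the .json files from the other machine's tasks folder into the tasks folder of this directory. Likewise, execution instance files can be imported by copying the .json files from the execution_instances folder. Note that in both folders only .json files named with a number greater than 0 are supported.
======Version Update Notes======
For update notes on versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases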
-----v0.3.2-----
## Update Notes
1. The select child elements operation can delete fields, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
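2. Selecting child elements now has a selection mode: you can select only the child elements present in all blocks, or the child elements in all blocks that match those of the first selected block.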
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
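3. In the text input and open page options, the most recently extracted value of a field can be used **as a variable** for text input, written as `Field["field_name"]`.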
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
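4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where a blank screen could appear for about 10 seconds after opening, so the software can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Updated the extraction logic to save locally once every 10 records collected.
9. When modifying a task, the default anchor position is after the last operation in the task flow.
10. Updated Chrome to version 114.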
-----v0.3.1-----
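### We strongly recommend watching the new feature walkthrough videos
Videos covering the latest version's features have been uploaded to Bilibili. The new videos are very useful and we recommend watching them.
[(Important) Custom condition checks using the return value of JS commands inside loop items - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to take screenshots of elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder for v0.3.1 are not compatible with any earlier version; please redesign your tasks for v0.3.1.
## Update Notes
1. Custom operations:
- **Custom scripts can be executed** in the task flow, including **executing JavaScript commands** in the browser and **invoking scripts at the operating system level**; the **command's return value can be obtained and recorded**, greatly expanding what can be done.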
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a piece of JavaScript to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
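2. **Judgment conditions and loop conditions** now also support **executing custom scripts**: whether the script's return value is true decides the condition check or the loop check, which again greatly increases task flexibility. Loops gain a setting to break via code, and custom operations can operate on elements inside the loop.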
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
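3. Multiple XPath expressions can be generated at once for the user to choose from, and the **XPath Helper extension is pre-installed** for debugging XPath.
4. Added extraction of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to capture an element or the entire web page (works better together with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. To use this feature, first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular expression matching and getting an element's background color.
9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.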
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
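10. Greatly expanded usage hints and explanations to make the software easier to use, such as how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items.
11. When executing a task, a hint on how to run tasks from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).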
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
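12. Added a parallel multi-instance mode.
13. Added headless mode, i.e. a configuration for running without a browser interface.
14. Fixed Chinese paths not being recognized correctly in user-configured browser mode.
15. Fixed a freeze when a conditional branch had no unconditional branch.
16. Fixed the input box freezing after saving a task.
17. The open page and click element operations gain a setting for the maximum page load wait time.
18. Added moving the mouse to an element.
19. A prompt is shown when an element cannot be found.
20. Fixed the web page scrolling bug.
21. Added an operation for adding a new extracted data field.
22. The task name is initialized to the title of the first page visited.
23. Added version update prompts.
24. Added publisher information, as requested.
25. Updated Chrome to version 113.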


@ -1,88 +1,17 @@
Official Site: https://www.easyspider.net
Feel free to recommend this software to your friends.
Feel free to recommend this software to your friends and star our GitHub repository!
This version is for Windows 10/Windows Server 2016 x64 and above.
If you want to use EasySpider on Windows 7, please download the Windows x32 version of EasySpider. No release supports Windows Server 2012 and below; on those systems you need to compile EasySpider yourself.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
This software is not a trojan or virus. If antivirus software such as Windows Defender mistakenly flags it as a virus, please restore it, or open "EasySpider.bat" to run the software instead.
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
======Version Update Instructions======
For new features in versions later than v0.3.2, see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. The select child elements operation can now delete fields, and deleted fields are unmarked in the browser in real time.
2. Selecting child elements now offers a selection mode: choose only the child elements present in all blocks, or the child elements that match those of the first selected block.
3. In the text input and open page options, the most recently extracted value of a field can be used as a variable for text input, written as Field["field_name"].
4. Files, such as PDFs, can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening; it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
-----v0.3.1-----
## Update Instruction
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in judgment conditions and loop conditions: whether the script's return value is true decides the outcome of the check, which greatly increases task flexibility. Loops can now be exited from code with a break setting, and custom operations can manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, the Tesseract library must be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular-expression matching and reading an element's background color.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly, including instructions on handling iframe tags, the meaning of each option's parameters, and how to modify the XPath of loop items.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed an issue where paths containing Chinese characters were not recognized correctly in user-configured browser mode.
14. Fixed an issue where the program froze when a conditional had no unconditional branch.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added publisher information, as requested.
23. Updated Chrome version to 113.
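
Items 1 and 2 describe running an operating-system-level script, recording its return value, and using that value as a condition. The Python sketch below shows one way such a flow could look. It is an assumption-laden illustration, not EasySpider's code: the function names are hypothetical, and the rule for what counts as "true" (anything other than empty output, "0", or "false") is invented for this example.

```python
import subprocess
import sys

def run_system_script(args):
    """Run an OS-level command and return its trimmed standard output."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()

def is_truthy(value):
    # Assumption: empty output, "0", and "false" count as false; anything else is true.
    return value.strip().lower() not in ("", "0", "false")

# Hypothetical stand-in for a user script: a Python one-liner that prints a value.
output = run_system_script([sys.executable, "-c", "print(3 > 2)"])

if is_truthy(output):
    print("condition met, recorded value:", output)
else:
    print("condition not met, recorded value:", output)
```

Keeping the command runner separate from the truthiness check means the recorded value and the branching decision can be inspected independently.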

View File

@ -1,6 +1,10 @@
If opening the software fails with an error such as DiscardVirtualMemory...KERNEL32.dll, the cause is as follows:
The 64-bit version of EasySpider supports only Windows 10/Windows Server 2016 x64 and above.
For any edition of Windows 7 (both x64 and x32) and for Windows 10 x32, please download the 32-bit Windows version of EasySpider. No release supports Windows Server 2012 or earlier; on those systems you need to compile and run the software yourself.

View File

@ -1,4 +1,4 @@
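You are welcome to recommend this software to more friends who need it!
You are welcome to recommend this software to more friends who need it and to star our GitHub repository!
Official Site: https://www.easyspider.cn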
@ -6,100 +6,12 @@
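For any edition of Windows 7 (both x64 and x32) and for Windows 10 x32, please download the 32-bit Windows version. No release supports Windows Server 2012 or earlier; on those systems you need to compile and run the software yourself.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
This software is definitely not a Trojan or a virus. If antivirus software such as Windows Defender mistakenly flags it as a virus, please restore it, or open "EasySpider.bat" to run the software.
Tasks can be imported from other machines by placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that in both folders only .json files named with a number greater than 0 are supported.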
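
The naming rule for imported files (a .json file whose name is a number greater than 0) can be checked before copying files over. The Python sketch below is a hypothetical helper, not part of EasySpider; it assumes names are plain decimal integers such as "1.json" or "27.json".

```python
from pathlib import Path

def is_importable(path):
    """Return True if the file is a .json file whose name is an integer greater than 0."""
    p = Path(path)
    stem = p.stem
    # Assumption: valid names are plain ASCII decimal integers, e.g. "1.json".
    return p.suffix == ".json" and stem.isascii() and stem.isdigit() and int(stem) > 0

candidates = ["1.json", "27.json", "0.json", "backup.json", "3.txt"]
importable = [name for name in candidates if is_importable(name)]
print(importable)  # ['1.json', '27.json']
```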
======Version Update Notes======
For update notes on versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
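## Update Notes
1. In the "select child elements" operation, fields can be deleted, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
2. The "select child elements" operation adds a selection mode: you can select only the child elements that are present in all blocks, or only the child elements that match those of the first selected block.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
3. In the text input and "Open Page" options, the most recently extracted value of a field can be used **as a variable** for text input, written as `Field["field_name"]`.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
4. Files such as PDFs can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening; it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could sometimes fail to be extracted.
7. Fixed a bug where OCR recognition could fail to extract content.
8. Extraction now saves data locally after every 10 records collected.
9. When modifying a task, the default anchor position is now after the last operation in the task flow.
10. Updated Chrome to version 114.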
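
The "save locally after every 10 records" behavior from item 8 is a form of batched writing. The Python sketch below illustrates the general pattern only. The class name `BatchedWriter`, the CSV output format, and the in-memory stream standing in for a local file are all assumptions, not details of how EasySpider stores data.

```python
import csv
import io

class BatchedWriter:
    """Buffer records and write them out once `batch_size` have accumulated."""

    def __init__(self, stream, batch_size=10):
        self.writer = csv.writer(stream)
        self.batch_size = batch_size
        self.buffer = []
        self.flush_count = 0

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write any buffered records and count the write.
        if self.buffer:
            self.writer.writerows(self.buffer)
            self.buffer.clear()
            self.flush_count += 1

out = io.StringIO()  # stands in for a local output file
writer = BatchedWriter(out)
for i in range(25):
    writer.add([i, f"item-{i}"])
writer.flush()  # write the final partial batch of 5 records
print(writer.flush_count)  # 3
```

Writing in batches bounds how much data is lost if a run is interrupted: at most one incomplete batch.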
-----v0.3.1-----
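### We strongly recommend watching the new-feature walkthrough videos
Videos covering the latest version's features have been uploaded to Bilibili. The new videos are very useful and are recommended viewing.
[Important: custom condition checks using the return value of a JS command inside a loop item - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to execute your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to take screenshots of elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder of v0.3.1 are not compatible with any previous version. Please redesign your tasks for v0.3.1.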
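## Update Notes
1. Custom operations:
- **Custom scripts can be executed** in the task flow, including **executing JavaScript commands** in the browser and **invoking scripts at the operating-system level**; **the command's return value can be obtained and recorded**, which greatly expands what a task can do.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a JavaScript command to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
2. **Conditions and loop conditions** now also support **executing custom scripts**: whether the custom script's return value is true decides the outcome of the condition or loop check, which likewise makes tasks much more flexible. Loops can now be exited from code with a break operation, and custom operations can be set to act on elements inside the loop.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
3. Multiple XPath expressions can be generated at once for the user to choose from, and the **XPath Helper extension is pre-installed** for debugging XPath.
4. Added extraction of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to capture a single element or the entire web page (works better combined with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. To use this feature, you must first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular-expression matching and reading an element's background color.
9. Added switching of dropdown options, and extraction of the currently selected option's value and text.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
10. Greatly expanded the usage hints and explanations to make the software easier to use, such as how to handle iframe tags, what the parameters of each option mean, and how to modify the XPath of loop items.
11. When executing a task, a hint on how to run tasks from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
12. Added a parallel multi-instance mode.
13. Added headless mode, a configuration for running without a browser interface.
14. Fixed an issue where paths containing Chinese characters were not recognized correctly in user-configured browser mode.
15. Fixed an issue where the program froze when a conditional had no unconditional branch.
16. Fixed an issue where the input box froze after saving a task.
17. The "Open Page" and "Click element" operations can now set a maximum wait time for page load.
18. Added moving the mouse to an element.
19. A prompt is shown when an element cannot be found.
20. Fixed a web page scrolling bug.
21. Added an operation for adding new data-extraction fields.
22. The task name is initialized to the title of the first page visited.
23. Added version update prompts.
24. Added publisher information, as requested.
25. Updated Chrome to version 113.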