mirror of
https://github.com/NaiboWang/EasySpider.git
synced 2025-04-19 18:59:52 +08:00
To open the EasySpider, please open your terminal, and then type: ./easy-spider.sh Then EasySpider will be opened, and don't close the terminal when running EasySpider. Official Site: https://github.com/NaiboWang/EasySpider Welcome to promote this software to other friends. This version is for Ubuntu 20.04, Debian, Deepin x64 and above. Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders. ======Version Update Instruction====== -----v0.3.1----- 1. Advanced Operations: - Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations. - Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element. 2. Custom scripts are also supported in the conditions and loop conditions. The return value of the custom script determines the condition for the judgment of conditions and loops, greatly enhancing the flexibility of tasks. The ability to use the break statement within a loop is added, allowing custom operations to manipulate elements within the loop. 3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging. 4. Added the functionality to extract the background image URL of elements, current page title, and current page URL. 5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode. 6. Added the functionality to download images. 7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html 8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements. 9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options. 10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more. 11. Added instructions on how to execute tasks from the command line. 12. Added headless mode configuration, allowing the software to run without a browser interface. 13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes. 14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching. 15. Fixed the issue where the input box would freeze after saving a task. 16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations. 17. Added the functionality to move the mouse to an element. 18. Displays a prompt when an element cannot be found. 19. Fixed the webpage scrolling bug. 20. The task name is initialized with the value of the page title upon the first visit. 21. Added version update prompts. 22. Added the information of the publisher as requested. 23. Updated Chrome version to 113.