Compare commits

...

191 Commits

Author SHA1 Message Date
Naibo Wang
fc5aa8368b Update Readme 2025-04-16 13:27:41 +08:00
Naibo Wang
793f028a00 Update Readme 2025-04-16 13:15:58 +08:00
Naibo Wang
ae22977143 Update Readme 2025-04-16 13:12:57 +08:00
Naibo Wang
541b3c13d2
Update Readme.md 2025-04-16 13:09:32 +08:00
Naibo Wang
a6192b730c Update Readme 2025-03-25 16:37:51 +08:00
Naibo_Mac_M2
d39218f5fd Add IPWO 2025-03-18 17:33:24 +08:00
Naibo_Mac_M2
a94c45b36d Add IPWO 2025-03-18 17:32:08 +08:00
Naibo_Mac_M2
0e8aba6b51 Add IPWO 2025-03-18 17:03:32 +08:00
Naibo Wang
e42ad07d80 Update Readme 2025-03-05 11:26:10 +08:00
Naibo Wang
2f6344d00b Update Readme 2025-03-05 11:10:57 +08:00
Naibo Wang
bfa6c0de76 Update Readme 2025-03-05 11:10:15 +08:00
Naibo Wang
b590cc22c5 Change License 2025-02-17 21:12:19 +08:00
Naibo Wang
d69adacbd1 Change License 2025-02-17 21:11:59 +08:00
Naibo Wang
15654da7eb Change License 2025-02-17 21:11:08 +08:00
Naibo Wang
967f5b8033 Change License 2025-02-17 21:10:38 +08:00
Naibo Wang
aa419ee845 Update Readme 2025-02-11 17:04:18 +08:00
Naibo Wang
f005e48700 Update Readme 2025-02-11 17:02:58 +08:00
Naibo Wang
4e96ed7d50 Merge branch 'master' of https://github.com/NaiboWang/EasySpider 2025-02-02 11:34:54 +08:00
Naibo Wang
e3fecc8926 Update Readme 2025-02-02 11:33:28 +08:00
naibo
119cb99711 Screenshots zoom to the maximum size under headless mode 2025-01-08 12:02:36 +08:00
naibo
f43bdd236d Screenshot folder 2025-01-08 11:44:02 +08:00
naibo
56f0847500 Parameter name change for loopExecute 2025-01-07 23:12:26 +08:00
naibo
0df6cebd18 Update ISSUE_TEMPLATE.md 2025-01-06 16:57:49 +08:00
naibo
4b42f6300c Update ISSUE_TEMPLATE.md 2025-01-06 15:15:28 +08:00
naibo
2cf33794f1 Fix bug: when cannot find elements, switch back to the original handle instead of one of the first two handles 2025-01-06 13:26:59 +08:00
naibo
9efd3b6efe Merge branch 'master' of https://github.com/NaiboWang/EasySpider 2025-01-06 01:31:01 +08:00
naibo
ad956be10d Fix bug for the URL shown in the task list 2025-01-06 01:30:48 +08:00
naibo
01de17d471 Update Readme 2025-01-05 03:55:33 +08:00
naibo
333dcd3ff4 New way to open MacOS program 2025-01-03 01:59:18 +08:00
Naibo_Mac_M2
555f02815c Add first_time_run script for MacOS 2025-01-02 16:01:17 +08:00
Naibo_Mac_M2
34ed41110a New script for copying all code files to the Code folder 2025-01-02 14:50:14 +08:00
Naibo_Mac_M2
32459b622d New script for copying all code files to the Code folder 2025-01-02 14:49:56 +08:00
naibo
02cd8599b0 Optimize reading 2024-12-31 03:16:34 +08:00
naibo
2feede55db New way to show/hide toolkits 2024-12-31 02:52:48 +08:00
Naibo Wang
33dda444d7 Specified User Folder 2024-12-31 01:54:51 +08:00
Naibo Wang
d7ccb22d01 Specified User Folder 2024-12-31 01:31:40 +08:00
naibo
f7a842eed6 Version 0.6.3 2024-12-31 00:14:32 +08:00
Naibo Wang
ea6fb049f5
Merge pull request #647 from NaiboWang/dependabot/npm_and_yarn/Extension/manifest_v3/nanoid-3.3.8
Bump nanoid from 3.3.7 to 3.3.8 in /Extension/manifest_v3
2024-12-31 00:02:21 +08:00
Naibo Wang
5216ffba82
Merge pull request #648 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/multi-6bc014718a
Bump path-to-regexp and express in /ElectronJS
2024-12-31 00:02:10 +08:00
dependabot[bot]
4f0851e361
Bump path-to-regexp and express in /ElectronJS
Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) to 0.1.12 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `path-to-regexp` from 0.1.10 to 0.1.12
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v0.1.10...v0.1.12)

Updates `express` from 4.21.0 to 4.21.2
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.2/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.21.0...4.21.2)

---
updated-dependencies:
- dependency-name: path-to-regexp
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-30 16:00:41 +00:00
dependabot[bot]
7bb9d5a374
Bump nanoid from 3.3.7 to 3.3.8 in /Extension/manifest_v3
Bumps [nanoid](https://github.com/ai/nanoid) from 3.3.7 to 3.3.8.
- [Release notes](https://github.com/ai/nanoid/releases)
- [Changelog](https://github.com/ai/nanoid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ai/nanoid/compare/3.3.7...3.3.8)

---
updated-dependencies:
- dependency-name: nanoid
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-30 16:00:38 +00:00
naibo
c56e87120d Version 0.6.3 2024-12-30 23:59:33 +08:00
naibo
5180f47b70 Add llm and fl beta code 2024-12-24 00:14:35 +08:00
Naibo_Mac_M2
b4d7ddf5cb Fix bug of document empty because of html.parsestring function 2024-12-11 23:17:21 +08:00
Naibo_Mac_M2
2031b09297 Update Readme 2024-11-25 17:34:16 +08:00
Naibo Wang
cc9a8082da
Update README.md 2024-11-25 17:32:09 +08:00
Naibo Wang
3daf5e8c21
Update README.md 2024-11-25 17:30:36 +08:00
Naibo Wang
8f5d7a3a52
Update main.js about execute.bat name 2024-11-25 17:23:45 +08:00
Naibo_Mac_M2
ee4a077630 Update Readme 2024-11-22 18:36:34 +08:00
naibo
3fe6f42366 Update Readme 2024-11-21 02:18:50 +08:00
naibo
eb3b578745 Add #1 Github Trending Badge 2024-11-12 17:08:47 +08:00
Naibo Wang
4ca5333f8b
Merge pull request #597 from touero/master
Create github issue template file to get details of config
2024-11-08 16:04:16 +11:00
Naibo Wang
b50d4eae3f
Update ISSUE_TEMPLATE.md 2024-11-08 13:03:32 +08:00
Ensong Wei
998a1ddb19
fix: supplementary English in issue template file 2024-11-07 22:05:12 +08:00
touero
07563bc750 fix: format line 2024-11-05 17:15:41 +08:00
touero
7b5ccf4a78 feat: add github issue template file 2024-11-05 17:08:35 +08:00
Naibo Wang
209235de8d
Update Readme.md 2024-11-04 14:24:32 +11:00
naibo
72529c0675 Show detailed JavaScript Error 2024-10-18 17:02:11 +08:00
naibo
081c49357e Update Readme 2024-10-18 16:43:06 +08:00
Naibo Wang
b611ddb6cd
Update README.md 2024-10-15 13:42:48 +08:00
Naibo Wang
abfac8c342
Update README.md 2024-10-15 13:41:45 +08:00
naibo
951a39fff6 Update Readme for building Electron Program 2024-10-15 05:42:06 +08:00
naibo
6d3d10f7a7 Update Readme for building Electron Program 2024-10-15 05:39:51 +08:00
naibo
46b1959564 Update Readme for building Electron Program 2024-10-15 05:33:15 +08:00
naibo
e14896d7cd Update Readme for building Electron Program 2024-10-15 05:15:33 +08:00
naibo
450dfa1a77 Add ElectronJS package speedup solution for Machines in China 2024-10-14 03:40:13 +08:00
naibo
3b907ba382 Add ElectronJS package speedup solution for Machines in China 2024-10-14 03:14:19 +08:00
naibo
70dd90470f Add ElectronJS package speedup solution for Machines in China 2024-10-14 02:49:01 +08:00
naibo
cc8bb70715 RollBack vue plugin version 2024-09-18 16:07:43 +08:00
Naibo Wang
c5f1696f11
Merge pull request #556 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/multi-9423f4c335
Bump body-parser and express in /ElectronJS
2024-09-18 16:02:12 +08:00
Naibo Wang
b987408fc2
Merge pull request #557 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/multi-cf87d80143
Bump send and express in /ElectronJS
2024-09-18 16:02:01 +08:00
Naibo Wang
391f0ea99d
Merge pull request #554 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/multi-d66d039ac5
Bump serve-static and express in /ElectronJS
2024-09-18 16:01:51 +08:00
Naibo Wang
a94b67a1f6
Merge pull request #553 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/multi-1d234c620e
Bump path-to-regexp and express in /ElectronJS
2024-09-18 16:01:43 +08:00
Naibo Wang
54ef89aef7
Merge pull request #552 from NaiboWang/dependabot/npm_and_yarn/Extension/manifest_v3/multi-033fad549c
Bump vite and @vitejs/plugin-vue in /Extension/manifest_v3
2024-09-18 16:01:32 +08:00
dependabot[bot]
22a3b45f13
Bump body-parser and express in /ElectronJS
Bumps [body-parser](https://github.com/expressjs/body-parser) to 1.20.3 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `body-parser` from 1.20.2 to 1.20.3
- [Release notes](https://github.com/expressjs/body-parser/releases)
- [Changelog](https://github.com/expressjs/body-parser/blob/master/HISTORY.md)
- [Commits](https://github.com/expressjs/body-parser/compare/1.20.2...1.20.3)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: body-parser
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-17 22:29:05 +00:00
dependabot[bot]
44bfb69a36
Bump send and express in /ElectronJS
Bumps [send](https://github.com/pillarjs/send) to 0.19.0 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `send` from 0.18.0 to 0.19.0
- [Release notes](https://github.com/pillarjs/send/releases)
- [Changelog](https://github.com/pillarjs/send/blob/master/HISTORY.md)
- [Commits](https://github.com/pillarjs/send/compare/0.18.0...0.19.0)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: send
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-17 22:29:05 +00:00
dependabot[bot]
5c1207649d
Bump serve-static and express in /ElectronJS
Bumps [serve-static](https://github.com/expressjs/serve-static) to 1.16.2 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `serve-static` from 1.15.0 to 1.16.2
- [Release notes](https://github.com/expressjs/serve-static/releases)
- [Changelog](https://github.com/expressjs/serve-static/blob/v1.16.2/HISTORY.md)
- [Commits](https://github.com/expressjs/serve-static/compare/v1.15.0...v1.16.2)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: serve-static
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-17 22:29:00 +00:00
dependabot[bot]
c967db3dac
Bump path-to-regexp and express in /ElectronJS
Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) to 0.1.10 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `path-to-regexp` from 0.1.7 to 0.1.10
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v0.1.7...v0.1.10)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: path-to-regexp
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-17 22:28:58 +00:00
dependabot[bot]
baec9c4298
Bump vite and @vitejs/plugin-vue in /Extension/manifest_v3
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) to 5.4.6 and updates ancestor dependency [@vitejs/plugin-vue](https://github.com/vitejs/vite-plugin-vue/tree/HEAD/packages/plugin-vue). These dependencies need to be updated together.


Updates `vite` from 2.9.18 to 5.4.6
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v5.4.6/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v5.4.6/packages/vite)

Updates `@vitejs/plugin-vue` from 1.10.2 to 5.1.3
- [Release notes](https://github.com/vitejs/vite-plugin-vue/releases)
- [Changelog](https://github.com/vitejs/vite-plugin-vue/blob/main/packages/plugin-vue/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite-plugin-vue/commits/plugin-vue@5.1.3/packages/plugin-vue)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: indirect
- dependency-name: "@vitejs/plugin-vue"
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-17 20:30:57 +00:00
naibo
3e7abd6273 Update BrightData 2024-09-10 22:50:51 +08:00
naibo
32df9d5060 Update Poster 2024-09-09 20:35:59 +08:00
naibo
05c52f9dc8 Change 98IP to Koala-IP 2024-09-09 15:53:17 +08:00
naibo
7c4dafc002 Add First Sponsor 2024-08-27 23:52:23 +08:00
Naibo_Mac_M2
2afaf43162 MacOS Prompt to tasks 2024-08-24 17:42:31 +08:00
naibo
b79d92df1d Add 98IP 2024-08-22 23:06:23 +08:00
naibo
e4e1a1b095 Add 98IP 2024-08-22 22:19:49 +08:00
naibo
048dfb1f4b Add 98IP 2024-08-22 21:38:59 +08:00
naibo
1750481744 Add 98IP 2024-08-22 21:20:44 +08:00
naibo
3ead5e7312 Add 98IP 2024-08-22 19:16:18 +08:00
naibo
81957adb52 Add 98IP 2024-08-22 19:13:42 +08:00
naibo
dbad074565 Add 98IP 2024-08-22 13:54:35 +08:00
naibo
8342135b36 Add 98IP 2024-08-22 13:46:41 +08:00
naibo
e74915d94c Typo Fix 2024-08-21 11:57:08 +08:00
naibo
df62f710e3 Change F7 to F2 2024-08-21 11:40:21 +08:00
naibo
118241ba6d Fix arbitrary file read vulnerability 2024-08-10 17:32:01 +08:00
Naibo Wang
de47e8516a
Update Readme.md 2024-07-29 19:31:06 +08:00
naibo
d438e4b19d Add AD 2024-07-29 19:26:22 +08:00
naibo
0003041dab Add AD 2024-07-29 19:25:43 +08:00
naibo
ec3d9094bf Usage Example Section 2024-07-29 17:28:11 +08:00
naibo
629509a588 Change Download location 2024-07-29 17:26:52 +08:00
naibo
5e17563d11 New AD 2024-07-29 17:10:51 +08:00
naibo
5acafe7948 New AD 2024-07-29 16:54:51 +08:00
naibo
c25f80c175 New AD 2024-07-29 16:54:07 +08:00
naibo
ab88b33c74 New AD 2024-07-29 16:47:33 +08:00
Naibo Wang
7442e43be3
Linux64 new login shell 2024-07-13 22:30:08 +08:00
naibo
a0518412b0 New startup shell for sandbox 2024-07-13 22:27:29 +08:00
naibo
9ccb56aeae New startup shell for sandbox 2024-07-13 22:21:05 +08:00
naibo
3601ddb14d New startup shell for sandbox 2024-07-13 22:11:07 +08:00
Naibo Wang
728a5cb3ea
+x easy-spider.sh 2024-07-13 21:58:16 +08:00
naibo
46909e4866 New start shell for Ubuntu 24.04 2024-07-13 21:44:52 +08:00
naibo
072b6ad21e Bug fix for ... 2024-07-12 20:24:05 +08:00
naibo
bf320abf1a Update Complie and Debug Video Address 2024-07-12 19:39:46 +08:00
Naibo Wang
2d7c3c1323
Merge pull request #362 from touero/master
Dictionary's get replace catch exception in first three if case
2024-07-12 17:38:35 +08:00
Naibo Wang
c185e914e7
Update easyspider_executestage.py
skipCount from 1 to 0
2024-07-12 17:37:45 +08:00
naibo
7c0ab0e519 More XPaths Bug Fix 2024-07-12 17:12:45 +08:00
naibo
f50b08e9c4 MySQL Constant Bug Fix 2024-07-12 16:47:51 +08:00
Naibo Wang
ff7d82f4d0
Merge pull request #435 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/ws-8.17.1
Bump ws from 8.14.2 to 8.17.1 in /ElectronJS
2024-06-19 10:48:45 +08:00
dependabot[bot]
944d968679
Bump ws from 8.14.2 to 8.17.1 in /ElectronJS
Bumps [ws](https://github.com/websockets/ws) from 8.14.2 to 8.17.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/8.14.2...8.17.1)

---
updated-dependencies:
- dependency-name: ws
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-18 20:10:49 +00:00
Naibo Wang
9f1f152680
Merge pull request #431 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/braces-3.0.3
Bump braces from 3.0.2 to 3.0.3 in /ElectronJS
2024-06-17 17:24:31 +08:00
dependabot[bot]
18321e4fee
Bump braces from 3.0.2 to 3.0.3 in /ElectronJS
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-17 09:23:33 +00:00
Naibo Wang
b79bda9001
Merge pull request #425 from NaiboWang/dependabot/npm_and_yarn/Extension/manifest_v3/multi-6365d02c7e
Bump @grpc/grpc-js and firebase in /Extension/manifest_v3
2024-06-17 17:22:28 +08:00
dependabot[bot]
80bc210ff1
Bump @grpc/grpc-js and firebase in /Extension/manifest_v3
Bumps [@grpc/grpc-js](https://github.com/grpc/grpc-node) to 1.9.15 and updates ancestor dependency [firebase](https://github.com/firebase/firebase-js-sdk). These dependencies need to be updated together.


Updates `@grpc/grpc-js` from 1.7.3 to 1.9.15
- [Release notes](https://github.com/grpc/grpc-node/releases)
- [Commits](https://github.com/grpc/grpc-node/compare/@grpc/grpc-js@1.7.3...@grpc/grpc-js@1.9.15)

Updates `firebase` from 9.23.0 to 10.12.2
- [Release notes](https://github.com/firebase/firebase-js-sdk/releases)
- [Changelog](https://github.com/firebase/firebase-js-sdk/blob/master/CHANGELOG.md)
- [Commits](https://github.com/firebase/firebase-js-sdk/compare/firebase@9.23.0...firebase@10.12.2)

---
updated-dependencies:
- dependency-name: "@grpc/grpc-js"
  dependency-type: indirect
- dependency-name: firebase
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-11 00:25:40 +00:00
Naibo Wang
dbf7681518
Merge pull request #392 from NaiboWang/dependabot/pip/ExecuteStage/pymysql-1.1.1
Bump pymysql from 1.1.0 to 1.1.1 in /ExecuteStage
2024-05-22 18:22:18 +08:00
Naibo Wang
f18616e3ff
Merge pull request #391 from NaiboWang/dependabot/pip/ExecuteStage/requests-2.32.0
Bump requests from 2.31.0 to 2.32.0 in /ExecuteStage
2024-05-22 18:21:43 +08:00
dependabot[bot]
911ea02f3f
---
updated-dependencies:
- dependency-name: pymysql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 19:54:31 +00:00
dependabot[bot]
22f86cf0f2
---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 05:25:55 +00:00
naibo
0285246337 Merge branch 'master' of https://github.com/NaiboWang/EasySpider 2024-05-16 02:12:46 +08:00
naibo
4fdce9a915 Update Stealth.min.js 2024-05-16 02:12:34 +08:00
touero
15aab7c0c5 fix: remove unnecessary variables make it more concise 2024-05-07 21:50:30 +08:00
touero
3ec64d2623 [fix] the rest if case from try catch turn to dict' get 2024-05-04 11:30:14 +08:00
touero
5582205204 fix: dictionary's get replace catch exception in first three if case 2024-04-28 00:04:33 +08:00
Naibo Wang
c272e5da86
Merge pull request #360 from touero/master
Fixing get data before if case in preprocess event loop
2024-04-27 22:39:09 +08:00
touero
52702d4eb3 fix: getting data before if case in preprocess event loop 2024-04-27 00:18:30 +08:00
Naibo Wang
a8e77b5e15
Merge pull request #359 from touero/master
Define constants using enumeration classes
2024-04-26 10:40:34 +08:00
touero
606de75577 fix: format string and using enum class defined constants Ⅱ 2024-04-25 23:58:29 +08:00
touero
76fd4bad55 fix: format string and using enum class defined constants 2024-04-25 23:44:44 +08:00
Naibo_Mac_M2
2860bc7b8c Fix wrong word 2024-04-25 22:09:22 +08:00
Naibo_Mac_M2
ebe8a56a6f Bug fix for Field[] 2024-04-25 21:56:23 +08:00
Naibo Wang
e086de2852
Merge pull request #356 from touero/master
Getting data by dictionary's 'get' and remove not necessary catching Exception
2024-04-25 00:20:29 +08:00
touero
c2d16e13c2 fix: get data by dictionary's 'get' and remove not necessary catching Exception 2024-04-24 23:57:47 +08:00
naibo
e43318f57a Bug fix for Excel Upload 2024-04-24 23:31:28 +08:00
naibo
7849707486 Bug fix for Local Server 2024-04-24 23:23:01 +08:00
naibo
b1632459ef Bug fix for OS Version 2024-04-24 23:12:03 +08:00
naibo
a2bd496e8e Change windows to Windows 2024-04-24 22:04:16 +08:00
naibo
9ed61c4f50 Remove force headless 2024-04-24 02:20:05 +08:00
naibo
c8b71835de Update Docker 2024-04-23 23:55:42 +08:00
naibo
0afa159c98 Update Only Server 2024-04-23 23:22:54 +08:00
naibo
3ba748b101 Update Readme 2024-04-23 22:26:31 +08:00
Naibo Wang
818d3e0ddc Docker Support 2024-04-23 22:19:49 +08:00
Naibo Wang
ad568af5f3 Docker Support 2024-04-23 21:55:45 +08:00
naibo
b2a6fd6b6b win32 2024-04-22 19:12:53 +08:00
Naibo_Mac_M2
960cf74de1 MacOS 2024-04-22 08:24:14 +08:00
Naibo Wang
fce97dec61 Linux 2024-04-22 07:44:19 +08:00
Naibo Wang
3ffd34d0fd Linux 2024-04-22 07:13:54 +08:00
naibo
1b6661afb8 V0.6.2 2024-04-22 06:33:23 +08:00
Naibo Wang
3350c50600
Merge pull request #339 from NaiboWang/dependabot/npm_and_yarn/Extension/manifest_v3/vite-2.9.18
Bump vite from 2.9.17 to 2.9.18 in /Extension/manifest_v3
2024-04-04 01:53:33 +08:00
dependabot[bot]
6b14afcf00
Bump vite from 2.9.17 to 2.9.18 in /Extension/manifest_v3
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 2.9.17 to 2.9.18.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v2.9.18/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v2.9.18/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-03 17:52:27 +00:00
Naibo Wang
73c9a3a647
Merge pull request #333 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/express-4.19.2
Bump express from 4.18.2 to 4.19.2 in /ElectronJS
2024-03-30 21:04:06 +08:00
dependabot[bot]
93a49d8c58
Bump express from 4.18.2 to 4.19.2 in /ElectronJS
Bumps [express](https://github.com/expressjs/express) from 4.18.2 to 4.19.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.18.2...4.19.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-29 04:44:45 +00:00
naibo
5cf81ce5d5 Update Readme 2024-03-09 14:06:43 +08:00
naibo
7d9ae708b2 Update Readme 2024-02-29 16:37:54 +08:00
NaiboWang-Alienware
590c9907a4 Update Readme 2024-02-22 19:56:27 +08:00
NaiboWang-Alienware
4c85fdbf5d Update Readme 2024-02-22 16:51:29 +08:00
NaiboWang-Alienware
8f4bd8709c Remove Promote 2024-02-21 18:11:30 +08:00
NaiboWang-Alienware
38d329fe27 Add BrightData 2024-02-21 18:10:31 +08:00
NaiboWang-Alienware
10b0210983 Add BrightData 2024-02-21 18:08:10 +08:00
Naibo Wang
ea6f17477d
Merge pull request #305 from NaiboWang/dependabot/npm_and_yarn/ElectronJS/ip-2.0.1
Bump ip from 2.0.0 to 2.0.1 in /ElectronJS
2024-02-21 17:25:32 +08:00
dependabot[bot]
a43189d4cd
Bump ip from 2.0.0 to 2.0.1 in /ElectronJS
Bumps [ip](https://github.com/indutny/node-ip) from 2.0.0 to 2.0.1.
- [Commits](https://github.com/indutny/node-ip/compare/v2.0.0...v2.0.1)

---
updated-dependencies:
- dependency-name: ip
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-20 23:56:25 +00:00
naibo
3c1b4a1019 Update Readme 2024-01-27 00:25:57 +08:00
naibo
fe2a3ee87a Update Readme 2024-01-27 00:06:11 +08:00
naibo
d1b7b247b8 Update Pillow 2024-01-23 14:46:47 +08:00
Naibo Wang
49241abf02
Merge pull request #281 from NaiboWang/dependabot/npm_and_yarn/Extension/manifest_v3/vite-2.9.17
Bump vite from 2.9.16 to 2.9.17 in /Extension/manifest_v3
2024-01-21 23:34:12 +08:00
dependabot[bot]
cb47353da6
Bump vite from 2.9.16 to 2.9.17 in /Extension/manifest_v3
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 2.9.16 to 2.9.17.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v2.9.17/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v2.9.17/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-19 22:55:40 +00:00
naibo
a971b52d38 Auto Rename Download Files 2024-01-05 22:12:53 +08:00
naibo
a365783e41 Auto Rename Download Files 2024-01-05 17:52:50 +08:00
naibo
ab47ed2be0 Automatically Split Line 2024-01-03 16:23:51 +08:00
naibo
838616e131 Optimize Code 2023-12-30 23:49:23 +08:00
naibo
7d247d68ec Logic for handles of text list 2023-12-30 23:10:31 +08:00
naibo
c5a4b11dfb Two new custom operations 2023-12-30 16:05:29 +08:00
naibo
2a241010a9 Allow trial runs for the vast majority of operations in the extract-data operation 2023-12-30 00:30:54 +08:00
naibo
499c3f21b6 Trial-run JS now shows the return value, and JS inside extract-data operations can be trial-run 2023-12-29 23:28:19 +08:00
naibo
4f858ffee1 New Download Location 2023-12-28 20:01:06 +08:00
naibo
580af6faaa Search Button 2023-12-28 00:46:14 +08:00
naibo
4e53596680 Format Code 2023-12-27 20:49:24 +08:00
naibo
0ded0fb67c Search and Sort for task list 2023-12-27 20:20:34 +08:00
naibo
66918e347c Update Readmes for pub 2023-12-27 17:41:28 +08:00
naibo
cff8ae5b93 Windows 7 Alert! 2023-12-27 17:32:49 +08:00
naibo
8ea69c9d0f Command line defaults to local mode 2023-12-27 13:31:58 +08:00
naibo
0f5c6a89bf Add final XPath hint 2023-12-26 21:06:29 +08:00
naibo
c8d6017190 Update Instruction of Trail Run 2023-12-25 12:30:09 +08:00
naibo
def19ba4bf Update Readme 2023-12-24 12:50:30 +08:00
149 changed files with 20496 additions and 3428 deletions
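
Several commits in this range (for example c2d16e13c2 and 5582205204) replace `try`/`except` lookups on the task's `service` dictionary with `dict.get` and a default. The sketch below illustrates that pattern using the keys and defaults that appear in the diff; the function name `read_settings` and the sample dictionary are hypothetical and not part of the repository.

```python
from datetime import datetime


def read_settings(service: dict) -> dict:
    """Collect optional task settings, falling back to defaults for missing keys.

    Hypothetical helper: the key names and defaults mirror the diff,
    the function itself does not exist in the repository.
    """
    now = datetime.now()
    return {
        # A missing "recordLog" key means logging stays enabled.
        "log": bool(service.get("recordLog", True)),
        # The default is computed eagerly, even when "saveName" is present.
        "save_name": service.get("saveName", now.strftime("%Y_%m_%d_%H_%M_%S")),
        "max_view_length": service.get("maxViewLength", 15),
        "output_format": service.get("outputFormat", "csv"),
    }


settings = read_settings({"recordLog": 0, "outputFormat": "json"})
print(settings["log"], settings["output_format"], settings["max_view_length"])
```

One difference worth keeping in mind: the removed bare `except:` blocks also swallowed errors other than a missing key, whereas `dict.get` only covers the missing-key case.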

25
.github/ISSUE_TEMPLATE.md vendored Normal file

@ -0,0 +1,25 @@
## 版本信息 Version Information
**EasySpider版本 EasySpider Version**:
**系统版本(架构) System Version (Architecture)**:
**浏览器版本 Browser Version**:
**安装方式 Installation method**:
## 问题描述 Issue Description
## 如何复现 Steps to Reproduce
## 示例任务文件 Example Task File
Windows和Linux版本的软件设计的任务文件在软件目录下的`tasks`文件夹中,文件名为任务列表中`任务的ID号.json`MacOS系统的任务文件目录请运行下面的命令打开tasks文件夹
Task files created with the Windows and Linux versions of the software are in the `tasks` folder in the software directory, and each file is named `the ID number of the task.json`, using the ID shown in the task list. On MacOS, run the following command to open the tasks folder:
```bash
cd /Users/$(whoami)/Library/Application\ Support/EasySpider/tasks
open .
```
请将任务文件直接以文件的方式粘贴到这里,不要截图和打开复制里面的内容。
Please attach the task file here directly as a file; do not post a screenshot of it or open it and copy its contents.
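
The lookup rule described in this template can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `task_file_path` and its parameters are hypothetical, and the MacOS branch assumes the user's home directory is the `/Users/<name>` folder used in the command above.

```python
from pathlib import Path


def task_file_path(task_id: int, software_dir: str, system: str) -> Path:
    """Return the expected location of a task file.

    `system` is the value of platform.system(), e.g. "Windows", "Linux" or "Darwin".
    Hypothetical helper, not part of EasySpider.
    """
    if system == "Darwin":
        # MacOS: tasks live under the user's Application Support folder.
        tasks_dir = Path.home() / "Library" / "Application Support" / "EasySpider" / "tasks"
    else:
        # Windows and Linux: tasks live in the `tasks` folder of the software directory.
        tasks_dir = Path(software_dir) / "tasks"
    return tasks_dir / f"{task_id}.json"


linux_path = task_file_path(12, "/opt/EasySpider", "Linux")
mac_path = task_file_path(3, "/opt/EasySpider", "Darwin")
print(linux_path)
```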

4
.gitignore vendored

@ -13,4 +13,6 @@ old_code/
*.mp4
*.tar.xz
*.zip
Data/
Data/
**/__pycache__/
**/.venv/


@ -1,10 +1,10 @@
EasySpider_MacOS/easyspider_executestage
EasySpider_MacOS/easyspider_executestage_full
EasySpider_Linux64_x64/user_data
EasySpider_windows_x32/user_data
EasySpider_Windows_x32/user_data
EasySpider
EasySpider.app/
EasySpider_windows_x64/user_data
EasySpider_Windows_x64/user_data
*.tmp
*.tar.gz
*.7z*


@ -5,9 +5,11 @@ import copy
import platform
import shutil
import string
import threading
# import undetected_chromedriver as uc
from utils import detect_optimizable, download_image, extract_text_from_html, get_output_code, isnotnull, lowercase_tags_in_xpath, myMySQL, new_line, \
on_press_creator, on_release_creator, readCode, replace_field_values, send_email, split_text_by_lines, write_to_csv, write_to_excel, write_to_json
on_press_creator, on_release_creator, readCode, rename_downloaded_file, replace_field_values, send_email, split_text_by_lines, write_to_csv, write_to_excel, write_to_json
from constants import WriteMode, DataWriteMode, GraphOption
from myChrome import MyChrome
from threading import Thread, Event
from PIL import Image
@ -30,7 +32,6 @@ from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from pynput.keyboard import Key, Listener
from datetime import datetime
import io # code that should run when exiting on an error
import json
@ -75,10 +76,7 @@ class BrowserThread(Thread):
def __init__(self, browser_t, id, service, version, event, saveName, config, option):
Thread.__init__(self)
self.logs = io.StringIO()
try:
self.log = bool(service["recordLog"])
except:
self.log = True
self.log = bool(service.get("recordLog", True))
self.browser = browser_t
self.option = option
self.config = config
@ -86,22 +84,13 @@ class BrowserThread(Thread):
self.totalSteps = 0
self.id = id
self.event = event
try:
self.saveName = service["saveName"] # name of the saved file
except:
now = datetime.now()
# format the time as a string with second precision
self.saveName = now.strftime("%Y_%m_%d_%H_%M_%S")
now = datetime.now()
self.saveName = service.get("saveName", now.strftime("%Y_%m_%d_%H_%M_%S")) # name of the saved file
self.OUTPUT = ""
self.SAVED = False
self.BREAK = False
self.CONTINUE = False
try:
maximizeWindow = service["maximizeWindow"]
except:
maximizeWindow = 0
if maximizeWindow == 1:
self.browser.maximize_window()
self.browser.maximize_window() if service.get("maximizeWindow") == 1 else ...
# name setup
if saveName != "": # the command line overrides the save name
self.saveName = saveName # name of the saved file
@ -112,19 +101,23 @@ class BrowserThread(Thread):
self.print_and_log("Save Name for task ID", id, "is:", self.saveName)
if not os.path.exists("Data/Task_" + str(id)):
os.mkdir("Data/Task_" + str(id))
if not os.path.exists("Data/Task_" + str(id) + "/" + self.saveName):
os.mkdir("Data/Task_" + str(id) + "/" +
self.saveName) # create the save folder used to store screenshots
self.downloadFolder = "Data/Task_" + str(id) + "/" + self.saveName
if not os.path.exists(self.downloadFolder):
os.mkdir(self.downloadFolder) # create the save folder used to store screenshots and files
if not os.path.exists(self.downloadFolder + "/files"):
os.mkdir(self.downloadFolder + "/files")
if not os.path.exists(self.downloadFolder + "/images"):
os.mkdir(self.downloadFolder + "/images")
self.getDataStep = 0
self.startSteps = 0
try:
startFromExit = service["startFromExit"] # start from the step where the last run exited
if startFromExit == 1:
if service.get("startFromExit", 0) == 1:
with open("Data/Task_" + str(self.id) + "/" + self.saveName + '_steps.txt', 'r',
encoding='utf-8-sig') as file_obj:
self.startSteps = int(file_obj.read()) # read the number of steps already executed
except:
pass
except Exception as e:
self.print_and_log(f"读取steps.txt失败原因{str(e)}")
if self.startSteps != 0:
self.print_and_log("此模式下任务ID", self.id, "将从上次退出的步骤开始执行,之前已采集条数为",
self.startSteps, "条。")
@ -132,7 +125,7 @@ class BrowserThread(Thread):
"will start from the last step, before we already collected", self.startSteps, " items.")
else:
self.print_and_log("此模式下任务ID", self.id,
"将从头F开始执行,如果需要从上次退出的步骤开始执行,请在保存任务时设置是否从上次保存位置开始执行为“是”。")
"将从头开始执行,如果需要从上次退出的步骤开始执行,请在保存任务时设置是否从上次保存位置开始执行为“是”。")
self.print_and_log("In this mode, task ID", self.id,
"will start from the beginning, if you want to start from the last step, please set the option 'start from the last step' to 'yes' when saving the task.")
stealth_path = driver_path[:driver_path.find(
@ -140,78 +133,83 @@ class BrowserThread(Thread):
with open(stealth_path, 'r') as f:
js = f.read()
self.print_and_log("Loading stealth.min.js")
self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
'source': js}) # TMALL anti-scraping countermeasure
self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': js}) # TMALL anti-scraping countermeasure
self.browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
WebDriverWait(self.browser, 10)
self.browser.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(self.id))
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(self.id), self.saveName, "files")
self.paramss = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': path}}
self.browser.execute("send_command", self.paramss) # Change the download address
self.browser.execute("send_command", self.paramss) # Change the download directory
self.monitor_event = threading.Event()
self.monitor_thread = threading.Thread(target=rename_downloaded_file, args=(path, self.monitor_event)) # The comma after path cannot be omitted; args must be a tuple
self.monitor_thread.start()
# self.browser.get('about:blank')
self.procedure = service["graph"] # Program execution flow
try:
self.maxViewLength = service["maxViewLength"] # Maximum display length
except:
self.maxViewLength = 15
try:
self.outputFormat = service["outputFormat"] # Output format
except:
self.outputFormat = "csv"
try:
self.task_version = service["version"] # Task version
if service["version"] >= "0.3.1": # EasySpider 0.3.1 and later is compatible with all task versions from 0.3.1 onward
pass
else: # EasySpider versions below 0.3.1 are not compatible with 0.3.1 and later
if service["version"] != version:
self.print_and_log("版本不一致,请使用" +
service["version"] + "版本的EasySpider运行该任务")
self.print_and_log("Version not match, please use EasySpider " +
service["version"] + " to run this task!")
self.browser.quit()
sys.exit()
except: # Version 0.2.0 has no version field, so exit directly
self.maxViewLength = service.get("maxViewLength", 15) # Maximum display length
self.outputFormat = service.get("outputFormat", "csv") # Output format
self.save_threshold = service.get("saveThreshold", 10) # Minimum save threshold
self.dataWriteMode = service.get("dataWriteMode", DataWriteMode.Append.value) # Data write mode: 1 = append, 2 = overwrite, 3 = rename file
self.task_version = service.get("version", "") # Task version
if not self.task_version:
self.print_and_log("版本不一致请使用v0.2.0版本的EasySpider运行该任务")
self.print_and_log(
"Version not match, please use EasySpider v0.2.0 to run this task!")
self.print_and_log("Version not match, please use EasySpider v0.2.0 to run this task!")
self.browser.quit()
sys.exit()
try:
self.save_threshold = service["saveThreshold"] # Minimum save threshold
except:
self.save_threshold = 10
try:
self.links = list(
filter(isnotnull, service["links"].split("\n"))) # List of links to execute
except:
if self.task_version >= "0.3.1": # EasySpider 0.3.1 and later is compatible with all task versions from 0.3.1 onward
pass
elif self.task_version != version: # EasySpider versions below 0.3.1 are not compatible with 0.3.1 and later
self.print_and_log(f"版本不一致,请使用{self.task_version}版本的EasySpider运行该任务")
self.print_and_log(f"Version not match, please use EasySpider {self.task_version} to run this task!")
self.browser.quit()
sys.exit()
service_links = service.get("links")
if service_links:
self.links = list(filter(isnotnull, service_links.split("\n"))) # List of links to execute
else:
self.links = list(filter(isnotnull, service["url"])) # Link to execute
self.OUTPUT = [] # Collected data
try:
self.dataWriteMode = service["dataWriteMode"] # Data write mode: 1 = append, 2 = overwrite
except:
self.dataWriteMode = 1
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx" or self.outputFormat == "json":
if self.dataWriteMode == 2 and os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
os.remove("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat)
self.writeMode = 1 # Write mode: 0 = create, 1 = append
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx":
if not os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
if self.outputFormat in ["csv", "txt", "xlsx", "json"]:
if os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
if self.dataWriteMode == DataWriteMode.Cover.value:
os.remove("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat)
elif self.dataWriteMode == DataWriteMode.Rename.value:
i = 2
while os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '_' + str(i) + '.' + self.outputFormat):
i = i + 1
self.saveName = self.saveName + '_' + str(i)
self.print_and_log("文件已存在,已重命名为", self.saveName)
self.writeMode = WriteMode.Create.value # Write mode: 0 = create, 1 = append
if self.outputFormat in ['csv', 'txt', 'xlsx']:
if not os.path.exists(f"Data/Task_{str(self.id)}/{self.saveName}.{self.outputFormat}"):
self.OUTPUT.append([]) # Add the header row
self.writeMode = 0
self.writeMode = WriteMode.Create.value
elif self.outputFormat == "json":
self.writeMode = 3 # JSON mode does not need to check whether the file exists
self.writeMode = WriteMode.Json.value # JSON mode does not need to check whether the file exists
elif self.outputFormat == "mysql":
self.mysql = myMySQL(config["mysql_config_path"])
self.mysql.create_table(self.saveName, service["outputParameters"], remove_if_exists=self.dataWriteMode == 2)
self.writeMode = 2
if self.writeMode == 0:
self.mysql.create_table(self.saveName, service["outputParameters"],
remove_if_exists=self.dataWriteMode == DataWriteMode.Cover.value)
self.writeMode = WriteMode.MySQL.value # MySQL mode
if self.writeMode == WriteMode.Create.value:
self.print_and_log("新建模式|Create Mode")
elif self.writeMode == 1:
elif self.writeMode == WriteMode.Append.value:
self.print_and_log("追加模式|Append Mode")
elif self.writeMode == 2:
elif self.writeMode == WriteMode.MySQL.value:
self.print_and_log("MySQL模式|MySQL Mode")
elif self.writeMode == 3:
elif self.writeMode == WriteMode.Json.value:
self.print_and_log("JSON模式|JSON Mode")
self.containJudge = service["containJudge"] # Whether the task contains conditional statements
self.outputParameters = {}
self.service = service
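The rename branch of the data write mode (pick `name_2`, `name_3`, and so on when the output file already exists) can be sketched as a standalone helper. This is an illustrative sketch only; `next_free_save_name` and its `folder` argument are hypothetical names, and the real code works on `self.saveName` in place.

```python
import os


def next_free_save_name(folder, save_name, ext):
    """Return save_name if no output file exists yet, else the first free save_name_<i>."""
    if not os.path.exists(os.path.join(folder, f"{save_name}.{ext}")):
        return save_name
    i = 2
    # Probe save_name_2, save_name_3, ... until an unused file name is found
    while os.path.exists(os.path.join(folder, f"{save_name}_{i}.{ext}")):
        i += 1
    return f"{save_name}_{i}"
```

Keeping the probe separate from the assignment makes the rule easy to test without touching the task state.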
@ -224,191 +222,140 @@ class BrowserThread(Thread):
if param["name"] not in self.outputParameters.keys():
self.outputParameters[param["name"]] = ""
self.dataNotFoundKeys[param["name"]] = False
try:
self.outputParametersTypes.append(param["type"])
except:
self.outputParametersTypes.append("text")
try:
self.outputParametersRecord.append(
bool(param["recordASField"]))
except:
self.outputParametersRecord.append(True)
self.outputParametersTypes.append(param.get("type", "text"))
self.outputParametersRecord.append(bool(param.get("recordASField", True)))
# Do not add a header row when appending to an existing file
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx":
if self.writeMode == 0:
self.OUTPUT[0].append(param["name"])
if self.outputFormat in ["csv", "txt", "xlsx"] and self.writeMode == WriteMode.Create.value:
self.OUTPUT[0].append(param["name"])
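self.urlId = 0 # Global record variable
self.preprocess() # Preprocess to optimize the data-extraction flow
try:
self.inputExcel = service["inputExcel"] # Input Excel file
except:
self.inputExcel = ""
self.inputExcel = service.get("inputExcel", "") # Input Excel file
self.readFromExcel() # Read parameter values from Excel
# If there are no complex operations, optimize the data-extraction flow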
def preprocess(self):
for node in self.procedure:
try:
iframe = node["parameters"]["iframe"]
except:
node["parameters"]["iframe"] = False
for index_node, node in enumerate(self.procedure):
parameters: dict = node["parameters"]
iframe = parameters.get('iframe')
option = node["option"]
try:
node["parameters"]["xpath"] = lowercase_tags_in_xpath(
node["parameters"]["xpath"])
except:
pass
try:
node["parameters"]["waitElementIframeIndex"] = int(
node["parameters"]["waitElementIframeIndex"])
except:
node["parameters"]["waitElement"] = ""
node["parameters"]["waitElementTime"] = 10
node["parameters"]["waitElementIframeIndex"] = 0
if node["option"] == 1: # Open-page operation
try:
cookies = node["parameters"]["cookies"]
except:
node["parameters"]["cookies"] = ""
elif node["option"] == 2: # Click operation
try:
alertHandleType = node["parameters"]["alertHandleType"]
except:
node["parameters"]["alertHandleType"] = 0
if node["parameters"]["useLoop"]:
parameters["iframe"] = False if not iframe else parameters.get('iframe', False)
if parameters.get("xpath"):
parameters["xpath"] = lowercase_tags_in_xpath(parameters["xpath"])
if parameters.get("waitElementIframeIndex"):
parameters["waitElementIframeIndex"] = int(parameters["waitElementIframeIndex"])
else:
parameters["waitElement"] = ""
parameters["waitElementTime"] = 10
parameters["waitElementIframeIndex"] = 0
if option == GraphOption.Get.value: # Open-page operation
parameters["cookies"] = parameters.get("cookies", "")
elif option == GraphOption.Click.value: # Click operation
parameters["alertHandleType"] = parameters.get("alertHandleType", 0)
if parameters.get("useLoop"):
if self.task_version <= "0.3.5":
# Loop clicks in EasySpider 0.3.5 and earlier do not support relative XPath
node["parameters"]["xpath"] = ""
self.print_and_log("您的任务版本号为" + self.task_version +
"循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif node["option"] == 3: # Extract-data operation
node["parameters"]["recordASField"] = 0
try:
params = node["parameters"]["params"]
except:
node["parameters"]["params"] = node["parameters"]["paras"] # Compatible with EasySpider 0.5.0 and earlier
params = node["parameters"]["params"]
try:
clear = node["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
newLine = node["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
parameters["xpath"] = ""
self.print_and_log(f"您的任务版本号为{self.task_version}循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif option == GraphOption.Extract.value: # Extract-data operation
parameters["recordASField"] = 0
parameters["params"] = parameters.get("params", parameters.get("paras")) # Compatible with EasySpider 0.5.0 and earlier
parameters["clear"] = parameters.get("clear", 0)
parameters["newLine"] = parameters.get("newLine", 1)
params = parameters["params"]
for param in params:
try:
iframe = param["iframe"]
except:
param["iframe"] = False
try:
param["iframe"] = param.get("iframe", False)
if param.get("relativeXPath"):
param["relativeXPath"] = lowercase_tags_in_xpath(param["relativeXPath"])
except:
pass
try:
node["parameters"]["recordASField"] = param["recordASField"]
except:
node["parameters"]["recordASField"] = 1
try:
splitLine = int(param["splitLine"])
except:
param["splitLine"] = 0
if param["contentType"] == 8:
self.print_and_log(
"默认的ddddocr识别功能如果觉得不好用可以自行修改源码get_content函数->contentType == 8的位置换成自己想要的OCR模型然后自己编译运行或者可以先设置采集内容类型为“元素截图”把图片保存下来然后用自定义操作调用自己写的程序程序的功能是读取这个最新生成的图片然后用好用的模型如PaddleOCR把图片识别出来然后把返回值返回给程序作为参数输出。")
self.print_and_log(
"If you think the default ddddocr function is not good enough, you can modify the source code get_content function -> contentType == 8 position to your own OCR model and then compile and run it; or you can first set the content type of the crawler to \"Element Screenshot\" to save the picture, and then call your own program with custom operations. The function of the program is to read the latest generated picture, then use a good model, such as PaddleOCR to recognize the picture, and then return the return value as a parameter output to the program.")
parameters["recordASField"] = param.get("recordASField", 1)
param["splitLine"] = 0 if not param.get("splitLine") else param.get("splitLine")
if param.get("contentType") == 8:
self.print_and_log("默认的ddddocr识别功能如果觉得不好用可以自行修改源码get_content函数->contentType =="
"8的位置换成自己想要的OCR模型然后自己编译运行或者可以先设置采集内容类型为“元素截图”把图片"
"保存下来,然后用自定义操作调用自己写的程序,程序的功能是读取这个最新生成的图片,然后用好用"
"的模型如PaddleOCR把图片识别出来然后把返回值返回给程序作为参数输出。")
self.print_and_log("If you think the default ddddocr function is not good enough, you can "
"modify the source code get_content function -> contentType == 8 position "
"to your own OCR model and then compile and run it; or you can first set "
"the content type of the crawler to \"Element Screenshot\" to save the "
"picture, and then call your own program with custom operations. The "
"function of the program is to read the latest generated picture, then use "
"a good model, such as PaddleOCR to recognize the picture, and then return "
"the return value as a parameter output to the program.")
param["optimizable"] = detect_optimizable(param)
elif node["option"] == 4: # Input text
try:
index = node["parameters"]["index"] # Index value
except:
node["parameters"]["index"] = 0
elif node["option"] == 5: # Custom operation
try:
clear = node["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
newLine = node["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
elif node["option"] == 7: # Move to element
if node["parameters"]["useLoop"]:
if self.task_version <= "0.3.5":
# Loop clicks in EasySpider 0.3.5 and earlier do not support relative XPath
node["parameters"]["xpath"] = ""
self.print_and_log("您的任务版本号为" + self.task_version +
"循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif node["option"] == 8: # Loop operation
try:
exitElement = node["parameters"]["exitElement"]
if exitElement == "":
node["parameters"]["exitElement"] = "//body"
except:
node["parameters"]["exitElement"] = "//body"
node["parameters"]["quickExtractable"] = False # Whether quick extraction is possible
try:
skipCount = node["parameters"]["skipCount"]
except:
node["parameters"]["skipCount"] = 0
elif option == GraphOption.Input.value: # Input text
parameters['index'] = parameters.get('index', 0)
elif option == GraphOption.Custom.value: # Custom operation
parameters['clear'] = parameters.get('clear', 0)
parameters['newLine'] = parameters.get('newLine', 1)
elif option == GraphOption.Move.value: # Move to element
if parameters.get('useLoop'):
if self.task_version <= "0.3.5": # Loop clicks in EasySpider 0.3.5 and earlier do not support relative XPath
parameters["xpath"] = ""
self.print_and_log(f"您的任务版本号为{self.task_version}循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif option == GraphOption.Loop.value: # Loop operation
parameters['exitElement'] = parameters.get('exitElement') or "//body" # Fall back to //body when missing or empty
parameters["quickExtractable"] = False # Whether quick extraction is possible
parameters['skipCount'] = parameters.get('skipCount', 0)
# If a (non-)fixed element list loop contains only one extract-data operation, and that operation extracts element screenshots, quick extraction is possible
if len(node["sequence"]) == 1 and self.procedure[node["sequence"][0]]["option"] == 3 and (int(node["parameters"]["loopType"]) == 1 or int(node["parameters"]["loopType"]) == 2):
try:
params = self.procedure[node["sequence"][0]]["parameters"]["params"]
except:
params = self.procedure[node["sequence"][0]]["parameters"]["paras"] # 兼容0.5.0及以下版本的EasySpider
try:
waitElement = self.procedure[node["sequence"][0]]["parameters"]["waitElement"]
except:
waitElement = ""
if node["parameters"]["iframe"]:
node["parameters"]["quickExtractable"] = False # Quick extraction is not possible inside an iframe
if len(node["sequence"]) == 1 and self.procedure[node["sequence"][0]]["option"] == 3 \
and (int(node["parameters"]["loopType"]) == 1 or int(node["parameters"]["loopType"]) == 2):
params = self.procedure[node["sequence"][0]].get("parameters").get("params")
if not params:
params = self.procedure[node["sequence"][0]]["parameters"]["paras"] # Compatible with EasySpider 0.5.0 and earlier
waitElement = self.procedure[node["sequence"][0]]["parameters"].get("waitElement", "")
if parameters["iframe"]:
parameters["quickExtractable"] = False # Quick extraction is not possible inside an iframe
else:
node["parameters"]["quickExtractable"] = True # Assume quick extraction is possible for now
if node["parameters"]["skipCount"] > 0:
node["parameters"]["quickExtractable"] = False # Quick extraction is not possible if elements are skipped
parameters["quickExtractable"] = True # Assume quick extraction is possible for now
if parameters["skipCount"] > 0:
parameters["quickExtractable"] = False # Quick extraction is not possible if elements are skipped
for param in params:
optimizable = detect_optimizable(param, ignoreWaitElement=False, waitElement=waitElement)
try:
iframe = param["iframe"]
except:
param["iframe"] = False
if param["iframe"] and not param["relative"]: # Quick extraction is not possible inside an iframe
param['iframe'] = param.get('iframe', False)
if param["iframe"] and not param["relative"]: # Quick extraction is not possible inside an iframe
optimizable = False
if not optimizable: # If any parameter fails the optimization conditions, quick extraction is not possible
node["parameters"]["quickExtractable"] = False
if not optimizable: # If any parameter fails the optimization conditions, quick extraction is not possible
parameters["quickExtractable"] = False
break
if node["parameters"]["quickExtractable"]:
self.print_and_log("循环操作<" + node["title"] + ">可以快速提取数据")
self.print_and_log("Loop operation <" + node["title"] + "> can extract data quickly")
try:
node["parameters"]["clear"] = self.procedure[node["sequence"][0]]["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
node["parameters"]["newLine"] = self.procedure[node["sequence"][0]]["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
if int(node["parameters"]["loopType"]) == 1: # Non-fixed element list
if parameters["quickExtractable"]:
self.print_and_log(f"循环操作<{node['title']}>可以快速提取数据")
self.print_and_log(f"Loop operation <{node['title']}> can extract data quickly")
parameters["clear"] = self.procedure[node["sequence"][0]]["parameters"].get("clear", 0)
parameters["newLine"] = self.procedure[node["sequence"][0]]["parameters"].get("newLine", 1)
if int(node["parameters"]["loopType"]) == 1: # Non-fixed element list
node["parameters"]["baseXPath"] = node["parameters"]["xpath"]
elif int(node["parameters"]["loopType"]) == 2: # Fixed element list
elif int(node["parameters"]["loopType"]) == 2: # Fixed element list
node["parameters"]["baseXPath"] = node["parameters"]["pathList"]
node["parameters"]["quickParams"] = []
for param in params:
content_type = ""
if param["relativeXPath"].find("/@href") >= 0 or param["relativeXPath"].find("/text()") >= 0 or param["relativeXPath"].find(
"::text()") >= 0:
if param["relativeXPath"].find("/@href") >= 0 or param["relativeXPath"].find("/text()") >= 0 \
or param["relativeXPath"].find("::text()") >= 0:
content_type = ""
elif param["nodeType"] == 2:
content_type = "//@href"
elif param["nodeType"] == 4: # Image link
elif param["nodeType"] == 4: # Image link
content_type = "//@src"
elif param["contentType"] == 1:
content_type = "/text()"
elif param["contentType"] == 0:
content_type = "//text()"
if param["relative"]: # If it is a relative XPath
if param["relative"]: # If it is a relative XPath
xpath = "." + param["relativeXPath"] + content_type
else:
xpath = param["relativeXPath"] + content_type
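The suffix selection used when building quick-extraction XPaths can be read as one small pure function. The following is a sketch of the branch order shown in this hunk, not the project's code; `build_quick_xpath` is a hypothetical name, and the parameter dictionary is assumed to carry only the four keys used here.

```python
def build_quick_xpath(param):
    """Append the attribute or text selector that matches the parameter type."""
    rel = param["relativeXPath"]
    if "/@href" in rel or "/text()" in rel or "::text()" in rel:
        suffix = ""            # The XPath already selects an attribute or text node
    elif param["nodeType"] == 2:
        suffix = "//@href"     # Link address
    elif param["nodeType"] == 4:
        suffix = "//@src"      # Image link
    elif param["contentType"] == 1:
        suffix = "/text()"     # Direct text only
    elif param["contentType"] == 0:
        suffix = "//text()"    # Text of the element and its descendants
    else:
        suffix = ""
    # A relative XPath is evaluated from the loop element, hence the leading "."
    prefix = "." if param["relative"] else ""
    return prefix + rel + suffix
```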
@ -422,6 +369,7 @@ class BrowserThread(Thread):
"nodeType": param["nodeType"],
"default": param["default"],
})
self.procedure[index_node]["parameters"] = parameters
self.print_and_log("预处理完成|Preprocess completed")
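The version checks in `preprocess` (for example `self.task_version <= "0.3.5"`) compare strings character by character, which would misorder a version such as `0.10.0`. A hedged alternative, assuming versions are purely numeric and dot-separated, is to compare integer tuples; `parse_version` and `version_at_most` are hypothetical helpers, not EasySpider functions.

```python
def parse_version(version):
    """Turn '0.3.5' into (0, 3, 5) so that comparison is numeric per component."""
    return tuple(int(part) for part in version.split("."))


def version_at_most(version, limit):
    """Return True when version is less than or equal to limit, numerically."""
    return parse_version(version) <= parse_version(limit)
```

With this, `version_at_most("0.10.0", "0.3.5")` is false, whereas the plain string comparison `"0.10.0" <= "0.3.5"` is true.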
def readFromExcel(self):
@ -521,7 +469,7 @@ class BrowserThread(Thread):
"/", len(self.links))
self.executeNode(0)
self.urlId = self.urlId + 1
files = os.listdir("Data/Task_" + str(self.id) + "/" + self.saveName)
# files = os.listdir("Data/Task_" + str(self.id) + "/" + self.saveName)
# If the directory is empty, delete it
# if not files:
# os.rmdir("Data/Task_" + str(self.id) + "/" + self.saveName)
@ -538,12 +486,16 @@ class BrowserThread(Thread):
self.print_and_log(f"任务执行完毕,将在{quitWaitTime}秒后自动退出浏览器并清理临时用户目录,等待时间可在保存任务对话框中设置。")
self.print_and_log(f"The task is completed, the browser will exit automatically and the temporary user directory will be cleaned up after {quitWaitTime} seconds, the waiting time can be set in the save task dialog.")
time.sleep(quitWaitTime)
self.browser.quit()
try:
self.browser.quit()
except:
pass
self.print_and_log("正在清理临时用户目录……|Cleaning up temporary user directory...")
try:
shutil.rmtree(self.option["tmp_user_data_folder"])
except:
pass
self.monitor_event.set()
self.print_and_log("清理完成!|Clean up completed!")
self.print_and_log("您现在可以安全的关闭此窗口了。|You can safely close this window now.")
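The `self.monitor_event.set()` call added to the cleanup path is the usual way to ask a background thread to stop. The internals of `rename_downloaded_file` are not shown in this diff, so the sketch below uses a hypothetical `monitor` worker that only counts iterations; it illustrates the `threading.Event` stop pattern, not the actual download-renaming logic.

```python
import threading
import time


def monitor(stop_event, seen, interval=0.01):
    """Poll until stop_event is set; the 'work' here is just recording a timestamp."""
    while not stop_event.is_set():
        seen.append(time.time())
        # wait() returns early as soon as the event is set
        stop_event.wait(interval)


stop_event = threading.Event()
ticks = []
worker = threading.Thread(target=monitor, args=(stop_event, ticks))
worker.start()
time.sleep(0.05)
stop_event.set()  # Same role as self.monitor_event.set() during cleanup
worker.join(timeout=1)
```

Without the `set()` call a non-daemon worker of this shape would keep the process alive after the task finishes.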
@ -753,28 +705,32 @@ class BrowserThread(Thread):
self.browser.set_script_timeout(max_wait_time)
try:
output = self.browser.execute_script(code)
except:
except Exception as e:
output = ""
self.recordLog("JavaScript execution failed")
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" + code, ", error is:", str(e))
elif int(codeMode) == 2:
self.recordLog("Execute JavaScript for element:" + code)
self.recordLog("对元素执行JavaScript:" + code)
self.browser.set_script_timeout(max_wait_time)
try:
output = self.browser.execute_script(code, element)
except:
except Exception as e:
output = ""
self.recordLog("JavaScript execution failed")
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" + code, ", error is:", str(e))
elif int(codeMode) == 5:
try:
code = readCode(code)
# global_namespace = globals().copy()
# global_namespace["self"] = self
output = exec(code)
self.recordLog("执行下面的代码:" + code)
self.recordLog("Execute the following code:" + code)
except Exception as e:
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", e)
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" +
code, ", error is:", e)
code, ", error is:", str(e))
elif int(codeMode) == 6:
try:
code = readCode(code)
@ -847,6 +803,23 @@ class BrowserThread(Thread):
self.print_and_log("根据设置的自定义操作,任务已刷新页面|Task refreshed page according to custom operation")
elif codeMode == 9: # Send email
send_email(node["parameters"]["emailConfig"])
elif codeMode == 10: # Clear all field values
self.clearOutputParameters()
elif codeMode == 11: # Generate a new data row
line = new_line(self.outputParameters,
self.maxViewLength, self.outputParametersRecord)
self.OUTPUT.append(line)
elif codeMode == 12: # Exit the program
self.print_and_log("根据设置的自定义操作,任务已退出|Task exited according to custom operation")
self.saveData(exit=True)
self.browser.quit()
self.print_and_log("正在清理临时用户目录……|Cleaning up temporary user directory...")
try:
shutil.rmtree(self.option["tmp_user_data_folder"])
except:
pass
self.print_and_log("清理完成!|Clean up completed!")
os._exit(0)
else: # 0 1 5 6
output = self.execute_code(
codeMode, code, max_wait_time, iframe=params["iframe"])
@ -1106,7 +1079,25 @@ class BrowserThread(Thread):
self.recordLog(
"判断条件内所有条件分支的条件都不满足|None of the conditions in the judgment condition are met")
def handleHistory(self, node, xpath, thisHistoryURL, thisHistoryLength, index, element=None, elements=None):
def handleHistory(self, node, xpath, thisHandle, thisHistoryURL, thisHistoryLength, index, element=None, elements=None):
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # In case the page was closed unexpectedly
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # If the active tab changed after one loop iteration
try:
while True: # Keep closing windows until back at the original tab
self.browser.close() # Close the tab that is no longer needed
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
if self.history["index"] != thisHistoryLength and self.history["handle"] == self.browser.current_window_handle: # If the history changed after one loop iteration; note the check on the current page
difference = thisHistoryLength - self.history["index"] # Compute the history difference
self.browser.execute_script('history.go(' + str(difference) + ')') # Go back in history
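The tab-closing loop that this commit moves into `handleHistory` can be sketched against any object exposing Selenium's `current_window_handle`, `window_handles`, `close()` and `switch_to.window()`. The function name `close_tabs_until` and the `max_closes` cap are assumptions added for illustration; the original loop has no such cap and closes before checking.

```python
def close_tabs_until(driver, target_handle, max_closes=20):
    """Close the active tab repeatedly until target_handle is active again.

    Returns the number of tabs closed. max_closes is a hypothetical safety
    cap so the loop cannot run forever if the target tab is gone.
    """
    closed = 0
    while driver.current_window_handle != target_handle:
        if target_handle not in driver.window_handles or closed >= max_closes:
            break
        driver.close()  # Close the tab that is no longer needed
        # Move to the most recently opened remaining tab
        driver.switch_to.window(driver.window_handles[-1])
        closed += 1
    return closed
```

Checking that the target handle still exists before closing anything avoids closing every window when the original tab was itself closed by the page.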
@ -1132,12 +1123,13 @@ class BrowserThread(Thread):
if self.browser.current_url == thisHistoryURL or ti > thisHistoryLength: # If the URL changed after one loop iteration
break
time.sleep(2)
if element == None: # Non-fixed element list
element = self.browser.find_elements(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
else: # Fixed element list
element = self.browser.find_element(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
# if index > 0:
# index -= 1 # If the URL starts with data:, retry once
if xpath != "":
if element == None: # Non-fixed element list
element = self.browser.find_elements(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
else: # Fixed element list
element = self.browser.find_element(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
# if index > 0:
# index -= 1 # If the URL starts with data:, retry once
else:
if element == None:
element = elements
@ -1156,6 +1148,14 @@ class BrowserThread(Thread):
self.history["handle"] = thisHandle
thisHistoryURL = self.browser.current_url
# Quick extraction handling
# start = time.time()
try:
tree = html.fromstring(self.browser.page_source)
except Exception as e:
self.print_and_log("解析页面时出错,将切换普通提取模式|Error parsing page, will switch to normal extraction mode")
node["parameters"]["quickExtractable"] = False
# end = time.time()
# print("解析页面秒数:", end - start)
if node["parameters"]["quickExtractable"]:
self.browser.switch_to.default_content() # Switch to the main page
tree = html.fromstring(self.browser.page_source)
@ -1321,25 +1321,7 @@ class BrowserThread(Thread):
if self.BREAK:
self.BREAK = False
break
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # In case the page was closed unexpectedly
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # If the active tab changed after one loop iteration
try:
while True: # Keep closing windows until back at the original tab
self.browser.close() # Close the tab that is no longer needed
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
index, elements = self.handleHistory(node, xpath, thisHistoryURL, thisHistoryLength, index, elements=elements)
index, elements = self.handleHistory(node, xpath, thisHandle, thisHistoryURL, thisHistoryLength, index, elements=elements)
if int(node["parameters"]["breakMode"]) > 0: # If a script condition for exiting the loop is set
output = self.execute_code(int(
node["parameters"]["breakMode"]) - 1, node["parameters"]["breakCode"],
@ -1381,25 +1363,7 @@ class BrowserThread(Thread):
if self.BREAK:
self.BREAK = False
break
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # In case the page was closed unexpectedly
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # If the active tab changed after one loop iteration
try:
while True: # Keep closing windows until back at the original tab
self.browser.close() # Close the tab that is no longer needed
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
index, element = self.handleHistory(node, path, thisHistoryURL, thisHistoryLength, index, element=element)
index, element = self.handleHistory(node, path, thisHandle, thisHistoryURL, thisHistoryLength, index, element=element)
except NoSuchElementException:
self.print_and_log("Loop element not found: ", path)
self.print_and_log("找不到循环元素:", path)
@ -1447,6 +1411,7 @@ class BrowserThread(Thread):
code = get_output_code(output)
if code <= 0:
break
index, _ = self.handleHistory(node, "", thisHandle, thisHistoryURL, thisHistoryLength, index)
elif int(node["parameters"]["loopType"]) == 4: # Fixed URL list
# tempList = node["parameters"]["textList"].split("\r\n")
urlList = list(
@ -1696,8 +1661,11 @@ class BrowserThread(Thread):
try:
actions = ActionChains(self.browser) # Instantiate an ActionChains object
if newTab == 1: # Open in a new tab
# Ctrl + Click
actions.key_down(Keys.CONTROL).click(element).key_up(Keys.CONTROL).perform()
if sys.platform == "darwin": # Mac
actions.key_down(Keys.COMMAND).click(element).key_up(Keys.COMMAND).perform()
else:
# Ctrl + Click
actions.key_down(Keys.CONTROL).click(element).key_up(Keys.CONTROL).perform()
else:
actions.click(element).perform()
except Exception as e:
@ -1715,6 +1683,21 @@ class BrowserThread(Thread):
script = 'var result = document.evaluate(`' + path + \
'`, document, null, XPathResult.ANY_TYPE, null);for(let i=0;i<arguments[0];i++){result.iterateNext();} result.iterateNext().click();'
self.browser.execute_script(script, str(index)) # Click via JavaScript
elif click_way == 2: # Double click
try:
actions = ActionChains(self.browser) # Instantiate an ActionChains object
actions.double_click(element).perform()
except Exception as e:
self.browser.execute_script("arguments[0].scrollIntoView();", element)
try:
actions = ActionChains(self.browser) # Instantiate an ActionChains object
actions.double_click(element).perform()
except Exception as e:
self.print_and_log(f"Selenium双击元素{path}失败将尝试使用JavaScript双击")
self.print_and_log(f"Failed to double click element {path} with Selenium, will try to double click with JavaScript")
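script = 'var result = document.evaluate(`' + path + \
'`, document, null, XPathResult.ANY_TYPE, null);for(let i=0;i<arguments[0];i++){result.iterateNext();} result.iterateNext().dispatchEvent(new MouseEvent("dblclick", {bubbles: true, cancelable: true, view: window}));'
self.browser.execute_script(script, str(index)) # Dispatch a dblclick event via JavaScript (a plain click() would only single-click)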
self.recordLog("点击元素|Click element: " + path)
except TimeoutException:
self.print_and_log(
@ -1797,7 +1780,6 @@ class BrowserThread(Thread):
self.print_and_log("History Length Error")
self.history["index"] = 0
self.scrollDown(param) # Scroll down according to the parameter configuration
# rt.end()
def get_content(self, p, element):
content = ""
@ -1824,7 +1806,7 @@ class BrowserThread(Thread):
downloadPic = 0
if downloadPic == 1:
download_image(self, content, "Data/Task_" +
str(self.id) + "/" + self.saveName + "/", element)
str(self.id) + "/" + self.saveName + "/images", element)
else: # Ordinary node
if p["splitLine"] == 1:
text = extract_text_from_html(element.get_attribute('outerHTML'))
@ -1853,7 +1835,7 @@ class BrowserThread(Thread):
downloadPic = 0
if downloadPic == 1:
download_image(self, content, "Data/Task_" +
str(self.id) + "/" + self.saveName + "/", element)
str(self.id) + "/" + self.saveName + "/images", element)
else:
command = 'var arr = [];\
var content = arguments[0];\
@ -1965,6 +1947,8 @@ class BrowserThread(Thread):
content = element.get_attribute(attribute_name)
except:
content = ""
elif p["contentType"] == 15: # Constant value
content = p["JS"]
if content == None:
content = ""
return content
@ -2208,7 +2192,9 @@ if __name__ == '__main__':
"server_address": "http://localhost:8074",
"keyboard": True, # 是否监听键盘输入
"pause_key": "p", # 暂停键
"version": "0.6.0",
"version": "0.6.3",
"docker_driver": "",
"user_folder": "",
}
c = Config(config)
print(c)
@ -2283,7 +2269,9 @@ if __name__ == '__main__':
options.add_argument(
"--disable-blink-features=AutomationControlled") # TMALL anti-scraping
# Prevent http -> https redirection
options.add_argument("--disable-features=CrossSiteDocumentBlockingIfIsolating,CrossSiteDocumentBlockingAlways,IsolateOrigins,site-per-process")
options.add_argument("--disable-web-security") # Disable the same-origin policy
options.add_argument('-ignore-certificate-errors')
options.add_argument('-ignore -ssl-errors')
@ -2302,35 +2290,43 @@ if __name__ == '__main__':
os.mkdir(tmp_user_folder_parent)
characters = string.ascii_letters + string.digits
for i in range(len(c.ids)):
id = c.ids[i]
# Build a string from characters randomly chosen from the character set
random_string = ''.join(random.choice(characters) for i in range(10))
tmp_user_data_folder = os.path.join(tmp_user_folder_parent, "user_data_" + str(id) + "_" + str(time.time()).replace(".","") + "_" + random_string)
tmp_options[i]["tmp_user_data_folder"] = tmp_user_data_folder
if os.path.exists(tmp_user_data_folder):
try:
shutil.rmtree(tmp_user_data_folder)
except:
pass
print(f"Copying user data folder to: {tmp_user_data_folder}, please wait...")
print(f"正在复制用户信息目录到: {tmp_user_data_folder},请稍等...")
if os.path.exists(absolute_user_data_folder):
try:
shutil.copytree(absolute_user_data_folder, tmp_user_data_folder)
print("User data folder copied successfully, if you exit the program before it finishes, please delete the temporary user data folder manually.")
print("用户信息目录复制成功,如果程序在运行过程中被手动退出,请手动删除临时用户信息目录。")
except:
tmp_user_data_folder = absolute_user_data_folder
print("Copy user data folder failed, use the original folder.")
print("复制用户信息目录失败,使用原始目录。")
else:
tmp_user_data_folder = absolute_user_data_folder
print("Cannot find user data folder, create a new folder.")
print("未找到用户信息目录,创建新目录。")
options = tmp_options[i]["options"]
options.add_argument(
f'--user-data-dir={tmp_user_data_folder}') # TMALL anti-scraping
options.add_argument("--profile-directory=Default")
if c.user_folder == "":
id = c.ids[i]
# Build a string from characters randomly chosen from the character set
random_string = ''.join(random.choice(characters) for i in range(10))
tmp_user_data_folder = os.path.join(tmp_user_folder_parent, "user_data_" + str(id) + "_" + str(time.time()).replace(".","") + "_" + random_string)
tmp_options[i]["tmp_user_data_folder"] = tmp_user_data_folder
if os.path.exists(tmp_user_data_folder):
try:
shutil.rmtree(tmp_user_data_folder)
except:
pass
print(f"Copying user data folder to: {tmp_user_data_folder}, please wait...")
print(f"正在复制用户信息目录到: {tmp_user_data_folder},请稍等...")
if os.path.exists(absolute_user_data_folder):
try:
shutil.copytree(absolute_user_data_folder, tmp_user_data_folder)
print("User data folder copied successfully, if you exit the program before it finishes, please delete the temporary user data folder manually.")
print("用户信息目录复制成功,如果程序在运行过程中被手动退出,请手动删除临时用户信息目录。")
except:
tmp_user_data_folder = absolute_user_data_folder
print("Copy user data folder failed, use the original folder.")
print("复制用户信息目录失败,使用原始目录。")
else:
tmp_user_data_folder = absolute_user_data_folder
print("Cannot find user data folder, create a new folder.")
print("未找到用户信息目录,创建新目录。")
options.add_argument(
f'--user-data-dir={tmp_user_data_folder}') # TMALL anti-scraping
print(f"Use local user data folder: {tmp_user_data_folder}")
print(f"使用本地用户信息目录: {tmp_user_data_folder}")
else:
options.add_argument(
f'--user-data-dir={c.user_folder}')
print(f"Use specified user data folder: {c.user_folder}", ", please note if you are using docker, this user folder path should be the path inside the docker container.")
print(f"使用指定的用户信息目录: {c.user_folder}", "请注意如果您正在使用docker此用户文件夹路径应是容器内的路径。")
print(
"如果报错Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally说明有之前运行的Chrome实例没有正常关闭请关闭之前打开的所有Chrome实例后再运行程序即可。")
print(
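The user-data-folder handling in the hunk above (build a unique temporary folder name, copy the profile into it, fall back to the original folder on failure) can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the project's implementation: the function name `prepare_user_data_dir` and its parameters are hypothetical, and only standard-library calls are used.

```python
import os
import random
import shutil
import string
import time


def prepare_user_data_dir(source_dir, parent_dir, task_id):
    """Copy source_dir into a uniquely named folder under parent_dir.

    Returns the folder the browser should use. Falls back to source_dir
    when it does not exist yet or when the copy fails.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(random.choice(alphabet) for _ in range(10))
    stamp = str(time.time()).replace(".", "")
    target = os.path.join(parent_dir, f"user_data_{task_id}_{stamp}_{suffix}")

    # A stale folder with the same name is unlikely, but remove it if present.
    shutil.rmtree(target, ignore_errors=True)

    if not os.path.exists(source_dir):
        # Nothing to copy: the browser creates a fresh profile in source_dir.
        return source_dir
    try:
        shutil.copytree(source_dir, target)
        return target
    except OSError:
        # shutil.Error is a subclass of OSError, so copy failures land here too.
        return source_dir
```

Returning the chosen path, rather than reassigning a shared variable in several branches, keeps the fallback rule in one place; the caller would pass the result to `--user-data-dir`.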
@ -2343,9 +2339,13 @@ if __name__ == '__main__':
print("id: ", id)
if c.read_type == "remote":
print("remote")
content = requests.get(
try:
content = requests.get(
c.server_address + "/queryExecutionInstance?id=" + str(id))
service = json.loads(content.text) # Load the service information
service = json.loads(content.text) # Load the service information
except:
print("Cannot connect to the server, please make sure that the EasySpider Main Program is running, or you can change the --read_type parameter to 'local' to read the task information from the local task file without keeping the EasySpider Main Program running.")
print("无法连接到服务器请确保EasySpider主程序正在运行或者您可以将--read_type参数更改为'local'以实现从本地任务文件中读取任务信息而无需保持EasySpider主程序运行。")
else:
print("local")
local_folder = os.path.join(os.getcwd(), "execution_instances")
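In the hunk above, a failed request prints a hint, but as far as the visible lines show, execution continues without `service` being set. A minimal sketch of one way to make the failure explicit is given below. It is not the project's code: `load_service`, `http_get_text`, and the `fetch` parameter are hypothetical names, the `<id>.json` file naming for local instances is an assumption, and `urllib` stands in for `requests` so the example needs only the standard library.

```python
import json
import os
import urllib.request


def http_get_text(url, timeout=10):
    """Fetch url and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def load_service(read_type, task_id, server_address="",
                 local_folder="execution_instances", fetch=http_get_text):
    """Return the task description as a dict, or None if it cannot be loaded."""
    try:
        if read_type == "remote":
            text = fetch(f"{server_address}/queryExecutionInstance?id={task_id}")
        else:
            # Assumption: local instances are stored as "<id>.json".
            path = os.path.join(local_folder, f"{task_id}.json")
            with open(path, encoding="utf-8") as f:
                text = f.read()
        return json.loads(text)
    except (OSError, ValueError) as exc:
        # OSError covers connection and file errors; ValueError covers bad JSON.
        print(f"Cannot load task {task_id} ({read_type}): {exc}")
        return None
```

A caller can then stop early when the result is `None`, for example `service = load_service("remote", 5, "http://localhost:8074")` followed by a check before the task starts (the address here is only a placeholder).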
@ -2370,8 +2370,8 @@ if __name__ == '__main__':
cloudflare = 0
if cloudflare == 0:
options.add_argument('log-level=3') # Hide logs
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(id))
print("Data path:", path)
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(id), "files")
print("文件下载路径|File Download path:", path)
options.add_experimental_option("prefs", {
# Set the file download path
"download.default_directory": path,
@ -2396,8 +2396,17 @@ if __name__ == '__main__':
except:
browser = "chrome"
if browser == "chrome":
selenium_service = Service(executable_path=driver_path)
browser_t = MyChrome(service=selenium_service, options=options)
if c.docker_driver == "":
print("Using local driver")
selenium_service = Service(executable_path=driver_path)
browser_t = MyChrome(service=selenium_service, options=options, mode='local_driver')
else:
print("Using remote driver")
# Use docker driver, default address is http://localhost:4444/wd/hub
# Headless mode
# options.add_argument("--headless")
# print("Headless mode")
browser_t = MyChrome(command_executor=c.docker_driver, options=options, mode='remote_driver')
elif browser == "edge":
from selenium.webdriver.edge.service import Service as EdgeService
from selenium.webdriver.edge.options import Options as EdgeOptions
@ -2458,6 +2467,7 @@ if __name__ == '__main__':
# print("Passing the Cloudflare verification mode is sometimes unstable. If the verification fails, you need to try again every few minutes, or you can change to a new user information folder and then execute the task.")
# Use a listener to monitor keyboard input
try:
from pynput.keyboard import Key, Listener
if c.keyboard:
with Listener(on_press=on_press_creator(press_time, event),
on_release=on_release_creator(event, press_time)) as listener:

@ -1 +1,50 @@
#!/bin/bash
# Use lsb_release to get the system information
os_name=$(lsb_release -si)
os_version=$(lsb_release -sr)
# Extract the major and minor version numbers
major_version=$(echo "$os_version" | cut -d'.' -f1)
minor_version=$(echo "$os_version" | cut -d'.' -f2)
# Check whether the system is Ubuntu 24.04 or later
if [ "$os_name" == "Ubuntu" ] && { [ "$major_version" -gt 24 ] || { [ "$major_version" -eq 24 ] && [ "$minor_version" -ge 4 ]; }; }; then
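# Path of the file to check
file_path="./EasySpider/chrome-sandbox"
# Check whether the file exists
if [ ! -e "$file_path" ]; then
echo "File does not exist!"
exit 1
fi
# Get the owner of the file
owner=$(stat -c %U "$file_path")
# Get the permissions of the file
permissions=$(stat -c %a "$file_path")
# Check whether the owner is root and the permissions are 4755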
if [ "$owner" != "root" ] || [ "$permissions" != "4755" ]; then
echo "这是你第一次在该Ubuntu系统上使用EasySpider请在下方输入密码来调整文件权限以使用EasySpider"
echo "This is the first time you are using EasySpider on this Ubuntu system. Please enter your password below (root/sudo permission required) to adjust the file permissions:"
sudo chown root:root "$file_path"
sudo chmod 4755 "$file_path"
sudo chown root:root "./EasySpider/resources/app/chrome_linux64/chrome-sandbox"
sudo chmod 4755 "./EasySpider/resources/app/chrome_linux64/chrome-sandbox"
fi
else
echo "如果报错“The SUID sandbox helper binary was found, but is not configured correctly”请尝试执行以下命令后再次运行EasySpider"
echo "If you encounter the error message “The SUID sandbox helper binary was found, but is not configured correctly”, please try running the following commands and then run EasySpider again:"
echo ""
echo "sudo chown root:root ./EasySpider/chrome-sandbox"
echo "sudo chmod 4755 ./EasySpider/chrome-sandbox"
echo "sudo chown root:root ./EasySpider/resources/app/chrome_linux64/chrome-sandbox"
echo "sudo chmod 4755 ./EasySpider/resources/app/chrome_linux64/chrome-sandbox"
echo ""
echo ""
fi
./EasySpider/EasySpider
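The two checks in the script above (is this Ubuntu 24.04 or newer, and is `chrome-sandbox` owned by root with mode 4755) can be sketched in Python as follows. This is a sketch, not part of the repository: both function names are hypothetical, and the inputs are assumed to be the strings that `lsb_release -si` and `lsb_release -sr` print.

```python
import os
import stat


def is_ubuntu_at_least(os_name, os_version, minimum=(24, 4)):
    """True when os_name is "Ubuntu" and os_version (e.g. "24.04") >= minimum."""
    if os_name != "Ubuntu":
        return False
    parts = os_version.split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        # Non-numeric versions (e.g. "rolling") never match.
        return False
    # Tuple comparison checks the major number first, then the minor number.
    return (major, minor) >= minimum


def sandbox_needs_fix(path):
    """True when path is not owned by root (uid 0) with mode 4755."""
    info = os.stat(path)
    return info.st_uid != 0 or stat.S_IMODE(info.st_mode) != 0o4755
```

Comparing `(major, minor)` tuples keeps the distribution test and the version test separate, so a non-Ubuntu system that happens to report major version 24 is not matched.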

@ -23,7 +23,7 @@ For more complex operations, please download the source code and compile it for
"""
# 请在下面编写你的代码,不要有代码缩进!!! | Please write your code below, do not indent the code!!!
print(globals())
# 导包 | Import packages
from selenium.common.exceptions import ElementClickInterceptedException
@ -56,3 +56,20 @@ finally:
print("All parameters:", self.outputParameters)
print(test(3))
print("执行完毕|Execution completed")
import time
time.sleep(3)
def new_line(outputParameters, maxViewLength, record):
line = []
print("Use this function to print a new line in the console")
i = 0
for value in outputParameters.values():
line.append(value)
if record[i]:
print(value[:maxViewLength], " ", end="")
i += 1
print("")
return line
new_line(self.outputParameters, 10, [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
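The `new_line` helper above indexes `record[i]` and slices each value, so it relies on `record` having at least as many entries as there are output parameters and on every value being a string. A slightly more defensive variant might look like the sketch below. The snake_case parameter names and the choice to show a value when its flag is missing are assumptions, not behavior taken from the source.

```python
def new_line(output_parameters, max_view_length, record):
    """Collect all values into a row and print the ones flagged in record."""
    line = []
    shown = []
    for i, value in enumerate(output_parameters.values()):
        line.append(value)
        # Assumption: a value without a flag in record is shown by default.
        show = record[i] if i < len(record) else True
        if show:
            # str() lets non-string values be truncated like strings.
            shown.append(str(value)[:max_view_length])
    print(" ".join(shown))
    return line
```

For example, `new_line({"title": "hello world", "count": 42}, 5, [True])` returns both values and prints their truncated forms on one line.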

@ -4,82 +4,14 @@ Then EasySpider will be opened, and don't close the terminal when running EasySp
Official Site: https://www.easyspider.net
Welcome to promote this software to other friends.
You are welcome to share this software with friends and to star our GitHub repository!
This version is for Ubuntu 20.04, Debian, Deepin x64 and above.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
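The import rule described above (copy `.json` files between folders, keeping only files named with a number greater than 0) can be sketched as a small script. This is an illustration only: the function name `import_json_files` and the folder paths in the usage note are hypothetical, and the software itself does not require such a script.

```python
import os
import shutil


def import_json_files(src_folder, dst_folder):
    """Copy files named "<positive integer>.json" and return the copied names."""
    os.makedirs(dst_folder, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_folder)):
        stem, ext = os.path.splitext(name)
        # Keep only names such as "12.json"; skip "0.json", "-2.json", "notes.json".
        if ext == ".json" and stem.isascii() and stem.isdigit() and int(stem) > 0:
            shutil.copy2(os.path.join(src_folder, name),
                         os.path.join(dst_folder, name))
            copied.append(name)
    return copied
```

Usage might look like `import_json_files("/path/to/other_machine/tasks", "./tasks")`, and the same call works for the "execution_instances" folder.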
======Version Update Instruction======
For new features in versions later than v0.3.2, please see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. Selected child element operations can delete fields and unmark deleted fields in real-time in the browser.
2. Selecting child elements adds a selection mode that allows you to choose only the child elements that are present in all blocks or the child elements that are the same as the first selected block.
3. In the text input and webpage open options, you can use the extracted field value as a variable for text input, represented by Field["field_name"].
4. Files can be downloaded, such as PDF files.
5. Fixed a bug where the software could display a blank screen for about 10 seconds after opening, making it usable in intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
-----v0.3.1-----
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in the conditions and loop conditions. The return value of the custom script determines the condition for the judgment of conditions and loops, greatly enhancing the flexibility of tasks. The ability to use the break statement within a loop is added, allowing custom operations to manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added the information of the publisher as requested.
23. Updated Chrome version to 113.

File diff suppressed because one or more lines are too long

@ -1 +1 @@
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"12/7/2023, 2:56:47 AM","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}}]}
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"2024-01-05 22:08:46","version":"0.6.0","saveThreshold":10,"quitWaitTime":3,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"},{"id":1,"name":"loopTimes_1","nodeId":5,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":10,"value":10}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,5],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":3,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":4,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":3,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button 
download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":2,"index":5,"parentId":0,"type":1,"option":8,"title":"循环 - 单个元素","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"//body","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":10,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

@ -1 +1 @@
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"07/12/2023, 03:43:34","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"desc":"https://www.zhihu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","wai
tElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}}]}
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"2023-12-27 20:05:50","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"知了个乎","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"},{"id":1,"name":"loopTimes_1","nodeId":4,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":0,"value":0}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,4,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/d
iv[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":4,"index":3,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}},{"id":2,"index":4,"parentId":0,"type":1,"option":8,"title":"循环 - 
单个元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

@ -1 +1 @@
{"id":70,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}
{"id":-2,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}

File diff suppressed because one or more lines are too long

@ -1,4 +1,4 @@
欢迎将软件宣传给更多需要的朋友!
欢迎将软件宣传给更多需要的朋友和Star我们的Github仓库
在此文件夹下打开Linux Terminal, 并输入以下命令运行软件:
./easy-spider.sh
@ -8,99 +8,10 @@
支持Ubuntu 20.04, Debian, Deepin x64及以上版本。
软件开源代码Github库地址https://github.com/NaiboWang/EasySpider
官方文档地址https://github.com/NaiboWang/EasySpider/wiki
视频教程https://www.bilibili.com/video/BV1th411A7ey/
可以从其他机器导入任务只需要把其他机器的tasks文件夹里的.json文件放入此目录的tasks文件夹里即可。同理执行号文件可以通过复制execution_instances文件夹中的.json文件来导入。注意两个文件夹里的.json文件只支持命名为大于0的数字。
======版本更新说明======
v0.3.2以上版本更新说明请查看Github Release Pages页面https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## 更新说明
1. 选中子元素操作可删除字段并在浏览器中实时取消标记被删除的字段。
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
2. 选中子元素增加选择模式,可以只选择所有块都有的子元素,或者所有块中和第一个选中的块相同的子元素。
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
3. 输入文字和打开网页选项中可以使用最后一次提取到的字段值**作为变量**进行文字输入,用`Field["字段名"]`表示此变量。
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
4. 可下载文件如PDF。
5. 修复打开后有可能会白屏10秒左右的Bug使得在内网暗网以及任意局域网都可以使用软件。
6. 修复提取当前页面URL和标题时可能提取不到的bug。
7. 修复OCR识别可能提取不到的bug。
8. 提取逻辑更新为每采集10条本地保存一次。
9. 修改任务时默认锚点位置为任务流程的最后操作后。
10. 更新Chrome版本为114。
-----V0.3.1-----
如果下载速度慢,可以考虑中国境内下载地址:[中国境内下载地址](https://github.com/NaiboWang/EasySpider/releases/download/v0.3.0/Download_Link_Address_in_China_Mainland.txt)。
### 强烈建议大家观看新特性讲解视频
B站最新版特性视频已上传新视频非常有用推荐大家观看。
[【重要】自定义条件判断之使用循环项内的JS命令返回值 - 第二弹](https://www.bilibili.com/video/BV1mu411x7Nn/)
[如何同时执行多个任务(并行多开)](https://www.bilibili.com/video/BV13c411G7LE/)
[如何执行自己写的JS代码和系统代码 (自定义操作)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[如何自定义循环和判断条件 - 第一弹](https://www.bilibili.com/video/BV1Ys4y1z777/)
[如何对元素和网页截图及(无头模式)命令行执行指南](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR识别元素内容功能](https://www.bilibili.com/video/BV1xz4y1b72D/)
注意v0.3.1版本任务task文件夹内`.json`文件和之前所有版本均不兼容请重新设计v0.3.1版本任务。
## 更新说明
1. 自定义操作:
- 可以在任务流程中**执行自定义脚本**,包括在浏览器中**执行Javascript指令**以及**操作系统级别的脚本调用**并可**得到命令返回值并记录**,大大扩展了可操作空间。
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- 在每一个操作执行前和执行后都可以指定执行一段针对当前定位元素的JavaScript指令。
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
2. **判断条件和循环条件**中同样增加了**执行自定义脚本**并根据自定义脚本的返回值是否为真来作为条件判断和循环的判断条件同样极大的增加了任务的可操作性。循环中增加了用代码break的操作设定自定义操作可以操作循环内元素。
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
3. 可同时生成多种XPath供用户选择并**预装了XPath Helper扩展**供大家调试XPath。
4. 增加采集元素背景图片地址当前页面标题当前页面URL地址功能。
5. 增加保存元素截图功能,如要截图某元素或整个网页页面,可以用此功能(配合无头模式效果更好)。
6. 增加下载图片功能。
7. 增加OCR识别元素功能使用此功能需首先自行安装Tesseract库[https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. 可直接提取对元素执行JavaScript代码后的返回值实现如正则表达式获得元素背景颜色等功能。
9. 增加切换下拉选项功能,采集下拉选项正在选中的值和文本。
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
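10. Greatly expanded the usage tips and explanations to make the software easier to use, such as how to handle iframe tags, what each option's parameters mean, and how to modify a loop item's XPath.
11. When executing a task, a hint on how to run it from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).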
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
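12. Added parallel multi-instance mode.
13. Added headless mode, i.e. a configuration that runs without a browser window.
14. Fixed Chinese paths not being recognized correctly when using the browser with user configuration.
15. Fixed a hang that occurred when a conditional branch had no unconditional branch.
16. Fixed input boxes freezing after saving a task.
17. The open-page and click-element operations gain a setting for the maximum page-load wait time.
18. Added a move-mouse-to-element operation.
19. A prompt is shown when an element cannot be found.
20. Fixed a page scrolling bug.
21. Added an operation for adding new extracted-data fields.
22. The task name is initialized to the title of the first page entered.
23. Added version update notifications.
24. Added producer information, as requested.
25. Updated Chrome to version 113.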

View File

@ -1,8 +1,29 @@
Due to the complex security settings of MacOS, the issue of being unable to open software due to the "unverified developer" message may occur upon the first attempt to open the software. Please refer to the following GitHub document to see how to open software and perform tasks on your MacOS version:
Due to MacOS's complex security settings, software downloaded for the first time will warn that the developer is unverified and will not allow the application to run. Please follow these steps to unlock:
https://github.com/NaiboWang/EasySpider/wiki/MacOS-Guide
1. Open the system Terminal.
The main steps are as follows:
2. Navigate to the EasySpider software directory, such as:
cd ~/Downloads/EasySpider_MacOS
3. In the EasySpider directory, run the `first_time_run.sh` script to modify the package properties by using the following command:
bash first_time_run.sh
This will unlock EasySpider for both design and execution stages.
If you encounter errors such as the one below during the command execution, they can be ignored, and you may proceed to open the software after the command completes:
xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
For another solution, refer to this video on how to open software and execute tasks in MacOS version: https://www.bilibili.com/video/BV1E34y137fT/
- Design phase - Apple Arm chip version of MacOS

View File

@ -4,6 +4,6 @@ There is a potential issue with the software for MacOS, in that the Chrome softw
To check the Chrome version, enter the EasySpider software and right-click to "Show Package Contents". Then go to Contents/Resources/app folder and double-click on the chrome_mac64 software to open Chrome. Then go to Settings -> About to check if the Chrome version matches the version of chromedriver_mac64 when you open it manually.
If it is not, you can download the corresponding macOS version of Chromedriver for your current Chrome version from the following website: https://chromedriver.chromium.org/downloads, and then place the downloaded Chromedriver in the Contents/Resources/app folder mentioned above, rename it and replace the "chromedriver_mac64" file to restore normal use of the software.
If it is not, you can download the macOS version of Chromedriver that corresponds to your current Chrome version (only the major version number, the part before the first decimal point, such as 122, needs to match) from the following website: https://chromedriver.chromium.org/downloads, then place the downloaded Chromedriver in the Contents/Resources/app folder mentioned above, rename it, and replace the "chromedriver_mac64" file to restore normal use of the software.
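The major-version comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper names `major_version` and `versions_compatible` are hypothetical and not part of EasySpider, and the version strings are made-up examples.

```python
def major_version(version: str) -> str:
    """Return the part of a dotted version string before the first dot."""
    return version.strip().split(".", 1)[0]


def versions_compatible(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and Chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Example version strings, invented for illustration only.
    print(versions_compatible("122.0.6261.94", "122.0.6261.69"))
    print(versions_compatible("122.0.6261.94", "114.0.5735.90"))
```

If the check fails, the replacement step above applies: download the Chromedriver whose major version matches Chrome and swap it in.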

File diff suppressed because one or more lines are too long

View File

@ -1 +1 @@
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"12/7/2023, 2:56:47 AM","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}}]}
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"2024-01-05 22:08:46","version":"0.6.0","saveThreshold":10,"quitWaitTime":3,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"},{"id":1,"name":"loopTimes_1","nodeId":5,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":10,"value":10}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,5],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":3,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":4,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":3,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button 
download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":2,"index":5,"parentId":0,"type":1,"option":8,"title":"循环 - 单个元素","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"//body","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":10,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}
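Judging only from the task files shown in this diff, each entry in `graph` carries an `index`, and a node's `sequence` appears to list the `index` values of its children in execution order, starting from the root at index 0. The following Python sketch walks a task under that assumption; the function name `execution_order` is hypothetical, and the sample task is a trimmed stand-in with English titles, not one of the real files.

```python
import json


def execution_order(task: dict) -> list[str]:
    """List node titles in the order implied by each node's `sequence`.

    Assumes `sequence` holds `index` values of child nodes, which is
    inferred from the sample task files and not from documentation.
    """
    by_index = {node["index"]: node for node in task["graph"]}
    titles: list[str] = []

    def visit(index: int, depth: int) -> None:
        node = by_index[index]
        titles.append("  " * depth + node["title"])
        for child in node["sequence"]:
            visit(child, depth + 1)

    visit(0, 0)  # the root node sits at index 0 in the samples
    return titles


# Trimmed, hypothetical task: a root, an open-page step, and a loop
# that wraps a click step.
SAMPLE = json.loads("""
{"graph": [
  {"index": 0, "id": 0, "title": "root", "sequence": [1, 3]},
  {"index": 1, "id": 1, "title": "open page", "sequence": []},
  {"index": 2, "id": 3, "title": "click", "sequence": []},
  {"index": 3, "id": 2, "title": "loop", "sequence": [2]}
]}
""")

if __name__ == "__main__":
    for line in execution_order(SAMPLE):
        print(line)
```

Nodes that no `sequence` refers to, such as the entries with `"id": -1` in the file above, are never visited by this walk.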

View File

@ -1 +1 @@
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"07/12/2023, 03:43:34","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"desc":"https://www.zhihu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","wai
tElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}}]}
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"2023-12-27 20:05:50","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"知了个乎","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"},{"id":1,"name":"loopTimes_1","nodeId":4,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":0,"value":0}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,4,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/d
iv[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":4,"index":3,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}},{"id":2,"index":4,"parentId":0,"type":1,"option":8,"title":"循环 - 
单个元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1 @@
{"id":309,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-24 00:34:50","update_time":"2023-12-24 00:36:58","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":1,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"},{"id":1,"name":"inputText_1","nodeName":"输入文字","nodeId":2,"desc":"要输入的文本,如京东搜索框输入:电脑","type":"text","exampleValue":"JS(\"return new Date().getYear()\")1","value":"JS(\"return new Date().getYear()\")1"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"手机"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://shouji.jd.com/"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":4,"title":"输入文字","sequence":[],"isInLoop":false,"position":1,"
parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[@id=\"key\"]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"value":"JS(\"return new Date().getYear()\")1","index":0,"allXPaths":["/html/body/div[4]/div[1]/div[2]/div[1]/input[1]","//input[contains(., '')]","id(\"key\")","//INPUT[@class='text']","/html/body/div[last()-6]/div/div[last()-2]/div/input"]}},{"id":3,"index":3,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[4],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]","//a[contains(., 
'手机')]","/html/body/div[last()-5]/div/div[last()-4]/div/div[last()-2]/div/div/div/div[last()-1]/div[last()-12]/a[last()-1]"]}},{"id":4,"index":4,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":8,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"手机"}],"unique_index":"ughtq41gxwnlqia7awp","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://shouji.jd.com/"}],"unique_index":"ughtq41gxwnlqia7awp","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1 @@
{"id":311,"name":"重命名测试","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-28 14:05:20","update_time":"2023-12-28 14:05:43","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"手机"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://shouji.jd.com/"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afte
rJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]","//a[contains(., '手机')]","/html/body/div[last()-5]/div/div[last()-4]/div/div[last()-2]/div/div/div/div[last()-1]/div[last()-12]/a[last()-1]"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":0,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"手机"}],"unique_index":"zvn77ulso2lqoswqo4","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://shouji.jd.com/"}],"unique_index":"zvn77ulso2lqoswqo4","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1 @@
{"id":315,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-29 22:34:23","update_time":"2023-12-29 22:38:36","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"Text","desc":"自定义操作返回的数据","type":"text","recordASField":1,"exampleValue":""},{"id":1,"name":"Link","desc":"自定义操作返回的数据","type":"text","recordASField":1,"exampleValue":""}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环点击每个元素","sequence":[4,5,3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWa
itTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":""}},{"id":5,"index":3,"parentId":2,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":true,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"newTab":1,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":""}},{"id":3,"index":4,"parentId":2,"type":0,"option":5,"title":"Text","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":0,"codeMode":2,"code":"return arguments[0].innerText","waitTime":0,"recordASField":1,"paraType":"text","emailConfig":{"host":"","port":465,"username":"","password":"","from":"","to":"","subject":"","content":""}}},{"id":4,"index":5,"parentId":2,"type":0,"option":5,"title":"Link","sequence":[],"isInLoop":true,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"codeMode":2,"code":"return arguments[0].href","waitTime":0,"recordASField":1,"paraType":"text","emailConfig":{"host":"","port":465,"username":"","password":"","from":"","to":"","subject":"","content":""}}}]}

View File

@ -0,0 +1 @@
{"id":316,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2023-12-30 22:35:04","update_time":"2023-12-30 22:35:12","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"自定义操作","desc":"自定义操作返回的数据","type":"text","recordASField":0,"exampleValue":""}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":5,"title":"自定义操作","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"codeMode":12,"code":"","waitTime":0,"recordASField
":0,"paraType":"text","emailConfig":{"host":"","port":465,"username":"","password":"","from":"","to":"","subject":"","content":""}}}]}

@ -0,0 +1 @@
{"id":317,"name":"图片下载","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2024-01-05 22:14:43","update_time":"2024-01-05 22:15:19","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数2_图片地址","desc":"","type":"text","recordASField":1,"exampleValue":"//m.360buyimg.com/babel/jfs/t1/232616/15/5744/219106/656d810aF16705ea9/41c4997dc1b81f17.png"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,3],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":-1,"index":2,"parentId":0,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"
clear":0,"newLine":1,"params":[{"nodeType":4,"contentType":0,"relative":false,"name":"参数1_图片地址","desc":"","extractType":0,"relativeXPath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/div[1]/a[1]/img[1]","allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/div[1]/a[1]/img[1]","//img[contains(., '')]","/html/body/div[last()-6]/div/div[last()-4]/div/div[last()-1]/div/div[last()-1]/div/div[last()-1]/div/div[last()-3]/div/div/a/img"],"exampleValues":[{"num":0,"value":"//m.360buyimg.com/babel/s1420x740_jfs/t1/194401/20/32669/76553/64142a96F7733e6ad/cf2727848c86cf45.jpg!q70.dpg"}],"unique_index":"i9in42ta6klr0pwp4k","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}},{"id":2,"index":3,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[4],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div/div[1]/div[1]/a[1]/img[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[5]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]","//img[contains(., 
'')]","/html/body/div[last()-5]/div/div[last()-4]/div/div[last()-1]/div/div[last()-1]/div/div[last()-1]/div/div[last()-4]/div/div/a/img"]}},{"id":3,"index":4,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":4,"contentType":0,"relative":true,"name":"参数2_图片地址","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"//m.360buyimg.com/babel/jfs/t1/232616/15/5744/219106/656d810aF16705ea9/41c4997dc1b81f17.png"}],"unique_index":"i81avec75qflr0pwym8","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":1,"splitLine":0}]}}]}

@ -0,0 +1 @@
{"id":318,"name":"京东(JD.COM)-正品低价、品质保障、配送及时、轻松购物!","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2024-04-22 05:08:03","update_time":"2024-04-22 05:19:48","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"电脑数码"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://prodev.jd.com/mall/active/31XPWPTonxJ9e5YoQ85HS7z8XNYQ/index.html?babelChannel=ttt40"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[4]/div[1]/div[4]/ul[1]/li/a[1]
","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[1]/div[4]/div[1]/div[4]/ul[1]/li[1]/a[1]","//a[contains(., '电脑数码')]","//A[@class='navitems-lk']","/html/body/div[last()-5]/div[last()-2]/div/div[last()-1]/ul/li[last()-8]/a"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":15,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"电脑数码"}],"unique_index":"auwkv5g1krqlva0tsc4","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"123","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://prodev.jd.com/mall/active/31XPWPTonxJ9e5YoQ85HS7z8XNYQ/index.html?babelChannel=ttt40"}],"unique_index":"auwkv5g1krqlva0tsc4","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}

@ -0,0 +1 @@
{"id":-2,"name":"百度一下,你就知道","url":"https://www.baidu.com?id=1","links":"https://www.baidu.com?id=11\nhttps://www.baidu.com?id=12","create_time":"2024-04-22 05:45:12","update_time":"2024-04-22 05:45:20","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.baidu.com?id=1","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.baidu.com?id=11\nhttps://www.baidu.com?id=12","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.baidu.com?id=11\nhttps://www.baidu.com?id=12"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.baidu.com?id=1","links":"https://www.baidu.com?id=11\nhttps://www.baidu.com?id=12","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}}]}

@ -0,0 +1 @@
{"id":320,"name":"百度一下,你就知道","url":"https://www.baidu.com","links":"https://www.baidu.com","create_time":"2024-04-22 05:53:18","update_time":"2024-04-22 05:53:28","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.baidu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.baidu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.baidu.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.baidu.com","links":"https://www.baidu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环点击每个元素","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[2]/div[1]/div[5]/div[1]/div[1]/div[3]/ul[1]/li/a[1]/span[2]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":
"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"newTab":1,"maxWaitTime":10,"params":[],"alertHandleType":0,"downloadWaitTime":3600,"allXPaths":""}}]}

@ -0,0 +1 @@
{"id":321,"name":"百度一下,你就知道","url":"https://www.baidu.com","links":"https://www.baidu.com","create_time":"2024-04-22 07:02:02","update_time":"2024-04-22 07:02:16","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.baidu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.baidu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.baidu.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.baidu.com","links":"https://www.baidu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环点击每个元素","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[2]/div[1]/div[5]/div[1]/div[1]/div[3]/ul[1]/li/a[1]/span[2]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":
"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"newTab":1,"maxWaitTime":10,"params":[],"alertHandleType":0,"downloadWaitTime":3600,"allXPaths":""}}]}

@ -0,0 +1 @@
{"id":322,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"2024-04-22 08:13:15","update_time":"2024-04-22 08:13:33","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环点击每个元素","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div/a","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code
":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"newTab":1,"maxWaitTime":10,"params":[],"alertHandleType":0,"downloadWaitTime":3600,"allXPaths":""}}]}

@ -0,0 +1 @@
{"id":323,"name":"新web采集任务","url":"https://www.baidu.com","links":"https://www.baidu.com","create_time":"","update_time":"2024-08-10 17:29:04","version":"0.6.2","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.baidu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.baidu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.baidu.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.baidu.com","links":"https://www.baidu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}}]}

File diff suppressed because one or more lines are too long

@ -0,0 +1 @@
{"id":325,"name":"百度一下,你就知道","url":"https://www.baidu.com","links":"https://www.baidu.com","create_time":"2024-12-30 22:37:29","update_time":"2024-12-30 22:37:43","version":"0.6.3","saveThreshold":10,"quitWaitTime":60,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"csv","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://www.baidu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.baidu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.baidu.com"}],"outputParameters":[{"id":0,"name":"参数1_链接文本","desc":"","type":"text","recordASField":1,"exampleValue":"0暖心2024 总书记的贴心话"},{"id":1,"name":"参数2_链接地址","desc":"","type":"text","recordASField":1,"exampleValue":"https://www.baidu.com/s?wd=%E6%9A%96%E5%BF%832024+%E6%80%BB%E4%B9%A6%E8%AE%B0%E7%9A%84%E8%B4%B4%E5%BF%83%E8%AF%9D&sa=fyb_n_homepage&rsv_dl=fyb_n_homepage&from=super&cl=3&tn=baidutop10&fr=top1000&rsv_idx=2&hisfilter=1"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.baidu.com","links":"https://www.baidu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":f
alse,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/div[5]/div[1]/div[1]/div[3]/ul[1]/li/a[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0,"allXPaths":["/html/body/div[1]/div[1]/div[5]/div[1]/div[1]/div[3]/ul[1]/li[1]/a[1]","//a[contains(., '0暖心2024 总')]","//a[@class='title-content c-link c-font-medium c-line-clamp1']","/html/body/div[last()-4]/div[last()-3]/div[last()-3]/div/div/div/ul/li[last()-9]/a"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":1,"contentType":8,"relative":true,"name":"参数1_链接文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"0暖心2024 
总书记的贴心话"}],"unique_index":"8rtq2is658sm5b58osr","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0},{"nodeType":2,"contentType":0,"relative":true,"name":"参数2_链接地址","desc":"","relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"https://www.baidu.com/s?wd=%E6%9A%96%E5%BF%832024+%E6%80%BB%E4%B9%A6%E8%AE%B0%E7%9A%84%E8%B4%B4%E5%BF%83%E8%AF%9D&sa=fyb_n_homepage&rsv_dl=fyb_n_homepage&from=super&cl=3&tn=baidutop10&fr=top1000&rsv_idx=2&hisfilter=1"}],"unique_index":"8rtq2is658sm5b58osr","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0,"splitLine":0}]}}]}

@ -1 +1 @@
{"id":70,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}
{"id":-2,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}

File diff suppressed because one or more lines are too long

@ -0,0 +1,5 @@
#!/bin/bash
xattr -cr EasySpider.app
xattr -cr easyspider_executestage
xattr -cr easyspider_executestage_full

@ -23,7 +23,7 @@ For more complex operations, please download the source code and compile it for
"""
# 请在下面编写你的代码,不要有代码缩进!!! | Please write your code below, do not indent the code!!!
print(globals())
# 导包 | Import packages
from selenium.common.exceptions import ElementClickInterceptedException
@ -56,3 +56,20 @@ finally:
print("All parameters:", self.outputParameters)
print(test(3))
print("执行完毕|Execution completed")
import time
time.sleep(3)
def new_line(outputParameters, maxViewLength, record):
    line = []
    print("Use this function to print a new line in the console")
    i = 0
    for value in outputParameters.values():
        line.append(value)
        if record[i]:
            print(value[:maxViewLength], " ", end="")
        i += 1
    print("")
    return line
new_line(self.outputParameters, 10, [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])

@ -1,87 +1,22 @@
Official Site: https://www.easyspider.net
Feel free to recommend this software to your friends.
Feel free to recommend this software to your friends and star our GitHub repository!
This version is for macOS and runs on all chips, including Intel (such as Core i7) and Arm (such as M1). It supports macOS 11.x and above.
If your macOS version is 10.x or below, please download EasySpider V0.2.0.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
You can import tasks from other machines by simply opening the EasySpider software in this directory, right-clicking "Show Package Contents", and then placing the .json files from the tasks folder in the /Users/your user name/Library/Application Support/EasySpider/tasks folder of the other machine. Similarly, execution ID files can be imported by copying the .json files from the execution_instances folder. Please note that in both folders only .json files with numeric names greater than 0 are supported.
You can import tasks from other machines by simply opening the EasySpider software in this directory, right-clicking "Show Package Contents", and then placing the .json files from the tasks folder in the /Users/Your User Name/Library/Application Support/EasySpider/tasks folder of the other machine. Similarly, execution ID files can be imported by copying the .json files from the execution_instances folder. Please note that in both folders only .json files with numeric names greater than 0 are supported.
You can quickly navigate to the tasks folder using the following commands:
cd /Users/$(whoami)/Library/Application\ Support/EasySpider/tasks
open .
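The import step described above can also be scripted. The following is a minimal sketch, not part of EasySpider itself: the function names `is_importable` and `import_tasks` are hypothetical, the destination path is the one quoted in this README, and "names greater than 0" is assumed to mean that the file name (without .json) is a positive integer.

```python
import shutil
from pathlib import Path

# Destination folder as quoted in the README; adjust it if your setup differs.
TASKS_DIR = Path.home() / "Library" / "Application Support" / "EasySpider" / "tasks"


def is_importable(path: Path) -> bool:
    """Assumption: an importable file is '<positive integer>.json'."""
    stem = path.stem
    return (
        path.suffix == ".json"
        and stem.isascii()
        and stem.isdigit()
        and int(stem) > 0
    )


def import_tasks(source_dir: Path, dest_dir: Path = TASKS_DIR) -> list[str]:
    """Copy importable task files into dest_dir and return the copied file names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.glob("*.json")):
        if is_importable(path):
            # copy2 overwrites an existing file with the same name.
            shutil.copy2(path, dest_dir / path.name)
            copied.append(path.name)
    return copied
```

For example, `import_tasks(Path("/path/to/exported/tasks"))` would copy files such as 316.json while skipping 0.json or -2.json. Because existing files with the same name are overwritten, back up the tasks folder first if that matters.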
If you need to press p on the keyboard to pause and resume the execution of a task, you need to grant the program keyboard monitoring permission.
======Version Update Instruction======
For new features in versions later than v0.3.2, please see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. Selected child element operations can delete fields and unmark deleted fields in real-time in the browser.
2. Selecting child elements adds a selection mode that allows you to choose only the child elements that are present in all blocks or the child elements that are the same as the first selected block.
3. In the text input and webpage open options, you can use the extracted field value as a variable for text input, represented by Field["field_name"].
4. Files can be downloaded, such as PDF files.
5. Fixed a bug where the software could display a blank screen for about 10 seconds after opening, making it usable in intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
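As an illustration of item 3, the sketch below shows one way a `Field["field_name"]` placeholder could be replaced with the most recently extracted value before text is typed or a URL is opened. It is an assumption about the behavior, not EasySpider's actual implementation; `FIELD_PATTERN`, `substitute_fields`, and the example URL are hypothetical.

```python
import re

# Matches placeholders of the form Field["some name"].
FIELD_PATTERN = re.compile(r'Field\["([^"]+)"\]')


def substitute_fields(text: str, fields: dict) -> str:
    """Replace each Field["name"] with fields[name]; unknown names are left untouched."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return str(fields[name]) if name in fields else match.group(0)

    return FIELD_PATTERN.sub(replace, text)


# Hypothetical usage: build a URL from a previously extracted field.
url = substitute_fields('https://example.com/search?q=Field["keyword"]', {"keyword": "spider"})
```

Leaving unknown placeholders untouched makes a misspelled field name visible in the output instead of silently producing an empty string.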
-----V0.3.1-----
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in the conditions and loop conditions. The return value of the custom script determines the condition for the judgment of conditions and loops, greatly enhancing the flexibility of tasks. The ability to use the break statement within a loop is added, allowing custom operations to manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added the information of the publisher as requested.
23. Updated Chrome version to 113.

View File

@ -3,7 +3,7 @@
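The macOS version of the software has one possible issue: the Chrome browser that the software launches often updates itself automatically after being opened, but the Chromedriver that the software depends on does not update along with Chrome, so the software may fail to open Chrome.
To check the Chrome version, go inside the EasySpider package (right-click the software and choose "Show Package Contents"), enter the Contents/Resources/app folder, double-click chrome_mac64 to open Chrome, then open Settings -> About Chrome and check whether the Chrome version matches the version shown when you manually open chromedriver_mac64.
If it does not, download the macOS Chromedriver that matches your current Chrome version from https://googlechromelabs.github.io/chrome-for-testing, place the chromedriver file in the Contents/Resources/app folder mentioned above, and rename it so that it replaces the "chromedriver_mac64" file; the software will then work normally again.
If it does not, download the macOS Chromedriver that matches your current Chrome version (only the major version number before the first dot matters, e.g. 122) from https://googlechromelabs.github.io/chrome-for-testing, place the chromedriver file in the Contents/Resources/app folder mentioned above, and rename it so that it replaces the "chromedriver_mac64" file; the software will then work normally again.
If you find other problems during use, please open an issue on the GitHub Issues page.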
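The version check described above compares only the major version, that is, the number before the first dot. A minimal Python sketch of that comparison follows; the function names and the sample version strings are illustrative assumptions, and the strings themselves still have to be read manually from Chrome and Chromedriver as described.

```python
def major_version(version: str) -> int:
    """Return the number before the first dot, e.g. 122 for '122.0.6261.94'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, chromedriver_version: str) -> bool:
    """True when Chrome and Chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(chromedriver_version)


# Hypothetical version strings copied by hand from the two programs.
needs_new_driver = not versions_match("122.0.6261.94", "114.0.5735.90")
```

If `needs_new_driver` is true, the Chromedriver should be replaced as described above.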

View File

@ -1,6 +1,26 @@
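Because of the complex security settings on macOS, the first time you open the software it is reported as coming from an unverified developer and is not allowed to open. See the following video for how to open the software and run tasks on macOS: https://www.bilibili.com/video/BV1E34y137fT/
Because of the complex security settings on macOS, the first time you open the software it is reported as coming from an unverified developer and is not allowed to open. Unlock it as follows:
The main steps are:
1. Open the system Terminal window.
2. Change to the EasySpider software directory:
cd ~/Downloads/EasySpider_MacOS
3. In the EasySpider directory, run the `first_time_run.sh` script found there with the following command to modify the package attributes:
bash first_time_run.sh
This unlocks EasySpider in one step so that it can be used normally, including both the design-stage program and the execution-stage program.
If an error like the one below appears while the command runs, it can be ignored; the software can be opened once the command finishes:
xattr: [Errno 13] Permission denied: 'EasySpider.app/Contents/Resources/app/node_modules/node-window-manager/build/node_gyp_bins/python3'
As an alternative, see the following video for how to open the software and run tasks on macOS: https://www.bilibili.com/video/BV1E34y137fT/
- Design stage - macOS on Apple Arm chips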

View File

@ -1,4 +1,4 @@
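You are welcome to share this software with more friends who need it!
You are welcome to share this software with more friends who need it and to star our GitHub repository!
Official website: https://www.easyspider.cn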
@ -6,104 +6,17 @@
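For macOS 10.x, please download and use v0.2.0.
Open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
Tasks can be imported from other machines: simply put the .json files from the other machine's tasks folder into the /Users/your user name/Library/Application Support/EasySpider/tasks folder. Likewise, execution ID files can be imported by copying the .json files from the execution_instances folder. Note that in both folders only .json files named with a number greater than 0 are supported.
You can quickly navigate to the tasks folder using the following commands: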
cd /Users/$(whoami)/Library/Application\ Support/EasySpider/tasks
open .
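If you need to press the p key to pause and resume task execution, you must grant the program keyboard monitoring permission.
======Version Update Notes======
For update notes on versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases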
-----v0.3.2-----
## Update Notes
1. The select-child-elements operation can delete fields, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
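2. Selecting child elements adds a selection mode: you can select only the child elements present in all blocks, or the child elements in all blocks that match those of the first selected block.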
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
3. In the text input and open-page options, the most recently extracted field value can be used **as a variable** for text input, written as `Field["field_name"]`.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
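4. Files such as PDFs can be downloaded.
5. Fixed a bug where the software could show a blank screen for about 10 seconds after opening, so it can now be used on intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Extraction logic now saves locally once every 10 records collected.
9. When modifying a task, the default anchor position is after the last operation in the task flow.
10. Updated Chrome to version 114.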
------V0.3.1------
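If downloads are slow, consider the download links for mainland China: [Download links for mainland China](https://github.com/NaiboWang/EasySpider/releases/download/v0.3.0/Download_Link_Address_in_China_Mainland.txt).
### Watching the new-feature walkthrough videos is strongly recommended
Videos covering the latest features have been uploaded to Bilibili. The new videos are very useful and recommended viewing.
[[Important] Custom conditions using the return value of JS commands inside loop items - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at once (parallel multi-instance)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to screenshot elements and web pages, and a guide to command-line execution (headless mode)](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the tasks folder for v0.3.1 are incompatible with all previous versions; please redesign your tasks for v0.3.1.
## Update Notes
1. Custom operations:
- **Custom scripts can be executed** in the task flow, including **executing JavaScript commands** in the browser and **invoking operating-system-level scripts**, and **the command's return value can be obtained and recorded**, greatly expanding what a task can do.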
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a JavaScript command to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
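2. **Conditions and loop conditions** also support **executing custom scripts**; whether the script's return value is true serves as the condition for branches and loops, which likewise greatly increases task flexibility. Loops gain a setting to break via code, and custom operations can act on elements inside the loop.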
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
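3. Multiple XPath expressions can be generated at once for the user to choose from, and **the XPath Helper extension is pre-installed** for debugging XPath.
4. Added extraction of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to capture an element or the entire web page (works better together with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. To use this feature, first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript code executed on an element can be extracted directly, enabling features such as regular-expression matching and obtaining an element's background color.
9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.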
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
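10. Greatly expanded usage hints and explanations to make the software easier to use, such as how to handle iframe tags, the meaning of each option's parameters, and how to modify the XPath of loop items.
11. When running a task, a hint on how to run tasks from the command line is now shown: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).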
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
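12. Added parallel multi-instance mode.
13. Added headless mode, i.e. a configuration for running without a browser interface.
14. Fixed the issue where Chinese paths could not be recognized correctly in user-configured browser mode.
15. Fixed the issue where the program would freeze when a conditional branch had no unconditional branch.
16. Fixed the issue where the input box would freeze after saving a task.
17. The open-page and click-element operations can now set a maximum page-load waiting time.
18. Added moving the mouse to an element.
19. A prompt is shown when an element cannot be found.
20. Fixed the webpage scrolling bug.
21. Added an operation for adding new extracted-data fields.
22. The task name is initialized to the title of the first page visited.
23. Added version update prompts.
24. Added publisher information as requested.
25. Updated Chrome to version 113.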

View File

@ -23,7 +23,7 @@ For more complex operations, please download the source code and compile it for
"""
# 请在下面编写你的代码,不要有代码缩进!!! | Please write your code below, do not indent the code!!!
print(globals())
# 导包 | Import packages
from selenium.common.exceptions import ElementClickInterceptedException
@ -56,3 +56,20 @@ finally:
print("All parameters:", self.outputParameters)
print(test(3))
print("执行完毕|Execution completed")
import time
time.sleep(3)
def new_line(outputParameters, maxViewLength, record):
    line = []
    print("Use this function to print a new line in the console")
    i = 0
    for value in outputParameters.values():
        line.append(value)
        if record[i]:
            print(value[:maxViewLength], " ", end="")
        i += 1
    print("")
    return line
new_line(self.outputParameters, 10, [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])

View File

@ -1,86 +1,15 @@
Official Site: https://www.easyspider.net
You are welcome to share this software with your friends.
You are welcome to share this software with your friends and star our GitHub repository!
This version is for Windows 7 and above, including both 32-bit and 64-bit versions. Please note that the Chrome browser in this version will always remain at version 109 and will not update with Chrome updates (for compatibility with Windows 7). Therefore, if you want to use the latest version of the Chrome browser for data scraping, please run the x64 version of EasySpider on Windows 10 x64 or later.
This version is for Windows 7 and above, including both 32-bit and 64-bit versions. Please note that the Chrome browser in this version will always remain at version 109 and will not update with Chrome updates (for compatibility with Windows 7). Therefore, if you want to use the latest version of the Chrome browser for data scraping, please run the x64 version of EasySpider on Windows 10 x64 or later. No release supports Windows Server 2012 and below; on those systems you need to compile the software yourself.
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
The software is not a trojan or a virus. If antivirus software such as Windows Defender mistakenly flags it as one, please restore it, or open "EasySpider.bat" to run the software instead.
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
======Version New Features======
For new features in versions later than v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. Selected child element operations can delete fields and unmark deleted fields in real-time in the browser.
2. Selecting child elements adds a selection mode that allows you to choose only the child elements that are present in all blocks or the child elements that are the same as the first selected block.
3. In the text input and webpage open options, you can use the extracted field value as a variable for text input, represented by Field["field_name"].
4. Files can be downloaded, such as PDF files.
5. Fixed a bug where the software could display a blank screen for about 10 seconds after opening, making it usable in intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
-----v0.3.1-----
## Update Instruction
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in the conditions and loop conditions. The return value of the custom script determines the condition for the judgment of conditions and loops, greatly enhancing the flexibility of tasks. The ability to use the break statement within a loop is added, allowing custom operations to manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added the information of the publisher as requested.
23. Updated Chrome version to 113.
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.

File diff suppressed because one or more lines are too long

View File

@ -1 +1 @@
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"12/7/2023, 2:56:47 AM","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}}]}
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"2024-01-05 22:08:46","version":"0.6.0","saveThreshold":10,"quitWaitTime":3,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"},{"id":1,"name":"loopTimes_1","nodeId":5,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":10,"value":10}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,5],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":3,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":4,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":3,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button 
download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":2,"index":5,"parentId":0,"type":1,"option":8,"title":"循环 - 单个元素","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"//body","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":10,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}
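To make the structure of a task file like the one above easier to follow, here is a hedged Python sketch that prints the flow as an outline. It assumes, based only on the sample data shown here, that the root node has `index` 0 and that each node's `sequence` lists the `index` values of its children in execution order. The function name `outline` and the trimmed-down sample are illustrative and are not part of EasySpider.

```python
import json

# Trimmed-down sample keeping only the keys this sketch reads.
SAMPLE_TASK = json.loads("""
{"id": 228, "graph": [
  {"index": 0, "title": "root", "sequence": [1, 5]},
  {"index": 1, "title": "打开网页", "sequence": []},
  {"index": 2, "title": "点击Download PDF", "sequence": []},
  {"index": 5, "title": "循环 - 单个元素", "sequence": [2]}
]}
""")


def outline(task: dict) -> list[tuple[int, str]]:
    """Return (depth, title) pairs by following each node's `sequence` from the root."""
    by_index = {node["index"]: node for node in task["graph"]}

    def visit(index: int, depth: int):
        node = by_index[index]
        yield depth, node["title"]
        for child_index in node["sequence"]:
            yield from visit(child_index, depth + 1)

    return list(visit(0, 0))


for depth, title in outline(SAMPLE_TASK):
    print("  " * depth + title)
```

Nodes that no `sequence` refers to, such as the entries with `"id":-1` above, are simply never visited by this traversal.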

View File

@ -1 +1 @@
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"07/12/2023, 03:43:34","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"desc":"https://www.zhihu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","wai
tElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}}]}
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"2023-12-27 20:05:50","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"知了个乎","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"},{"id":1,"name":"loopTimes_1","nodeId":4,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":0,"value":0}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,4,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/d
iv[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":4,"index":3,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}},{"id":2,"index":4,"parentId":0,"type":1,"option":8,"title":"循环 - 
单个元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

View File

@ -1 +1 @@
{"id":70,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}
{"id":-2,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}

File diff suppressed because one or more lines are too long

@ -1,102 +1,15 @@
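Feel free to recommend this software to friends who may need it!
Feel free to recommend this software to friends who may need it, and to star our GitHub repository.
Official website: https://www.easyspider.cn
Supports Windows 7 and later, including 32-bit and 64-bit systems. Note that the Chrome browser bundled with this version is fixed at 109 and will not be updated along with Chrome, in order to stay compatible with Windows 7. If you want to collect data with the latest Chrome, run the x64 build of EasySpider on Windows 10 x64 or later.
Supports Windows 7 and later, including 32-bit and 64-bit systems. Note that the Chrome browser bundled with this version is fixed at 109 and will not be updated along with Chrome, in order to stay compatible with Windows 7. If you want to collect data with the latest Chrome, run the x64 build of EasySpider on Windows 10 x64 or later. No build supports Windows Server 2012 or earlier; on those systems you need to compile and run the software yourself.
Open-source code on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
This software is definitely not a Trojan or a virus. If antivirus software such as Windows Defender mistakenly flags it as a virus, restore it, or open "EasySpider.bat" to run the software.
Tasks can be imported from other machines: just copy the .json files from the other machine's tasks folder into the tasks folder in this directory. Likewise, execution-instance files can be imported by copying the .json files in the execution_instances folder. Note that .json files in both folders must be named with a number greater than 0.
======Release notes======
For release notes of versions above v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases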
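-----v0.3.2-----
## Release notes
1. In the select-child-elements operation, fields can be deleted, and the deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
2. Select-child-elements gains selection modes: select only the child elements present in every block, or the child elements in all blocks that match those of the first selected block.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
3. The input-text and open-page options can use the most recently extracted field value **as a variable** for text input; the variable is written `Field["field name"]`.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
4. Files such as PDFs can be downloaded.
5. Fixed a bug where the window could stay blank for about 10 seconds after opening, so the software can now be used on intranets, the dark web, and any LAN.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Extraction now saves to local disk after every 10 records collected.
9. When modifying a task, the default anchor position is after the last operation in the task flow.
10. Updated Chrome to version 114.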
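-----v0.3.1-----
### We strongly recommend watching the new-feature walkthrough videos
Feature videos for the latest version have been uploaded to Bilibili. The new videos are very useful and recommended viewing.
[[Important] Custom condition judgment using the return value of JS commands inside loop items - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at the same time (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system commands (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and judgment conditions - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to screenshot elements and pages, and a guide to (headless mode) command-line execution](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder of v0.3.1 are incompatible with all earlier versions; please redesign your tasks for v0.3.1.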
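## Release notes
1. Custom operations:
- A task flow can **run custom scripts**, including **running JavaScript in the browser** and **invoking operating-system-level scripts**, and can **capture and record the command's return value**, which greatly expands what a task can do.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a piece of JavaScript to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
2. **Judgment conditions and loop conditions** also gain **custom script execution**: whether the script's return value is truthy decides the condition, which again makes tasks much more flexible. Loops gain a code-driven break setting, and custom operations can act on elements inside the loop.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
3. Multiple XPaths can be generated at once for the user to choose from, and the **XPath Helper extension is preinstalled** for debugging XPaths.
4. Added collection of an element's background image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use it to capture an element or the whole page (works better together with headless mode).
6. Added image downloading.
7. Added OCR recognition of elements. This requires installing the Tesseract library yourself first: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript run against an element can be extracted directly, enabling things such as regular expressions or reading an element's background color.
9. Added switching of dropdown options, and collection of the currently selected value and text of a dropdown.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
10. Added many more hints and explanations to make the software easier to use, such as how iframe tags are handled, what each option's parameters mean, and how to modify the XPath of loop items.
11. When executing a task, a hint now explains how to run it from the command line: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
12. Added parallel multi-instance mode.
13. Added headless mode, i.e. running without a browser window.
14. Fixed Chinese paths not being recognized correctly when using a browser with user configuration.
15. Fixed a hang when a condition branch had no unconditional branch.
16. Fixed input boxes freezing after saving a task.
17. The open-page and click-element operations gain a setting for the maximum page-load wait time.
18. Added moving the mouse to an element.
19. A message is shown when an element cannot be found.
20. Fixed a page-scrolling bug.
21. Added an operation for adding new extracted-data fields.
22. The task name is initialized to the title of the first page entered.
23. Added update notifications.
24. Added publisher information, as requested.
25. Updated Chrome to version 113.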

@ -5,9 +5,11 @@ import copy
import platform
import shutil
import string
import threading
# import undetected_chromedriver as uc
from utils import detect_optimizable, download_image, extract_text_from_html, get_output_code, isnotnull, lowercase_tags_in_xpath, myMySQL, new_line, \
on_press_creator, on_release_creator, readCode, replace_field_values, send_email, split_text_by_lines, write_to_csv, write_to_excel, write_to_json
on_press_creator, on_release_creator, readCode, rename_downloaded_file, replace_field_values, send_email, split_text_by_lines, write_to_csv, write_to_excel, write_to_json
from constants import WriteMode, DataWriteMode, GraphOption
from myChrome import MyChrome
from threading import Thread, Event
from PIL import Image
@ -30,7 +32,6 @@ from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from pynput.keyboard import Key, Listener
from datetime import datetime
import io # 遇到错误退出时应执行的代码
import json
@ -75,10 +76,7 @@ class BrowserThread(Thread):
def __init__(self, browser_t, id, service, version, event, saveName, config, option):
Thread.__init__(self)
self.logs = io.StringIO()
try:
self.log = bool(service["recordLog"])
except:
self.log = True
self.log = bool(service.get("recordLog", True))
self.browser = browser_t
self.option = option
self.config = config
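The hunk above replaces a `try`/`except` lookup with `dict.get` and a default. A minimal sketch of how the two patterns compare, using a made-up `service` dict (only the key names come from the diff; the values and the helper names `old_style` and `new_style` are assumptions for illustration):

```python
# Hypothetical task config; only the key names are taken from the diff.
service = {"saveName": "run_01", "recordLog": 0, "maximizeWindow": None}

def old_style(cfg, key, default):
    # Pre-change pattern: a bare except hides every error, not just KeyError.
    try:
        return cfg[key]
    except:
        return default

def new_style(cfg, key, default):
    # Post-change pattern: only a missing key falls back to the default.
    return cfg.get(key, default)

# Present key: both return the stored value.
assert old_style(service, "saveName", "x") == new_style(service, "saveName", "x") == "run_01"
# Missing key: both return the default.
assert old_style(service, "outputFormat", "csv") == new_style(service, "outputFormat", "csv") == "csv"
# A key that is present but None is returned as None; the default is not used.
assert new_style(service, "maximizeWindow", 0) is None
# bool() around the lookup, as in the diff, turns a stored 0 into False.
log_enabled = bool(new_style(service, "recordLog", True))
```

The two patterns agree for plain dict lookups; they differ only when the lookup raises something other than `KeyError`, which the bare `except` would have swallowed.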
@ -86,22 +84,13 @@ class BrowserThread(Thread):
self.totalSteps = 0
self.id = id
self.event = event
try:
self.saveName = service["saveName"] # 保存文件的名字
except:
now = datetime.now()
# 将时间格式化为精确到秒的字符串
self.saveName = now.strftime("%Y_%m_%d_%H_%M_%S")
now = datetime.now()
self.saveName = service.get("saveName", now.strftime("%Y_%m_%d_%H_%M_%S")) # 保存文件的名字
self.OUTPUT = ""
self.SAVED = False
self.BREAK = False
self.CONTINUE = False
try:
maximizeWindow = service["maximizeWindow"]
except:
maximizeWindow = 0
if maximizeWindow == 1:
self.browser.maximize_window()
self.browser.maximize_window() if service.get("maximizeWindow") == 1 else ...
# Name settings
if saveName != "": # a name passed on the command line overrides the saved one
self.saveName = saveName # name of the output file
@ -112,19 +101,23 @@ class BrowserThread(Thread):
self.print_and_log("Save Name for task ID", id, "is:", self.saveName)
if not os.path.exists("Data/Task_" + str(id)):
os.mkdir("Data/Task_" + str(id))
if not os.path.exists("Data/Task_" + str(id) + "/" + self.saveName):
os.mkdir("Data/Task_" + str(id) + "/" +
self.saveName) # 创建保存文件夹用来保存截图
self.downloadFolder = "Data/Task_" + str(id) + "/" + self.saveName
if not os.path.exists(self.downloadFolder):
os.mkdir(self.downloadFolder) # 创建保存文件夹用来保存截图和文件
if not os.path.exists(self.downloadFolder + "/files"):
os.mkdir(self.downloadFolder + "/files")
if not os.path.exists(self.downloadFolder + "/images"):
os.mkdir(self.downloadFolder + "/images")
self.getDataStep = 0
self.startSteps = 0
try:
startFromExit = service["startFromExit"] # 从上次退出的步骤开始
if startFromExit == 1:
if service.get("startFromExit", 0) == 1:
with open("Data/Task_" + str(self.id) + "/" + self.saveName + '_steps.txt', 'r',
encoding='utf-8-sig') as file_obj:
self.startSteps = int(file_obj.read()) # 读取已执行步数
except:
pass
except Exception as e:
self.print_and_log(f"读取steps.txt失败原因{str(e)}")
if self.startSteps != 0:
self.print_and_log("此模式下任务ID", self.id, "将从上次退出的步骤开始执行,之前已采集条数为",
self.startSteps, "条。")
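The folder setup in the hunk above checks and creates each level separately. A compact alternative, shown only as a sketch, is `os.makedirs` with `exist_ok=True`, which creates missing parents and does not fail when the folder already exists. The helper name `ensure_task_folders` and the `base_dir` parameter are assumptions for this example; the `Data/Task_<id>/<saveName>/files` and `images` layout follows the diff.

```python
import os
import tempfile

def ensure_task_folders(base_dir, task_id, save_name):
    """Create Data/Task_<id>/<save_name>/{files,images} under base_dir."""
    download_folder = os.path.join(base_dir, "Data", f"Task_{task_id}", save_name)
    for sub in ("files", "images"):
        # exist_ok=True makes the call idempotent; parents are created as needed.
        os.makedirs(os.path.join(download_folder, sub), exist_ok=True)
    return download_folder

with tempfile.TemporaryDirectory() as tmp:
    folder = ensure_task_folders(tmp, 70, "2025_01_08_12_00_00")
    created = sorted(os.listdir(folder))
    # Calling it a second time must not raise.
    ensure_task_folders(tmp, 70, "2025_01_08_12_00_00")
```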
@ -132,7 +125,7 @@ class BrowserThread(Thread):
"will start from the last step, before we already collected", self.startSteps, " items.")
else:
self.print_and_log("此模式下任务ID", self.id,
"将从头F开始执行,如果需要从上次退出的步骤开始执行,请在保存任务时设置是否从上次保存位置开始执行为“是”。")
"将从头开始执行,如果需要从上次退出的步骤开始执行,请在保存任务时设置是否从上次保存位置开始执行为“是”。")
self.print_and_log("In this mode, task ID", self.id,
"will start from the beginning, if you want to start from the last step, please set the option 'start from the last step' to 'yes' when saving the task.")
stealth_path = driver_path[:driver_path.find(
@ -140,78 +133,83 @@ class BrowserThread(Thread):
with open(stealth_path, 'r') as f:
js = f.read()
self.print_and_log("Loading stealth.min.js")
self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
'source': js}) # TMALL 反扒
self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': js}) # TMALL 反扒
self.browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
WebDriverWait(self.browser, 10)
self.browser.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(self.id))
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(self.id), self.saveName, "files")
self.paramss = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': path}}
self.browser.execute("send_command", self.paramss) # change the download path
self.browser.execute("send_command", self.paramss) # change the download directory
self.monitor_event = threading.Event()
self.monitor_thread = threading.Thread(target=rename_downloaded_file, args=(path, self.monitor_event)) # the comma after path cannot be omitted: args must be a tuple
self.monitor_thread.start()
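The lines above start a background thread and hand it a `threading.Event`; later in this diff the event is set during cleanup. The body of `rename_downloaded_file` lives in `utils` and is not shown here, so the sketch below uses a hypothetical stand-in named `monitor` to illustrate the general start/stop pattern only.

```python
import threading
import time

def monitor(path, stop_event, seen, poll_interval=0.01):
    # Hypothetical stand-in for rename_downloaded_file: poll until asked to stop.
    while not stop_event.is_set():
        seen.append(path)               # a real monitor would scan `path` here
        stop_event.wait(poll_interval)  # sleeps, but wakes early once the event is set

stop_event = threading.Event()
seen = []
worker = threading.Thread(target=monitor, args=("downloads", stop_event, seen))
worker.start()
time.sleep(0.05)
stop_event.set()        # mirrors self.monitor_event.set() during cleanup
worker.join(timeout=1)
```

Using `stop_event.wait(...)` instead of `time.sleep(...)` inside the loop lets the thread exit promptly when the event is set.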
# self.browser.get('about:blank')
self.procedure = service["graph"] # 程序执行流程
try:
self.maxViewLength = service["maxViewLength"] # 最大显示长度
except:
self.maxViewLength = 15
try:
self.outputFormat = service["outputFormat"] # 输出格式
except:
self.outputFormat = "csv"
try:
self.task_version = service["version"] # 任务版本
if service["version"] >= "0.3.1": # 0.3.1及以上版本以上的EasySpider兼容从0.3.1版本开始的所有版本
pass
else: # 0.3.1以下版本的EasySpider不兼容0.3.1及以上版本的EasySpider
if service["version"] != version:
self.print_and_log("版本不一致,请使用" +
service["version"] + "版本的EasySpider运行该任务")
self.print_and_log("Version not match, please use EasySpider " +
service["version"] + " to run this task!")
self.browser.quit()
sys.exit()
except: # 0.2.0版本没有version字段所以直接退出
self.maxViewLength = service.get("maxViewLength", 15) # maximum display length
self.outputFormat = service.get("outputFormat", "csv") # output format
self.save_threshold = service.get("saveThreshold", 10) # minimum number of records before saving
self.dataWriteMode = service.get("dataWriteMode", DataWriteMode.Append.value) # data write mode: 1 = append, 2 = overwrite, 3 = rename file
self.task_version = service.get("version", "") # task version
if not self.task_version:
self.print_and_log("版本不一致请使用v0.2.0版本的EasySpider运行该任务")
self.print_and_log(
"Version not match, please use EasySpider v0.2.0 to run this task!")
self.print_and_log("Version not match, please use EasySpider v0.2.0 to run this task!")
self.browser.quit()
sys.exit()
try:
self.save_threshold = service["saveThreshold"] # 保存最低阈值
except:
self.save_threshold = 10
try:
self.links = list(
filter(isnotnull, service["links"].split("\n"))) # 要执行的link的列表
except:
if self.task_version >= "0.3.1": # EasySpider 0.3.1 and later runs tasks from every version starting at 0.3.1
pass
elif self.task_version != version: # EasySpider below 0.3.1 is not compatible with 0.3.1 and later
self.print_and_log(f"版本不一致,请使用{self.task_version}版本的EasySpider运行该任务")
self.print_and_log(f"Version not match, please use EasySpider {self.task_version} to run this task!")
self.browser.quit()
sys.exit()
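The check above compares version strings with `>=`, which Python evaluates character by character. That holds for the versions named in this diff, but it would misorder a hypothetical two-digit component such as `0.10.0`. A minimal sketch of a numeric comparison, assuming versions are plain dot-separated integers (the helper name `parse_version` is illustrative and not part of EasySpider):

```python
def parse_version(text):
    """Turn '0.3.5' into (0, 3, 5) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))

# Lexicographic string comparison works while every component is one digit...
assert "0.3.5" >= "0.3.1"
# ...but misorders a two-digit component ('1' < '3' as characters).
assert not ("0.10.0" >= "0.3.1")
# Tuple comparison orders both cases as intended.
assert parse_version("0.3.5") >= parse_version("0.3.1")
assert parse_version("0.10.0") >= parse_version("0.3.1")
```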
service_links = service.get("links")
if service_links:
self.links = list(filter(isnotnull, service_links.split("\n"))) # 要执行的link的列表
else:
self.links = list(filter(isnotnull, service["url"])) # 要执行的link
self.OUTPUT = [] # 采集的数据
try:
self.dataWriteMode = service["dataWriteMode"] # 数据写入模式1为追加2为覆盖
except:
self.dataWriteMode = 1
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx" or self.outputFormat == "json":
if self.dataWriteMode == 2 and os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
os.remove("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat)
self.writeMode = 1 # 写入模式0为新建1为追加
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx":
if not os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
if self.outputFormat in ["csv", "txt", "xlsx", "json"]:
if os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat):
if self.dataWriteMode == DataWriteMode.Cover.value:
os.remove("Data/Task_" + str(self.id) + "/" + self.saveName + '.' + self.outputFormat)
elif self.dataWriteMode == DataWriteMode.Rename.value:
i = 2
while os.path.exists("Data/Task_" + str(self.id) + "/" + self.saveName + '_' + str(i) + '.' + self.outputFormat):
i = i + 1
self.saveName = self.saveName + '_' + str(i)
self.print_and_log("文件已存在,已重命名为", self.saveName)
self.writeMode = WriteMode.Create.value # 写入模式0为新建1为追加
if self.outputFormat in ['csv', 'txt', 'xlsx']:
if not os.path.exists(f"Data/Task_{str(self.id)}/{self.saveName}.{self.outputFormat}"):
self.OUTPUT.append([]) # 添加表头
self.writeMode = 0
self.writeMode = WriteMode.Create.value
elif self.outputFormat == "json":
self.writeMode = 3 # JSON模式无需判断是否存在文件
self.writeMode = WriteMode.Json.value # JSON模式无需判断是否存在文件
elif self.outputFormat == "mysql":
self.mysql = myMySQL(config["mysql_config_path"])
self.mysql.create_table(self.saveName, service["outputParameters"], remove_if_exists=self.dataWriteMode == 2)
self.writeMode = 2
if self.writeMode == 0:
self.mysql.create_table(self.saveName, service["outputParameters"],
remove_if_exists=self.dataWriteMode == DataWriteMode.Cover.value)
self.writeMode = WriteMode.MySQL.value # MySQL模式
if self.writeMode == WriteMode.Create.value:
self.print_and_log("新建模式|Create Mode")
elif self.writeMode == 1:
elif self.writeMode == WriteMode.Append.value:
self.print_and_log("追加模式|Append Mode")
elif self.writeMode == 2:
elif self.writeMode == WriteMode.MySQL.value:
self.print_and_log("MySQL模式|MySQL Mode")
elif self.writeMode == 3:
elif self.writeMode == WriteMode.Json.value:
self.print_and_log("JSON模式|JSON Mode")
self.containJudge = service["containJudge"] # 是否含有判断语句
self.outputParameters = {}
self.service = service
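The rename branch in the hunk above looks for the first unused `_<n>` suffix, starting at 2, when the output file already exists. The same search can be sketched as a small standalone helper; the name `next_free_name` and its parameters are assumptions for this example and do not appear in the diff.

```python
import os
import tempfile

def next_free_name(folder, save_name, ext):
    """Return save_name, or save_name_<n> (n >= 2) if the file already exists."""
    if not os.path.exists(os.path.join(folder, f"{save_name}.{ext}")):
        return save_name
    i = 2
    while os.path.exists(os.path.join(folder, f"{save_name}_{i}.{ext}")):
        i += 1
    return f"{save_name}_{i}"

with tempfile.TemporaryDirectory() as tmp:
    first = next_free_name(tmp, "result", "csv")       # nothing exists yet
    for name in ("result.csv", "result_2.csv"):
        open(os.path.join(tmp, name), "w").close()
    renamed = next_free_name(tmp, "result", "csv")     # result and result_2 are taken
```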
@ -224,191 +222,140 @@ class BrowserThread(Thread):
if param["name"] not in self.outputParameters.keys():
self.outputParameters[param["name"]] = ""
self.dataNotFoundKeys[param["name"]] = False
try:
self.outputParametersTypes.append(param["type"])
except:
self.outputParametersTypes.append("text")
try:
self.outputParametersRecord.append(
bool(param["recordASField"]))
except:
self.outputParametersRecord.append(True)
self.outputParametersTypes.append(param.get("type", "text"))
self.outputParametersRecord.append(bool(param.get("recordASField", True)))
# 文件叠加的时候不添加表头
if self.outputFormat == "csv" or self.outputFormat == "txt" or self.outputFormat == "xlsx":
if self.writeMode == 0:
self.OUTPUT[0].append(param["name"])
if self.outputFormat in ["csv", "txt", "xlsx"] and self.writeMode == WriteMode.Create.value:
self.OUTPUT[0].append(param["name"])
self.urlId = 0 # 全局记录变量
self.preprocess() # 预处理,优化提取数据流程
try:
self.inputExcel = service["inputExcel"] # 输入Excel
except:
self.inputExcel = ""
self.inputExcel = service.get("inputExcel", "") # 输入Excel
self.readFromExcel() # 读取Excel获得参数值
# 检测如果没有复杂的操作,优化提取数据流程
def preprocess(self):
for node in self.procedure:
try:
iframe = node["parameters"]["iframe"]
except:
node["parameters"]["iframe"] = False
for index_node, node in enumerate(self.procedure):
parameters: dict = node["parameters"]
iframe = parameters.get('iframe')
option = node["option"]
try:
node["parameters"]["xpath"] = lowercase_tags_in_xpath(
node["parameters"]["xpath"])
except:
pass
try:
node["parameters"]["waitElementIframeIndex"] = int(
node["parameters"]["waitElementIframeIndex"])
except:
node["parameters"]["waitElement"] = ""
node["parameters"]["waitElementTime"] = 10
node["parameters"]["waitElementIframeIndex"] = 0
if node["option"] == 1: # 打开网页操作
try:
cookies = node["parameters"]["cookies"]
except:
node["parameters"]["cookies"] = ""
elif node["option"] == 2: # 点击操作
try:
alertHandleType = node["parameters"]["alertHandleType"]
except:
node["parameters"]["alertHandleType"] = 0
if node["parameters"]["useLoop"]:
parameters["iframe"] = False if not iframe else parameters.get('iframe', False)
if parameters.get("xpath"):
parameters["xpath"] = lowercase_tags_in_xpath(parameters["xpath"])
if parameters.get("waitElementIframeIndex"):
parameters["waitElementIframeIndex"] = int(parameters["waitElementIframeIndex"])
else:
parameters["waitElement"] = ""
parameters["waitElementTime"] = 10
parameters["waitElementIframeIndex"] = 0
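One point worth checking in review: `dict.get(key)` returns the stored value, so an `if parameters.get(...)` test is also false when the key is present but holds `0`, `""` or `False`. In the branch above, a stored `waitElementIframeIndex` of `0` would therefore take the `else` path, which also resets `waitElement` and `waitElementTime`. Whether that matters depends on how tasks are saved, which this diff does not show. The sketch below only contrasts the two behaviours with made-up values; both function names are hypothetical.

```python
def normalize_truthy(parameters):
    # Mirrors the shape of the new code: falls back whenever the value is falsy.
    if parameters.get("waitElementIframeIndex"):
        parameters["waitElementIframeIndex"] = int(parameters["waitElementIframeIndex"])
    else:
        parameters["waitElement"] = ""
        parameters["waitElementTime"] = 10
        parameters["waitElementIframeIndex"] = 0
    return parameters

def normalize_by_key(parameters):
    # Alternative: fall back only when the key is missing or not an integer.
    try:
        parameters["waitElementIframeIndex"] = int(parameters["waitElementIframeIndex"])
    except (KeyError, TypeError, ValueError):
        parameters["waitElement"] = ""
        parameters["waitElementTime"] = 10
        parameters["waitElementIframeIndex"] = 0
    return parameters

saved = {"waitElement": "//div[@id='list']", "waitElementTime": 5, "waitElementIframeIndex": 0}
by_truthiness = normalize_truthy(dict(saved))
by_key = normalize_by_key(dict(saved))
```

With an index of `0`, the truthiness test discards the saved `waitElement`, while the key-based variant keeps it.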
if option == GraphOption.Get.value: # 打开网页操作
parameters["cookies"] = parameters.get("cookies", "")
elif option == GraphOption.Click.value: # 点击操作
parameters["alertHandleType"] = parameters.get("alertHandleType", 0)
if parameters.get("useLoop"):
if self.task_version <= "0.3.5":
# 0.3.5及以下版本的EasySpider下的循环点击不支持相对XPath
node["parameters"]["xpath"] = ""
self.print_and_log("您的任务版本号为" + self.task_version +
"循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif node["option"] == 3: # 提取数据操作
node["parameters"]["recordASField"] = 0
try:
params = node["parameters"]["params"]
except:
node["parameters"]["params"] = node["parameters"]["paras"] # 兼容0.5.0及以下版本的EasySpider
params = node["parameters"]["params"]
try:
clear = node["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
newLine = node["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
parameters["xpath"] = ""
self.print_and_log(f"您的任务版本号为{self.task_version}循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif option == GraphOption.Extract.value: # 提取数据操作
parameters["recordASField"] = 0
parameters["params"] = parameters.get("params", parameters.get("paras")) # 兼容0.5.0及以下版本的EasySpider
parameters["clear"] = parameters.get("clear", 0)
parameters["newLine"] = parameters.get("newLine", 1)
params = parameters["params"]
for param in params:
try:
iframe = param["iframe"]
except:
param["iframe"] = False
try:
param["iframe"] = param.get("iframe", False)
if param.get("relativeXPath"):
param["relativeXPath"] = lowercase_tags_in_xpath(param["relativeXPath"])
except:
pass
try:
node["parameters"]["recordASField"] = param["recordASField"]
except:
node["parameters"]["recordASField"] = 1
try:
splitLine = int(param["splitLine"])
except:
param["splitLine"] = 0
if param["contentType"] == 8:
self.print_and_log(
"默认的ddddocr识别功能如果觉得不好用可以自行修改源码get_content函数->contentType == 8的位置换成自己想要的OCR模型然后自己编译运行或者可以先设置采集内容类型为“元素截图”把图片保存下来然后用自定义操作调用自己写的程序程序的功能是读取这个最新生成的图片然后用好用的模型如PaddleOCR把图片识别出来然后把返回值返回给程序作为参数输出。")
self.print_and_log(
"If you think the default ddddocr function is not good enough, you can modify the source code get_content function -> contentType == 8 position to your own OCR model and then compile and run it; or you can first set the content type of the crawler to \"Element Screenshot\" to save the picture, and then call your own program with custom operations. The function of the program is to read the latest generated picture, then use a good model, such as PaddleOCR to recognize the picture, and then return the return value as a parameter output to the program.")
parameters["recordASField"] = param.get("recordASField", 1)
param["splitLine"] = 0 if not param.get("splitLine") else param.get("splitLine")
if param.get("contentType") == 8:
self.print_and_log("默认的ddddocr识别功能如果觉得不好用可以自行修改源码get_content函数->contentType =="
"8的位置换成自己想要的OCR模型然后自己编译运行或者可以先设置采集内容类型为“元素截图”把图片"
"保存下来,然后用自定义操作调用自己写的程序,程序的功能是读取这个最新生成的图片,然后用好用"
"的模型如PaddleOCR把图片识别出来然后把返回值返回给程序作为参数输出。")
self.print_and_log("If you think the default ddddocr function is not good enough, you can "
"modify the source code get_content function -> contentType == 8 position "
"to your own OCR model and then compile and run it; or you can first set "
"the content type of the crawler to \"Element Screenshot\" to save the "
"picture, and then call your own program with custom operations. The "
"function of the program is to read the latest generated picture, then use "
"a good model, such as PaddleOCR to recognize the picture, and then return "
"the return value as a parameter output to the program.")
param["optimizable"] = detect_optimizable(param)
elif node["option"] == 4: # 输入文字
try:
index = node["parameters"]["index"] # 索引值
except:
node["parameters"]["index"] = 0
elif node["option"] == 5: # 自定义操作
try:
clear = node["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
newLine = node["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
elif node["option"] == 7: # 移动到元素
if node["parameters"]["useLoop"]:
if self.task_version <= "0.3.5":
# 0.3.5及以下版本的EasySpider下的循环点击不支持相对XPath
node["parameters"]["xpath"] = ""
self.print_and_log("您的任务版本号为" + self.task_version +
"循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif node["option"] == 8: # 循环操作
try:
exitElement = node["parameters"]["exitElement"]
if exitElement == "":
node["parameters"]["exitElement"] = "//body"
except:
node["parameters"]["exitElement"] = "//body"
node["parameters"]["quickExtractable"] = False # 是否可以快速提取
try:
skipCount = node["parameters"]["skipCount"]
except:
node["parameters"]["skipCount"] = 0
elif option == GraphOption.Input.value: # 输入文字
parameters['index'] = parameters.get('index', 0)
elif option == GraphOption.Custom.value: # 自定义操作
parameters['clear'] = parameters.get('clear', 0)
parameters['newLine'] = parameters.get('newLine', 1)
elif option == GraphOption.Move.value: # 移动到元素
if parameters.get('useLoop'):
if self.task_version <= "0.3.5": # 0.3.5及以下版本的EasySpider下的循环点击不支持相对XPath
parameters["xpath"] = ""
self.print_and_log(f"您的任务版本号为{self.task_version}循环点击不支持相对XPath写法已自动切换为纯循环的XPath")
elif option == GraphOption.Loop.value: # 循环操作
parameters['exitElement'] = "//body" if not parameters.get('exitElement') or parameters.get('exitElement') == "" else parameters.get('exitElement')
parameters["quickExtractable"] = False # 是否可以快速提取
parameters['skipCount'] = parameters.get('skipCount', 0)
# 如果(不)固定元素列表循环中只有一个提取数据操作,且提取数据操作的提取内容为元素截图,那么可以快速提取
if len(node["sequence"]) == 1 and self.procedure[node["sequence"][0]]["option"] == 3 and (int(node["parameters"]["loopType"]) == 1 or int(node["parameters"]["loopType"]) == 2):
try:
params = self.procedure[node["sequence"][0]]["parameters"]["params"]
except:
params = self.procedure[node["sequence"][0]]["parameters"]["paras"] # 兼容0.5.0及以下版本的EasySpider
try:
waitElement = self.procedure[node["sequence"][0]]["parameters"]["waitElement"]
except:
waitElement = ""
if node["parameters"]["iframe"]:
node["parameters"]["quickExtractable"] = False # 如果是iframe那么不可以快速提取
if len(node["sequence"]) == 1 and self.procedure[node["sequence"][0]]["option"] == 3 \
and (int(node["parameters"]["loopType"]) == 1 or int(node["parameters"]["loopType"]) == 2):
params = self.procedure[node["sequence"][0]].get("parameters").get("params")
if not params:
params = self.procedure[node["sequence"][0]]["parameters"]["paras"] # 兼容0.5.0及以下版本的EasySpider
waitElement = self.procedure[node["sequence"][0]]["parameters"].get("waitElement", "")
if parameters["iframe"]:
parameters["quickExtractable"] = False # 如果是iframe那么不可以快速提取
else:
node["parameters"]["quickExtractable"] = True # 先假设可以快速提取
if node["parameters"]["skipCount"] > 0:
node["parameters"]["quickExtractable"] = False # 如果有跳过的元素,那么不可以快速提取
parameters["quickExtractable"] = True # 先假设可以快速提取
if parameters["skipCount"] > 0:
parameters["quickExtractable"] = False # 如果有跳过的元素,那么不可以快速提取
for param in params:
optimizable = detect_optimizable(param, ignoreWaitElement=False, waitElement=waitElement)
try:
iframe = param["iframe"]
except:
param["iframe"] = False
if param["iframe"] and not param["relative"]: # 如果是iframe那么不可以快速提取
param['iframe'] = param.get('iframe', False)
if param["iframe"] and not param["relative"]: # 如果是iframe那么不可以快速提取
optimizable = False
if not optimizable: # 如果有一个不满足优化条件,那么就不能快速提取
node["parameters"]["quickExtractable"] = False
if not optimizable: # 如果有一个不满足优化条件,那么就不能快速提取
parameters["quickExtractable"] = False
break
if node["parameters"]["quickExtractable"]:
self.print_and_log("循环操作<" + node["title"] + ">可以快速提取数据")
self.print_and_log("Loop operation <" + node["title"] + "> can extract data quickly")
try:
node["parameters"]["clear"] = self.procedure[node["sequence"][0]]["parameters"]["clear"]
except:
node["parameters"]["clear"] = 0
try:
node["parameters"]["newLine"] = self.procedure[node["sequence"][0]]["parameters"]["newLine"]
except:
node["parameters"]["newLine"] = 1
if int(node["parameters"]["loopType"]) == 1: # 不固定元素列表
if parameters["quickExtractable"]:
self.print_and_log(f"循环操作<{node['title']}>可以快速提取数据")
self.print_and_log(f"Loop operation <{node['title']}> can extract data quickly")
parameters["clear"] = self.procedure[node["sequence"][0]]["parameters"].get("clear", 0)
parameters["newLine"] = self.procedure[node["sequence"][0]]["parameters"].get("newLine", 1)
if int(node["parameters"]["loopType"]) == 1: # 不固定元素列表
node["parameters"]["baseXPath"] = node["parameters"]["xpath"]
elif int(node["parameters"]["loopType"]) == 2: # 固定元素列表
elif int(node["parameters"]["loopType"]) == 2: # 固定元素列表
node["parameters"]["baseXPath"] = node["parameters"]["pathList"]
node["parameters"]["quickParams"] = []
for param in params:
content_type = ""
if param["relativeXPath"].find("/@href") >= 0 or param["relativeXPath"].find("/text()") >= 0 or param["relativeXPath"].find(
"::text()") >= 0:
if param["relativeXPath"].find("/@href") >= 0 or param["relativeXPath"].find("/text()") >= 0 \
or param["relativeXPath"].find("::text()") >= 0:
content_type = ""
elif param["nodeType"] == 2:
content_type = "//@href"
elif param["nodeType"] == 4: # 图片链接
elif param["nodeType"] == 4: # 图片链接
content_type = "//@src"
elif param["contentType"] == 1:
content_type = "/text()"
elif param["contentType"] == 0:
content_type = "//text()"
if param["relative"]: # 如果是相对XPath
if param["relative"]: # 如果是相对XPath
xpath = "." + param["relativeXPath"] + content_type
else:
xpath = param["relativeXPath"] + content_type
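The suffix selection at the end of the hunk above can be read as a small pure function. The sketch below mirrors the branch order in the diff; the function name `build_quick_xpath` is an assumption, and the reading of `nodeType == 2` as a link is inferred from the `//@href` suffix rather than stated in the source.

```python
def build_quick_xpath(param):
    """Build the XPath used for quick extraction from one field definition."""
    rel = param["relativeXPath"]
    if "/@href" in rel or "/text()" in rel or "::text()" in rel:
        suffix = ""            # the XPath already selects an attribute or text node
    elif param["nodeType"] == 2:
        suffix = "//@href"     # presumably a link (inferred from the suffix)
    elif param["nodeType"] == 4:
        suffix = "//@src"      # image link, per the comment in the diff
    elif param["contentType"] == 1:
        suffix = "/text()"     # text nodes directly under the element
    elif param["contentType"] == 0:
        suffix = "//text()"    # text nodes of the element and its descendants
    else:
        suffix = ""
    prefix = "." if param["relative"] else ""   # relative XPaths start from the loop item
    return prefix + rel + suffix

link = build_quick_xpath({"relativeXPath": "/div/a", "nodeType": 2, "contentType": 0, "relative": True})
title = build_quick_xpath({"relativeXPath": "//h1", "nodeType": 0, "contentType": 1, "relative": False})
already_text = build_quick_xpath({"relativeXPath": "/span/text()", "nodeType": 0, "contentType": 0, "relative": True})
```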
@ -422,6 +369,7 @@ class BrowserThread(Thread):
"nodeType": param["nodeType"],
"default": param["default"],
})
self.procedure[index_node]["parameters"] = parameters
self.print_and_log("预处理完成|Preprocess completed")
def readFromExcel(self):
@ -521,7 +469,7 @@ class BrowserThread(Thread):
"/", len(self.links))
self.executeNode(0)
self.urlId = self.urlId + 1
files = os.listdir("Data/Task_" + str(self.id) + "/" + self.saveName)
# files = os.listdir("Data/Task_" + str(self.id) + "/" + self.saveName)
# 如果目录为空,则删除该目录
# if not files:
# os.rmdir("Data/Task_" + str(self.id) + "/" + self.saveName)
@ -538,12 +486,16 @@ class BrowserThread(Thread):
self.print_and_log(f"任务执行完毕,将在{quitWaitTime}秒后自动退出浏览器并清理临时用户目录,等待时间可在保存任务对话框中设置。")
self.print_and_log(f"The task is completed, the browser will exit automatically and the temporary user directory will be cleaned up after {quitWaitTime} seconds, the waiting time can be set in the save task dialog.")
time.sleep(quitWaitTime)
self.browser.quit()
try:
self.browser.quit()
except:
pass
self.print_and_log("正在清理临时用户目录……|Cleaning up temporary user directory...")
try:
shutil.rmtree(self.option["tmp_user_data_folder"])
except:
pass
self.monitor_event.set()
self.print_and_log("清理完成!|Clean up completed!")
self.print_and_log("您现在可以安全的关闭此窗口了。|You can safely close this window now.")
@ -753,28 +705,32 @@ class BrowserThread(Thread):
self.browser.set_script_timeout(max_wait_time)
try:
output = self.browser.execute_script(code)
except:
except Exception as e:
output = ""
self.recordLog("JavaScript execution failed")
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" + code, ", error is:", str(e))
elif int(codeMode) == 2:
self.recordLog("Execute JavaScript for element:" + code)
self.recordLog("对元素执行JavaScript:" + code)
self.browser.set_script_timeout(max_wait_time)
try:
output = self.browser.execute_script(code, element)
except:
except Exception as e:
output = ""
self.recordLog("JavaScript execution failed")
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" + code, ", error is:", str(e))
elif int(codeMode) == 5:
try:
code = readCode(code)
# global_namespace = globals().copy()
# global_namespace["self"] = self
output = exec(code)
self.recordLog("执行下面的代码:" + code)
self.recordLog("Execute the following code:" + code)
except Exception as e:
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", e)
self.print_and_log("执行下面的代码时出错:" + code, ",错误为:", str(e))
self.print_and_log("Error executing the following code:" +
code, ", error is:", e)
code, ", error is:", str(e))
elif int(codeMode) == 6:
try:
code = readCode(code)
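A note on the mode-5 branch above: `exec()` always returns `None`, so `output = exec(code)` cannot carry a value back from the executed snippet. If a result is needed, one common pattern is to run the code in an explicit namespace and read a variable from it afterwards. A minimal sketch, where the helper `run_snippet` and the variable name `result` are conventions invented for this example:

```python
def run_snippet(code, context=None):
    """Execute code in its own namespace and return its `result` variable, if any."""
    namespace = dict(context or {})
    returned = exec(code, namespace)   # exec itself always returns None
    assert returned is None
    return namespace.get("result")

value = run_snippet("result = base * 2", {"base": 21})
missing = run_snippet("x = 1")   # the snippet never assigns `result`
```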
@ -847,6 +803,23 @@ class BrowserThread(Thread):
self.print_and_log("根据设置的自定义操作,任务已刷新页面|Task refreshed page according to custom operation")
elif codeMode == 9: # 发送邮件
send_email(node["parameters"]["emailConfig"])
elif codeMode == 10: # 清空所有字段值
self.clearOutputParameters()
elif codeMode == 11: # 生成新的数据行
line = new_line(self.outputParameters,
self.maxViewLength, self.outputParametersRecord)
self.OUTPUT.append(line)
elif codeMode == 12: # 退出程序
self.print_and_log("根据设置的自定义操作,任务已退出|Task exited according to custom operation")
self.saveData(exit=True)
self.browser.quit()
self.print_and_log("正在清理临时用户目录……|Cleaning up temporary user directory...")
try:
shutil.rmtree(self.option["tmp_user_data_folder"])
except:
pass
self.print_and_log("清理完成!|Clean up completed!")
os._exit(0)
else: # 0 1 5 6
output = self.execute_code(
codeMode, code, max_wait_time, iframe=params["iframe"])
@ -1106,7 +1079,25 @@ class BrowserThread(Thread):
self.recordLog(
"判断条件内所有条件分支的条件都不满足|None of the conditions in the judgment condition are met")
def handleHistory(self, node, xpath, thisHistoryURL, thisHistoryLength, index, element=None, elements=None):
def handleHistory(self, node, xpath, thisHandle, thisHistoryURL, thisHistoryLength, index, element=None, elements=None):
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # 如果网页被意外关闭了的情况下
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # 如果执行完一次循环之后标签页的位置发生了变化
try:
while True: # 一直关闭窗口直到当前标签页
self.browser.close() # 关闭使用完的标签页
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
if self.history["index"] != thisHistoryLength and self.history["handle"] == self.browser.current_window_handle: # 如果执行完一次循环之后历史记录发生了变化,注意当前页面的判断
difference = thisHistoryLength - self.history["index"] # 计算历史记录变化差值
self.browser.execute_script('history.go(' + str(difference) + ')') # 回退历史记录
@ -1132,12 +1123,13 @@ class BrowserThread(Thread):
if self.browser.current_url == thisHistoryURL or ti > thisHistoryLength: # 如果执行完一次循环之后网址发生了变化
break
time.sleep(2)
if element == None: # 不固定元素列表
element = self.browser.find_elements(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
else: # 固定元素列表
element = self.browser.find_element(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
# if index > 0:
# index -= 1 # 如果是data:开头的网址,就要重试一次
if xpath != "":
if element == None: # 不固定元素列表
element = self.browser.find_elements(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
else: # 固定元素列表
element = self.browser.find_element(By.XPATH, xpath, iframe=node["parameters"]["iframe"])
# if index > 0:
# index -= 1 # 如果是data:开头的网址,就要重试一次
else:
if element == None:
element = elements
@ -1156,6 +1148,14 @@ class BrowserThread(Thread):
self.history["handle"] = thisHandle
thisHistoryURL = self.browser.current_url
# 快速提取处理
# start = time.time()
try:
tree = html.fromstring(self.browser.page_source)
except Exception as e:
self.print_and_log("解析页面时出错,将切换普通提取模式|Error parsing page, will switch to normal extraction mode")
node["parameters"]["quickExtractable"] = False
# end = time.time()
# print("解析页面秒数:", end - start)
if node["parameters"]["quickExtractable"]:
self.browser.switch_to.default_content() # 切换到主页面
tree = html.fromstring(self.browser.page_source)
@ -1321,25 +1321,7 @@ class BrowserThread(Thread):
if self.BREAK:
self.BREAK = False
break
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # 如果网页被意外关闭了的情况下
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # 如果执行完一次循环之后标签页的位置发生了变化
try:
while True: # 一直关闭窗口直到当前标签页
self.browser.close() # 关闭使用完的标签页
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
index, elements = self.handleHistory(node, xpath, thisHistoryURL, thisHistoryLength, index, elements=elements)
index, elements = self.handleHistory(node, xpath, thisHandle, thisHistoryURL, thisHistoryLength, index, elements=elements)
if int(node["parameters"]["breakMode"]) > 0: # 如果设置了退出循环的脚本条件
output = self.execute_code(int(
node["parameters"]["breakMode"]) - 1, node["parameters"]["breakCode"],
@ -1381,25 +1363,7 @@ class BrowserThread(Thread):
if self.BREAK:
self.BREAK = False
break
try:
changed_handle = self.browser.current_window_handle != thisHandle
except: # 如果网页被意外关闭了的情况下
self.browser.switch_to.window(
self.browser.window_handles[-1])
changed_handle = self.browser.window_handles[-1] != thisHandle
if changed_handle: # 如果执行完一次循环之后标签页的位置发生了变化
try:
while True: # 一直关闭窗口直到当前标签页
self.browser.close() # 关闭使用完的标签页
self.browser.switch_to.window(
self.browser.window_handles[-1])
if self.browser.current_window_handle == thisHandle:
break
except Exception as e:
self.print_and_log("关闭标签页发生错误:", e)
self.print_and_log(
"Error occurred while closing tab: ", e)
index, element = self.handleHistory(node, path, thisHistoryURL, thisHistoryLength, index, element=element)
index, element = self.handleHistory(node, path, thisHandle, thisHistoryURL, thisHistoryLength, index, element=element)
except NoSuchElementException:
self.print_and_log("Loop element not found: ", path)
self.print_and_log("找不到循环元素:", path)
@ -1447,6 +1411,7 @@ class BrowserThread(Thread):
code = get_output_code(output)
if code <= 0:
break
index, _ = self.handleHistory(node, "", thisHandle, thisHistoryURL, thisHistoryLength, index)
elif int(node["parameters"]["loopType"]) == 4: # 固定网址列表
# tempList = node["parameters"]["textList"].split("\r\n")
urlList = list(
@ -1696,8 +1661,11 @@ class BrowserThread(Thread):
try:
actions = ActionChains(self.browser) # 实例化一个action对象
if newTab == 1: # 在新标签页打开
# Ctrl + Click
actions.key_down(Keys.CONTROL).click(element).key_up(Keys.CONTROL).perform()
if sys.platform == "darwin": # Mac
actions.key_down(Keys.COMMAND).click(element).key_up(Keys.COMMAND).perform()
else:
# Ctrl + Click
actions.key_down(Keys.CONTROL).click(element).key_up(Keys.CONTROL).perform()
else:
actions.click(element).perform()
except Exception as e:
@ -1715,6 +1683,21 @@ class BrowserThread(Thread):
script = 'var result = document.evaluate(`' + path + \
'`, document, null, XPathResult.ANY_TYPE, null);for(let i=0;i<arguments[0];i++){result.iterateNext();} result.iterateNext().click();'
self.browser.execute_script(script, str(index)) # 用js的点击方法
elif click_way == 2: # 双击
try:
actions = ActionChains(self.browser) # 实例化一个action对象
actions.double_click(element).perform()
except Exception as e:
self.browser.execute_script("arguments[0].scrollIntoView();", element)
try:
actions = ActionChains(self.browser) # 实例化一个action对象
actions.double_click(element).perform()
except Exception as e:
self.print_and_log(f"Selenium双击元素{path}失败将尝试使用JavaScript双击")
self.print_and_log(f"Failed to double click element {path} with Selenium, will try to double click with JavaScript")
script = 'var result = document.evaluate(`' + path + \
'`, document, null, XPathResult.ANY_TYPE, null);for(let i=0;i<arguments[0];i++){result.iterateNext();} result.iterateNext().dispatchEvent(new MouseEvent("dblclick", {bubbles: true, cancelable: true, view: window}));'
self.browser.execute_script(script, str(index)) # 用js的点击方法
self.recordLog("点击元素|Click element: " + path)
except TimeoutException:
self.print_and_log(
@ -1797,7 +1780,6 @@ class BrowserThread(Thread):
self.print_and_log("History Length Error")
self.history["index"] = 0
self.scrollDown(param) # 根据参数配置向下滚动
# rt.end()
def get_content(self, p, element):
content = ""
@ -1824,7 +1806,7 @@ class BrowserThread(Thread):
downloadPic = 0
if downloadPic == 1:
download_image(self, content, "Data/Task_" +
str(self.id) + "/" + self.saveName + "/", element)
str(self.id) + "/" + self.saveName + "/images", element)
else: # 普通节点
if p["splitLine"] == 1:
text = extract_text_from_html(element.get_attribute('outerHTML'))
@ -1853,7 +1835,7 @@ class BrowserThread(Thread):
downloadPic = 0
if downloadPic == 1:
download_image(self, content, "Data/Task_" +
str(self.id) + "/" + self.saveName + "/", element)
str(self.id) + "/" + self.saveName + "/images", element)
else:
command = 'var arr = [];\
var content = arguments[0];\
@ -1965,6 +1947,8 @@ class BrowserThread(Thread):
content = element.get_attribute(attribute_name)
except:
content = ""
elif p["contentType"] == 15: # 常量值
content = p["JS"]
if content == None:
content = ""
return content
@ -2208,7 +2192,9 @@ if __name__ == '__main__':
"server_address": "http://localhost:8074",
"keyboard": True, # 是否监听键盘输入
"pause_key": "p", # 暂停键
"version": "0.6.0",
"version": "0.6.3",
"docker_driver": "",
"user_folder": "",
}
c = Config(config)
print(c)
@ -2283,7 +2269,9 @@ if __name__ == '__main__':
options.add_argument(
"--disable-blink-features=AutomationControlled") # TMALL 反扒
# 阻止http -> https的重定向
options.add_argument("--disable-features=CrossSiteDocumentBlockingIfIsolating,CrossSiteDocumentBlockingAlways,IsolateOrigins,site-per-process")
options.add_argument("--disable-web-security") # 禁用同源策略
options.add_argument('-ignore-certificate-errors')
options.add_argument('-ignore -ssl-errors')
@ -2302,35 +2290,43 @@ if __name__ == '__main__':
os.mkdir(tmp_user_folder_parent)
characters = string.ascii_letters + string.digits
for i in range(len(c.ids)):
id = c.ids[i]
# 从字符集中随机选择字符构成字符串
random_string = ''.join(random.choice(characters) for i in range(10))
tmp_user_data_folder = os.path.join(tmp_user_folder_parent, "user_data_" + str(id) + "_" + str(time.time()).replace(".","") + "_" + random_string)
tmp_options[i]["tmp_user_data_folder"] = tmp_user_data_folder
if os.path.exists(tmp_user_data_folder):
try:
shutil.rmtree(tmp_user_data_folder)
except:
pass
print(f"Copying user data folder to: {tmp_user_data_folder}, please wait...")
print(f"正在复制用户信息目录到: {tmp_user_data_folder},请稍等...")
if os.path.exists(absolute_user_data_folder):
try:
shutil.copytree(absolute_user_data_folder, tmp_user_data_folder)
print("User data folder copied successfully, if you exit the program before it finishes, please delete the temporary user data folder manually.")
print("用户信息目录复制成功,如果程序在运行过程中被手动退出,请手动删除临时用户信息目录。")
except:
tmp_user_data_folder = absolute_user_data_folder
print("Copy user data folder failed, use the original folder.")
print("复制用户信息目录失败,使用原始目录。")
else:
tmp_user_data_folder = absolute_user_data_folder
print("Cannot find user data folder, create a new folder.")
print("未找到用户信息目录,创建新目录。")
options = tmp_options[i]["options"]
options.add_argument(
f'--user-data-dir={tmp_user_data_folder}') # TMALL 反扒
options.add_argument("--profile-directory=Default")
if c.user_folder == "":
id = c.ids[i]
# 从字符集中随机选择字符构成字符串
random_string = ''.join(random.choice(characters) for i in range(10))
tmp_user_data_folder = os.path.join(tmp_user_folder_parent, "user_data_" + str(id) + "_" + str(time.time()).replace(".","") + "_" + random_string)
tmp_options[i]["tmp_user_data_folder"] = tmp_user_data_folder
if os.path.exists(tmp_user_data_folder):
try:
shutil.rmtree(tmp_user_data_folder)
except:
pass
print(f"Copying user data folder to: {tmp_user_data_folder}, please wait...")
print(f"正在复制用户信息目录到: {tmp_user_data_folder},请稍等...")
if os.path.exists(absolute_user_data_folder):
try:
shutil.copytree(absolute_user_data_folder, tmp_user_data_folder)
print("User data folder copied successfully, if you exit the program before it finishes, please delete the temporary user data folder manually.")
print("用户信息目录复制成功,如果程序在运行过程中被手动退出,请手动删除临时用户信息目录。")
except:
tmp_user_data_folder = absolute_user_data_folder
print("Copy user data folder failed, use the original folder.")
print("复制用户信息目录失败,使用原始目录。")
else:
tmp_user_data_folder = absolute_user_data_folder
print("Cannot find user data folder, create a new folder.")
print("未找到用户信息目录,创建新目录。")
options.add_argument(
f'--user-data-dir={tmp_user_data_folder}') # TMALL 反扒
print(f"Use local user data folder: {tmp_user_data_folder}")
print(f"使用本地用户信息目录: {tmp_user_data_folder}")
else:
options.add_argument(
f'--user-data-dir={c.user_folder}')
print(f"Use specified user data folder: {c.user_folder}, please note if you are using docker, this user folder path should be the path inside the docker container.")
print(f"使用指定的用户信息目录: {c.user_folder}请注意如果您正在使用docker此用户文件夹路径应是容器内的路径。")
print(
"如果报错Selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally说明有之前运行的Chrome实例没有正常关闭请关闭之前打开的所有Chrome实例后再运行程序即可。")
print(
@ -2343,9 +2339,13 @@ if __name__ == '__main__':
print("id: ", id)
if c.read_type == "remote":
print("remote")
content = requests.get(
try:
content = requests.get(
c.server_address + "/queryExecutionInstance?id=" + str(id))
service = json.loads(content.text) # 加载服务信息
service = json.loads(content.text) # 加载服务信息
except:
print("Cannot connect to the server, please make sure that the EasySpider Main Program is running, or you can change the --read_type parameter to 'local' to read the task information from the local task file without keeping the EasySpider Main Program running.")
print("无法连接到服务器请确保EasySpider主程序正在运行或者您可以将--read_type参数更改为'local'以实现从本地任务文件中读取任务信息而无需保持EasySpider主程序运行。")
else:
print("local")
local_folder = os.path.join(os.getcwd(), "execution_instances")
@ -2370,8 +2370,8 @@ if __name__ == '__main__':
cloudflare = 0
if cloudflare == 0:
options.add_argument('log-level=3') # 隐藏日志
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(id))
print("Data path:", path)
path = os.path.join(os.path.abspath("./"), "Data", "Task_" + str(id), "files")
print("文件下载路径|File Download path:", path)
options.add_experimental_option("prefs", {
# 设置文件下载路径
"download.default_directory": path,
@ -2396,8 +2396,17 @@ if __name__ == '__main__':
except:
browser = "chrome"
if browser == "chrome":
selenium_service = Service(executable_path=driver_path)
browser_t = MyChrome(service=selenium_service, options=options)
if c.docker_driver == "":
print("Using local driver")
selenium_service = Service(executable_path=driver_path)
browser_t = MyChrome(service=selenium_service, options=options, mode='local_driver')
else:
print("Using remote driver")
# Use docker driver, default address is http://localhost:4444/wd/hub
# Headless mode
# options.add_argument("--headless")
# print("Headless mode")
browser_t = MyChrome(command_executor=c.docker_driver, options=options, mode='remote_driver')
elif browser == "edge":
from selenium.webdriver.edge.service import Service as EdgeService
from selenium.webdriver.edge.options import Options as EdgeOptions
@ -2458,6 +2467,7 @@ if __name__ == '__main__':
# print("Passing the Cloudflare verification mode is sometimes unstable. If the verification fails, you need to try again every few minutes, or you can change to a new user information folder and then execute the task.")
# 使用监听器监听键盘输入
try:
from pynput.keyboard import Key, Listener
if c.keyboard:
with Listener(on_press=on_press_creator(press_time, event),
on_release=on_release_creator(event, press_time)) as listener:


@ -19,11 +19,16 @@ desired_capabilities["pageLoadStrategy"] = "none"
class MyChrome(webdriver.Chrome):
class MyChrome(webdriver.Chrome, webdriver.Remote):
def __init__(self, *args, **kwargs):
def __init__(self, mode='local_driver', *args, **kwargs):
self.iframe_env = False # 现在的环境是root还是iframe
super().__init__(*args, **kwargs) # 调用父类的 __init__
self.mode = mode
if mode == "local_driver":
webdriver.Chrome.__init__(self, *args, **kwargs)
elif mode == "remote_driver":
webdriver.Remote.__init__(self, *args, **kwargs)
# super().__init__(*args, **kwargs) # 调用父类的 __init__
# def find_element(self, by=By.ID, value=None, iframe=False):
# # 在这里改变查找元素的行为
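
Review note: the hunk above picks one of two base-class initializers from a `mode` argument instead of calling `super().__init__`. A minimal sketch of that dispatch pattern follows, using hypothetical stand-in classes (`LocalBase`, `RemoteBase`, `ModeDriver`) so it runs without Selenium. The `ValueError` branch is an assumption added for illustration; the diff itself silently skips initialization for an unknown mode.

```python
class LocalBase:
    """Hypothetical stand-in for a locally launched driver class."""

    def __init__(self, service=None, options=None):
        self.backend = "local"
        self.service = service
        self.options = options


class RemoteBase:
    """Hypothetical stand-in for a remote (for example Docker-hosted) driver class."""

    def __init__(self, command_executor=None, options=None):
        self.backend = "remote"
        self.command_executor = command_executor
        self.options = options


class ModeDriver(LocalBase, RemoteBase):
    """Chooses which base initializer to run, mirroring the shape of the diff."""

    def __init__(self, mode="local_driver", *args, **kwargs):
        self.iframe_env = False  # same bookkeeping flag as in the diff
        self.mode = mode
        if mode == "local_driver":
            LocalBase.__init__(self, *args, **kwargs)
        elif mode == "remote_driver":
            RemoteBase.__init__(self, *args, **kwargs)
        else:
            # Assumption: fail loudly rather than leave the object half-built.
            raise ValueError(f"unknown mode: {mode!r}")


local = ModeDriver(service="svc", options="opts", mode="local_driver")
remote = ModeDriver(command_executor="http://localhost:4444/wd/hub",
                    options="opts", mode="remote_driver")
print(local.backend, remote.backend)
```

Because `mode` is the first positional parameter, callers should pass it by keyword, as the call sites in the diff do.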


@ -59,7 +59,31 @@ def send_email(config):
smtp_server.quit()
except:
pass
def rename_downloaded_file(download_dir, stop_event):
original_files = set(os.listdir(download_dir))
while not stop_event.is_set():
files = os.listdir(download_dir)
for file in files:
if file in original_files:
continue # 跳过原始文件和已重命名的文件
full_path = os.path.join(download_dir, file)
if not full_path.endswith('.crdownload') and not full_path.endswith('.htm') and not full_path.endswith('.html') and not file.startswith('esfile_'):
new_name = "esfile_" + file.split('/')[-1] + '_' + str(uuid.uuid4()) + '_' + file.split('/')[-1]
new_path = os.path.join(download_dir, new_name)
try:
os.rename(full_path, new_path)
original_files.add(new_name) # 记录新文件名以避免再次重命名
print(f"文件已重命名为|File has been renamed to: {new_path}")
except:
print("文件重命名失败|File rename failed")
time.sleep(1) # 每一秒检查一次
# print("下载文件重命名监控中,请等待...|Download file rename monitoring, please wait...")
print("下载文件重命名监控已停止。|Download file rename monitoring has stopped.")
def is_valid_url(url):
try:
@ -505,10 +529,17 @@ def write_to_excel(file_name, data, types, record):
for i in range(len(line)):
if record[i]:
to_write.append(line[i])
ws.append(to_write)
try:
ws.append(to_write)
except:
print("写入Excel文件失败请检查数据类型是否正确。")
print("Failed to write to Excel file, please check if the data type is correct.")
# 保存工作簿
wb.save(file_name)
try:
wb.save(file_name)
except:
print("保存Excel文件失败请检查文件是否被其他程序打开。")
print("Failed to save Excel file, please check if the file is opened by other programs.")
class Time:
def __init__(self, type1=""):


@ -1,88 +1,17 @@
Official Site: https://www.easyspider.net
Welcome to promote this software to other friends.
You are welcome to recommend this software to friends and to star our GitHub repository!
This version is for Windows 10 x64 and above.
This version is for Windows 10/Windows Server 2016 x64 and above.
The Windows version supports **Windows 10 and above**. If you want to use EasySpider on Windows 7, please download the Windows x32 version of EasySpider.
If you want to use EasySpider on Windows 7, please download the Windows x32 version of EasySpider. No prebuilt version supports Windows Server 2012 or below; on those systems you need to compile EasySpider yourself.
The software's open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation can be found at: https://github.com/NaiboWang/EasySpider/wiki
Video Tutorial: https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp
This software is not a trojan or virus. If antivirus software such as Windows Defender mistakenly flags it as one, please restore it, or open "EasySpider.bat" to run the software instead.
Tasks can be imported from other machines by simply placing the .json files from the "tasks" folder of those machines into the "tasks" folder of this directory. Similarly, execution instance files can be imported by copying the .json files from the "execution_instances" folder. Note that only files named with a number greater than 0 are supported in both folders.
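
The import step above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `is_importable` and `import_tasks` are hypothetical and not part of EasySpider, and the sketch assumes that "named with a number greater than 0" means file names such as `12.json`. Existing files with the same name in the target folder are overwritten.

```python
import re
import shutil
from pathlib import Path


def is_importable(file_name):
    """Return True for names like '12.json' whose number is greater than 0."""
    match = re.fullmatch(r"(\d+)\.json", file_name)
    return bool(match) and int(match.group(1)) > 0


def import_tasks(source_dir, target_dir):
    """Copy importable .json files from source_dir into target_dir.

    Returns the copied file names in sorted order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).glob("*.json")):
        if is_importable(path.name):
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

For example, `import_tasks("other_machine/tasks", "tasks")` would copy task files, and the same call with the "execution_instances" folders would copy execution instance files.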
======Version Update Instructions======
For new features in versions later than v0.3.2, please see the GitHub releases page: https://github.com/NaiboWang/EasySpider/releases
-----v0.3.2-----
## Update Instruction
1. Selected child element operations can delete fields and unmark deleted fields in real-time in the browser.
2. Selecting child elements adds a selection mode that allows you to choose only the child elements that are present in all blocks or the child elements that are the same as the first selected block.
3. In the text input and webpage open options, you can use the extracted field value as a variable for text input, represented by Field["field_name"].
4. Files can be downloaded, such as PDF files.
5. Fixed a bug where the software could display a blank screen for about 10 seconds after opening, making it usable in intranets, darknets, and any local network.
6. Fixed a bug where the current page URL and title could not be extracted.
7. Fixed a bug where OCR recognition could fail to extract information.
8. Updated extraction logic to save locally every 10 records collected.
9. When modifying a task, the default anchor position is set to after the last operation in the task flow.
10. Updated Chrome version to 114.
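
Item 3 above describes using an extracted field value as a variable written as Field["field_name"]. How EasySpider resolves these placeholders is not shown here; the following is a minimal sketch of one way such a substitution could work. The function name `fill_fields` and the choice to leave unknown field names untouched are assumptions made for illustration.

```python
import re

# Matches placeholders of the form Field["field_name"].
FIELD_PATTERN = re.compile(r'Field\["([^"\]]+)"\]')


def fill_fields(template, fields):
    """Replace each Field["name"] placeholder with the value from `fields`.

    Placeholders whose name is not in `fields` are left as written.
    """
    def replace(match):
        name = match.group(1)
        if name in fields:
            return str(fields[name])
        return match.group(0)

    return FIELD_PATTERN.sub(replace, template)


url = fill_fields('https://example.com/search?q=Field["keyword"]',
                  {"keyword": "spider"})
print(url)
```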
-----v0.3.1-----
## Update Instruction
1. Advanced Operations:
- Custom scripts can be executed in the workflow, including executing JavaScript commands in the browser and invoking scripts at the operating system level. The command's return value can be obtained and recorded, greatly expanding the scope of operations.
- Before and after each operation, you can specify a JavaScript command to be executed targeting the current located element.
2. Custom scripts are also supported in the conditions and loop conditions. The return value of the custom script determines the condition for the judgment of conditions and loops, greatly enhancing the flexibility of tasks. The ability to use the break statement within a loop is added, allowing custom operations to manipulate elements within the loop.
3. Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
4. Added the functionality to extract the background image URL of elements, current page title, and current page URL.
5. Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
6. Added the functionality to download images.
7. Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
8. Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
9. Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more.
11. Added instructions on how to execute tasks from the command line.
12. Added headless mode configuration, allowing the software to run without a browser interface.
13. Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
14. Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
15. Fixed the issue where the input box would freeze after saving a task.
16. Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
17. Added the functionality to move the mouse to an element.
18. Displays a prompt when an element cannot be found.
19. Fixed the webpage scrolling bug.
20. The task name is initialized with the value of the page title upon the first visit.
21. Added version update prompts.
22. Added the information of the publisher as requested.
23. Updated Chrome version to 113.


@ -0,0 +1,10 @@
打开报错DiscardVirtualMemory...KERNEL32.dll说明如下
64位版本的易采集EasySpider只支持支持Windows 10/Windows Server 2016 x64及以上版本。
对于Windows 7任意版本包括x64和x32版本以及Windows 10 x32版本请下载Windows的32位版本使用。无任何版本支持Windows Server 2012及以下版本系统这些系统下需要自行编译运行。
If you open the software and see an error like: DiscardVirtualMemory...KERNEL32.dll, the reason is:
This 64-bit version of EasySpider is for Windows 10/Windows Server 2016 x64 and above.
If you want to use EasySpider on Windows 7, please download the Windows x32 version of EasySpider. No prebuilt version supports Windows Server 2012 or below; on those systems you need to compile EasySpider yourself.

View File

@ -23,7 +23,7 @@ For more complex operations, please download the source code and compile it for
"""
# 请在下面编写你的代码,不要有代码缩进!!! | Please write your code below, do not indent the code!!!
print(globals())
# 导包 | Import packages
from selenium.common.exceptions import ElementClickInterceptedException
@ -56,3 +56,20 @@ finally:
print("All parameters:", self.outputParameters)
print(test(3))
print("执行完毕|Execution completed")
import time
time.sleep(3)
def new_line(outputParameters, maxViewLength, record):
line = []
print("Use this function to print a new line in the console")
i = 0
for value in outputParameters.values():
line.append(value)
if record[i]:
print(value[:maxViewLength], " ", end="")
i += 1
print("")
return line
new_line(self.outputParameters, 10, [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
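
Review note: the `new_line` helper added above depends on `self.outputParameters`, so it only runs inside a task. A standalone sketch of the same idea follows, with hypothetical sample data. Unlike the diff, this variant converts values with `str()` before truncating and tolerates a `record` list shorter than the parameter dict; both are assumptions made for illustration.

```python
def new_line(output_parameters, max_view_length, record):
    """Collect all current field values into one row and print a short preview.

    Only fields flagged True in `record` are previewed; every value is
    still included in the returned row.
    """
    line = []
    preview = []
    for i, value in enumerate(output_parameters.values()):
        line.append(value)
        if i < len(record) and record[i]:
            preview.append(str(value)[:max_view_length])
    print(" ".join(preview))
    return line


# Hypothetical sample data standing in for self.outputParameters.
sample = {"title": "EasySpider visual crawler",
          "url": "https://www.easyspider.net"}
row = new_line(sample, 10, [True, False])
```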

File diff suppressed because one or more lines are too long


@ -1 +1 @@
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"12/7/2023, 2:56:47 AM","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}}]}
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"2024-01-05 22:08:46","version":"0.6.0","saveThreshold":10,"quitWaitTime":3,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"},{"id":1,"name":"loopTimes_1","nodeId":5,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":10,"value":10}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,5],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":3,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":4,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":3,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button 
download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":2,"index":5,"parentId":0,"type":1,"option":8,"title":"循环 - 单个元素","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"//body","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":10,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}


@ -1 +1 @@
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"07/12/2023, 03:43:34","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"desc":"https://www.zhihu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","wai
tElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}}]}
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"2023-12-27 20:05:50","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"知了个乎","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"},{"id":1,"name":"loopTimes_1","nodeId":4,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":0,"value":0}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,4,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/d
iv[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":4,"index":3,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}},{"id":2,"index":4,"parentId":0,"type":1,"option":8,"title":"循环 - 
单个元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}


@ -1 +1 @@
{"id":70,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}
{"id":-2,"name":"京东全球版-专业的综合网上购物商城","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/24/2023, 8:21:45 PM","version":"0.3.1","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":""}},{"id":3,"index":3,"parentId":2,"type":0,"option":7,"title":"移动到元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":true,"xpath":"/html/body/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div","wait":2,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"allXPaths":"","loopType":1}}]}

File diff suppressed because one or more lines are too long


@ -1,105 +1,17 @@
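Feel free to recommend this software to anyone who may need it!
Feel free to recommend this software to anyone who may need it, and to star our GitHub repository.
Official website: https://www.easyspider.cn
Supports Windows 10 x64 and later.
Supports Windows 10 / Windows Server 2016 x64 and later.
For any edition of Windows 7 (both x64 and x32) and for Windows 10 x32, please download and use the 32-bit Windows build.
For any edition of Windows 7 (both x64 and x32) and for Windows 10 x32, please download and use the 32-bit Windows build. No release supports Windows Server 2012 or earlier; on those systems you need to compile and run the software yourself.
Open-source code repository on GitHub: https://github.com/NaiboWang/EasySpider
Official documentation: https://github.com/NaiboWang/EasySpider/wiki
Video tutorial: https://www.bilibili.com/video/BV1th411A7ey/
This software is definitely not a Trojan or a virus. If antivirus software such as Windows Defender mistakenly flags it as a virus, please restore it, or open “EasySpider.bat” to run the software.
Tasks can be imported from other machines: simply place the .json files from the other machine's tasks folder into the tasks folder in this directory. Likewise, execution-instance files can be imported by copying the .json files from the execution_instances folder. Note that the .json files in both folders must be named with a number greater than 0.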
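The naming rule above can be checked before copying files between machines. The following is a minimal sketch, assuming the rule means the file name without its `.json` extension must be a positive integer; `importable_task_files` is a hypothetical helper and not part of EasySpider.

```python
from pathlib import Path


def importable_task_files(folder):
    """Return the .json files in `folder` whose name is a number greater than 0."""
    valid = []
    for path in Path(folder).glob("*.json"):
        stem = path.stem
        # Accept only plain ASCII digits, then require a value above zero.
        if stem.isascii() and stem.isdigit() and int(stem) > 0:
            valid.append(path)
    # Sort numerically so that 2.json comes before 10.json.
    return sorted(valid, key=lambda p: int(p.stem))
```

Files that the helper does not return (for example `0.json` or `mytask.json`) would need to be renamed before they are placed in the `tasks` or `execution_instances` folder.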
======Release Notes======
For release notes of versions after v0.3.2, see the GitHub Releases page: https://github.com/NaiboWang/EasySpider/releases
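-----v0.3.2-----
## Release Notes
1. The select-child-elements operation can now delete fields, and deleted fields are unmarked in the browser in real time.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/e016c832-6ff9-4814-b86c-38787e73aa30" width=50% />
2. Added selection modes for child elements: you can select only the child elements present in every block, or the child elements in all blocks that match those of the first selected block.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/0082b11d-96bc-43f1-acdb-8280decb48b4" width=50% />
3. In the input-text and open-page options, the most recently extracted field value can be used **as a variable** for text input; the variable is written as `Field["field name"]`.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/d81cd082-e01a-490e-85f7-9baac93510d8)
4. Files such as PDFs can be downloaded.
5. Fixed a bug where the window could stay blank for about 10 seconds after launch, so the software can now be used on intranets, the dark web, and any LAN.
6. Fixed a bug where the current page URL and title might fail to be extracted.
7. Fixed a bug where OCR recognition might fail to extract content.
8. Extraction logic updated to save locally once every 10 collected records.
9. When modifying a task, the default anchor position is now after the last operation in the task flow.
10. Updated Chrome to version 114.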
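-----v0.3.1-----
### We strongly recommend watching the new-feature walkthrough videos
Videos covering the latest features have been uploaded to Bilibili; they are very useful and recommended viewing.
[[Important] Custom condition checks using the return value of a JS command inside a loop item - Part 2](https://www.bilibili.com/video/BV1mu411x7Nn/)
[How to run multiple tasks at once (parallel instances)](https://www.bilibili.com/video/BV13c411G7LE/)
[How to run your own JS code and system code (custom operations)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)
[How to customize loop and condition checks - Part 1](https://www.bilibili.com/video/BV1Ys4y1z777/)
[How to screenshot elements and pages, and a guide to (headless-mode) command-line execution](https://www.bilibili.com/video/BV1dV4y1z764/)
[OCR recognition of element content](https://www.bilibili.com/video/BV1xz4y1b72D/)
Note: the `.json` files in the task folder for v0.3.1 are incompatible with all earlier versions; please redesign your tasks for v0.3.1.
## Release Notes
1. Custom operations:
- You can **run custom scripts** in the task flow, including **executing JavaScript in the browser** and **invoking operating-system-level scripts**, and you can **capture and record the command's return value**, which greatly expands what a task can do.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/06e63a06-328d-4339-b40b-2d57c94cee66)
- Before and after each operation, you can specify a piece of JavaScript to run against the currently located element.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/dde64388-5668-40ff-951e-fb8f60655c49" height=50% width=50%>
2. **Condition checks and loop conditions** likewise gain **custom script execution**: whether the script's return value is true serves as the condition for the branch or loop, which also greatly increases task flexibility. Loops can now be broken out of from code, and custom operations can be configured to act on elements inside the loop.
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/9dea0564-1a1c-487d-9fa4-427c5e284796)
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/5ce7cf50-e5c9-4714-a83b-9c65934e9c68" width=50%></img>
3. Multiple XPaths can be generated at once for the user to choose from, and the **XPath Helper extension is preinstalled** for debugging XPaths.
4. Added extraction of an element's background-image URL, the current page title, and the current page URL.
5. Added saving of element screenshots; use this to screenshot an element or the whole page (works best with headless mode).
6. Added image download.
7. Added OCR recognition of elements; to use it you must first install the Tesseract library yourself: [https://blog.csdn.net/u010454030/article/details/80515501](https://blog.csdn.net/u010454030/article/details/80515501)
8. The return value of JavaScript executed on an element can be extracted directly, enabling things like regular-expression matching or reading the element's background color.
9. Added switching of dropdown options, and extraction of the currently selected dropdown value and text.
<img src="https://github.com/NaiboWang/EasySpider/assets/30287768/c0b2bec1-2a97-4516-930e-1b310697212b" width=50%></img>
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/42cc0009-00d1-4c5c-af47-0fa6340fba80)
10. Greatly expanded the usage hints and explanations to make the software easier to use, e.g. how iframe tags are handled, what each option's parameters mean, and how to modify a loop item's XPath.
11. When running a task, a hint is now shown on how to execute tasks from the command line: [https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction](https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction).
![image](https://github.com/NaiboWang/EasySpider/assets/30287768/a9e774df-e345-4d51-b7c9-2c4dac0ec624)
12. Added parallel multi-instance mode.
13. Added a headless-mode setting, i.e. running without a browser window.
14. Fixed Chinese paths not being recognized correctly when using the user-profile browser mode.
15. Fixed a hang when a conditional branch had no unconditional branch.
16. Fixed input boxes freezing after a task was saved.
17. The open-page and click-element operations gained a setting for the maximum page-load wait time.
18. Added the move-mouse-to-element operation.
19. A notice is now shown when an element cannot be found.
20. Fixed a page-scrolling bug.
21. Added an operation for adding new extract-data fields.
22. The task name is initialized to the title of the first page visited.
23. Added a version-update notice.
24. Added producer information, as requested.
25. Updated Chrome to version 113.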


@ -1 +1 @@
Note: The various folders within this directory are not directly usable software, but temporary folders used by the author at the time of release. Please visit the official website to download readily usable software packages: https://www.easyspider.cn
Note: The various folders within this directory are not directly usable software, but temporary folders used by the author at the time of release. Please visit the official website to download readily usable software packages: https://www.easyspider.net


@ -64,49 +64,49 @@ def compress_folder_to_7z_split(folder_path, output_file):
except:
subprocess.call(["7zz", "a", "-v95m", output_file, folder_path])
easyspider_version = "0.6.0"
easyspider_version = "0.6.3"
if __name__ == "__main__":
if sys.platform == "win32" and platform.architecture()[0] == "64bit":
file_name = f"EasySpider_{easyspider_version}_windows_x64.7z"
if os.path.exists("./EasySpider_windows_x64/user_data"):
shutil.rmtree("./EasySpider_windows_x64/user_data")
if os.path.exists("./EasySpider_windows_x64/Data"):
shutil.rmtree("./EasySpider_windows_x64/Data")
if os.path.exists("./EasySpider_windows_x64/execution_instances"):
shutil.rmtree("./EasySpider_windows_x64/execution_instances")
if os.path.exists("./EasySpider_windows_x64/config.json"):
os.remove("./EasySpider_windows_x64/config.json")
if os.path.exists("./EasySpider_windows_x64/mysql_config.json"):
os.remove("./EasySpider_windows_x64/mysql_config.json")
if os.path.exists("./EasySpider_windows_x64/TempUserDataFolder"):
shutil.rmtree("./EasySpider_windows_x64/TempUserDataFolder")
os.mkdir("./EasySpider_windows_x64/Data")
os.mkdir("./EasySpider_windows_x64/execution_instances")
# compress_folder_to_7z_split("./EasySpider_windows_x64", file_name)
file_name = f"EasySpider_{easyspider_version}_Windows_x64.7z"
if os.path.exists("./EasySpider_Windows_x64/user_data"):
shutil.rmtree("./EasySpider_Windows_x64/user_data")
if os.path.exists("./EasySpider_Windows_x64/Data"):
shutil.rmtree("./EasySpider_Windows_x64/Data")
if os.path.exists("./EasySpider_Windows_x64/execution_instances"):
shutil.rmtree("./EasySpider_Windows_x64/execution_instances")
if os.path.exists("./EasySpider_Windows_x64/config.json"):
os.remove("./EasySpider_Windows_x64/config.json")
if os.path.exists("./EasySpider_Windows_x64/mysql_config.json"):
os.remove("./EasySpider_Windows_x64/mysql_config.json")
if os.path.exists("./EasySpider_Windows_x64/TempUserDataFolder"):
shutil.rmtree("./EasySpider_Windows_x64/TempUserDataFolder")
os.mkdir("./EasySpider_Windows_x64/Data")
os.mkdir("./EasySpider_Windows_x64/execution_instances")
# compress_folder_to_7z_split("./EasySpider_Windows_x64", file_name)
# print(f"Compress {file_name} Split successfully!")
compress_folder_to_7z("./EasySpider_windows_x64", file_name)
compress_folder_to_7z("./EasySpider_Windows_x64", file_name)
print(f"Compress {file_name} successfully!")
elif sys.platform == "win32" and platform.architecture()[0] == "32bit":
file_name = f"EasySpider_{easyspider_version}_windows_x32.7z"
if os.path.exists("./EasySpider_windows_x32/user_data"):
shutil.rmtree("./EasySpider_windows_x32/user_data")
if os.path.exists("./EasySpider_windows_x32/Data"):
shutil.rmtree("./EasySpider_windows_x32/Data")
if os.path.exists("./EasySpider_windows_x32/execution_instances"):
shutil.rmtree("./EasySpider_windows_x32/execution_instances")
if os.path.exists("./EasySpider_windows_x32/config.json"):
os.remove("./EasySpider_windows_x32/config.json")
if os.path.exists("./EasySpider_windows_x32/mysql_config.json"):
os.remove("./EasySpider_windows_x32/mysql_config.json")
if os.path.exists("./EasySpider_windows_x32/TempUserDataFolder"):
shutil.rmtree("./EasySpider_windows_x32/TempUserDataFolder")
os.mkdir("./EasySpider_windows_x32/Data")
os.mkdir("./EasySpider_windows_x32/execution_instances")
# compress_folder_to_7z_split("./EasySpider_windows_x32", file_name)
file_name = f"EasySpider_{easyspider_version}_Windows_x32.7z"
if os.path.exists("./EasySpider_Windows_x32/user_data"):
shutil.rmtree("./EasySpider_Windows_x32/user_data")
if os.path.exists("./EasySpider_Windows_x32/Data"):
shutil.rmtree("./EasySpider_Windows_x32/Data")
if os.path.exists("./EasySpider_Windows_x32/execution_instances"):
shutil.rmtree("./EasySpider_Windows_x32/execution_instances")
if os.path.exists("./EasySpider_Windows_x32/config.json"):
os.remove("./EasySpider_Windows_x32/config.json")
if os.path.exists("./EasySpider_Windows_x32/mysql_config.json"):
os.remove("./EasySpider_Windows_x32/mysql_config.json")
if os.path.exists("./EasySpider_Windows_x32/TempUserDataFolder"):
shutil.rmtree("./EasySpider_Windows_x32/TempUserDataFolder")
os.mkdir("./EasySpider_Windows_x32/Data")
os.mkdir("./EasySpider_Windows_x32/execution_instances")
# compress_folder_to_7z_split("./EasySpider_Windows_x32", file_name)
# print(f"Compress {file_name} Split successfully!")
compress_folder_to_7z("./EasySpider_windows_x32", file_name)
compress_folder_to_7z("./EasySpider_Windows_x32", file_name)
print(f"Compress {file_name} successfully!")
elif sys.platform == "linux" and platform.architecture()[0] == "64bit":
file_name = f"EasySpider_{easyspider_version}_Linux_x64.tar.xz"
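Both Windows branches in the hunk above repeat the same cleanup sequence with only the folder name changed. A minimal sketch of how that sequence could be factored into one function follows; the folder and file names are taken from the script, while `reset_release_folder` and the three constants are hypothetical names introduced here and are not part of the repository.

```python
import os
import shutil

# Names taken from the release script above; the helper itself is hypothetical.
DIRS_TO_REMOVE = ["user_data", "Data", "execution_instances", "TempUserDataFolder"]
FILES_TO_REMOVE = ["config.json", "mysql_config.json"]
DIRS_TO_RECREATE = ["Data", "execution_instances"]


def reset_release_folder(root):
    """Remove per-user state from `root`, then recreate the empty output folders."""
    for name in DIRS_TO_REMOVE:
        # ignore_errors=True covers the case where the folder does not exist.
        shutil.rmtree(os.path.join(root, name), ignore_errors=True)
    for name in FILES_TO_REMOVE:
        path = os.path.join(root, name)
        if os.path.exists(path):
            os.remove(path)
    for name in DIRS_TO_RECREATE:
        os.makedirs(os.path.join(root, name), exist_ok=True)
```

With such a helper, each platform branch would reduce to a call like `reset_release_folder("./EasySpider_Windows_x64")` followed by the compression step. One behavioral difference from the original: `os.makedirs` also creates `root` if it is missing, whereas the script's `os.mkdir` would raise.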

Binary file not shown.

Binary file not shown.


@ -1,4 +1,8 @@
# 环境编译说明|Environment Compilation Instruction
## Video Tutorial
[Guide to compiling the program from source and to designing, running, and debugging tasks, based on Ubuntu 24.04](https://www.bilibili.com/video/BV1VE421P7yj/)
# 环境编译说明 | Environment Compilation Instruction
EasySpider分三部分
@ -19,35 +23,35 @@ EasySpider is divided into three parts:
This section covers the compilation instructions for the `main program`.
## 建议编译顺序|Suggested Compilation Order
## 建议编译顺序 | Suggested Compilation Order
1. 编译浏览器扩展,否则在主程序执行时会提示找不到`EasySpider_zh.crx`的错误。
2. 编译主程序,此时主程序可以正常运行,但无法执行任务,只能设计任务。
3. 编译执行阶段程序,否则无法执行程序,只能设计程序
3. 编译执行阶段程序,否则无法执行任务,只能设计任务
-----
1. Compile the browser extension first; otherwise the main program will report at startup that `EasySpider_en.crx` cannot be found.
2. Compile the main program. At this point the main program runs normally, but it can only design tasks, not execute them.
3. Compile the execution-stage program; otherwise programs cannot be executed, only designed.
3. Compile the execution-stage program; otherwise tasks cannot be executed, only designed.
## 注意事项|Note
## 注意事项 | Note
请记住每当EasySpider扩展程序和执行程序更新时都要更新`EasySpider.crx``easyspider_executestage`文件。
Remember to update the `EasySpider.crx` and `easyspider_executestage` files whenever the EasySpider extension and execution program are updated.
## 环境构建|Environment Setup
## 环境构建 | Environment Setup
以下以Windows x64版本为例。
The following uses the Windows x64 version as an example.
### 浏览器和驱动|Browser and Driver
### 浏览器和驱动 | Browser and Driver
实在搞不定本节的情况下下载一个直接能用的EasySpider并把文件夹内的`EasySpider\resources\app\chrome_win64`文件夹拷贝到此`ElectronJS`文件夹下即可。
实在搞不定本节的情况下下载一个直接能用的EasySpider并把文件夹内的`EasySpider\resources\app\chrome_win64`文件夹拷贝到此`ElectronJS`文件夹下,并把`chrome_win64`文件夹下的`execute.sh`在原文件夹下复制一份并命名为`execute_win64.sh`即可。
If you're unable to handle the tasks in this section, you can download a ready-to-use EasySpider. Simply copy the `EasySpider\resources\app\chrome_win64` folder from the downloaded files and paste it into the ElectronJS folder.
If you're unable to handle the tasks in this section, you can download a ready-to-use EasySpider, and copy the `EasySpider\resources\app\chrome_win64` folder to this `ElectronJS` folder, then copy the `execute.sh` script found in the `chrome_win64` folder and rename it as `execute_win64.sh` in the same location.
------
@ -66,7 +70,7 @@ chrome_linux64/ # for linux x64
chrome_mac64/ # for mac x64
```
然后,从下面的页面下载和**自己安装的Chrome版本一致**的Chromedriver[https://chromedriver.chromium.org/downloads](https://chromedriver.chromium.org/downloads)把chromedriver放入刚刚的`chrome`文件夹内,并更名为下面的格式:
然后,从下面的页面下载和**自己安装的Chrome版本一致**的Chromedriver[https://googlechromelabs.github.io/chrome-for-testing/](https://googlechromelabs.github.io/chrome-for-testing/)把chromedriver放入刚刚的`chrome`文件夹内,并更名为下面的格式:
```
chromedriver_win32.exe # for windows x32
@ -77,7 +81,7 @@ chromedriver_mac64 # for mac x64
例如如果您想在Windows x64平台上构建此软件那么您首先需要下载适用于Windows x64的Chrome浏览器并将整个`chrome`文件夹复制到`ElectronJS`文件夹中,然后将文件夹重命名为`chrome_win64`。假设您下载的Chrome版本是110。接下来下载一个适用于Windows x64的110版本的ChromeDriver并将其放入`chrome_win64`文件夹中,然后将其重命名为`chromedriver_win64.exe`
最后,把此文件夹内的`stealth.min.js``execute.bat`文件拷贝入`chrome`文件夹内。
最后,把此`ElectronJS`文件夹内的`stealth.min.js``execute_win64.bat`文件拷贝入`chrome_win64`文件夹内**这一步不要忘**
Download Chrome from https://www.google.com/chrome/ and put it into this folder, named in the following format:
@ -100,33 +104,31 @@ chromedriver_mac64 # for mac x64
For example, if you want to build this software on Windows x64 platform, then you should first download a Chrome for Windows x64, then copy the whole `chrome` folder to this `ElectronJS` folder and rename the folder to `chrome_win64`, assume the Chrome version you downloaded is 110; then, download a `chromedriver.exe` with version 110 for Windows x64, and put it into the `chrome_win64` folder, then rename it to `chromedriver_win64.exe`.
Finally, copy the `stealth.min.js` and `execute.bat` (for Windows x64) file in this folder to these `chrome` folders.
Finally, copy the `stealth.min.js` and `execute_win64.bat` file in this `ElectronJS` folder to the `chrome_win64` folder **(do not forget this step)**.
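Because a missing driver or helper script only shows up when a task is run, it can help to check the browser folder right after the copy step. The sketch below assumes the naming conventions described in this section, with the execute script names for `win32` and `linux64` taken from the `main.js` paths later in this diff; treating `stealth.min.js` as required for every target is an assumption, and `missing_runtime_files` is a hypothetical helper.

```python
from pathlib import Path

# Folder and file names follow this section and the paths used by main.js.
# Requiring stealth.min.js for every target is an assumption of this sketch.
RUNTIME_FILES = {
    "win32": ("chrome_win32", ["chromedriver_win32.exe", "stealth.min.js", "execute_win32.bat"]),
    "win64": ("chrome_win64", ["chromedriver_win64.exe", "stealth.min.js", "execute_win64.bat"]),
    "linux64": ("chrome_linux64", ["chromedriver_linux64", "stealth.min.js", "execute_linux64.sh"]),
}


def missing_runtime_files(electron_dir, target):
    """Return the expected file names that are absent from the target's chrome folder."""
    folder_name, required = RUNTIME_FILES[target]
    folder = Path(electron_dir) / folder_name
    return [name for name in required if not (folder / name).is_file()]
```

An empty list from `missing_runtime_files(".", "win64")`, run inside the `ElectronJS` folder, would indicate that the three files named in this section are in place.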
### NodeJS环境|NodeJS Environment
### NodeJS环境 | NodeJS Environment
1. Windows环境下需要先安装`VS Build Tools 2017` [https://aka.ms/vs/15/release/vs_buildtools.exe](https://aka.ms/vs/15/release/vs_buildtools.exe))的`Visual C++ Build Tools`组件,不然下面的命令无法执行,其他系统不需要。
1. Windows环境下需要先下载`VS Build Tools 2017` [https://aka.ms/vs/15/release/vs_buildtools.exe](https://aka.ms/vs/15/release/vs_buildtools.exe)并勾选安装其中`Visual C++ Build ToolsVisual C++生成工具)`组件以便`node-gyp`模块来安装`node-windows-manager`,不然下面的命令无法执行,其他系统不需要。同时,`Python3`也需要安装在系统中并配置好环境变量。
2. 安装`NodeJS`[https://nodejs.org/zh-cn/download/](https://nodejs.org/zh-cn/download/)。
3. 运行下面的命令来安装依赖:
```
npm install
npm install @electron-forge/cli -g
```
如果上面的命令运行速度很慢可以参考NodeJS换源说明[https://blog.csdn.net/qq_23211463/article/details/123769061](https://blog.csdn.net/qq_23211463/article/details/123769061)。
如果上面的命令运行速度很慢可以参考使用NodeJS和Electron包的换源说明来加速安装[https://blog.csdn.net/qq_38463737/article/details/140277803](https://blog.csdn.net/qq_38463737/article/details/140277803)。
-----
1. On Windows, you need to install `VS Build Tools 2017` (https://aka.ms/vs/15/release/vs_buildtools.exe, select and install the `Visual C++ Build Tools` component) first for node-gyp to install `node-windows-manager` (No need for other OS).
1. On Windows, you need to download `VS Build Tools 2017` (https://aka.ms/vs/15/release/vs_buildtools.exe, select and install the `Visual C++ Build Tools` component) first for the module `node-gyp` to install `node-windows-manager` (No need for other OS). Meanwhile, `Python3` needs to be installed and the environment variables need to be configured.
2. Install `NodeJS`: [https://nodejs.org/en/download/](https://nodejs.org/en/download/).
3. Run the following commands to install NodeJS packages:
```
npm install
npm install @electron-forge/cli -g
```
## 运行说明|Run Instruction
## 运行说明 | Run Instruction
在当前文件夹执行以下命令即可在开发模式下运行程序:
@ -146,25 +148,23 @@ npm run start_direct
So far you can only design tasks, not execute them; to execute tasks you also need to follow the compilation instructions for the task execution program in the `ExecuteStage` folder.
## 打包发布说明|Package Instruction
## 打包发布说明 | Package Instruction
打包发布前,确保执行阶段程序`easyspider_executestage(.exe)`已放入`chrome(_win64)`文件夹内,且浏览器插件`EasySpider_zh.crx`已经是最新版本。
执行下面的命令即可打包:
执行下面的命令即可打包(需要安装`Git`
```
npx electron-forge import
npm run package
```
-----
Before packaging and releasing, make sure that the task execution program `easyspider_executestage(.exe)` is placed inside the `chrome(_win64)` folder and that the browser extension `EasySpider_en.crx` is the latest version.
Before packaging and releasing, make sure that the task execution program `easyspider_executestage(.exe)` is placed inside the `chrome(_win64)` folder and that the browser extension `EasySpider_en.crx` is the latest version.
After development is finished, package the software with the following command:
After development is finished, package the software with the following command (`Git` is required):
```
npx electron-forge import
npm run package
```
@ -186,8 +186,43 @@ package_win64.cmd
clean_and_release_win64.cmd
```
### (可选)编译成安装包|(Optional) Compile to an installation package
## 可能出现的问题 | Troubleshooting
以下命令一般不需要执行,但打包时可能会用到:
```sh
npm install @electron-forge/cli -g
npx electron-forge import
```
npm run make
```
如果任务执行到`npm install electron-squirrel-startup`的步骤时卡死,请参考下面的换源教程:[https://blog.csdn.net/qq_38463737/article/details/140277803](https://blog.csdn.net/qq_38463737/article/details/140277803)。
Windows端如果在运行`npm run package`的时候提示`node-gyp`相关的错误,可以安装`electron-rebuild`并重新编译相关模块:
```sh
npm install --save-dev electron-rebuild
npx electron-rebuild
```
然后再次运行`npm run package`
-----
The following commands are generally not required, but may be used during packaging:
```sh
npm install @electron-forge/cli -g
npx electron-forge import
```
If the task is stuck at the `npm install electron-squirrel-startup` step, please refer to the following tutorial on changing the source: [https://blog.csdn.net/qq_38463737/article/details/140277803](https://blog.csdn.net/qq_38463737/article/details/140277803).
If you encounter `node-gyp` related errors when running `npm run package` on Windows, you can install `electron-rebuild` and recompile the relevant modules:
```sh
npm install --save-dev electron-rebuild
npx electron-rebuild
```
Then run `npm run package` again.


@ -30,7 +30,7 @@ def update_file_version(file_path, new_version, key="当前版本/Current Versio
file.write(line)
version = "0.6.0"
version = "0.6.3"
# py html js
@ -47,7 +47,8 @@ if __name__ == "__main__":
# index.html
file_path = "./src/index.html"
update_file_version(file_path, version, key="当前版本/Current Version: <b>v")
update_file_version(file_path, version, key="软件当前版本:<b>v")
update_file_version(file_path, version, key="Current Version: <b>v")
# package.json
file_path = "./package.json"
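The hunk above replaces one call keyed on the combined label with two calls keyed on the separate Chinese and English labels. The body of `update_file_version` is not shown in this diff, so the following is only a sketch of how a key-based version replacement could work, not the repository's implementation; `replace_version_after_key` is a hypothetical name, and it assumes the version is a dotted number placed directly after the key.

```python
import re


def replace_version_after_key(text, new_version, key):
    """Replace the dotted version number that directly follows `key` in `text`."""
    pattern = re.compile(re.escape(key) + r"\d+(?:\.\d+)*")
    # A function replacement avoids backslash handling in the replacement string.
    return pattern.sub(lambda match: key + new_version, text)
```

Under this approach a key that does not occur in the text leaves it unchanged, which is why a file whose labels were split needs one call per label.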


@ -11,9 +11,10 @@ del out\EasySpider\resources\app\vs_BuildTools.exe
move out\EasySpider ..\.temp_to_pub\EasySpider_windows_x32\EasySpider
rmdir /s /q ..\.temp_to_pub\EasySpider_windows_x32\Code
mkdir ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\easyspider_executestage.py ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\myChrome.py ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\utils.py ..\.temp_to_pub\EasySpider_windows_x32\Code
@REM copy ..\ExecuteStage\easyspider_executestage.py ..\.temp_to_pub\EasySpider_windows_x32\Code
@REM copy ..\ExecuteStage\myChrome.py ..\.temp_to_pub\EasySpider_windows_x32\Code
@REM copy ..\ExecuteStage\utils.py ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\*.py ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\requirements.txt ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\Readme.md ..\.temp_to_pub\EasySpider_windows_x32\Code
copy ..\ExecuteStage\myCode.py ..\.temp_to_pub\EasySpider_windows_x32


@ -11,9 +11,10 @@ del out\EasySpider\resources\app\vs_BuildTools.exe
move out\EasySpider ..\.temp_to_pub\EasySpider_windows_x64\EasySpider
rmdir /s /Q ..\.temp_to_pub\EasySpider_windows_x64\Code
mkdir ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\easyspider_executestage.py ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\myChrome.py ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\utils.py ..\.temp_to_pub\EasySpider_windows_x64\Code
@REM copy ..\ExecuteStage\easyspider_executestage.py ..\.temp_to_pub\EasySpider_windows_x64\Code
@REM copy ..\ExecuteStage\myChrome.py ..\.temp_to_pub\EasySpider_windows_x64\Code
@REM copy ..\ExecuteStage\utils.py ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\*.py ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\requirements.txt ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\Readme.md ..\.temp_to_pub\EasySpider_windows_x64\Code
copy ..\ExecuteStage\myCode.py ..\.temp_to_pub\EasySpider_windows_x64


@ -1 +1 @@
{"webserver_address":"http://localhost","webserver_port":8074,"user_data_folder":"./user_data","debug":false,"copyright":1,"sys_version":"x64","mysql_config_path":"./mysql_config.json","absolute_user_data_folder":"/Users/naibo/Documents/EasySpider/ElectronJS/user_data"}
{"webserver_address":"http://localhost","webserver_port":8074,"user_data_folder":"./user_data","debug":false,"copyright":1,"sys_version":"x64","mysql_config_path":"./mysql_config.json","absolute_user_data_folder":"D:\\Documents\\Projects\\EasySpider\\ElectronJS\\user_data","lang":"zh"}


@ -50,7 +50,9 @@ if (config.debug) {
}
let allWindowSockets = [];
let allWindowScoketNames = [];
task_server.start(config.webserver_port); //start local server
if(config.webserver_address.includes("localhost") || config.webserver_address.includes("127.0.0.1")) {
task_server.start(config.webserver_port); //start local server
}
let server_address = `${config.webserver_address}:${config.webserver_port}`;
const websocket_port = 8084; // Only port 8084 is supported for now; it is hard-coded because the extension hard-codes it
console.log("server_address: " + server_address);
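The new guard in this hunk starts the bundled task server only when `webserver_address` contains `localhost` or `127.0.0.1`. A substring test also matches addresses that merely contain those strings, so a stricter variant could compare the parsed hostname instead. The sketch below illustrates that idea in Python rather than in the JavaScript of `main.js`; it assumes the address carries a scheme, as in the `config.json` shown earlier (`http://localhost`), and `should_start_local_server` is a hypothetical name.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def should_start_local_server(webserver_address):
    """Return True when the configured address points at this machine."""
    # urlparse only fills in hostname when the address includes a scheme.
    hostname = urlparse(webserver_address).hostname
    return hostname in LOCAL_HOSTS
```

For an address without a scheme the hostname is `None` and the function returns `False`, so such values would need to be normalized first.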
@ -84,11 +86,11 @@ console.log(process.arch);
if (process.platform === "win32" && process.arch === "ia32") {
driverPath = path.join(__dirname, "chrome_win32/chromedriver_win32.exe");
chromeBinaryPath = path.join(__dirname, "chrome_win32/chrome.exe");
execute_path = path.join(__dirname, "chrome_win32/execute.bat");
execute_path = path.join(__dirname, "chrome_win32/execute_win32.bat");
} else if (process.platform === "win32" && process.arch === "x64") {
driverPath = path.join(__dirname, "chrome_win64/chromedriver_win64.exe");
chromeBinaryPath = path.join(__dirname, "chrome_win64/chrome.exe");
execute_path = path.join(__dirname, "chrome_win64/execute.bat");
execute_path = path.join(__dirname, "chrome_win64/execute_win64.bat");
} else if (process.platform === "darwin") {
driverPath = path.join(__dirname, "chromedriver_mac64");
chromeBinaryPath = path.join(
@ -99,7 +101,7 @@ if (process.platform === "win32" && process.arch === "ia32") {
} else if (process.platform === "linux") {
driverPath = path.join(__dirname, "chrome_linux64/chromedriver_linux64");
chromeBinaryPath = path.join(__dirname, "chrome_linux64/chrome");
execute_path = path.join(__dirname, "chrome_linux64/execute.sh");
execute_path = path.join(__dirname, "chrome_linux64/execute_linux64.sh");
}
console.log(driverPath, chromeBinaryPath, execute_path);
let language = "en";
@ -112,6 +114,7 @@ let handle_pairs = {};
let socket_window = null;
let socket_start = null;
let socket_flowchart = null;
let socket_popup = null;
let invoke_window = null;
// var ffi = require('ffi-napi');
@ -148,8 +151,8 @@ function createWindow() {
server_address +
"/index.html?user_data_folder=" +
config.user_data_folder +
"&copyright=" +
config.copyright,
"&copyright=" + config.copyright +
"&lang=" + config.lang,
{extraHeaders: "pragma: no-cache\n"}
);
// Hide the menu bar
@ -160,9 +163,8 @@ function createWindow() {
app.quit();
}
});
// Debug mode
// mainWindow.webContents.openDevTools();
// Open the DevTools.
// mainWindow.webContents.openDevTools()
}
async function findElementRecursive(driver, by, value, frames) {
@ -243,6 +245,7 @@ async function findElementAcrossAllWindows(
let handles = await driver.getAllWindowHandles();
// console.log("handles", handles);
let content_handle = current_handle;
let old_handle = current_handle;
let id = -1;
try {
id = msg.message.id;
@ -289,12 +292,12 @@ async function findElementAcrossAllWindows(
xpath = msg.xpath;
}
}
if (xpath.indexOf("Field(") >= 0 || xpath.indexOf("eval(") >= 0) {
if (xpath.indexOf("Field[") >= 0 || xpath.indexOf("eval(") >= 0) {
// Notify the browser after two seconds
await new Promise((resolve) => setTimeout(resolve, 2000));
notify_browser(
'检测到XPath中包含Field("")或eval(""),试运行时无法正常定位到包含此两项表达式的元素,请在任务正式运行阶段测试是否有效。',
'Field("") or eval("") is detected in xpath, and the element containing these two expressions cannot be located normally during trial operation. Please test whether it is valid in the formal call stage.',
'检测到XPath中包含Field[""]或eval(""),试运行时无法正常定位到包含此两项表达式的元素,请在任务正式运行阶段测试是否有效。',
'Field[""] or eval("") is detected in xpath, and the element containing these two expressions cannot be located normally during trial operation. Please test whether it is valid in the formal call stage.',
"warning"
);
return null;
@ -308,7 +311,7 @@ async function findElementAcrossAllWindows(
if (h != null && handles.includes(h)) {
await driver.switchTo().window(h);
current_handle = h;
console.log("switch to handle: ", h);
console.log("Switch to handle: ", h);
}
element = await findElement(driver, By.xpath, xpath, iframe);
break;
@ -325,6 +328,12 @@ async function findElementAcrossAllWindows(
}
}
if (element == null && notifyBrowser) {
// If the element cannot be found, switch back to the original window
if (old_handle != null && handles.includes(old_handle)) {
await driver.switchTo().window(old_handle);
current_handle = old_handle;
console.log("Switch to handle: ", old_handle);
}
notify_browser(
"无法找到元素，请检查XPath是否正确：" + xpath,
"Cannot find the element, please check if the XPath is correct: " + xpath,
@ -651,7 +660,15 @@ async function beginInvoke(msg, ws) {
if (parameters.xpath.includes("point(")) {
await click_element(element, point);
} else {
await click_element(element);
if (parameters.clickWay == 2){ // double click
await click_element(element, "double");
} else {
if (parameters.newTab == 1){
await click_element(element, "loopClickEvery"); // open in a new tab
} else {
await click_element(element); // single click
}
}
}
let alertHandleType = parameters.alertHandleType;
if (alertHandleType == 1) {
@ -757,12 +774,12 @@ async function beginInvoke(msg, ws) {
keyInfo = keyInfo.replace(match[0], jsReplacedText.toString());
}
}
if (keyInfo.indexOf("Field(") >= 0 || keyInfo.indexOf("eval(") >= 0) {
if (keyInfo.indexOf("Field[") >= 0 || keyInfo.indexOf("eval(") >= 0) {
// Notify the browser after two seconds
await new Promise((resolve) => setTimeout(resolve, 2000));
notify_browser(
'检测到文字中包含Field("")或eval(""),试运行时无法输入两项表达式的替换值,请在任务正式运行阶段测试是否有效。',
'Field("") or eval("") is detected in the text, and the replacement value of the two expressions cannot be entered during trial operation. Please test whether it is valid in the formal call stage.',
'检测到文字中包含Field[""]或eval(""),试运行时无法输入两项表达式的替换值,请在任务正式运行阶段测试是否有效。',
'Field[""] or eval("") is detected in the text, and the replacement value of the two expressions cannot be entered during trial operation. Please test whether it is valid in the formal call stage.',
"warning"
);
}
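Note on the hunk above: the check changes from `Field(` to `Field[`, so only the bracket form is now reported as unresolvable during a trial run. A minimal sketch of the detection, assuming a helper named `containsRuntimeExpression` (hypothetical, not part of the source):

```javascript
// Hypothetical helper: reports whether a string still contains an expression
// that can only be resolved when the task runs for real (Field[...] or eval(...)).
function containsRuntimeExpression(text) {
  return text.indexOf("Field[") >= 0 || text.indexOf("eval(") >= 0;
}

console.log(containsRuntimeExpression('//div[@id=Field["id"]]')); // true
console.log(containsRuntimeExpression('//a[@class="next"]')); // false
console.log(containsRuntimeExpression('Field("id")')); // false: the old parenthesis form no longer matches
```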
@ -787,7 +804,41 @@ async function beginInvoke(msg, ws) {
let waitTime = parameters.waitTime;
let element = await driver.findElement(By.tagName("body"));
if (codeMode == 0) {
await execute_js(code, element, waitTime);
let result = await execute_js(code, element, waitTime);
let level = "success";
if (result == -1) {
level = "info";
}
if (result != null) {
notify_browser(
"JavaScript操作返回结果：" + result,
"JavaScript operation returns result: " + result,
level
);
}
} else if (codeMode == 2) { // JS code inside a loop
let parent_node = JSON.parse(msg.message.parentNode);
let parent_xpath = parent_node.parameters.xpath;
if (parent_node.parameters.loopType == 2) {
parent_xpath = parent_node.parameters.pathList
.split("\n")[0]
.trim();
}
let elementInfo = {iframe: parameters.iframe, xpath: parent_xpath, id: -1};
let element = await findElementAcrossAllWindows(
elementInfo, notifyBrowser = false); // this call locates the element and switches to the matching window
let result = await execute_js(code, element, waitTime);
let level = "success";
if (result == -1) {
level = "info";
}
if (result != null) {
notify_browser(
"JavaScript操作返回结果：" + result,
"JavaScript operation returns result: " + result,
level
);
}
} else if (codeMode == 8) {
// Refresh the page
try {
@ -859,11 +910,11 @@ async function beginInvoke(msg, ws) {
execute_js(afterJS, element, afterJSWaitTime);
} else if (option == 11) {
// A single data-extraction parameter
notify_browser(
"提示：提取数据操作只能试运行设置的JavaScript语句，且只针对第一个匹配的元素。",
"Hint: can only test JavaScript statement set in the data extraction operation, and only for the first matching element.",
"info"
);
// notify_browser(
// "提示:提取数据字段的试运行操作只针对第一个匹配的元素。",
// "Hint: can only test the trial operation of the data extraction field for the first matching element.",
// "info"
// );
let params = parameters.params; // all data-extraction parameters
let i = parameters.index;
let param = params[i];
@ -879,12 +930,111 @@ async function beginInvoke(msg, ws) {
xpath = parent_xpath + xpath;
}
let elementInfo = {iframe: param.iframe, xpath: xpath, id: -1};
let element = await findElementAcrossAllWindows(
elementInfo,
(notifyBrowser = false)
);
let element = await findElementAcrossAllWindows(elementInfo);
if (element != null) {
await execute_js(param.beforeJS, element, param.beforeJSWaitTime);
if (param.contentType == 0) {
let result = await element.getText(); // text content of the element and its descendants
if (param.nodeType == 2) { // link URL
result = await element.getAttribute("href");
notify_browser("获取的链接地址：" + result, "Link URL obtained: " + result, "success")
} else if (param.nodeType == 3) { // form value
result = await element.getAttribute("value");
notify_browser("获取的表单值：" + result, "Form value obtained: " + result, "success")
} else if (param.nodeType == 4) { // image URL
result = await element.getAttribute("src");
notify_browser("获取的图片地址:" + result, "Image URL obtained: " + result, "success")
} else {
notify_browser("获取的文本内容:" + result, "Text content obtained: " + result, "success");
}
} else if (param.contentType == 1) {
// Selenium needs special handling to get text that excludes child elements; element is assumed to be the parent here
let command = 'var arr = [];\
var content = arguments[0];\
for(var i = 0, len = content.childNodes.length; i < len; i++) {\
if(content.childNodes[i].nodeType === 3){ \
arr.push(content.childNodes[i].nodeValue);\
}\
}\
var str = arr.join(" "); \
return str;'
let result = await execute_js(command, element, 0);
result = result.replace(/\n/g, "").replace(/\s+/g, " ");
notify_browser("获取的内容:" + result, "Content obtained: " + result, "success");
} else if (param.contentType == 2) {
let result = await element.getAttribute('innerHTML'); // inner HTML of the element
notify_browser("获取的innerHTML：" + result, "innerHTML obtained: " + result, "success");
} else if (param.contentType == 3) {
let result = await element.getAttribute('outerHTML'); // HTML of the element itself plus its content
notify_browser("获取的outerHTML：" + result, "outerHTML obtained: " + result, "success");
} else if (param.contentType == 4) {
let result = await element.getCssValue('background-image'); // background image URL of the element
notify_browser("获取的背景图片地址:" + result, "Background image URL obtained: " + result, "success");
} else if (param.contentType == 5) {
let result = await driver.getCurrentUrl(); // URL of the page
notify_browser("获取的页面网址:" + result, "Page URL obtained: " + result, "success");
} else if (param.contentType == 6) { // page title
let result = await driver.getTitle();
notify_browser("获取的页面标题:" + result, "Page title obtained: " + result, "success");
} else if (param.contentType == 9) { // return value of JavaScript code run against the element
let result = await execute_js(param.JS, element);
let level = "success";
if (result == -1) {
level = "info";
}
if (result != null) {
notify_browser(
"JavaScript操作返回结果：" + result,
"JavaScript operation returns result: " + result,
level
);
}
} else if (param.contentType == 10) {
// Value of the currently selected option in the select box
let result = await element.getAttribute("value");
notify_browser(
"获取的选项值:" + result,
"Option value obtained: " + result,
"success"
);
} else if (param.contentType == 11) {
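// Text of the currently selected option in the select box
let selectElement = new Select(element);
// Wait until the element is enabled; this step is optional and depends on how the page loads
await driver.wait(until.elementIsEnabled(element));
// Get the currently selected option element
let selectedOption = await selectElement.getFirstSelectedOption();
// Get the option's text content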
let content = await selectedOption.getText();
notify_browser(
"获取的选项文本:" + content,
"Option text obtained: " + content,
"success"
);
} else if (param.contentType == 14) {
// Attribute value of the element
let result = await element.getAttribute(param.JS);
notify_browser(
"获取的属性值:" + result,
"Attribute value obtained: " + result,
"success"
);
} else if(param.contentType == 15) {
// Constant value
let result = param.JS;
notify_browser(
"获取的常量值:" + result,
"Constant value obtained: " + result,
"success"
);
} else {
// Other types are not supported yet
notify_browser(
"暂不支持测试此类型的数据提取,请在任务正式运行阶段测试是否有效。",
"This type of data extraction is not supported for testing. Please test whether it is valid in the formal call stage.",
"warning"
);
}
await execute_js(param.afterJS, element, param.afterJSWaitTime);
}
}
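Note on the `contentType == 1` branch above: the injected script collects only the element's direct text nodes (`nodeType === 3`) and the result is then whitespace-normalized. A minimal sketch of that logic outside the browser, assuming a helper named `ownText` and a plain object standing in for a DOM element (both hypothetical):

```javascript
const TEXT_NODE = 3; // DOM nodeType value for text nodes

// Hypothetical helper: joins the direct text-node children of a node,
// then applies the same newline and whitespace clean-up as the diff.
function ownText(node) {
  const parts = [];
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      parts.push(child.nodeValue);
    }
  }
  return parts.join(" ").replace(/\n/g, "").replace(/\s+/g, " ");
}

// Stand-in for <div>Price:\n<span>...</span>  42 USD</div>
const fakeDiv = {
  childNodes: [
    { nodeType: 3, nodeValue: "Price:\n" },
    { nodeType: 1, nodeValue: null }, // element child, skipped
    { nodeType: 3, nodeValue: "  42 USD" },
  ],
};

console.log(ownText(fakeDiv)); // Price: 42 USD
```

Text inside child elements is deliberately skipped, which is what separates this mode from `getText()` in the `contentType == 0` branch.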
@ -982,18 +1132,41 @@ async function beginInvoke(msg, ws) {
} catch {
console.log("Cannot get Cookies");
}
} else if (msg.type == 30) {
send_message_to_browser(
JSON.stringify({
type: "showAllToolboxes"
})
);
console.log("Show all toolboxes");
} else if (msg.type == 31) {
send_message_to_browser(
JSON.stringify({
type: "hideAllToolboxes"
})
);
console.log("Hide all toolboxes");
}
}
async function click_element(element, type = "click") {
try {
if (type == "loopClickEvery") {
await driver
if (process.platform === "darwin") {
await driver
.actions()
.keyDown(Key.COMMAND)
.click(element)
.keyUp(Key.COMMAND)
.perform();
} else {
await driver
.actions()
.keyDown(Key.CONTROL)
.click(element)
.keyUp(Key.CONTROL)
.perform();
}
} else if (type.includes("point(")) {
// point(10, 20) means clicking at coordinates (10, 20)
let point = type.substring(6, type.length - 1).split(",");
@ -1005,6 +1178,8 @@ async function click_element(element, type = "click") {
// await actions.click().perform();
let script = `document.elementFromPoint(${x}, ${y}).click();`;
await driver.executeScript(script);
} else if (type == "double") {
await driver.actions().doubleClick(element).perform();
} else {
await element.click();
}
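Note on the `loopClickEvery` branch above: the modifier held during the click now depends on the platform (`Key.COMMAND` on macOS, `Key.CONTROL` elsewhere). A minimal sketch of that choice, assuming a helper named `newTabModifier` (hypothetical) that returns the key name as a string so the example runs without `selenium-webdriver` installed:

```javascript
// Hypothetical helper: name of the modifier key to hold while clicking
// so that the link opens in a new tab.
function newTabModifier(platform = process.platform) {
  return platform === "darwin" ? "COMMAND" : "CONTROL";
}

console.log(newTabModifier("darwin")); // COMMAND
console.log(newTabModifier("win32")); // CONTROL
console.log(newTabModifier("linux")); // CONTROL
```

In code that does load `selenium-webdriver`, the returned name could presumably be used as `Key[newTabModifier()]`; that lookup is an assumption of this sketch, not something the diff does.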
@ -1038,12 +1213,12 @@ async function execute_js(js, element, wait_time = 3) {
);
outcome = -1;
}
if (js.indexOf("Field(") >= 0 || js.indexOf("eval(") >= 0) {
if (js.indexOf("Field[") >= 0 || js.indexOf("eval(") >= 0) {
// Notify the browser after two seconds
await new Promise((resolve) => setTimeout(resolve, 2000));
notify_browser(
'检测到JavaScript中包含Field("")或eval(""),试运行时无法执行两项表达式,请在任务正式运行阶段测试是否有效。',
'Field("") or eval("") is detected in JavaScript, and the two expressions cannot be executed during trial operation. Please test whether it is valid in the formal call stage.',
'检测到JavaScript中包含Field[""]或eval(""),试运行时无法执行两项表达式,请在任务正式运行阶段测试是否有效。',
'Field[""] or eval("") is detected in JavaScript, and the two expressions cannot be executed during trial operation. Please test whether it is valid in the formal call stage.',
"warning"
);
}
@ -1063,6 +1238,9 @@ function notify_flowchart(msg_zh, msg_en, level = "info") {
}
function notify_browser(msg_zh, msg_en, level = "info") {
if (msg_zh.split("：").length > 1 && msg_zh.split("：")[1].includes("null")) {
level = "warning";
}
send_message_to_browser(
JSON.stringify({
type: "notify",
@ -1111,6 +1289,9 @@ wss.on("connection", function (ws) {
// console.log("socket_flowchart closed");
// });
console.log("set socket_flowchart at time: ", new Date());
} else if (msg.message.id == 3) {
socket_popup = ws;
console.log("set socket_popup at time: ", new Date());
} else {
// Other IDs identify different browser tabs
// await new Promise(resolve => setTimeout(resolve, 200));
@ -1213,6 +1394,8 @@ async function runBrowser(lang = "en", user_data_folder = "", mobile = false) {
let options = new chrome.Options();
options.addArguments("--disable-blink-features=AutomationControlled");
options.addArguments("--disable-infobars");
options.addArguments("--disable-web-security");
options.addArguments("--disable-features=CrossSiteDocumentBlockingIfIsolating,CrossSiteDocumentBlockingAlways,IsolateOrigins,site-per-process");
// Add an experimental option to exclude the 'enable-automation' switch
options.set("excludeSwitches", ["enable-automation"]);
options.excludeSwitches("enable-automation");
@ -1399,6 +1582,17 @@ app.whenReady().then(() => {
path.join(task_server.getDir(), "config.json"),
JSON.stringify(config)
);
// Re-read the config file
config = JSON.parse(fs.readFileSync(path.join(task_server.getDir(), "config.json")));
});
ipcMain.on("change-lang", function (event, arg) {
config.lang = arg;
fs.writeFileSync(
path.join(task_server.getDir(), "config.json"),
JSON.stringify(config)
);
// Re-read the config file
config = JSON.parse(fs.readFileSync(path.join(task_server.getDir(), "config.json")));
});
createWindow();
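Note on the `change-lang` handler above: it updates one field, writes the whole config to disk, then reads the file back so the in-memory object matches what was persisted. A minimal sketch of that round trip, assuming a helper named `saveAndReload` and a temporary directory in place of `task_server.getDir()` (both are assumptions of this sketch):

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: persist the config as JSON, then return the re-read copy.
function saveAndReload(configPath, config) {
  fs.writeFileSync(configPath, JSON.stringify(config));
  return JSON.parse(fs.readFileSync(configPath, "utf8"));
}

// Temporary location used only for this example.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "easyspider-cfg-"));
const demoPath = path.join(demoDir, "config.json");

let demoConfig = { lang: "zh", copyright: 1 };
demoConfig.lang = "en"; // what the handler does with its argument
demoConfig = saveAndReload(demoPath, demoConfig);

console.log(demoConfig.lang); // en
```

Reading the file back is redundant for a plain object, but it ensures later code sees exactly what survived JSON serialization.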


@ -1,24 +1,24 @@
{
"name": "easy-spider",
"version": "0.6.0",
"version": "0.6.3",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "easy-spider",
"version": "0.6.0",
"version": "0.6.3",
"license": "AGPL-3.0",
"dependencies": {
"cors": "^2.8.5",
"electron-squirrel-startup": "^1.0.0",
"express": "^4.18.2",
"express": "^4.21.2",
"formidable": "^3.5.0",
"http": "^0.0.1-security",
"multer": "^1.4.5-lts.1",
"node-abi": "^3.52.0",
"node-window-manager": "^2.2.4",
"selenium-webdriver": "^4.16.0",
"ws": "^8.12.0",
"selenium-webdriver": "^4.27.0",
"ws": "^8.18.0",
"xlsx": "^0.18.5"
},
"devDependencies": {
@ -30,6 +30,11 @@
"electron": "^27.1.3"
}
},
"node_modules/@bazel/runfiles": {
"version": "6.3.1",
"resolved": "https://registry.npmjs.org/@bazel/runfiles/-/runfiles-6.3.1.tgz",
"integrity": "sha512-1uLNT5NZsUVIGS4syuHwTzZ8HycMPyr6POA3FCE4GbMtc4rhoJk8aZKtNIRthJYfL+iioppi+rTfH3olMPr9nA=="
},
"node_modules/@electron-forge/cli": {
"version": "6.2.1",
"dev": true,
@ -1203,6 +1208,7 @@
},
"node_modules/balanced-match": {
"version": "1.0.2",
"dev": true,
"license": "MIT"
},
"node_modules/base64-js": {
@ -1253,20 +1259,20 @@
"license": "MIT"
},
"node_modules/body-parser": {
"version": "1.20.1",
"resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.1.tgz",
"integrity": "sha512-jWi7abTbYwajOytWCQc37VulmWiRae5RyTpaCyDcS5/lMdtwSz5lOpDE67srw/HYe35f1z3fDQw+3txg7gNtWw==",
"version": "1.20.3",
"resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.3.tgz",
"integrity": "sha512-7rAxByjUMqQ3/bHJy7D6OGXvx/MMc4IqBn/X0fcM1QUcAItpZrBEYhWGem+tzXH90c+G01ypMcYJBO9Y30203g==",
"dependencies": {
"bytes": "3.1.2",
"content-type": "~1.0.4",
"content-type": "~1.0.5",
"debug": "2.6.9",
"depd": "2.0.0",
"destroy": "1.2.0",
"http-errors": "2.0.0",
"iconv-lite": "0.4.24",
"on-finished": "2.4.1",
"qs": "6.11.0",
"raw-body": "2.5.1",
"qs": "6.13.0",
"raw-body": "2.5.2",
"type-is": "~1.6.18",
"unpipe": "1.0.0"
},
@ -1307,6 +1313,7 @@
},
"node_modules/brace-expansion": {
"version": "1.1.11",
"dev": true,
"license": "MIT",
"dependencies": {
"balanced-match": "^1.0.0",
@ -1314,11 +1321,12 @@
}
},
"node_modules/braces": {
"version": "3.0.2",
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz",
"integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==",
"dev": true,
"license": "MIT",
"dependencies": {
"fill-range": "^7.0.1"
"fill-range": "^7.1.1"
},
"engines": {
"node": ">=8"
@ -1480,12 +1488,18 @@
}
},
"node_modules/call-bind": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/call-bind/-/call-bind-1.0.2.tgz",
"integrity": "sha512-7O+FbCihrB5WGbFYesctwmTKae6rOiIzmz1icreWJ+0aA7LJfuqhEso2T9ncpcFtzMQtzXf2QGGueWJGTYsqrA==",
"version": "1.0.7",
"resolved": "https://registry.npmjs.org/call-bind/-/call-bind-1.0.7.tgz",
"integrity": "sha512-GHTSNSYICQ7scH7sZ+M2rFopRoLh8t2bLSW6BbgrtLsahOIB5iyAVJf9GjWK3cYTDaMj4XdBpM1cA6pIS0Kv2w==",
"dependencies": {
"function-bind": "^1.1.1",
"get-intrinsic": "^1.0.2"
"es-define-property": "^1.0.0",
"es-errors": "^1.3.0",
"function-bind": "^1.1.2",
"get-intrinsic": "^1.2.4",
"set-function-length": "^1.2.1"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
@ -1661,6 +1675,7 @@
},
"node_modules/concat-map": {
"version": "0.0.1",
"dev": true,
"license": "MIT"
},
"node_modules/concat-stream": {
@ -1721,9 +1736,9 @@
}
},
"node_modules/cookie": {
"version": "0.5.0",
"resolved": "https://registry.npmjs.org/cookie/-/cookie-0.5.0.tgz",
"integrity": "sha512-YZ3GUyn/o8gfKJlnlX7g7xq4gyO6OSuhGPKaaGssGB2qgDUS0gPgtTvoyZLTt9Ab6dC4hfc9dV5arkvc/OCmrw==",
"version": "0.7.1",
"resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.1.tgz",
"integrity": "sha512-6DnInpx7SJ2AK3+CTUE/ZM0vWTUboZCegxhC2xiIydHR9jNuTAASBrfEpHhiGOZw/nX51bHt6YQl8jsGo4y/0w==",
"engines": {
"node": ">= 0.6"
}
@ -1898,6 +1913,22 @@
"node": ">=10"
}
},
"node_modules/define-data-property": {
"version": "1.1.4",
"resolved": "https://registry.npmjs.org/define-data-property/-/define-data-property-1.1.4.tgz",
"integrity": "sha512-rBMvIzlpA8v6E+SJZoo++HAYqsLrkg7MSfIinMPFhmkorw7X+dOXVJQs+QT69zGkzMyfDnIMN2Wid1+NbL3T+A==",
"dependencies": {
"es-define-property": "^1.0.0",
"es-errors": "^1.3.0",
"gopd": "^1.0.1"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/define-properties": {
"version": "1.2.0",
"dev": true,
@ -2166,9 +2197,9 @@
"license": "MIT"
},
"node_modules/encodeurl": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-1.0.2.tgz",
"integrity": "sha512-TPJXq8JqFaVYm2CWmPvnP2Iyo4ZSM7/QKcSmuMLDObfpH5fi7RUGmd/rTDf+rut/saiDiQEeVTNgAmJEdAOx0w==",
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz",
"integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==",
"engines": {
"node": ">= 0.8"
}
@ -2211,6 +2242,25 @@
"is-arrayish": "^0.2.1"
}
},
"node_modules/es-define-property": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.0.tgz",
"integrity": "sha512-jxayLKShrEqqzJ0eumQbVhTYQM27CfT1T35+gCgDFoL82JLsXqTJ76zv6A0YLOgEnLUMvLzsDsGIrl8NFpT2gQ==",
"dependencies": {
"get-intrinsic": "^1.2.4"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-errors": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
"integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es6-error": {
"version": "4.1.1",
"dev": true,
@ -2356,36 +2406,36 @@
"license": "Apache-2.0"
},
"node_modules/express": {
"version": "4.18.2",
"resolved": "https://registry.npmjs.org/express/-/express-4.18.2.tgz",
"integrity": "sha512-5/PsL6iGPdfQ/lKM1UuielYgv3BUoJfz1aUwU9vHZ+J7gyvwdQXFEBIEIaxeGf0GIcreATNyBExtalisDbuMqQ==",
"version": "4.21.2",
"resolved": "https://registry.npmjs.org/express/-/express-4.21.2.tgz",
"integrity": "sha512-28HqgMZAmih1Czt9ny7qr6ek2qddF4FclbMzwhCREB6OFfH+rXAnuNCwo1/wFvrtbgsQDb4kSbX9de9lFbrXnA==",
"dependencies": {
"accepts": "~1.3.8",
"array-flatten": "1.1.1",
"body-parser": "1.20.1",
"body-parser": "1.20.3",
"content-disposition": "0.5.4",
"content-type": "~1.0.4",
"cookie": "0.5.0",
"cookie": "0.7.1",
"cookie-signature": "1.0.6",
"debug": "2.6.9",
"depd": "2.0.0",
"encodeurl": "~1.0.2",
"encodeurl": "~2.0.0",
"escape-html": "~1.0.3",
"etag": "~1.8.1",
"finalhandler": "1.2.0",
"finalhandler": "1.3.1",
"fresh": "0.5.2",
"http-errors": "2.0.0",
"merge-descriptors": "1.0.1",
"merge-descriptors": "1.0.3",
"methods": "~1.1.2",
"on-finished": "2.4.1",
"parseurl": "~1.3.3",
"path-to-regexp": "0.1.7",
"path-to-regexp": "0.1.12",
"proxy-addr": "~2.0.7",
"qs": "6.11.0",
"qs": "6.13.0",
"range-parser": "~1.2.1",
"safe-buffer": "5.2.1",
"send": "0.18.0",
"serve-static": "1.15.0",
"send": "0.19.0",
"serve-static": "1.16.2",
"setprototypeof": "1.2.0",
"statuses": "2.0.1",
"type-is": "~1.6.18",
@ -2394,6 +2444,10 @@
},
"engines": {
"node": ">= 0.10.0"
},
"funding": {
"type": "opencollective",
"url": "https://opencollective.com/express"
}
},
"node_modules/express/node_modules/debug": {
@ -2515,9 +2569,10 @@
}
},
"node_modules/fill-range": {
"version": "7.0.1",
"version": "7.1.1",
"resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz",
"integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==",
"dev": true,
"license": "MIT",
"dependencies": {
"to-regex-range": "^5.0.1"
},
@ -2526,12 +2581,12 @@
}
},
"node_modules/finalhandler": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-1.2.0.tgz",
"integrity": "sha512-5uXcUVftlQMFnWC9qu/svkWv3GTd2PfUhK/3PLkYNAe7FbqJMt3515HaxE6eRL74GdsriiwujiawdaB1BpEISg==",
"version": "1.3.1",
"resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-1.3.1.tgz",
"integrity": "sha512-6BN9trH7bp3qvnrRyzsBz+g3lZxTNZTbVO2EV1CS0WIcDbawYVdYvGflME/9QP0h0pYlCDBCTjYa9nZzMDpyxQ==",
"dependencies": {
"debug": "2.6.9",
"encodeurl": "~1.0.2",
"encodeurl": "~2.0.0",
"escape-html": "~1.0.3",
"on-finished": "2.4.1",
"parseurl": "~1.3.3",
@ -2695,11 +2750,16 @@
},
"node_modules/fs.realpath": {
"version": "1.0.0",
"dev": true,
"license": "ISC"
},
"node_modules/function-bind": {
"version": "1.1.1",
"license": "MIT"
"version": "1.1.2",
"resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
"integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/galactus": {
"version": "0.2.1",
@ -2780,13 +2840,18 @@
}
},
"node_modules/get-intrinsic": {
"version": "1.2.1",
"license": "MIT",
"version": "1.2.4",
"resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.2.4.tgz",
"integrity": "sha512-5uYhsJH8VJBTv7oslg4BznJYhDoRI6waYCxMmCdnTrcCrHA/fCFKoTFz2JKKE0HdDFUF7/oQuhzumXJK7paBRQ==",
"dependencies": {
"function-bind": "^1.1.1",
"has": "^1.0.3",
"es-errors": "^1.3.0",
"function-bind": "^1.1.2",
"has-proto": "^1.0.1",
"has-symbols": "^1.0.3"
"has-symbols": "^1.0.3",
"hasown": "^2.0.0"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
@ -2835,6 +2900,7 @@
},
"node_modules/glob": {
"version": "7.2.3",
"dev": true,
"license": "ISC",
"dependencies": {
"fs.realpath": "^1.0.0",
@ -2933,6 +2999,17 @@
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/gopd": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/gopd/-/gopd-1.0.1.tgz",
"integrity": "sha512-d65bNlIadxvpb/A2abVdlqKqV563juRnZ1Wtk6s1sIR8uNsXR70xqIzVqxVf1eTqDunwT2MkczEeaezCKTZhwA==",
"dependencies": {
"get-intrinsic": "^1.1.3"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/got": {
"version": "11.8.6",
"dev": true,
@ -2964,6 +3041,7 @@
},
"node_modules/has": {
"version": "1.0.3",
"dev": true,
"license": "MIT",
"dependencies": {
"function-bind": "^1.1.1"
@ -2981,12 +3059,11 @@
}
},
"node_modules/has-property-descriptors": {
"version": "1.0.0",
"dev": true,
"license": "MIT",
"optional": true,
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/has-property-descriptors/-/has-property-descriptors-1.0.2.tgz",
"integrity": "sha512-55JNKuIW+vq4Ke1BjOTjM2YctQIvCT7GFzHwmfZPGo5wnrgkid0YQtnAleFSqumZm4az3n2BS+erby5ipJdgrg==",
"dependencies": {
"get-intrinsic": "^1.1.1"
"es-define-property": "^1.0.0"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
@ -3017,6 +3094,17 @@
"dev": true,
"license": "ISC"
},
"node_modules/hasown": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
"integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==",
"dependencies": {
"function-bind": "^1.1.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/hexoid": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/hexoid/-/hexoid-1.0.0.tgz",
@ -3162,6 +3250,7 @@
},
"node_modules/inflight": {
"version": "1.0.6",
"dev": true,
"license": "ISC",
"dependencies": {
"once": "^1.3.0",
@ -3186,9 +3275,10 @@
}
},
"node_modules/ip": {
"version": "2.0.0",
"dev": true,
"license": "MIT"
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/ip/-/ip-2.0.1.tgz",
"integrity": "sha512-lJUL9imLTNi1ZfXT+DU6rBBdbiKGBuay9B6xGSPVjUeQwaH1RIGqef8RZkUtHioLmSNpPR5M4HVKJGm1j8FWVQ==",
"dev": true
},
"node_modules/ipaddr.js": {
"version": "1.9.1",
@ -3270,8 +3360,9 @@
},
"node_modules/is-number": {
"version": "7.0.0",
"resolved": "https://registry.npmjs.org/is-number/-/is-number-7.0.0.tgz",
"integrity": "sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=0.12.0"
}
@ -3640,9 +3731,12 @@
}
},
"node_modules/merge-descriptors": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-1.0.1.tgz",
"integrity": "sha512-cCi6g3/Zr1iqQi6ySbseM1Xvooa98N0w31jzUYrXPX2xqObmFGHJ0tQ5u74H3mVh7wLouTseZyYIq39g8cNp1w=="
"version": "1.0.3",
"resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-1.0.3.tgz",
"integrity": "sha512-gaNvAS7TZ897/rVaZ0nMtAyxNyi/pdbjbAwUpFQpN70GqnVfOiXpeUUMKRBmzXaSQ8DdTX4/0ms62r2K+hE6mQ==",
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/merge2": {
"version": "1.4.1",
@ -3720,6 +3814,7 @@
},
"node_modules/minimatch": {
"version": "3.1.2",
"dev": true,
"license": "ISC",
"dependencies": {
"brace-expansion": "^1.1.7"
@ -4086,9 +4181,12 @@
}
},
"node_modules/object-inspect": {
"version": "1.12.3",
"resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.12.3.tgz",
"integrity": "sha512-geUvdk7c+eizMNUDkRpW1wJwgfOiOeHbxBR/hLXK1aT6zmVSO0jsQcs7fj6MGw89jC/cjGfLcNOrtMYtGqm81g==",
"version": "1.13.2",
"resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.2.tgz",
"integrity": "sha512-IRZSRuzJiynemAXPYtPe5BoI/RESNYR7TYm50MC5Mqbd3Jmw5y790sErYw3V6SryFJD64b74qQQs9wn5Bg/k3g==",
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
@ -4284,6 +4382,7 @@
},
"node_modules/path-is-absolute": {
"version": "1.0.1",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=0.10.0"
@ -4326,9 +4425,9 @@
}
},
"node_modules/path-to-regexp": {
"version": "0.1.7",
"resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-0.1.7.tgz",
"integrity": "sha512-5DFkuoqlv1uYQKxy8omFBeJPQcdoE07Kv2sferDCrAq1ohOU+MSDswDIbnx3YAM60qIOnYa53wBhXW0EbMonrQ=="
"version": "0.1.12",
"resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-0.1.12.tgz",
"integrity": "sha512-RA1GjUVMnvYFxuqovrEqZoxxW5NUZqbwKtYz/Tt7nXerk0LbLblQmrsgdeOxV5SFHf0UDggjS/bSeOZwt1pmEQ=="
},
"node_modules/path-type": {
"version": "2.0.0",
@ -4498,11 +4597,11 @@
}
},
"node_modules/qs": {
"version": "6.11.0",
"resolved": "https://registry.npmjs.org/qs/-/qs-6.11.0.tgz",
"integrity": "sha512-MvjoMCJwEarSbUYk5O+nmoSzSutSsTwF85zcHPQ9OrlFoZOYIjaqBAJIqIXjptyD5vThxGq52Xu/MaJzRkIk4Q==",
"version": "6.13.0",
"resolved": "https://registry.npmjs.org/qs/-/qs-6.13.0.tgz",
"integrity": "sha512-+38qI9SOr8tfZ4QmJNplMUxqjbe7LKvvZgWdExBOmd+egZTtjLB67Gu0HRX3u/XOq7UU2Nx6nsjvS16Z9uwfpg==",
"dependencies": {
"side-channel": "^1.0.4"
"side-channel": "^1.0.6"
},
"engines": {
"node": ">=0.6"
@ -4550,9 +4649,9 @@
}
},
"node_modules/raw-body": {
"version": "2.5.1",
"resolved": "https://registry.npmjs.org/raw-body/-/raw-body-2.5.1.tgz",
"integrity": "sha512-qqJBtEyVgS0ZmPGdCFPWJ3FreoqvG4MVQln/kCgF7Olq95IbOp0/BWyMwbdtn4VTvkM8Y7khCQ2Xgk/tcrCXig==",
"version": "2.5.2",
"resolved": "https://registry.npmjs.org/raw-body/-/raw-body-2.5.2.tgz",
"integrity": "sha512-8zGqypfENjCIqGhgXToC8aB2r7YrBX+AQAfIPs/Mlk+BtPTztOvTS01NRW/3Eh60J+a48lt8qsCzirQ6loCVfA==",
"dependencies": {
"bytes": "3.1.2",
"http-errors": "2.0.0",
@ -4734,6 +4833,7 @@
},
"node_modules/rimraf": {
"version": "3.0.2",
"dev": true,
"license": "ISC",
"dependencies": {
"glob": "^7.1.3"
@ -4801,16 +4901,27 @@
"license": "MIT"
},
"node_modules/selenium-webdriver": {
"version": "4.16.0",
"resolved": "https://registry.npmjs.org/selenium-webdriver/-/selenium-webdriver-4.16.0.tgz",
"integrity": "sha512-IbqpRpfGE7JDGgXHJeWuCqT/tUqnLvZ14csSwt+S8o4nJo3RtQoE9VR4jB47tP/A8ArkYsh/THuMY6kyRP6kuA==",
"version": "4.27.0",
"resolved": "https://registry.npmjs.org/selenium-webdriver/-/selenium-webdriver-4.27.0.tgz",
"integrity": "sha512-LkTJrNz5socxpPnWPODQ2bQ65eYx9JK+DQMYNihpTjMCqHwgWGYQnQTCAAche2W3ZP87alA+1zYPvgS8tHNzMQ==",
"funding": [
{
"type": "github",
"url": "https://github.com/sponsors/SeleniumHQ"
},
{
"type": "opencollective",
"url": "https://opencollective.com/selenium"
}
],
"dependencies": {
"@bazel/runfiles": "^6.3.1",
"jszip": "^3.10.1",
"tmp": "^0.2.1",
"ws": ">=8.14.2"
"tmp": "^0.2.3",
"ws": "^8.18.0"
},
"engines": {
"node": ">= 14.20.0"
"node": ">= 14.21.0"
}
},
"node_modules/semver": {
@ -4843,9 +4954,9 @@
}
},
"node_modules/send": {
"version": "0.18.0",
"resolved": "https://registry.npmjs.org/send/-/send-0.18.0.tgz",
"integrity": "sha512-qqWzuOjSFOuqPjFe4NOsMLafToQQwBSOEpS+FwEt3A2V3vKubTquT3vmLTQpFgMXp8AlFWFuP1qKaJZOtPpVXg==",
"version": "0.19.0",
"resolved": "https://registry.npmjs.org/send/-/send-0.19.0.tgz",
"integrity": "sha512-dW41u5VfLXu8SJh5bwRmyYUbAoSB3c9uQh6L8h/KtsFREPWpbX1lrljJo186Jc4nmci/sGUZ9a0a0J2zgfq2hw==",
"dependencies": {
"debug": "2.6.9",
"depd": "2.0.0",
@ -4878,6 +4989,14 @@
"resolved": "https://registry.npmjs.org/ms/-/ms-2.0.0.tgz",
"integrity": "sha512-Tpp60P6IUJDTuOq/5Z8cdskzJujfwqfOTkrwIwj7IRISpnkJnT6SyJ4PCPnGMoFjC9ddhal5KVIYtAt97ix05A=="
},
"node_modules/send/node_modules/encodeurl": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-1.0.2.tgz",
"integrity": "sha512-TPJXq8JqFaVYm2CWmPvnP2Iyo4ZSM7/QKcSmuMLDObfpH5fi7RUGmd/rTDf+rut/saiDiQEeVTNgAmJEdAOx0w==",
"engines": {
"node": ">= 0.8"
}
},
"node_modules/send/node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
@ -4911,14 +5030,14 @@
}
},
"node_modules/serve-static": {
"version": "1.15.0",
"resolved": "https://registry.npmjs.org/serve-static/-/serve-static-1.15.0.tgz",
"integrity": "sha512-XGuRDNjXUijsUL0vl6nSD7cwURuzEgglbOaFuZM9g3kwDXOWVTck0jLzjPzGD+TazWbboZYu52/9/XPdUgne9g==",
"version": "1.16.2",
"resolved": "https://registry.npmjs.org/serve-static/-/serve-static-1.16.2.tgz",
"integrity": "sha512-VqpjJZKadQB/PEbEwvFdO43Ax5dFBZ2UECszz8bQ7pi7wt//PWe1P6MN7eCnjsatYtBT6EuiClbjSWP2WrIoTw==",
"dependencies": {
"encodeurl": "~1.0.2",
"encodeurl": "~2.0.0",
"escape-html": "~1.0.3",
"parseurl": "~1.3.3",
"send": "0.18.0"
"send": "0.19.0"
},
"engines": {
"node": ">= 0.8.0"
@ -4929,6 +5048,22 @@
"dev": true,
"license": "ISC"
},
"node_modules/set-function-length": {
"version": "1.2.2",
"resolved": "https://registry.npmjs.org/set-function-length/-/set-function-length-1.2.2.tgz",
"integrity": "sha512-pgRc4hJ4/sNjWCSS9AmnS40x3bNMDTknHgL5UaMBTMyJnU90EgWh1Rz+MC9eFu4BuN/UwZjKQuY/1v3rM7HMfg==",
"dependencies": {
"define-data-property": "^1.1.4",
"es-errors": "^1.3.0",
"function-bind": "^1.1.2",
"get-intrinsic": "^1.2.4",
"gopd": "^1.0.1",
"has-property-descriptors": "^1.0.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/setimmediate": {
"version": "1.0.5",
"license": "MIT"
@ -4958,13 +5093,17 @@
}
},
"node_modules/side-channel": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.0.4.tgz",
"integrity": "sha512-q5XPytqFEIKHkGdiMIrY10mvLRvnQh42/+GoBlFW3b2LXLE2xxJpZFdm94we0BaoV3RwJyGqg5wS7epxTv0Zvw==",
"version": "1.0.6",
"resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.0.6.tgz",
"integrity": "sha512-fDW/EZ6Q9RiO8eFG8Hj+7u/oW+XrPTIChwCOM2+th2A6OblDtYYIpve9m+KvI9Z4C9qSEXlaGR6bTEYHReuglA==",
"dependencies": {
"call-bind": "^1.0.0",
"get-intrinsic": "^1.0.2",
"object-inspect": "^1.9.0"
"call-bind": "^1.0.7",
"es-errors": "^1.3.0",
"get-intrinsic": "^1.2.4",
"object-inspect": "^1.13.1"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
@ -5321,13 +5460,11 @@
"license": "MIT"
},
"node_modules/tmp": {
"version": "0.2.1",
"license": "MIT",
"dependencies": {
"rimraf": "^3.0.0"
},
"version": "0.2.3",
"resolved": "https://registry.npmjs.org/tmp/-/tmp-0.2.3.tgz",
"integrity": "sha512-nZD7m9iCPC5g0pYmcaxogYKggSfLsdxl8of3Q/oIbqCqLLIO9IAF0GWjX1z9NZRHPiXv8Wex4yDCaZsgEw0Y8w==",
"engines": {
"node": ">=8.17.0"
"node": ">=14.14"
}
},
"node_modules/tmp-promise": {
@ -5341,8 +5478,9 @@
},
"node_modules/to-regex-range": {
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/to-regex-range/-/to-regex-range-5.0.1.tgz",
"integrity": "sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"is-number": "^7.0.0"
},
@ -5600,9 +5738,9 @@
"license": "ISC"
},
"node_modules/ws": {
"version": "8.14.2",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.14.2.tgz",
"integrity": "sha512-wEBG1ftX4jcglPxgFCMJmZ2PLtSbJ2Peg6TmpJFTbe9GZYOQCDPdMYu/Tm0/bGZkw8paZnJY45J4K2PZrLYq8g==",
"version": "8.18.0",
"resolved": "https://registry.npmjs.org/ws/-/ws-8.18.0.tgz",
"integrity": "sha512-8VbfWfHLbbwu3+N6OKsOMpBdT4kXPDDB9cJk2bJ6mh9ucxdlnNvH1e+roYkKmN9Nxw2yjz7VzeO9oOz2zJ04Pw==",
"engines": {
"node": ">=10.0.0"
},


@ -1,7 +1,7 @@
{
"name": "easy-spider",
"productName": "EasySpider",
"version": "0.6.0",
"version": "0.6.3",
"icon": "./favicon",
"description": "NoCode Visual Web Crawler",
"main": "main.js",
@ -33,14 +33,14 @@
"dependencies": {
"cors": "^2.8.5",
"electron-squirrel-startup": "^1.0.0",
"express": "^4.18.2",
"express": "^4.21.2",
"formidable": "^3.5.0",
"http": "^0.0.1-security",
"multer": "^1.4.5-lts.1",
"node-abi": "^3.52.0",
"node-window-manager": "^2.2.4",
"selenium-webdriver": "^4.16.0",
"ws": "^8.12.0",
"selenium-webdriver": "^4.27.0",
"ws": "^8.18.0",
"xlsx": "^0.18.5"
},
"config": {
@ -67,7 +67,7 @@
],
"packagerConfig": {
"icon": "./favicon",
"appVersion": "0.6.0",
"appVersion": "0.6.3",
"name": "EasySpider",
"executableName": "EasySpider",
"appCopyright": "Naibo Wang (naibowang@foxmail.com)",
@ -80,4 +80,4 @@
"publishers": []
}
}
}
}


@ -20,9 +20,10 @@ rm out/EasySpider/resources/app/vs_BuildTools.exe
mv out/EasySpider ../.temp_to_pub/EasySpider_Linux_x64/EasySpider
rm -rf ../.temp_to_pub/EasySpider_Linux_x64/Code
mkdir ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/easyspider_executestage.py ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/myChrome.py ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/utils.py ../.temp_to_pub/EasySpider_Linux_x64/Code
# cp ../ExecuteStage/easyspider_executestage.py ../.temp_to_pub/EasySpider_Linux_x64/Code
# cp ../ExecuteStage/myChrome.py ../.temp_to_pub/EasySpider_Linux_x64/Code
# cp ../ExecuteStage/utils.py ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/*.py ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/requirements.txt ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/Readme.md ../.temp_to_pub/EasySpider_Linux_x64/Code
cp ../ExecuteStage/myCode.py ../.temp_to_pub/EasySpider_Linux_x64


@ -20,9 +20,10 @@ rm -r ../.temp_to_pub/EasySpider_MacOS/EasySpider.app/Contents/Resources/app/use
rm -r ../.temp_to_pub/EasySpider_MacOS/EasySpider.app/Contents/Resources/app/TempUserDataFolder
rm -rf ../.temp_to_pub/EasySpider_MacOS/Code
mkdir ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/easyspider_executestage.py ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/myChrome.py ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/utils.py ../.temp_to_pub/EasySpider_MacOS/Code
# cp ../ExecuteStage/easyspider_executestage.py ../.temp_to_pub/EasySpider_MacOS/Code
# cp ../ExecuteStage/myChrome.py ../.temp_to_pub/EasySpider_MacOS/Code
# cp ../ExecuteStage/utils.py ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/*.py ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/requirements.txt ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/Readme.md ../.temp_to_pub/EasySpider_MacOS/Code
cp ../ExecuteStage/myCode.py ../.temp_to_pub/EasySpider_MacOS


@ -66,6 +66,7 @@ if (!fs.existsSync(path.join(getDir(), "config.json"))) {
webserver_port: 8074,
user_data_folder: "./user_data",
debug: false,
lang: "-",
copyright: 0,
sys_arch: require("os").arch(),
mysql_config_path: "./mysql_config.json",
@ -121,6 +122,12 @@ exports.start = function (port = 8074) {
res.setHeader("Access-Control-Allow-Origin", "*"); // Allow requests from any origin
// Parse the request parameters
const pathName = url.parse(req.url).pathname;
const safeBase = path.join(__dirname, "src");
const safeJoin = (base, target) => {
const targetPath = "." + path.posix.normalize("/" + target);
return path.join(base, targetPath);
};
if (pathName == "/excelUpload" && req.method.toLowerCase() === "post") {
// // parse a file upload
// let form = new formidable.IncomingForm();
@ -160,8 +167,16 @@ exports.start = function (port = 8074) {
else {
//如果有后缀名, 则为前端请求
// console.log(path.join(__dirname,"src/taskGrid", pathName));
const filePath = safeJoin(safeBase, pathName);
if (!filePath.startsWith(safeBase)) {
res.writeHead(400, { "Content-Type": 'text/html;charset="utf-8"' });
res.end("Invalid path");
return;
}
fs.readFile(
path.join(__dirname, "src", pathName),
filePath,
async (err, data) => {
if (err) {
res.writeHead(404, {
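The path check added in this hunk can be tried in isolation. The following is a minimal sketch, assuming a hypothetical base directory `/srv/app/src` in place of `path.join(__dirname, "src")`. `isInsideBase` is an illustrative helper that is not part of the diff; it tightens the `startsWith` check by requiring a path separator after the base.

```javascript
const path = require("path");

// Hypothetical base directory (the diff uses path.join(__dirname, "src")).
const safeBase = path.join("/srv/app", "src");

// Same logic as the diff: anchor the request path at "/" so that
// POSIX normalization cannot climb above it, then join it to the base.
const safeJoin = (base, target) => {
  const targetPath = "." + path.posix.normalize("/" + target);
  return path.join(base, targetPath);
};

// Illustrative helper (not in the diff): the candidate must be the base
// itself or live below it, so a sibling such as "src-old" is rejected.
const isInsideBase = (base, candidate) =>
  candidate === base || candidate.startsWith(base + path.sep);

const normal = safeJoin(safeBase, "/taskGrid/index.html");
const traversal = safeJoin(safeBase, "/../../etc/passwd");

console.log(normal, isInsideBase(safeBase, normal));
console.log(traversal, isInsideBase(safeBase, traversal));
```

In this sketch the traversal attempt is folded back under the base directory rather than rejected, so the prefix check mainly serves as a second line of defence (for example, against backslash separators on Windows, which POSIX normalization does not treat as separators).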
@ -200,8 +215,10 @@ exports.start = function (port = 8074) {
let item = {
id: task.id,
name: task.name,
url: task.url,
url: task.links.split("\n")[0],
mtime: stat.mtime,
links: task.links,
desc: task.desc,
};
if (item.id != -2) {
output.push(item);
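The task-list change above derives the displayed `url` from the first line of `task.links`. A minimal sketch of that derivation follows, assuming a hypothetical task object. `firstLink` is an illustrative helper that is not in the diff; it additionally tolerates a missing `links` field and Windows line endings.

```javascript
// Hypothetical task object; the `links` field holds newline-separated URLs.
const task = {
  id: 7,
  name: "demo",
  links: "https://example.com/a\nhttps://example.com/b",
  desc: "two start pages",
};

// Illustrative helper (not in the diff): tolerate a missing `links` field
// and Windows line endings, and trim stray whitespace.
const firstLink = (links) => (links ?? "").split(/\r?\n/)[0].trim();

const item = {
  id: task.id,
  name: task.name,
  url: firstLink(task.links),
  links: task.links,
  desc: task.desc,
};

console.log(item.url);
```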
@ -443,6 +460,10 @@ exports.start = function (port = 8074) {
"utf8"
);
config_file = JSON.parse(config_file);
let lang = config_file["lang"];
if(lang == undefined){
lang = "-";
}
res.write(JSON.stringify(config_file));
res.end();
} else if (pathName == "/setUserDataFolder") {
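In the hunk above, the fallback value is stored in a local `lang` variable, while the response still serializes `config_file` as read from disk; whether the local value is used elsewhere is not visible in this diff. A minimal sketch of applying the default to the object itself follows. `withDefaultLang` and the sample config strings are hypothetical and not part of the project.

```javascript
// Illustrative helper (not in the diff): return a copy of the parsed
// config with `lang` defaulting to "-" (no language chosen yet).
const withDefaultLang = (config) => ({
  ...config,
  lang: config.lang === undefined ? "-" : config.lang,
});

// Hypothetical config contents: one written before the `lang` key existed,
// one written after the user picked a language.
const oldConfig = JSON.parse('{"webserver_port": 8074, "debug": false}');
const newConfig = JSON.parse('{"webserver_port": 8074, "lang": "en"}');

console.log(withDefaultLang(oldConfig).lang);
console.log(withDefaultLang(newConfig).lang);
```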


@ -32,7 +32,7 @@
<body>
<div id="app">
<div style="padding: 10px; text-align: center;vertical-align: middle;" v-if="init">
<div style="padding: 10px; text-align: center;vertical-align: middle;" v-if="lang=='-'">
<h5 style="margin-top: 20px">选择语言/Select Language</h5>
<p><a @click="changeLang('zh')" class="btn btn-outline-primary btn-lg"
@ -40,15 +40,15 @@
<p><a @click="changeLang('en')" class="btn btn-outline-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;">English</a></p>
<p style="font-size: 17px">当前版本/Current Version: <b>v0.6.0</b></p>
<p style="font-size: 17px"><a href="https://github.com/NaiboWang/EasySpider/releases" target="_blank">Github</a>最新版本/Newest Version<b>{{newest_version}}</b></p>
<!-- <p>如发现新版本更新可从以下Github仓库下载最新版本使用/If a new version is found, you can download the latest version from the following Github repository:</p>-->
<!-- <p></p>-->
<!-- <p>如发现新版本更新可从以下Github仓库下载最新版本使用/If a new version is found, you can download the latest version from the following Github repository:</p>-->
<!-- <p></p>-->
<div class="img-container">
<!-- <h5>出品方/Producer</h5>-->
<!-- <h5>出品方/Producer</h5>-->
<a href="https://www.zju.edu.cn" alt="浙江大学 Zhejiang University" target="_blank"><img src="img/zju.png"></a>
<a href="https://www.nus.edu.sg" alt="新加坡国立大学 National University of Singapore" target="_blank"><img src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="西安电子科技大学 Xidian University" target="_blank"><img src="img/xidian.png"></a>
<a href="https://www.nus.edu.sg" alt="新加坡国立大学 National University of Singapore" target="_blank"><img
src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="西安电子科技大学 Xidian University" target="_blank"><img
src="img/xidian.png"></a>
</div>
</div>
@ -58,9 +58,11 @@
<div v-if="step == -1">
<h4 style="margin-top: 20px">Copyright and Disclaimer</h4>
<p>Please carefully read the following instructions regarding the use of the software and commercial payments. If you agree, please accept the agreement.</p>
<textarea class="form-control" style="margin:0 auto;width:90%; color:black; height: 450px; min-height: 200px; background: white" readonly>
This software is intended for educational and communication purposes only. It is strictly prohibited to use the software for any illegal activities or operations, such as crawling government/military websites that are not allowed to be crawled. The user bears all consequences resulting from the use of this software and the author shall not be held responsible or liable in any way. Furthermore, the software is protected by patent rights. If you intend to use it for commercial purposes or profit-making activities, such as using the software for client orders, selling the collected data, please contact author: naibowang@foxmail.com for patent authorization and payment operations: https://www.patentguru.com/cn/search?q=一种自定义提取流程的服务封装系统
For individual users, EasySpider is a completely free and ad-free open-source software. The development and maintenance of the software rely solely on the author's voluntary efforts. Therefore, you can choose to support the author, allowing them to have more enthusiasm and energy to maintain this software. Alternatively, if you have profited from using this software, you are welcome to support the author through the following methods:
<textarea class="form-control"
style="margin:0 auto;width:90%; color:black; height: 450px; min-height: 200px; background: white"
readonly>
This software is intended for educational and communication purposes only. It is strictly prohibited to use the software for any illegal activities or operations, such as crawling government/military websites that are not allowed to be crawled. The user bears all consequences resulting from the use of this software and the author shall not be held responsible or liable in any way.
EasySpider is a completely free and ad-free open-source software. The development and maintenance of the software rely solely on the author's voluntary efforts. Therefore, you can choose to support the author, allowing them to have more enthusiasm and energy to maintain this software. Alternatively, if you have profited from using this software, you are welcome to support the author through the following methods:
1. PayPal account: naibowang, or scan the QR code provided in the software package.
2. Alipay account: naibowang@foxmail.com, or scan the QR code provided in the software package.
@ -68,7 +70,8 @@ For individual users, EasySpider is a completely free and ad-free open-source so
</textarea>
<p><a @click="acceptAgreement" class="btn btn-primary btn-lg"
style="margin-top: 30px; width: 300px;height:60px;padding-top:12px;color:white">Agree and Start</a></p>
style="margin-top: 30px; width: 300px;height:60px;padding-top:12px;color:white">Agree and Start</a>
</p>
</div>
<div v-if="step == 0">
<p style="margin-top: 20px">Hint: Click Button below to start.</p>
@ -83,13 +86,20 @@ For individual users, EasySpider is a completely free and ad-free open-source so
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">View/Manage/Execute
Tasks</a></p>
<p>
<a href="https://www.easyspider.cn/index_english.html" target="_blank" style="text-align: center; font-size: 18px">Browse official website to watch tutorials</a>
<a href="https://www.easyspider.cn/index_english.html" target="_blank"
style="text-align: center; font-size: 18px">Browse official website to watch tutorials</a>
</p>
<p style="font-size: 17px">Current Version: <b>v0.6.3</b></p>
<p style="font-size: 17px"><a href="https://github.com/NaiboWang/EasySpider/releases"
target="_blank">Newest</a> Version: <b>{{newest_version}}</b></p>
<div class="img-container">
<!-- <h5>Producer</h5>-->
<a href="https://www.zju.edu.cn" alt="Zhejiang University" target="_blank"><img src="img/zju.png"></a>
<a href="https://www.nus.edu.sg" alt="National University of Singapore" target="_blank"><img src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="Xidian University" target="_blank"><img src="img/xidian.png"></a>
<!-- <h5>Producer</h5>-->
<a href="https://www.zju.edu.cn" alt="Zhejiang University" target="_blank"><img
src="img/zju.png"></a>
<a href="https://www.nus.edu.sg" alt="National University of Singapore" target="_blank"><img
src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="Xidian University" target="_blank"><img
src="img/xidian.png"></a>
</div>
</div>
<div v-else-if="step == 1">
@ -113,7 +123,8 @@ For individual users, EasySpider is a completely free and ad-free open-source so
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">Start Data Mode</a>
</p>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"style="margin-top: 10px; width: 302px;height:45px;padding-top:5px">Go to Home Page</a>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"
style="margin-top: 10px; width: 302px;height:45px;padding-top:5px">Go to Home Page</a>
</div>
<div v-else-if="step == 2">
@ -121,7 +132,11 @@ For individual users, EasySpider is a completely free and ad-free open-source so
<div style="margin: 0 auto; width:90%">
<p style="margin-top: 20px; text-align: justify">
Please specify the directory of user data below. Once set, the browser will load cookies and other contents such as user login information from this directory. The browser will load data from this directory every time it is designed and executed, as long as the directory remains the same. </p>
<p style="text-align: justify">For example, if the <b>./user_data</b> folder is set and you log in at <b>ebay.com</b> during the design process, then the previous login status will still be retained when you specify the <b>./user_data</b> folder again for the next design or task execution when you open <b>ebay.com</b>.</p>
<p style="text-align: justify">For example, if the
<b>./user_data</b> folder is set and you log in at
<b>ebay.com</b> during the design process, then the previous login status will still be retained when you specify the
<b>./user_data</b> folder again for the next design or task execution when you open
<b>ebay.com</b>.</p>
<p style="text-align: justify">If there are multiple configurations, different directories can be set for each configuration. Each directory will be treated as a separate configuration set, and if a directory does not exist, it will be created automatically.</p>
<p><textarea class="form-control" style="min-height: 50px;"
v-model="user_data_folder"></textarea>
@ -129,13 +144,16 @@ For individual users, EasySpider is a completely free and ad-free open-source so
</div>
<p><a @click="startDesign('en', true)"
class="btn btn-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">Start Design</a></p>
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">Start Design</a>
</p>
<p>
<p><a @click="startDesign('en', true, true)"
class="btn btn-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">Start Design (Mobile)</a></p>
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">Start Design (Mobile)</a>
</p>
<p>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"style="margin-top: 10px; width: 302px;height:45px;padding-top:5px">Go to Home Page</a>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"
style="margin-top: 10px; width: 302px;height:45px;padding-top:5px">Go to Home Page</a>
</p>
</div>
</div>
@ -143,36 +161,46 @@ For individual users, EasySpider is a completely free and ad-free open-source so
<div v-if="step == -1">
<h4 style="margin-top: 20px">版权声明和注意事项</h4>
<p>请接受下方使用协议以使用软件,不同意请退出。</p>
<textarea class="form-control" style="margin:0 auto;width:90%; color:black; height: 480px; min-height: 200px; background: white" readonly>
本软件仅供学习交流使用,严禁使用软件进行任何违法违规的操作,如爬取不允许爬取的政府/军事机关网站等。使用本软件所造成的一切后果由使用者自负与作者本人无关作者不会承担任何责任。同时软件受到专利权保护如要用于商业用途如使用软件进行盈利接单用于公司业务或出售采集到的数据等请邮件联系作者naibowang@foxmail.com进行专利授权等付费操作https://www.patentguru.com/cn/search?q=一种自定义提取流程的服务封装系统
<textarea class="form-control"
style="margin:0 auto;width:90%; color:black; height: 480px; min-height: 200px; background: white"
readonly>
本软件仅供学习交流使用,严禁使用软件进行任何违法违规的操作,如爬取不允许爬取的政府/军事机关网站等。使用本软件所造成的一切后果由使用者自负,与作者本人无关,作者不会承担任何责任。
对于个人使用者来说易采集EasySpider是一款完全免费无广告的开源软件软件开发和维护全靠作者用爱发电因此您可以选择支持作者让作者有更多的热情和精力维护此软件或者您使用了此软件进行了盈利欢迎您通过下面的方式支持作者
易采集EasySpider是一款完全免费无广告的开源软件软件开发和维护全靠作者用爱发电因此您可以选择支持作者让作者有更多的热情和精力维护此软件或者您使用了此软件进行了盈利欢迎您通过下面的方式支持作者
1、支付宝账号naibowang@foxmail.com也可以扫描软件包中带的二维码。
2、微信收款扫描软件包中带的二维码。
3、PayPal账号naibowang或扫描软件包中带的二维码。
</textarea>
<p><a @click="acceptAgreement" class="btn btn-primary btn-lg"
style="margin-top: 30px; width: 300px;height:60px;padding-top:12px;color:white">同意并开始使用</a></p>
style="margin-top: 30px; width: 300px;height:60px;padding-top:12px;color:white">同意并开始使用</a>
</p>
</div>
<div v-if="step == 0">
<p style="margin-top: 20px">提示:点击下方按钮开始使用。</p>
<p><a @click="step = 1" class="btn btn-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">设计/修改任务</a></p>
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">设计/修改任务</a>
</p>
<p><a @click="startInvoke('zh')"
class="btn btn-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;color:white">查看/管理/执行任务</a>
</p>
<p>
<a href="https://www.easyspider.cn?lang=zh" target="_blank" style="text-align: center; font-size: 18px">点此访问官网查看文档/视频教程</a>
<a href="https://www.easyspider.cn?lang=zh" target="_blank"
style="text-align: center; font-size: 18px">点此访问官网查看文档/视频教程</a>
</p>
<p style="font-size: 17px">软件当前版本:<b>v0.6.3</b></p>
<p style="font-size: 17px"><a href="https://github.com/NaiboWang/EasySpider/releases"
target="_blank">官网</a>最新版本:<b>{{newest_version}}</b></p>
<div class="img-container">
<!-- <h5>出品方</h5>-->
<!-- <h5>出品方</h5>-->
<a href="https://www.zju.edu.cn" alt="浙江大学" target="_blank"><img src="img/zju.png"></a>
<a href="https://www.nus.edu.sg" alt= "新加坡国立大学" target="_blank"><img src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="西安电子科技大学" target="_blank"><img src="img/xidian.png"></a>
<a href="https://www.nus.edu.sg" alt="新加坡国立大学" target="_blank"><img
src="img/nuslogo.png"></a>
<a href="https://www.xidian.edu.cn" alt="西安电子科技大学" target="_blank"><img
src="img/xidian.png"></a>
</div>
</div>
<div v-else-if="step == 1">
@ -194,7 +222,8 @@ For individual users, EasySpider is a completely free and ad-free open-source so
style="margin-top: 15px; width: 320px;height:60px;padding-top:12px;color:white">使用带用户信息浏览器设计</a>
</p>
<p>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"style="margin-top: 10px; width: 322px;height:45px;padding-top:5px">返回首页</a>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"
style="margin-top: 10px; width: 322px;height:45px;padding-top:5px">返回首页</a>
</p>
@ -216,9 +245,11 @@ For individual users, EasySpider is a completely free and ad-free open-source so
<p>
<p><a @click="startDesign('zh', true, true)"
class="btn btn-primary btn-lg"
style="margin-top: 15px; width: 320px;height:60px;padding-top:12px;color:white">开始设计(手机模式)</a></p>
style="margin-top: 15px; width: 320px;height:60px;padding-top:12px;color:white">开始设计(手机模式)</a>
</p>
<p>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"style="margin-top: 10px; width: 322px;height:45px;padding-top:5px">返回首页</a>
<a @click="step = 0" class="btn btn-outline-primary btn-lg"
style="margin-top: 10px; width: 322px;height:45px;padding-top:5px">返回首页</a>
</p>
</div>


@ -22,7 +22,7 @@ let app = Vue.createApp({
data() {
return {
init: true,
lang: 'zh',
lang: '-',
user_data_folder: getUrlParam("user_data_folder"),
copyright: 0,
step: 0,
@ -34,6 +34,10 @@ let app = Vue.createApp({
if(this.copyright == 0){
this.step = -1;
}
this.lang = getUrlParam("lang");
if (this.lang == 'undefined' || this.lang == '') {
this.lang = '-';
}
// Send a GET request to fetch the GitHub Release API response
const request = new XMLHttpRequest();
request.open('GET', `https://api.github.com/repos/NaiboWang/EasySpider/releases/latest`);
@ -52,8 +56,9 @@ let app = Vue.createApp({
},
methods: {
changeLang(lang = 'zh') {
this.init = false;
// this.init = false;
this.lang = lang;
window.electronAPI.changeLang(lang);
},
acceptAgreement() {
this.step = 0;
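The `lang` handling added in this file falls back to `'-'` (language not yet chosen) when the URL parameter is unusable. The project's `getUrlParam` is not shown in this diff, so the sketch below uses the standard `URLSearchParams` instead. `readLang` is a hypothetical helper, and the query strings are made-up examples.

```javascript
// Illustrative helper (not in the diff): read `lang` from a query string
// and fall back to "-" when it is absent, empty, or the literal "undefined".
const readLang = (search) => {
  const value = new URLSearchParams(search).get("lang");
  if (value === null || value === "" || value === "undefined") {
    return "-";
  }
  return value;
};

console.log(readLang("?lang=zh&user_data_folder=./user_data"));
console.log(readLang("?user_data_folder=./user_data"));
console.log(readLang("?lang=undefined"));
```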


@ -11,4 +11,5 @@ contextBridge.exposeInMainWorld('electronAPI', {
startDesign: (lang="en", user_data_folder = '', mobile=false) => ipcRenderer.send('start-design', lang, user_data_folder, mobile),
startInvoke: (lang="en") => ipcRenderer.send('start-invoke', lang),
acceptAgreement: () => ipcRenderer.send('accept-agreement'),
changeLang: (lang="en") => ipcRenderer.send('change-lang', lang)
})


@ -89,7 +89,7 @@
</div>
<div>
<label>Tip: Hover over the smiley face to view hints, <b>double-click</b> on an action in the flowchart to test run, <b>right-click</b> on an action to see more options.</label>
<label>Tip: Hover over the smiley face to view hints, <b>double-click</b> on an action in the flowchart to <b>trial run</b> (can only run when the webpage is fully loaded), <b>right-click</b> on an action to see more options.</label>
<label>Option Name:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='list.nl[index.nowNodeIndex]["title"]'></input>
</div>
@ -145,39 +145,12 @@
</div>
<div>
<label>XPath (or use "point(10,10)" to click the web page at coordinate (10, 10), which is useful when you need to click a blank area to dismiss a popup dialog): <span style="font-size: 30px!important;" title="Relative XPath: start with /. For example, if the loop item XPath is /html/body/div[1] and your input is /*[@id='tab-customer'], then the final XPath is /html/body/div[1]/*[@id='tab-customer']"></span></label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["xpath"]'></textarea>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='xpath'></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(nowNode['parameters']['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">Click here to view other equivalent XPath expressions</button></p>
<label>The final XPath of this element when the task is running:</label>
<textarea v-model="getFinalXPath(nowNode['parameters']['xpath'], useLoop)" spellcheck="false" onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite"></textarea>
</div>
<label>Maximum wait time for page load after clicking (in seconds):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>Click Type:</label>
<select v-model='nowNode["parameters"]["clickWay"]' class="form-control">
<option :value = 0>Selenium</option>
<option :value = 1>JavaScript</option>
</select>
<label>Open link in new tab:</label>
<select v-model='nowNode["parameters"]["newTab"]' class="form-control">
<option :value = 1>Yes</option>
<option :value = 0>No</option>
</select>
<label>Whether to scroll down after clicking:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option :value = 0>No Scrolling</option>
<option :value = 1>Scroll one screen</option>
<option :value = 2>Scroll to the end</option>
<option :value = 3>Keep scrolling until the page data does not change</option>
</select>
<label>Scroll Times:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollCount']" type="number" required></input>
<label>Wait time after scrolling (in seconds):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollWaitTime']" type="number" required></input>
<label>Way to handle pop-up windows after clicking:</label>
<p><select v-model='nowNode["parameters"]["alertHandleType"]' class="form-control">
<option :value = 0>No pop-up window</option>
<option :value = 1>Accept pop-up window</option>
<option :value = 2>Reject pop-up window (only for Confirm pop-up window)</option>
</select></p>
<p style="margin-top: 10px">
<p style="margin-top: 10px">
<a class="btn btn-primary" data-toggle="collapse" href="#collapseExample" role="button" aria-expanded="false" aria-controls="collapseExample">
Click here to expand/collapse advanced operations
</a>
@ -195,6 +168,38 @@
<input spellcheck=false onkeydown="inputDelete(event)" required class="form-control" type="number" v-model.number='nowNode["parameters"]["afterJSWaitTime"]'></input>
</div>
</div>
<label>Maximum wait time for page load after clicking (in seconds):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>Click Type (including double-click):</label>
<select v-model='nowNode["parameters"]["clickWay"]' class="form-control">
<option :value = 0>Selenium</option>
<option :value = 1>JavaScript</option>
<option :value = 2>Double-click</option>
</select>
<label>Open link in new tab:</label>
<select v-model='nowNode["parameters"]["newTab"]' class="form-control">
<option :value = 1>Yes</option>
<option :value = 0>No</option>
</select>
<label>Whether to scroll down after clicking:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option :value = 0>No Scrolling</option>
<option :value = 1>Scroll one screen</option>
<option :value = 2>Scroll to the end</option>
<option :value = 3>Keep scrolling until the page data does not change</option>
</select>
<label>Scroll Times:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollCount']" type="number" required></input>
<label>Wait time after scrolling (in seconds):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollWaitTime']" type="number" required></input>
<label>Maximum file download wait time (in seconds):</label>
<input spellcheck="false" onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['downloadWaitTime']" type="number" required></input>
<label>Way to handle pop-up windows after clicking:</label>
<p><select v-model='nowNode["parameters"]["alertHandleType"]' class="form-control">
<option :value = 0>No pop-up window</option>
<option :value = 1>Accept pop-up window</option>
<option :value = 2>Reject pop-up window (only for Confirm pop-up window)</option>
</select></p>
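The tooltip text in these hunks describes how a relative XPath is combined with the loop item's XPath. A minimal sketch of that rule follows, using the example from the tooltip. `finalXPath` is a hypothetical helper and is not the project's `getFinalXPath`, whose implementation is not shown in this diff.

```javascript
// Illustrative helper (not the project's getFinalXPath): when the XPath is
// marked relative to the loop item, append it to the loop item's XPath;
// otherwise use it unchanged.
const finalXPath = (loopItemXPath, xpath, relative) =>
  relative ? loopItemXPath + xpath : xpath;

// Values taken from the tooltip example.
const loopItem = "/html/body/div[1]";
const input = "/*[@id='tab-customer']";

console.log(finalXPath(loopItem, input, true));
console.log(finalXPath(loopItem, input, false));
```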
@ -237,6 +242,9 @@
<p>XPATH (Field["FieldName"] and eval("your code") can be used in any XPATHS): <span style="font-size: 30px!important;" title="Relative XPATH writing: start with /, e.g. the loop item XPATH is /html/body/div[1], your input is /*[@id='tab-customer'], then the final addressed xpath is: /html/body/div[1]/*[@id='tab-customer']"></span></p>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='params.parameters[paraIndex]["relativeXPath"]' placeholder="If you want to write the XPath relative to the current element in the loop, you can write as *../div[1] which matches the first div child element of the parent of the current element in the loop."></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(params.parameters[paraIndex]['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">Click here to view other equivalent XPath expressions</button></p>
<label>Final XPath of this field when the task is running:</label>
<textarea spellcheck="false" onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite">{{getFinalXPath(params.parameters[paraIndex]['relativeXPath'], params.parameters[paraIndex]['relative'])}}</textarea>
<div style="margin-top: 10px"><a href="#" v-on:mousedown="trailParam(paraIndex)" style="text-decoration: none">Trial Run (only tests the first matched element)</a></div>
<p style="margin-top: 10px">
<a class="btn btn-primary" data-toggle="collapse" href="#elementAdvanced" role="button" aria-expanded="false" aria-controls="collapseExample">
Click here to expand/collapse advanced operations
@ -244,7 +252,6 @@
</p>
<div :class="{collapse: true, 'show': params.parameters[paraIndex]['beforeJS'].length!=0 || params.parameters[paraIndex]['afterJS'].length!=0}" id="elementAdvanced">
<div>
<div><a href="#" v-on:mousedown="trailParam(paraIndex)" style="text-decoration: none">Trial Run</a></div>
<label>Execute a JavaScript script <strong>before</strong> extracting data from this element: </label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2"
placeholder='The element should be represented by arguments[0]. Here is an example JavaScript code: arguments[0].innerText = arguments[0].innerText.replace("United States","US"). This code replaces occurrences of "United States" with "US" in the text of the element. Subsequently, when extracting data, you will obtain the replaced value.' v-model='params.parameters[paraIndex]["beforeJS"]'></textarea>
@ -256,21 +263,6 @@
<input spellcheck=false onkeydown="inputDelete(event)" required class="form-control" type="number" v-model.number='params.parameters[paraIndex]["afterJSWaitTime"]'></input>
</div>
</div>
<label>Parameter type conversion (for Excel and Database):</label>
<select v-model='params.parameters[paraIndex]["paraType"]' class="form-control">
<option value = "text">Text (for single values estimated to exceed 10,000 in length, please choose Large Text)</option>
<option value = "int">Integer (up to 9 digits)</option>
<option value = "double">Floating Number (Decimal)</option>
<option value = "mediumText">Large Text (single value length exceeding 10,000 but less than 1,000,000)</option>
<option value = "datetime">Date Time</option>
<option value = "date">Date</option>
<option value = "time">Time</option>
<option value = "varchar">Small Text (single value length less than 50)</option>
<option value = "longText">Extra Large Text (single value length exceeding 1,000,000)</option>
<option value = "bigInt">Large Integer (more than 9 digits)</option>
</select>
<label>Default value when this element cannot be found:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["default"]'></input>
<label>Extract Type</label>
<select v-model='params.parameters[paraIndex]["contentType"]' class="form-control">
<option :value = 0>Text (include child element)</option>
@ -280,6 +272,7 @@
<option :value = 4>Background Image Address</option>
<option :value = 5>Webpage URL</option>
<option :value = 6>Webpage Title</option>
<option :value = 15>Constant String</option>
<option :value = 7>Element Screenshot</option>
<option :value = 8>OCR Results</option>
<option :value = 14>Properties of elements</option>
@ -289,7 +282,11 @@
<option :value = 10>Selected value of the current select box</option>
<option :value = 11>Selected text of the current select box</option>
</select>
<div v-if='params.parameters[paraIndex]["contentType"] == 14'>
<div v-if='params.parameters[paraIndex]["contentType"] == 15'>
<label>Constant String:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["JS"]' placeholder="This field type is usually used for remarks"></input>
</div>
<div v-else-if='params.parameters[paraIndex]["contentType"] == 14'>
<label>Attribute Name:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["JS"]' placeholder="Attribute names, such as href to represent the href attribute of the current element, that is, the link address."></input>
</div>
@ -320,6 +317,21 @@
<!-- <option :value = 0>Normal extraction</option>-->
<!-- <option :value = 1>OCR extraction</option>-->
<!-- </select>-->
<label>Parameter type conversion (for Excel and Database):</label>
<select v-model='params.parameters[paraIndex]["paraType"]' class="form-control">
<option value = "text">Text (for single values estimated to exceed 10,000 in length, please choose Large Text)</option>
<option value = "int">Integer (up to 9 digits)</option>
<option value = "double">Floating Number (Decimal)</option>
<option value = "mediumText">Large Text (single value length exceeding 10,000 but less than 1,000,000)</option>
<option value = "datetime">Date Time</option>
<option value = "date">Date</option>
<option value = "time">Time</option>
<option value = "varchar">Small Text (single value length less than 50)</option>
<option value = "longText">Extra Large Text (single value length exceeding 1,000,000)</option>
<option value = "bigInt">Large Integer (more than 9 digits)</option>
</select>
<label>Default value when this element cannot be found:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["default"]'></input>
<label style="margin-top: 15px">Wrap content to new line (set when collecting long articles and wanting to wrap):</label>
<select v-model='params.parameters[paraIndex]["splitLine"]' class="form-control">
<option :value="0">No</option>
@ -388,8 +400,11 @@
<option :value = 5>Run Python code on current environment (the "exec" operation)</option>
<option :value = 6>Get value of a Python expression (the "eval" operation)</option>
<option :value = 7>Pause program execution (such as when the captcha box appears)</option>
<option :value = 12>Exit Program</option>
<option :value = 8>Refresh page</option>
<option :value = 9>Send Email</option>
<option :value = 10>Clear all field values</option>
<option :value = 11>Generate new data row</option>
</select>
<div v-if='nowNode["parameters"]["codeMode"] < 3 || nowNode["parameters"]["codeMode"] >= 5 && nowNode["parameters"]["codeMode"] <=6'>
<label>Code (Use Field["FieldName"] to input the latest value of a field): </label>
@ -480,7 +495,12 @@ Please note that this feature does not support assigning values to variables. In
<label>Email content:</label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["emailConfig"]["content"]' placeholder="Write the email content here"></textarea>
</div>
<div v-if='nowNode["parameters"]["codeMode"] == 10'>
<label>This action clears all field values, for example before a web scraping task starts.</label>
</div>
<div v-if='nowNode["parameters"]["codeMode"] == 11'>
<label>This action generates a new row of data. For example, a task can be designed to hold off generating rows and then generate one new row once all fields have been extracted.</label>
</div>
</div>
<div class="elements" v-if="nodeType==6">
@ -519,6 +539,8 @@ Please note that this feature does not support assigning values to variables. In
<label>XPath: </label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["xpath"]'></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(nowNode['parameters']['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">Click here to view other equivalent XPath expressions</button></p>
<label>The final XPath of this element when the task is running:</label>
<textarea v-model="getFinalXPath(nowNode['parameters']['xpath'], useLoop)" spellcheck="false" onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite"></textarea>
</div>
@ -558,7 +580,7 @@ Please note that this feature does not support assigning values to variables. In
Loop based on the expression value of Python code. Here are some examples:
1. Return relevant values of the current browser object. Use `self.browser` to refer to the current browser being operated. You can directly use Selenium's API to perform operations, such as `self.browser.find_element(By.CSS_SELECTOR, "body").text=="123"`, which checks whether the current page contains the text "123".
2. Return the value of a custom global variable: `self.myVar`
3. Return the result of a conditional statement: `self.myVar == 1`
3. Return the result of a conditional statement: `self.myVar > 1`
4. Determine whether the value extracted from a certain field equals the value of a certain variable: `self.outputParameters["field name"] == self.myVar`
If the expression returns a value greater than 0 or evaluates to True, the loop continues; otherwise, it stops.
</pre>
@ -700,8 +722,8 @@ If the expression returns a value greater than 0 or evaluates to True, the opera
<input spellcheck=false onkeydown="inputDelete(event)" id="serviceDescription" name="serviceDescription" class="form-control"></input>
<label>Export Data Format (Excel/CSV/TXT/Database):</label>
<select id="outputFormat" class="form-control">
<option value="xlsx">XLSX (Excel file; CSV is recommended when a single cell exceeds 500 characters)</option>
<option value="csv">CSV (Recommended for collecting long articles)</option>
<option value="xlsx">XLSX (Excel file; CSV is recommended when a single cell exceeds 500 characters)</option>
<option value="txt">TXT</option>
<option value="json">JSON</option>
<option value="mysql">MySQL Database (recommended for large amounts of data)</option>
@ -712,6 +734,7 @@ If the expression returns a value greater than 0 or evaluates to True, the opera
<select id="dataWriteMode" name="dataWriteMode" class="form-control">
<option value="1">Append (If the file exists, append to it)</option>
<option value="2">Overwrite (If the file exists, overwrite it)</option>
<option value=3>Rename on Write (renames file if it already exists)</option>
</select>
<!-- <label>Is it an extreme anti-scraping website like Cloudflare (<a href="https://www.bilibili.com/video/BV1Ph4y1E7R9/" target="_blank">Watch Tutorial</a>)?</label>-->
<!-- <select id="cloudflare" name="cloudflare" class="form-control">-->


@ -46,6 +46,7 @@ let app = new Vue({
index: vueData,
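nodeType: 0, // type of the current element
nowNode: null, // temporarily stores the element's node
parentNode: null, // temporarily stores the element's parent node
codeMode: -1, // code mode
loopType: -1, // loop option selected when a loop is clicked
useLoop: false, // records whether the element inside the loop is used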
@ -53,6 +54,7 @@ let app = new Vue({
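params: {"parameters": []}, // parameter list for data extraction
TClass: -1, // condition category of the conditional branch
paraIndex: 0, // index of the current parameter
xpath: "", // xpath of the current operation
XPaths: "", // xpath list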
},
mounted: function () {
@ -62,6 +64,12 @@ let app = new Vue({
// console.log("scroll")
// }, 500);
},
// computed: {
// finalXPath: function () {
// console.log("Call finalXPath")
// return this.getFinalXPath(this.nowNode["parameters"]["xpath"], this.nowNode["parameters"]["useLoop"]);
// }
// },
watch: {
nowArrow: { // perform some operations when the variable changes
deep: true,
@ -91,6 +99,11 @@ let app = new Vue({
updateUI();
}
},
'nowNode.parameters.xpath': { // update the parameter value when the xpath changes
handler: function (newVal, oldVal) {
console.log("xpath changed", newVal, oldVal);
}
},
loopType: { // update the parameter value when the loop type changes
handler: function (newVal, oldVal) {
// this.nowNode["parameters"]["loopType"] = newVal;
@ -106,6 +119,11 @@ let app = new Vue({
this.nowNode["parameters"]["useLoop"] = newVal;
}
},
xpath: {
handler: function (newVal, oldVal) {
this.nowNode["parameters"]["xpath"] = newVal;
}
},
params: {
handler: function (newVal, oldVal) {
this.nowNode["parameters"]["params"] = newVal["parameters"];
@ -123,6 +141,26 @@ let app = new Vue({
}
},
methods: {
getFinalXPath: function (xpath, useLoop) { // get the final xpath
// console.log(xpath, useLoop, this.parentNode);
if (this.parentNode == null || this.parentNode.parameters == null || this.parentNode.parameters.xpath == null) {
return xpath;
} else if (useLoop) {
let parent_xpath = this.parentNode.parameters.xpath;
let final_xpath = "";
final_xpath = parent_xpath + xpath;
if (this.parentNode.parameters.loopType == 2) {
parent_xpath = this.parentNode.parameters.pathList.split("\n");
final_xpath = "";
for (let i = 0; i < parent_xpath.length; i++) {
final_xpath += parent_xpath[i] + xpath + "\n";
}
}
return final_xpath;
} else {
return xpath;
}
},
handleCodeModeChange: function () {
// if (this.codeMode == undefined || this.codeMode == null || this.codeMode == -1) {
// return;
@ -137,7 +175,7 @@ let app = new Vue({
this.nowNode["title"] = LANG("运行操作系统命令", "Run OS Command");
break;
case 2:
this.nowNode["title"] = LANG("执行JavaScript", "Run JavaScript");
this.nowNode["title"] = LANG("循环内元素执行JS", "Run JS in Loop");
break;
case 3:
this.nowNode["title"] = LANG("退出循环", "Exit Loop");
@ -160,6 +198,15 @@ let app = new Vue({
case 9:
this.nowNode["title"] = LANG("发送邮件", "Send Email");
break;
case 10:
this.nowNode["title"] = LANG("清空字段值", "Clear Field Value");
break;
case 11:
this.nowNode["title"] = LANG("生成新行", "Generate New Row");
break;
case 12:
this.nowNode["title"] = LANG("退出程序", "Exit Program");
break;
case -1: // when switching to another operation, do not change the title
break;
default: // default case
@ -433,7 +480,7 @@ function operationChange(e, theNode) {
if (nowNode != null) {
nowNode.style.borderColor = "skyblue";
}
nowNode = theNode
nowNode = theNode;
vueData.nowNodeIndex = actionSequence[theNode.getAttribute("data")];
theNode.style.borderColor = "blue";
handleElement(); // process the element
@ -467,7 +514,7 @@ function elementDblClick(e) {
showInfo(LANG("试运行功能不适用于循环操作,请试运行循环内部的具体操作,如点击元素。", "The trial run function is not applicable to loop operations. Please try to run the specific operations in the loop, such as clicking elements."));
}
} else {
if (nodeType == 5 && (app._data.nowNode["parameters"]["codeMode"] != 0 && app._data.nowNode["parameters"]["codeMode"] != 8)) {
if (nodeType == 5 && (app._data.nowNode["parameters"]["codeMode"] != 0 && app._data.nowNode["parameters"]["codeMode"] != 2 && app._data.nowNode["parameters"]["codeMode"] != 8)) {
showInfo(LANG("试运行自定义操作功能只适用于执行JavaScript和刷新页面操作。", "Trial run of custom actions only applies to the Run JavaScript and Refresh Page operations."));
} else {
trailElement(app._data.nowNode, 1);
@ -505,8 +552,7 @@ function toolBoxKernel(e, param = null) {
// let tarrow = DeepClone(app.$data.nowArrow);
// refresh();
// app._data.nowArrow =tarrow;
}
else if (option == 11) { // copy operation
} else if (option == 11) { // copy operation
if (nowNode == null) {
e.stopPropagation(); // prevent event bubbling
} else if (nowNode.getAttribute("dataType") > 0) {
@ -528,8 +574,7 @@ function toolBoxKernel(e, param = null) {
$("#" + t["id"]).click(); // after copying, click the copied element
e.stopPropagation(); // prevent event bubbling
}
}
else if (option == 10) { // cut operation
} else if (option == 10) { // cut operation
if (nowNode == null) {
e.stopPropagation(); // prevent event bubbling
} else if ($(nowNode).is(".branch")) {
@ -574,8 +619,7 @@ function toolBoxKernel(e, param = null) {
e.stopPropagation(); // prevent event bubbling
}
}
}
else if (option > 0) { // add operation
} else if (option > 0) { // add operation
let l = nodeList.length;
let nt = null;
let nt2 = null;
@ -676,13 +720,13 @@ function toolBoxKernel(e, param = null) {
} else {
$("#" + t["id"]).click();
}
if (e != null)
if (e != null) {
e.stopPropagation(); // prevent event bubbling
}
option = 0;
return t;
}
option = 0;
updateParentNode();
}
$(".options").mousedown(function () {
@ -935,4 +979,4 @@ function inputDelete(e) {
e.stopPropagation(); // pressing Delete inside an input box should work normally
// In Electron, calling showError or confirm freezes the input box, so it is best not to use them
}
}
}
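
The `getFinalXPath` method added in this file can be read in isolation as the sketch below. It is only an illustration of the logic in the hunk above, not the project's code: the function name `finalXPath` and the explicit `parentNode` argument are assumptions made so the snippet runs outside Vue (the real method reads `this.parentNode`), and the sample nodes are made up.

```javascript
// Standalone sketch of the getFinalXPath logic shown in the diff.
// Assumption: parentNode is passed explicitly instead of read from the Vue instance.
function finalXPath(xpath, useLoop, parentNode) {
  const parentParams = parentNode ? parentNode.parameters : null;
  // No usable parent loop, or the element is not relative to the loop: keep the xpath as typed.
  if (parentParams == null || parentParams.xpath == null || !useLoop) {
    return xpath;
  }
  // loopType 2: the parent holds a newline-separated pathList; prefix each entry.
  if (parentParams.loopType == 2) {
    return parentParams.pathList
      .split("\n")
      .map((parentXPath) => parentXPath + xpath + "\n")
      .join("");
  }
  // Otherwise the loop's xpath is simply prepended.
  return parentParams.xpath + xpath;
}

// Hypothetical parent loop node, mirroring the example in the XPath tooltip.
const loopNode = { parameters: { xpath: "/html/body/div[1]", loopType: 1 } };
console.log(finalXPath("/*[@id='tab-customer']", true, loopNode));
```

As in the original method, the `loopType == 2` branch leaves a trailing newline after the last path, which is harmless for the read-only textarea that displays the result.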


@ -89,7 +89,7 @@
</div>
<div>
<label>提示:鼠标移到笑脸可查看提示,在流程图中<b>双击</b>操作可试运行,<b>右键</b>点击操作查看更多选项。</label>
<label>提示:鼠标移到笑脸可查看提示,在流程图中<b>双击</b>操作可<b>试运行</b>(页面完全加载完毕后)<b>右键</b>点击操作查看更多选项。</label>
<label>选项名称</span></label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='list.nl[index.nowNodeIndex]["title"]'></input>
</div>
@ -145,38 +145,11 @@
</div>
<div>
<label>XPath或者用point(10,10)表示点击网页坐标位置(10, 10)以用来点击空白区域推出弹窗对话框文本列表等): <span style="font-size: 30px!important;" title="相对XPATH写法:以/开头如循环项XPATH为/html/body/div[1],您的输入为/*[@id='tab-customer'],则最终寻址的xpath为/html/body/div[1]/*[@id='tab-customer']"></span></label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["xpath"]'></textarea>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='xpath'></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(nowNode['parameters']['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">点此查看其他等价的XPath</button></p>
<label>任务运行时最终定位的本元素XPath</label>
<textarea v-model="getFinalXPath(nowNode['parameters']['xpath'], useLoop)" spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite"></textarea>
</div>
<label>点击后页面加载最长等待时间(秒):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>点击类型:</label>
<select v-model='nowNode["parameters"]["clickWay"]' class="form-control">
<option :value = 0>Selenium点击</option>
<option :value = 1>JavaScript点击</option>
</select>
<label>在新标签页打开超链接:</label>
<select v-model='nowNode["parameters"]["newTab"]' class="form-control">
<option :value = 1></option>
<option :value = 0></option>
</select>
<label>点击后是否向下滚动页面:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option :value = 0>不滚动</option>
<option :value = 1>向下滚动一屏</option>
<option :value = 2>滚动到底部</option>
<option :value = 3>一直滚动直到页面内容无变化(需设置好滚动后的等待时间用于检测页面变化)</option>
</select>
<label>滚动次数(滚动类型设置为<b>不滚动</b><b>一直滚动</b>时请忽略此项):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollCount']" type="number" required></input>
<label>滚动后等待时间(秒):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollWaitTime']" type="number" required></input>
<label>点击元素后如有弹窗出现,弹窗处理方式:</label>
<p><select v-model='nowNode["parameters"]["alertHandleType"]' class="form-control">
<option :value = 0>无弹窗</option>
<option :value = 1>接受弹窗(点击弹窗确定按钮)</option>
<option :value = 2>拒绝弹窗点击弹窗取消按钮仅限Confirm弹框</option>
</select></p>
<p style="margin-top: 10px">
<a class="btn btn-primary" data-toggle="collapse" href="#collapseExample" role="button" aria-expanded="false" aria-controls="collapseExample">
点此展开/折叠自定义操作
@ -195,6 +168,38 @@
<input spellcheck=false onkeydown="inputDelete(event)" required class="form-control" type="number" v-model.number='nowNode["parameters"]["afterJSWaitTime"]'></input>
</div>
</div>
<label>点击后页面加载最长等待时间(秒):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>点击类型(如是否双击):</label>
<select v-model='nowNode["parameters"]["clickWay"]' class="form-control">
<option :value = 0>Selenium点击</option>
<option :value = 1>JavaScript点击</option>
<option :value = 2>双击</option>
</select>
<label>在新标签页打开超链接:</label>
<select v-model='nowNode["parameters"]["newTab"]' class="form-control">
<option :value = 1></option>
<option :value = 0></option>
</select>
<label>点击后是否向下滚动页面:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option :value = 0>不滚动</option>
<option :value = 1>向下滚动一屏</option>
<option :value = 2>滚动到底部</option>
<option :value = 3>一直滚动直到页面内容无变化(需设置好滚动后的等待时间用于检测页面变化)</option>
</select>
<label>滚动次数(滚动类型设置为<b>不滚动</b><b>一直滚动</b>时请忽略此项):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollCount']" type="number" required></input>
<label>滚动后等待时间(秒):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['scrollWaitTime']" type="number" required></input>
<label>文件下载最长等待时间(秒):</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['downloadWaitTime']" type="number" required></input>
<label>点击元素后如有弹窗出现,弹窗处理方式:</label>
<p><select v-model='nowNode["parameters"]["alertHandleType"]' class="form-control">
<option :value = 0>无弹窗</option>
<option :value = 1>接受弹窗(点击弹窗确定按钮)</option>
<option :value = 2>拒绝弹窗点击弹窗取消按钮仅限Confirm弹框</option>
</select></p>
@ -237,6 +242,9 @@
<p>XPath所有XPath内均可用Field["字段名"]表示参数值用eval("你的代码")来替换成自定义的变量): <span style="font-size: 30px!important;" title="相对XPATH写法:以/开头如循环项XPATH为/html/body/div[1],您的输入为/*[@id='tab-customer'],则最终寻址的xpath为/html/body/div[1]/*[@id='tab-customer']"></span></p>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='params.parameters[paraIndex]["relativeXPath"]' placeholder="如果要写相对循环内的xpath可以写如*../div[1]即匹配当前循环元素的父元素的第一个div子元素"></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(params.parameters[paraIndex]['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">点此查看其他等价的XPath</button></p>
<label>任务运行时最终定位的本字段XPath</label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite">{{getFinalXPath(params.parameters[paraIndex]['relativeXPath'], params.parameters[paraIndex]['relative'])}}</textarea>
<div style="margin-top: 10px"><a href="#" v-on:mousedown="trailParam(paraIndex)" style="text-decoration: none">试运行(只测试第一个匹配到的元素)</a></div>
<p style="margin-top: 10px">
<a class="btn btn-primary" data-toggle="collapse" href="#elementAdvanced" role="button" aria-expanded="false" aria-controls="collapseExample">
点此展开/折叠自定义操作
@ -244,7 +252,6 @@
</p>
<div :class="{collapse: true, 'show': params.parameters[paraIndex]['beforeJS'].length!=0 || params.parameters[paraIndex]['afterJS'].length!=0}" id="elementAdvanced">
<div>
<div><a href="#" v-on:mousedown="trailParam(paraIndex)" style="text-decoration: none">试运行</a></div>
<label>提取该元素数据<strong></strong>针对该元素执行一段JavaScript脚本 </label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2"
placeholder='该元素用arguments[0]来表示示例JS代码arguments[0].innerText = arguments[0].innerText.replace("上海","Shanghai")即实现了将元素文字中的“上海”替换成”Shanghai“的功能然后后续如提取数据时就会提取到替换后的值。' v-model='params.parameters[paraIndex]["beforeJS"]'></textarea>
@ -256,21 +263,6 @@
<input spellcheck=false onkeydown="inputDelete(event)" required class="form-control" type="number" v-model.number='params.parameters[paraIndex]["afterJSWaitTime"]'></input>
</div>
</div>
<label>参数类型转换为用于Excel和数据库</label>
<select v-model='params.parameters[paraIndex]["paraType"]' class="form-control">
<option value = "text">文本单个值长度预估超过1万请选择大文本</option>
<option value = "int">整数位数在9位以内</option>
<option value = "double">浮点数(小数)</option>
<option value = "mediumText">大文本单个值长度超过1万低于100万</option>
<option value = "datetime">日期时间</option>
<option value = "date">日期</option>
<option value = "time">时间</option>
<option value = "varchar">小文本单个值长度小于50</option>
<option value = "longText">超大文本单个值长度超过100万</option>
<option value = "bigInt">大整数位数超过9位</option>
</select>
<label>元素找不到时的值:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["default"]'></input>
<label>采集内容类型</label>
<select v-model='params.parameters[paraIndex]["contentType"]' class="form-control">
<option :value = 0>文本(包括子元素)</option>
@ -280,6 +272,7 @@
<option :value = 4>背景图片地址</option>
<option :value = 5>页面网址</option>
<option :value = 6>页面标题</option>
<option :value = 15>常量字符串</option>
<option :value = 7>元素截图</option>
<option :value = 8>OCR识别文字</option>
<option :value = 14>元素的属性值</option>
@ -289,7 +282,11 @@
<option :value = 10>当前选择框选中的选项值</option>
<option :value = 11>当前选择框选中的选项文本</option>
</select>
<div v-if='params.parameters[paraIndex]["contentType"] == 14'>
<div v-if='params.parameters[paraIndex]["contentType"] == 15'>
<label>常量字符串:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["JS"]' placeholder="此字段类型通常作为备注使用"></input>
</div>
<div v-else-if='params.parameters[paraIndex]["contentType"] == 14'>
<label>属性名称:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["JS"]' placeholder="属性名称如class表示当前元素的class属性值即元素所拥有的类名。"></input>
</div>
@ -320,6 +317,21 @@
<!-- <option :value = 0>普通提取</option>-->
<!-- <option :value = 1>OCR提取</option>-->
<!-- </select>-->
<label>参数类型转换为用于Excel和数据库</label>
<select v-model='params.parameters[paraIndex]["paraType"]' class="form-control">
<option value = "text">文本单个值长度预估超过1万请选择大文本</option>
<option value = "int">整数位数在9位以内</option>
<option value = "double">浮点数(小数)</option>
<option value = "mediumText">大文本单个值长度超过1万低于100万</option>
<option value = "datetime">日期时间</option>
<option value = "date">日期</option>
<option value = "time">时间</option>
<option value = "varchar">小文本单个值长度小于50</option>
<option value = "longText">超大文本单个值长度超过100万</option>
<option value = "bigInt">大整数位数超过9位</option>
</select>
<label>元素找不到时的值:</label>
<input spellcheck=false onkeydown="inputDelete(event)" class="form-control" v-model='params.parameters[paraIndex]["default"]'></input>
<label style="margin-top: 15px">是否将内容换行(长文章采集想要换行时设置):</label>
<select v-model='params.parameters[paraIndex]["splitLine"]' class="form-control">
<option :value = 0></option>
@ -388,8 +400,11 @@
<option :value = 5>在执行环境下运行Python代码exec操作</option>
<option :value = 6>在执行环境下获得Python表达式值eval操作</option>
<option :value = 7>暂停程序执行(如检测到验证码框出现时暂停执行)</option>
<option :value = 12>退出程序</option>
<option :value = 8>刷新页面</option>
<option :value = 9>发送邮件</option>
<option :value = 10>清空所有字段值</option>
<option :value = 11>生成新数据行</option>
</select>
<div v-if='nowNode["parameters"]["codeMode"] < 3 || nowNode["parameters"]["codeMode"] >= 5 && nowNode["parameters"]["codeMode"] <=6'>
<label>代码/脚本内容用Field["字段名"]来输入某字段/自定义操作的最新提取/返回值): </label>
@ -480,7 +495,12 @@ print(emotlib.emoji()) # 使用其中的函数。
<label>邮件内容:</label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["emailConfig"]["content"]' placeholder="这里写邮件内容"></textarea>
</div>
<div v-if='nowNode["parameters"]["codeMode"] == 10'>
<label>此操作可以清空所有字段值,如用于爬虫任务开始前清空所有字段值。</label>
</div>
<div v-if='nowNode["parameters"]["codeMode"] == 11'>
<label>此操作可以生成新数据行,如用于爬虫任务设计时暂不生成数据行,等所有字段提取结束后统一生成新数据行。</label>
</div>
</div>
<div class="elements" v-if="nodeType==6">
@ -517,8 +537,10 @@ print(emotlib.emoji()) # 使用其中的函数。
</div>
<div>
<label>XPath </label>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["xpath"]'></textarea>
<textarea spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='xpath'></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(nowNode['parameters']['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">点此查看其他等价的XPath</button></p>
<label>任务运行时最终定位的本元素XPath</label>
<textarea v-model="getFinalXPath(nowNode['parameters']['xpath'], useLoop)" spellcheck=false onkeydown="inputDelete(event)" class="form-control" rows="2" readonly style="background:ghostwhite"></textarea>
</div>
@ -557,8 +579,8 @@ print(emotlib.emoji()) # 使用其中的函数。
<pre class="form-control" style="background: white; margin-top: 20px; min-height: 220px; font-size: 15px!important; word-wrap: break-word; white-space: pre-wrap; border-radius: 0; border: 1px solid" disabled v-if='parseInt(loopType) == 7'>请先阅读此说明再在上方输入框不是本框写具体代码如果要执行大量代码可以直接写outside:myCode.py这样程序就会读取并执行EasySpider目录下的myCode.py中的代码。
根据Python代码的表达式值来决定是否循环示例
1. 返回当前浏览器对象的相关值用self.browser表示当前操作的浏览器可直接用selenium的API进行操作如self.browser.find_element(By.CSS_SELECTOR, "body").text=="123"表示判断当前页面是否为123这个文本。
2. 返回自定义全局变量的值self.myVar,如果
3. 返回条件判断的值self.myVar == 1
2. 返回自定义全局变量的值self.myVar
3. 返回条件判断的值self.myVar > 1
4. 判断某个字段提取的值是否等于某个变量的值self.outputParameters["字段名"] == self.myVar
以上表达式返回值大于0或为真则继续循环否则停止循环。
</pre>
@ -700,8 +722,8 @@ print(emotlib.emoji()) # 使用其中的函数。
<input spellcheck=false onkeydown="inputDelete(event)" id="serviceDescription" name="serviceDescription" class="form-control"></input>
<label>导出数据格式Excel/CSV/TXT/数据库,<a href="https://www.bilibili.com/video/BV1os4y1679S/" target="_blank">查看MySQL操作教程</a></label>
<select id="outputFormat" class="form-control">
<option value = "xlsx">XLSX即EXCEL文件建议单个单元格长度超过500时使用CSV格式存储</option>
<option value = "csv">CSV采集长文章推荐使用此格式</option>
<option value = "xlsx">XLSX即EXCEL文件建议单个单元格长度超过500时使用CSV格式存储</option>
<option value = "txt">TXT</option>
<option value = "json">JSON</option>
<option value = "mysql">MySQL数据库大量数据推荐使用</option>
@ -712,6 +734,7 @@ print(emotlib.emoji()) # 使用其中的函数。
<select id="dataWriteMode" name="dataWriteMode" class="form-control">
<option value=1>追加写入(如果文件已存在则在原文件后面追加)</option>
<option value=2>覆盖写入(如果文件已存在则覆盖原文件)</option>
<option value=3>重命名写入(如果文件已存在则重命名文件)</option>
</select>
<!-- <label>是否为Cloudflare等极端反爬网站<a href="https://www.bilibili.com/video/BV1Ph4y1E7R9/" target="_blank">查看Cloudflare设计和执行教程</a></label>-->
<!-- <select id="cloudflare" name="cloudflare" class="form-control">-->

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long


@ -31,18 +31,20 @@
.ID {
width: 10%;
}
.excel th,.excel td {
.excel th, .excel td {
text-align: center;
font-size: 13px;
padding: 10px;
max-width: 200px!important;
max-width: 200px !important;
}
.tip {
position: fixed;
width:100%;
width: 100%;
display: none;
z-index: 1000;
top:0;
top: 0;
}
</style>
</head>
@ -59,31 +61,40 @@
提示任务执行ID对应配置文件已更新您可使用任务ID<span id="newID_ZH"></span>来执行加载了新配置的任务。
</div>
<div id="tipID_EN" class="alert alert-info alert-dismissible fade show tip">
Hint: The configuration file for this task execution ID has been updated; you can use the task ID <span id="newID_EN"></span> to execute the task with the new configuration.</div>
Hint: The configuration file for this task execution ID has been updated; you can use the task ID
<span id="newID_EN"></span> to execute the task with the new configuration.
</div>
</div>
</div>
<div class="row" style="margin-top: 40px;">
<div class="col-md-7" id="taskInfo" style="margin:0 auto" v-if="show">
<div id="tipCustom" class="alert alert-success alert-dismissible fade show" style="display: none; z-index: 1000">
{{tip | lang}}</div>
<div class="col-md-8" id="taskInfo" style="margin:0 auto" v-if="show">
<div id="tipCustom" class="alert alert-success alert-dismissible fade show"
style="display: none; z-index: 1000">
{{tip | lang}}
</div>
<div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="myModalLabel" aria-hidden="true">
<div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="myModalLabel"
aria-hidden="true">
<div class="modal-dialog modal-lg">
<div class="modal-content">
<div class="modal-header">
<h4 class="modal-title" id="myModalLabel">{{"Task Execution Instruction~执行任务说明" | lang}}</h4>
<h4 class="modal-title"
id="myModalLabel">{{"Task Execution Instruction~执行任务说明" | lang}}</h4>
<button type="button" class="close" data-dismiss="modal" aria-hidden="true">&times;</button>
</div>
<div class="modal-body">
<input onkeydown="inputDelete(event)" id="serviceId" type="hidden" name="serviceId" value="-1"></input>
<input onkeydown="inputDelete(event)" id="url" type="hidden" name="url" value="about:blank"></input>
<label><a href="https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction" target="_blank">{{`Click Here~点击这里` | lang}}</a> {{`to see the argument instructions.~这里查看参数配置说明。` | lang}}</label>
<label v-if="OS=='darwin'">{{`对于MacOS系统EasySpider提供了两个不同的执行程序分别为easyspider_executestage和easyspider_executestage_full前者执行时加载速度较快并提供了除OCR识别和数据去重以外的全部功能后者则提供了包括OCR识别和数据去重在内的全部功能但运行时加载速度较慢需要等待2-10分钟才能执行程序请根据自己的需求选择执行哪个程序。~On macOS, EasySpider provides two execution programs, 'easyspider_executestage' and 'easyspider_executestage_full'. The former loads faster and provides all functions except OCR recognition and data deduplication; the latter provides all functions, including OCR recognition and data deduplication, but loads more slowly and needs 2-10 minutes before the program starts executing. Please choose the program that fits your needs.` | lang}}</label>
<label>{{ `Please open a terminal (For Windows, please use PowerShell instead of CMD), go to EasySpider's folder, and then copy (Command/Ctrl + c) the following command to run the task (EasySpider cannot quit when executing command, unless --read_type is set to "local"):~请在EasySpider目录下打开命令行工具Terminal Windows请使用PowerShell而不是CMD然后复制Command/Ctrl + c和运行以下命令以执行任务执行命令时不能退出EasySpider除非将--read_type设置为local` | lang }}</label>
<input onkeydown="inputDelete(event)" id="serviceId" type="hidden" name="serviceId"
value="-1"></input>
<input onkeydown="inputDelete(event)" id="url" type="hidden" name="url"
value="about:blank"></input>
<label><a href="https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction"
target="_blank">{{`Click Here~点击这里` | lang}}</a> {{`to see the argument instructions.~这里查看参数配置说明。` | lang}}</label>
<label v-if="OS=='MacOS'">{{`对于MacOS系统EasySpider提供了两个不同的执行程序分别为easyspider_executestage和easyspider_executestage_full前者执行时加载速度较快并提供了除OCR识别和数据去重以外的全部功能后者则提供了包括OCR识别和数据去重在内的全部功能但运行时加载速度较慢需要等待2-10分钟才能执行程序请根据自己的需求选择执行哪个程序。~On macOS, EasySpider provides two execution programs, 'easyspider_executestage' and 'easyspider_executestage_full'. The former loads faster and provides all functions except OCR recognition and data deduplication; the latter provides all functions, including OCR recognition and data deduplication, but loads more slowly and needs 2-10 minutes before the program starts executing. Please choose the program that fits your needs.` | lang}}</label>
<label>{{ `Please open a terminal (For Windows, please use PowerShell instead of CMD), go to EasySpider's folder, and then copy (Command/Ctrl + c) the following command to run the task (EasySpider can quit when executing command for ease of timed execution, and you can set --read_type to "remote" for remote execution):~请在EasySpider目录下打开命令行工具Terminal Windows请使用PowerShell而不是CMD然后复制Command/Ctrl + c和运行以下命令以执行任务执行命令时可以退出EasySpider以方便定时执行如需要远程调用则需要将--read_type设置为remote并设置远程地址` | lang }}</label>
<textarea class="form-control" style="height:150px">cd {{easyspider_location}}
{{command}} --config_folder "{{config_folder}}" --headless 0 --read_type remote --config_file_name config.json --saved_file_name </textarea>
{{command}} --config_folder "{{config_folder}}" --headless 0 --read_type local --config_file_name config.json --saved_file_name </textarea>
</div>
<!-- <div class="modal-footer">
<button type="button" id="saveAsButton" class="btn btn-outline-primary">另存为</button>
@ -94,34 +105,38 @@
</div>
</div>
<div class="modal fade" id="excelModal" tabindex="-1" role="dialog" aria-labelledby="excelModalLabel" aria-hidden="true">
<div class="modal fade" id="excelModal" tabindex="-1" role="dialog" aria-labelledby="excelModalLabel"
aria-hidden="true">
<div class="modal-dialog modal-lg">
<div class="modal-content">
<div class="modal-header">
<h4 class="modal-title" id="excelModalLabel">{{"Read from Excel~从Excel文件读取输入参数" | lang}}</h4>
<h4 class="modal-title"
id="excelModalLabel">{{"Read from Excel~从Excel文件读取输入参数" | lang}}</h4>
<button type="button" class="close" data-dismiss="modal" aria-hidden="true">&times;</button>
</div>
<div class="modal-body">
<!-- <form action="/upload" method="post" enctype="multipart/form-data">-->
<!-- <form action="/upload" method="post" enctype="multipart/form-data">-->
<div>
<div class="form-group" style="margin-bottom: 10px">
<label>{{"Please select an Excel file (.xlsx)~请选择一个Excel文件.xlsx" | lang}}</label>
<input type="file" class="form-control-file" id="excelFile" name="file">
<label style="display: block; margin-top:10px;margin-bottom: 0">{{fileUploadStatus | lang}}</label>
</div>
<div class="form-group" style="margin-bottom: 10px">
<label>{{"Please select an Excel file (.xlsx)~请选择一个Excel文件.xlsx" | lang}}</label>
<input type="file" class="form-control-file" id="excelFile" name="file">
<label style="display: block; margin-top:10px;margin-bottom: 0">{{fileUploadStatus | lang}}</label>
</div>
<button @click="submitFile" class="btn btn-primary" style="min-width: 100px;margin-bottom:10px">{{"Upload~上传" | lang}}</button>
<button @click="submitFile" class="btn btn-primary"
style="min-width: 100px;margin-bottom:10px">{{"Upload~上传" | lang}}
</button>
<!-- </form>-->
<!-- </form>-->
</div>
<label style="margin:10px 0">{{"Please design an Excel file (.xlsx) according to the following format and upload it above.~请按照以下格式设计一个Excel文件.xlsx并在上方上传" | lang}}</label>
<label style="margin:10px 0">{{"Please design an Excel file (.xlsx) according to the following format and upload it above.~请按照以下格式设计一个Excel文件.xlsx并在上方上传" | lang}}</label>
<table class="table table-bordered excel" style="text-align: center; font-size: 13px">
<thead>
<tr>
<th scope="col">{{"Invoke Name 1~调用名称1" | lang}}</th>
<th scope="col">{{"Invoke Name 2~调用名称2" | lang}}</th>
<th scope="col">...</th>
</tr>
<tr>
<th scope="col">{{"Invoke Name 1~调用名称1" | lang}}</th>
<th scope="col">{{"Invoke Name 2~调用名称2" | lang}}</th>
<th scope="col">...</th>
</tr>
</thead>
<tbody>
<tr>
@ -145,15 +160,17 @@
<label>{{"You can just put part of the arguments in the Excel file, and the values of the rest of the arguments will be set to default. Example table for this task is:~您可以只在Excel文件中放入部分参数其余参数值将被设置为默认值。一个针对此任务的表格示例为" | lang}}</label>
<table class="table table-bordered excel">
<thead>
<tr>
<th v-for="i in Math.min(3, task.inputParameters.length)" v-if="task.inputParameters.length>0">
{{task.inputParameters[i-1]["name"]}}
</th>
</tr>
<tr>
<th v-for="i in Math.min(3, task.inputParameters.length)"
v-if="task.inputParameters.length>0">
{{task.inputParameters[i-1]["name"]}}
</th>
</tr>
</thead>
<tbody>
<tr>
<td v-for="i in Math.min(3, task.inputParameters.length)" v-if="task.inputParameters.length>0">
<td v-for="i in Math.min(3, task.inputParameters.length)"
v-if="task.inputParameters.length>0">
{{getLine(i,0)}}
</td>
</tr>
@ -171,8 +188,10 @@
</tr>
</tbody>
</table>
<label v-if='lang == "zh"' style="width: 95%">对于循环输入文字的参数loopText需要配置索引值的情况即输入文字操作用了相对循环内的索引值您可以在Excel文件中写同一个参数名写多列程序将自动合并。 例如,想要设置'loopText_1'参数值两行,分别为"A~B~C"和"D~E~F"则Excel文件可以这样设置</label>
<label v-else style="width: 95%"> For parameters that need to configure the index value of the loop text (loopText), that is, the input text operation uses the index value relative to the loop, you can write multiple columns with the same parameter name in the Excel file, and the program will automatically merge. For example, if you want to set the parameter value of 'loopText_1' to two rows, which are "A~B~C" and "D~E~F", the Excel file can be set like this:</label>
<label v-if='lang == "zh"'
style="width: 95%">对于循环输入文字的参数loopText需要配置索引值的情况即输入文字操作用了相对循环内的索引值您可以在Excel文件中写同一个参数名写多列程序将自动合并。 例如,想要设置'loopText_1'参数值两行,分别为"A~B~C"和"D~E~F"则Excel文件可以这样设置</label>
<label v-else
style="width: 95%"> For parameters that need to configure the index value of the loop text (loopText), that is, the input text operation uses the index value relative to the loop, you can write multiple columns with the same parameter name in the Excel file, and the program will automatically merge. For example, if you want to set the parameter value of 'loopText_1' to two rows, which are "A~B~C" and "D~E~F", the Excel file can be set like this:</label>
<table class="table table-bordered excel" style="text-align: center; font-size: 13px">
<thead>
<tr>
@ -204,7 +223,8 @@
<nav aria-label="breadcrumb">
<ol class="breadcrumb" style="padding-left:0;background-color: white">
<li @click="gotoHome" class="breadcrumb-item"><a href="#">{{"Home~首页" | lang}}</a></li>
<li @click="gotoInfo" aria-current="page" class="breadcrumb-item" style="color: black"><a href="#">{{"Task Information~任务信息" | lang}}</a></li>
<li @click="gotoInfo" aria-current="page" class="breadcrumb-item" style="color: black"><a
href="#">{{"Task Information~任务信息" | lang}}</a></li>
<li aria-current="page" class="breadcrumb-item active" style="color: black">{{"Task Execution~任务执行"
| lang}}
</li>
@ -215,10 +235,15 @@
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Task Description:~任务描述:" | lang}} {{task["desc"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"API URL (POST):~API 调用网址POST" |
lang}} {{backEndAddressServiceWrapper}}/invokeTask?id={{task["id"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"URL of how to invoke task by API via POST request (Postman or JavaScript): ~通过POST方式进行API调用的示例教程Postman或JS代码" | lang}}<a target="_blank" href="https://github.com/NaiboWang/EasySpider/wiki/API-Invoke-Example">https://github.com/NaiboWang/EasySpider/wiki/API-Invoke-Example</a></p>
<p><button class="btn btn-primary" @click="readFromExcel">{{"Read parameters from Excel file~从Excel文件读取输入参数"
| lang}}
</button></p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"URL of how to invoke task by API via POST request (Postman or JavaScript): ~通过POST方式进行API调用的示例教程Postman或JS代码" | lang}}<a
target="_blank"
href="https://github.com/NaiboWang/EasySpider/wiki/API-Invoke-Example">https://github.com/NaiboWang/EasySpider/wiki/API-Invoke-Example</a>
</p>
<p>
<button class="btn btn-primary" @click="readFromExcel">{{"Read parameters from Excel file~从Excel文件读取输入参数"
| lang}}
</button>
</p>
<p>{{"Please Input Parameters:~请输入参数值:" | lang}}</p>
<form id="form">
<table class="table table-bordered">
@ -237,7 +262,8 @@
<td style="text-align: center; max-width: 250px;white-space: initial">{{task.inputParameters[i-1]["name"]}}</td>
<td style="max-width: 100px; text-align: center">{{task.inputParameters[i-1]["type"]}}</td>
<td><textarea class="form-control"
style="min-height: 50px;min-width: 300px;" v-bind:name="task.inputParameters[i-1]['name']"
style="min-height: 50px;min-width: 300px;"
v-bind:name="task.inputParameters[i-1]['name']"
v-model="task.inputParameters[i-1]['value']"></textarea></td>
</tr>
</tbody>
@ -255,10 +281,12 @@
</div>
</form>
<label style="display: block">{{"Click the button below to execute the task. Press and hold the pause key (default: p) on the keyboard to pause the task. Manual intervention is possible while the task is running, ~点击以下按钮执行任务任务执行过程中可以长按暂停键默认p键暂停任务的执行以便" | lang }}<b>{{"~人工干预," | lang}}</b>{{"such as manually entering a password or captcha: ~如手动输入密码,验证码等。" | lang}}</label>
<button class="btn btn-primary" v-on:click="localExecuteInstant(false)">{{"Directly Run Locally (Clean Mode)~本地直接执行(纯净模式)" |
<button class="btn btn-primary"
v-on:click="localExecuteInstant(false)">{{"Directly Run Locally (Clean Mode)~本地直接执行(纯净模式)" |
lang}}
</button>
<button class="btn btn-primary" v-on:click="localExecuteInstant(true)">{{"Directly Run Locally (Data Mode)~本地直接执行(带用户信息模式)" |
<button class="btn btn-primary"
v-on:click="localExecuteInstant(true)">{{"Directly Run Locally (Data Mode)~本地直接执行(带用户信息模式)" |
lang}}
</button>
<!-- <button style="margin-left: 5px;" v-on:click="remoteExcuteInstant" class="btn btn-primary">Directly Run Remotely</button> -->
@ -267,16 +295,21 @@
</label>
<div style="margin-bottom: 10px;">
<label style="margin-top: 10px;">{{"Execution ID (EID). Execution files are stored in the 'execution_instances' folder. If you enter an EID yourself and set the filename to something other than 'current_time', new content is appended to the existing file for that EID, which enables incremental collection:~执行ID执行文件存放在execution_instances文件夹内提前在下方写好执行ID且文件名不为current_time时可以追加文件内容以实现增量采集" | lang}}</label>
<input class="form-control" v-model="ID" :placeholder="LANG('如果已在此处写/生成了ID号则点击执行或获得ID按钮后任务ID将保持不变且原任务文件将会被新配置覆盖','If already have ID here, the task ID will remain unchanged and the original task file will be overwritten by the new configuration after click buttons')"></input>
<input class="form-control" v-model="ID"
:placeholder="LANG('如果已在此处写/生成了ID号则点击执行或获得ID按钮后任务ID将保持不变且原任务文件将会被新配置覆盖','If already have ID here, the task ID will remain unchanged and the original task file will be overwritten by the new configuration after click buttons')"></input>
<p></p>
<!-- <p>提示点击下方按钮获得任务ID然后根据此ID进行服务执行也可自己POST调用接口得到ID具体参照POST调用文档。</p> -->
<p>{{"Hint: Click the \"Get Execution ID\" button at the bottom to get the task ID, and click the \"Execute task by commandline\" button at the back to get the prompt command on how to run this task using the command line.~提示点击下方“获得任务执行ID”按钮得到任务ID点击后面的“使用命令行执行任务”按钮获得如何使用命令行运行任务的提示命令。" | lang}}</p>
<button class="btn btn-primary" href="javascript:void(0)" v-on:click="invokeTask">{{"Get Execution ID~获得任务执行ID" |
lang}}</button>
<button class="btn btn-primary" style="margin-left: 8px;" v-on:click="localExecute(false)">{{"Execute task by commandline (Clean Mode)~使用命令行执行任务(纯净模式)"
<button class="btn btn-primary" href="javascript:void(0)"
v-on:click="invokeTask">{{"Get Execution ID~获得任务执行ID" |
lang}}
</button>
<button class="btn btn-primary" style="margin-left: 8px;"
v-on:click="localExecute(false)">{{"Execute task by commandline (Clean Mode)~使用命令行执行任务(纯净模式)"
| lang}}
</button>
<button class="btn btn-primary" style="margin-left: 8px;" v-on:click="localExecute(true)">{{"Execute task by commandline (Data Mode)~使用命令行执行任务(带用户信息模式)"
<button class="btn btn-primary" style="margin-left: 8px;"
v-on:click="localExecute(true)">{{"Execute task by commandline (Data Mode)~使用命令行执行任务(带用户信息模式)"
| lang}}
</button>
<!-- <button v-on:click="remoteExcute" style="margin-left: 8px;" class="btn btn-primary">Run remotely</button></div> -->
@ -290,14 +323,14 @@
</html>
<style>
button{
button {
margin-top: 5px;
}
</style>
<script src="global.js"></script>
<script>
var sId = getUrlParam('id');
var app = new Vue({
let sId = getUrlParam('id');
let app = new Vue({
el: '#taskInfo',
data: {
task: {},
@ -306,8 +339,8 @@
lang: getUrlParam("lang"),
type: getUrlParam("type"),
tip: "The parameter values in the Excel file have been successfully imported into the corresponding field text box~Excel文件中的参数值已成功导入到对应字段文本框中",
file:null,
user_data_folder:"",
file: null,
user_data_folder: "",
fileUploadStatus: "Status: Waiting for upload~状态:等待上传",
with_user_data: true,
backEndAddressServiceWrapper: getUrlParam("backEndAddressServiceWrapper"),
@ -315,18 +348,18 @@
config_folder: "",
easyspider_location: "",
mysql_config_path: "",
OS: "win32",
OS: "Windows",
}, mounted() {
$.get(this.backEndAddressServiceWrapper + "/getConfig", function (result) {
app.$data.user_data_folder = result.user_data_folder;
try{
app.$data.mysql_config_path = result.mysql_config_path;
} catch (e) {
app.$data.mysql_config_path = "./mysql_config.json";
}
});
//TODO: translate; clearly document browser-version issues in the readme; replace the logo; prompt users to read the docs before running
},
$.get(this.backEndAddressServiceWrapper + "/getConfig", function (result) {
app.$data.user_data_folder = result.user_data_folder;
try {
app.$data.mysql_config_path = result.mysql_config_path;
} catch (e) {
app.$data.mysql_config_path = "./mysql_config.json";
}
});
//TODO: translate; clearly document browser-version issues in the readme; replace the logo; prompt users to read the docs before running
},
methods: {
LANG: function (zh, en) {
if (this.lang == "zh") {
@ -338,40 +371,40 @@
gotoHome: function () {
let url = "";
if (getUrlParam("lang") == "zh") {
url = "taskList.html?lang=zh&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
url = "taskList.html?lang=zh&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
} else {
url = "taskList.html?lang=en&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
url = "taskList.html?lang=en&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
}
window.location.href = url;
}, gotoInfo: function () {
let url = "";
if (getUrlParam("lang") == "zh") {
url = "taskInfo.html?id="+getUrlParam("id")+"&lang=zh&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
url = "taskInfo.html?id=" + getUrlParam("id") + "&lang=zh&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
} else {
url = "taskInfo.html?id="+getUrlParam("id")+"&lang=en&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
url = "taskInfo.html?id=" + getUrlParam("id") + "&lang=en&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
}
window.location.href = url;
}, getLine: function(i, index){
const value = this.task.inputParameters[i-1]["value"].toString();
}, getLine: function (i, index) {
const value = this.task.inputParameters[i - 1]["value"].toString();
const parts = value.split("\n");
if(parts.length > index){
if (parts.length > index) {
return parts[index];
} else if(this.task.inputParameters[i-1]["name"].indexOf("url") >=0){
} else if (this.task.inputParameters[i - 1]["name"].indexOf("url") >= 0) {
return parts[0];
} else {
return "";
}
},
readFromExcel: function(){
readFromExcel: function () {
$('#excelModal').modal('show');
}, submitFile: function() {
}, submitFile: function () {
let form_data = new FormData();
this.file = $('#excelFile').prop('files')[0];
if(this.file == null || $('#excelFile').val() == ""){
if (this.file == null || $('#excelFile').val() == "") {
this.fileUploadStatus = "Status: Please select a file~状态:请选择文件";
return;
}
if (this.file.name.split('.').pop() !== 'xlsx' ) {
if (this.file.name.split('.').pop() !== 'xlsx') {
this.fileUploadStatus = "Status: Only xlsx files are allowed!~状态只允许上传xlsx文件";
return;
}
@ -379,25 +412,25 @@
form_data.append('file', $('#excelFile').prop('files')[0]);
// console.log(app.$data.backEndAddressServiceWrapper + "/excelUpload",)
$.ajax({
url: app.$data.backEndAddressServiceWrapper.replace("8074","8075") + "/excelUpload",
url: "http://localhost:8075/excelUpload",
type: 'POST',
data: form_data,
processData: false,
contentType: false,
success: function(response) {
success: function (response) {
response = JSON.parse(response);
$('#excelModal').modal('hide');
app.$data.fileUploadStatus = "Status: Upload successfully~状态:上传成功";
$('#excelFile').val("");
let inputParameters = app.$data.task.inputParameters;
inputParameters.forEach(function (item, index) {
if(Object.keys(response).includes(item.name)){
if (Object.keys(response).includes(item.name)) {
let temp = "";
let tempArray = [];
for (let i = 0; i < response[item.name].length; i++) {
for(let key of Object.keys(response)){
if(key.includes(item.name)){
temp += response[key][i] == undefined? "": response[key][i] + "~";
for (let key of Object.keys(response)) {
if (key.includes(item.name)) {
temp += response[key][i] == undefined ? "" : response[key][i] + "~";
}
}
temp = temp.substring(0, temp.length - 1); // drop the trailing "~"
@ -408,11 +441,11 @@
}
});
$("#tipCustom").slideDown(); //提示框
setTimeout(function() {
setTimeout(function () {
$("#tipCustom").slideUp();
}, 3000);
},
error: function(err) {
error: function (err) {
app.$data.fileUploadStatus = "Status: Upload failed~状态:上传失败";
}
});
@ -436,17 +469,17 @@
params: JSON.stringify(param)
}
$.post(app.$data.backEndAddressServiceWrapper + "/invokeTask", message, function (result) {
if(app.$data.ID == result){
if (app.$data.ID == result) {
if (getUrlParam("lang") == "en" || getUrlParam("lang") == "") {
$("#tipID_EN").slideDown(); //提示框
$("#newID_EN").text(result);
setTimeout(function() {
setTimeout(function () {
$("#tipID_EN").slideUp();
}, 5000);
} else {
$("#tipID").slideDown(); //提示框
$("#newID_ZH").text(result);
setTimeout(function() {
setTimeout(function () {
$("#tipID").slideUp();
}, 5000);
}
@ -455,16 +488,16 @@
});
// }
},
localExecute: function (with_user_data=false) {
localExecute: function (with_user_data = false) {
if (this.ID === "") {
if (getUrlParam("lang") == "en" || getUrlParam("lang") == "") {
$("#tipEN").slideDown(); //提示框
setTimeout(function() {
setTimeout(function () {
$("#tipEN").slideUp();
}, 3000);
} else {
$("#tip").slideDown(); //提示框
setTimeout(function() {
setTimeout(function () {
$("#tip").slideUp();
}, 3000);
}
@ -478,24 +511,24 @@
// text = "确定要在本地运行此任务吗?";
// }
// if (confirm(text)) {
let message = { // request to launch the execution program
type: 5, // message type: invoke the execution program
message: {
"id": app.$data.ID,
"user_data_folder": app.$data.with_user_data ? app.$data.user_data_folder : "",
"mysql_config_path": app.$data.mysql_config_path,
"execute_type": 0,
}
};
ws.send(JSON.stringify(message));
changeCommand();
let message = { // request to launch the execution program
type: 5, // message type: invoke the execution program
message: {
"id": app.$data.ID,
"user_data_folder": app.$data.with_user_data ? app.$data.user_data_folder : "",
"mysql_config_path": app.$data.mysql_config_path,
"execute_type": 0,
}
};
ws.send(JSON.stringify(message));
changeCommand();
$('#myModal').modal('show');
// }
},
remoteExecute: function () {
},
localExecuteInstant: function (with_user_data=false) {
localExecuteInstant: function (with_user_data = false) {
let text = "";
// if (getUrlParam("lang") == "en" || getUrlParam("lang") == "") {
// text = "Are you sure to run this task locally now?";
@ -505,34 +538,36 @@
this.with_user_data = with_user_data;
// if (confirm(text)) {
let param = {};
let t = $('#form').serializeArray();
t.forEach(function (item, index) {
param[item.name] = item.value;
});
$.post(app.$data.backEndAddressServiceWrapper + "/invokeTask", {
id: this.task.id,
EID: this.ID,
params: JSON.stringify(param)
}, function (result) {
let message = { // request to launch the execution program
type: 5, // message type: invoke the execution program
message: {
"id": result,
"user_data_folder": app.$data.with_user_data ? app.$data.user_data_folder : "",
"mysql_config_path": app.$data.mysql_config_path,
"execute_type": 1,
}
};
app.$data.ID = result;
ws.send(JSON.stringify(message));
$.get(app.$data.backEndAddressServiceWrapper + "/queryOSVersion", function (OSInfo) {
if(OSInfo.version == 'darwin'){
changeCommand();
$('#myModal').modal('show');
}
});
});
let param = {};
let t = $('#form').serializeArray();
t.forEach(function (item, index) {
param[item.name] = item.value;
});
$.post(app.$data.backEndAddressServiceWrapper + "/invokeTask", {
id: this.task.id,
EID: this.ID,
params: JSON.stringify(param)
}, function (result) {
let message = { // request to launch the execution program
type: 5, // message type: invoke the execution program
message: {
"id": result,
"user_data_folder": app.$data.with_user_data ? app.$data.user_data_folder : "",
"mysql_config_path": app.$data.mysql_config_path,
"execute_type": 1,
}
};
app.$data.ID = result;
ws.send(JSON.stringify(message));
// Detect the OS in the browser (replaces the /queryOSVersion request)
const systemInfo = detectOperatingSystemAndArch();
// $.get(app.$data.backEndAddressServiceWrapper + "/queryOSVersion", function (OSInfo) {
if (systemInfo.OS == 'MacOS') {
changeCommand();
$('#myModal').modal('show');
}
// });
});
// }
},
remoteExecuteInstant: function () {
@ -541,23 +576,25 @@
});
function changeCommand() {
$.get(app.$data.backEndAddressServiceWrapper + "/queryOSVersion", function (OSInfo) {
app.$data.OS = OSInfo.version;
if(OSInfo.version == 'win32' && OSInfo.bit == 'x64'){
// $.get(app.$data.backEndAddressServiceWrapper + "/queryOSVersion", function (OSInfo) {
// app.$data.OS = systemInfo.OS;
const systemInfo = detectOperatingSystemAndArch();
app.$data.OS = systemInfo.OS;
if (systemInfo.OS == 'Windows' && systemInfo.architecture == 'x64') {
app.$data.command = "./EasySpider/resources/app/chrome_win64/easyspider_executestage.exe --ids [" + app.$data.ID.toString() + "] --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
} else if(OSInfo.version == 'win32' && OSInfo.bit == 'ia32'){
} else if (systemInfo.OS == 'Windows' && systemInfo.architecture == 'ia32') {
app.$data.command = "./EasySpider/resources/app/chrome_win32/easyspider_executestage.exe --ids [" + app.$data.ID.toString() + "] --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
} else if(OSInfo.version == 'linux'){
} else if (systemInfo.OS == 'Linux') {
app.$data.command = "./EasySpider/resources/app/chrome_linux64/easyspider_executestage --ids '[" + app.$data.ID.toString() + "]' --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
} else if(OSInfo.version == 'darwin'){
if(getUrlParam("lang") == "zh"){
app.$data.easyspider_location = "你的EasySpider文件夹cd /Users/"+ app.$data.config_folder.split("/")[2] + "/Downloads/EasySpider_MacOS";
} else if (systemInfo.OS == 'MacOS') {
if (getUrlParam("lang") == "zh") {
app.$data.easyspider_location = "你的EasySpider文件夹cd /Users/" + app.$data.config_folder.split("/")[2] + "/Downloads/EasySpider_MacOS";
} else {
app.$data.easyspider_location = "Your EasySpider folder, such as: cd /Users/"+ app.$data.config_folder.split("/")[2] + "/Downloads/EasySpider_MacOS";
app.$data.easyspider_location = "Your EasySpider folder, such as: cd /Users/" + app.$data.config_folder.split("/")[2] + "/Downloads/EasySpider_MacOS";
}
app.$data.command = "./easyspider_executestage --ids '[" + app.$data.ID.toString() + "]' --user_data " + (app.$data.with_user_data ? "1" : "0") + " --server_address " + app.$data.backEndAddressServiceWrapper;
}
});
// });
}
$.get(app.$data.backEndAddressServiceWrapper + "/queryTask?id=" + sId, function (result) {
@ -565,7 +602,7 @@
app.$data.show = true;
});
ws = new WebSocket("ws://localhost:"+getUrlParam("wsport"));
ws = new WebSocket("ws://localhost:" + getUrlParam("wsport"));
ws.onopen = function () {
// WebSocket connected; use send() to transmit data
console.log("Connected");
@ -587,10 +624,10 @@
};
this.send(JSON.stringify(message));
};
ws.onmessage = function(message){
ws.onmessage = function (message) {
message = JSON.parse(message.data);
app.$data.config_folder = message.config_folder.replaceAll("\\","/");
app.$data.easyspider_location = message.easyspider_location.replace("/EasySpider.app/","");
app.$data.config_folder = message.config_folder.replaceAll("\\", "/");
app.$data.easyspider_location = message.easyspider_location.replace("/EasySpider.app/", "");
}
ws.onclose = function () {
// WebSocket closed
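
The task-execution page above obtains an execution ID by POSTing `id`, `EID`, and `params` (a JSON string) to `/invokeTask`. The following is a minimal sketch of assembling the same form-encoded request outside the page. The field names mirror the `$.post` calls in this diff; the helper name `buildInvokeTaskRequest`, the example server address, and the `urlList_0` parameter name are assumptions for illustration, not part of EasySpider.

```javascript
// Hypothetical helper: builds the URL and form body for POST /invokeTask.
// Field names (id, EID, params) mirror the $.post calls shown in this diff.
function buildInvokeTaskRequest(serverAddress, taskId, params, executionId = "") {
    const body = new URLSearchParams({
        id: String(taskId),
        EID: executionId,               // empty string lets the server pick an ID
        params: JSON.stringify(params), // input parameters as one JSON string
    });
    return {
        url: serverAddress + "/invokeTask",
        method: "POST",
        headers: {"Content-Type": "application/x-www-form-urlencoded"},
        body: body.toString(),
    };
}

// Example address and parameter name are placeholders.
const request = buildInvokeTaskRequest("http://localhost:8074", 3, {urlList_0: "https://www.ebay.com"});
console.log(request.url);
console.log(request.body);
```

The returned object can be passed to any HTTP client; the page itself treats the response as the execution ID.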

View File

@ -23,14 +23,14 @@ function DateFormat(datetime) {
function formatDateTime(date) {
const addZero = (num) => (num < 10 ? `0${num}` : num);
let year = date.getFullYear();
let month = addZero(date.getMonth() + 1); // getMonth() returns 0-11, so add 1
let day = addZero(date.getDate());
let hours = addZero(date.getHours());
let minutes = addZero(date.getMinutes());
let seconds = addZero(date.getSeconds());
return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;
}
@ -57,7 +57,7 @@ function detectLang(str) {
if (enCount === cnCount) {
return 2;
} else if (cnCount>=3) {
} else if (cnCount >= 3) {
return 1;
}
return 0;
@ -82,15 +82,23 @@ Vue.filter('lang', function (value) {
}
})
function LANG(zh, en) {
if (window.location.href.indexOf("_CN") != -1) {
return zh;
} else {
return en;
}
}
function isValidMySQLTableName(tableName) {
// Regex: starts with a letter or Chinese character, followed by letters, digits, underscores, or Chinese characters; total length 1 to 64
const pattern = /^[\u4e00-\u9fa5a-zA-Z][\u4e00-\u9fa5a-zA-Z0-9_]{0,63}$/;
return pattern.test(tableName);
}
document.onkeydown = function(e) {
document.onkeydown = function (e) {
let t = false;
try{
try {
t = nowNode;
} catch (e) {
console.log(e);
@ -109,8 +117,8 @@ document.onkeydown = function(e) {
location.reload();
} else if (currKey == 123) {
console.log("打开devtools")
let command = new WebSocket("ws://localhost:"+getUrlParam("wsport"))
command.onopen = function() {
let command = new WebSocket("ws://localhost:" + getUrlParam("wsport"))
command.onopen = function () {
let message = {
type: 6, //消息类型0代表连接操作
};
@ -119,3 +127,27 @@ document.onkeydown = function(e) {
}
}
}
function detectOperatingSystemAndArch() {
const platform = navigator.platform.toLowerCase();
const userAgent = navigator.userAgent.toLowerCase();
let OS = 'Unknown';
let architecture = 'Unknown';
// Determine the operating system type
if (platform.includes('win')) {
OS = 'Windows';
} else if (platform.includes('mac')) {
OS = 'MacOS';
} else if (platform.includes('linux')) {
OS = 'Linux';
}
// Determine the architecture: look for 64-bit markers, otherwise assume ia32
if (userAgent.includes('wow64') || userAgent.includes('win64') || platform.includes('x86_64') || platform.includes('amd64')) {
architecture = 'x64';
} else {
architecture = 'ia32';
}
return { OS, architecture };
}
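
`detectOperatingSystemAndArch` reads `navigator.platform` and `navigator.userAgent` directly, so it only runs in a browser. The sketch below applies the same checks to strings passed as arguments, which makes the logic easy to exercise elsewhere. The function name `detectFromStrings` and the sample user-agent strings are assumptions for illustration.

```javascript
// Hypothetical variant of detectOperatingSystemAndArch that takes the
// platform and user-agent strings as arguments instead of reading navigator.
function detectFromStrings(platform, userAgent) {
    const p = platform.toLowerCase();
    const ua = userAgent.toLowerCase();
    let OS = "Unknown";
    if (p.includes("win")) {
        OS = "Windows";
    } else if (p.includes("mac")) {
        OS = "MacOS";
    } else if (p.includes("linux")) {
        OS = "Linux";
    }
    // Same 64-bit markers as the original; anything else falls back to ia32
    const is64 = ua.includes("wow64") || ua.includes("win64") ||
        p.includes("x86_64") || p.includes("amd64");
    return {OS, architecture: is64 ? "x64" : "ia32"};
}

console.log(detectFromStrings("Win32", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
console.log(detectFromStrings("MacIntel", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"));
```

With these inputs the Windows case yields `x64`, while the Mac case falls through to `ia32` because none of the 64-bit markers appear. The MacOS branch of `changeCommand` does not look at `architecture`, so that fallback does not affect the generated command.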

View File

@ -83,6 +83,18 @@ function changeGetDataParameters(msg, i) {
msg["parameters"][i]["afterJSWaitTime"] = 0; //执行后js等待时间
msg["parameters"][i]["downloadPic"] = 0; //是否下载图片
msg["parameters"][i]["splitLine"] = 0; //是否分割行
try {
let exampleValue = msg["parameters"][i]["exampleValues"][0]["value"];
// Length of the example value with whitespace removed
let len = exampleValue.replace(/\s+/g, "").length;
// For text fields, default to splitting lines once the length exceeds 200
if (len > 200 && msg["parameters"][i]["nodeType"] == 0 && msg["parameters"][i]["contentType"] == 0) {
msg["parameters"][i]["splitLine"] = 1; // example value longer than 200: split lines by default
showInfo(LANG("单个字段示例值长度超过200已自动开启换行功能。", "The length of the example value of a single field exceeds 200, and the line break function has been automatically turned on."), 4000);
}
} catch (e) {
console.log(e);
}
}
@ -188,20 +200,34 @@ function notifyParameterNum(num) {
ws.send(JSON.stringify(message));
}
function trailElement(node, type = 1) {
// type=0代表标记节点type=1代表试运行
let parentNode = nodeList[actionSequence[node["parentId"]]];
if (node.option == 10) { //条件分支的话,传父元素的父元素
function updateParentNode() {
// console.log("updateParentNode")
let parentNode = nodeList[actionSequence[app._data.nowNode["parentId"]]];
if (app._data.nowNode.option == 10) { // for a condition branch, pass the parent's parent
parentNode = nodeList[actionSequence[parentNode["parentId"]]];
}
if (parentNode.option == 10) { // if the parent is a condition branch, pass the parent's grandparent
parentNode = nodeList[actionSequence[parentNode["parentId"]]];
parentNode = nodeList[actionSequence[parentNode["parentId"]]];
}
app._data.parentNode = parentNode;
}
function trailElement(node, type = 1) {
// type=0 marks the node; type=1 is a trial run
// let parentNode = nodeList[actionSequence[node["parentId"]]];
// if (node.option == 10) { //条件分支的话,传父元素的父元素
// parentNode = nodeList[actionSequence[parentNode["parentId"]]];
// }
// if (parentNode.option == 10) { //如果父元素是条件分支,传父元素的爷爷元素
// parentNode = nodeList[actionSequence[parentNode["parentId"]]];
// parentNode = nodeList[actionSequence[parentNode["parentId"]]];
// }
updateParentNode();
let message = {
type: 4, //消息类型4代表试运行事件
from: 1, //0代表从浏览器到流程图1代表从流程图到浏览器
message: {"type": type, "node": JSON.stringify(node), "parentNode": JSON.stringify(parentNode)}
message: {"type": type, "node": JSON.stringify(node), "parentNode": JSON.stringify(app._data.parentNode)}
};
ws.send(JSON.stringify(message));
console.log(node);
@ -214,6 +240,7 @@ function handleElement() {
app._data["nowNode"] = nodeList[vueData.nowNodeIndex];
app._data["nodeType"] = app._data["nowNode"]["option"];
app._data.useLoop = app._data["nowNode"]["parameters"]["useLoop"];
app._data.xpath = app._data["nowNode"]["parameters"]["xpath"];
app._data["codeMode"] = -1; //自定义初始化
if (app._data["nodeType"] == 8) {
app._data.loopType = app._data["nowNode"]["parameters"]["loopType"];
@ -267,6 +294,7 @@ function addParameters(t) {
t["parameters"]["afterJS"] = ""; //执行后执行的js
t["parameters"]["afterJSWaitTime"] = 0; //执行后js等待时间
t["parameters"]["alertHandleType"] = 0; //弹窗处理类型1代表确认2代表取消
t["parameters"]["downloadWaitTime"] = 3600; // download wait time
} else if (t.option == 3) { //提取数据
t["parameters"]["clear"] = 0; //清空其他字段数据
t["parameters"]["newLine"] = 1; //生成新行
@ -359,7 +387,7 @@ function modifyParameters(t, param) {
t["parameters"]["xpath"] = param["xpath"];
t["parameters"]["useLoop"] = param["useLoop"];
t["parameters"]["allXPaths"] = param["allXPaths"];
if(param["type"] == "loopClickEvery"){
if (param["type"] == "loopClickEvery") {
t["parameters"]["newTab"] = 1; //循环点击每个元素,新标签页打开
}
} else if (t.option == 4) { //输入文字事件
@ -389,7 +417,7 @@ function modifyParameters(t, param) {
if (content.length > 15) {
content = content.substring(0, 15) + "...";
content = LANG("", ": ") + content;
} else if(content.length == 0){
} else if (content.length == 0) {
content = LANG("单个元素", " Single Element");
} else {
content = LANG("", ": ") + content;
@ -418,7 +446,7 @@ function modifyParameters(t, param) {
}
}
function showSuccess(msg, time = 4000) {
function showSuccess(msg, time = 1000) {
$("#tip").text(msg);
$("#tip").slideDown(); //提示框
let fadeout = setTimeout(function () {
@ -463,7 +491,7 @@ if (mobile == "true") {
}
let serviceInfo = {
"version": "0.6.0"
"version": "0.6.3"
};
function saveService(type) {
@ -597,7 +625,7 @@ function saveService(type) {
"links": links,
"create_time": $("#create_time").val(),
"update_time": formatDateTime(new Date()),
"version": "0.6.0",
"version": "0.6.3",
"saveThreshold": saveThreshold,
// "cloudflare": cloudflare,
"quitWaitTime": parseInt($("#quitWaitTime").val()),
@ -652,8 +680,8 @@ if (sId != null && sId != -1) //加载任务
if (!("cookies" in node["parameters"])) {
node["parameters"]["cookies"] = "";
}
} else if(node["option"] == 3){ //提取数据
if(node["parameters"]["paras"] != undefined && node["parameters"]["params"] == undefined){
} else if (node["option"] == 3) { //提取数据
if (node["parameters"]["paras"] != undefined && node["parameters"]["params"] == undefined) {
node["parameters"]["params"] = node["parameters"]["paras"];
}
}
@ -681,10 +709,4 @@ if (sId != null && sId != -1) //加载任务
refresh(); //新增任务
}
function LANG(zh, en) {
if (window.location.href.indexOf("_CN") != -1) {
return zh;
} else {
return en;
}
}
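
The split-line default added to `changeGetDataParameters` in this file turns on `splitLine` when a text field's example value exceeds 200 characters after whitespace is removed. Here is a minimal standalone sketch of that check, assuming the same threshold and the same `nodeType == 0` and `contentType == 0` conditions as the diff; the function name `defaultSplitLine` is hypothetical.

```javascript
// Hypothetical standalone form of the check added to changeGetDataParameters.
// nodeType 0 and contentType 0 are the values the diff tests for text fields.
function defaultSplitLine(exampleValue, nodeType, contentType) {
    // Length of the example value once all whitespace is removed
    const length = exampleValue.replace(/\s+/g, "").length;
    return (length > 200 && nodeType == 0 && contentType == 0) ? 1 : 0;
}

console.log(defaultSplitLine("word ".repeat(60), 0, 0)); // 240 non-space characters
console.log(defaultSplitLine("short text", 0, 0));
```

The first call returns 1 and the second returns 0. In the page itself, the original code also shows a notice through `showInfo` when the default is switched on.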

View File

@ -23,7 +23,7 @@
<body>
<div class="row" style="margin-top: 40px" id="newTask">
<div class="col-md-6" style="margin:0 auto;" style="text-align: center;">
<div class="col-md-8" style="margin:0 auto;" style="text-align: center;">
<nav aria-label="breadcrumb">
<ol class="breadcrumb" style="padding-left:0;background-color: white">
<li class="breadcrumb-item" @click="gotoHome"><a href="#">{{"Home~首页" | lang}}</a></li>
@ -33,7 +33,7 @@
<h4 style="text-align: center;">{{"New Task~新任务" | lang}}</h4>
<div class="form-group">
<label>{{"Please Input URL (http or https):~请输入网页网址以http或https开头" | lang}} </label>
<textarea class="form-control" id="links" placeholder="links" style="min-height: 100px;">{{"https://www.ebay.com~https://www.jd.com" | lang}}</textarea>
<textarea class="form-control" id="links" placeholder="links" style="min-height: 100px;">{{"https://www.ebay.com~https://www.baidu.com" | lang}}</textarea>
</div>
<button type="submit" id="send" class="btn btn-primary">{{"Start Design~开始设计" | lang}}</button>
<!-- <div class="form-group" style="margin-top: 10px">-->

View File

@ -4,7 +4,8 @@
<head>
<script src="jquery-3.4.1.min.js"></script>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<meta name="viewport"
content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<script src="vue.js"></script>
<link rel="stylesheet" href="bootstrap/css/bootstrap.css"></link>
@ -18,7 +19,7 @@
td,
th,
tr {
border-color: black!important;
border-color: black !important;
text-overflow: ellipsis;
overflow: hidden;
white-space: nowrap;
@ -34,84 +35,89 @@
<body>
<div class="row" style="margin-top: 40px;">
<div class="row" style="margin-top: 40px;">
<div class="col-md-7" style="margin:0 auto" id="taskInfo" v-if="show">
<nav aria-label="breadcrumb">
<ol class="breadcrumb" style="padding-left:0;background-color: white">
<li class="breadcrumb-item" @click="gotoHome"><a href="#">{{"Home~首页" | lang}}</a></li>
<li class="breadcrumb-item active" aria-current="page" style="color: black">{{"Task Information~任务信息" | lang}}</li>
</ol>
</nav>
<h4 style="text-align: center;">{{"Task Information~任务信息" | lang}}</h4>
<p>{{"Task Name:~任务名称:" | lang}} {{task["name"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Task Description:~任务描述:" | lang}} {{task["desc"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Example URL:~样例网址:" | lang}} {{task["url"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Create Time:~创建时间:" | lang}} {{dateFormat(task["create_time"])}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Update Time:~更新时间:" | lang}} {{dateFormat(task["update_time"])}}</p>
<p>{{"Operations (Please close this window and select 'Design Task' button if you want to modify task with a browser)~操作(如要带浏览器修改任务流程请关闭此窗口并选择设计任务)" | lang}}</p>
<p><a style="margin-top: 5px" href="javascript:void(0)" v-on:click="modifyTask(task['id'],task['url'])" class="btn btn-primary">{{"Modify Task~修改任务" | lang}}</a>
<a style="margin-top: 5px" href="javascript:void(0)" v-on:click="invokeTask(task['id'],task['url'])" class="btn btn-primary">{{"Execute Task~执行任务" | lang}}</a></p>
<p>{{"Input Parameters~输入参数" | lang}}</p>
<table class="table table-bordered">
<tbody>
<tr>
<th style="min-width: 50px; text-align: center">ID</th>
<th style="text-align: center">{{"Parameter Name~参数名称" | lang}}</th>
<th style="text-align: center">{{"Invoke Name~调用名称" | lang}}</th>
<th style="text-align: center">{{"Parameter Type~参数类型" | lang}}</th>
<th>{{"Example Value~示例值" | lang}}</th>
<th>{{"Parameter Description~参数描述" | lang}}</th>
</tr>
<tr v-if="task.inputParameters.length>0" v-for="i in task.inputParameters.length">
<td style="min-width: 50px; text-align: center">{{i}}</td>
<td style="text-align: center;white-space: initial">{{task.inputParameters[i-1]["nodeName"]}}</td>
<td style="text-align: center">{{task.inputParameters[i-1]["name"]}}</td>
<td style="text-align: center">{{task.inputParameters[i-1]["type"]}}</td>
<td>{{task.inputParameters[i-1]["exampleValue"]}}</td>
<td>{{task.inputParameters[i-1]["desc"]}}</td>
</tr>
<tr v-if="task.inputParameters.length==0">
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
</tr>
</tbody>
</table>
<p>{{"Output Parameters~输出参数" | lang}}</p>
<table class="table table-bordered">
<tbody>
<tr>
<th style="min-width: 50px; text-align: center">ID</th>
<th style="text-align: center">{{"Parameter Name~参数名称" | lang}}</th>
<th style="text-align: center">{{"Parameter Type~参数类型" | lang}}</th>
<th>{{"Example Value~示例值" | lang}}</th>
<th>{{"Parameter Description~参数描述" | lang}}</th>
<th style="text-align: center">{{"Record as a field~作为字段保存" | lang}}</th>
</tr>
<tr v-if="task.outputParameters.length>0" v-for="i in task.outputParameters.length">
<td style="min-width: 50px; text-align: center">{{i}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["name"]}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["type"]}}</td>
<td>{{task.outputParameters[i-1]["exampleValue"]}}</td>
<td>{{task.outputParameters[i-1]["desc"]}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["recordASField"] == 1? "Yes~是": "No~否" | lang}}</td>
</tr>
<tr v-if="task.outputParameters.length==0">
<td style="min-width: 50px;text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
</tr>
</tbody>
</table>
</div>
<div class="col-md-8" style="margin:0 auto" id="taskInfo" v-if="show">
<nav aria-label="breadcrumb">
<ol class="breadcrumb" style="padding-left:0;background-color: white">
<li class="breadcrumb-item" @click="gotoHome"><a href="#">{{"Home~首页" | lang}}</a></li>
<li class="breadcrumb-item active" aria-current="page"
style="color: black">{{"Task Information~任务信息" | lang}}
</li>
</ol>
</nav>
<h4 style="text-align: center;">{{"Task Information~任务信息" | lang}}</h4>
<p>{{"Task Name:~任务名称:" | lang}} {{task["name"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Task Description:~任务描述:" | lang}} {{task["desc"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Example URL:~样例网址:" | lang}} {{task["url"]}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Create Time:~创建时间:" | lang}} {{dateFormat(task["create_time"])}}</p>
<p style="word-wrap: break-word;word-break: break-all;overflow: hidden;max-height: 100px;">{{"Update Time:~更新时间:" | lang}} {{dateFormat(task["update_time"])}}</p>
<p>{{"Operations (Please close this window and select 'Design Task' button if you want to modify task with a browser)~操作(如要带浏览器修改任务流程请关闭此窗口并选择设计任务)" | lang}}</p>
<p><a style="margin-top: 5px" href="javascript:void(0)" v-on:click="modifyTask(task['id'],task['url'])"
class="btn btn-primary">{{"Modify Task~修改任务" | lang}}</a>
<a style="margin-top: 5px" href="javascript:void(0)" v-on:click="invokeTask(task['id'],task['url'])"
class="btn btn-primary">{{"Execute Task~执行任务" | lang}}</a></p>
<p>{{"Input Parameters~输入参数" | lang}}</p>
<table class="table table-bordered">
<tbody>
<tr>
<th style="min-width: 50px; text-align: center">ID</th>
<th style="text-align: center">{{"Parameter Name~参数名称" | lang}}</th>
<th style="text-align: center">{{"Invoke Name~调用名称" | lang}}</th>
<th style="text-align: center">{{"Parameter Type~参数类型" | lang}}</th>
<th>{{"Example Value~示例值" | lang}}</th>
<th>{{"Parameter Description~参数描述" | lang}}</th>
</tr>
<tr v-if="task.inputParameters.length>0" v-for="i in task.inputParameters.length">
<td style="min-width: 50px; text-align: center">{{i}}</td>
<td style="text-align: center;white-space: initial">{{task.inputParameters[i-1]["nodeName"]}}</td>
<td style="text-align: center">{{task.inputParameters[i-1]["name"]}}</td>
<td style="text-align: center">{{task.inputParameters[i-1]["type"]}}</td>
<td>{{task.inputParameters[i-1]["exampleValue"]}}</td>
<td>{{task.inputParameters[i-1]["desc"]}}</td>
</tr>
<tr v-if="task.inputParameters.length==0">
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
</tr>
</tbody>
</table>
<p>{{"Output Parameters~输出参数" | lang}}</p>
<table class="table table-bordered">
<tbody>
<tr>
<th style="min-width: 50px; text-align: center">ID</th>
<th style="text-align: center">{{"Parameter Name~参数名称" | lang}}</th>
<th style="text-align: center">{{"Parameter Type~参数类型" | lang}}</th>
<th>{{"Example Value~示例值" | lang}}</th>
<th>{{"Parameter Description~参数描述" | lang}}</th>
<th style="text-align: center">{{"Record as a field~作为字段保存" | lang}}</th>
</tr>
<tr v-if="task.outputParameters.length>0" v-for="i in task.outputParameters.length">
<td style="min-width: 50px; text-align: center">{{i}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["name"]}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["type"]}}</td>
<td>{{task.outputParameters[i-1]["exampleValue"]}}</td>
<td>{{task.outputParameters[i-1]["desc"]}}</td>
<td style="text-align: center">{{task.outputParameters[i-1]["recordASField"] == 1? "Yes~是": "No~否" | lang}}
</td>
</tr>
<tr v-if="task.outputParameters.length==0">
<td style="min-width: 50px;text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
<td style="text-align: center">{{"Empty~无" | lang}}</td>
</tr>
</tbody>
</table>
</div>
</div>
</body>
@ -128,16 +134,16 @@
},
methods: {
dateFormat: DateFormat,
gotoHome:function(){
let url = "";
if(getUrlParam("lang")=="zh"){
url = "taskList.html?lang=zh&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper
} else{
url = "taskList.html?lang=en&type="+getUrlParam("type")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper
}
window.location.href= url;
gotoHome: function () {
let url = "";
if (getUrlParam("lang") == "zh") {
url = "taskList.html?lang=zh&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
} else {
url = "taskList.html?lang=en&type=" + getUrlParam("type") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
}
window.location.href = url;
},
modifyTask: function(id, url) {
modifyTask: function (id, url) {
let message = { //Show the flowchart
type: 1, //Message type: pass the link
message: {
@ -146,21 +152,20 @@
};
// ws.send(JSON.stringify(message));
// window.location.href = url; //Redirect to the link
if(getUrlParam("lang")=="zh"){
window.location.href = "FlowChart_CN.html?type="+getUrlParam("type")+"&lang="+getUrlParam("lang")+"&id=" + id + "&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper
} else{
window.location.href = "FlowChart.html?type="+getUrlParam("type")+"&lang="+getUrlParam("lang")+"&id=" + id + "&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper
if (getUrlParam("lang") == "zh") {
window.location.href = "FlowChart_CN.html?type=" + getUrlParam("type") + "&lang=" + getUrlParam("lang") + "&id=" + id + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
} else {
window.location.href = "FlowChart.html?type=" + getUrlParam("type") + "&lang=" + getUrlParam("lang") + "&id=" + id + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper
}
},
invokeTask: function(id) {
window.location.href = "executeTask.html?type="+getUrlParam("type")+"&lang="+getUrlParam("lang")+"&id=" + id + "&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper;
invokeTask: function (id) {
window.location.href = "executeTask.html?type=" + getUrlParam("type") + "&lang=" + getUrlParam("lang") + "&id=" + id + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper;
},
}
});
$.get(app.$data.backEndAddressServiceWrapper + "/queryTask?id=" + sId, function(result) {
$.get(app.$data.backEndAddressServiceWrapper + "/queryTask?id=" + sId, function (result) {
app.$data.task = result;
app.$data.show = true;
});
</script>
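
Review note on the navigation handlers in this file (`gotoHome`, `modifyTask`, `invokeTask`): each one builds its target URL by string concatenation, so a `backEndAddressServiceWrapper` value that itself contains `:`, `/`, `&` or `=` goes into the query string unescaped. A minimal sketch of one alternative, assuming the standard `URLSearchParams` API is available in the embedded browser. `buildPageUrl` is a hypothetical name, not part of EasySpider, and the port `8084` below is an illustrative value only:

```javascript
// Hypothetical helper, not part of the EasySpider source.
// Builds "<page>?<query>" with every value percent-encoded.
function buildPageUrl(page, params) {
    const query = new URLSearchParams();
    Object.keys(params).forEach(function (key) {
        const value = params[key];
        // Assumption: absent parameters are sent as empty strings.
        query.set(key, value === undefined || value === null ? "" : String(value));
    });
    return page + "?" + query.toString();
}

// Illustrative values; only the parameter names come from the diff.
const target = buildPageUrl("taskList.html", {
    lang: "zh",
    type: "3",
    wsport: "8084",
    backEndAddressServiceWrapper: "http://localhost:8074"
});
console.log(target);
```

Whether the receiving page reads the encoded value correctly depends on how `getUrlParam` decodes its result, which this diff does not show, so treat this as a sketch rather than a drop-in change.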

View File

@ -4,85 +4,171 @@
<head>
<script src="jquery-3.4.1.min.js"></script>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<meta name="viewport"
content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<script src="vue.js"></script>
<link rel="stylesheet" href="bootstrap/css/bootstrap.css"></link>
<link rel="stylesheet" href="element-ui/index.css"></link>
<script src="element-ui/index.js"></script>
<title>任务列表 | Task List</title>
</head>
<style>
th,td{
th, td {
text-align: left;
vertical-align: middle!important;
vertical-align: middle !important;
}
@media (max-width: 500px) {
.tasklist{
margin-left:10%!important;
.tasklist {
margin-left: 10% !important;
}
}
.search-header {
display: flex;
justify-content: flex-end; /* Right align the search box */
align-items: center;
}
.search-input {
/*margin-right: 8px; !* Optional: Adjust spacing between input and button *!*/
}
.task-links {
display: flex;
justify-content: space-between; /* Spread links evenly */
}
</style>
<body>
<div class="row" style="margin-top: 40px;">
<div style="margin:0 auto; min-width: 70%;" id="taskList" class="tasklist">
<h4 style="text-align: center;">{{"Task List~任务列表" | lang}}</h4>
<h5 style="text-align: center;" v-if="mobile==1">{{"View this table by direction keys on keyboard~按键盘方向键浏览此表格" | lang}}</h5>
<p><a v-if="type==3" href="javascript:void(0)" v-on:click="newTask" class="btn btn-primary">{{"New Task~创建新任务" | lang}}</a></p>
<div v-if="type != 3" style="margin-bottom: 20px">
<div style="margin-bottom: 5px">{{"提示下方的官方教程和答疑平台均在Github可能出现访问速度慢的问题请耐心等待。~" | lang}}</div>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/wiki" target="_blank">{{"Software Documentation~软件使用说明文档" | lang}}</a>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/issues?q=is%3Aissue" target="_blank">{{"Ask questions here~官方答疑平台" | lang}}</a>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/issues/22" target="_blank">{{"See how to run task by schedule~定时执行任务教程" | lang}}</a>
<!-- <a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/wiki/Run-multiple-tasks-in-parallel" target="_blank">{{"See how to run multiple tasks in parallel~同时执行多个任务教程" | lang}}</a>-->
<div class="row" style="margin-top: 40px;">
<div style="margin:0 auto; min-width: 70%;" id="taskList" class="tasklist">
<h4 style="text-align: center;">{{"Task List~任务列表" | lang}}</h4>
<h5 style="text-align: center;"
v-if="mobile==1">{{"View this table by direction keys on keyboard~按键盘方向键浏览此表格" | lang}}</h5>
<p><a v-if="type==3" href="javascript:void(0)" v-on:click="newTask"
class="btn btn-primary">{{"New Task~创建新任务" | lang}}</a></p>
<div v-if="type != 3" style="margin-bottom: 20px">
<div style="margin-bottom: 5px">{{"提示下方的官方教程和答疑平台均在Github可能出现访问速度慢的问题请耐心等待。~" | lang}}
</div>
<div style="margin-bottom: 10px">
<table style="table-layout: auto;" class="table table-hover">
<thead>
<tr>
<th style="text-align: center">No.</th>
<th style="text-align: center">ID</th>
<th style="text-align: center">{{"Task Name~任务名称" | lang}}</th>
<th>{{"URL~网址" | lang}}</th>
<th v-bind:colspan="type" style="min-width: 300px">{{"Operations~操作" | lang}}</th>
</tr>
</thead>
<tbody>
<tr v-for="i in list.length">
<td style="text-align: center">{{i}}</td>
<td style="text-align: center">{{list[i-1]["id"]}}</td>
<!-- <td style="overflow: hidden;; max-width: 200px;text-align: center">{{list[i-1]["id"]}}</td>-->
<td style="overflow: hidden;; max-width: 200px;text-align: center">{{list[i-1]["name"]}}</td>
<td style="height: 30px;overflow: hidden; max-width: 200px">{{list[i-1]["url"]}}</td>
<td style="text-align: left"><a href="javascript:void(0)" v-on:click="browseTask(list[i-1]['id'])">{{"Task Information~任务信息" | lang}}</a></td>
<td style="text-align: left;font-weight: bold" v-if="type==3"><a href="javascript:void(0)" v-on:click="modifyTask(list[i-1]['id'],list[i-1]['url'])">{{"Modify Task~修改任务" | lang}}</a></td>
<td style="text-align: left"><a disabled href="javascript:void(0)" v-on:dblclick="deleteTask(list[i-1]['id'])">{{"Delete Task (Double Click)~删除任务(双击)" | lang}}</a></td>
</tr>
</tbody>
</table>
</div>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/wiki"
target="_blank">{{"Software Documentation~软件使用说明文档" | lang}}</a>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/issues?q=is%3Aissue"
target="_blank">{{"Ask questions here~官方答疑平台" | lang}}</a>
<a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/issues/22"
target="_blank">{{"See how to run task by schedule~定时执行任务教程" | lang}}</a>
<!-- <a class="btn btn-primary" href="https://github.com/NaiboWang/EasySpider/wiki/Run-multiple-tasks-in-parallel" target="_blank">{{"See how to run multiple tasks in parallel~同时执行多个任务教程" | lang}}</a>-->
</div>
</div>
<el-table
style="width: 100%"
:empty-text="LANG('No Task~暂无任务')"
:data="list.filter(data => !search || (data.name.toLowerCase().includes(search.toLowerCase())) || (data.url.toLowerCase().includes(search.toLowerCase())) || (data.links.includes(search.toLowerCase())) || (data.desc.includes(search.toLowerCase())) || (data.id.toString().includes(search.toLowerCase())))"
:default-sort="{prop: 'mtime', order: 'descending'}"
>
<el-table-column
prop="id"
:label="LANG('Task ID~任务ID')"
sortable
width="120"
align="center"
>
</el-table-column>
<el-table-column
prop="name"
:label="LANG('Task Name~任务名称')"
sortable
align="center"
>
</el-table-column>
<el-table-column
prop="url"
label="URL"
sortable
>
</el-table-column>
<!-- <el-table-column-->
<!-- prop="mtime"-->
<!-- :label="LANG('Update Time~更新时间')"-->
<!-- sortable-->
<!-- :formatter="formatDate"-->
<!-- width="170"-->
<!-- >-->
<!-- </el-table-column>-->
<el-table-column
width="350"
align="center">
<!-- Header template for the search input -->
<template slot="header" slot-scope="scope">
<div class="search-header">
<!-- Search input aligned to the right -->
<el-input
v-model="search"
class="search-input"
prefix-icon="el-icon-search"
:placeholder="LANG('Please input keywords to search~请输入关键词搜索')">
</el-input>
<!-- <el-button icon="el-icon-search"></el-button>-->
</div>
</template>
<template slot-scope="scope">
<!-- Use flex container to justify content space-around -->
<div class="task-links">
<a href="javascript:void(0)" v-on:click="browseTask(scope.$index, scope.row)">{{ "View~任务信息"
| lang }}</a>
<a href="javascript:void(0)" v-if="type==3" v-on:click="modifyTask(scope.$index, scope.row)">{{
"Modify~修改任务" | lang }}</a>
<a href="javascript:void(0)"
v-on:dblclick="deleteTask(scope.$index, scope.row)">{{ "Delete (Double Click)~删除任务(双击)" | lang }}</a>
</div>
</template>
</el-table-column>
</el-table>
</div>
</div>
</body>
</html>
<script src="global.js"></script>
<script>
var app = new Vue({
let app = new Vue({
el: '#taskList',
data: {
search: '',
list: [],
type: 3, //Records the service behavior
mobile: getUrlParam("mobile"),
backEndAddressServiceWrapper: getUrlParam("backEndAddressServiceWrapper"),
},
methods: {
newTask: function (){
window.location.href = "newTask.html?lang="+getUrlParam("lang")+"&mobile="+getUrlParam("mobile")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper;
formatDate: function (row, column) {
//2023-01-01 00:00:00
let date = row[column.property];
// 2023-12-26T12:44:32.599Z
let original_time = row.mtime;
let year = original_time.substring(0, 4);
let month = original_time.substring(5, 7);
let day = original_time.substring(8, 10);
let hour = original_time.substring(11, 13);
let minute = original_time.substring(14, 16);
let second = original_time.substring(17, 19);
return year + "-" + month + "-" + day + " " + hour + ":" + minute + ":" + second;
},
modifyTask: function(id, url) {
newTask: function () {
window.location.href = "newTask.html?lang=" + getUrlParam("lang") + "&mobile=" + getUrlParam("mobile") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper;
},
LANG: function (text) {
if (getUrlParam("lang") == "en" || getUrlParam("lang") == "") {
return text.split("~")[0];
} else if (getUrlParam("lang") == "zh") {
return text.split("~")[1];
}
},
modifyTask: function (index, row) {
let id = row.id;
let url = row.links.split("\n")[0];
console.log(index, row)
let message = { //Show the flowchart
type: 1, //Message type: pass the link
message: {
@ -92,10 +178,12 @@
ws.send(JSON.stringify(message));
window.location.href = url; //Redirect to the link
},
browseTask: function(id) {
window.location.href = "taskInfo.html?type="+getUrlParam("type")+"&id=" + id + "&lang="+getUrlParam("lang")+"&wsport="+getUrlParam("wsport")+"&backEndAddressServiceWrapper="+ app.$data.backEndAddressServiceWrapper; //Redirect to the link
browseTask: function (index, row) {
let id = row.id;
window.location.href = "taskInfo.html?type=" + getUrlParam("type") + "&id=" + id + "&lang=" + getUrlParam("lang") + "&wsport=" + getUrlParam("wsport") + "&backEndAddressServiceWrapper=" + app.$data.backEndAddressServiceWrapper; //Redirect to the link
},
deleteTask: function(id) {
deleteTask: function (index, row) {
let id = row.id;
// let text = "Are you sure to remove the selected task?";
// if (getUrlParam("lang") == "en"|| getUrlParam("lang")=="") {
// text = "Are you sure to remove the selected task?";
@ -103,30 +191,30 @@
// text = "确定要删除选中的任务吗?";
// }
// if (confirm(text)) {
$.get(app.$data.backEndAddressServiceWrapper + "/deleteTask?id=" + id, function(res) {
$.get(app.$data.backEndAddressServiceWrapper + "/queryTasks", function(re) {
result = re.sort(desc);
app.$data.list = result;
});
$.get(app.$data.backEndAddressServiceWrapper + "/deleteTask?id=" + id, function (res) {
$.get(app.$data.backEndAddressServiceWrapper + "/queryTasks", function (re) {
result = re.sort(desc);
app.$data.list = result;
});
// alert("Sorry, the task cannot be deleted since the system is a demo system for paper reviewers, please contact the author (naibowang@u.nus.edu) to remove it.")
});
// alert("Sorry, the task cannot be deleted since the system is a demo system for paper reviewers, please contact the author (naibowang@u.nus.edu) to remove it.")
// }
},
}
});
var desc = function(x, y) {
let desc = function (x, y) {
return (x["id"] < y["id"]) ? 1 : -1
}
$.get(app.$data.backEndAddressServiceWrapper + "/queryTasks", function(re) {
$.get(app.$data.backEndAddressServiceWrapper + "/queryTasks", function (re) {
// result = re.sort(desc);
app.$data.list = re;
if (getUrlParam("type") == "1") {
app.$data.type = 2;
}
});
ws = new WebSocket("ws://localhost:"+getUrlParam("wsport"));
ws.onopen = function() {
ws = new WebSocket("ws://localhost:" + getUrlParam("wsport"));
ws.onopen = function () {
// WebSocket is connected; use send() to send data
console.log("已连接");
message = {
@ -137,7 +225,7 @@
};
this.send(JSON.stringify(message));
};
ws.onclose = function() {
ws.onclose = function () {
// WebSocket closed
console.log("连接已关闭...");
};
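
Review note on the new `el-table` `:data` filter in this file: it lowercases the query, but compares it against `links` and `desc` without lowercasing them, and it assumes every task object carries all five fields. A minimal sketch of the same filter pulled out into a helper, assuming (not confirmed by this diff) that some task files may lack `desc` or `links`. `matchesSearch` is a hypothetical name, not part of EasySpider, and the sample rows are shortened stand-ins for the task JSON:

```javascript
// Hypothetical helper, not part of the EasySpider source.
// Returns true when `search` is empty or occurs (case-insensitively)
// in any of the task fields that the table filter inspects.
function matchesSearch(task, search) {
    if (!search) {
        return true; // an empty query keeps every row
    }
    const needle = String(search).toLowerCase();
    const fields = ["name", "url", "links", "desc", "id"];
    return fields.some(function (key) {
        const value = task[key];
        if (value === undefined || value === null) {
            return false; // assumption: a missing field never matches
        }
        return String(value).toLowerCase().includes(needle);
    });
}

// Illustrative rows; the second one deliberately has no `desc`.
const tasks = [
    {id: 228, name: "Exploring the nonclassical dynamics", url: "https://arxiv.org/abs/2312.02977", links: "https://arxiv.org/abs/2312.02977", desc: "arXiv paper"},
    {id: 229, name: "Zhihu", url: "https://www.zhihu.com", links: "https://www.zhihu.com"}
];
const hits = tasks.filter(function (t) { return matchesSearch(t, "ARXIV"); });
console.log(hits.map(function (t) { return t.id; }));
```

If something like this were exposed as a method on the Vue instance, the binding could shrink to `:data="list.filter(t => matchesSearch(t, search))"`, which also keeps the template line readable.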

View File

@ -0,0 +1,3 @@
const path = require("path");
const task_server = require(path.join(__dirname, "server.js"));
task_server.start(8074); //start local server

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -1 +1 @@
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"12/7/2023, 2:56:47 AM","version":"0.6.0","saveThreshold":10,"quitWaitTime":60,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}}]}
{"id":228,"name":"[2312.02977] Exploring the nonclassical dynamics of the \"classical'' Schrödinger equation","url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","create_time":"12/7/2023, 2:44:58 AM","update_time":"2024-01-05 22:08:46","version":"0.6.0","saveThreshold":10,"quitWaitTime":3,"environment":1,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"TTT","dataWriteMode":3,"inputExcel":"","startFromExit":0,"pauseKey":"p","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"https://arxiv.org/abs/2312.02977","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://arxiv.org/abs/2312.02977","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://arxiv.org/abs/2312.02977"},{"id":1,"name":"loopTimes_1","nodeId":5,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":10,"value":10}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,5],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://arxiv.org/abs/2312.02977","links":"https://arxiv.org/abs/2312.02977","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":2,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":3,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":2,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":-1,"index":4,"parentId":0,"type":0,"option":2,"title":"点击Download PDF","sequence":[],"isInLoop":false,"position":3,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, \"download-pdf\")]","iframe":false,"wait":2,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"clickWay":0,"maxWaitTime":10,"params":[],"alertHandleType":0,"allXPaths":["/html/body/div[2]/main[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[1]/a[1]","//a[contains(., 'Download P')]","//A[@class='abs-button 
download-pdf']","/html/body/div[last()-3]/main/div/div/div[last()-2]/div[last()-5]/ul/li[last()-2]/a"]}},{"id":2,"index":5,"parentId":0,"type":1,"option":8,"title":"循环 - 单个元素","sequence":[2],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"//body","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":10,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

View File

@ -1 +1 @@
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"07/12/2023, 03:43:34","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"desc":"https://www.zhihu.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":2,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":1,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","wai
tElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":3,"index":3,"parentId":2,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}}]}
{"id":229,"name":"知乎 - 有问题,就会有答案","url":"https://www.zhihu.com","links":"https://www.zhihu.com","create_time":"07/12/2023, 03:26:24","update_time":"2023-12-27 20:05:50","version":"0.6.0","saveThreshold":10,"quitWaitTime":6,"environment":0,"maximizeWindow":0,"maxViewLength":15,"recordLog":1,"outputFormat":"xlsx","saveName":"current_time","dataWriteMode":1,"inputExcel":"","startFromExit":0,"pauseKey":"t","containJudge":false,"browser":"chrome","removeDuplicate":0,"desc":"知了个乎","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.zhihu.com","desc":"要采集的网址列表,多行以\\n分开","type":"text","exampleValue":"https://www.zhihu.com"},{"id":1,"name":"loopTimes_1","nodeId":4,"nodeName":"循环 - 单个元素","desc":"循环循环 - 单个元素执行的次数0代表无限循环","type":"int","exampleValue":0,"value":0}],"outputParameters":[{"id":0,"name":"参数1_文本","desc":"","type":"text","recordASField":1,"exampleValue":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,4,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"url":"https://www.zhihu.com","links":"https://www.zhihu.com","maxWaitTime":10,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"cookies":""}},{"id":3,"index":2,"parentId":0,"type":1,"option":8,"title":"循环采集数据","sequence":[3],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/d
iv[1]/div/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":1,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"allXPaths":["/html/body/div[1]/div[1]/main[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/h2[1]/div[1]","//div[contains(., '死刑执行前可以谎称肚')]","/html/body/div[last()-7]/div/main/div/div/div[last()-1]/div/div/div/div/div/div[last()-12]/div/div/div/div/h2/div"]}},{"id":4,"index":3,"parentId":3,"type":0,"option":3,"title":"提取数据","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"clear":0,"newLine":1,"params":[{"nodeType":0,"contentType":0,"relative":true,"name":"参数1_文本","desc":"","extractType":0,"relativeXPath":"","allXPaths":"","exampleValues":[{"num":0,"value":"死刑执行前可以谎称肚子痛,想排泄粪便,籍此拖延时间吗?"}],"unique_index":"onlvi030w9jlpu5tjzb","iframe":false,"default":"","paraType":"text","recordASField":1,"beforeJS":"","beforeJSWaitTime":0,"JS":"","JSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"downloadPic":0}],"loopType":1}},{"id":2,"index":4,"parentId":0,"type":1,"option":8,"title":"循环 - 
单个元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","iframe":false,"wait":0,"waitType":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"waitElement":"","waitElementTime":10,"waitElementIframeIndex":0,"scrollType":0,"scrollCount":1,"scrollWaitTime":1,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"exitElement":"//body","historyWait":2,"breakMode":0,"breakCode":"","breakCodeWaitTime":0,"skipCount":0}}]}

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff