1
0
forked from lolcat/4get

Compare commits

101 Commits

Author SHA1 Message Date
aa9806300a doc changes 2025-08-19 23:45:05 -04:00
91ce5c1563 remove ssl ciphers they bug out the new method 2025-08-19 23:28:54 -04:00
cdf958d293 fix wikipedia crash 2025-08-10 21:55:15 -04:00
2d63475b07 fix MDN answers not rendering properly 2025-08-10 21:49:51 -04:00
ae31274db9 these comments were too long 2025-08-10 21:39:19 -04:00
20ef5b3e3a re-added stackoverflow instant answers 2025-08-10 21:37:45 -04:00
7c970031d0 fix #2 for real this time 2025-08-10 17:22:58 -04:00
2c2bd28a9f fix syntax highlighter 2025-08-10 17:15:42 -04:00
dea8b0a362 this was so much pain to figure out 2025-08-10 16:39:11 -04:00
da1ea1d6e8 add more hacks 2025-08-10 15:53:02 -04:00
2ca8fb0006 remove backtick 2025-08-10 15:44:46 -04:00
362cf61508 doc config changes number twoo 2025-08-10 15:43:49 -04:00
1828c63233 doc config changes 2025-08-10 15:42:14 -04:00
4215f2678d i always forget the fucking config 2025-08-09 13:34:45 -04:00
acd02d83d4 added cara.app 2025-08-09 13:28:36 -04:00
319640cd77 greppr fix 2025-08-09 11:00:48 -04:00
ad535a1609 fix broken mojeek instant answer iamge 2025-08-07 17:56:39 -04:00
7ac53c6e11 bypass mojeek bot check 2025-08-07 17:49:45 -04:00
f33f02e816 so fucking dumb 2025-08-06 21:10:58 -04:00
27b8509ac0 fix ddg answer image 404 2025-08-06 21:09:33 -04:00
1e52982cb9 fix ddg returning weird reresults when no match is found 2025-08-06 20:19:13 -04:00
3f2bfcb8c7 aaaaaa part 3 2025-08-04 00:39:47 -04:00
d3ef1a67c0 aaaaaa part 2 2025-08-04 00:39:13 -04:00
9afef55d89 aaaaaa 2025-08-04 00:37:12 -04:00
706b490bf3 bypass captcha??? hopefullly........ 2025-08-04 00:34:06 -04:00
74f7c920f6 handle vimeo captchas 2025-08-04 00:18:38 -04:00
3dbcf60a3e added vimeo 2025-08-04 00:00:39 -04:00
f30872134f handle mojeek block 2025-08-03 12:28:57 -04:00
2c4dc7da84 lol forgot the request is still for yt 2025-08-02 17:44:10 -04:00
5a0f5b868a i hate git 2025-08-02 16:48:58 -04:00
7cf403e125 forgot the fucking config 2025-08-02 16:39:34 -04:00
73b7922898 added sepia search 2025-08-02 16:38:36 -04:00
336cb49d98 im so retarded 2025-08-01 20:54:08 -04:00
8cd8e7380f readme 2025-07-30 23:21:12 -04:00
3d9d95db34 ffffffffffukccccccccc 2025-07-30 15:49:05 -04:00
eb73b1f357 dementia 2025-07-30 15:47:29 -04:00
f7499294de added cock cock, removed solofield 2025-07-30 15:35:27 -04:00
60f7150008 readme 2025-07-28 18:53:21 -04:00
2b8d90af12 forgot the config fucking dementiacatwill 2025-07-27 21:53:20 -04:00
0f803804a4 forgot the settings damn it 2025-07-27 21:48:10 -04:00
f43feff0aa added baidu, the best search engine 2025-07-27 21:46:03 -04:00
0bdd5e73df added kuuro theme 2025-07-27 11:52:09 -04:00
430c0a2f0f fix potential xss woops 2025-07-08 23:10:13 -04:00
1a00bf8069 duckduckgo spelling fix 2025-07-08 23:08:12 -04:00
502f6d12e4 marginalia crash fix 2025-07-03 19:43:58 -04:00
a2bc1e6190 bypass anubis bullshit on marginalia 2025-06-20 01:18:57 -04:00
f73b5f0298 fix yandex web 2025-06-18 10:30:31 -04:00
3e1487e614 maybe we should call the function chucknuts 2025-06-12 10:30:47 -04:00
037566bbba handle google captcha 2025-06-11 18:44:41 -04:00
b61bc6d07c fix google image crash 2025-06-01 13:03:39 -04:00
8d50667b0d handle imgur ip block 2025-05-27 20:03:40 -04:00
a0545b6006 fix startpage videos 2025-05-24 20:49:49 -04:00
78aa2e198f fuiwhwehfuiewuf 2025-04-19 10:42:21 -04:00
b85820cbcd hopefully this fixes bing images my fucking god 2025-04-19 10:37:35 -04:00
4b85841a3e forgot nsfw filter god damn 2025-04-17 21:41:12 -04:00
8d07e72dfe forgot settings, god damn i have dementia 2025-04-17 20:08:15 -04:00
566680fe36 ok its unfucked now 2025-04-17 20:06:42 -04:00
077692db49 i fucking hate bing 2025-04-17 20:05:58 -04:00
e4bf53cdaa forgot config 2025-04-17 20:02:02 -04:00
4489bb21e5 forgot flickr 2025-04-17 20:00:33 -04:00
ff8b1addf7 fixed bing images failing to load, added flickr 2025-04-17 19:54:34 -04:00
3e2c3fc5d9 fixed google videos 2025-04-02 21:40:53 -04:00
49ddd1a216 duckduckgo images nsfw fix 2025-03-20 21:05:36 -04:00
81ca8eaddc gore theme fix 2025-03-16 03:39:58 -04:00
c9c8d578f3 Merge branch 'master' of https://git.lolcat.ca/lolcat/4get 2025-03-02 21:58:34 -05:00
b2203804c7 path traversal exploit (this is what you get for using free software) 2025-03-02 21:58:18 -05:00
13dfa9240c Merge pull request 'fix Dockerfile build.' (#67) from Fijxu/4get:dockerfile-fix into master
Reviewed-on: lolcat/4get#67
2025-02-04 07:05:12 +00:00
0a53c3605a fix Dockerfile build.
The `alpine:latest` image do not longer include php83 on their repos.
Using a specific image tag is better to prevent breakages on the future.

Ref: https://github.com/dnaprawa/dockerfile-best-practices?tab=readme-ov-file#the-latest-is-an-evil-choose-specific-image-tag
2025-02-02 01:37:14 -03:00
36b0c570aa Merge branch 'master' of https://git.lolcat.ca/lolcat/4get 2025-02-01 22:53:06 -05:00
47a7a2a224 brave pagination fix 2025-02-01 22:51:44 -05:00
0180cf5224 Merge pull request 'override ssl.conf when using http config' (#63) from docker_fix_conf into master
Reviewed-on: lolcat/4get#63
2025-01-24 15:41:41 +00:00
eed32a153c remove ssl.conf when using http config 2025-01-24 00:53:34 -08:00
f9f3c919d6 brave scraper fix 2025-01-22 20:04:42 -05:00
4b0d8f75dc google scraper fix 2025-01-19 14:02:24 -05:00
033e4cb959 added vsco scraper 2025-01-11 23:07:58 -05:00
91f621e105 readme changes 2025-01-11 14:34:23 -05:00
9f60900875 500px scraper 2025-01-11 14:12:54 -05:00
631aa58565 marginalia hotfix 2025-01-07 21:12:07 -05:00
b892f90b13 readme changes 2025-01-07 00:05:42 -05:00
463ba0775f added pinterest 2025-01-06 23:56:49 -05:00
cfad4fb035 wordnik bugfix 2025-01-06 21:05:45 -05:00
4e968b4b1c youtube scraper fix 2025-01-03 22:22:30 -05:00
81df52235c fixed google crash 2025-01-03 21:43:40 -05:00
1ca2626ad9 fix ddg bug with EOF result 2025-01-03 21:16:00 -05:00
9ca93f34c6 ddg hotfix 2024-12-17 21:01:36 -05:00
0a43b9c849 added arquivo.pt 2024-12-17 10:11:53 -05:00
b636fec319 fucking git is so shit 2024-12-17 00:35:15 -05:00
774f7113df duckduckgo scraper rewrite 2024-12-17 00:31:15 -05:00
0b3bbe0f15 gore's shitty theme fix 2024-12-02 15:19:10 -05:00
5f0b0a7b83 findthatmeme fix 2024-12-01 15:59:03 -05:00
920b9d5b3f brave crash fix 2024-11-19 09:22:58 -05:00
9cd369ac08 http2 on ddg 2024-11-07 23:37:43 -05:00
e83865be49 added pagination 2024-11-07 00:12:06 -05:00
68dd7f29f6 mojeek thumbnail fix 2024-11-06 23:43:54 -05:00
aaa30c79f5 fix google cse image crash + added word autocorrect 2024-10-31 20:31:23 -04:00
070f9d442b brave fix 2024-10-29 21:29:17 -04:00
9c18753ec3 yes of course i need to fucking forget the .php again AAAAAAAAAAA 2024-10-24 21:46:54 -04:00
d8a729796e fix crash on google cse, added settings 2024-10-22 20:15:00 -04:00
2bbe5a29a9 Merge branch 'master' of https://git.lolcat.ca/lolcat/4get 2024-10-22 11:34:06 -04:00
9ac195ac3b added google CSE 2024-10-22 11:33:14 -04:00
d427a48ed4 Merge pull request 'nginx documentation but better' (#41) from bread/4get:master into master
Reviewed-on: lolcat/4get#41
2024-10-21 14:17:07 +00:00
46 changed files with 11615 additions and 6186 deletions

1
.dockerignore Normal file
View File

@@ -0,0 +1 @@
.git

View File

@@ -1,10 +1,9 @@
FROM alpine:latest
FROM alpine:3.21
WORKDIR /var/www/html/4get
RUN apk update && apk upgrade
RUN apk add php apache2-ssl php83-fileinfo php83-openssl php83-iconv php83-common php83-dom php83-sodium php83-curl curl php83-pecl-apcu php83-apache2 imagemagick php83-pecl-imagick php-mbstring imagemagick-webp imagemagick-jpeg
RUN apk add php apache2-ssl php84-fileinfo php84-openssl php84-iconv php84-common php84-dom php84-sodium php84-curl curl php84-pecl-apcu php84-apache2 imagemagick php84-pecl-imagick php84-mbstring imagemagick-webp imagemagick-jpeg
COPY ./docker/apache/ /etc/apache2/
COPY . .
RUN chmod 777 /var/www/html/4get/icons
@@ -14,4 +13,5 @@ EXPOSE 443
ENV FOURGET_PROTO=http
CMD ["./docker/docker-entrypoint.sh"]
ENTRYPOINT ["./docker/docker-entrypoint.sh"]
CMD ["start"]

View File

@@ -9,9 +9,11 @@ https://4get.ca/about
## Official instance
https://4get.ca , or visit the official instance list: https://4get.ca/instances
_NOT to be confused with 4get.ch, 4get.lol and friends! I **don't** host these._
## Totally unbiased comparison between alternatives
| | 4get | searx(ng) | libreY | araa | hearch |
| | 4get | searx(ng) | libreY | araa | hearch.co |
|----------------------------|-------------------------|-----------|-------------|-----------|-------------------|
| RAM usage | 200-400mb~ | 2GB~ | 200-400mb~ | 2GB~ | idk |
| Does it suck | no (debunked by snopes) | yes | yes | a little | better than searx |
@@ -23,36 +25,37 @@ https://4get.ca , or visit the official instance list: https://4get.ca/instances
3. Bot protection that *actually* filters out the bots (when configured)
4. Interface doesn't require javascript
5. Favicon fetcher with caching support & image proxy
6. Bunch of other shit
6. Bunch of other shits
tl;dr the best way to actually browse for shit.
tl;dr 4get is the best way to browse for shit.
# Supported websites
| Web | Images | Videos | News | Music | Autocompleter |
|------------|--------------|------------|------------|------------|---------------|
| DuckDuckGo | DuckDuckGo | YouTube | DuckDuckGo | Soundcloud | Brave |
| Brave | Brave | DuckDuckGo | Brave | | DuckDuckGo |
| Yandex | Yandex | Brave | Google | | Yandex |
| Google | Google | Yandex | Startpage | | Google |
| Startpage | Startpage | Google | Qwant | | Startpage |
| Qwant | Qwant | Startpage | Mojeek | | Kagi |
| Ghostery | Yep | Qwant | | | Qwant |
| Yep | Solofield | Solofield | | | Ghostery |
| Greppr | Imgur | | | | Yep |
| Crowdview | FindThatMeme | | | | Marginalia |
| Mwmbl | | | | | YouTube |
| Mojeek | | | | | Soundcloud |
| Solofield | | | | | |
| Marginalia | | | | | |
| wiby | | | | | |
| Curlie | | | | | |
| Web | Images | Videos | News | Music | Autocompleter |
|------------|--------------|--------------|------------|------------|---------------|
| DuckDuckGo | DuckDuckGo | YouTube | DuckDuckGo | Soundcloud | Brave |
| Brave | Brave | Sepia Search | Brave | | DuckDuckGo |
| Yandex | Yandex | DuckDuckGo | Google | | Yandex |
| Google | Google | Brave | Startpage | | Google |
| Startpage | Startpage | Yandex | Qwant | | Startpage |
| Qwant | Qwant | Google | Mojeek | | Kagi |
| Ghostery | Yep | Startpage | Baidu | | Qwant |
| Yep | Baidu | Qwant | | | Ghostery |
| Greppr | Pinterest | Baidu | | | Yep |
| Crowdview | 500px | Coc Coc | | | Marginalia |
| Mwmbl | VSCO | | | | YouTube |
| Mojeek | Imgur | | | | Soundcloud |
| Baidu | FindThatMeme | | | | |
| Coc Coc | | | | | |
| Marginalia | | | | | |
| wiby | | | | | |
| Curlie | | | | | |
# Installation
Refer to the <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/">documentation index</a>. I recommend following the <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/apache2.md">apache2 guide</a>.
## Contact
Shit breaks all the time but I repair it all the time too... Email me here: <b>will (at) lolcat.ca</b> or create an issue.
Shit breaks all the time but I repair it all the time too. Email me here: <b>will (at) lolcat.ca</b> or create an issue.
## License
AGPL

21
api.txt
View File

@@ -1,10 +1,17 @@
__ __ __
/ // / ____ ____ / /_
/ // /_/ __ `/ _ \/ __/
/__ __/ /_/ / __/ /_
/_/ \__, /\___/\__/
/____/
44
4444444 44
44444444 44444 444
44444444 444444 444444444
44444 44444444 444444444
444444444 4444444
4444444444 444444
4444444444444
444444444444444444
444444444444444
44444444
4444
44
+ Welcome to the 4get API documentation +
+ Terms of use

View File

@@ -100,7 +100,6 @@ class config{
"https://4get.sijh.net",
"https://4get.hbubli.cc",
"https://4get.plunked.party",
"https://4get.seitan-ayoub.lol",
"https://4get.etenie.pl",
"https://4get.lunar.icu",
"https://4get.dcs0.hu",
@@ -119,7 +118,7 @@ class config{
// Default user agent to use for scraper requests. Sometimes ignored to get specific webpages
// Changing this might break things.
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0";
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:141.0) Gecko/20100101 Firefox/141.0";
// Proxy pool assignments for each scraper
// false = Use server's raw IP
@@ -129,8 +128,12 @@ class config{
const PROXY_BRAVE = false;
const PROXY_FB = false; // facebook
const PROXY_GOOGLE = false;
const PROXY_GOOGLE_API = false;
const PROXY_GOOGLE_CSE = false;
const PROXY_STARTPAGE = false;
const PROXY_QWANT = false;
const PROXY_BAIDU = false;
const PROXY_COCCOC = false;
const PROXY_GHOSTERY = false;
const PROXY_MARGINALIA = false;
const PROXY_MOJEEK = false;
@@ -140,8 +143,15 @@ class config{
const PROXY_WIBY = false;
const PROXY_CURLIE = false;
const PROXY_YT = false; // youtube
const PROXY_SEPIASEARCH = false;
const PROXY_ODYSEE = false;
const PROXY_VIMEO = false;
const PROXY_YEP = false;
const PROXY_PINTEREST = false;
const PROXY_SANKAKUCOMPLEX = false;
const PROXY_FLICKR = false;
const PROXY_FIVEHPX = false;
const PROXY_VSCO = false;
const PROXY_SEZNAM = false;
const PROXY_NAVER = false;
const PROXY_GREPPR = false;
@@ -149,6 +159,7 @@ class config{
const PROXY_MWMBL = false;
const PROXY_FTM = false; // findthatmeme
const PROXY_IMGUR = false;
const PROXY_CARA = false;
const PROXY_YANDEX_W = false; // yandex web
const PROXY_YANDEX_I = false; // yandex images
const PROXY_YANDEX_V = false; // yandex videos
@@ -157,6 +168,9 @@ class config{
// Scraper-specific parameters
//
// GOOGLE CSE & GOOGLE API
const GOOGLE_CX_ENDPOINT = "d4e68b99b876541f0";
// MARGINALIA
// Use "null" to default out to HTML scraping OR specify a string to
// use the API (Eg: "public"). API has less filters.

View File

@@ -6,14 +6,15 @@ services:
image: luuul/4get:latest
restart: unless-stopped
environment:
- FOURGET_PROTO=http
- FOURGET_SERVER_NAME=4get.ca
- FOURGET_INSTANCES=https://4get.ca
ports:
- "80:80"
- "443:443"
volumes:
- /etc/letsencrypt/live/domain.tld:/etc/4get/certs
# mount custom banners and captcha
- ./banners:/var/www/html/4get/banner
- ./captcha:/var/www/html/4get/data/captcha
# volumes:
# - /etc/letsencrypt/live/domain.tld:/etc/4get/certs # mount ssl
# - ./banners:/var/www/html/4get/banner # mount custom banners
# - ./captcha:/var/www/html/4get/data/captcha # mount captcha images

View File

@@ -0,0 +1 @@
# intentionally blank

View File

@@ -8,18 +8,27 @@ FOURGET_PROTO="${FOURGET_PROTO#\"}"
# make lowercase
FOURGET_PROTO=`echo $FOURGET_PROTO | awk '{print tolower($0)}'`
FOURGET_SRC='/var/www/html/4get'
mkdir -p /etc/apache2
if [ "$FOURGET_PROTO" = "https" ]; then
echo "Using https configuration"
cp /etc/apache2/https.conf /etc/apache2/httpd.conf
cp -r ${FOURGET_SRC}/docker/apache/https/httpd.conf /etc/apache2
cp -r ${FOURGET_SRC}/docker/apache/https/conf.d/* /etc/apache2/conf.d
else
echo "Using http configuration"
cp /etc/apache2/http.conf /etc/apache2/httpd.conf
cp -r ${FOURGET_SRC}/docker/apache/http/httpd.conf /etc/apache2
cp -r ${FOURGET_SRC}/docker/apache/http/conf.d/* /etc/apache2/conf.d
fi
php ./docker/gen_config.php
echo "4get is running"
exec httpd -DFOREGROUND
if [ "$@" = "start" ]; then
echo "4get is running"
exec httpd -DFOREGROUND
else
exec "$@"
fi

View File

@@ -7,27 +7,35 @@ Then, install the following dependencies:
```sh
apt update
apt upgrade
apt install php-mbstring apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php
apt install php-mbstring apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-fcgid php-fpm
```
Enable the required modules:
```sh
a2dismod mpm_prefork
a2enmod mpm_event
a2enmod ssl
a2enmod rewrite
a2enmod proxy_fcgi setenvif actions alias
a2enmod http2
a2enmod headers
a2enmod proxy
```
And enable these optional ones, which might be useful to you later on. The `proxy` module is useful for setting up reverse proxies to services like gitea, and `headers` is useful to tweak global header values:
Tune the performance of php-fpm. You will need to edit this file according to your server specs and number of users. Edit the file at `/etc/php/8.4/pool.d/www.conf`:
```sh
a2enmod proxy
a2enmod headers
pm = static
pm.max_children = 50
```
These values are what I currently use on 4get.ca, but for personal use, you can set `pm` to `ondemand` and `pm.max_children` to `20` (if you want those thumbnails to load fast!)
Now, restart apache2:
```sh
service apache2 restart
```
Just for good measure, please check if your webserver is running. Access it through HTTP, not HTTPS. You should see the apache2 default landing page.
Just for good measure, please check if your webserver is running. Access it through HTTP, not HTTPS. You should see the apache2 default landing page. Just a note, http2 won't work just yet since you don't have SSL yet.
## 000-default.conf
Now, edit the following file: `/etc/apache2/sites-available/000-default.conf`, remove everything and carefully add each rule specified here, while making sure to replace my domains with your own:
@@ -73,13 +81,28 @@ Now, edit the following file: `/etc/apache2/sites-available/000-default.conf`, r
AddOutputFilterByType DEFLATE text/css
DocumentRoot /var/www/4get
<FilesMatch \.php$>
SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/"
</FilesMatch>
Options -MultiViews
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]
<Directory /var/www/4get>
Options -MultiViews
AllowOverride All
Require all granted
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]
</Directory>
# deny access to private resources
<Directory /var/www/4get/data/>
Order Deny,allow
@@ -116,6 +139,7 @@ Make sure to replace `4get.ca` with your own domain under the `SSLCertificate*`
ServerAdmin will@lolcat.ca
DocumentRoot /var/www/4get
Protocols h2 http/1.1
SSLEngine On
SSLOptions +StdEnvVars
@@ -128,6 +152,10 @@ Make sure to replace `4get.ca` with your own domain under the `SSLCertificate*`
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/css
<FilesMatch \.php$>
SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/"
</FilesMatch>
SSLCertificateFile /etc/letsencrypt/live/4get.ca/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/4get.ca/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/4get.ca/chain.pem

View File

@@ -9,34 +9,47 @@ Welcome! This guide assumes that you have a working 4get instance. This will hel
4. The captcha font is located in `data/fonts/captcha.ttf`
# Cloudflare bypass (TLS check)
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed.**
>These instructions have been updated to work with Debian 13 Trixie.
Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** & the **Mwmbl** search engines. Please be aware that APT will fight against you and will re-install the openSSL-version of curl constantly when updating.
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed by this program!**
First, follow these instructions. Only install the Firefox modules:
Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** search engine.
https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build
Once you did this, you should be able to run the following inside your terminal:
To come up with this set of instructions, I used [this guide](https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build) as a reference, but trust me you probably want to stick to what's written on this page.
First, compile curl-impersonate (the firefox flavor).
```sh
$ curl_ff117 --version
curl 8.1.1 (x86_64-pc-linux-gnu) libcurl/8.1.1 NSS/3.92 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 nghttp2/1.56.0
Release-Date: 2023-05-23
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe UnixSockets zstd
```
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine (on my debian install) is located at `/usr/local/lib/libcurl-impersonate-ff.so`.
Find the `libcurl.so.4` file used by your current installation of curl. For me, this file is located at `/usr/lib/x86_64-linux-gnu/libcurl.so.4`
Now comes the sketchy part: replace `libcurl.so.4` with `libcurl-impersonate-ff.so`. You can do this in the following way:
```sh
sudo rm /usr/lib/x86_64-linux-gnu/libcurl.so.4
sudo cp /usr/local/lib/libcurl-impersonate-ff.so /usr/lib/x86_64-linux-gnu/libcurl.so.4
git clone https://github.com/lwthiker/curl-impersonate/
cd curl-impersonate
sudo apt install build-essential pkg-config cmake ninja-build curl autoconf automake libtool python3-pip libnss3 libnss3-dev
mkdir build
cd build
../configure
make firefox-build
sudo make firefox-install
sudo ldconfig
```
Make sure to restart your webserver and/or PHP daemon, otherwise it will keep using the old library. You should now be able to bypass Cloudflare's shitty checks!!
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine is located at `/usr/local/lib/libcurl-impersonate-ff.so`. Patch your PHP install so that it loads the right library:
```sh
sudo systemctl edit php8.4-fpm.service
```
^This will open a text editor. Add the following shit in there, in between those 2 comments I pasted for ya just for reference:
```sh
### Editing /etc/systemd/system/php8.4-fpm.service.d/override.conf
### Anything between here and the comment below will become the contents of the>
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libcurl-impersonate-ff.so"
Environment="CURL_IMPERSONATE=firefox117"
### Edits below this comment will be discarded
```
Restart php8.4-fpm. (`sudo service php8.4-fpm restart`). To test things out, try making a search on "Yep", they check for SSL. If you get results (or a timeout, this piece of shit engine is slow as fuck) that means it works!
# Robots.txt
Make sure you configure this right to optimize your search engine presence! Head over to `/robots.txt` and change the 4get.ca domain to your own domain.

View File

@@ -15,7 +15,12 @@ class favicon{
header("Content-Type: image/png");
if(substr_count($url, "/") !== 2){
if(
preg_match(
'/^https?:\/\/[A-Za-z0-9.-]+$/',
$url
) === 0
){
header("X-Error: Only provide the protocol and domain");
$this->defaulticon();

100
lib/anubis.php Normal file
View File

@@ -0,0 +1,100 @@
<?php
//
// Reference
// https://github.com/TecharoHQ/anubis/blob/ecc716940e34ebe7249974f2789a99a2c7115e4e/web/js/proof-of-work.mjs
//
class anubis{
public function __construct(){
include_once "fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function scrape($html){
$this->fuckhtml->load($html);
$script =
$this->fuckhtml
->getElementById(
"anubis_challenge",
"script"
);
if($script === false){
throw new Exception("Failed to scrape anubis challenge data");
}
$script =
json_decode(
$this->fuckhtml
->getTextContent(
$script
),
true
);
if($script === null){
throw new Exception("Failed to decode anubis challenge data");
}
if(
!isset($script["challenge"]) ||
!isset($script["rules"]["difficulty"]) ||
!is_int($script["rules"]["difficulty"]) ||
!is_string($script["challenge"])
){
throw new Exception("Found invalid challenge data");
}
return $this->rape($script["challenge"], $script["rules"]["difficulty"]);
}
private function is_valid_hash($hash, $difficulty){
for ($i=0; $i<$difficulty; $i++) {
$index = (int)floor($i / 2);
$nibble = $i % 2;
$byte = ord($hash[$index]);
$nibble = ($byte >> ($nibble === 0 ? 4 : 0)) & 0x0f;
if($nibble !== 0){
return false;
}
}
return true;
}
public function rape($data, $difficulty = 5){
$nonce = 0;
while(true){
$hash_binary = hash("sha256", $data . $nonce, true);
if($this->is_valid_hash($hash_binary, $difficulty)){
$hash_hex = bin2hex($hash_binary);
return [
"response" => $hash_hex,
//"data" => $data,
//"difficulty" => $difficulty,
"nonce" => $nonce
];
}
$nonce++;
}
}
}

View File

@@ -75,6 +75,7 @@ class backend{
break;
case "socks5_hostname":
case "socks5h":
case "socks5a":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);

View File

@@ -403,27 +403,28 @@ class frontend{
$text =
trim(
preg_replace(
'/<\/span>$/',
"", // remove stray ending span because of the <?php stuff
'/<code [^>]+>/',
"",
str_replace(
[
'<br />',
'&nbsp;'
"<br />",
"&nbsp;",
"<pre>",
"</pre>",
"</code>"
],
[
"\n", // replace <br> with newlines
" " // replace html entity to space
],
str_replace(
[
// leading <?php garbage
"<span style=\"color: c-default\">\n&lt;?php&nbsp;",
"<code>",
"</code>"
],
"\n",
" ",
"",
highlight_string("<?php " . $text, true)
)
"",
""
],
explode(
"&lt;?php",
highlight_string("<?php " . $text, true),
2
)[1]
)
)
);
@@ -838,10 +839,10 @@ class frontend{
}
$payload .=
'<a href="https://webcache.googleusercontent.com/search?q=cache:' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://google.com" alt="go">Google cache</a>' .
'<a href="https://web.archive.org/web/' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://archive.org" alt="ar">Archive.org</a>' .
'<a href="https://archive.ph/newest/' . htmlspecialchars($link) . '" class="list" target="_BLANK"><img src="/favicon?s=https://archive.is" alt="ar">Archive.is</a>' .
'<a href="https://ghostarchive.org/search?term=' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://ghostarchive.org" alt="gh">Ghostarchive</a>' .
'<a href="https://arquivo.pt/wayback/' . htmlspecialchars($link) . '" class="list" target="_BLANK"><img src="/favicon?s=https://arquivo.pt" alt="ar">Arquivo.pt</a>' .
'<a href="https://www.bing.com/search?q=url%3A' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://bing.com" alt="bi">Bing cache</a>' .
'<a href="https://megalodon.jp/?url=' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://megalodon.jp" alt="me">Megalodon</a>' .
'</div>';
@@ -939,6 +940,8 @@ class frontend{
"brave" => "Brave",
"yandex" => "Yandex",
"google" => "Google",
//"google_api" => "Google API",
"google_cse" => "Google CSE",
"startpage" => "Startpage",
"qwant" => "Qwant",
"ghostery" => "Ghostery",
@@ -947,7 +950,9 @@ class frontend{
"crowdview" => "Crowdview",
"mwmbl" => "Mwmbl",
"mojeek" => "Mojeek",
"solofield" => "Solofield",
"baidu" => "Baidu",
"coccoc" => "Cốc Cốc",
//"solofield" => "Solofield",
"marginalia" => "Marginalia",
"wiby" => "wiby",
"curlie" => "Curlie"
@@ -963,13 +968,20 @@ class frontend{
"yandex" => "Yandex",
"brave" => "Brave",
"google" => "Google",
"google_cse" => "Google CSE",
"startpage" => "Startpage",
"qwant" => "Qwant",
"yep" => "Yep",
"solofield" => "Solofield",
//"pinterest" => "Pinterest",
"baidu" => "Baidu",
//"solofield" => "Solofield",
"pinterest" => "Pinterest",
"cara" => "Cara",
"flickr" => "Flickr",
"fivehpx" => "500px",
"vsco" => "VSCO",
"imgur" => "Imgur",
"ftm" => "FindThatMeme"
"ftm" => "FindThatMeme",
//"sankakucomplex" => "SankakuComplex"
]
];
break;
@@ -979,6 +991,9 @@ class frontend{
"display" => "Scraper",
"option" => [
"yt" => "YouTube",
"vimeo" => "Vimeo",
//"odysee" => "Odysee",
"sepiasearch" => "Sepia Search",
//"fb" => "Facebook videos",
"ddg" => "DuckDuckGo",
"brave" => "Brave",
@@ -986,7 +1001,9 @@ class frontend{
"google" => "Google",
"startpage" => "Startpage",
"qwant" => "Qwant",
"solofield" => "Solofield"
"baidu" => "Baidu",
"coccoc" => "Cốc Cốc"
//"solofield" => "Solofield"
]
];
break;
@@ -1001,7 +1018,8 @@ class frontend{
"startpage" => "Startpage",
"qwant" => "Qwant",
"yep" => "Yep",
"mojeek" => "Mojeek"
"mojeek" => "Mojeek",
"baidu" => "Baidu"
]
];
break;

View File

@@ -240,12 +240,13 @@ class fuckhtml{
public function getElementsByFuzzyAttributeValue(string $name, string $value, $collection = null){
$elems = $this->getElementsByAttributeName($name, $collection);
$value =
explode(
" ",
trim(
preg_replace(
'/ +/',
'/\s+/',
" ",
$value
)
@@ -258,7 +259,18 @@ class fuckhtml{
foreach($elem["attributes"] as $attrib_name => $attrib_value){
$attrib_value = explode(" ", $attrib_value);
$attrib_value =
explode(
" ",
trim(
preg_replace(
'/\s+/',
" ",
$attrib_value
)
)
);
$ac = count($attrib_value);
$nc = count($value);
$cr = 0;
@@ -381,6 +393,8 @@ class fuckhtml{
$json_out = null;
$last_char = null;
$keyword_check = null;
for($i=0; $i<strlen($json); $i++){
switch($json[$i]){
@@ -396,6 +410,7 @@ class fuckhtml{
$bracket = false;
$is_close_bracket = true;
}else{
if($bracket === false){
@@ -429,6 +444,31 @@ class fuckhtml{
$is_close_bracket === false
){
// do keyword check
$keyword_check .= $json[$i];
if(in_array($json[$i], [":", "{"])){
$keyword_check = substr($keyword_check, 0, -1);
if(
preg_match(
'/function|array|return/i',
$keyword_check
)
){
$json_out =
preg_replace(
'/[{"]*' . preg_quote($keyword_check, "/") . '$/',
"",
$json_out
);
}
$keyword_check = null;
}
// here we know we're not iterating over a quoted string
switch($json[$i]){
@@ -498,4 +538,85 @@ class fuckhtml{
$string
);
}
public function extract_json($json){
$len = strlen($json);
$array_level = 0;
$object_level = 0;
$in_quote = null;
$start = null;
for($i=0; $i<$len; $i++){
switch($json[$i]){
case "[":
if($in_quote === null){
$array_level++;
if($start === null){
$start = $i;
}
}
break;
case "]":
if($in_quote === null){
$array_level--;
}
break;
case "{":
if($in_quote === null){
$object_level++;
if($start === null){
$start = $i;
}
}
break;
case "}":
if($in_quote === null){
$object_level--;
}
break;
case "\"":
case "'":
if(
$i !== 0 &&
$json[$i - 1] !== "\\"
){
// found a non-escaped quote
if($in_quote === null){
// open quote
$in_quote = $json[$i];
}elseif($in_quote === $json[$i]){
// close quote
$in_quote = null;
}
}
break;
}
if(
$start !== null &&
$array_level === 0 &&
$object_level === 0
){
return substr($json, $start, $i - $start + 1);
break;
}
}
}
}

View File

@@ -34,22 +34,46 @@ try{
)
){
if(
!isset($image["query"]) ||
!isset($image["path"]) ||
$image["path"] != "/th"
){
if(!isset($image["path"])){
header("X-Error: Invalid bing image path");
header("X-Error: Missing bing image path");
$proxy->do404();
die();
}
parse_str($image["query"], $str);
if(!isset($str["id"])){
//
// get image ID
// formations:
// https://tse2.mm.bing.net/th/id/OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
// https://tse2.mm.bing.net/th?id=OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
//
$id = null;
if(isset($image["query"])){
header("X-Error: Missing bing ID");
parse_str($image["query"], $str);
if(isset($str["id"])){
$id = $str["id"];
}
}
if($id === null){
$id = explode("/th/id/", $image["path"], 2);
if(count($id) !== 2){
// malformed
return $url;
}
$id = $id[1];
}
if(is_array($id)){
header("X-Error: Missing bing id parameter");
$proxy->do404();
die();
}
@@ -63,7 +87,7 @@ try{
case "cover": $req = "&w=207&h=270&p=0&qlt=90"; break;
}
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . urlencode($str["id"]) . $req, "https://www.bing.com");
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . rawurlencode($id) . $req, "https://www.bing.com");
die();
}

2229
scraper/baidu.php Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -210,6 +210,63 @@ class brave{
return $data;
}
private function get_js(){
$script_disc =
$this->fuckhtml
->getElementsByTagName(
"script"
);
$data = null;
foreach($script_disc as &$discs){
if(
preg_match(
'/kit\.start\(/',
$discs["innerHTML"]
)
){
$data =
explode(
"data:",
$discs["innerHTML"],
2
);
if(count($data) !== 2){
throw new Exception("Failed to split up data field");
}
$data = $data[1];
break;
}
}
if($data === null){
throw new Exception("Could not grep JavaScript object");
}
$data =
$this->fuckhtml
->parseJsObject(
$this->fuckhtml
->extract_json(
$data
)
);
if($data === null){
throw new Exception("Failed to decode JavaScript object");
}
return $data;
}
public function web($get){
if($get["npt"]){
@@ -293,8 +350,8 @@ class brave{
/*
$handle = fopen("scraper/brave.html", "r");
$html = fread($handle, filesize("scraper/brave.html"));
fclose($handle);
*/
fclose($handle);*/
try{
$html =
@@ -346,7 +403,7 @@ class brave{
$nextpage =
$this->fuckhtml
->getElementsByClassName("btn", "a");
->getElementsByClassName("button", "a");
if(count($nextpage) !== 0){
@@ -382,45 +439,9 @@ class brave{
}
}
// do some magic
$this->fuckhtml->load($html);
$script_disc =
$this->fuckhtml
->getElementsByTagName(
"script"
);
$grep = [];
foreach($script_disc as $discs){
preg_match(
'/const data ?= ?(\[{.*}]);/',
$discs["innerHTML"],
$grep
);
if(isset($grep[1])){
break;
}
}
if(!isset($grep[1])){
throw new Exception("Could not grep JavaScript object");
}
$data =
$this->fuckhtml
->parseJsObject(
$grep[1]
);
unset($grep);
if($data === null){
throw new Exception("Failed to decode JavaScript object");
}
$data = $this->get_js();
if(
isset($data[2]["data"]["title"]) &&
@@ -663,7 +684,10 @@ class brave{
$table["Address"] = $result["location"]["postal_address"]["displayAddress"];
}
if(isset($result["location"]["rating"])){
if(
isset($result["location"]["rating"]) &&
$result["location"]["rating"] != "void 0"
){
$table["Rating"] =
$result["location"]["rating"]["ratingValue"] . "/" .
@@ -671,13 +695,19 @@ class brave{
number_format($result["location"]["rating"]["reviewCount"]) . " votes)";
}
if(isset($result["location"]["contact"]["telephone"])){
if(
isset($result["location"]["contact"]["telephone"]) &&
$result["location"]["contact"]["telephone"] != "void 0"
){
$table["Phone number"] =
$result["location"]["contact"]["telephone"];
}
if(isset($result["location"]["price_range"])){
if(
isset($result["location"]["price_range"]) &&
$result["location"]["price_range"] != "void 0"
){
$table["Price"] =
$result["location"]["price_range"];
@@ -1160,23 +1190,8 @@ class brave{
$proxy
);
preg_match(
'/const data ?= ?(\[{.*}]);/',
$html,
$json
);
if(!isset($json[1])){
throw new Exception("Failed to grep javascript object");
}
$json = $this->fuckhtml->parseJsObject($json[1], true);
if($json === null){
throw new Exception("Failed to parse javascript object");
}
$this->fuckhtml->load($html);
$json = $this->get_js();
foreach(
$json[1]["data"]["body"]["response"]["news"]["results"]
@@ -1258,22 +1273,8 @@ class brave{
$html = fread($handle, filesize("scraper/brave-image.html"));
fclose($handle);*/
preg_match(
'/const data = (\[{.*}\]);/',
$html,
$json
);
if(!isset($json[1])){
throw new Exception("Failed to get data object");
}
$json =
$this->fuckhtml
->parseJsObject(
$json[1]
);
$this->fuckhtml->load($html);
$json = $this->get_js();
foreach(
$json[1]
@@ -1403,22 +1404,8 @@ class brave{
$html = fread($handle, filesize("scraper/brave-video.html"));
fclose($handle);*/
preg_match(
'/const data = (\[{.*}\]);/',
$html,
$json
);
if(!isset($json[1])){
throw new Exception("Failed to get data object");
}
$json =
$this->fuckhtml
->parseJsObject(
$json[1]
);
$this->fuckhtml->load($html);
$json = $this->get_js();
foreach(
$json
@@ -1790,42 +1777,57 @@ class brave{
$nextpage =
$this->fuckhtml
->getElementsByClassName("btn", "a");
->getElementById(
"pagination",
"div"
);
if(count($nextpage) !== 0){
if($nextpage){
$this->fuckhtml->load($nextpage);
$nextpage =
$nextpage[count($nextpage) - 1];
if(
strtolower(
$this->fuckhtml
->getTextContent(
$nextpage
)
) == "next"
){
preg_match(
'/offset=([0-9]+)/',
$this->fuckhtml->getTextContent($nextpage["attributes"]["href"]),
$nextpage
$this->fuckhtml
->getElementsByClassName(
"button",
"a"
);
if(count($nextpage) !== 0){
return
$this->backend->store(
json_encode(
[
"q" => $q,
"offset" => (int)$nextpage[1],
"nsfw" => $nsfw,
"country" => $country,
"spellcheck" => $spellcheck
]
),
$page,
$proxy
$nextpage =
$nextpage[count($nextpage) - 1];
if(
strtolower(
$this->fuckhtml
->getTextContent(
$nextpage
)
) == "next"
){
preg_match(
'/offset=([0-9]+)/',
$this->fuckhtml->getTextContent($nextpage["attributes"]["href"]),
$nextpage
);
return
$this->backend->store(
json_encode(
[
"q" => $q,
"offset" => (int)$nextpage[1],
"nsfw" => $nsfw,
"country" => $country,
"spellcheck" => $spellcheck
]
),
$page,
$proxy
);
}
}
}

847
scraper/cara.php Normal file
View File

@@ -0,0 +1,847 @@
<?php
class cara{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("cara");
}
public function getfilters($page){
return [
"sort" => [
"display" => "Sort by",
"option" => [
"Top" => "Top",
"MostRecent" => "Most Recent"
]
],
"type" => [
"display" => "Post type",
"option" => [
"any" => "Any type",
"portfolio" => "Portfolio", // {"posts":["portfolio"]}
"timeline" => "Timeline" // {"posts":["timeline"]}
]
],
"fields" => [
"display" => "Field/Medium",
"option" => [
"any" => "Any field",
"2D" => "2D Work",
"3D" => "3D Work",
"3DPrinting" => "3D Printing",
"Acrylic" => "Acrylic",
"AlcoholMarkers" => "Alcohol Markers",
"Animation" => "Animation",
"Chalk" => "Chalk",
"Charcoal" => "Charcoal",
"Colored pencil" => "Colored pencil",
"Conte" => "Conte",
"Crayon" => "Crayon",
"Digital" => "Digital",
"Gouache" => "Gouache",
"Ink" => "Ink",
"MixedMedia" => "Mixed-Media",
"Oil" => "Oil",
"Oil-based Markers" => "Oil-based Markers",
"Other" => "Other",
"Pastels" => "Pastels",
"Photography" => "Photography",
"Sculpture" => "Sculpture",
"Sketches" => "Sketches",
"Tattoos" => "Tattoos",
"Traditional" => "Traditional",
"VFX" => "VFX",
"Watercolor" => "Watercolor"
]
],
"category" => [
"display" => "Category",
"option" => [
"any" => "Any category",
"3DScanning" => "3D Scanning",
"Abstract" => "Abstract",
"Adoptable" => "Adoptable",
"Anatomy" => "Anatomy",
"Animals" => "Animals",
"Anime" => "Anime",
"App" => "App",
"ArchitecturalConcepts" => "Architectural Concepts",
"ArchitecturalVisualization" => "Architectural Visualization",
"AugmentedReality" => "Augmented Reality",
"Automotive" => "Automotive",
"BoardGameArt" => "Board Game Art",
"BookIllustration" => "Book Illustration",
"CardGameArt" => "Card Game Art",
"CeramicsPottery" => "Ceramics/Pottery",
"CharacterAnimation" => "Character Animation",
"CharacterDesign" => "Character Design",
"CharacterModeling" => "Character Modeling",
"ChildrensArt" => "Children's Illustration",
"Collectibles" => "Collectibles",
"ColoringPage" => "Coloring Page",
"ComicArt" => "Comic Art",
"ConceptArt" => "Concept Art",
"Cosplay" => "Cosplay",
"CostumeDesign" => "Costume Design",
"CoverArt" => "Cover Art",
"Creatures" => "Creatures",
"Diorama" => "Diorama",
"EditorialIllustration" => "Editorial Illustration",
"EmbroiderySewing" => "Embroidery/Sewing",
"EnvironmentalConceptArt" => "Environmental Concept Art",
"EnvironmentalConceptDesign" => "Environmental Concept Design",
"FanArt" => "Fan Art",
"Fantasy" => "Fantasy",
"Fashion" => "Fashion",
"FashionStyling" => "Fashion Styling",
"FiberArts" => "Fiber Arts",
"Furry" => "Furry",
"GameArt" => "Game Art",
"GameplayDesign" => "Gameplay Design",
"GamesEnvironmentArt" => "Games Environment Art",
"Gem" => "Gem",
"GraphicDesign" => "Graphic Design",
"Handicraft" => "Handicraft",
"HairStyling" => "Hair Styling",
"HardSurface" => "Hard Surface",
"Horror" => "Horror",
"Illustration" => "Illustration",
"IllustrationVisualization" => "Illustration Visualization",
"IndustrialDesign" => "Industrial Design",
"Jewelry" => "Jewelry",
"KnittingCrochet" => "Knitting/Crochet",
"Landscape" => "Landscape",
"LevelDesign" => "Level Design",
"Lighting" => "Lighting",
"Makeup" => "Makeup",
"Manga" => "Manga",
"MapsCartography" => "Maps/Cartography",
"MattePainting" => "Matte Painting",
"Materials" => "Materials",
"MechanicalDesign" => "Mechanical Design",
"Medical" => "Medical",
"Mecha" => "Mecha",
"MiniatureArt" => "Miniature Art",
"MotionGraphics" => "Motion Graphics",
"FrescoMurals" => "Fresco/Murals",
"Natural" => "Natural",
"Original Character" => "Original Character",
"Overlay" => "Overlay",
"PleinAir" => "Plein Air",
"Photogrammetry" => "Photogrammetry",
"PixelArt" => "Pixel Art",
"Portraits" => "Portraits",
"Props" => "Props",
"ProductDesign" => "Product Design",
"PublicDomain" => "Public Domain or Royalty Free",
"Real-Time3DEnvironmentArt" => "Real-Time 3D Environment Art",
"Realism" => "Realism",
"ScienceFiction" => "Science Fiction",
"ScientificVisualization" => "Scientific Visualization",
"Scripts" => "Scripts",
"StillLife" => "Still Life",
"Storyboards" => "Storyboards",
"Stylized" => "Stylized",
"Surreal" => "Surreal",
"TechnicalArt" => "Technical Art",
"Textures" => "Textures",
"Tools" => "Tools",
"Toys" => "Toys",
"ToyPackaging" => "Toy Packaging",
"Tutorials" => "Tutorials",
"UIArt" => "User Interface (UI) Art",
"UrbanSketch" => "Urban Sketch",
"VFXforAnimation" => "VFX for Animation",
"VFXforFilm" => "VFX for Film",
"VFXforGames" => "VFX for Games",
"VFXforRealTime" => "VFX for Real-Time",
"VFXforTV" => "VFX for TV",
"Vehicles" => "Vehicles",
"VirtualReality" => "Virtual Reality",
"VisualDevelopment" => "Visual Development",
"VoxelArt" => "Voxel Art",
"Vtubers" => "Vtubers",
"WIP" => "WIP (Work in Progress)",
"Web" => "Web",
"Weapons" => "Weapons",
"Wildlife" => "Wildlife",
"Woodcutting" => "Woodcutting"
]
],
"software" => [
"display" => "Software",
"option" => [
"any" => "Any software",
"123D" => "123D",
"123DCatch" => "123D Catch",
"3DBee" => "3DBee",
"3DCoat" => "3DCoat",
"3DCoatPrint" => "3DCoatPrint",
"3DCoatTextura" => "3DCoatTextura",
"3DEqualizer" => "3DEqualizer",
"3DFZephyr" => "3DF Zephyr",
"3Delight" => "3Delight",
"3dpeople" => "3dpeople",
"3dsMax" => "3ds Max",
"3DSPaint" => "3DS Paint",
"ACDSeeCanvas" => "ACDSee Canvas",
"AbletonLive" => "Ableton Live",
"Acrobat" => "Acrobat",
"AdobeDraw" => "Adobe Draw",
"AdobeFlash" => "Adobe Flash",
"AdobeFresco" => "Adobe Fresco",
"AdobeSubstance3Dassets" => "Adobe Substance 3D assets",
"AdobeXD" => "Adobe XD",
"AffinityDesigner" => "Affinity Designer",
"AffinityPhoto" => "Affinity Photo",
"AfterEffects" => "After Effects",
"Akeytsu" => "Akeytsu",
"Alchemy" => "Alchemy",
"AliasDesign" => "Alias Design",
"AlightMotion" => "Alight Motion",
"Amadine" => "Amadine",
"Amberlight" => "Amberlight",
"Animate" => "Animate",
"AnimationMaster" => "Animation:Master",
"AnimeStudio" => "Anime Studio",
"Apophysis" => "Apophysis",
"ArchiCAD" => "ArchiCAD",
"Arion" => "Arion",
"ArionFX" => "ArionFX",
"Arnold" => "Arnold",
"ArtEngine" => "ArtEngine",
"ArtFlow" => "ArtFlow",
"ArtRage" => "ArtRage",
"ArtstudioPro" => "Artstudio Pro",
"Artweaver" => "Artweaver",
"Aseprite" => "Aseprite",
"Audition" => "Audition",
"AutoCAD" => "AutoCAD",
"AutodeskSketchBook" => "Autodesk SketchBook",
"AvidMediaComposer" => "Avid Media Composer",
"AzPainter" => "AzPainter",
"babylonjs" => "babylon.js",
"BalsamiqMockup" => "Balsamiq Mockup",
"Bforartists" => "Bforartists",
"BlackInk" => "Black Ink",
"BlackmagicDesignFusion" => "Blackmagic Design Fusion",
"Blender" => "Blender",
"Blender DeepPaint" => "Blender DeepPaint",
"BlenderGreasePencil" => "Blender Grease Pencil",
"Blockbench" => "Blockbench",
"BodyPaint" => "BodyPaint",
"Boxcutter" => "Boxcutter",
"BraidMaker" => "Braid Maker",
"BrickLinkStudio" => "BrickLink Studio",
"Bridge" => "Bridge",
"Brushifyio" => "Brushify.io",
"C" => "C",
"C#" => "C#",
"C++" => "C++",
"CACANi" => "CACANi",
"CLIPSTUDIOPAINT" => "CLIP STUDIO PAINT",
"CLO" => "CLO",
"CRYENGINE" => "CRYENGINE",
"Callipeg" => "Callipeg",
"Canva" => "Canva",
"CaptureOne" => "Capture One",
"CartoonAnimator" => "Cartoon Animator",
"Carveco" => "Carveco",
"Cavalry" => "Cavalry",
"Chaotica" => "Chaotica",
"CharacterAnimator" => "Character Animator",
"CharacterCreator" => "Character Creator",
"Cinema4D" => "Cinema 4D",
"ClarisseiFX" => "Clarisse iFX",
"Coiffure" => "Coiffure",
"ColorsLive" => "Colors Live",
"Combustion" => "Combustion",
"Construct2" => "Construct 2",
"Core" => "Core",
"CorelPainter" => "Corel Painter",
"CorelDRAWGraphicsSuite" => "CorelDRAW Graphics Suite",
"CoronaRenderer" => "Corona Renderer",
"ProMotionNG" => "Cosmigo Pro Motion NG",
"CrazyBump" => "CrazyBump",
"Crocotile3D" => "Crocotile 3D",
"Curvy3D" => "Curvy 3D",
"Cycles4D" => "Cycles 4D",
"Darkroom" => "Darkroom",
"DAZStudio" => "DAZ Studio",
"DDO" => "DDO",
"DECIMA" => "DECIMA",
"Darktable" => "Darktable",
"DaVinciResolve" => "DaVinci Resolve",
"Dimension" => "Dimension",
"DragonBones" => "DragonBones",
"Dragonframe" => "Dragonframe",
"Drawpile" => "Drawpile",
"Dreams" => "Dreams",
"Dreamweaver" => "Dreamweaver",
"DxOPhotoLab" => "DxO PhotoLab",
"ECycles" => "E-Cycles",
"EmberGen" => "EmberGen",
"Encore" => "Encore",
"Expresii" => "Expresii",
"FStorm" => "FStorm",
"FadeIn" => "FadeIn",
"Feather3D" => "Feather 3D",
"FiberShop" => "FiberShop",
"Figma" => "Figma",
"FilmoraWondershare" => "Filmora Wondershare",
"FilterForge" => "Filter Forge",
"FinalCutPro" => "Final Cut Pro",
"FinalDraft" => "Final Draft",
"finalRender" => "finalRender",
"FireAlpaca" => "FireAlpaca",
"Fireworks" => "Fireworks",
"FlamePainter" => "Flame Painter",
"Flash" => "Flash",
"FlipaClip" => "FlipaClip",
"FlipnoteStudio" => "Flipnote Studio",
"Fluent" => "Fluent",
"ForestPack" => "Forest Pack",
"FormZ" => "Form-Z",
"Fractorium" => "Fractorium",
"FreeCAD" => "FreeCAD",
"FreeHand" => "FreeHand",
"Forger" => "Forger",
"FrostbiteEngine" => "Frostbite Engine",
"fSpy" => "fSpy",
"FumeFX" => "FumeFX",
"Fusion360" => "Fusion 360",
"GIMP" => "GIMP",
"GSCurveTools" => "GS CurveTools",
"GSToolbox" => "GS Toolbox",
"Gaea" => "Gaea",
"GameTextures" => "Game Textures",
"GameMakerStudio" => "GameMaker: Studio",
"GarageFarmNET" => "GarageFarm.NET",
"GeoGlyph" => "GeoGlyph",
"GigapixelAl" => "Gigapixel Al",
"Glaxnimate" => "Glaxnimate",
"GnomePaint" => "Gnome Paint",
"Godot" => "Godot",
"Goxel" => "Goxel",
"Graphite" => "Graphite",
"Graswald" => "Graswald",
"GravitySketch" => "Gravity Sketch",
"GuerillaRender" => "GuerillaRender",
"HDRLightStudio" => "HDR Light Studio",
"HairStrandDesigner" => "Hair Strand Designer",
"HairTGHairFur" => "HairTG - Hair &amp; Fur",
"HairTGSurfaceFeatherEdition" => "HairTG - Surface, Feather Edition",
"HairTGSurfaceHairEdition" => "HairTG - Surface, Hair Edition",
"Handplane" => "Handplane",
"Hansoft" => "Hansoft",
"HardOps" => "Hard Ops",
"HardMesh" => "HardMesh",
"Harmony" => "Harmony",
"HeavypaintWebbypaint" => "Heavypaint/Webbypaint",
"HelloPaint" => "HelloPaint",
"HeliconFocus" => "Helicon Focus",
"Hexels" => "Hexels",
"HiPaint" => "HiPaint",
"Houdini" => "Houdini",
"HydraRenderer" => "Hydra Renderer",
"iArtbook" => "iArtbook",
"IbisPaint" => "ibisPaint",
"Ideas" => "Ideas",
"IllustStudio" => "Illust Studio",
"Illustrator" => "Illustrator",
"IllustratorDraw" => "Illustrator Draw",
"InDesign" => "InDesign",
"Inochi2D" => "Inochi2D",
"InVision" => "InVision",
"InVisionCraft" => "InVision Craft",
"InfinitePainter" => "Infinite Painter",
"Inkscape" => "Inkscape",
"Inspirit" => "Inspirit",
"InstaLOD" => "InstaLOD",
"InstaMAT" => "InstaMAT",
"InstantLightRealtimePBR" => "Instant Light Realtime PBR",
"InstantMeshes" => "Instant Meshes",
"InstantTerra" => "Instant Terra",
"Inventor" => "Inventor",
"Iray" => "Iray",
"JWildfire" => "JWildfire",
"Java" => "Java",
"Jira" => "Jira",
"JumpPaint" => "Jump Paint by MediBang",
"JSPaint" => "JS Paint",
"Katana" => "Katana",
"Keyshot" => "Keyshot",
"KidPix" => "Kid Pix",
"KitBash3D" => "KitBash3D",
"Knald" => "Knald",
"Kodon" => "Kodon",
"KolourPaint" => "KolourPaint",
"Krakatoa" => "Krakatoa",
"KRESKA" => "KRESKA",
"Krita" => "Krita",
"LensStudio" => "Lens Studio",
"LibreSprite" => "LibreSprite",
"LightWave3D" => "LightWave 3D",
"Lightroom" => "Lightroom",
"Linearity" => "Linearity",
"LiquiGen" => "LiquiGen",
"Live2DCubism" => "Live2D Cubism",
"LookatmyHair" => "Look at my Hair",
"Lotpixel" => "Lotpixel",
"Lumion" => "Lumion",
"LuxRender" => "LuxRender",
"MacPaint" => "MacPaint",
"MagicaCSG" => "MagicaCSG",
"MagicaVoxel" => "MagicaVoxel",
"Magma" => "Magma",
"MakeHuman" => "MakeHuman",
"Malmal" => "Malmal",
"Mandelbulb3D" => "Mandelbulb 3D",
"Mandelbulber" => "Mandelbulber",
"MangaStudio" => "Manga Studio",
"Mari" => "Mari",
"MarmosetToolbag" => "Marmoset Toolbag",
"MarvelousDesigner" => "Marvelous Designer",
"MasterpieceStudioPro" => "Masterpiece Studio Pro",
"MasterpieceVR" => "MasterpieceVR",
"Maverick" => "Maverick",
"MaxwellRender" => "Maxwell Render",
"Maya" => "Maya",
"MediBangPaint" => "MediBang Paint",
"MediumbyAdobe" => "Medium by Adobe",
"Megascans" => "Megascans",
"mentalray" => "mental ray",
"MeshLab" => "MeshLab",
"Meshroom" => "Meshroom",
"MetaHumanCreator" => "MetaHuman Creator",
"Metashape" => "Metashape",
"MightyBake" => "MightyBake",
"MikuMikuDance" => "MikuMikuDance",
"Minecraft" => "Minecraft",
"Mischief" => "Mischief",
"Mixamo" => "Mixamo",
"Mixer" => "Mixer",
"MoI3D" => "MoI3D",
"Mocha" => "Mocha",
"Modo" => "Modo",
"Moho" => "Moho",
"MotionBuilder" => "MotionBuilder",
"Mudbox" => "Mudbox",
"Muse" => "Muse",
"MSPaint" => "MS Paint",
"MyPaint" => "MyPaint",
"NDO" => "NDO",
"NX" => "NX",
"NdotCAD" => "NdotCAD",
"NintendoNotes" => "Nintendo Notes",
"NomadSculpt" => "Nomad Sculpt",
"Notability" => "Notability",
"Nuke" => "Nuke",
"Nvil" => "Nvil",
"OctaneRender" => "Octane Render",
"Omniverse" => "Omniverse",
"OmniverseCreate" => "Omniverse Create",
"ON1PhotoRAW" => "ON1 Photo RAW",
"Open3DEngine" => "Open 3D Engine",
"OpenCanvas" => "OpenCanvas",
"OpenGL" => "OpenGL",
"OpenToonz" => "OpenToonz",
"Ornatrix" => "Ornatrix",
"OsciRender" => "Osci-Render",
"OurPaint" => "Our Paint",
"PBRMAX" => "PBRMAX",
"PFTrack" => "PFTrack",
"PTGui" => "PTGui",
"Paintbrush" => "Paintbrush",
"PaintNET" => "Paint.NET",
"PaintShopPro" => "PaintShop Pro",
"PaintToolSAI" => "Paint Tool SAI",
"PaintstormStudio" => "Paintstorm Studio",
"Paper" => "Paper",
"Pencil2D" => "Pencil2D",
"Penpot" => "Penpot",
"PhoenixFD" => "Phoenix FD",
"Phonto" => "Phonto",
"PhotoLab2" => "PhotoLab 2",
"Photopea" => "Photopea",
"Photoscan" => "Photoscan",
"Photoshop" => "Photoshop",
"PhotoshopElements" => "Photoshop Elements",
"PicoCAD" => "picoCAD",
"PicoCAD2" => "picoCAD 2",
"Pinta" => "Pinta",
"Piskel" => "Piskel",
"Pixilart" => "Pixilart",
"Pixelitor" => "Pixelitor",
"Pixelmator" => "Pixelmator",
"Pixelorama" => "Pixelorama",
"PixivSketch" => "pixiv Sketch",
"Pixquare" => "Pixquare",
"PlantCatalog" => "PlantCatalog",
"PlantFactory" => "PlantFactory",
"Plasticity" => "Plasticity",
"PNGtuberPlus" => "PNGtuber Plus",
"Poliigon" => "Poliigon",
"Polybrush" => "Polybrush",
"PopcornFx" => "PopcornFx",
"Poser" => "Poser",
"Premiere" => "Premiere",
"PremiereElements" => "Premiere Elements",
"PresagisCreator" => "Presagis Creator",
"ProTools" => "Pro Tools",
"Procreate" => "Procreate",
"ProcreateDreams" => "Procreate Dreams",
"Producer" => "Producer",
"PrometheanAI" => "Promethean AI",
"PureRef" => "PureRef",
"Python" => "Python",
"PyxelEdit" => "PyxelEdit",
"QuadRemesher" => "Quad Remesher",
"QuarkXPress" => "QuarkXPress",
"Qubicle" => "Qubicle",
"Quill" => "Quill",
"QuixelBridge" => "Quixel Bridge",
"QuixelMegascans" => "Quixel Megascans",
"QuixelMixer" => "Quixel Mixer",
"QuixelSuite" => "Quixel Suite",
"R3DSWrap" => "R3DS Wrap",
"R3DSZWRAP" => "R3DS ZWRAP",
"RDTextures" => "RD-Textures",
"RailClone" => "RailClone",
"RealFlow" => "RealFlow",
"RealisticPaintStudio" => "Realistic Paint Studio",
"RealityCapture" => "RealityCapture",
"RealityScan" => "RealityScan",
"RealtimeBoard" => "Realtime Board",
"Rebelle" => "Rebelle",
"Redshift" => "Redshift",
"RenderMan" => "RenderMan",
"RenderNetwork" => "Render Network",
"Revit" => "Revit",
"Rhino" => "Rhino",
"Rhinoceros" => "Rhinoceros",
"RizomUV" => "RizomUV",
"RoughAnimator" => "Rough Animator",
"SamsungNotes" => "Samsung Notes",
"SamsungPENUP" => "Samsung PENUP",
"ScansLibrary" => "ScansLibrary",
"Scrivener" => "Scrivener",
"Sculpt+" => "Sculpt+",
"Sculptris" => "Sculptris",
"ShaveandaHaircut" => "Shave and a Haircut",
"ShiVa3D" => "ShiVa3D",
"Shotgun" => "Shotgun",
"Silo" => "Silo",
"Silugen" => "Silugen",
"Sketch" => "Sketch",
"SketchApp" => "Sketch App",
"SketchBookPro" => "SketchBook Pro",
"SketchClub" => "SketchClub",
"SketchUp" => "SketchUp",
"Sketchable" => "Sketchable",
"Sketchfab" => "Sketchfab",
"Skyshop" => "Skyshop",
"Snapseed" => "Snapseed",
"Snowdrop" => "Snowdrop",
"Softimage" => "Softimage",
"SolidWorks" => "SolidWorks",
"SonySketch" => "Sony Sketch",
"Soundbooth" => "Soundbooth",
"Source2" => "Source 2",
"SourceControl" => "Source Control",
"SourceFilmmaker" => "Source Filmmaker",
"SpeedTree" => "SpeedTree",
"Speedgrade" => "Speedgrade",
"SpeedyPainter" => "SpeedyPainter",
"Spine2D" => "Spine 2D",
"Spriter" => "Spriter",
"Stingray" => "Stingray",
"Storyboarder" => "Storyboarder",
"StoryboardPro" => "Storyboard Pro",
"SublimeText" => "Sublime Text",
"Substance3DDesigner" => "Substance 3D Designer",
"Substance3DModeler" => "Substance 3D Modeler",
"Substance3DPainter" => "Substance 3D Painter",
"Substance3DSampler" => "Substance 3D Sampler",
"Substance3DStager" => "Substance 3D Stager",
"SubstanceB2M" => "Substance B2M",
"SweetHome3D" => "Sweet Home 3D",
"SynthEyes" => "SynthEyes",
"TTools" => "TTools",
"TVPaint" => "TVPaint",
"TVPaintAnimation" => "TVPaint Animation",
"TayasuiSketches" => "Tayasui Sketches",
"TayasuiSketchesMobileApp" => "Tayasui Sketches Mobile App",
"TayasuiSketchesPro" => "Tayasui Sketches Pro",
"Terragen" => "Terragen",
"Texturescom" => "Textures.com",
"Texturingxyz" => "Texturingxyz",
"TeyaConceptor" => "Teya Conceptor",
"TheGrove3D" => "The Grove 3D",
"TheaRender" => "Thea Render",
"Threejs" => "Three.js",
"Tiled" => "Tiled",
"TiltBrush" => "Tilt Brush",
"Tooll3" => "Tooll3",
"ToonBoomHarmony" => "Toon Boom Harmony",
"ToonBoomStudio" => "Toon Boom Studio",
"ToonSquid" => "ToonSquid",
"TopoGun" => "TopoGun",
"TuxPaint" => "Tux Paint",
"Tvori" => "Tvori",
"Twinmotion" => "Twinmotion",
"UNIGINEEngine" => "UNIGINE Engine",
"UVLayout" => "UVLayout",
"UltraFractal" => "Ultra Fractal",
"uMake" => "uMake",
"Unfold3D" => "Unfold 3D",
"Unity" => "Unity",
"UnrealEngine" => "Unreal Engine",
"Vengi" => "vengi",
"VRay" => "V-Ray",
"VRED" => "VRED",
"VTubeStudio" => "VTube Studio",
"Vectary" => "Vectary",
"VectorayGen" => "VectorayGen",
"Vectorworks" => "Vectorworks",
"VegasPro" => "Vegas Pro",
"VisualDesigner3D" => "Visual Designer 3D",
"VisualStudio" => "Visual Studio",
"VRoidStudio" => "VRoid Studio",
"Vue" => "Vue",
"Vuforia" => "Vuforia",
"WebGL" => "WebGL",
"WhiteboardFox" => "Whiteboard Fox",
"WickEditor" => "Wick Editor",
"Wings3D" => "Wings 3D",
"Word" => "Word",
"WorldCreator" => "World Creator",
"WorldMachine" => "World Machine",
"XParticles" => "X-Particles",
"Xfrog" => "Xfrog",
"Xgen" => "Xgen",
"xNormal" => "xNormal",
"xTex" => "xTex",
"XoliulShader" => "Xoliul Shader",
"Yafaray" => "Yafaray",
"Yeti" => "Yeti",
"ZBrush" => "ZBrush",
"ZBrushCore" => "ZBrushCore",
"ZenBrush" => "Zen Brush"
]
]
];
}
private function get($proxy, $url, $get = [], $search){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/plain, */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
//"sentry-trace: 72b0318a7141fe18cbacbd905572eddf-a60de161b66b1e6f-1
//"baggage: sentry-environment=vercel-production,sentry-release=251ff5179b4de94974f36d9b8659a487bbb8a819,sentry-public_key=2b87af2b44c84643a011838ad097735f,sentry-trace_id=72b0318a7141fe18cbacbd905572eddf,sentry-transaction=GET%20%2Fsearch,sentry-sampled=true,sentry-sample_rand=0.09967130764937493,sentry-sample_rate=0.5",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
//"Referer: https://cara.app/search?q=jak+and+daxter&type=&sortBy=Top&filters=%7B%7D",
"Referer: https://cara.app/search?q=" . urlencode($search),
//"Cookie: __Host-next-auth.csrf-token=b752c4296375bccb7b480ff010e1e916c65c35c311a4a57ac6cd871468730578%7C4d3783cfb72a98f390e534abd149806432b6cf8d50555a52d00e99216a516911; __Secure-next-auth.callback-url=https%3A%2F%2Fcara.app; crumb=BV0HDt87G5+fOWE0ZDQ5MWM0ZTQ3YTZmMzM4MGU5MGNjNDNmMzY2",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$npt, $proxy] =
$this->backend->get(
$get["npt"],
"images"
);
$npt = json_decode($npt, true);
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
$npt = [
"q" => $get["s"],
"sortBy" => $get["sort"],
"take" => 24,
"skip" => 0,
"filters" => []
];
// parse filters
if($get["type"] != "any"){
$npt["filters"]["posts"] = [$get["type"]];
}
if($get["fields"] != "any"){
$npt["filters"]["fields"] = [$get["fields"]];
}
if($get["category"] != "any"){
$npt["filters"]["categories"] = [$get["category"]];
}
if($get["software"] != "any"){
$npt["filters"]["softwares"] = [$get["software"]];
}
if($npt["filters"] == []){
$npt["filters"] = "{}";
}else{
$npt["filters"] = json_encode($npt["filters"]);
}
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
// https://cara.app/api/search/portfolio-posts?q=jak+and+daxter&sortBy=Top&take=24&skip=0&filters=%7B%7D
try{
$json =
$this->get(
$proxy,
"https://cara.app/api/search/posts",
$npt,
$npt["q"]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$imagecount = 0;
foreach($json as $image){
if(count($image["images"]) === 0){
// sometimes the api returns no images for an object
$imagecount++;
continue;
}
$cover = null;
$sources = [];
foreach($image["images"] as $source){
if($source["isCoverImg"]){
$cover = [
"url" => "https://images.cara.app/" . $this->fix_url($source["src"]),
"width" => 500,
"height" => 500
];
}else{
$sources[] = [
"url" => "https://images.cara.app/" . $this->fix_url($source["src"]),
"width" => null,
"height" => null
];
}
}
if($cover !== null){
$sources[] = $cover;
}
$out["image"][] = [
"title" => str_replace("\n", " ", $image["content"]),
"source" => $sources,
"url" => "https://cara.app/post/" . $image["id"]
];
$imagecount++;
}
if($imagecount === 24){
$npt["skip"] += 24;
$out["npt"] =
$this->backend->store(
json_encode($npt),
"images",
$proxy
);
}
return $out;
}
private function fix_url($url){
return
str_replace(
[" "],
["%20"],
$url
);
}
}

672
scraper/coccoc.php Normal file
View File

@@ -0,0 +1,672 @@
<?php
class coccoc{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("coccoc");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
curl_setopt($curlproc, CURLOPT_HTTPHEADER, [
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
//"Cookie: _contentAB_15040_vi=V-06_01; split_test_search=new_search; uid=L_bauXyZBY1B; vid=uCVQJQSTgb9QGT3o; ls=1753742684; serp_version=29223843/7621a70; savedS=direct",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: cross-site",
"Priority: u=0, i"
]);
$this->backend->assign_proxy($curlproc, $proxy);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function getfilters($pagetype){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes", // nsfw by default????
"no" => "No" // &safe=1
]
],
"time" => [
"display" => "Time posted",
"option" => [
"any" => "Any time",
"1w" => "1 week ago",
"2w" => "2 weeks ago",
"1m" => "1 month ago",
"3m" => "3 months ago",
"6m" => "6 months ago",
"1Y" => "1 year ago"
]
],
"filter" => [
"display" => "Remove duplicates",
"option" => [
"no" => "No",
"yes" => "Yes" // &filter=0
]
]
];
}
public function web($get){
if($get["npt"]){
[$query, $proxy] =
$this->backend->get(
$get["npt"],
"web"
);
$query = json_decode($query, true);
}else{
$proxy = $this->backend->get_ip();
$query = [
"query" => $get["s"]
];
// add filters
if($get["nsfw"] == "no"){
$query["safe"] = 1;
}
if($get["time"] != "any"){
$query["tbs"] = $get["time"];
}
if($get["filter"] == "yes"){
$query["filter"] = 0;
}
}
try{
$html =
$this->get(
$proxy,
"https://coccoc.com/search",
$query
);
}catch(Exception $error){
throw new Exception("Failed to get search page");
}
//$html = file_get_contents("scraper/coccoc.html");
$html = explode("window.composerResponse", $html, 2);
if(count($html) !== 2){
throw new Exception("Failed to grep window.composerResponse");
}
$html =
json_decode(
$this->fuckhtml
->extract_json(
ltrim($html[1], " =")
),
true
);
if($html === null){
throw new Exception("Failed to decode JSON");
}
if(!isset($html["search"]["search_results"])){
throw new Exception("Coc Coc did not return a search_results object");
}
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
// word correction
foreach($html["top"] as $element){
if(isset($element["spellChecker"][0]["query"])){
$out["spelling"] = [
"type" => "not_many",
"using" => $html["search"]["query"],
"correction" => $element["spellChecker"][0]["query"]
];
}
}
foreach($html["search"]["search_results"] as $result){
if(isset($result["type"])){
switch($result["type"]){
//
// Related searches
//
case "related_queries":
$out["related"] = $result["queries"];
continue 2;
//
// Videos
//
case "video_hits":
foreach($result["results"] as $video){
if(
isset($video["image_url"]) &&
!empty($video["image_url"])
){
$thumb = [
"ratio" => "16:9",
"url" => $video["image_url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["video"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["title"]
)
),
"description" => null,
"author" => [
"name" => $video["uploader"],
"url" => null,
"avatar" => null
],
"date" => (int)$video["date"],
"duration" => (int)$video["duration"],
"views" => null,
"thumb" => $thumb,
"url" => $video["url"]
];
}
continue 2;
}
}
if(
!isset($result["title"]) ||
!isset($result["url"])
){
// should not happen
continue;
}
if(isset($result["rich"]["data"]["image_url"])){
$thumb = [
"url" => $result["rich"]["data"]["image_url"],
"ratio" => "16:9"
];
}else{
$thumb = [
"url" => null,
"ratio" => null
];
}
$sublinks = [];
if(isset($result["rich"]["data"]["linked_docs"])){
foreach($result["rich"]["data"]["linked_docs"] as $sub){
$sublinks[] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$sub["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$sub["content"]
)
),
"date" => null,
"url" => $sub["url"]
];
}
}
// get date
if(isset($result["date"])){
$date = (int)$result["date"];
}else{
$date = null;
}
// probe for metadata
$table = [];
if(isset($result["rich"]["data"]["rating"])){
$table["Rating"] = $result["rich"]["data"]["rating"];
if(isset($result["rich"]["data"]["num_rating"])){
$table["Rating"] .= " (" . number_format($result["rich"]["data"]["num_rating"]) . " ratings)";
}
}
if(isset($result["rich"]["data"]["views"])){
$table["Views"] = number_format($result["rich"]["data"]["views"]);
}
if(isset($result["rich"]["data"]["duration"])){
$table["Duration"] = $this->int2hms($result["rich"]["data"]["duration"]);
}
if(isset($result["rich"]["data"]["channel_name"])){
$table["Author"] = $result["rich"]["data"]["channel_name"];
}
if(isset($result["rich"]["data"]["video_quality"])){
$table["Quality"] = $result["rich"]["data"]["video_quality"];
}
if(isset($result["rich"]["data"]["category"])){
$table["Category"] = $result["rich"]["data"]["category"];
}
$out["web"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$result["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$result["content"]
)
),
"url" => $result["url"],
"date" => $date,
"type" => "web",
"thumb" => $thumb,
"sublink" => $sublinks,
"table" => $table
];
}
//
// Get wikipedia head
//
if(isset($html["right"])){
foreach($html["right"] as $wiki){
$description = [];
if(isset($wiki["short_intro"])){
$description[] =
[
"type" => "quote",
"value" => $wiki["short_intro"],
];
}
if(isset($wiki["intro"])){
$description[] =
[
"type" => "text",
"value" => $wiki["intro"],
];
}
// get table elements
$table = [];
if(isset($wiki["fields"])){
foreach($wiki["fields"] as $element){
$table[$element["title"]] = implode(", ", $element["value"]);
}
}
// get sublinks
$sublinks = [];
if(isset($wiki["website"])){
if(
preg_match(
'/^http/',
$wiki["website"]
) === 0
){
$sublinks["Website"] = "https://" . $wiki["website"];
}else{
$sublinks["Website"] = $wiki["website"];
}
}
foreach($wiki["profiles"] as $sitename => $url){
$sitename = explode("_", $sitename);
$sitename = ucfirst($sitename[count($sitename) - 1]);
$sublinks[$sitename] = $url;
}
$out["answer"][] = [
"title" =>
$this->titledots(
$wiki["title"]
),
"description" => $description,
"url" => null,
"thumb" => isset($wiki["image"]["contentUrl"]) ? $wiki["image"]["contentUrl"] : null,
"table" => $table,
"sublink" => $sublinks
];
}
}
// get next page
if((int)$html["search"]["page"] < (int)$html["search"]["max_page"]){
// https://coccoc.com/composer?_=1754021153532&p=0&q=zbabduiqwhduwqhdnwq&reqid=bwcAs00q&s=direct&apiV=1
// ^json endpoint, but we can just do &page=2 lol
if(!isset($query["page"])){
$query["page"] = 2;
}else{
$query["page"]++;
}
$out["npt"] =
$this->backend
->store(
json_encode($query),
"web",
$proxy
);
}
return $out;
}
public function video($get){
//$html = file_get_contents("scraper/coccoc.html");
if($get["npt"]){
[$query, $proxy] =
$this->backend->get(
$get["npt"],
"videos"
);
$query = json_decode($query, true);
}else{
$proxy = $this->backend->get_ip();
$query = [
"query" => $get["s"],
"tbm" => "vid"
];
// add filters
if($get["nsfw"] == "no"){
$query["safe"] = 1;
}
if($get["time"] != "any"){
$query["tbs"] = $get["time"];
}
if($get["filter"] == "yes"){
$query["filter"] = 0;
}
}
try{
$html =
$this->get(
$proxy,
"https://coccoc.com/search",
$query
);
}catch(Exception $error){
throw new Exception("Failed to get search page");
}
$html = explode("window.composerResponse", $html, 2);
if(count($html) !== 2){
throw new Exception("Failed to grep window.composerResponse");
}
$html =
json_decode(
$this->fuckhtml
->extract_json(
ltrim($html[1], " =")
),
true
);
if($html === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
if(!isset($html["search_video"]["search_results"])){
if(isset($html["search_video"]["error"]["title"])){
if($html["search_video"]["error"]["title"] == "Không tìm thấy kết quả nào"){
return $out;
}
throw new Exception("Coc Coc returned an error: " . $html["search_video"]["error"]["title"]);
}
throw new Exception("Coc Coc did not supply a search_results object");
}
foreach($html["search_video"]["search_results"] as $video){
if(isset($video["rich"]["data"]["image_url"])){
$thumb = [
"ratio" => "16:9",
"url" => $video["rich"]["data"]["image_url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["video"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["content"]
)
),
"author" => [
"name" =>
isset($video["rich"]["data"]["channel_name"]) ?
$video["rich"]["data"]["channel_name"] : null,
"url" => null,
"avatar" => null
],
"date" =>
isset($video["date"]) ?
$video["date"] : null,
"duration" =>
isset($video["rich"]["data"]["duration"]) ?
(int)$video["rich"]["data"]["duration"] : null,
"views" => null,
"thumb" => $thumb,
"url" => $video["url"]
];
}
// get next page
if((int)$html["search_video"]["page"] < (int)$html["search_video"]["max_page"]){
if(!isset($query["page"])){
$query["page"] = 2;
}else{
$query["page"]++;
}
$out["npt"] =
$this->backend
->store(
json_encode($query),
"videos",
$proxy
);
}
return $out;
}
private function titledots($title){
return trim($title, " .\t\n\r\0\x0B");
}
private function int2hms($seconds){
$hours = floor($seconds / 3600);
$minutes = floor(($seconds % 3600) / 60);
$seconds = $seconds % 60;
return sprintf("%02d:%02d:%02d", $hours, $minutes, $seconds);
}
}

File diff suppressed because it is too large Load Diff

262
scraper/fivehpx.php Normal file
View File

@@ -0,0 +1,262 @@
<?php
class fivehpx{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("fivehpx");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function getfilters($page){
return [
"sort" => [
"display" => "Sort",
"option" => [
"relevance" => "Relevance",
"pulse" => "Pulse",
"newest" => "Newest"
]
]
];
}
private function get($proxy, $url, $get = [], $post_data = null){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($post_data === null){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i",
"TE: trailers"]
);
}else{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Referer: https://500px.com/",
"content-type: application/json",
//"x-csrf-token: undefined",
"x-500px-source: Search",
"Content-Length: " . strlen($post_data),
"Origin: https://500px.com",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
// "Cookie: _pin_unauth, _fbp, _sharedID, _sharedID_cst",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site",
"Priority: u=4",
"TE: trailers"]
);
// set post data
curl_setopt($curlproc, CURLOPT_POST, true);
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $post_data);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$pagination, $proxy] =
$this->backend->get(
$get["npt"], "images"
);
$pagination = json_decode($pagination, true);
$search = $pagination["search"];
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
$pagination = [
"sort" => strtoupper($get["sort"]),
"search" => $search,
"filters" => [],
"nlp" => false,
];
}
try{
$json =
$this->get(
$proxy,
"https://api.500px.com/graphql",
[],
json_encode([
"operationName" => "PhotoSearchPaginationContainerQuery",
"variables" => $pagination,
"query" =>
'query PhotoSearchPaginationContainerQuery(' .
(isset($pagination["cursor"]) ? '$cursor: String, ' : "") .
'$sort: PhotoSort, $search: String!, $filters: [PhotoSearchFilter!], $nlp: Boolean) { ...PhotoSearchPaginationContainer_query_1vzAZD} fragment PhotoSearchPaginationContainer_query_1vzAZD on Query { photoSearch(sort: $sort, first: 100, ' .
(isset($pagination["cursor"]) ? 'after: $cursor, ' : "") .
'search: $search, filters: $filters, nlp: $nlp) { edges { node { id legacyId canonicalPath name description width height images(sizes: [33, 36]) { size url id } } } totalCount pageInfo { endCursor hasNextPage } }}'
])
);
}catch(Exception $error){
throw new Exception("Failed to fetch graphQL object");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode graphQL object");
}
if(isset($json["errors"][0]["message"])){
throw new Exception("500px returned an API error: " . $json["errors"][0]["message"]);
}
if(!isset($json["data"]["photoSearch"]["edges"])){
throw new Exception("No edges returned by API");
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
foreach($json["data"]["photoSearch"]["edges"] as $image){
$image = $image["node"];
$title =
trim(
$this->fuckhtml
->getTextContent(
$image["name"]
) . ": " .
$this->fuckhtml
->getTextContent(
$image["description"]
)
, " :"
);
$small = $this->image_ratio(600, $image["width"], $image["height"]);
$large = $this->image_ratio(2048, $image["width"], $image["height"]);
$out["image"][] = [
"title" => $title,
"source" => [
[
"url" => $image["images"][1]["url"],
"width" => $large[0],
"height" => $large[1]
],
[
"url" => $image["images"][0]["url"],
"width" => $small[0],
"height" => $small[1]
]
],
"url" => "https://500px.com" . $image["canonicalPath"]
];
}
// get NPT token
if($json["data"]["photoSearch"]["pageInfo"]["hasNextPage"] === true){
$out["npt"] =
$this->backend->store(
json_encode([
"cursor" => $json["data"]["photoSearch"]["pageInfo"]["endCursor"],
"search" => $search,
"sort" => $pagination["sort"],
"filters" => [],
"nlp" => false
]),
"images",
$proxy
);
}
return $out;
}
private function image_ratio($longest_edge, $width, $height){
$ratio = [
$longest_edge / $width,
$longest_edge / $height
];
if($ratio[0] < $ratio[1]){
$ratio = $ratio[0];
}else{
$ratio = $ratio[1];
}
return [
floor($width * $ratio),
floor($height * $ratio)
];
}
}

415
scraper/flickr.php Normal file
View File

@@ -0,0 +1,415 @@
<?php
class flickr{
const req_web = 0;
const req_xhr = 1;
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("flickr");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function getfilters($page){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes",
"maybe" => "Maybe",
"no" => "No",
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"relevance" => "Relevance",
"date-posted-desc" => "Newest uploads",
"date-posted-asc" => "Oldest uploads",
"date-taken-desc" => "Newest taken",
"date-taken-asc" => "Oldest taken",
"interestingness-desc" => "Interesting"
]
],
"color" => [
"display" => "Color",
"option" => [
"any" => "Any color",
// color_codes=
"0" => "Red",
"1" => "Brown",
"2" => "Orange",
"b" => "Pink",
"4" => "Yellow",
"3" => "Golden",
"5" => "Lime",
"6" => "Green",
"7" => "Sky blue",
"8" => "Blue",
"9" => "Purple",
"a" => "Hot pink",
"c" => "White",
"d" => "Gray",
"e" => "Black",
// styles= override
"blackandwhite" => "Black & white",
]
],
"style" => [ // styles=
"display" => "Style",
"option" => [
"any" => "Any style",
"depthoffield" => "Depth of field",
"minimalism" => "Minimalism",
"pattern" => "Patterns"
]
],
"license" => [
"display" => "License",
"option" => [
"any" => "Any license",
"1,2,3,4,5,6,9,11,12,13,14,15,16" => "All creative commons",
"4,5,6,9,10,11,12,13" => "Commercial use allowed",
"1,2,4,5,9,10,11,12,14,15" => "Modifications allowed",
"4,5,9,10,11,12" => "Commercial use & mods allowed",
"7,9,10" => "No known copyright restrictions",
"8" => "U.S Government works"
]
]
];
}
private function get($proxy, $url, $get = [], $reqtype){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($reqtype === flickr::req_web){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i",
"TE: trailers"]
);
}else{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Origin: https://www.flickr.com",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://www.flickr.com/",
// Cookie:
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site",
"TE: trailers"]
);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$filters, $proxy] =
$this->backend->get(
$get["npt"], "images"
);
$filters = json_decode($filters, true);
// Workaround for the future, if flickr deprecates &page argument on html page
/*
try{
$json =
$this->get(
$proxy,
"https://api.flickr.com/services/rest",
[
"sort" => $data["sort"],
"parse_tags" => 1,
// url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o
"extras" => "can_comment,can_print,count_comments,count_faves,description,isfavorite,license,media,needs_interstitial,owner_name,path_alias,realname,rotation,url_sq,url_q,url_t,url_s,url_n,url_w,url_m,url_z,url_c,url_l",
"per_page" => 100,
"page" => $data["page"],
"lang" => "en-US",
"text" => $data["search"],
"viewerNSID" => "",
"method" => "flickr.photos.search",
"csrf" => "",
"api_key" => $data["api_key"],
"format" => "json",
"hermes" => 1,
"hermesClient" => 1,
"reqId" => $data["reqId"],
"nojsoncallback" => 1
]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}*/
}else{
if(strlen($get["s"]) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
// compute filters
$filters = [
"page" => 1,
"sort" => $get["sort"]
];
if($get["style"] != "any"){
$filters["styles"] = $get["style"];
}
if($get["color"] != "any"){
if($get["color"] != "blackandwhite"){
$filters["color_codes"] = $get["color"];
}else{
$filters["styles"] = "blackandwhite";
}
}
if($get["license"] != "any"){
$filters["license"] = $get["license"];
}
switch($get["nsfw"]){
case "yes": $filters["safe_search"] = 0; break;
case "maybe": $filters["safe_search"] = 2; break;
case "no": $filters["safe_search"] = 1; break;
}
}
$get_params = [
"text" => $get["s"],
"per_page" => 50,
// scrape highest resolution
"extras" => "url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o",
"view_all" => 1
];
$get_params = array_merge($get_params, $filters);
$html =
$this->get(
$proxy,
"https://www.flickr.com/search/",
$get_params,
flickr::req_web
);
// @TODO
// get api_key and reqId, if flickr deprecates &page
$this->fuckhtml->load($html);
//
// get response JSON
//
$scripts =
$this->fuckhtml
->getElementsByClassName(
"modelExport",
"script"
);
$found = false;
foreach($scripts as $script){
$json =
preg_split(
'/modelExport: ?/',
$script["innerHTML"],
2
);
if(count($json) !== 0){
$found = true;
$json = $json[1];
break;
}
}
if($found === false){
throw new Exception("Failed to grep JSON");
}
$json =
json_decode(
$this->fuckhtml
->extract_json(
$json
),
true
);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
if(!isset($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"])){
throw new Exception("Failed to access data object");
}
foreach($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"] as $image){
if(!isset($image["data"])){
// flickr likes to gives us empty array objects
continue;
}
$image = $image["data"];
$title = [];
if(isset($image["title"])){
$title[] =
$this->fuckhtml
->getTextContent(
$image["title"]
);
}
if(isset($image["description"])){
$title[] =
$this->fuckhtml
->getTextContent(
str_replace(
"\n",
" ",
$image["description"]
)
);
}
$title = implode(": ", $title);
$sources = array_values($image["sizes"]["data"]);
$suitable_sizes = ["n", "m", "w", "s"];
$thumb = &$sources[0]["data"];
foreach($suitable_sizes as $testing_size){
if(isset($image["sizes"]["data"][$testing_size])){
$thumb = &$image["sizes"]["data"][$testing_size]["data"];
break;
}
}
$og = &$sources[count($sources) - 1]["data"];
$out["image"][] = [
"title" => $title,
"source" => [
[
"url" => "https:" . $og["displayUrl"],
"width" => (int)$og["width"],
"height" => (int)$og["height"]
],
[
"url" => "https:" . $thumb["displayUrl"],
"width" => (int)$thumb["width"],
"height" => (int)$thumb["height"]
]
],
"url" => "https://www.flickr.com/photos/" . $image["ownerNsid"] . "/" . $image["id"] . "/"
];
}
$total_items = (int)$json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["totalItems"];
if(($filters["page"]) * 50 < $total_items){
$filters["page"]++;
$out["npt"] =
$this->backend->store(
json_encode($filters),
"images",
$proxy
);
}
return $out;
}
}

View File

@@ -136,7 +136,7 @@ class ftm{
"source" => [
[
"url" =>
"https://findthatmeme.us-southeast-1.linodeobjects.com/" .
"https://s3.thehackerblog.com/findthatmeme/" .
$thumb,
"width" => null,
"height" => null

File diff suppressed because it is too large Load Diff

1054
scraper/google_cse.php Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -16,49 +16,82 @@ class greppr{
return [];
}
private function get($proxy, $url, $get = [], $cookie = false){
private function get($proxy, $url, $get = [], $cookie = false, $post){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($cookie === false){
if($post === false){
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
if($cookie === false){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
}else{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://greppr.org/search",
"Cookie: PHPSESSID=$cookie",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i"]
);
}
}else{
$get = http_build_query($get);
curl_setopt($curlproc, CURLOPT_POST, true);
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $get);
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Cookie: PHPSESSID=" . $cookie,
"Accept-Encoding: gzip, deflate, br, zstd",
"Content-Type: application/x-www-form-urlencoded",
"Content-Length: " . strlen($get),
"Origin: https://greppr.org",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://greppr.org/",
"Cookie: PHPSESSID=$cookie",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i"]
);
}
@@ -113,7 +146,24 @@ class greppr{
[$q, $proxy] = $this->backend->get($get["npt"], "web");
$q = json_decode($q, true);
$tokens = json_decode($q, true);
//
// Get paginated page
//
try{
$html = $this->get(
$proxy,
"https://greppr.org" . $tokens["get"],
[],
$tokens["cookie"],
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
}else{
@@ -124,88 +174,114 @@ class greppr{
}
$proxy = $this->backend->get_ip();
}
// get token
// token[0] = static token that changes once a day
// token[1] = dynamic token that changes on every request
// token[1] = PHPSESSID cookie
$tokens = apcu_fetch("greppr_token");
if(
$tokens === false ||
$first_attempt === false // force token fetch
){
// we haven't gotten the token yet, get it
//
// get token
//
try{
$response =
$html =
$this->get(
$proxy,
"https://greppr.org",
[]
[],
false,
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch search tokens");
}
$tokens = $this->parse_token($response);
//
// Parse token
//
$this->fuckhtml->load($html["data"]);
$tokens = [];
$inputs =
$this->fuckhtml
->getElementsByTagName(
"input"
);
foreach($inputs as $input){
if(!isset($input["attributes"]["name"])){
continue;
}
switch($input["attributes"]["name"]){
case "var1":
case "var2":
case "n":
$tokens[$input["attributes"]["name"]] =
$this->fuckhtml
->getTextContent(
$input["attributes"]["value"]
);
break;
default:
$tokens["req"] =
$this->fuckhtml
->getTextContent(
$input["attributes"]["name"]
);
break;
}
}
// get cookie
preg_match(
'/PHPSESSID=([^;]+)/',
$html["headers"]["set-cookie"],
$cookie
);
if(!isset($cookie[1])){
// server sent an unexpected cookie
throw new Exception("Got malformed cookie");
}
$tokens["cookie"] = $cookie[1];
if($tokens === false){
throw new Exception("Failed to grep search tokens");
}
}
try{
if($get["npt"]){
//
// Get initial search page
//
try{
$html = $this->get(
$proxy,
"https://greppr.org/search",
[
"var1" => $tokens["var1"],
"var2" => $tokens["var2"],
$tokens["req"] => $search,
"n" => $tokens["n"]
],
$tokens["cookie"],
true
);
}catch(Exception $error){
$params = [
$tokens[0] => $q["q"],
"s" => $q["s"],
"l" => 30,
"n" => $tokens[1]
];
}else{
$params = [
$tokens[0] => $search,
"n" => $tokens[1]
];
throw new Exception("Failed to fetch search page");
}
$searchresults = $this->get(
$proxy,
"https://greppr.org/search",
$params,
$tokens[2]
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
if(strlen($searchresults["data"]) === 0){
// redirected to main page, which means we got old token
// generate a new one
// ... unless we just tried to do that
if($first_attempt === false){
throw new Exception("Failed to get a new search token");
}
return $this->web($get, false);
}
//$html = file_get_contents("scraper/greppr.html");
//$this->fuckhtml->load($html);
$this->fuckhtml->load($html["data"]);
// refresh the token with new data (this also triggers fuckhtml load)
$this->parse_token($searchresults, $tokens[2]);
// response object
$out = [
"status" => "ok",
"spelling" => [
@@ -254,24 +330,16 @@ class greppr{
if($break === true){
parse_str(
$this->fuckhtml
->getTextContent(
$a["attributes"]["href"]
),
$values
);
$values = array_values($values);
$out["npt"] =
$this->backend->store(
json_encode(
[
"q" => $values[0],
"s" => $values[1]
]
),
json_encode([
"get" =>
$this->fuckhtml
->getTextContent(
$a["attributes"]["href"]
),
"cookie" => $tokens["cookie"]
]),
"web",
$proxy
);
@@ -360,74 +428,6 @@ class greppr{
return $out;
}
private function parse_token($response, $cookie = false){
$this->fuckhtml->load($response["data"]);
$scripts =
$this->fuckhtml
->getElementsByTagName("script");
$found = false;
foreach($scripts as $script){
preg_match(
'/window\.location ?= ?\'\/search\?([^=]+).*&n=([0-9]+)/',
$script["innerHTML"],
$tokens
);
if(isset($tokens[1])){
$found = true;
break;
}
}
if($found === false){
return false;
}
$tokens = [
$tokens[1],
$tokens[2]
];
if($cookie !== false){
// we already specified a cookie, so use the one we have already
$tokens[] = $cookie;
apcu_store("greppr_token", $tokens);
return $tokens;
}
if(!isset($response["headers"]["set-cookie"])){
// server didn't send a cookie
return false;
}
// get cookie
preg_match(
'/PHPSESSID=([^;]+)/',
$response["headers"]["set-cookie"],
$cookie
);
if(!isset($cookie[1])){
// server sent an unexpected cookie
return false;
}
$tokens[] = $cookie[1];
apcu_store("greppr_token", $tokens);
return $tokens;
}
private function limitstrlen($text){
return explode("\n", wordwrap($text, 300, "\n"))[0];

View File

@@ -182,6 +182,23 @@ class imgur{
throw new Exception("Failed to fetch HTML");
}
$json = json_decode($html, true);
if($json){
// {"data":{"error":"Imgur is temporarily over capacity. Please try again later."},"success":false,"status":403}
if(isset($json["data"]["error"])){
if(stripos($json["data"]["error"], "capacity")){
throw new Exception("Imgur IP blocked this 4get instance or request proxy. Try again");
}
}
throw new Exception("Imgur returned an unknown error (IP ban?)");
}
$this->fuckhtml->load($html);
$posts =
@@ -197,7 +214,14 @@ class imgur{
$image =
$this->fuckhtml
->getElementsByTagName("img")[0];
->getElementsByTagName("img");
if(count($image) === 0){
continue;
}
$image = $image[0];
$image_url = "https:" . substr($this->fuckhtml->getTextContent($image["attributes"]["src"]), 0, -5);

View File

@@ -3,7 +3,10 @@
class marginalia{
public function __construct(){
include "lib/fuckhtml.php";
include "lib/anubis.php";
$this->anubis = new anubis();
include_once "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
include "lib/backend.php";
@@ -102,7 +105,40 @@ class marginalia{
);
}
private function get($proxy, $url, $get = []){
private function get($proxy, $url, $get = [], $get_cookies = 1){
$curlproc = curl_init();
switch($get_cookies){
case 0:
$cookies = "";
$cookies_tmp = [];
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
$length = strlen($header);
$header = explode(":", $header, 2);
if(trim(strtolower($header[0])) == "set-cookie"){
$cookie_tmp = explode("=", trim($header[1]), 2);
$cookies_tmp[trim($cookie_tmp[0])] =
explode(";", $cookie_tmp[1], 2)[0];
}
return $length;
});
break;
case 1:
$cookies = "";
break;
default:
$cookies = "Cookie: " . $get_cookies;
}
$headers = [
"User-Agent: " . config::USER_AGENT,
@@ -110,6 +146,7 @@ class marginalia{
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
$cookies,
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
@@ -118,8 +155,6 @@ class marginalia{
"Sec-Fetch-User: ?1"
];
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
@@ -145,7 +180,19 @@ class marginalia{
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
if($get_cookies === 0){
$cookie = [];
foreach($cookies_tmp as $key => $value){
$cookie[] = $key . "=" . $value;
}
curl_close($curlproc);
return implode(";", $cookie);
}
return $data;
}
@@ -220,13 +267,14 @@ class marginalia{
"related" => []
];
// API scraper
if(config::MARGINALIA_API_KEY !== null){
try{
$json =
$this->get(
$this->backend->get_ip(), // no nextpage
"https://api.marginalia.nu/" . config::MARGINALIA_API_KEY . "/search/" . urlencode($search),
"https://api.marginalia-search.com/" . config::MARGINALIA_API_KEY . "/search/" . urlencode($search),
[
"count" => 20
]
@@ -263,34 +311,114 @@ class marginalia{
return $out;
}
// no more cloudflare!! Parse html by default
$params = [
"query" => $search
];
// HTML parser
$proxy = $this->backend->get_ip();
foreach(["adtech", "recent", "intitle"] as $v){
//
// Bypass anubis check
//
/*
if(($anubis_key = apcu_fetch("marginalia_cookie")) === false){
if($get[$v] == "yes"){
switch($v){
try{
$html =
$this->get(
$proxy,
"https://old-search.marginalia.nu/search",
[
"query" => $search
]
);
case "adtech": $params["adtech"] = "reduce"; break;
case "recent": $params["recent"] = "recent"; break;
case "adtech": $params["searchTitle"] = "title"; break;
}catch(Exception $error){
throw new Exception("Failed to get anubis challenge");
}
try{
$anubis_data = $this->anubis->scrape($html);
}catch(Exception $error){
throw new Exception($error);
}
// send anubis response & get cookies
// https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge?response=0000018966b086834f738bacba6031028adb5aa875974ead197a8b75778baf3a&nonce=39947&redir=https%3A%2F%2Fold-search.marginalia.nu%2F&elapsedTime=1164
try{
$anubis_key =
$this->get(
$proxy,
"https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge",
[
"response" => $anubis_data["response"],
"nonce" => $anubis_data["nonce"],
"redir" => "https://old-search.marginalia.nu/",
"elapsedTime" => random_int(1000, 2000)
],
0
);
}catch(Exception $error){
throw new Exception("Failed to submit anubis challenge");
}
apcu_store("marginalia_cookie", $anubis_key);
}*/
if($get["npt"]){
[$params, $proxy] =
$this->backend->get(
$get["npt"],
"web"
);
try{
$html =
$this->get(
$proxy,
"https://old-search.marginalia.nu/search?" . $params,
[],
//$anubis_key
);
}catch(Exception $error){
throw new Exception("Failed to get HTML");
}
}else{
$params = [
"query" => $search
];
foreach(["adtech", "recent", "intitle"] as $v){
if($get[$v] == "yes"){
switch($v){
case "adtech": $params["adtech"] = "reduce"; break;
case "recent": $params["recent"] = "recent"; break;
case "adtech": $params["searchTitle"] = "title"; break;
}
}
}
}
try{
$html =
$this->get(
$this->backend->get_ip(),
"https://search.marginalia.nu/search",
$params
);
}catch(Exception $error){
throw new Exception("Failed to get HTML");
try{
$html =
$this->get(
$proxy,
"https://old-search.marginalia.nu/search",
$params,
//$anubis_key
);
}catch(Exception $error){
throw new Exception("Failed to get HTML");
}
}
$this->fuckhtml->load($html);
@@ -387,6 +515,65 @@ class marginalia{
];
}
// get next page
$this->fuckhtml->load($html);
$pagination =
$this->fuckhtml
->getElementsByAttributeValue(
"aria-label",
"pagination",
"nav"
);
if(count($pagination) === 0){
// no pagination
return $out;
}
$this->fuckhtml->load($pagination[0]);
$pages =
$this->fuckhtml
->getElementsByClassName(
"page-link",
"a"
);
$found_current_page = false;
foreach($pages as $page){
if(
stripos(
$page["attributes"]["class"],
"active"
) !== false
){
$found_current_page = true;
continue;
}
if($found_current_page){
// we found current page index, and we iterated over
// the next page <a>
$out["npt"] =
$this->backend->store(
parse_url(
$page["attributes"]["href"],
PHP_URL_QUERY
),
"web",
$proxy
);
break;
}
}
return $out;
}
}

View File

@@ -457,7 +457,7 @@ class mojeek{
"tn" => 7, // number of news results/page
"date" => 1, // show date
"tlen" => 128, // max length of title
"dlen" => 511, // max length of description
//"dlen" => 511, // max length of description
"arc" => ($country == "any" ? "none" : $country) // location. don't use autodetect!
];
@@ -501,11 +501,6 @@ class mojeek{
throw new Exception("Failed to get HTML");
}
/*
$handle = fopen("scraper/mojeek.html", "r");
$html = fread($handle, filesize("scraper/mojeek.html"));
fclose($handle);*/
}
$out = [
@@ -526,6 +521,8 @@ class mojeek{
$this->fuckhtml->load($html);
$this->detect_block();
$results =
$this->fuckhtml
->getElementsByClassName("results-standard", "ul");
@@ -695,15 +692,18 @@ class mojeek{
preg_match(
'/\/image\?img=([^&]+)/i',
$thumb[0]["attributes"]["src"],
$thumb
$matches
);
if(count($thumb) === 2){
if(count($matches) === 2){
// for some reason, if we dont get the image from mojeek
// it sometimes fail to fetch the right image URL
$answer["thumb"] =
"https://mojeek.com" .
$this->fuckhtml
->getTextContent(
$thumb[1]
$thumb[0]["attributes"]["src"]
);
}
}
@@ -1032,6 +1032,8 @@ class mojeek{
$this->fuckhtml->load($html);
$this->detect_block();
$articles =
$this->fuckhtml->getElementsByTagName("article");
@@ -1164,6 +1166,26 @@ class mojeek{
return $out;
}
private function detect_block(){
$title =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($title) !== 0 &&
$this->fuckhtml
->getTextContent(
$title[0]["innerHTML"]
) == "403 - Forbidden"
){
throw new Exception("Mojeek blocked this instance or request proxy.");
}
}
private function titledots($title){
return trim($title, ". \t\n\r\0\x0B");

View File

@@ -13,31 +13,104 @@ class pinterest{
return [];
}
private function get($proxy, $url, $get = []){
private function get($proxy, $url, $get = [], &$cookies, $header_data_post = null){
$curlproc = curl_init();
if($get !== []){
if($header_data_post === null){
// handling GET
// extract cookies
$cookies_tmp = [];
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
$length = strlen($header);
$header = explode(":", $header, 2);
if(trim(strtolower($header[0])) == "set-cookie"){
$cookie_tmp = explode("=", trim($header[1]), 2);
$cookies_tmp[trim($cookie_tmp[0])] =
explode(";", $cookie_tmp[1], 2)[0];
}
return $length;
});
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/javascript, */*, q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Referer: https://ca.pinterest.com/",
"X-Requested-With: XMLHttpRequest",
"X-APP-VERSION: 78f8764",
"X-Pinterest-AppState: active",
"X-Pinterest-Source-Url: /",
"X-Pinterest-PWS-Handler: www/index.js",
"screen-dpr: 1",
"is-preload-enabled: 1",
"DNT: 1",
"Sec-GPC: 1",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Connection: keep-alive",
"Alt-Used: ca.pinterest.com",
"Priority: u=0",
"TE: trailers"]
);
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
}else{
// handling POST (pagination)
$get = http_build_query($get);
$url .= "?" . $get;
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/javascript, */*, q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Content-Type: application/x-www-form-urlencoded",
"Content-Length: " . strlen($get),
"Referer: https://ca.pinterest.com/",
"X-Requested-With: XMLHttpRequest",
"X-APP-VERSION: 78f8764",
"X-CSRFToken: " . $cookies["csrf"],
"X-Pinterest-AppState: active",
"X-Pinterest-Source-Url: /search/pins/?rs=ac&len=2&q=" . urlencode($header_data_post) . "&eq=" . urlencode($header_data_post),
"X-Pinterest-PWS-Handler: www/search/[scope].js",
"screen-dpr: 1",
"is-preload-enabled: 1",
"Origin: https://ca.pinterest.com",
"DNT: 1",
"Sec-GPC: 1",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Connection: keep-alive",
"Alt-Used: ca.pinterest.com",
"Cookie: " . $cookies["cookie"],
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_POST, true);
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $get);
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
@@ -54,6 +127,26 @@ class pinterest{
throw new Exception(curl_error($curlproc));
}
if($header_data_post === null){
if(!isset($cookies_tmp["csrftoken"])){
throw new Exception("Failed to grep CSRF token");
}
$cookies = "";
foreach($cookies_tmp as $cookie_name => $cookie_value){
$cookies .= $cookie_name . "=" . $cookie_value . "; ";
}
$cookies = [
"csrf" => $cookies_tmp["csrftoken"],
"cookie" => rtrim($cookies, " ;")
];
}
curl_close($curlproc);
return $data;
}
@@ -62,17 +155,68 @@ class pinterest{
if($get["npt"]){
// @TODO
// post data for next page
$data = [
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
"data" =>
json_encode(
[$data, $proxy] =
$this->backend->get(
$get["npt"], "images"
);
$data = json_decode($data, true);
$search = $data["q"];
$cookies = $data["cookies"];
try{
$json =
$this->get(
$proxy,
"https://ca.pinterest.com/resource/BaseSearchResource/get/",
[
// {"options":{"applied_filters":null,"appliedProductFilters":"---","article":null,"auto_correction_disabled":false,"corpus":null,"customized_rerank_type":null,"domains":null,"filters":null,"journey_depth":null,"page_size":null,"price_max":null,"price_min":null,"query_pin_sigs":null,"query":"higurashi","redux_normalize_feed":true,"rs":"typed","scope":"pins","selected_one_bar_modules":null,"source_id":null,"source_module_id":null,"top_pin_id":null,"bookmarks":["Y2JVSG81V2sxcmNHRlpWM1J5VFVad1ZsWlVRbXhpVmtreVZsZHpOV0pIU2tkV2FscFhVbXhhVkZreU1WSmtNREZWVjIxR1RrMXNTbEJXYlhSaFVtMVdjMVZ1U2xaaWEzQnpXVlJPVTJWV1pISlhhM1JYVm10V05sVldVbE5XVjBwMVVXMUdWVll6VFhoVWJYaFhWMVp3Ums1V1RsTmlSbGt5Vm10YWFtVkdWbkpOU0dSUFZsZG9XRmxzWkc5VlZscHlWbGhrYkdKR1NubFdWelZQWVVaYWRHVkVRbFppUmtwVVZrUktWMlJIVWtWV2JHaHBVakZLU0Zkc1pEUmtNVnBZVW10b2FsSXdXbkJXYlRWRFpHeGFSMWRzVG1oaGVrWllXV3RvVTFVeFpFaFZiRUpoVm5wRk1GbHFSbXRYVjA1R1YyczFWMVpHV2pSWFZtaDNVakZrY2sxWVRsaGlhM0JXV1ZSR1MyRkdiRlZTYm1SVVVteHdXbGxWVlRGVk1VbDVWRmhrVjAxdVVuWlVhMXBTWlVaT2MxcEhSbE5TTWswMVdtdGFWMU5YU2paVmJYaFRUVmhDUjFZeU5YZFVNVkY0VjJ0b1ZXRnJOVlpVVmxwTFVURndXR042VmxOV2ExcGFXVlZWTlZVeFNYZE5WRTVYVWtWYVZGWkhNVTlXTVU1WllVWk9hR1ZyV2s1WFZ6QXhZakpPVjFWWWFHRlNWbkJRVm14U1IwMUdXWGxOVkVKVlRWWnNORll5TURWV1YwVjVWV3hDV21FeGNETmFSVnByVjFkS1IyTkhhR2xYUjJkM1ZtdGFhMlF4VVhsVGJGcE9Wa1p3YjFwWGVFdFZWbFp4VW14YWJGWnRVbHBaTUdoTFZHMUtTR1ZJYUZkV2VrWjJWMVphU21ReVJYcGpSbFpwVW10d1RGZHJVa0pPVms1SFZHNVNUbFl3V2xoVmJYUldaVVpaZUZremFGUk5hM0JYVkZaYVYyRkZNSGxWYkVKYVlrWlZlRnBGV210WFIwNUpVMnMxVTFaR1dscFdWekI0VFVaV1IxTllaR3BUUlhCb1dWUkdWbVZHVm5SbFJuQnNZbFpKTWxSVlVYaFBSVGxGV1hwR1QyVnJSVEZVVlZKT1RrVXhSVkpVUWs5bGJFVXhWRmhzZDFOR1ZsWmtNMFp0VWpGYWIxZFhjRXBsUlRGSVZWaHdUbFl4YTNoVVZWSnFUVVUxV0ZadGFFOVNSVnB6Vkd0a1drMUdiRFpUVkVaT1pXMWplRmRzVWxkaFJuQllWVlJTVDJWdFRqWlVNVkpTWlZad2NWcEhkRTlsYTFwMFZGVlNhMkpWTVZWVFZFcE9Wa1pzTmxkWE1WSk9WVEYwVlcweFVGWXdXVFJXUjNSWFYwZGFRbEJVTVRoUFJHTXhUbnBCTlUxRVRUUk5SRVV3VG5wUk5VMTVjRWhWVlhkeFprUlZlRTlFVVRKWlZHc3lUMWRSTWsxVVVUSk9iVnBvV1RKWmVrNTZXWGhPTWs1cFQwUkZNVTlFVm1sTlZGcHBUV3BTYTFsWFRtcE9SR015VG1wVk5GbHFaR2haVjFacldWUmFiVmxxWkdoYVZGWnFUa1JXT0ZSclZsaG1RVDA5fFVIbzVhRkpYZUc1WFYyUlpWVEpHYkdGNk1XWk5ha1ptVFZSR09FOUVZekZPZWtFMVRVUk5ORTFFUlRCT2VsRTFUWGx3U0ZWVmQzRm1SMWw1VFZSUk1WbDZUVEJhUjFGNVQxZFNhVnB0VlRGT1JFVXdXVlJuZVU1cVRUUk5hbU40VDBSSk1VNXFWVEZOYlZwcVdsUnJlRTFFVVhwWmVsVjNXbXBvYkU1dFJYbE9ha0Y2VDFSSk5VMTZWVEJaYWtJNFZHdFdXR1pCUFQwPXxOb25lfDg3NTcwOTAzODAxNDc0OTMqR1FMKnwzMjM3YjM3ZGNhMGU3YjYyYzYzYzAyZGJkNGU1MjdlNzMyMTExMTNlMmUyMzEyOWM2MDAzYmU1ZTlmZjkwYjAwfE5FV3w="]},"context":{}}
]
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
"data" => json_encode(
[
"options" => [
"applied_unified_filters" => null,
"appliedProductFilters" => "---",
"article" => null,
"auto_correction_disabled" => false,
"corpus" => null,
"customized_rerank_type" => null,
"domains" => null,
"dynamicPageSizeExpGroup" => null,
"filters" => null,
"journey_depth" => null,
"page_size" => null,
"price_max" => null,
"price_min" => null,
"query_pin_sigs" => null,
"query" => $data["q"],
"redux_normalize_feed" => true,
"request_params" => null,
"rs" => "typed",
"scope" => "pins",
"selected_one_bar_modules" => null,
"source_id" => null,
"source_module_id" => null,
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
"top_pin_id" => null,
"top_pin_ids" => null,
"bookmarks" => [
$data["bookmark"]
]
],
"context" => []
],
JSON_UNESCAPED_SLASHES
)
],
$cookies,
$search
);
];
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}else{
@@ -81,27 +225,45 @@ class pinterest{
throw new Exception("Search term is empty!");
}
// https://ca.pinterest.com/resource/BaseSearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac&data=%7B%22options%22%3A%7B%22applied_unified_filters%22%3Anull%2C%22appliedProductFilters%22%3A%22---%22%2C%22article%22%3Anull%2C%22auto_correction_disabled%22%3Afalse%2C%22corpus%22%3Anull%2C%22customized_rerank_type%22%3Anull%2C%22domains%22%3Anull%2C%22dynamicPageSizeExpGroup%22%3Anull%2C%22filters%22%3Anull%2C%22journey_depth%22%3Anull%2C%22page_size%22%3Anull%2C%22price_max%22%3Anull%2C%22price_min%22%3Anull%2C%22query_pin_sigs%22%3Anull%2C%22query%22%3A%22higurashi%20when%20they%20cry%22%2C%22redux_normalize_feed%22%3Atrue%2C%22request_params%22%3Anull%2C%22rs%22%3A%22ac%22%2C%22scope%22%3A%22pins%22%2C%22selected_one_bar_modules%22%3Anull%2C%22source_id%22%3Anull%2C%22source_module_id%22%3Anull%2C%22source_url%22%3A%22%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac%22%2C%22top_pin_id%22%3Anull%2C%22top_pin_ids%22%3Anull%7D%2C%22context%22%3A%7B%7D%7D&_=1736116313987
// source_url=%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac
// &data=%7B%22options%22%3A%7B%22applied_unified_filters%22%3Anull%2C%22appliedProductFilters%22%3A%22---%22%2C%22article%22%3Anull%2C%22auto_correction_disabled%22%3Afalse%2C%22corpus%22%3Anull%2C%22customized_rerank_type%22%3Anull%2C%22domains%22%3Anull%2C%22dynamicPageSizeExpGroup%22%3Anull%2C%22filters%22%3Anull%2C%22journey_depth%22%3Anull%2C%22page_size%22%3Anull%2C%22price_max%22%3Anull%2C%22price_min%22%3Anull%2C%22query_pin_sigs%22%3Anull%2C%22query%22%3A%22higurashi%20when%20they%20cry%22%2C%22redux_normalize_feed%22%3Atrue%2C%22request_params%22%3Anull%2C%22rs%22%3A%22ac%22%2C%22scope%22%3A%22pins%22%2C%22selected_one_bar_modules%22%3Anull%2C%22source_id%22%3Anull%2C%22source_module_id%22%3Anull%2C%22source_url%22%3A%22%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac%22%2C%22top_pin_id%22%3Anull%2C%22top_pin_ids%22%3Anull%7D%2C%22context%22%3A%7B%7D%7D
// &_=1736116313987
$source_url = "/search/pins/?q=" . urlencode($search) . "&rs=" . urlencode($search);
$filter = [
"source_url" => "/search/pins/?q=" . urlencode($search),
"source_url" => $source_url,
"rs" => "typed",
"data" =>
json_encode(
[
"options" => [
"article" => null,
"applied_filters" => null,
"applied_unified_filters" => null,
"appliedProductFilters" => "---",
"auto_correction_disabled" => false,
"article" => null,
"corpus" => null,
"customized_rerank_type" => null,
"domains" => null,
"dynamicPageSizeExpGroup" => null,
"filters" => null,
"query" => $search,
"journey_depth" => null,
"page_size" => null,
"price_max" => null,
"price_min" => null,
"query_pin_sigs" => null,
"query" => $search,
"redux_normalize_feed" => true,
"rs" => "typed",
"request_params" => null,
"rs" => "ac",
"scope" => "pins", // pins, boards, videos,
"source_id" => null
"selected_one_bar_modules" => null,
"source_id" => null,
"source_module_id" => null,
"source_url" => $source_url,
"top_pin_id" => null,
"top_pin_ids" => null
],
"context" => []
]
@@ -110,24 +272,26 @@ class pinterest{
];
$proxy = $this->backend->get_ip();
}
try{
$json =
json_decode(
$cookies = [];
try{
$json =
$this->get(
$proxy,
"https://www.pinterest.ca/resource/BaseSearchResource/get/",
$filter
),
true
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
"https://ca.pinterest.com/resource/BaseSearchResource/get/",
$filter,
$cookies,
null
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
@@ -139,6 +303,60 @@ class pinterest{
"image" => []
];
if(
!isset(
$json["resource_response"]
["status"]
)
){
throw new Exception("Unknown API failure");
}
if($json["resource_response"]["status"] != "success"){
$status = "Got non-OK response: " . $json["resource_response"]["status"];
if(
isset(
$json["resource_response"]["message"]
)
){
$status .= " - " . $json["resource_response"]["message"];
}
throw new Exception($status);
}
if(
isset(
$json["resource_response"]["sensitivity"]
["notices"][0]["description"]["text"]
)
){
throw new Exception(
"Pinterest returned a notice: " .
$json["resource_response"]["sensitivity"]["notices"][0]["description"]["text"]
);
}
// get NPT
if(isset($json["resource_response"]["bookmark"])){
$out["npt"] =
$this->backend->store(
json_encode([
"q" => $search,
"bookmark" => $json["resource_response"]["bookmark"],
"cookies" => $cookies
]),
"images",
$proxy
);
}
foreach(
$json
["resource_response"]
@@ -150,6 +368,7 @@ class pinterest{
switch($item["type"]){
case "pin":
case "board":
/*
Handle image object
@@ -206,42 +425,15 @@ class pinterest{
"height" => (int)$thumb["height"]
]
],
"url" => "https://www.pinterest.com/pin/" . $item["id"]
"url" =>
$item["link"] === null ?
"https://ca.pinterest.com/pin/" . $item["id"] :
$item["link"]
];
break;
case "board":
if(isset($item["cover_pin"]["image_url"])){
$image = [
"url" => $item["cover_pin"]["image_url"],
"width" => (int)$item["cover_pin"]["size"][0],
"height" => (int)$item["cover_pin"]["size"][1]
];
}elseif(isset($item["image_cover_url_hd"])){
/*
$image = [
"url" =>
"width" => null,
"height" => null
];*/
}
break;
}
}
return $out;
}
private function getfullresimage($image, $has_og){
$has_og = $has_og ? "1200x" : "originals";
return
preg_replace(
'/https:\/\/i\.pinimg\.com\/[^\/]+\//',
"https://i.pinimg.com/" . $has_og . "/",
$image
);
}
}

View File

@@ -410,10 +410,7 @@ class qwant{
"thumb" =>
$answer["data"]["result"]["thumbnail"]["landscape"] == null ?
null :
$this->unshitimage(
$answer["data"]["result"]["thumbnail"]["landscape"],
false
),
$this->unshitimage($answer["data"]["result"]["thumbnail"]["landscape"]),
"table" => [],
"sublink" => []
];
@@ -770,7 +767,7 @@ class qwant{
}else{
$thumb = [
"url" => $this->unshitimage($video["thumbnail"], false),
"url" => $this->unshitimage($video["thumbnail"]),
"ratio" => "16:9"
];
}
@@ -870,7 +867,7 @@ class qwant{
}else{
$thumb = [
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"], false),
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"]),
"ratio" => "16:9"
];
}
@@ -920,18 +917,77 @@ class qwant{
return trim($text, ". ");
}
private function unshitimage($url, $is_bing = true){
private function unshitimage($url){
// https://s1.qwant.com/thumbr/0x0/8/d/f6de4deb2c2b12f55d8bdcaae576f9f62fd58a05ec0feeac117b354d1bf5c2/th.jpg?u=https%3A%2F%2Fwww.bing.com%2Fth%3Fid%3DOIP.vvDWsagzxjoKKP_rOqhwrQAAAA%26w%3D160%26h%3D160%26c%3D7%26pid%3D5.1&q=0&b=1&p=0&a=0
parse_str(parse_url($url)["query"], $parts);
// https://s2.qwant.com/thumbr/474x289/7/f/412d13b3fe3a03eb2b89633c8e88b609b7d0b93cdd9a5e52db3c663e41e65e/th.jpg?u=https%3A%2F%2Ftse.mm.bing.net%2Fth%3Fid%3DOIP.9Tm_Eo6m7V7ltN19mxduDgHaEh%26pid%3DApi&q=0&b=1&p=0&a=0
if($is_bing){
$parse = parse_url($parts["u"]);
parse_str($parse["query"], $parts);
$image = parse_url($url);
if(
!isset($image["host"]) ||
!isset($image["query"])
){
return "https://" . $parse["host"] . "/th?id=" . urlencode($parts["id"]);
// cant do anything
return $url;
}
return $parts["u"];
$id = null;
if(
preg_match(
'/s[0-9]+\.qwant\.com$/',
$image["host"]
)
){
parse_str($image["query"], $str);
// we're being served a proxy URL
if(isset($str["u"])){
$bing_url = $str["u"];
}else{
// give up
return $url;
}
}
// parse bing URL
$id = null;
$image = parse_url($bing_url);
if(isset($image["query"])){
parse_str($image["query"], $str);
if(isset($str["id"])){
$id = $str["id"];
}
}
if($id === null){
$id = explode("/th/id/", $image["path"], 2);
if(count($id) !== 2){
// malformed
return $url;
}
$id = $id[1];
}
if(is_array($id)){
// fuck off, let proxy.php deal with it
return $url;
}
return "https://" . $image["host"] . "/th?id=" . rawurlencode($id);
}
}

541
scraper/sepiasearch.php Normal file
View File

@@ -0,0 +1,541 @@
<?php
class sepiasearch{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("sepiasearch");
}
public function getfilters($page){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes", // &sensitiveContent=both
"no" => "No" // &sensitiveContent=false
]
],
"language" => [
"display" => "Language", // &language=
"option" => [
"any" => "Any language",
"en" => "English",
"fr" => "Français",
"ar" => "العربية",
"ca" => "Català",
"cs" => "Čeština",
"de" => "Deutsch",
"el" => "ελληνικά",
"eo" => "Esperanto",
"es" => "Español",
"eu" => "Euskara",
"fa" => "فارسی",
"fi" => "Suomi",
"gd" => "Gàidhlig",
"gl" => "Galego",
"hr" => "Hrvatski",
"hu" => "Magyar",
"is" => "Íslenska",
"it" => "Italiano",
"ja" => "日本語",
"kab" => "Taqbaylit",
"nl" => "Nederlands",
"no" => "Norsk",
"oc" => "Occitan",
"pl" => "Polski",
"pt" => "Português (Brasil)",
"pt-PT" => "Português (Portugal)",
"ru" => "Pусский",
"sk" => "Slovenčina",
"sq" => "Shqip",
"sv" => "Svenska",
"th" => "ไทย",
"tok" => "Toki Pona",
"tr" => "Türkçe",
"uk" => "украї́нська мо́ва",
"vi" => "Tiếng Việt",
"zh-Hans" => "简体中文(中国)",
"zh-Hant" => "繁體中文(台灣)"
]
],
"type" => [
"display" => "Result type", // i handle this
"option" => [
"videos" => "Videos",
"playlists" => "Playlists",
"channels" => "Channels"
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"best" => "Best match", // no filter
"-publishedAt" => "Newest", // sort=-publishedAt
"publishedAt" => "Oldest" // sort=publishedAt
]
],
"newer" => [ // &startDate=2025-07-26T04:00:00.000Z
"display" => "Newer than",
"option" => "_DATE"
],
"duration" => [
"display" => "Duration",
"option" => [
"any" => "Any duration",
"short" => "Short (0-4mins)", // &durationRange=short
"medium" => "Medium (4-10 mins)",
"long" => "Long (10+ mins)",
]
],
"category" => [
"display" => "Category", // &categoryOneOf[]=
"option" => [
"any" => "Any category",
"1" => "Music",
"2" => "Films",
"3" => "Vehicles",
"4" => "Art",
"5" => "Sports",
"6" => "Travels",
"7" => "Gaming",
"8" => "People",
"9" => "Comedy",
"10" => "Entertainment",
"11" => "News & Politics",
"12" => "How To",
"13" => "Education",
"14" => "Activism",
"15" => "Science & Technology",
"16" => "Animals",
"17" => "Kids",
"18" => "Food"
]
],
"display" => [
"display" => "Display",
"option" => [
"any" => "Everything",
"true" => "Live videos", // &isLive=true
"false" => "VODs" // &isLive=false
]
],
"license" => [
"display" => "License", // &license=
"option" => [
"any" => "Any license",
"1" => "Attribution",
"2" => "Attribution - Share Alike",
"3" => "Attribution - No Derivatives",
"4" => "Attribution - Non Commercial",
"5" => "Attribution - Non Commercial - Share Alike",
"6" => "Attribution - Non Commercial - No Derivatives",
"7" => "Public Domain Dedication"
]
]
];
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/plain, */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://sepiasearch.org/search",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=0",
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function video($get){
if($get["npt"]){
[$npt, $proxy] =
$this->backend
->get(
$get["npt"],
"videos"
);
$npt = json_decode($npt, true);
$type = $npt["type"];
$npt = $npt["npt"];
}else{
$proxy = $this->backend->get_ip();
$npt = [
"search" => $get["s"],
"start" => 0,
"count" => 20
];
if($get["type"] == "videos"){
//
// Parse video filters
//
switch($get["nsfw"]){
case "yes": $npt["nsfw"] = "both"; break;
case "no": $npt["nsfw"] = "false"; break;
}
$npt["boostLanguages[]"] = "en";
if($get["language"] != "any"){
$npt["languageOneOf[]"] = $get["language"];
}
if($get["sort"] != "best"){
$npt["sort"] = $get["sort"];
}
if($get["newer"] !== false){
$date = new DateTime("@{$get["newer"]}");
$date->setTimezone(new DateTimeZone("UTC"));
$formatted = $date->format("Y-m-d\TH:i:s.000\Z");
$npt["startDate"] = $formatted;
}
switch($get["duration"]){
case "short":
$npt["durationMax"] = 240;
break;
case "medium":
$npt["durationMin"] = 240;
$npt["durationMax"] = 600;
break;
case "long":
$npt["durationMin"] = 600;
break;
}
if($get["category"] != "any"){
$npt["categoryOneOf[]"] = $get["category"];
}
if($get["display"] != "any"){
$npt["isLive"] = $get["display"];
}
if($get["license"] != "any"){
// typo in license, lol
$npt["licenceOneOf[]"] = $get["license"];
}
}
$type = $get["type"];
}
switch($type){
case "videos":
$url = "https://sepiasearch.org/api/v1/search/videos";
break;
case "channels":
$url = "https://sepiasearch.org/api/v1/search/video-channels";
break;
case "playlists":
$url = "https://sepiasearch.org/api/v1/search/video-playlists";
break;
}
//$json = file_get_contents("scraper/sepia.json");
try{
$json =
$this->get(
$proxy,
$url,
$npt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to parse JSON");
}
if(isset($json["errors"])){
$msg = [];
foreach($json["errors"] as $error){
if(isset($error["msg"])){
$msg[] = $error["msg"];
}
}
throw new Exception("Sepia Search returned error(s): " . implode(", ", $msg));
}
if(!isset($json["data"])){
throw new Exception("Sepia Search did not return a data object");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
switch($get["type"]){
case "videos":
foreach($json["data"] as $video){
if(count($video["account"]["avatars"]) !== 0){
$avatar =
$video["account"]["avatars"][count($video["account"]["avatars"]) - 1]["url"];
}else{
$avatar = null;
}
if($video["thumbnailUrl"] === null){
$thumb = [
"ratio" => null,
"url" => null
];
}else{
$thumb = [
"ratio" => "16:9",
"url" => $video["thumbnailUrl"]
];
}
if($video["isLive"]){
$append = "livestream";
}else{
$append = "video";
}
$out[$append][] = [
"title" => $video["name"],
"description" =>
$this->limitstrlen(
$this->titledots(
$video["description"]
)
),
"author" => [
"name" => $video["account"]["displayName"] . " ({$video["account"]["name"]})",
"url" => $video["account"]["url"],
"avatar" => $avatar
],
"date" => strtotime($video["publishedAt"]),
"duration" => $video["isLive"] ? "_LIVE" : $video["duration"],
"views" => $video["views"],
"thumb" => $thumb,
"url" => $video["url"]
];
}
break;
case "playlists":
foreach($json["data"] as $playlist){
if(count($playlist["ownerAccount"]["avatars"]) !== 0){
$avatar =
$playlist["ownerAccount"]["avatars"][count($playlist["ownerAccount"]["avatars"]) - 1]["url"];
}else{
$avatar = null;
}
if($playlist["thumbnailUrl"] === null){
$thumb = [
"ratio" => null,
"url" => null
];
}else{
$thumb = [
"ratio" => "16:9",
"url" => $playlist["thumbnailUrl"]
];
}
$out["playlist"][] = [
"title" => $playlist["displayName"],
"description" =>
$this->limitstrlen(
$this->titledots(
$playlist["description"]
)
),
"author" => [
"name" => $playlist["ownerAccount"]["displayName"] . " ({$playlist["ownerAccount"]["name"]})",
"url" => $playlist["ownerAccount"]["url"],
"avatar" => $avatar
],
"date" => strtotime($playlist["createdAt"]),
"duration" => $playlist["videosLength"],
"views" => null,
"thumb" => $thumb,
"url" => $playlist["url"]
];
}
break;
case "channels":
foreach($json["data"] as $channel){
if(count($channel["avatars"]) !== 0){
$thumb = [
"ratio" => "1:1",
"url" => $channel["avatars"][count($channel["avatars"]) - 1]["url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $channel["displayName"] . " ({$channel["name"]})",
"followers" => $channel["followersCount"],
"description" =>
$channel["videosCount"] . " videos. " .
$this->limitstrlen(
$this->titledots(
$channel["description"]
)
),
"thumb" => $thumb,
"url" => $channel["url"]
];
}
break;
}
// get next page
if($json["total"] - 20 > $npt["start"]){
$npt["start"] += 20;
$npt = [
"type" => $get["type"],
"npt" => $npt
];
$out["npt"] =
$this->backend
->store(
json_encode($npt),
"videos",
$proxy
);
}
return $out;
}
private function titledots($title){
$substr = substr($title, -3);
if(
$substr == "..." ||
$substr == ""
){
return trim(substr($title, 0, -3), " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
return trim($title, " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
private function limitstrlen($text){
return
explode(
"\n",
wordwrap(
str_replace(
["\n\r", "\r\n", "\n", "\r"],
" ",
$text
),
300,
"\n"
),
2
)[0];
}
}

View File

@@ -1226,7 +1226,12 @@ class startpage{
// get results
foreach($json["render"]["presenter"]["regions"]["mainline"] as $category){
if($category["display_type"] == "video-youtube"){
if(
preg_match(
'/^video-/i',
$category["display_type"]
)
){
foreach($category["results"] as $video){
@@ -1248,7 +1253,7 @@ class startpage{
}
$out["video"][] = [
"title" => $video["title"],
"title" => str_replace(["", ""], "", $video["title"]),
"description" => $this->limitstrlen($video["description"]),
"author" => [
"name" => $video["channelTitle"],
@@ -1256,7 +1261,7 @@ class startpage{
"avatar" => null
],
"date" => strtotime($video["publishDate"]),
"duration" => $this->hms2int($video["duration"]),
"duration" => $this->hms2int($category["display_type"] == "video-youtube" ? $video["duration"] : $video["duration"] / 1000),
"views" => (int)$video["viewCount"],
"thumb" => $thumb,
"url" => $video["clickUrl"]

754
scraper/vimeo.php Normal file
View File

@@ -0,0 +1,754 @@
<?php
class vimeo{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("vimeo");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function getfilters($page){
return [
"time" => [
"display" => "Date uploaded", // &filter_uploaded=
"option" => [
"any" => "Any time",
"today" => "Last 24 hours",
"this-week" => "Last 7 days",
"this-month" => "Last 30 days",
"this-year" => "Last 365 days",
]
],
"display" => [
"display" => "Display",
"option" => [
"video" => "Videos",
"ondemand" => "On-Demand ($$)",
"people" => "People",
"channel" => "Channels",
"group" => "Groups"
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"relevance" => "Relevance", // no param
"recent" => "Newest", // &sort=latest&direction=desc
"popular" => "Most popular", // &sort=popularity&direction=desc
"a_z" => "Title, A to Z", // &sort=alphabetical&direction=asc
"z_a" => "Title, Z to A", // &sort=alphabetical&direction=desc
"longest" => "Longest", // &sort=duration&direction=desc
"shortest" => "Shortest", // &sort=duration&direction=asc
]
],
"duration" => [
"display" => "Duration", // &filter_duration=
"option" => [
"any" => "Any duration",
"short" => "Short (less than 4 minutes)",
"medium" => "Medium (4-10 minutes)",
"long" => "Long (over 10 minutes)"
]
],
"resolution" => [
"display" => "Resolution",
"option" => [
"any" => "Any resolution",
"4k" => "4K" // &filter_resolution=4k
]
],
"category" => [
"display" => "Category", // &filter_category=
"option" => [
"any" => "Any category",
"animation" => "Animation",
"comedy" => "Comedy",
"music" => "Music",
"experimental" => "Experimental",
"documentary" => "Documentary",
"identsandanimatedlogos" => "Idents and Animated Logos",
"industry" => "Industry",
"instructionals" => "Instructionals",
"narrative" => "Narrative",
"personal" => "Personal"
]
],
"live" => [
"display" => "Live events",
"option" => [
"any" => "Any",
"yes" => "Live now" // &filter_live=now
]
],
"hdr" => [
"display" => "HDR", // &filter_hdr=
"option" => [
"any" => "Any",
"hdr" => "Any HDR",
"dolby_vision" => "Dolby Vision",
"hdr10" => "HDR10",
"hdr10+" => "HDR10+"
]
],
"vimeo_360" => [
"display" => "Vimeo 360°", // &filter_vimeo_360
"option" => [
"any" => "Any",
"spatial" => "Spatial",
"360" => "360°"
]
],
"price" => [ // &filter_price=
"display" => "Price",
"option" => [
"any" => "Any price",
"free" => "Free",
"paid" => "Paid"
]
],
"collection" => [
"display" => "Vimeo collections",
"option" => [
"any" => "Any collection",
"staff_pick" => "Staff picks" // &filter_staffpicked=true
]
],
"license" => [ // &filter_license=
"display" => "License",
"option" => [
"any" => "Any license",
"by-nc-nd" => "CC BY-NC-ND",
"by" => "CC BY",
"by-nc" => "CC BY-NC",
"by-nc-sa" => "CC BY-NC-SA",
"by-nd" => "CC BY-ND",
"by-sa" => "CC BY-SA",
"cc0" => "CC0"
]
]
];
}
private function get($proxy, $url, $get = [], $jwt = false){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($jwt === false){
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"Referer: https://vimeo.com/search",
"X-Requested-With: XMLHttpRequest",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=4"]
);
}else{
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/vnd.vimeo.*+json;version=3.3",
"Accept-Language: en",
"Accept-Encoding: gzip, deflate, br, zstd",
"Referer: https://vimeo.com/",
"Content-Type: application/json",
"Authorization: jwt $jwt",
"Vimeo-Page: /search/[[...slug]]",
"Origin: https://vimeo.com",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site",
"Priority: u=4"]
);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function video($get){
// parse shit
if($get["npt"]){
[$npt, $proxy] =
$this->backend
->get(
$get["npt"],
"videos"
);
$npt = json_decode($npt, true);
$pagetype = $npt["pagetype"];
$npt = $npt["npt"];
$jwt = $this->get_jwt($proxy);
try{
$json =
$this->get(
$proxy,
"https://api.vimeo.com" . $npt,
[],
$jwt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}else{
$proxy = null;
$jwt = $this->get_jwt($proxy); // this gives us a proxy by reference
// parse filters
$npt = [
"query" => $get["s"],
"page" => 1,
"per_page" => 24,
"facets" => "type"
];
switch($get["display"]){
case "video":
$npt["filter_type"] = "clip";
$npt["fields"] = "clip.name,stats.plays,clip.pictures,clip.user.name,clip.user.link,clip.user.pictures.sizes,clip.uri,clip.stats.plays,clip.duration,clip.created_time,clip.link,clip.description";
break;
case "ondemand":
$npt["filter_type"] = "ondemand";
$npt["sizes"] = "296x744";
$npt["fields"] = "ondemand.link,ondemand.name,ondemand.pictures.sizes,ondemand.metadata.interactions.buy,ondemand.metadata.interactions.rent,ondemand.uri";
break;
case "people":
$npt["filter_type"] = "people";
$npt["fetch_user_profile"] = "1";
$npt["fields"] = "people.name,people.location_details.formatted_address,people.metadata.public_videos.total,people.pictures.sizes,people.link,people.metadata.connections.followers.total,people.skills.name,people.skills.uri,people.background_video,people.uri";
break;
case "channel":
$npt["filter_type"] = "channel";
$npt["fields"] = "channel.name,channel.metadata.connections.users.total,channel.metadata.connections.videos.total,channel.pictures.sizes,channel.link,channel.uri";
break;
case "group":
$npt["filter_type"] = "group";
$npt["fields"] = "group.name,group.metadata.connections.users.total,group.metadata.connections.videos.total,group.pictures.sizes,group.link,group.uri";
break;
}
// only apply filters if we're searching for videos
if($get["display"] == "video"){
switch($get["sort"]){
case "relevance": break; // do nothing
case "recent":
$npt["sort"] = "latest";
$npt["direction"] = "desc";
break;
case "popular":
$npt["sort"] = "popularity";
$npt["direction"] = "desc";
break;
case "a_z":
$npt["sort"] = "alphabetical";
$npt["direction"] = "asc";
break;
case "z_a":
$npt["sort"] = "alphabetical";
$npt["direction"] = "desc";
break;
case "longest":
$npt["sort"] = "duration";
$npt["direction"] = "desc";
break;
case "shortest":
$npt["sort"] = "duration";
$npt["direction"] = "asc";
break;
}
if($get["time"] != "any"){
$npt["filter_uploaded"] = $get["time"];
}
if($get["duration"] != "any"){
$npt["filter_duration"] = $get["duration"];
}
if($get["resolution"] != "any"){
$npt["filter_resolution"] = $get["resolution"];
}
if($get["category"] != "any"){
$npt["filter_category"] = $get["category"];
}
if($get["live"] != "any"){
$npt["filter_live"] = "now";
}
if($get["hdr"] != "any"){
$npt["filter_hdr"] = $get["hdr"];
}
if($get["vimeo_360"] != "any"){
$npt["filter_vimeo_360"] = $get["vimeo_360"];
}
if($get["price"] != "any"){
$npt["filter_price"] = $get["price"];
}
if($get["collection"] == "staff_pick"){
$npt["filter_staffpicked"] = "true";
}
if($get["license"] != "any"){
$npt["filter_license"] = $get["license"];
}
}
$pagetype = $npt["filter_type"];
try{
$json =
$this->get(
$proxy,
"https://api.vimeo.com/search",
$npt,
$jwt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to parse JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
if(isset($json["error"])){
$error = $json["error"];
if(isset($json["developer_message"])){
$error .= " ({$json["developer_message"]})";
}
throw new Exception("Vimeo returned an error: " . $error);
}
if(!isset($json["data"])){
throw new Exception("Vimeo did not return a data object");
}
switch($pagetype){
case "clip":
foreach($json["data"] as $video){
$video = $video["clip"];
if(isset($video["user"]["pictures"]["sizes"])){
$avatar = $video["user"]["pictures"]["sizes"][count($video["user"]["pictures"]["sizes"]) - 1]["link"];
}else{
$avatar = null;
}
$out["video"][] = [
"title" => $video["name"],
"description" =>
$this->limitstrlen(
$video["description"]
),
"author" => [
"name" => $video["user"]["name"],
"url" => $video["user"]["link"],
"avatar" => $avatar
],
"date" => strtotime($video["created_time"]),
"duration" => (int)$video["duration"],
"views" => (int)$video["stats"]["plays"],
"thumb" => [
"ratio" => "16:9",
"url" => $video["pictures"]["base_link"]
],
"url" => $video["link"]
];
}
break;
case "ondemand":
foreach($json["data"] as $video){
$video = $video["ondemand"];
$description = [];
if(isset($video["metadata"]["interactions"]["rent"]["display_price"])){
$description[] = "Rent for " . $video["metadata"]["interactions"]["rent"]["display_price"];
}
if(isset($video["metadata"]["interactions"]["buy"]["display_price"])){
$description[] = "Buy for " . $video["metadata"]["interactions"]["buy"]["display_price"];
}
$description = implode(", ", $description);
$out["video"][] = [
"title" => $video["name"],
"description" => $description,
"author" => [
"name" => null,
"url" => null,
"avatar" => null
],
"date" => null,
"duration" => null,
"views" => null,
"thumb" => [
"ratio" => "9:16",
"url" => $video["pictures"]["sizes"][0]["link"]
],
"url" => $video["link"]
];
}
break;
case "people":
foreach($json["data"] as $user){
$user = $user["people"];
if(
isset($user["pictures"]["sizes"]) &&
count($user["pictures"]["sizes"]) !== 0
){
$thumb = [
"ratio" => "1:1",
"url" => $user["pictures"]["sizes"][count($user["pictures"]["sizes"]) - 1]["link"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $user["name"],
"followers" => (int)$user["metadata"]["connections"]["followers"]["total"],
"description" => $user["metadata"]["public_videos"]["total"] . " videos.",
"thumb" => $thumb,
"url" => $user["link"]
];
}
break;
case "channel":
case "group":
foreach($json["data"] as $channel){
$channel = $channel[$npt["filter_type"]];
if(
isset($channel["pictures"]["sizes"]) &&
count($channel["pictures"]["sizes"]) !== 0
){
$thumb = [
"ratio" => "16:9",
"url" => $channel["pictures"]["sizes"][count($channel["pictures"]["sizes"]) - 1]["link"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $channel["name"],
"followers" => (int)$channel["metadata"]["connections"]["users"]["total"],
"description" => $channel["metadata"]["connections"]["videos"]["total"] . " videos.",
"thumb" => $thumb,
"url" => $channel["link"]
];
}
break;
}
//
// get next page
//
if(
isset($json["paging"]["next"]) &&
is_string($json["paging"]["next"])
){
$out["npt"] =
$this->backend
->store(
json_encode([
"npt" => $json["paging"]["next"],
"pagetype" => $pagetype
]),
"videos",
$proxy
);
}
return $out;
}
private function get_jwt(&$proxy){
//
// get jwt token
// it's probably safe to cache this across proxies, cause the jwt doesnt contain an userID
// only an appID, whatever shit that is
// we can only cache it for 5 minutes though, otherwise vimeo cries about it
//
if($proxy === null){
$proxy = $this->backend->get_ip();
}
$jwt = apcu_fetch("vimeo_jwt");
if($jwt === false){
/*
$html =
$this->get(
$proxy,
"https://vimeo.com/search",
[],
false
);
$this->fuckhtml->load($html);
$captcha =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($captcha) !== 0 &&
$this->fuckhtml
->getTextContent(
$captcha[0]
) == "Vimeo / CAPTCHA Challenge"
){
throw new Exception("Vimeo returned a Captcha");
}
$html =
explode(
'<script id="viewer-bootstrap" type="application/json">',
$html,
2
);
if(count($html) !== 2){
throw new Exception("Failed to find JWT json");
}
$jwt =
json_decode(
$this->fuckhtml
->extract_json(
$html[1]
),
true
);
if($jwt === null){
throw new Exception("Failed to decode JWT json");
}
if(!isset($jwt["jwt"])){
throw new Exception("Failed to grep JWT");
}
$jwt = $jwt["jwt"];
*/
try{
$json =
$this->get(
$proxy,
"https://vimeo.com/_next/jwt",
[],
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch JWT token");
}
$this->fuckhtml->load($json);
$captcha =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($captcha) !== 0 &&
$this->fuckhtml
->getTextContent(
$captcha[0]
) == "Vimeo / CAPTCHA Challenge"
){
throw new Exception("Vimeo returned a Captcha");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("The JWT object could not be decoded");
}
if(!isset($json["token"])){
throw new Exception("Vimeo did not return a JWT");
}
$jwt = $json["token"];
apcu_store("vimeo_jwt", $jwt, 300);
}
return $jwt;
}
private function titledots($title){
$substr = substr($title, -3);
if(
$substr == "..." ||
$substr == ""
){
return trim(substr($title, 0, -3), " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
return trim($title, " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
private function limitstrlen($text){
return
explode(
"\n",
wordwrap(
str_replace(
["\n\r", "\r\n", "\n", "\r"],
" ",
$text
),
300,
"\n"
),
2
)[0];
}
}

257
scraper/vsco.php Normal file
View File

@@ -0,0 +1,257 @@
<?php
class vsco{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("vsco");
}
public function getfilters($page){
return [];
}
private function get($proxy, $url, $get = [], $bearer = null){
$curlproc = curl_init();
if($get !== []){
$get_tmp = http_build_query($get);
$url .= "?" . $get_tmp;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($bearer === null){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i",
"TE: trailers"]
);
}else{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US",
"Accept-Encoding: gzip",
"Referer: https://vsco.co/search/images/" . urlencode($get["query"]),
"authorization: Bearer " . $bearer,
"content-type: application/json",
"x-client-build: 1",
"x-client-platform: web",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=0",
"TE: trailers"]
);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$data, $proxy] =
$this->backend->get(
$get["npt"], "images"
);
$data = json_decode($data, true);
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
// get bearer token
try{
$html =
$this->get(
$proxy,
"https://vsco.co/feed"
);
}catch(Exception $error){
throw new Exception("Failed to fetch feed page");
}
preg_match(
'/"tkn":"([A-z0-9]+)"/',
$html,
$bearer
);
if(!isset($bearer[1])){
throw new Exception("Failed to grep bearer token");
}
$data = [
"pagination" => [
"query" => $search,
"page" => 0,
"size" => 100
],
"bearer" => $bearer[1]
];
}
try{
$json =
$this->get(
$proxy,
"https://vsco.co/api/2.0/search/images",
$data["pagination"],
$data["bearer"]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
if(!isset($json["results"])){
throw new Exception("Failed to access results object");
}
foreach($json["results"] as $image){
$image_domain = parse_url("https://" . $image["responsive_url"], PHP_URL_HOST);
$thumbnail = explode($image_domain, $image["responsive_url"], 2)[1];
if(substr($thumbnail, 0, 3) != "/1/"){
$thumbnail =
preg_replace(
'/^\/[^\/]+/',
"",
$thumbnail
);
}
$thumbnail = "https://img.vsco.co/cdn-cgi/image/width=480,height=360" . $thumbnail;
$size =
$this->image_ratio(
(int)$image["dimensions"]["width"],
(int)$image["dimensions"]["height"]
);
$out["image"][] = [
"title" => $image["description"],
"source" => [
[
"url" => "https://" . $image["responsive_url"],
"width" => (int)$image["dimensions"]["width"],
"height" => (int)$image["dimensions"]["height"]
],
[
"url" => $thumbnail,
"width" => $size[0],
"height" => $size[1]
]
],
"url" => "https://" . $image["grid"]["domain"] . "/media/" . $image["imageId"]
];
}
// get NPT
$max_page = ceil($json["total"] / 100);
$data["pagination"]["page"]++;
if($max_page > $data["pagination"]["page"]){
$out["npt"] =
$this->backend->store(
json_encode($data),
"images",
$proxy
);
}
return $out;
}
private function image_ratio($width, $height){
$ratio = [
480 / $width,
360 / $height
];
if($ratio[0] < $ratio[1]){
$ratio = $ratio[0];
}else{
$ratio = $ratio[1];
}
return [
floor($width * $ratio),
floor($height * $ratio)
];
}
}

View File

@@ -14,7 +14,7 @@ class yandex{
// backend included in the scraper functions
}
private function get($proxy, $url, $get = [], $nsfw){
private function get($proxy, $url, $get = [], $nsfw, $get_cookie = 1){
$curlproc = curl_init();
@@ -25,19 +25,55 @@ class yandex{
curl_setopt($curlproc, CURLOPT_URL, $url);
// extract "i" cookie
if($get_cookie === 0){
$cookies_tmp = [];
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
$length = strlen($header);
$header = explode(":", $header, 2);
if(trim(strtolower($header[0])) == "set-cookie"){
$cookie_tmp = explode("=", trim($header[1]), 2);
$cookies_tmp[trim($cookie_tmp[0])] =
explode(";", $cookie_tmp[1], 2)[0];
}
return $length;
});
}
switch($nsfw){
case "yes": $nsfw = "0"; break;
case "maybe": $nsfw = "1"; break;
case "no": $nsfw = "2"; break;
}
switch($get_cookie){
case 0:
$cookie = "";
break;
case 1:
$cookie = "Cookie: yp=" . (time() - 4000033) . ".szm.1:1920x1080:876x1000#" . time() . ".sp.family:" . $nsfw;
break;
default:
$cookie = "Cookie: i=" . $get_cookie;
}
$headers =
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Encoding: gzip",
"Accept-Language: en-US,en;q=0.5",
"DNT: 1",
"Cookie: yp=1716337604.sp.family%3A{$nsfw}#1685406411.szm.1:1920x1080:1920x999",
$cookie,
"Referer: https://yandex.com/images/search",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
@@ -59,6 +95,17 @@ class yandex{
$data = curl_exec($curlproc);
if($get_cookie === 0){
if(isset($cookies_tmp["i"])){
return $cookies_tmp["i"];
}else{
throw new Exception("Failed to get Yandex clearance cookie");
}
}
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
@@ -217,6 +264,23 @@ class yandex{
// https://yandex.com/search/site/?text=minecraft&web=1&frame=1&v=2.0&searchid=3131712
// &within=777&from_day=26&from_month=8&from_year=2023&to_day=26&to_month=8&to_year=2023
// get clearance cookie
if(($cookie = apcu_fetch("yandexweb_cookie")) === false){
$proxy = $this->backend->get_ip();
$cookie =
$this->get(
$proxy,
"https://yandex.ru/support2/smart-captcha/ru/",
[],
false,
0
);
apcu_store("yandexweb_cookie", $cookie);
}
if($get["npt"]){
[$npt, $proxy] = $this->backend->get($get["npt"], "web");
@@ -226,7 +290,8 @@ class yandex{
$proxy,
"https://yandex.com" . $npt,
[],
"yes"
"yes",
$cookie
);
}else{
@@ -236,7 +301,7 @@ class yandex{
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
$proxy = !isset($proxy) ? $this->backend->get_ip() : $proxy;
$lang = $get["lang"];
$older = $get["older"];
$newer = $get["newer"];
@@ -283,7 +348,8 @@ class yandex{
$proxy,
"https://yandex.com/search/site/",
$params,
"yes"
"yes",
$cookie
);
}catch(Exception $error){
@@ -314,6 +380,19 @@ class yandex{
$this->fuckhtml->load($html);
// Scrape page blocked error
$title =
$this->fuckhtml
->getElementsByTagName("title");
if(
count($title) !== 0 &&
$title[0]["innerHTML"] == "403"
){
throw new Exception("Yandex blocked this proxy or 4get instance.");
}
// get nextpage
$npt =
$this->fuckhtml
@@ -668,7 +747,6 @@ class yandex{
foreach($json["blocks"] as $block){
$html .= $block["html"];
// get next page
if(
isset($block["params"]["nextPageUrl"]) &&

View File

@@ -255,13 +255,6 @@ class yep{
// use http2
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
// set ciphers
curl_setopt(
$curlproc,
CURLOPT_SSL_CIPHER_LIST,
"aes_128_gcm_sha_256,chacha20_poly1305_sha_256,aes_256_gcm_sha_384,ecdhe_ecdsa_aes_128_gcm_sha_256,ecdhe_rsa_aes_128_gcm_sha_256,ecdhe_ecdsa_chacha20_poly1305_sha_256,ecdhe_rsa_chacha20_poly1305_sha_256,ecdhe_ecdsa_aes_256_gcm_sha_384,ecdhe_rsa_aes_256_gcm_sha_384,ecdhe_ecdsa_aes_256_sha,ecdhe_ecdsa_aes_128_sha,ecdhe_rsa_aes_128_sha,ecdhe_rsa_aes_256_sha,rsa_aes_128_gcm_sha_256,rsa_aes_256_gcm_sha_384,rsa_aes_128_sha,rsa_aes_256_sha"
);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
@@ -351,6 +344,7 @@ class yep{
"type" => "web"
]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");

View File

@@ -1209,15 +1209,16 @@ class yt{
$reel =
$reel
->reelItemRenderer;
->shortsLockupViewModel;
array_push(
$this->out["reel"],
[
"title" =>
$reel
->headline
->simpleText,
->overlayMetadata
->primaryText
->content,
"description" => null,
"author" => [
"name" => null,
@@ -1225,30 +1226,22 @@ class yt{
"avatar" => null
],
"date" => null,
"duration" =>
$this->textualtime2int(
$reel
->accessibility
->accessibilityData
->label
),
"views" =>
$this->truncatedcount2int(
$reel
->viewCountText
->simpleText
),
"duration" => null,
"views" => null,
"thumb" => [
"url" =>
$reel
->thumbnail
->thumbnails[0]
->sources[0]
->url,
"ratio" => "9:16"
],
"url" =>
"https://www.youtube.com/watch?v=" .
$reel
->onTap
->innertubeCommand
->reelWatchEndpoint
->videoId
]
);

View File

@@ -133,6 +133,10 @@ $settings = [
"value" => "google",
"text" => "Google"
],
[
"value" => "google_cse",
"text" => "Google CSE"
],
[
"value" => "startpage",
"text" => "Startpage"
@@ -166,8 +170,12 @@ $settings = [
"text" => "Mojeek"
],
[
"value" => "solofield",
"text" => "Solofield"
"value" => "baidu",
"text" => "Baidu"
],
[
"value" => "coccoc",
"text" => "Cốc Cốc"
],
[
"value" => "marginalia",
@@ -203,6 +211,10 @@ $settings = [
"value" => "google",
"text" => "Google"
],
[
"value" => "google_cse",
"text" => "Google CSE"
],
[
"value" => "startpage",
"text" => "Startpage"
@@ -216,13 +228,29 @@ $settings = [
"text" => "Yep"
],
[
"value" => "solofield",
"text" => "Solofield"
"value" => "baidu",
"text" => "Baidu"
],
/*[
[
"value" => "pinterest",
"text" => "Pinterest"
],*/
],
[
"value" => "cara",
"text" => "Cara"
],
[
"value" => "flickr",
"text" => "Flickr"
],
[
"value" => "fivehpx",
"text" => "500px"
],
[
"value" => "vsco",
"text" => "VSCO"
],
[
"value" => "imgur",
"text" => "Imgur"
@@ -241,6 +269,14 @@ $settings = [
"value" => "yt",
"text" => "YouTube"
],
[
"value" => "vimeo",
"text" => "Vimeo"
],
[
"value" => "sepiasearch",
"text" => "Sepia Search"
],
[
"value" => "ddg",
"text" => "DuckDuckGo"
@@ -266,8 +302,12 @@ $settings = [
"text" => "Qwant"
],
[
"value" => "solofield",
"text" => "Solofield"
"value" => "baidu",
"text" => "Baidu"
],
[
"value" => "coccoc",
"text" => "Cốc Cốc"
]
]
],
@@ -302,6 +342,10 @@ $settings = [
[
"value" => "mojeek",
"text" => "Mojeek"
],
[
"value" => "baidu",
"text" => "Baidu"
]
]
],

View File

@@ -1,47 +1,45 @@
:root{
--1d2021: #1d2021;
--282828: #282828;
--3c3836: #3c3836;
--504945: #504945;
--1d2021:#1d2021;
--282828:#282828;
--3c3836:#3c3836;
--504945:#504945;
/* font */
--928374: #928374;
--a89984: #c9c5bf;
--bdae93: #bdae93;
--8ec07c: #8ec07c;
--ebdbb2: #ebdbb2;
--928374:#928374;
--a89984:#c9c5bf;
--bdae93:#bdae93;
--8ec07c:#8ec07c;
--ebdbb2:#ebdbb2;
}
body{
padding:15px 4% 40px;
margin:unset;
}
h1,h2,h3,h4,h5,h6{
h1, h2, h3, h4, h5, h6{
padding:0;
margin:0 0 7px 0;
line-height:initial;
color:var(--bdae93);
}
h3,h4,h5,h6{
h3, h4, h5, h6{
margin-bottom:14px;
}
/*
Web styles
Web styles
*/
.searchbox input[type="submit"]{
float:right;
cursor:pointer;
padding:0 10px;
border-left: 1px solid var(--504945);
background: #723c0b;
border-left:1px solid var(--504945);
background:#723c0b;
}
.searchbox input{
all:unset;
line-height:36px;
@@ -96,7 +94,6 @@ h3,h4,h5,h6{
display:inline-block;
}
.tabs .tab.selected{
border-bottom:2px solid #fc92a5;
}
@@ -106,7 +103,7 @@ h3,h4,h5,h6{
padding-bottom:12px;
padding-top:7px;
margin-bottom:7px;
background-color:#232525
background-color:#232525;
}
.filters .filter{
@@ -169,7 +166,6 @@ h3,h4,h5,h6{
font-size:12px;
}
.web .hover{
display:block;
text-decoration:none;
@@ -193,16 +189,13 @@ h3,h4,h5,h6{
color:#9760b1 !important;
}
.web .text-result .greentext{
font-size:14px;
color:var(--bdae93);
}
/* favicon */
.favicon-dropdown a{
text-decoration:none;
color:#d3d0c1;
@@ -211,39 +204,33 @@ h3,h4,h5,h6{
font-size:13px;
}
.web .favicon img,
.favicon-dropdown img{
.web .favicon img, .favicon-dropdown img{
margin:3px 7px 0 0;
height:16px;
font-size:12px;
line-height:16px;;
line-height:16px;
display:block;
text-align:left;
}
.web .sublinks{
padding:17px 10px;
font-size:15px;
color:var(--#928374);
}
.web .text-result .sublinks:last-child{
padding-bottom:0;
}
/* Wikipedia head */
.wiki-head{
padding:5px;
background-color: #322f2b
background-color:#322f2b;
}
/*
Images tab
Images tab
*/
#images{
@@ -257,17 +244,14 @@ h3,h4,h5,h6{
float:left;
}
#images .image .title{
white-space:nowrap;
overflow:hidden;
margin-bottom:7px;
font-weight:bold;
color:var(--bdae93);
color:var(--bdae93);
}
#popup-status{
display:none;
position:fixed;
@@ -280,43 +264,59 @@ h3,h4,h5,h6{
}
/*
Settings page
Settings page
*/
.web .settings-submit a{
margin-right:17px;
color:#bdae93;
}
/*
Responsive image
*/
@media only screen and (max-width:1454px){
#images .image-wrapper{
width:25%;
}
}
@media only screen and (max-width:1161px){
#images .image-wrapper{
width:25%;
}
}
@media only screen and (max-width:750px){
#images .image-wrapper{
width:50%;
}
}
@media only screen and (max-width:450px){
#images .image-wrapper{
width:100%;
}
}
/*
Responsive image
Responsive design
*/
@media only screen and (max-width: 1454px){ #images .image-wrapper{ width:25%; } }
@media only screen and (max-width: 1161px){ #images .image-wrapper{ width:25%; } }
@media only screen and (max-width: 750px){ #images .image-wrapper{ width:50%; } }
@media only screen and (max-width: 450px){ #images .image-wrapper{ width:100%; } }
/*
Responsive design
*/
@media only screen and (max-width: 1550px){
.web .left,
@media only screen and (max-width:1550px){
.web .left,
.searchbox{
width:60%;
}
}
@media only screen and (max-width: 1000px){
@media only screen and (max-width:1100px){
.web .left,
.searchbox{
width:100%;
}
}
.type{
color:var(--bdae93);
}
.type{
color:var(--bdae93);
}

17
static/themes/Kuuro.css Normal file
View File

@@ -0,0 +1,17 @@
:root{
--1d2021: #101010;
--282828: #1a1a1a;
--3c3836: #1e1e1e;
--504945: #232323;
--928374: #949494;
--a89984: #d2d2d2;
--bdae93: #d2d2d2;
--8ec07c: #99c794;
--ebdbb2: #d2d2d2;
--comment: #5f6364;
--default: #cccece;
--keyword: #c594c5;
--string: #99c794;
}

View File

@@ -89,7 +89,7 @@ if($results["spelling"]["type"] != "no_correction"){
'&' .
$frontend->buildquery($get, true) .
'&spellcheck=no">' .
$results["spelling"]["correction"] .
htmlspecialchars($results["spelling"]["correction"]) .
'</a>?' .
'</div>';
}