forked from lolcat/4get
Compare commits
80 Commits
Author | SHA1 | Date |
---|---|---|
|
2c4dc7da84 | |
|
5a0f5b868a | |
|
7cf403e125 | |
|
73b7922898 | |
|
336cb49d98 | |
|
8cd8e7380f | |
|
3d9d95db34 | |
|
eb73b1f357 | |
|
f7499294de | |
|
60f7150008 | |
|
2b8d90af12 | |
|
0f803804a4 | |
|
f43feff0aa | |
|
0bdd5e73df | |
|
430c0a2f0f | |
|
1a00bf8069 | |
|
502f6d12e4 | |
|
a2bc1e6190 | |
|
f73b5f0298 | |
|
3e1487e614 | |
|
037566bbba | |
|
b61bc6d07c | |
|
8d50667b0d | |
|
a0545b6006 | |
|
78aa2e198f | |
|
b85820cbcd | |
|
4b85841a3e | |
|
8d07e72dfe | |
|
566680fe36 | |
|
077692db49 | |
|
e4bf53cdaa | |
|
4489bb21e5 | |
|
ff8b1addf7 | |
|
3e2c3fc5d9 | |
|
49ddd1a216 | |
|
81ca8eaddc | |
|
c9c8d578f3 | |
|
b2203804c7 | |
|
13dfa9240c | |
|
0a53c3605a | |
|
36b0c570aa | |
|
47a7a2a224 | |
|
0180cf5224 | |
|
eed32a153c | |
|
f9f3c919d6 | |
|
4b0d8f75dc | |
|
033e4cb959 | |
|
91f621e105 | |
|
9f60900875 | |
|
631aa58565 | |
|
b892f90b13 | |
|
463ba0775f | |
|
cfad4fb035 | |
|
4e968b4b1c | |
|
81df52235c | |
|
1ca2626ad9 | |
|
9ca93f34c6 | |
|
0a43b9c849 | |
|
b636fec319 | |
|
774f7113df | |
|
0b3bbe0f15 | |
|
5f0b0a7b83 | |
|
920b9d5b3f | |
|
9cd369ac08 | |
|
e83865be49 | |
|
68dd7f29f6 | |
|
aaa30c79f5 | |
|
070f9d442b | |
|
9c18753ec3 | |
|
d8a729796e | |
|
2bbe5a29a9 | |
|
9ac195ac3b | |
|
d427a48ed4 | |
|
12d5b4ade8 | |
|
c422abbdc6 | |
|
85246cc7ec | |
|
d709d12111 | |
|
19f82a8536 | |
|
155a38d454 | |
|
6926e374af |
|
@ -0,0 +1 @@
|
|||
.git
|
|
@ -1,10 +1,9 @@
|
|||
FROM alpine:latest
|
||||
FROM alpine:3.21
|
||||
WORKDIR /var/www/html/4get
|
||||
|
||||
RUN apk update && apk upgrade
|
||||
RUN apk add php apache2-ssl php83-fileinfo php83-openssl php83-iconv php83-common php83-dom php83-sodium php83-curl curl php83-pecl-apcu php83-apache2 imagemagick php83-pecl-imagick php-mbstring imagemagick-webp imagemagick-jpeg
|
||||
RUN apk add php apache2-ssl php84-fileinfo php84-openssl php84-iconv php84-common php84-dom php84-sodium php84-curl curl php84-pecl-apcu php84-apache2 imagemagick php84-pecl-imagick php84-mbstring imagemagick-webp imagemagick-jpeg
|
||||
|
||||
COPY ./docker/apache/ /etc/apache2/
|
||||
COPY . .
|
||||
|
||||
RUN chmod 777 /var/www/html/4get/icons
|
||||
|
@ -14,4 +13,5 @@ EXPOSE 443
|
|||
|
||||
ENV FOURGET_PROTO=http
|
||||
|
||||
CMD ["./docker/docker-entrypoint.sh"]
|
||||
ENTRYPOINT ["./docker/docker-entrypoint.sh"]
|
||||
CMD ["start"]
|
47
README.md
|
@ -9,9 +9,11 @@ https://4get.ca/about
|
|||
## Official instance
|
||||
https://4get.ca , or visit the official instance list: https://4get.ca/instances
|
||||
|
||||
_NOT to be confused with 4get.ch, 4get.lol and friends! I **don't** host these._
|
||||
|
||||
## Totally unbiased comparison between alternatives
|
||||
|
||||
| | 4get | searx(ng) | libreY | araa | hearch |
|
||||
| | 4get | searx(ng) | libreY | araa | hearch.co |
|
||||
|----------------------------|-------------------------|-----------|-------------|-----------|-------------------|
|
||||
| RAM usage | 200-400mb~ | 2GB~ | 200-400mb~ | 2GB~ | idk |
|
||||
| Does it suck | no (debunked by snopes) | yes | yes | a little | better than searx |
|
||||
|
@ -23,36 +25,37 @@ https://4get.ca , or visit the official instance list: https://4get.ca/instances
|
|||
3. Bot protection that *actually* filters out the bots (when configured)
|
||||
4. Interface doesn't require javascript
|
||||
5. Favicon fetcher with caching support & image proxy
|
||||
6. Bunch of other shit
|
||||
6. Bunch of other shits
|
||||
|
||||
tl;dr the best way to actually browse for shit.
|
||||
tl;dr 4get is the best way to browse for shit.
|
||||
|
||||
# Supported websites
|
||||
|
||||
| Web | Images | Videos | News | Music | Autocompleter |
|
||||
|------------|--------------|------------|------------|------------|---------------|
|
||||
| DuckDuckGo | DuckDuckGo | YouTube | DuckDuckGo | Soundcloud | Brave |
|
||||
| Brave | Brave | DuckDuckGo | Brave | | DuckDuckGo |
|
||||
| Yandex | Yandex | Brave | Google | | Yandex |
|
||||
| Google | Google | Yandex | Startpage | | Google |
|
||||
| Startpage | Startpage | Google | Qwant | | Startpage |
|
||||
| Qwant | Qwant | Startpage | Mojeek | | Kagi |
|
||||
| Ghostery | Yep | Qwant | | | Qwant |
|
||||
| Yep | Solofield | Solofield | | | Ghostery |
|
||||
| Greppr | Imgur | | | | Yep |
|
||||
| Crowdview | FindThatMeme | | | | Marginalia |
|
||||
| Mwmbl | | | | | YouTube |
|
||||
| Mojeek | | | | | Soundcloud |
|
||||
| Solofield | | | | | |
|
||||
| Marginalia | | | | | |
|
||||
| wiby | | | | | |
|
||||
| Curlie | | | | | |
|
||||
| Web | Images | Videos | News | Music | Autocompleter |
|
||||
|------------|--------------|--------------|------------|------------|---------------|
|
||||
| DuckDuckGo | DuckDuckGo | YouTube | DuckDuckGo | Soundcloud | Brave |
|
||||
| Brave | Brave | Sepia Search | Brave | | DuckDuckGo |
|
||||
| Yandex | Yandex | DuckDuckGo | Google | | Yandex |
|
||||
| Google | Google | Brave | Startpage | | Google |
|
||||
| Startpage | Startpage | Yandex | Qwant | | Startpage |
|
||||
| Qwant | Qwant | Google | Mojeek | | Kagi |
|
||||
| Ghostery | Yep | Startpage | Baidu | | Qwant |
|
||||
| Yep | Baidu | Qwant | | | Ghostery |
|
||||
| Greppr | Pinterest | Baidu | | | Yep |
|
||||
| Crowdview | 500px | Coc Coc | | | Marginalia |
|
||||
| Mwmbl | VSCO | | | | YouTube |
|
||||
| Mojeek | Imgur | | | | Soundcloud |
|
||||
| Baidu | FindThatMeme | | | | |
|
||||
| Coc Coc | | | | | |
|
||||
| Marginalia | | | | | |
|
||||
| wiby | | | | | |
|
||||
| Curlie | | | | | |
|
||||
|
||||
# Installation
|
||||
Refer to the <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/">documentation index</a>. I recommend following the <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/apache2.md">apache2 guide</a>.
|
||||
|
||||
## Contact
|
||||
Shit breaks all the time but I repair it all the time too... Email me here: <b>will (at) lolcat.ca</b> or create an issue.
|
||||
Shit breaks all the time but I repair it all the time too. Email me here: <b>will (at) lolcat.ca</b> or create an issue.
|
||||
|
||||
## License
|
||||
AGPL
|
||||
|
|
21
api.txt
|
@ -1,10 +1,17 @@
|
|||
__ __ __
|
||||
/ // / ____ ____ / /_
|
||||
/ // /_/ __ `/ _ \/ __/
|
||||
/__ __/ /_/ / __/ /_
|
||||
/_/ \__, /\___/\__/
|
||||
/____/
|
||||
|
||||
44
|
||||
4444444 44
|
||||
44444444 44444 444
|
||||
44444444 444444 444444444
|
||||
44444 44444444 444444444
|
||||
444444444 4444444
|
||||
4444444444 444444
|
||||
4444444444444
|
||||
444444444444444444
|
||||
444444444444444
|
||||
44444444
|
||||
4444
|
||||
44
|
||||
|
||||
+ Welcome to the 4get API documentation +
|
||||
|
||||
+ Terms of use
|
||||
|
|
|
@ -100,7 +100,6 @@ class config{
|
|||
"https://4get.sijh.net",
|
||||
"https://4get.hbubli.cc",
|
||||
"https://4get.plunked.party",
|
||||
"https://4get.seitan-ayoub.lol",
|
||||
"https://4get.etenie.pl",
|
||||
"https://4get.lunar.icu",
|
||||
"https://4get.dcs0.hu",
|
||||
|
@ -119,7 +118,7 @@ class config{
|
|||
|
||||
// Default user agent to use for scraper requests. Sometimes ignored to get specific webpages
|
||||
// Changing this might break things.
|
||||
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0";
|
||||
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:141.0) Gecko/20100101 Firefox/141.0";
|
||||
|
||||
// Proxy pool assignments for each scraper
|
||||
// false = Use server's raw IP
|
||||
|
@ -129,8 +128,12 @@ class config{
|
|||
const PROXY_BRAVE = false;
|
||||
const PROXY_FB = false; // facebook
|
||||
const PROXY_GOOGLE = false;
|
||||
const PROXY_GOOGLE_API = false;
|
||||
const PROXY_GOOGLE_CSE = false;
|
||||
const PROXY_STARTPAGE = false;
|
||||
const PROXY_QWANT = false;
|
||||
const PROXY_BAIDU = false;
|
||||
const PROXY_COCCOC = false;
|
||||
const PROXY_GHOSTERY = false;
|
||||
const PROXY_MARGINALIA = false;
|
||||
const PROXY_MOJEEK = false;
|
||||
|
@ -140,8 +143,13 @@ class config{
|
|||
const PROXY_WIBY = false;
|
||||
const PROXY_CURLIE = false;
|
||||
const PROXY_YT = false; // youtube
|
||||
const PROXY_SEPIASEARCH = false;
|
||||
const PROXY_YEP = false;
|
||||
const PROXY_PINTEREST = false;
|
||||
const PROXY_SANKAKUCOMPLEX = false;
|
||||
const PROXY_FLICKR = false;
|
||||
const PROXY_FIVEHPX = false;
|
||||
const PROXY_VSCO = false;
|
||||
const PROXY_SEZNAM = false;
|
||||
const PROXY_NAVER = false;
|
||||
const PROXY_GREPPR = false;
|
||||
|
@ -157,6 +165,9 @@ class config{
|
|||
// Scraper-specific parameters
|
||||
//
|
||||
|
||||
// GOOGLE CSE & GOOGLE API
|
||||
const GOOGLE_CX_ENDPOINT = "d4e68b99b876541f0";
|
||||
|
||||
// MARGINALIA
|
||||
// Use "null" to default out to HTML scraping OR specify a string to
|
||||
// use the API (Eg: "public"). API has less filters.
|
||||
|
|
|
@ -6,14 +6,15 @@ services:
|
|||
image: luuul/4get:latest
|
||||
restart: unless-stopped
|
||||
environment:
|
||||
- FOURGET_PROTO=http
|
||||
- FOURGET_SERVER_NAME=4get.ca
|
||||
- FOURGET_INSTANCES=https://4get.ca
|
||||
|
||||
ports:
|
||||
- "80:80"
|
||||
- "443:443"
|
||||
|
||||
volumes:
|
||||
- /etc/letsencrypt/live/domain.tld:/etc/4get/certs
|
||||
# mount custom banners and captcha
|
||||
- ./banners:/var/www/html/4get/banner
|
||||
- ./captcha:/var/www/html/4get/data/captcha
|
||||
# volumes:
|
||||
# - /etc/letsencrypt/live/domain.tld:/etc/4get/certs # mount ssl
|
||||
# - ./banners:/var/www/html/4get/banner # mount custom banners
|
||||
# - ./captcha:/var/www/html/4get/data/captcha # mount captcha images
|
||||
|
|
|
@ -0,0 +1 @@
|
|||
# intentionally blank
|
|
@ -8,18 +8,27 @@ FOURGET_PROTO="${FOURGET_PROTO#\"}"
|
|||
# make lowercase
|
||||
FOURGET_PROTO=`echo $FOURGET_PROTO | awk '{print tolower($0)}'`
|
||||
|
||||
FOURGET_SRC='/var/www/html/4get'
|
||||
|
||||
mkdir -p /etc/apache2
|
||||
|
||||
if [ "$FOURGET_PROTO" = "https" ]; then
|
||||
echo "Using https configuration"
|
||||
cp /etc/apache2/https.conf /etc/apache2/httpd.conf
|
||||
cp -r ${FOURGET_SRC}/docker/apache/https/httpd.conf /etc/apache2
|
||||
cp -r ${FOURGET_SRC}/docker/apache/https/conf.d/* /etc/apache2/conf.d
|
||||
|
||||
else
|
||||
echo "Using http configuration"
|
||||
cp /etc/apache2/http.conf /etc/apache2/httpd.conf
|
||||
cp -r ${FOURGET_SRC}/docker/apache/http/httpd.conf /etc/apache2
|
||||
cp -r ${FOURGET_SRC}/docker/apache/http/conf.d/* /etc/apache2/conf.d
|
||||
fi
|
||||
|
||||
php ./docker/gen_config.php
|
||||
|
||||
|
||||
echo "4get is running"
|
||||
exec httpd -DFOREGROUND
|
||||
if [ "$1" = "start" ]; then
|
||||
echo "4get is running"
|
||||
exec httpd -DFOREGROUND
|
||||
else
|
||||
exec "$@"
|
||||
fi
|
||||
|
||||
|
|
259
docs/nginx.md
|
@ -1,103 +1,194 @@
|
|||
# Install on NGINX
|
||||
<h1 align=center>Installation of 4get in NGINX</h1>
|
||||
|
||||
>I do NOT recommend following this guide, only follow this if you *really* need to use nginx. I recommend you use the apache2 steps instead.
|
||||
<div align=right>
|
||||
|
||||
Login as root.
|
||||
> NOTE: As the previous version stated, it is better to follow the <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/apache2.md">Apache2 guide</a> instead of the Nginx one.
|
||||
|
||||
Create a file in `/etc/nginx/sites-available/` called `4get.conf` (or any name you want) and put this into the file:
|
||||
> NOTE: This guide assumes that you're using either an <abbr title="(Arch Linux, Artix Linux, EndeavourOS, etc...) ">Arch-based system</abbr> or a <abbr title="(Debian, Ubuntu, Devuan, etc...)">Debian-based system</abbr>, although you can still follow it on other systems with minor adjustments.
|
||||
|
||||
```
|
||||
server {
|
||||
# DO YOU REALLY NEED TO LOG SEARCHES?
|
||||
access_log /dev/null;
|
||||
error_log /dev/null;
|
||||
# Change this if you have 4get in other folder.
|
||||
root /var/www/4get;
|
||||
# Change yourdomain by your domain lol
|
||||
server_name www.yourdomain.com yourdomain.com;
|
||||
</div>
|
||||
|
||||
location @php {
|
||||
try_files $uri.php $uri/index.php =404;
|
||||
# Change the unix socket address if it's different for you.
|
||||
fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
|
||||
fastcgi_index index.php;
|
||||
# Change this to `fastcgi_params` if you use a debian based distro.
|
||||
include fastcgi.conf;
|
||||
fastcgi_intercept_errors on;
|
||||
}
|
||||
1. Log in as root.
|
||||
2. Upgrade your system:
|
||||
* On Arch-based, run `pacman -Syu`.
|
||||
* On Debian-based, run `apt update`, then `apt upgrade`.
|
||||
3. Install the following dependencies:
|
||||
* `git`: So you can clone <a href="https://git.lolcat.ca/lolcat/4get">this</a> repository.
|
||||
* `nginx`: So you can run Nginx.
|
||||
* `php-fpm`: This is what allows Nginx to run *(and show)* PHP files.
|
||||
* `php-imagick`, `imagemagick`: Image manipulation.
|
||||
* `php-apcu`: Caching module.
|
||||
* `php-curl`, `curl`: Transferring data with URLs.
|
||||
* `php-mbstring`: String utils.
|
||||
* `certbot`, `certbot-nginx`: ACME client. Used to create SSL certificates.
|
||||
* In Arch-based distributions:
|
||||
* `pacman -S nginx php-fpm certbot php-imagick certbot-nginx imagemagick curl php-apcu git`
|
||||
* In Debian-based distributions:
|
||||
* `apt install php-fpm php-mbstring nginx certbot-nginx certbot php-imagick imagemagick php-curl curl php-apcu git`
|
||||
|
||||
location / {
|
||||
try_files $uri @php;
|
||||
}
|
||||
<div align=right>
|
||||
|
||||
location ~* ^(.*)\.php$ {
|
||||
return 301 $1;
|
||||
}
|
||||
> IMPORTANT: `php-curl` and `php-mbstring` might be Debian-only packages, but this needs further fact-checking.
|
||||
|
||||
> IMPORTANT: If you have issues with `php-apcu` or `libsodium`, see [^1].
|
||||
|
||||
</div>
|
||||
|
||||
4. `cd` to `/etc/nginx` and make the `conf.d/` directory if it doesn't exist:
|
||||
* Again, this assumes you're logged in as root.
|
||||
```sh
|
||||
cd /etc/nginx
|
||||
ls -ld conf.d/ # If ls prints the conf.d entry, the directory exists.
|
||||
# If it does not, run:
|
||||
mkdir conf.d
|
||||
```
|
||||
5. Make a file inside `conf.d/` called `4get.conf` and place the following content:
|
||||
* First run `touch conf.d/4get.conf`, then `nano conf.d/4get.conf` to open the file in the nano editor: *(Install nano if it is missing, or use another editor.)*
|
||||
```sh
|
||||
server {
|
||||
access_log /dev/null; # Search log file. Do you really need to?
|
||||
error_log /dev/null; # Error log file.
|
||||
|
||||
# Change this if you have 4get in another folder.
|
||||
root /var/www/4get;
|
||||
# Change 'yourdomain' to your domain.
|
||||
server_name www.yourdomain.com yourdomain.com;
|
||||
# Port to listen to.
|
||||
listen 80;
|
||||
}
|
||||
```
|
||||
|
||||
That is a very basic config so you will need to adapt it to your needs in case you have a more complicated nginx configuration. Anyways, you can see a real world example [here](https://git.zzls.xyz/Fijxu/etc-configs/src/branch/selfhost/nginx/sites-available/4get.zzls.xyz.conf)
|
||||
location @php {
|
||||
try_files $uri.php $uri/index.php =404;
|
||||
# Change the unix socket address if it's different for you.
|
||||
fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
|
||||
fastcgi_index index.php;
|
||||
# Change this to `fastcgi_params` if you use a debian based distribution.
|
||||
include fastcgi.conf;
|
||||
fastcgi_intercept_errors on;
|
||||
}
|
||||
|
||||
After you save the file, symlink `4get.conf` into `/etc/nginx/sites-enabled/` with this command:
|
||||
location / {
|
||||
try_files $uri @php;
|
||||
}
|
||||
|
||||
```sh
|
||||
ln -s /etc/nginx/sites-available/4get.conf /etc/nginx/sites-enabled/4get.conf
|
||||
```
|
||||
location ~* ^(.*)\.php$ {
|
||||
return 301 $1;
|
||||
}
|
||||
|
||||
Now test the nginx config with `nginx -t`. If it says that everything is good, restart nginx using `systemctl restart nginx`.
|
||||
|
||||
# Encryption setup
|
||||
|
||||
Generate a certificate for the domain using:
|
||||
|
||||
```sh
|
||||
certbot --nginx --key-type ecdsa -d www.yourdomain.com -d yourdomain.com
|
||||
```
|
||||
(Remember to install the nginx certbot plugin!!!)
|
||||
|
||||
After doing that certbot should deploy the certificate automatically into your 4get nginx config file. It should be ready to use at that point.
|
||||
|
||||
# Tor setup on NGINX
|
||||
|
||||
Important Note: Tor onion addresses are significantly longer than traditional domain names. Before proceeding with Nginx configuration, ensure you increase the `server_names_hash_bucket_size` value in your `nginx.conf` file. This setting in your Nginx configuration controls the internal data structure used to manage multiple server names (hostnames) associated with your web server. Each hostname requires a certain amount of memory within this structure. If the size is insufficient, Nginx will encounter errors.
|
||||
|
||||
1. Open your `nginx.conf` file (that is under `/etc/nginx/nginx.conf`).
|
||||
2. Find the line containing `# server_names_hash_bucket_size 64;`.
|
||||
3. Uncomment the line and adjust the value. Start with 64, but if you encounter issues, incrementally increase it (e.g., 128, 256) until it accommodates your configuration.
|
||||
|
||||
Open your current 4get NGINX config (that is under `/etc/nginx/sites-available/`) and append this to the end of the file:
|
||||
|
||||
```
|
||||
server {
|
||||
access_log /dev/null;
|
||||
error_log /dev/null;
|
||||
|
||||
listen 80;
|
||||
server_name <youronionaddress>;
|
||||
root /var/www/4get;
|
||||
|
||||
location @php {
|
||||
try_files $uri.php $uri/index.php =404;
|
||||
# Change the unix socket address if it's different for you.
|
||||
fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
|
||||
fastcgi_index index.php;
|
||||
# Change this to `fastcgi_params` if you use a debian based distro.
|
||||
include fastcgi.conf;
|
||||
fastcgi_intercept_errors on;
|
||||
}
|
||||
```
|
||||
* The above is a very basic configuration and thus will need tweaking to your personal needs. It should still work as-is, though. A 'real world' example is present in [^2].
|
||||
* After saving the file, check that the `nginx.conf` file inside the main directory includes files inside `conf.d/`:
|
||||
* It should be inside the http block: *(The following is an example! Don't just copy and paste it!)*
|
||||
```sh
|
||||
http {
|
||||
include mime.types;
|
||||
include conf.d/*.conf;
|
||||
types_hash_max_size 4096;
|
||||
# ...
|
||||
}
|
||||
```
|
||||
* Now test your configuration with `nginx -t`. If it says that everything is good, restart *(or start)* the Nginx daemon:
|
||||
* The command depends on your init system. Most distributions use `systemd`, but the other common init systems are listed here too.
|
||||
```sh
|
||||
# systemd
|
||||
systemctl stop nginx
|
||||
systemctl start nginx
|
||||
# or
|
||||
systemctl restart nginx
|
||||
|
||||
# openrc
|
||||
rc-service nginx stop
|
||||
rc-service nginx start
|
||||
# or
|
||||
rc-service nginx restart
|
||||
|
||||
# runit
|
||||
sv down nginx
|
||||
sv up nginx
|
||||
# or
|
||||
sv restart nginx
|
||||
|
||||
# s6
|
||||
s6-rc -d change nginx
|
||||
s6-rc -u change nginx
|
||||
# or
|
||||
s6-svc -r /run/service/nginx
|
||||
|
||||
# dinit
|
||||
dinitctl stop nginx
|
||||
dinitctl start nginx
|
||||
# or
|
||||
dinitctl restart nginx
|
||||
```
|
||||
6. Clone the repository to `/var/www`:
|
||||
* `git clone --depth 1 https://git.lolcat.ca/lolcat/4get 4get` - Run from inside `/var/www`, this clones only the latest commit *(so it takes less time to download)* and saves the cloned repository as `4get`.
|
||||
7. That should be it! There are some extra steps you can take, but it really just depends on you.
|
||||
|
||||
<h2 align=center>Encryption setup</h2>
|
||||
|
||||
1. Generate a certificate for the domain you're using with:
|
||||
* Note that `certbot-nginx` is needed.
|
||||
```sh
|
||||
certbot --nginx --key-type ecdsa -d www.yourdomain.com -d yourdomain.com
|
||||
```
|
||||
2. After that, certbot will deploy the certificate automatically to your 4get conf file. It should be ready to use from there.
|
||||
|
||||
<h2 align=center>Tor Setup</h2>
|
||||
|
||||
<div align=right>
|
||||
|
||||
> IMPORTANT: Tor onion addresses are very long compared to traditional domains, so before doing anything else, edit `nginx.conf` and increase <abbr title="This setting in your Nginx configuration controls the internal data structure used to manage multiple server names (hostnames) associated with your web server. Each hostname requires a certain amount of memory within this structure. If the size is insufficient, Nginx will encounter errors."><code>server_names_hash_bucket_size</code></abbr> to fit your needs.
|
||||
|
||||
</div>
|
||||
|
||||
1. `cd` to `/etc/nginx` *(if you haven't)* and open your `nginx.conf` file.
|
||||
2. Find the line containing `# server_names_hash_bucket_size 64;` inside said file.
|
||||
3. Uncomment the line and adjust the value; start with 64, but if you encounter issues, incrementally increase it *(e.g., 128, 256)* until it accommodates your configuration.
|
||||
4. Open your 4get configuration *(or duplicate it)* and edit it:
|
||||
* Example configuration, again:
|
||||
```sh
|
||||
server {
|
||||
access_log /dev/null; # Search log file. Do you really need to?
|
||||
error_log /dev/null; # Error log file.
|
||||
|
||||
# Change this if you have 4get in another folder.
|
||||
root /var/www/4get;
|
||||
# Change 'onionaddress.onion' to your onion address.
|
||||
server_name onionaddress.onion;
|
||||
# Port to listen to.
|
||||
listen 80;
|
||||
|
||||
location @php {
|
||||
try_files $uri.php $uri/index.php =404;
|
||||
# Change the unix socket address if it's different for you.
|
||||
fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
|
||||
fastcgi_index index.php;
|
||||
# Change this to `fastcgi_params` if you use a debian based distribution.
|
||||
include fastcgi.conf;
|
||||
fastcgi_intercept_errors on;
|
||||
}
|
||||
|
||||
location / {
|
||||
try_files $uri @php;
|
||||
}
|
||||
|
||||
location ~* ^(.*)\.php$ {
|
||||
return 301 $1;
|
||||
}
|
||||
|
||||
location / {
|
||||
try_files $uri @php;
|
||||
}
|
||||
```
|
||||
A real world example is present in [^2].
|
||||
5. Once done, check the configuration with `nginx -t`. If everything's fine and dandy, refer to <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/tor.md">the Tor guide</a> to set up your onion site.
|
||||
|
||||
location ~* ^(.*)\.php$ {
|
||||
return 301 $1;
|
||||
}
|
||||
}
|
||||
```
|
||||
<h2 align=center>Other important things</h2>
|
||||
|
||||
Replace `<youronionaddress>` with the onion address found in `/var/lib/tor/4get/hostname`, then check that the nginx config is valid with `nginx -t`. If it is, restart the nginx service and try opening the onion address in the Tor Browser. You can see a real-world example [here](https://git.zzls.xyz/Fijxu/etc-configs/src/branch/selfhost/nginx/sites-available/4get.zzls.xyz.conf).
|
||||
1. <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/configure.md">Configuration guide</a>: Things to do after setup.
|
||||
2. <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/apache2.md">Apache2 guide</a>: Fall back to this if you can't get something to work, or if something is unclear.
|
||||
|
||||
Once you have done the above, refer to <a href="https://git.lolcat.ca/lolcat/4get/src/branch/master/docs/tor.md">this Tor guide</a> to set up your onion site.
|
||||
<h2 align=center>Known issues</h2>
|
||||
|
||||
1. https://git.lolcat.ca/lolcat/4get/issues
|
||||
|
||||
[^1]: lolcat/4get#40, for issues with `libsodium` or `php-apcu`.
|
||||
[^2]: <a href="https://git.nadeko.net/Fijxu/etc-configs/src/branch/selfhost/nginx/conf.d/4get.conf">git.nadeko.net</a> nadeko.net's 4get instance configuration.
|
|
@ -15,7 +15,12 @@ class favicon{
|
|||
|
||||
header("Content-Type: image/png");
|
||||
|
||||
if(substr_count($url, "/") !== 2){
|
||||
if(
|
||||
preg_match(
|
||||
'/^https?:\/\/[A-Za-z0-9.-]+$/',
|
||||
$url
|
||||
) === 0
|
||||
){
|
||||
|
||||
header("X-Error: Only provide the protocol and domain");
|
||||
$this->defaulticon();
|
||||
|
|
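Review note on the favicon hunk above: the old check only counted slashes, while the new one matches the whole string against a scheme-plus-host pattern. The sketch below is a Python rendering of both rules (the project itself is PHP), meant only to show which inputs each rule lets through. The function names and the sample URLs are illustrative assumptions, not part of the codebase.

```python
import re

# Same pattern as the preg_match() call in the hunk above.
ORIGIN_PATTERN = re.compile(r"^https?://[A-Za-z0-9.-]+$")


def old_check(url: str) -> bool:
    """Previous rule: accept any string that contains exactly two slashes."""
    return url.count("/") == 2


def new_check(url: str) -> bool:
    """New rule: accept only http(s):// followed by a bare host name."""
    return ORIGIN_PATTERN.match(url) is not None


if __name__ == "__main__":
    # Hypothetical inputs chosen to show where the two rules disagree.
    samples = [
        "https://example.com",
        "ftp://example.com",
        "https://user@example.com",
        "https://example.com/path",
    ]
    for url in samples:
        print(f"{url!r}: old={old_check(url)} new={new_check(url)}")
```

Under these assumptions the slash count accepts other schemes and userinfo parts as long as no path is present, whereas the pattern rejects anything that is not plain `http(s)://host`.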
|
@ -0,0 +1,100 @@
|
|||
<?php
|
||||
|
||||
//
|
||||
// Reference
|
||||
// https://github.com/TecharoHQ/anubis/blob/ecc716940e34ebe7249974f2789a99a2c7115e4e/web/js/proof-of-work.mjs
|
||||
//
|
||||
|
||||
class anubis{
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include_once "fuckhtml.php";
|
||||
$this->fuckhtml = new fuckhtml();
|
||||
}
|
||||
|
||||
public function scrape($html){
|
||||
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
$script =
|
||||
$this->fuckhtml
|
||||
->getElementById(
|
||||
"anubis_challenge",
|
||||
"script"
|
||||
);
|
||||
|
||||
if($script === false){
|
||||
|
||||
throw new Exception("Failed to scrape anubis challenge data");
|
||||
}
|
||||
|
||||
$script =
|
||||
json_decode(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$script
|
||||
),
|
||||
true
|
||||
);
|
||||
|
||||
if($script === null){
|
||||
|
||||
throw new Exception("Failed to decode anubis challenge data");
|
||||
}
|
||||
|
||||
if(
|
||||
!isset($script["challenge"]) ||
|
||||
!isset($script["rules"]["difficulty"]) ||
|
||||
!is_int($script["rules"]["difficulty"]) ||
|
||||
!is_string($script["challenge"])
|
||||
){
|
||||
|
||||
throw new Exception("Found invalid challenge data");
|
||||
}
|
||||
|
||||
return $this->rape($script["challenge"], $script["rules"]["difficulty"]);
|
||||
}
|
||||
|
||||
private function is_valid_hash($hash, $difficulty){
|
||||
|
||||
for ($i=0; $i<$difficulty; $i++) {
|
||||
|
||||
$index = (int)floor($i / 2);
|
||||
$nibble = $i % 2;
|
||||
|
||||
$byte = ord($hash[$index]);
|
||||
$nibble = ($byte >> ($nibble === 0 ? 4 : 0)) & 0x0f;
|
||||
|
||||
if($nibble !== 0){
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
public function rape($data, $difficulty = 5){
|
||||
|
||||
$nonce = 0;
|
||||
|
||||
while(true){
|
||||
|
||||
$hash_binary = hash("sha256", $data . $nonce, true);
|
||||
|
||||
if($this->is_valid_hash($hash_binary, $difficulty)){
|
||||
|
||||
$hash_hex = bin2hex($hash_binary);
|
||||
|
||||
return [
|
||||
"response" => $hash_hex,
|
||||
//"data" => $data,
|
||||
//"difficulty" => $difficulty,
|
||||
"nonce" => $nonce
|
||||
];
|
||||
}
|
||||
|
||||
$nonce++;
|
||||
}
|
||||
}
|
||||
}
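Review note on the new class above: the solver appends an increasing nonce to the challenge string, hashes the result with SHA-256, and stops once the first `difficulty` hex digits (nibbles) of the digest are zero. A minimal Python sketch of that loop follows, assuming the challenge is plain text that can be encoded as UTF-8. The names `solve` and `has_leading_zero_nibbles` and the sample challenge are hypothetical, and this is a rendering of the PHP in the diff, not the Anubis reference implementation.

```python
import hashlib


def has_leading_zero_nibbles(digest: bytes, difficulty: int) -> bool:
    """Return True if the first `difficulty` 4-bit nibbles of digest are zero."""
    for i in range(difficulty):
        byte = digest[i // 2]
        # Even positions read the high nibble, odd positions the low nibble.
        nibble = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        if nibble != 0:
            return False
    return True


def solve(challenge: str, difficulty: int = 5) -> dict:
    """Brute-force the smallest nonce whose digest meets the difficulty."""
    nonce = 0
    while True:
        # The nonce is appended as a decimal string, as in the PHP version.
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if has_leading_zero_nibbles(digest, difficulty):
            return {"response": digest.hex(), "nonce": nonce}
        nonce += 1


if __name__ == "__main__":
    # Low difficulty keeps the demo fast; each extra nibble costs about 16x more work.
    print(solve("example-challenge", difficulty=2))
```

The expected work grows by a factor of 16 per difficulty step, so the default of 5 means roughly a million hashes on average.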
|
|
@ -75,6 +75,7 @@ class backend{
|
|||
break;
|
||||
|
||||
case "socks5_hostname":
|
||||
case "socks5h":
|
||||
case "socks5a":
|
||||
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
|
||||
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);
|
||||
|
|
|
@ -838,10 +838,10 @@ class frontend{
|
|||
}
|
||||
|
||||
$payload .=
|
||||
'<a href="https://webcache.googleusercontent.com/search?q=cache:' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://google.com" alt="go">Google cache</a>' .
|
||||
'<a href="https://web.archive.org/web/' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://archive.org" alt="ar">Archive.org</a>' .
|
||||
'<a href="https://archive.ph/newest/' . htmlspecialchars($link) . '" class="list" target="_BLANK"><img src="/favicon?s=https://archive.is" alt="ar">Archive.is</a>' .
|
||||
'<a href="https://ghostarchive.org/search?term=' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://ghostarchive.org" alt="gh">Ghostarchive</a>' .
|
||||
'<a href="https://arquivo.pt/wayback/' . htmlspecialchars($link) . '" class="list" target="_BLANK"><img src="/favicon?s=https://arquivo.pt" alt="ar">Arquivo.pt</a>' .
|
||||
'<a href="https://www.bing.com/search?q=url%3A' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://bing.com" alt="bi">Bing cache</a>' .
|
||||
'<a href="https://megalodon.jp/?url=' . $urlencode . '" class="list" target="_BLANK"><img src="/favicon?s=https://megalodon.jp" alt="me">Megalodon</a>' .
|
||||
'</div>';
|
||||
|
@ -939,6 +939,8 @@ class frontend{
|
|||
"brave" => "Brave",
|
||||
"yandex" => "Yandex",
|
||||
"google" => "Google",
|
||||
//"google_api" => "Google API",
|
||||
"google_cse" => "Google CSE",
|
||||
"startpage" => "Startpage",
|
||||
"qwant" => "Qwant",
|
||||
"ghostery" => "Ghostery",
|
||||
|
@ -947,7 +949,9 @@ class frontend{
|
|||
"crowdview" => "Crowdview",
|
||||
"mwmbl" => "Mwmbl",
|
||||
"mojeek" => "Mojeek",
|
||||
"solofield" => "Solofield",
|
||||
"baidu" => "Baidu",
|
||||
"coccoc" => "Cốc Cốc",
|
||||
//"solofield" => "Solofield",
|
||||
"marginalia" => "Marginalia",
|
||||
"wiby" => "wiby",
|
||||
"curlie" => "Curlie"
|
||||
|
@ -963,13 +967,19 @@ class frontend{
|
|||
"yandex" => "Yandex",
|
||||
"brave" => "Brave",
|
||||
"google" => "Google",
|
||||
"google_cse" => "Google CSE",
|
||||
"startpage" => "Startpage",
|
||||
"qwant" => "Qwant",
|
||||
"yep" => "Yep",
|
||||
"solofield" => "Solofield",
|
||||
//"pinterest" => "Pinterest",
|
||||
"baidu" => "Baidu",
|
||||
//"solofield" => "Solofield",
|
||||
"pinterest" => "Pinterest",
|
||||
"flickr" => "Flickr",
|
||||
"fivehpx" => "500px",
|
||||
"vsco" => "VSCO",
|
||||
"imgur" => "Imgur",
|
||||
"ftm" => "FindThatMeme"
|
||||
"ftm" => "FindThatMeme",
|
||||
//"sankakucomplex" => "SankakuComplex"
|
||||
]
|
||||
];
|
||||
break;
|
||||
|
@ -979,6 +989,7 @@ class frontend{
|
|||
"display" => "Scraper",
|
||||
"option" => [
|
||||
"yt" => "YouTube",
|
||||
"sepiasearch" => "Sepia Search",
|
||||
//"fb" => "Facebook videos",
|
||||
"ddg" => "DuckDuckGo",
|
||||
"brave" => "Brave",
|
||||
|
@ -986,7 +997,9 @@ class frontend{
|
|||
"google" => "Google",
|
||||
"startpage" => "Startpage",
|
||||
"qwant" => "Qwant",
|
||||
"solofield" => "Solofield"
|
||||
"baidu" => "Baidu",
|
||||
"coccoc" => "Cốc Cốc"
|
||||
//"solofield" => "Solofield"
|
||||
]
|
||||
];
|
||||
break;
|
||||
|
@ -1001,7 +1014,8 @@ class frontend{
|
|||
"startpage" => "Startpage",
|
||||
"qwant" => "Qwant",
|
||||
"yep" => "Yep",
|
||||
"mojeek" => "Mojeek"
|
||||
"mojeek" => "Mojeek",
|
||||
"baidu" => "Baidu"
|
||||
]
|
||||
];
|
||||
break;
|
||||
|
|
125
lib/fuckhtml.php
|
@ -240,12 +240,13 @@ class fuckhtml{
|
|||
public function getElementsByFuzzyAttributeValue(string $name, string $value, $collection = null){
|
||||
|
||||
$elems = $this->getElementsByAttributeName($name, $collection);
|
||||
|
||||
$value =
|
||||
explode(
|
||||
" ",
|
||||
trim(
|
||||
preg_replace(
|
||||
'/ +/',
|
||||
'/\s+/',
|
||||
" ",
|
||||
$value
|
||||
)
|
||||
|
@ -258,7 +259,18 @@ class fuckhtml{
|
|||
|
||||
foreach($elem["attributes"] as $attrib_name => $attrib_value){
|
||||
|
||||
$attrib_value = explode(" ", $attrib_value);
|
||||
$attrib_value =
|
||||
explode(
|
||||
" ",
|
||||
trim(
|
||||
preg_replace(
|
||||
'/\s+/',
|
||||
" ",
|
||||
$attrib_value
|
||||
)
|
||||
)
|
||||
);
|
||||
|
||||
$ac = count($attrib_value);
|
||||
$nc = count($value);
|
||||
$cr = 0;
|
||||
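Review note on the two hunks above: both the search value and each attribute value are now normalised the same way (collapse any whitespace run to one space, trim, split), where previously the attribute value was split on single spaces only. A small Python sketch of the difference is given here. The helper names and the sample class string are assumptions made for illustration.

```python
import re


def old_tokens(attribute_value: str) -> list[str]:
    """Before the change: the attribute value was split on single spaces only."""
    return attribute_value.split(" ")


def new_tokens(attribute_value: str) -> list[str]:
    """After the change: collapse whitespace runs, trim, then split."""
    return re.sub(r"\s+", " ", attribute_value).strip().split(" ")


if __name__ == "__main__":
    # Hypothetical class attribute containing a newline, a tab and repeated spaces.
    raw = "  card\n\tcard--wide   active "
    print(old_tokens(raw))
    print(new_tokens(raw))
```

With the old split, a token separated by a newline or tab stays glued to its neighbour, so a fuzzy match on that token would miss the element.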
|
@ -381,6 +393,8 @@ class fuckhtml{
|
|||
$json_out = null;
|
||||
$last_char = null;
|
||||
|
||||
$keyword_check = null;
|
||||
|
||||
for($i=0; $i<strlen($json); $i++){
|
||||
|
||||
switch($json[$i]){
|
||||
|
@ -396,6 +410,7 @@ class fuckhtml{
|
|||
|
||||
$bracket = false;
|
||||
$is_close_bracket = true;
|
||||
|
||||
}else{
|
||||
|
||||
if($bracket === false){
|
||||
|
@ -429,6 +444,31 @@ class fuckhtml{
|
|||
$is_close_bracket === false
|
||||
){
|
||||
|
||||
// do keyword check
|
||||
$keyword_check .= $json[$i];
|
||||
|
||||
if(in_array($json[$i], [":", "{"])){
|
||||
|
||||
$keyword_check = substr($keyword_check, 0, -1);
|
||||
|
||||
if(
|
||||
preg_match(
|
||||
'/function|array|return/i',
|
||||
$keyword_check
|
||||
)
|
||||
){
|
||||
|
||||
$json_out =
|
||||
preg_replace(
|
||||
'/[{"]*' . preg_quote($keyword_check, "/") . '$/',
|
||||
"",
|
||||
$json_out
|
||||
);
|
||||
}
|
||||
|
||||
$keyword_check = null;
|
||||
}
|
||||
|
||||
// here we know we're not iterating over a quoted string
|
||||
switch($json[$i]){
|
||||
|
||||
|
@ -498,4 +538,85 @@ class fuckhtml{
|
|||
$string
|
||||
);
|
||||
}
|
||||
|
||||
public function extract_json($json){
|
||||
|
||||
$len = strlen($json);
|
||||
$array_level = 0;
|
||||
$object_level = 0;
|
||||
$in_quote = null;
|
||||
$start = null;
|
||||
|
||||
for($i=0; $i<$len; $i++){
|
||||
|
||||
switch($json[$i]){
|
||||
|
||||
case "[":
|
||||
if($in_quote === null){
|
||||
|
||||
$array_level++;
|
||||
if($start === null){
|
||||
|
||||
$start = $i;
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
case "]":
|
||||
if($in_quote === null){
|
||||
|
||||
$array_level--;
|
||||
}
|
||||
break;
|
||||
|
||||
case "{":
|
||||
if($in_quote === null){
|
||||
|
||||
$object_level++;
|
||||
if($start === null){
|
||||
|
||||
$start = $i;
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
case "}":
|
||||
if($in_quote === null){
|
||||
|
||||
$object_level--;
|
||||
}
|
||||
break;
|
||||
|
||||
case "\"":
|
||||
case "'":
|
||||
if(
|
||||
$i !== 0 &&
|
||||
$json[$i - 1] !== "\\"
|
||||
){
|
||||
// found a non-escaped quote
|
||||
|
||||
if($in_quote === null){
|
||||
|
||||
// open quote
|
||||
$in_quote = $json[$i];
|
||||
}elseif($in_quote === $json[$i]){
|
||||
|
||||
// close quote
|
||||
$in_quote = null;
|
||||
}
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
if(
|
||||
$start !== null &&
|
||||
$array_level === 0 &&
|
||||
$object_level === 0
|
||||
){
|
||||
|
||||
return substr($json, $start, $i - $start + 1);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
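Review note on `extract_json` in lib/fuckhtml.php: the method returns the first balanced `[...]` or `{...}` span in a string, skipping brackets that sit inside quoted strings. A minimal standalone sketch of that technique follows. The function name `extract_balanced` is hypothetical, it uses a single depth counter for both bracket types, and it tracks backslash escapes with a flag rather than checking the previous character as the diff does, so treat it as an illustration under those assumptions and not as the committed code.

```php
<?php
// Hypothetical helper (not part of the diff): returns the first balanced
// [...] or {...} span in $text, or null when no complete span is found.
function extract_balanced(string $text): ?string {
    $depth = 0;        // combined nesting depth of [] and {}
    $quote = null;     // currently open quote character, if any
    $escaped = false;  // true right after an unconsumed backslash
    $start = null;     // offset of the first opening bracket
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $c = $text[$i];

        if ($quote !== null) {
            // inside a quoted string: only look for the closing quote
            if ($escaped) {
                $escaped = false;
            } elseif ($c === "\\") {
                $escaped = true;
            } elseif ($c === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($c === '"' || $c === "'") {
            $quote = $c;
        } elseif ($c === "[" || $c === "{") {
            if ($start === null) {
                $start = $i;
            }
            $depth++;
        } elseif (($c === "]" || $c === "}") && $start !== null) {
            $depth--;
            if ($depth === 0) {
                return substr($text, $start, $i - $start + 1);
            }
        }
    }

    return null; // unbalanced input or no bracket at all
}

echo extract_balanced(' = {"a": "}", "b": [1, 2]}; kit.start()'), PHP_EOL;
// prints {"a": "}", "b": [1, 2]}
```

Because the counter does not check that each closing bracket matches its opener, malformed input such as `[}` is accepted; the caller still needs to validate the result, for example with `json_decode`.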
46
proxy.php
|
@ -34,22 +34,46 @@ try{
|
|||
)
|
||||
){
|
||||
|
||||
if(
|
||||
!isset($image["query"]) ||
|
||||
!isset($image["path"]) ||
|
||||
$image["path"] != "/th"
|
||||
){
|
||||
if(!isset($image["path"])){
|
||||
|
||||
header("X-Error: Invalid bing image path");
|
||||
header("X-Error: Missing bing image path");
|
||||
$proxy->do404();
|
||||
die();
|
||||
}
|
||||
|
||||
parse_str($image["query"], $str);
|
||||
|
||||
if(!isset($str["id"])){
|
||||
//
|
||||
// get image ID
|
||||
// known URL formats:
|
||||
// https://tse2.mm.bing.net/th/id/OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
|
||||
// https://tse2.mm.bing.net/th?id=OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
|
||||
//
|
||||
$id = null;
|
||||
if(isset($image["query"])){
|
||||
|
||||
header("X-Error: Missing bing ID");
|
||||
parse_str($image["query"], $str);
|
||||
|
||||
if(isset($str["id"])){
|
||||
|
||||
$id = $str["id"];
|
||||
}
|
||||
}
|
||||
|
||||
if($id === null){
|
||||
|
||||
$id = explode("/th/id/", $image["path"], 2);
|
||||
|
||||
if(count($id) !== 2){
|
||||
|
||||
// malformed
|
||||
return $url;
|
||||
}
|
||||
|
||||
$id = $id[1];
|
||||
}
|
||||
|
||||
if(is_array($id)){
|
||||
|
||||
header("X-Error: Missing bing id parameter");
|
||||
$proxy->do404();
|
||||
die();
|
||||
}
|
||||
|
@ -63,7 +87,7 @@ try{
|
|||
case "cover": $req = "&w=207&h=270&p=0&qlt=90"; break;
|
||||
}
|
||||
|
||||
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . urlencode($str["id"]) . $req, "https://www.bing.com");
|
||||
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . rawurlencode($id) . $req, "https://www.bing.com");
|
||||
die();
|
||||
}
|
||||
|
||||
|
|
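Review note on the proxy.php change: the new code accepts both Bing thumbnail URL shapes listed in the comment, `/th?id=...` and `/th/id/...`, and always rebuilds the request as `/th?id=...`. The sketch below shows that lookup order in isolation. The function name `bing_image_id` is hypothetical and the null return for unrecognized URLs is an assumption; the diff itself handles that case differently.

```php
<?php
// Hypothetical helper (not part of the diff): extract the image ID from
// https://host/th?id=ID or https://host/th/id/ID, or return null.
function bing_image_id(string $url): ?string {
    $parts = parse_url($url);

    if ($parts === false || !isset($parts["path"])) {
        return null;
    }

    // 1) query-string form: /th?id=...
    if (isset($parts["query"])) {
        parse_str($parts["query"], $query);

        // id[]=x would produce an array, so require a plain string
        if (isset($query["id"]) && is_string($query["id"])) {
            return $query["id"];
        }
    }

    // 2) path form: /th/id/...
    $split = explode("/th/id/", $parts["path"], 2);

    if (count($split) === 2 && $split[1] !== "") {
        return $split[1];
    }

    return null;
}

$id = bing_image_id("https://tse2.mm.bing.net/th/id/OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3");
echo "https://tse2.mm.bing.net/th?id=" . rawurlencode($id), PHP_EOL;
// prints https://tse2.mm.bing.net/th?id=OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
```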
File diff suppressed because it is too large
|
@ -210,6 +210,63 @@ class brave{
|
|||
return $data;
|
||||
}
|
||||
|
||||
private function get_js(){
|
||||
|
||||
$script_disc =
|
||||
$this->fuckhtml
|
||||
->getElementsByTagName(
|
||||
"script"
|
||||
);
|
||||
|
||||
$data = null;
|
||||
foreach($script_disc as &$discs){
|
||||
|
||||
if(
|
||||
preg_match(
|
||||
'/kit\.start\(/',
|
||||
$discs["innerHTML"]
|
||||
)
|
||||
){
|
||||
|
||||
$data =
|
||||
explode(
|
||||
"data:",
|
||||
$discs["innerHTML"],
|
||||
2
|
||||
);
|
||||
|
||||
if(count($data) !== 2){
|
||||
|
||||
throw new Exception("Failed to split up data field");
|
||||
}
|
||||
|
||||
$data = $data[1];
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if($data === null){
|
||||
|
||||
throw new Exception("Could not grep JavaScript object");
|
||||
}
|
||||
|
||||
$data =
|
||||
$this->fuckhtml
|
||||
->parseJsObject(
|
||||
$this->fuckhtml
|
||||
->extract_json(
|
||||
$data
|
||||
)
|
||||
);
|
||||
|
||||
if($data === null){
|
||||
|
||||
throw new Exception("Failed to decode JavaScript object");
|
||||
}
|
||||
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function web($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
@ -293,8 +350,8 @@ class brave{
|
|||
/*
|
||||
$handle = fopen("scraper/brave.html", "r");
|
||||
$html = fread($handle, filesize("scraper/brave.html"));
|
||||
fclose($handle);
|
||||
*/
|
||||
fclose($handle);*/
|
||||
|
||||
|
||||
try{
|
||||
$html =
|
||||
|
@ -346,7 +403,7 @@ class brave{
|
|||
|
||||
$nextpage =
|
||||
$this->fuckhtml
|
||||
->getElementsByClassName("btn", "a");
|
||||
->getElementsByClassName("button", "a");
|
||||
|
||||
if(count($nextpage) !== 0){
|
||||
|
||||
|
@ -382,45 +439,9 @@ class brave{
|
|||
}
|
||||
}
|
||||
|
||||
// do some magic
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
$script_disc =
|
||||
$this->fuckhtml
|
||||
->getElementsByTagName(
|
||||
"script"
|
||||
);
|
||||
|
||||
$grep = [];
|
||||
foreach($script_disc as $discs){
|
||||
|
||||
preg_match(
|
||||
'/const data ?= ?(\[{.*}]);/',
|
||||
$discs["innerHTML"],
|
||||
$grep
|
||||
);
|
||||
|
||||
if(isset($grep[1])){
|
||||
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if(!isset($grep[1])){
|
||||
|
||||
throw new Exception("Could not grep JavaScript object");
|
||||
}
|
||||
|
||||
$data =
|
||||
$this->fuckhtml
|
||||
->parseJsObject(
|
||||
$grep[1]
|
||||
);
|
||||
unset($grep);
|
||||
|
||||
if($data === null){
|
||||
|
||||
throw new Exception("Failed to decode JavaScript object");
|
||||
}
|
||||
$data = $this->get_js();
|
||||
|
||||
if(
|
||||
isset($data[2]["data"]["title"]) &&
|
||||
|
@ -663,7 +684,10 @@ class brave{
|
|||
$table["Address"] = $result["location"]["postal_address"]["displayAddress"];
|
||||
}
|
||||
|
||||
if(isset($result["location"]["rating"])){
|
||||
if(
|
||||
isset($result["location"]["rating"]) &&
|
||||
$result["location"]["rating"] != "void 0"
|
||||
){
|
||||
|
||||
$table["Rating"] =
|
||||
$result["location"]["rating"]["ratingValue"] . "/" .
|
||||
|
@ -671,13 +695,19 @@ class brave{
|
|||
number_format($result["location"]["rating"]["reviewCount"]) . " votes)";
|
||||
}
|
||||
|
||||
if(isset($result["location"]["contact"]["telephone"])){
|
||||
if(
|
||||
isset($result["location"]["contact"]["telephone"]) &&
|
||||
$result["location"]["contact"]["telephone"] != "void 0"
|
||||
){
|
||||
|
||||
$table["Phone number"] =
|
||||
$result["location"]["contact"]["telephone"];
|
||||
}
|
||||
|
||||
if(isset($result["location"]["price_range"])){
|
||||
if(
|
||||
isset($result["location"]["price_range"]) &&
|
||||
$result["location"]["price_range"] != "void 0"
|
||||
){
|
||||
|
||||
$table["Price"] =
|
||||
$result["location"]["price_range"];
|
||||
|
@ -1160,23 +1190,8 @@ class brave{
|
|||
$proxy
|
||||
);
|
||||
|
||||
preg_match(
|
||||
'/const data ?= ?(\[{.*}]);/',
|
||||
$html,
|
||||
$json
|
||||
);
|
||||
|
||||
if(!isset($json[1])){
|
||||
|
||||
throw new Exception("Failed to grep javascript object");
|
||||
}
|
||||
|
||||
$json = $this->fuckhtml->parseJsObject($json[1], true);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to parse javascript object");
|
||||
}
|
||||
$this->fuckhtml->load($html);
|
||||
$json = $this->get_js();
|
||||
|
||||
foreach(
|
||||
$json[1]["data"]["body"]["response"]["news"]["results"]
|
||||
|
@ -1258,22 +1273,8 @@ class brave{
|
|||
$html = fread($handle, filesize("scraper/brave-image.html"));
|
||||
fclose($handle);*/
|
||||
|
||||
preg_match(
|
||||
'/const data = (\[{.*}\]);/',
|
||||
$html,
|
||||
$json
|
||||
);
|
||||
|
||||
if(!isset($json[1])){
|
||||
|
||||
throw new Exception("Failed to get data object");
|
||||
}
|
||||
|
||||
$json =
|
||||
$this->fuckhtml
|
||||
->parseJsObject(
|
||||
$json[1]
|
||||
);
|
||||
$this->fuckhtml->load($html);
|
||||
$json = $this->get_js();
|
||||
|
||||
foreach(
|
||||
$json[1]
|
||||
|
@ -1403,22 +1404,8 @@ class brave{
|
|||
$html = fread($handle, filesize("scraper/brave-video.html"));
|
||||
fclose($handle);*/
|
||||
|
||||
preg_match(
|
||||
'/const data = (\[{.*}\]);/',
|
||||
$html,
|
||||
$json
|
||||
);
|
||||
|
||||
if(!isset($json[1])){
|
||||
|
||||
throw new Exception("Failed to get data object");
|
||||
}
|
||||
|
||||
$json =
|
||||
$this->fuckhtml
|
||||
->parseJsObject(
|
||||
$json[1]
|
||||
);
|
||||
$this->fuckhtml->load($html);
|
||||
$json = $this->get_js();
|
||||
|
||||
foreach(
|
||||
$json
|
||||
|
@ -1790,42 +1777,57 @@ class brave{
|
|||
|
||||
$nextpage =
|
||||
$this->fuckhtml
|
||||
->getElementsByClassName("btn", "a");
|
||||
->getElementById(
|
||||
"pagination",
|
||||
"div"
|
||||
);
|
||||
|
||||
if(count($nextpage) !== 0){
|
||||
if($nextpage){
|
||||
|
||||
$this->fuckhtml->load($nextpage);
|
||||
|
||||
$nextpage =
|
||||
$nextpage[count($nextpage) - 1];
|
||||
|
||||
if(
|
||||
strtolower(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$nextpage
|
||||
)
|
||||
) == "next"
|
||||
){
|
||||
|
||||
preg_match(
|
||||
'/offset=([0-9]+)/',
|
||||
$this->fuckhtml->getTextContent($nextpage["attributes"]["href"]),
|
||||
$nextpage
|
||||
$this->fuckhtml
|
||||
->getElementsByClassName(
|
||||
"button",
|
||||
"a"
|
||||
);
|
||||
|
||||
if(count($nextpage) !== 0){
|
||||
|
||||
return
|
||||
$this->backend->store(
|
||||
json_encode(
|
||||
[
|
||||
"q" => $q,
|
||||
"offset" => (int)$nextpage[1],
|
||||
"nsfw" => $nsfw,
|
||||
"country" => $country,
|
||||
"spellcheck" => $spellcheck
|
||||
]
|
||||
),
|
||||
$page,
|
||||
$proxy
|
||||
$nextpage =
|
||||
$nextpage[count($nextpage) - 1];
|
||||
|
||||
if(
|
||||
strtolower(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$nextpage
|
||||
)
|
||||
) == "next"
|
||||
){
|
||||
|
||||
preg_match(
|
||||
'/offset=([0-9]+)/',
|
||||
$this->fuckhtml->getTextContent($nextpage["attributes"]["href"]),
|
||||
$nextpage
|
||||
);
|
||||
|
||||
return
|
||||
$this->backend->store(
|
||||
json_encode(
|
||||
[
|
||||
"q" => $q,
|
||||
"offset" => (int)$nextpage[1],
|
||||
"nsfw" => $nsfw,
|
||||
"country" => $country,
|
||||
"spellcheck" => $spellcheck
|
||||
]
|
||||
),
|
||||
$page,
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
@ -0,0 +1,672 @@
|
|||
<?php
|
||||
|
||||
class coccoc{
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include "lib/backend.php";
|
||||
$this->backend = new backend("coccoc");
|
||||
|
||||
include "lib/fuckhtml.php";
|
||||
$this->fuckhtml = new fuckhtml();
|
||||
}
|
||||
|
||||
|
||||
private function get($proxy, $url, $get = []){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
// http2 bypass
|
||||
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER, [
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip, deflate, br, zstd",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
//"Cookie: _contentAB_15040_vi=V-06_01; split_test_search=new_search; uid=L_bauXyZBY1B; vid=uCVQJQSTgb9QGT3o; ls=1753742684; serp_version=29223843/7621a70; savedS=direct",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
"Sec-Fetch-Mode: navigate",
|
||||
"Sec-Fetch-Site: cross-site",
|
||||
"Priority: u=0, i"
|
||||
]);
|
||||
|
||||
$this->backend->assign_proxy($curlproc, $proxy);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
|
||||
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
|
||||
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
|
||||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function getfilters($pagetype){
|
||||
|
||||
return [
|
||||
"nsfw" => [
|
||||
"display" => "NSFW",
|
||||
"option" => [
|
||||
"yes" => "Yes", // nsfw by default????
|
||||
"no" => "No" // &safe=1
|
||||
]
|
||||
],
|
||||
"time" => [
|
||||
"display" => "Time posted",
|
||||
"option" => [
|
||||
"any" => "Any time",
|
||||
"1w" => "1 week ago",
|
||||
"2w" => "2 weeks ago",
|
||||
"1m" => "1 month ago",
|
||||
"3m" => "3 months ago",
|
||||
"6m" => "6 months ago",
|
||||
"1Y" => "1 year ago"
|
||||
]
|
||||
],
|
||||
"filter" => [
|
||||
"display" => "Remove duplicates",
|
||||
"option" => [
|
||||
"no" => "No",
|
||||
"yes" => "Yes" // &filter=0
|
||||
]
|
||||
]
|
||||
];
|
||||
}
|
||||
|
||||
public function web($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$query, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"],
|
||||
"web"
|
||||
);
|
||||
|
||||
$query = json_decode($query, true);
|
||||
}else{
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
$query = [
|
||||
"query" => $get["s"]
|
||||
];
|
||||
|
||||
// add filters
|
||||
if($get["nsfw"] == "no"){
|
||||
|
||||
$query["safe"] = 1;
|
||||
}
|
||||
|
||||
if($get["time"] != "any"){
|
||||
|
||||
$query["tbs"] = $get["time"];
|
||||
}
|
||||
|
||||
if($get["filter"] == "yes"){
|
||||
|
||||
$query["filter"] = 0;
|
||||
}
|
||||
}
|
||||
|
||||
try{
|
||||
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://coccoc.com/search",
|
||||
$query
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get search page");
|
||||
}
|
||||
//$html = file_get_contents("scraper/coccoc.html");
|
||||
|
||||
|
||||
$html = explode("window.composerResponse", $html, 2);
|
||||
|
||||
if(count($html) !== 2){
|
||||
|
||||
throw new Exception("Failed to grep window.composerResponse");
|
||||
}
|
||||
|
||||
$html =
|
||||
json_decode(
|
||||
$this->fuckhtml
|
||||
->extract_json(
|
||||
ltrim($html[1], " =")
|
||||
),
|
||||
true
|
||||
);
|
||||
|
||||
if($html === null){
|
||||
|
||||
throw new Exception("Failed to decode JSON");
|
||||
}
|
||||
|
||||
if(!isset($html["search"]["search_results"])){
|
||||
|
||||
throw new Exception("Coc Coc did not return a search_results object");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"spelling" => [
|
||||
"type" => "no_correction",
|
||||
"using" => null,
|
||||
"correction" => null
|
||||
],
|
||||
"npt" => null,
|
||||
"answer" => [],
|
||||
"web" => [],
|
||||
"image" => [],
|
||||
"video" => [],
|
||||
"news" => [],
|
||||
"related" => []
|
||||
];
|
||||
|
||||
// word correction
|
||||
foreach($html["top"] as $element){
|
||||
|
||||
if(isset($element["spellChecker"][0]["query"])){
|
||||
|
||||
$out["spelling"] = [
|
||||
"type" => "not_many",
|
||||
"using" => $html["search"]["query"],
|
||||
"correction" => $element["spellChecker"][0]["query"]
|
||||
];
|
||||
}
|
||||
}
|
||||
|
||||
foreach($html["search"]["search_results"] as $result){
|
||||
|
||||
if(isset($result["type"])){
|
||||
|
||||
switch($result["type"]){
|
||||
|
||||
//
|
||||
// Related searches
|
||||
//
|
||||
case "related_queries":
|
||||
$out["related"] = $result["queries"];
|
||||
continue 2;
|
||||
|
||||
//
|
||||
// Videos
|
||||
//
|
||||
case "video_hits":
|
||||
foreach($result["results"] as $video){
|
||||
|
||||
if(
|
||||
isset($video["image_url"]) &&
|
||||
!empty($video["image_url"])
|
||||
){
|
||||
|
||||
$thumb = [
|
||||
"ratio" => "16:9",
|
||||
"url" => $video["image_url"]
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"ratio" => null,
|
||||
"url" => null
|
||||
];
|
||||
}
|
||||
|
||||
$out["video"][] = [
|
||||
"title" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$video["title"]
|
||||
)
|
||||
),
|
||||
"description" => null,
|
||||
"author" => [
|
||||
"name" => $video["uploader"],
|
||||
"url" => null,
|
||||
"avatar" => null
|
||||
],
|
||||
"date" => (int)$video["date"],
|
||||
"duration" => (int)$video["duration"],
|
||||
"views" => null,
|
||||
"thumb" => $thumb,
|
||||
"url" => $video["url"]
|
||||
];
|
||||
}
|
||||
continue 2;
|
||||
}
|
||||
}
|
||||
|
||||
if(
|
||||
!isset($result["title"]) ||
|
||||
!isset($result["url"])
|
||||
){
|
||||
|
||||
// should not happen
|
||||
continue;
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["image_url"])){
|
||||
|
||||
$thumb = [
|
||||
"url" => $result["rich"]["data"]["image_url"],
|
||||
"ratio" => "16:9"
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"url" => null,
|
||||
"ratio" => null
|
||||
];
|
||||
}
|
||||
|
||||
$sublinks = [];
|
||||
|
||||
if(isset($result["rich"]["data"]["linked_docs"])){
|
||||
|
||||
foreach($result["rich"]["data"]["linked_docs"] as $sub){
|
||||
|
||||
$sublinks[] = [
|
||||
"title" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$sub["title"]
|
||||
)
|
||||
),
|
||||
"description" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$sub["content"]
|
||||
)
|
||||
),
|
||||
"date" => null,
|
||||
"url" => $sub["url"]
|
||||
];
|
||||
}
|
||||
}
|
||||
|
||||
// get date
|
||||
if(isset($result["date"])){
|
||||
|
||||
$date = (int)$result["date"];
|
||||
}else{
|
||||
|
||||
$date = null;
|
||||
}
|
||||
|
||||
// probe for metadata
|
||||
$table = [];
|
||||
|
||||
if(isset($result["rich"]["data"]["rating"])){
|
||||
|
||||
$table["Rating"] = $result["rich"]["data"]["rating"];
|
||||
|
||||
if(isset($result["rich"]["data"]["num_rating"])){
|
||||
|
||||
$table["Rating"] .= " (" . number_format($result["rich"]["data"]["num_rating"]) . " ratings)";
|
||||
}
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["views"])){
|
||||
|
||||
$table["Views"] = number_format($result["rich"]["data"]["views"]);
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["duration"])){
|
||||
|
||||
$table["Duration"] = $this->int2hms($result["rich"]["data"]["duration"]);
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["channel_name"])){
|
||||
|
||||
$table["Author"] = $result["rich"]["data"]["channel_name"];
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["video_quality"])){
|
||||
|
||||
$table["Quality"] = $result["rich"]["data"]["video_quality"];
|
||||
}
|
||||
|
||||
if(isset($result["rich"]["data"]["category"])){
|
||||
|
||||
$table["Category"] = $result["rich"]["data"]["category"];
|
||||
}
|
||||
|
||||
$out["web"][] = [
|
||||
"title" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$result["title"]
|
||||
)
|
||||
),
|
||||
"description" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$result["content"]
|
||||
)
|
||||
),
|
||||
"url" => $result["url"],
|
||||
"date" => $date,
|
||||
"type" => "web",
|
||||
"thumb" => $thumb,
|
||||
"sublink" => $sublinks,
|
||||
"table" => $table
|
||||
];
|
||||
}
|
||||
|
||||
//
|
||||
// Get wikipedia head
|
||||
//
|
||||
if(isset($html["right"])){
|
||||
|
||||
foreach($html["right"] as $wiki){
|
||||
|
||||
$description = [];
|
||||
|
||||
if(isset($wiki["short_intro"])){
|
||||
|
||||
$description[] =
|
||||
[
|
||||
"type" => "quote",
|
||||
"value" => $wiki["short_intro"],
|
||||
];
|
||||
}
|
||||
|
||||
if(isset($wiki["intro"])){
|
||||
|
||||
$description[] =
|
||||
[
|
||||
"type" => "text",
|
||||
"value" => $wiki["intro"],
|
||||
];
|
||||
}
|
||||
|
||||
// get table elements
|
||||
$table = [];
|
||||
|
||||
if(isset($wiki["fields"])){
|
||||
|
||||
foreach($wiki["fields"] as $element){
|
||||
|
||||
$table[$element["title"]] = implode(", ", $element["value"]);
|
||||
}
|
||||
}
|
||||
|
||||
// get sublinks
|
||||
$sublinks = [];
|
||||
|
||||
if(isset($wiki["website"])){
|
||||
|
||||
if(
|
||||
preg_match(
|
||||
'/^http/',
|
||||
$wiki["website"]
|
||||
) === 0
|
||||
){
|
||||
|
||||
$sublinks["Website"] = "https://" . $wiki["website"];
|
||||
}else{
|
||||
|
||||
$sublinks["Website"] = $wiki["website"];
|
||||
}
|
||||
}
|
||||
|
||||
foreach($wiki["profiles"] as $sitename => $url){
|
||||
|
||||
$sitename = explode("_", $sitename);
|
||||
$sitename = ucfirst($sitename[count($sitename) - 1]);
|
||||
|
||||
$sublinks[$sitename] = $url;
|
||||
}
|
||||
|
||||
$out["answer"][] = [
|
||||
"title" =>
|
||||
$this->titledots(
|
||||
$wiki["title"]
|
||||
),
|
||||
"description" => $description,
|
||||
"url" => null,
|
||||
"thumb" => isset($wiki["image"]["contentUrl"]) ? $wiki["image"]["contentUrl"] : null,
|
||||
"table" => $table,
|
||||
"sublink" => $sublinks
|
||||
];
|
||||
}
|
||||
}
|
||||
|
||||
// get next page
|
||||
if((int)$html["search"]["page"] < (int)$html["search"]["max_page"]){
|
||||
|
||||
// https://coccoc.com/composer?_=1754021153532&p=0&q=zbabduiqwhduwqhdnwq&reqid=bwcAs00q&s=direct&apiV=1
|
||||
// ^json endpoint, but we can just do &page=2 lol
|
||||
|
||||
if(!isset($query["page"])){
|
||||
|
||||
$query["page"] = 2;
|
||||
}else{
|
||||
|
||||
$query["page"]++;
|
||||
}
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend
|
||||
->store(
|
||||
json_encode($query),
|
||||
"web",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
public function video($get){
|
||||
|
||||
//$html = file_get_contents("scraper/coccoc.html");
|
||||
if($get["npt"]){
|
||||
|
||||
[$query, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"],
|
||||
"videos"
|
||||
);
|
||||
|
||||
$query = json_decode($query, true);
|
||||
}else{
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
$query = [
|
||||
"query" => $get["s"],
|
||||
"tbm" => "vid"
|
||||
];
|
||||
|
||||
// add filters
|
||||
if($get["nsfw"] == "no"){
|
||||
|
||||
$query["safe"] = 1;
|
||||
}
|
||||
|
||||
if($get["time"] != "any"){
|
||||
|
||||
$query["tbs"] = $get["time"];
|
||||
}
|
||||
|
||||
if($get["filter"] == "yes"){
|
||||
|
||||
$query["filter"] = 0;
|
||||
}
|
||||
}
|
||||
|
||||
try{
|
||||
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://coccoc.com/search",
|
||||
$query
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get search page");
|
||||
}
|
||||
|
||||
$html = explode("window.composerResponse", $html, 2);
|
||||
|
||||
if(count($html) !== 2){
|
||||
|
||||
throw new Exception("Failed to grep window.composerResponse");
|
||||
}
|
||||
|
||||
$html =
|
||||
json_decode(
|
||||
$this->fuckhtml
|
||||
->extract_json(
|
||||
ltrim($html[1], " =")
|
||||
),
|
||||
true
|
||||
);
|
||||
|
||||
if($html === null){
|
||||
|
||||
throw new Exception("Failed to decode JSON");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"npt" => null,
|
||||
"video" => [],
|
||||
"author" => [],
|
||||
"livestream" => [],
|
||||
"playlist" => [],
|
||||
"reel" => []
|
||||
];
|
||||
|
||||
if(!isset($html["search_video"]["search_results"])){
|
||||
|
||||
if(isset($html["search_video"]["error"]["title"])){
|
||||
|
||||
if($html["search_video"]["error"]["title"] == "Không tìm thấy kết quả nào"){
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
throw new Exception("Coc Coc returned an error: " . $html["search_video"]["error"]["title"]);
|
||||
}
|
||||
|
||||
throw new Exception("Coc Coc did not supply a search_results object");
|
||||
}
|
||||
|
||||
foreach($html["search_video"]["search_results"] as $video){
|
||||
|
||||
if(isset($video["rich"]["data"]["image_url"])){
|
||||
|
||||
$thumb = [
|
||||
"ratio" => "16:9",
|
||||
"url" => $video["rich"]["data"]["image_url"]
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"ratio" => null,
|
||||
"url" => null
|
||||
];
|
||||
}
|
||||
|
||||
$out["video"][] = [
|
||||
"title" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$video["title"]
|
||||
)
|
||||
),
|
||||
"description" =>
|
||||
$this->titledots(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$video["content"]
|
||||
)
|
||||
),
|
||||
"author" => [
|
||||
"name" =>
|
||||
isset($video["rich"]["data"]["channel_name"]) ?
|
||||
$video["rich"]["data"]["channel_name"] : null,
|
||||
"url" => null,
|
||||
"avatar" => null
|
||||
],
|
||||
"date" =>
|
||||
isset($video["date"]) ?
|
||||
$video["date"] : null,
|
||||
"duration" =>
|
||||
isset($video["rich"]["data"]["duration"]) ?
|
||||
(int)$video["rich"]["data"]["duration"] : null,
|
||||
"views" => null,
|
||||
"thumb" => $thumb,
|
||||
"url" => $video["url"]
|
||||
];
|
||||
}
|
||||
|
||||
// get next page
|
||||
if((int)$html["search_video"]["page"] < (int)$html["search_video"]["max_page"]){
|
||||
|
||||
if(!isset($query["page"])){
|
||||
|
||||
$query["page"] = 2;
|
||||
}else{
|
||||
|
||||
$query["page"]++;
|
||||
}
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend
|
||||
->store(
|
||||
json_encode($query),
|
||||
"videos",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
private function titledots($title){
|
||||
|
||||
return trim($title, " .\t\n\r\0\x0B…");
|
||||
}
|
||||
|
||||
private function int2hms($seconds){
|
||||
|
||||
$hours = floor($seconds / 3600);
|
||||
$minutes = floor(($seconds % 3600) / 60);
|
||||
$seconds = $seconds % 60;
|
||||
|
||||
return sprintf("%02d:%02d:%02d", $hours, $minutes, $seconds);
|
||||
}
|
||||
}
|
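Review note on `titledots` in the Coc Coc scraper: PHP's `trim()` treats its second argument as a list of single bytes, so including the three-byte `…` there also strips the bytes `0xE2`, `0x80` and `0xA6` on their own. As far as I can tell this can cut into other multibyte characters at the edges of a title, for example a leading `“`. A possible code-point-aware alternative is sketched below; `titledots_utf8` is a hypothetical name and this is a suggestion, not what the commit does.

```php
<?php
// Hypothetical UTF-8-aware variant of titledots(); not part of the diff.
function titledots_utf8(string $title): string {
    // Strip whitespace, NUL, dots and U+2026 (ellipsis) from both ends,
    // matching on code points (the /u flag) instead of single bytes.
    $out = preg_replace('/^[\s\x00.\x{2026}]+|[\s\x00.\x{2026}]+$/u', '', $title);

    // preg_replace() returns null on error, e.g. for invalid UTF-8 input
    return $out ?? $title;
}

echo titledots_utf8("“Tin tức”…"), PHP_EOL; // prints “Tin tức”
```

With the byte-wise `trim()` call from the diff, the same input loses the first two bytes of the opening quote, leaving an invalid UTF-8 sequence at the start of the string.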
3675
scraper/ddg.php
File diff suppressed because it is too large
|
@ -0,0 +1,262 @@
|
|||
<?php
|
||||
|
||||
class fivehpx{
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include "lib/backend.php";
|
||||
$this->backend = new backend("fivehpx");
|
||||
|
||||
include "lib/fuckhtml.php";
|
||||
$this->fuckhtml = new fuckhtml();
|
||||
}
|
||||
|
||||
public function getfilters($page){
|
||||
|
||||
return [
|
||||
"sort" => [
|
||||
"display" => "Sort",
|
||||
"option" => [
|
||||
"relevance" => "Relevance",
|
||||
"pulse" => "Pulse",
|
||||
"newest" => "Newest"
|
||||
]
|
||||
]
|
||||
];
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = [], $post_data = null){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
|
||||
if($post_data === null){
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
"Sec-Fetch-Mode: navigate",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Sec-Fetch-User: ?1",
|
||||
"Priority: u=0, i",
|
||||
"TE: trailers"]
|
||||
);
|
||||
}else{
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: */*",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"Referer: https://500px.com/",
|
||||
"content-type: application/json",
|
||||
//"x-csrf-token: undefined",
|
||||
"x-500px-source: Search",
|
||||
"Content-Length: " . strlen($post_data),
|
||||
"Origin: https://500px.com",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
// "Cookie: _pin_unauth, _fbp, _sharedID, _sharedID_cst",
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-site",
|
||||
"Priority: u=4",
|
||||
"TE: trailers"]
|
||||
);
|
||||
|
||||
// set post data
|
||||
curl_setopt($curlproc, CURLOPT_POST, true);
|
||||
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $post_data);
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
|
||||
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
|
||||
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
|
||||
|
||||
// http2 bypass
|
||||
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
|
||||
|
||||
$this->backend->assign_proxy($curlproc, $proxy);
|
||||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
|
||||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function image($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$pagination, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"], "images"
|
||||
);
|
||||
|
||||
$pagination = json_decode($pagination, true);
|
||||
$search = $pagination["search"];
|
||||
|
||||
}else{
|
||||
|
||||
$search = $get["s"];
|
||||
if(strlen($search) === 0){
|
||||
|
||||
throw new Exception("Search term is empty!");
|
||||
}
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
$pagination = [
|
||||
"sort" => strtoupper($get["sort"]),
|
||||
"search" => $search,
|
||||
"filters" => [],
|
||||
"nlp" => false,
|
||||
];
|
||||
}
|
||||
|
||||
try{
|
||||
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://api.500px.com/graphql",
|
||||
[],
|
||||
json_encode([
|
||||
"operationName" => "PhotoSearchPaginationContainerQuery",
|
||||
"variables" => $pagination,
|
||||
"query" =>
|
||||
'query PhotoSearchPaginationContainerQuery(' .
|
||||
(isset($pagination["cursor"]) ? '$cursor: String, ' : "") .
|
||||
'$sort: PhotoSort, $search: String!, $filters: [PhotoSearchFilter!], $nlp: Boolean) { ...PhotoSearchPaginationContainer_query_1vzAZD} fragment PhotoSearchPaginationContainer_query_1vzAZD on Query { photoSearch(sort: $sort, first: 100, ' .
|
||||
(isset($pagination["cursor"]) ? 'after: $cursor, ' : "") .
|
||||
'search: $search, filters: $filters, nlp: $nlp) { edges { node { id legacyId canonicalPath name description width height images(sizes: [33, 36]) { size url id } } } totalCount pageInfo { endCursor hasNextPage } }}'
|
||||
])
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch graphQL object");
|
||||
}
|
||||
|
||||
$json = json_decode($json, true);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to decode graphQL object");
|
||||
}
|
||||
|
||||
if(isset($json["errors"][0]["message"])){
|
||||
|
||||
throw new Exception("500px returned an API error: " . $json["errors"][0]["message"]);
|
||||
}
|
||||
|
||||
if(!isset($json["data"]["photoSearch"]["edges"])){
|
||||
|
||||
throw new Exception("No edges returned by API");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"npt" => null,
|
||||
"image" => []
|
||||
];
|
||||
|
||||
foreach($json["data"]["photoSearch"]["edges"] as $image){
|
||||
|
||||
$image = $image["node"];
|
||||
$title =
|
||||
trim(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$image["name"]
|
||||
) . ": " .
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$image["description"]
|
||||
)
|
||||
, " :"
|
||||
);
|
||||
|
||||
$small = $this->image_ratio(600, $image["width"], $image["height"]);
|
||||
$large = $this->image_ratio(2048, $image["width"], $image["height"]);
|
||||
|
||||
$out["image"][] = [
|
||||
"title" => $title,
|
||||
"source" => [
|
||||
[
|
||||
"url" => $image["images"][1]["url"],
|
||||
"width" => $large[0],
|
||||
"height" => $large[1]
|
||||
],
|
||||
[
|
||||
"url" => $image["images"][0]["url"],
|
||||
"width" => $small[0],
|
||||
"height" => $small[1]
|
||||
]
|
||||
],
|
||||
"url" => "https://500px.com" . $image["canonicalPath"]
|
||||
];
|
||||
}
|
||||
|
||||
// get NPT token
|
||||
if($json["data"]["photoSearch"]["pageInfo"]["hasNextPage"] === true){
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend->store(
|
||||
json_encode([
|
||||
"cursor" => $json["data"]["photoSearch"]["pageInfo"]["endCursor"],
|
||||
"search" => $search,
|
||||
"sort" => $pagination["sort"],
|
||||
"filters" => [],
|
||||
"nlp" => false
|
||||
]),
|
||||
"images",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
private function image_ratio($longest_edge, $width, $height){
|
||||
|
||||
$ratio = [
|
||||
$longest_edge / $width,
|
||||
$longest_edge / $height
|
||||
];
|
||||
|
||||
if($ratio[0] < $ratio[1]){
|
||||
|
||||
$ratio = $ratio[0];
|
||||
}else{
|
||||
|
||||
$ratio = $ratio[1];
|
||||
}
|
||||
|
||||
return [
|
||||
floor($width * $ratio),
|
||||
floor($height * $ratio)
|
||||
];
|
||||
}
|
||||
}
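The `image_ratio()` helper above scales a photo so that it fits inside a box of `$longest_edge` pixels while keeping its aspect ratio: it computes one scale factor per axis and applies the smaller one. A minimal standalone sketch of the same idea follows. The function name `fit_longest_edge` and the guard against zero dimensions are assumptions added for illustration and are not part of the scraper.

```php
<?php
// Hypothetical standalone variant of image_ratio(); not the scraper's own code.
// Returns [width, height] scaled so the longer side equals $longest_edge.
function fit_longest_edge(int $longest_edge, int $width, int $height): array
{
    // Assumption: treat missing or zero dimensions as "unknown" instead of dividing by zero.
    if ($width <= 0 || $height <= 0) {
        return [0, 0];
    }

    // The smaller of the two per-axis factors keeps both sides within the limit.
    $ratio = min($longest_edge / $width, $longest_edge / $height);

    return [
        (int)floor($width * $ratio),
        (int)floor($height * $ratio)
    ];
}
```

For a 1200x800 photo and a 600 pixel limit the factor is 0.5, which gives 600x400.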
|
|
@ -0,0 +1,415 @@
|
|||
<?php
|
||||
|
||||
class flickr{
|
||||
|
||||
const req_web = 0;
|
||||
const req_xhr = 1;
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include "lib/backend.php";
|
||||
$this->backend = new backend("flickr");
|
||||
|
||||
include "lib/fuckhtml.php";
|
||||
$this->fuckhtml = new fuckhtml();
|
||||
}
|
||||
|
||||
public function getfilters($page){
|
||||
|
||||
return [
|
||||
"nsfw" => [
|
||||
"display" => "NSFW",
|
||||
"option" => [
|
||||
"yes" => "Yes",
|
||||
"maybe" => "Maybe",
|
||||
"no" => "No",
|
||||
]
|
||||
],
|
||||
"sort" => [
|
||||
"display" => "Sort by",
|
||||
"option" => [
|
||||
"relevance" => "Relevance",
|
||||
"date-posted-desc" => "Newest uploads",
|
||||
"date-posted-asc" => "Oldest uploads",
|
||||
"date-taken-desc" => "Newest taken",
|
||||
"date-taken-asc" => "Oldest taken",
|
||||
"interestingness-desc" => "Interesting"
|
||||
]
|
||||
],
|
||||
"color" => [
|
||||
"display" => "Color",
|
||||
"option" => [
|
||||
"any" => "Any color",
|
||||
// color_codes=
|
||||
"0" => "Red",
|
||||
"1" => "Brown",
|
||||
"2" => "Orange",
|
||||
"b" => "Pink",
|
||||
"4" => "Yellow",
|
||||
"3" => "Golden",
|
||||
"5" => "Lime",
|
||||
"6" => "Green",
|
||||
"7" => "Sky blue",
|
||||
"8" => "Blue",
|
||||
"9" => "Purple",
|
||||
"a" => "Hot pink",
|
||||
"c" => "White",
|
||||
"d" => "Gray",
|
||||
"e" => "Black",
|
||||
// styles= override
|
||||
"blackandwhite" => "Black & white",
|
||||
]
|
||||
],
|
||||
"style" => [ // styles=
|
||||
"display" => "Style",
|
||||
"option" => [
|
||||
"any" => "Any style",
|
||||
"depthoffield" => "Depth of field",
|
||||
"minimalism" => "Minimalism",
|
||||
"pattern" => "Patterns"
|
||||
]
|
||||
],
|
||||
"license" => [
|
||||
"display" => "License",
|
||||
"option" => [
|
||||
"any" => "Any license",
|
||||
"1,2,3,4,5,6,9,11,12,13,14,15,16" => "All creative commons",
|
||||
"4,5,6,9,10,11,12,13" => "Commercial use allowed",
|
||||
"1,2,4,5,9,10,11,12,14,15" => "Modifications allowed",
|
||||
"4,5,9,10,11,12" => "Commercial use & mods allowed",
|
||||
"7,9,10" => "No known copyright restrictions",
|
||||
"8" => "U.S Government works"
|
||||
]
|
||||
]
|
||||
];
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = [], $reqtype = self::req_web){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
|
||||
if($reqtype === flickr::req_web){
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
"Sec-Fetch-Mode: navigate",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Sec-Fetch-User: ?1",
|
||||
"Priority: u=0, i",
|
||||
"TE: trailers"]
|
||||
);
|
||||
}else{
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: */*",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"Origin: https://www.flickr.com",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Referer: https://www.flickr.com/",
|
||||
// Cookie:
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-site",
|
||||
"TE: trailers"]
|
||||
);
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
|
||||
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
|
||||
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
|
||||
|
||||
// http2 bypass
|
||||
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
|
||||
|
||||
$this->backend->assign_proxy($curlproc, $proxy);
|
||||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
|
||||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function image($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$filters, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"], "images"
|
||||
);
|
||||
|
||||
$filters = json_decode($filters, true);
|
||||
|
||||
// Workaround for the future, if flickr deprecates &page argument on html page
|
||||
/*
|
||||
try{
|
||||
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://api.flickr.com/services/rest",
|
||||
[
|
||||
"sort" => $data["sort"],
|
||||
"parse_tags" => 1,
|
||||
// url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o
|
||||
"extras" => "can_comment,can_print,count_comments,count_faves,description,isfavorite,license,media,needs_interstitial,owner_name,path_alias,realname,rotation,url_sq,url_q,url_t,url_s,url_n,url_w,url_m,url_z,url_c,url_l",
|
||||
"per_page" => 100,
|
||||
"page" => $data["page"],
|
||||
"lang" => "en-US",
|
||||
"text" => $data["search"],
|
||||
"viewerNSID" => "",
|
||||
"method" => "flickr.photos.search",
|
||||
"csrf" => "",
|
||||
"api_key" => $data["api_key"],
|
||||
"format" => "json",
|
||||
"hermes" => 1,
|
||||
"hermesClient" => 1,
|
||||
"reqId" => $data["reqId"],
|
||||
"nojsoncallback" => 1
|
||||
]
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
}*/
|
||||
|
||||
}else{
|
||||
|
||||
if(strlen($get["s"]) === 0){
|
||||
|
||||
throw new Exception("Search term is empty!");
|
||||
}
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
// compute filters
|
||||
$filters = [
|
||||
"page" => 1,
|
||||
"sort" => $get["sort"]
|
||||
];
|
||||
|
||||
if($get["style"] != "any"){
|
||||
|
||||
$filters["styles"] = $get["style"];
|
||||
}
|
||||
|
||||
if($get["color"] != "any"){
|
||||
|
||||
if($get["color"] != "blackandwhite"){
|
||||
|
||||
$filters["color_codes"] = $get["color"];
|
||||
}else{
|
||||
|
||||
$filters["styles"] = "blackandwhite";
|
||||
}
|
||||
}
|
||||
|
||||
if($get["license"] != "any"){
|
||||
|
||||
$filters["license"] = $get["license"];
|
||||
}
|
||||
|
||||
switch($get["nsfw"]){
|
||||
|
||||
case "yes": $filters["safe_search"] = 0; break;
|
||||
case "maybe": $filters["safe_search"] = 2; break;
|
||||
case "no": $filters["safe_search"] = 1; break;
|
||||
}
|
||||
}
|
||||
|
||||
$get_params = [
|
||||
"text" => $get["s"],
|
||||
"per_page" => 50,
|
||||
// scrape highest resolution
|
||||
"extras" => "url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o",
|
||||
"view_all" => 1
|
||||
];
|
||||
|
||||
$get_params = array_merge($get_params, $filters);
|
||||
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://www.flickr.com/search/",
|
||||
$get_params,
|
||||
flickr::req_web
|
||||
);
|
||||
|
||||
// @TODO
|
||||
// get api_key and reqId, if flickr deprecates &page
|
||||
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
//
|
||||
// get response JSON
|
||||
//
|
||||
$scripts =
|
||||
$this->fuckhtml
|
||||
->getElementsByClassName(
|
||||
"modelExport",
|
||||
"script"
|
||||
);
|
||||
|
||||
$found = false;
|
||||
foreach($scripts as $script){
|
||||
|
||||
$json =
|
||||
preg_split(
|
||||
'/modelExport: ?/',
|
||||
$script["innerHTML"],
|
||||
2
|
||||
);
|
||||
|
||||
if(count($json) === 2){ // preg_split() yields two parts only when the marker was found
|
||||
|
||||
$found = true;
|
||||
$json = $json[1];
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if($found === false){
|
||||
|
||||
throw new Exception("Failed to grep JSON");
|
||||
}
|
||||
|
||||
$json =
|
||||
json_decode(
|
||||
$this->fuckhtml
|
||||
->extract_json(
|
||||
$json
|
||||
),
|
||||
true
|
||||
);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to decode JSON");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"npt" => null,
|
||||
"image" => []
|
||||
];
|
||||
|
||||
if(!isset($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"])){
|
||||
|
||||
throw new Exception("Failed to access data object");
|
||||
}
|
||||
|
||||
foreach($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"] as $image){
|
||||
|
||||
if(!isset($image["data"])){
|
||||
|
||||
// flickr likes to give us empty array objects
|
||||
continue;
|
||||
}
|
||||
|
||||
$image = $image["data"];
|
||||
|
||||
$title = [];
|
||||
|
||||
if(isset($image["title"])){
|
||||
|
||||
$title[] =
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$image["title"]
|
||||
);
|
||||
}
|
||||
|
||||
if(isset($image["description"])){
|
||||
|
||||
$title[] =
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
str_replace(
|
||||
"\n",
|
||||
" ",
|
||||
$image["description"]
|
||||
)
|
||||
);
|
||||
}
|
||||
|
||||
$title = implode(": ", $title);
|
||||
|
||||
$sources = array_values($image["sizes"]["data"]);
|
||||
|
||||
$suitable_sizes = ["n", "m", "w", "s"];
|
||||
|
||||
$thumb = &$sources[0]["data"];
|
||||
foreach($suitable_sizes as $testing_size){
|
||||
|
||||
if(isset($image["sizes"]["data"][$testing_size])){
|
||||
|
||||
$thumb = &$image["sizes"]["data"][$testing_size]["data"];
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
$og = &$sources[count($sources) - 1]["data"];
|
||||
|
||||
$out["image"][] = [
|
||||
"title" => $title,
|
||||
"source" => [
|
||||
[
|
||||
"url" => "https:" . $og["displayUrl"],
|
||||
"width" => (int)$og["width"],
|
||||
"height" => (int)$og["height"]
|
||||
],
|
||||
[
|
||||
"url" => "https:" . $thumb["displayUrl"],
|
||||
"width" => (int)$thumb["width"],
|
||||
"height" => (int)$thumb["height"]
|
||||
]
|
||||
],
|
||||
"url" => "https://www.flickr.com/photos/" . $image["ownerNsid"] . "/" . $image["id"] . "/"
|
||||
];
|
||||
}
|
||||
|
||||
$total_items = (int)$json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["totalItems"];
|
||||
|
||||
if(($filters["page"]) * 50 < $total_items){
|
||||
|
||||
$filters["page"]++;
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend->store(
|
||||
json_encode($filters),
|
||||
"images",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
}
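The flickr scraper locates its result payload by splitting each `modelExport` script body on the `modelExport:` marker. `preg_split()` with a limit of 2 returns a one-element array when the marker is absent, so the useful test is whether exactly two parts came back. A small sketch of that step, assuming a hypothetical helper name `split_model_export` that is not part of the scraper:

```php
<?php
// Hypothetical helper illustrating the marker split; not the scraper's own code.
// Returns everything after the first "modelExport:" marker, or null if it is absent.
function split_model_export(string $script_body): ?string
{
    $parts = preg_split('/modelExport: ?/', $script_body, 2);

    // One element means the marker never matched; false means the regex failed.
    if ($parts === false || count($parts) !== 2) {
        return null;
    }

    return $parts[1];
}
```

The returned string still carries whatever trails the JSON object, which is why the scraper hands it to `extract_json()` before decoding.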
|
|
@ -136,7 +136,7 @@ class ftm{
|
|||
"source" => [
|
||||
[
|
||||
"url" =>
|
||||
"https://findthatmeme.us-southeast-1.linodeobjects.com/" .
|
||||
"https://s3.thehackerblog.com/findthatmeme/" .
|
||||
$thumb,
|
||||
"width" => null,
|
||||
"height" => null
|
||||
|
|
4620
scraper/google.php
File diff suppressed because it is too large
|
@ -182,6 +182,23 @@ class imgur{
|
|||
throw new Exception("Failed to fetch HTML");
|
||||
}
|
||||
|
||||
$json = json_decode($html, true);
|
||||
|
||||
if($json){
|
||||
|
||||
// {"data":{"error":"Imgur is temporarily over capacity. Please try again later."},"success":false,"status":403}
|
||||
|
||||
if(isset($json["data"]["error"])){
|
||||
|
||||
if(stripos($json["data"]["error"], "capacity") !== false){
|
||||
|
||||
throw new Exception("Imgur IP blocked this 4get instance or request proxy. Try again");
|
||||
}
|
||||
}
|
||||
|
||||
throw new Exception("Imgur returned an unknown error (IP ban?)");
|
||||
}
|
||||
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
$posts =
|
||||
|
@ -197,7 +214,14 @@ class imgur{
|
|||
|
||||
$image =
|
||||
$this->fuckhtml
|
||||
->getElementsByTagName("img")[0];
|
||||
->getElementsByTagName("img");
|
||||
|
||||
if(count($image) === 0){
|
||||
|
||||
continue;
|
||||
}
|
||||
|
||||
$image = $image[0];
|
||||
|
||||
$image_url = "https:" . substr($this->fuckhtml->getTextContent($image["attributes"]["src"]), 0, -5);
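The imgur hunk above checks an error message for the word "capacity" with `stripos()`. That function returns the match offset, which is `0` when the needle sits at the very start of the string, and `0` is falsy in PHP, so the result has to be compared against `false` explicitly. A minimal sketch, assuming a hypothetical helper name `contains_ci`:

```php
<?php
// Hypothetical helper; shows the strict comparison needed with stripos().
function contains_ci(string $haystack, string $needle): bool
{
    // stripos() returns int offset (possibly 0) on a match and false otherwise.
    return stripos($haystack, $needle) !== false;
}
```

With this form, a message such as "Capacity exceeded" is detected even though the match offset is 0.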
|
||||
|
||||
|
|
|
@ -3,7 +3,10 @@
|
|||
class marginalia{
|
||||
public function __construct(){
|
||||
|
||||
include "lib/fuckhtml.php";
|
||||
include "lib/anubis.php";
|
||||
$this->anubis = new anubis();
|
||||
|
||||
include_once "lib/fuckhtml.php";
|
||||
$this->fuckhtml = new fuckhtml();
|
||||
|
||||
include "lib/backend.php";
|
||||
|
@ -102,7 +105,40 @@ class marginalia{
|
|||
);
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = []){
|
||||
private function get($proxy, $url, $get = [], $get_cookies = 1){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
switch($get_cookies){
|
||||
|
||||
case 0:
|
||||
$cookies = "";
|
||||
$cookies_tmp = [];
|
||||
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
|
||||
|
||||
$length = strlen($header);
|
||||
|
||||
$header = explode(":", $header, 2);
|
||||
|
||||
if(trim(strtolower($header[0])) == "set-cookie"){
|
||||
|
||||
$cookie_tmp = explode("=", trim($header[1]), 2);
|
||||
|
||||
$cookies_tmp[trim($cookie_tmp[0])] =
|
||||
explode(";", $cookie_tmp[1], 2)[0];
|
||||
}
|
||||
|
||||
return $length;
|
||||
});
|
||||
break;
|
||||
|
||||
case 1:
|
||||
$cookies = "";
|
||||
break;
|
||||
|
||||
default:
|
||||
$cookies = "Cookie: " . $get_cookies;
|
||||
}
|
||||
|
||||
$headers = [
|
||||
"User-Agent: " . config::USER_AGENT,
|
||||
|
@ -110,6 +146,7 @@ class marginalia{
|
|||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"DNT: 1",
|
||||
$cookies,
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
|
@ -118,8 +155,6 @@ class marginalia{
|
|||
"Sec-Fetch-User: ?1"
|
||||
];
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
|
@ -145,7 +180,19 @@ class marginalia{
|
|||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
if($get_cookies === 0){
|
||||
|
||||
$cookie = [];
|
||||
|
||||
foreach($cookies_tmp as $key => $value){
|
||||
|
||||
$cookie[] = $key . "=" . $value;
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return implode(";", $cookie);
|
||||
}
|
||||
|
||||
return $data;
|
||||
}
|
||||
|
||||
|
@ -220,13 +267,14 @@ class marginalia{
|
|||
"related" => []
|
||||
];
|
||||
|
||||
// API scraper
|
||||
if(config::MARGINALIA_API_KEY !== null){
|
||||
|
||||
try{
|
||||
$json =
|
||||
$this->get(
|
||||
$this->backend->get_ip(), // no nextpage
|
||||
"https://api.marginalia.nu/" . config::MARGINALIA_API_KEY . "/search/" . urlencode($search),
|
||||
"https://api.marginalia-search.com/" . config::MARGINALIA_API_KEY . "/search/" . urlencode($search),
|
||||
[
|
||||
"count" => 20
|
||||
]
|
||||
|
@ -263,34 +311,114 @@ class marginalia{
|
|||
return $out;
|
||||
}
|
||||
|
||||
// no more cloudflare!! Parse html by default
|
||||
$params = [
|
||||
"query" => $search
|
||||
];
|
||||
// HTML parser
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
foreach(["adtech", "recent", "intitle"] as $v){
|
||||
//
|
||||
// Bypass anubis check
|
||||
//
|
||||
/*
|
||||
if(($anubis_key = apcu_fetch("marginalia_cookie")) === false){
|
||||
|
||||
if($get[$v] == "yes"){
|
||||
|
||||
switch($v){
|
||||
try{
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://old-search.marginalia.nu/search",
|
||||
[
|
||||
"query" => $search
|
||||
]
|
||||
);
|
||||
|
||||
case "adtech": $params["adtech"] = "reduce"; break;
|
||||
case "recent": $params["recent"] = "recent"; break;
|
||||
case "adtech": $params["searchTitle"] = "title"; break;
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get anubis challenge");
|
||||
}
|
||||
|
||||
try{
|
||||
|
||||
$anubis_data = $this->anubis->scrape($html);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception($error);
|
||||
}
|
||||
|
||||
// send anubis response & get cookies
|
||||
// https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge?response=0000018966b086834f738bacba6031028adb5aa875974ead197a8b75778baf3a&nonce=39947&redir=https%3A%2F%2Fold-search.marginalia.nu%2F&elapsedTime=1164
|
||||
|
||||
try{
|
||||
|
||||
$anubis_key =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge",
|
||||
[
|
||||
"response" => $anubis_data["response"],
|
||||
"nonce" => $anubis_data["nonce"],
|
||||
"redir" => "https://old-search.marginalia.nu/",
|
||||
"elapsedTime" => random_int(1000, 2000)
|
||||
],
|
||||
0
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to submit anubis challenge");
|
||||
}
|
||||
|
||||
apcu_store("marginalia_cookie", $anubis_key);
|
||||
}*/
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$params, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"],
|
||||
"web"
|
||||
);
|
||||
|
||||
try{
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://old-search.marginalia.nu/search?" . $params,
|
||||
[],
|
||||
//$anubis_key
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get HTML");
|
||||
}
|
||||
|
||||
}else{
|
||||
$params = [
|
||||
"query" => $search
|
||||
];
|
||||
|
||||
foreach(["adtech", "recent", "intitle"] as $v){
|
||||
|
||||
if($get[$v] == "yes"){
|
||||
|
||||
switch($v){
|
||||
|
||||
case "adtech": $params["adtech"] = "reduce"; break;
|
||||
case "recent": $params["recent"] = "recent"; break;
|
||||
case "intitle": $params["searchTitle"] = "title"; break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
try{
|
||||
$html =
|
||||
$this->get(
|
||||
$this->backend->get_ip(),
|
||||
"https://search.marginalia.nu/search",
|
||||
$params
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get HTML");
|
||||
try{
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://old-search.marginalia.nu/search",
|
||||
$params,
|
||||
//$anubis_key
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to get HTML");
|
||||
}
|
||||
}
|
||||
|
||||
$this->fuckhtml->load($html);
|
||||
|
@ -387,6 +515,65 @@ class marginalia{
|
|||
];
|
||||
}
|
||||
|
||||
// get next page
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
$pagination =
|
||||
$this->fuckhtml
|
||||
->getElementsByAttributeValue(
|
||||
"aria-label",
|
||||
"pagination",
|
||||
"nav"
|
||||
);
|
||||
|
||||
if(count($pagination) === 0){
|
||||
|
||||
// no pagination
|
||||
return $out;
|
||||
}
|
||||
|
||||
$this->fuckhtml->load($pagination[0]);
|
||||
|
||||
$pages =
|
||||
$this->fuckhtml
|
||||
->getElementsByClassName(
|
||||
"page-link",
|
||||
"a"
|
||||
);
|
||||
|
||||
$found_current_page = false;
|
||||
|
||||
foreach($pages as $page){
|
||||
|
||||
if(
|
||||
stripos(
|
||||
$page["attributes"]["class"],
|
||||
"active"
|
||||
) !== false
|
||||
){
|
||||
|
||||
$found_current_page = true;
|
||||
continue;
|
||||
}
|
||||
|
||||
if($found_current_page){
|
||||
|
||||
// we found current page index, and we iterated over
|
||||
// the next page <a>
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend->store(
|
||||
parse_url(
|
||||
$page["attributes"]["href"],
|
||||
PHP_URL_QUERY
|
||||
),
|
||||
"web",
|
||||
$proxy
|
||||
);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
}
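The pagination step in the marginalia scraper walks the `page-link` anchors, waits until it has passed the one marked `active`, and stores the query string of the anchor that follows. A simplified sketch of that scan is shown here. It assumes each link is a flat array with `class` and `href` keys (the scraper reads them from `$page["attributes"]`), and the function name `next_page_query` is hypothetical.

```php
<?php
// Hypothetical helper; simplified input shape compared to the fuckhtml node arrays.
// Returns the query string of the link following the active page, or null.
function next_page_query(array $links): ?string
{
    $seen_active = false;

    foreach ($links as $link) {
        if (stripos($link["class"] ?? "", "active") !== false) {
            $seen_active = true;
            continue;
        }

        if ($seen_active) {
            // parse_url() accepts relative URLs and returns null when no query is present.
            $query = parse_url($link["href"] ?? "", PHP_URL_QUERY);
            return is_string($query) ? $query : null;
        }
    }

    // The active page was the last one, or no active page was found.
    return null;
}
```

Storing only the query string keeps the next-page token independent of the host, so the caller can append it to whichever search endpoint it uses.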
|
||||
|
|
|
@ -701,9 +701,11 @@ class mojeek{
|
|||
if(count($thumb) === 2){
|
||||
|
||||
$answer["thumb"] =
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$thumb[1]
|
||||
urldecode(
|
||||
$this->fuckhtml
|
||||
->getTextContent(
|
||||
$thumb[1]
|
||||
)
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
@ -13,31 +13,104 @@ class pinterest{
|
|||
return [];
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = []){
|
||||
private function get($proxy, $url, $get = [], &$cookies, $header_data_post = null){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
if($header_data_post === null){
|
||||
|
||||
// handling GET
|
||||
|
||||
// extract cookies
|
||||
$cookies_tmp = [];
|
||||
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
|
||||
|
||||
$length = strlen($header);
|
||||
|
||||
$header = explode(":", $header, 2);
|
||||
|
||||
if(trim(strtolower($header[0])) == "set-cookie"){
|
||||
|
||||
$cookie_tmp = explode("=", trim($header[1]), 2);
|
||||
|
||||
$cookies_tmp[trim($cookie_tmp[0])] =
|
||||
explode(";", $cookie_tmp[1], 2)[0];
|
||||
}
|
||||
|
||||
return $length;
|
||||
});
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: application/json, text/javascript, */*, q=0.01",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"Referer: https://ca.pinterest.com/",
|
||||
"X-Requested-With: XMLHttpRequest",
|
||||
"X-APP-VERSION: 78f8764",
|
||||
"X-Pinterest-AppState: active",
|
||||
"X-Pinterest-Source-Url: /",
|
||||
"X-Pinterest-PWS-Handler: www/index.js",
|
||||
"screen-dpr: 1",
|
||||
"is-preload-enabled: 1",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Connection: keep-alive",
|
||||
"Alt-Used: ca.pinterest.com",
|
||||
"Priority: u=0",
|
||||
"TE: trailers"]
|
||||
);
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
}
|
||||
}else{
|
||||
|
||||
// handling POST (pagination)
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: application/json, text/javascript, */*, q=0.01",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"Content-Type: application/x-www-form-urlencoded",
|
||||
"Content-Length: " . strlen($get),
|
||||
"Referer: https://ca.pinterest.com/",
|
||||
"X-Requested-With: XMLHttpRequest",
|
||||
"X-APP-VERSION: 78f8764",
|
||||
"X-CSRFToken: " . $cookies["csrf"],
|
||||
"X-Pinterest-AppState: active",
|
||||
"X-Pinterest-Source-Url: /search/pins/?rs=ac&len=2&q=" . urlencode($header_data_post) . "&eq=" . urlencode($header_data_post),
|
||||
"X-Pinterest-PWS-Handler: www/search/[scope].js",
|
||||
"screen-dpr: 1",
|
||||
"is-preload-enabled: 1",
|
||||
"Origin: https://ca.pinterest.com",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Connection: keep-alive",
|
||||
"Alt-Used: ca.pinterest.com",
|
||||
"Cookie: " . $cookies["cookie"],
|
||||
"TE: trailers"]
|
||||
);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_POST, true);
|
||||
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $get);
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"DNT: 1",
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
"Sec-Fetch-Mode: navigate",
|
||||
"Sec-Fetch-Site: none",
|
||||
"Sec-Fetch-User: ?1"]
|
||||
);
|
||||
|
||||
// http2 bypass
|
||||
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
|
@ -54,6 +127,26 @@ class pinterest{
|
|||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
if($header_data_post === null){
|
||||
|
||||
if(!isset($cookies_tmp["csrftoken"])){
|
||||
|
||||
throw new Exception("Failed to grep CSRF token");
|
||||
}
|
||||
|
||||
$cookies = "";
|
||||
|
||||
foreach($cookies_tmp as $cookie_name => $cookie_value){
|
||||
|
||||
$cookies .= $cookie_name . "=" . $cookie_value . "; ";
|
||||
}
|
||||
|
||||
$cookies = [
|
||||
"csrf" => $cookies_tmp["csrftoken"],
|
||||
"cookie" => rtrim($cookies, " ;")
|
||||
];
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
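Both the pinterest and marginalia `get()` functions collect cookies through a `CURLOPT_HEADERFUNCTION` callback: each response header line is split on the first colon, `Set-Cookie` lines are split again on the first `=`, and everything after the first `;` (attributes such as `Path` or `Secure`) is dropped. The sketch below reproduces that parsing on a plain array of header lines so it can be run without a network request. The function names `collect_cookies` and `cookie_header` are assumptions for illustration.

```php
<?php
// Hypothetical helpers mirroring the header callback logic; not the scraper's own code.
// Builds a name => value map from raw response header lines.
function collect_cookies(array $header_lines): array
{
    $cookies = [];

    foreach ($header_lines as $line) {
        $parts = explode(":", $line, 2);

        if (count($parts) !== 2 || strtolower(trim($parts[0])) !== "set-cookie") {
            continue;
        }

        $pair = explode("=", trim($parts[1]), 2);

        if (count($pair) !== 2) {
            // Malformed cookie without "=", skip it.
            continue;
        }

        // Keep only the value, discarding attributes after the first ";".
        $cookies[trim($pair[0])] = explode(";", $pair[1], 2)[0];
    }

    return $cookies;
}

// Joins the map into the value of a Cookie request header.
function cookie_header(array $cookies): string
{
    $out = [];

    foreach ($cookies as $name => $value) {
        $out[] = $name . "=" . $value;
    }

    return implode("; ", $out);
}
```

In the scraper, the `csrftoken` entry of this map is also sent back as the `X-CSRFToken` header on the paginated POST request.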
|
||||
|
@ -62,17 +155,68 @@ class pinterest{
|
|||
|
||||
if($get["npt"]){
|
||||
|
||||
// @TODO
|
||||
// post data for next page
|
||||
$data = [
|
||||
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
|
||||
"data" =>
|
||||
json_encode(
|
||||
[$data, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"], "images"
|
||||
);
|
||||
|
||||
$data = json_decode($data, true);
|
||||
|
||||
$search = $data["q"];
|
||||
$cookies = $data["cookies"];
|
||||
|
||||
try{
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://ca.pinterest.com/resource/BaseSearchResource/get/",
|
||||
[
|
||||
// {"options":{"applied_filters":null,"appliedProductFilters":"---","article":null,"auto_correction_disabled":false,"corpus":null,"customized_rerank_type":null,"domains":null,"filters":null,"journey_depth":null,"page_size":null,"price_max":null,"price_min":null,"query_pin_sigs":null,"query":"higurashi","redux_normalize_feed":true,"rs":"typed","scope":"pins","selected_one_bar_modules":null,"source_id":null,"source_module_id":null,"top_pin_id":null,"bookmarks":["Y2JVSG81V2sxcmNHRlpWM1J5VFVad1ZsWlVRbXhpVmtreVZsZHpOV0pIU2tkV2FscFhVbXhhVkZreU1WSmtNREZWVjIxR1RrMXNTbEJXYlhSaFVtMVdjMVZ1U2xaaWEzQnpXVlJPVTJWV1pISlhhM1JYVm10V05sVldVbE5XVjBwMVVXMUdWVll6VFhoVWJYaFhWMVp3Ums1V1RsTmlSbGt5Vm10YWFtVkdWbkpOU0dSUFZsZG9XRmxzWkc5VlZscHlWbGhrYkdKR1NubFdWelZQWVVaYWRHVkVRbFppUmtwVVZrUktWMlJIVWtWV2JHaHBVakZLU0Zkc1pEUmtNVnBZVW10b2FsSXdXbkJXYlRWRFpHeGFSMWRzVG1oaGVrWllXV3RvVTFVeFpFaFZiRUpoVm5wRk1GbHFSbXRYVjA1R1YyczFWMVpHV2pSWFZtaDNVakZrY2sxWVRsaGlhM0JXV1ZSR1MyRkdiRlZTYm1SVVVteHdXbGxWVlRGVk1VbDVWRmhrVjAxdVVuWlVhMXBTWlVaT2MxcEhSbE5TTWswMVdtdGFWMU5YU2paVmJYaFRUVmhDUjFZeU5YZFVNVkY0VjJ0b1ZXRnJOVlpVVmxwTFVURndXR042VmxOV2ExcGFXVlZWTlZVeFNYZE5WRTVYVWtWYVZGWkhNVTlXTVU1WllVWk9hR1ZyV2s1WFZ6QXhZakpPVjFWWWFHRlNWbkJRVm14U1IwMUdXWGxOVkVKVlRWWnNORll5TURWV1YwVjVWV3hDV21FeGNETmFSVnByVjFkS1IyTkhhR2xYUjJkM1ZtdGFhMlF4VVhsVGJGcE9Wa1p3YjFwWGVFdFZWbFp4VW14YWJGWnRVbHBaTUdoTFZHMUtTR1ZJYUZkV2VrWjJWMVphU21ReVJYcGpSbFpwVW10d1RGZHJVa0pPVms1SFZHNVNUbFl3V2xoVmJYUldaVVpaZUZremFGUk5hM0JYVkZaYVYyRkZNSGxWYkVKYVlrWlZlRnBGV210WFIwNUpVMnMxVTFaR1dscFdWekI0VFVaV1IxTllaR3BUUlhCb1dWUkdWbVZHVm5SbFJuQnNZbFpKTWxSVlVYaFBSVGxGV1hwR1QyVnJSVEZVVlZKT1RrVXhSVkpVUWs5bGJFVXhWRmhzZDFOR1ZsWmtNMFp0VWpGYWIxZFhjRXBsUlRGSVZWaHdUbFl4YTNoVVZWSnFUVVUxV0ZadGFFOVNSVnB6Vkd0a1drMUdiRFpUVkVaT1pXMWplRmRzVWxkaFJuQllWVlJTVDJWdFRqWlVNVkpTWlZad2NWcEhkRTlsYTFwMFZGVlNhMkpWTVZWVFZFcE9Wa1pzTmxkWE1WSk9WVEYwVlcweFVGWXdXVFJXUjNSWFYwZGFRbEJVTVRoUFJHTXhUbnBCTlUxRVRUUk5SRVV3VG5wUk5VMTVjRWhWVlhkeFprUlZlRTlFVVRKWlZHc3lUMWRSTWsxVVVUSk9iVnBvV1RKWmVrNTZXWGhPTWs1cFQwUkZNVTlFVm1sTlZ
GcHBUV3BTYTFsWFRtcE9SR015VG1wVk5GbHFaR2haVjFacldWUmFiVmxxWkdoYVZGWnFUa1JXT0ZSclZsaG1RVDA5fFVIbzVhRkpYZUc1WFYyUlpWVEpHYkdGNk1XWk5ha1ptVFZSR09FOUVZekZPZWtFMVRVUk5ORTFFUlRCT2VsRTFUWGx3U0ZWVmQzRm1SMWw1VFZSUk1WbDZUVEJhUjFGNVQxZFNhVnB0VlRGT1JFVXdXVlJuZVU1cVRUUk5hbU40VDBSSk1VNXFWVEZOYlZwcVdsUnJlRTFFVVhwWmVsVjNXbXBvYkU1dFJYbE9ha0Y2VDFSSk5VMTZWVEJaYWtJNFZHdFdXR1pCUFQwPXxOb25lfDg3NTcwOTAzODAxNDc0OTMqR1FMKnwzMjM3YjM3ZGNhMGU3YjYyYzYzYzAyZGJkNGU1MjdlNzMyMTExMTNlMmUyMzEyOWM2MDAzYmU1ZTlmZjkwYjAwfE5FV3w="]},"context":{}}
|
||||
]
|
||||
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
|
||||
"data" => json_encode(
|
||||
[
|
||||
"options" => [
|
||||
"applied_unified_filters" => null,
|
||||
"appliedProductFilters" => "---",
|
||||
"article" => null,
|
||||
"auto_correction_disabled" => false,
|
||||
"corpus" => null,
|
||||
"customized_rerank_type" => null,
|
||||
"domains" => null,
|
||||
"dynamicPageSizeExpGroup" => null,
|
||||
"filters" => null,
|
||||
"journey_depth" => null,
|
||||
"page_size" => null,
|
||||
"price_max" => null,
|
||||
"price_min" => null,
|
||||
"query_pin_sigs" => null,
|
||||
"query" => $data["q"],
|
||||
"redux_normalize_feed" => true,
|
||||
"request_params" => null,
|
||||
"rs" => "typed",
|
||||
"scope" => "pins",
|
||||
"selected_one_bar_modules" => null,
|
||||
"source_id" => null,
|
||||
"source_module_id" => null,
|
||||
"source_url" => "/search/pins/?q=" . urlencode($search) . "&rs=typed",
|
||||
"top_pin_id" => null,
|
||||
"top_pin_ids" => null,
|
||||
"bookmarks" => [
|
||||
$data["bookmark"]
|
||||
]
|
||||
],
|
||||
"context" => []
|
||||
],
|
||||
JSON_UNESCAPED_SLASHES
|
||||
)
|
||||
],
|
||||
$cookies,
|
||||
$search
|
||||
);
|
||||
];
|
||||
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
}
|
||||
|
||||
}else{
|
||||
|
||||
|
@ -81,27 +225,45 @@ class pinterest{
|
|||
|
||||
throw new Exception("Search term is empty!");
|
||||
}
|
||||
|
||||
// https://ca.pinterest.com/resource/BaseSearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac&data=%7B%22options%22%3A%7B%22applied_unified_filters%22%3Anull%2C%22appliedProductFilters%22%3A%22---%22%2C%22article%22%3Anull%2C%22auto_correction_disabled%22%3Afalse%2C%22corpus%22%3Anull%2C%22customized_rerank_type%22%3Anull%2C%22domains%22%3Anull%2C%22dynamicPageSizeExpGroup%22%3Anull%2C%22filters%22%3Anull%2C%22journey_depth%22%3Anull%2C%22page_size%22%3Anull%2C%22price_max%22%3Anull%2C%22price_min%22%3Anull%2C%22query_pin_sigs%22%3Anull%2C%22query%22%3A%22higurashi%20when%20they%20cry%22%2C%22redux_normalize_feed%22%3Atrue%2C%22request_params%22%3Anull%2C%22rs%22%3A%22ac%22%2C%22scope%22%3A%22pins%22%2C%22selected_one_bar_modules%22%3Anull%2C%22source_id%22%3Anull%2C%22source_module_id%22%3Anull%2C%22source_url%22%3A%22%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac%22%2C%22top_pin_id%22%3Anull%2C%22top_pin_ids%22%3Anull%7D%2C%22context%22%3A%7B%7D%7D&_=1736116313987
|
||||
// source_url=%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac
|
||||
// &data=%7B%22options%22%3A%7B%22applied_unified_filters%22%3Anull%2C%22appliedProductFilters%22%3A%22---%22%2C%22article%22%3Anull%2C%22auto_correction_disabled%22%3Afalse%2C%22corpus%22%3Anull%2C%22customized_rerank_type%22%3Anull%2C%22domains%22%3Anull%2C%22dynamicPageSizeExpGroup%22%3Anull%2C%22filters%22%3Anull%2C%22journey_depth%22%3Anull%2C%22page_size%22%3Anull%2C%22price_max%22%3Anull%2C%22price_min%22%3Anull%2C%22query_pin_sigs%22%3Anull%2C%22query%22%3A%22higurashi%20when%20they%20cry%22%2C%22redux_normalize_feed%22%3Atrue%2C%22request_params%22%3Anull%2C%22rs%22%3A%22ac%22%2C%22scope%22%3A%22pins%22%2C%22selected_one_bar_modules%22%3Anull%2C%22source_id%22%3Anull%2C%22source_module_id%22%3Anull%2C%22source_url%22%3A%22%2Fsearch%2Fpins%2F%3Feq%3Dhigurashi%26etslf%3D5966%26len%3D2%26q%3Dhigurashi%2520when%2520they%2520cry%26rs%3Dac%22%2C%22top_pin_id%22%3Anull%2C%22top_pin_ids%22%3Anull%7D%2C%22context%22%3A%7B%7D%7D
|
||||
// &_=1736116313987
|
||||
|
||||
$source_url = "/search/pins/?q=" . urlencode($search) . "&rs=" . urlencode($search);
|
||||
|
||||
$filter = [
|
||||
"source_url" => "/search/pins/?q=" . urlencode($search),
|
||||
"source_url" => $source_url,
|
||||
"rs" => "typed",
|
||||
"data" =>
|
||||
json_encode(
|
||||
[
|
||||
"options" => [
|
||||
"article" => null,
|
||||
"applied_filters" => null,
|
||||
"applied_unified_filters" => null,
|
||||
"appliedProductFilters" => "---",
|
||||
"auto_correction_disabled" => false,
|
||||
"article" => null,
|
||||
"corpus" => null,
|
||||
"customized_rerank_type" => null,
|
||||
"domains" => null,
|
||||
"dynamicPageSizeExpGroup" => null,
|
||||
"filters" => null,
|
||||
"query" => $search,
|
||||
"journey_depth" => null,
|
||||
"page_size" => null,
|
||||
"price_max" => null,
|
||||
"price_min" => null,
|
||||
"query_pin_sigs" => null,
|
||||
"query" => $search,
|
||||
"redux_normalize_feed" => true,
|
||||
"rs" => "typed",
|
||||
"request_params" => null,
|
||||
"rs" => "ac",
|
||||
"scope" => "pins", // pins, boards, videos,
|
||||
"source_id" => null
|
||||
"selected_one_bar_modules" => null,
|
||||
"source_id" => null,
|
||||
"source_module_id" => null,
|
||||
"source_url" => $source_url,
|
||||
"top_pin_id" => null,
|
||||
"top_pin_ids" => null
|
||||
],
|
||||
"context" => []
|
||||
]
|
||||
|
@ -110,24 +272,26 @@ class pinterest{
|
|||
];
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
}
|
||||
|
||||
try{
|
||||
$json =
|
||||
json_decode(
|
||||
$cookies = [];
|
||||
|
||||
try{
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://www.pinterest.ca/resource/BaseSearchResource/get/",
|
||||
$filter
|
||||
),
|
||||
true
|
||||
);
|
||||
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
"https://ca.pinterest.com/resource/BaseSearchResource/get/",
|
||||
$filter,
|
||||
$cookies,
|
||||
null
|
||||
);
|
||||
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
}
|
||||
}
|
||||
|
||||
$json = json_decode($json, true);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to decode JSON");
|
||||
|
@ -139,6 +303,60 @@ class pinterest{
|
|||
"image" => []
|
||||
];
|
||||
|
||||
if(
|
||||
!isset(
|
||||
$json["resource_response"]
|
||||
["status"]
|
||||
)
|
||||
){
|
||||
|
||||
throw new Exception("Unknown API failure");
|
||||
}
|
||||
|
||||
if($json["resource_response"]["status"] != "success"){
|
||||
|
||||
$status = "Got non-OK response: " . $json["resource_response"]["status"];
|
||||
|
||||
if(
|
||||
isset(
|
||||
$json["resource_response"]["message"]
|
||||
)
|
||||
){
|
||||
|
||||
$status .= " - " . $json["resource_response"]["message"];
|
||||
}
|
||||
|
||||
throw new Exception($status);
|
||||
}
|
||||
|
||||
if(
|
||||
isset(
|
||||
$json["resource_response"]["sensitivity"]
|
||||
["notices"][0]["description"]["text"]
|
||||
)
|
||||
){
|
||||
|
||||
throw new Exception(
|
||||
"Pinterest returned a notice: " .
|
||||
$json["resource_response"]["sensitivity"]["notices"][0]["description"]["text"]
|
||||
);
|
||||
}
|
||||
|
||||
// get NPT
|
||||
if(isset($json["resource_response"]["bookmark"])){
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend->store(
|
||||
json_encode([
|
||||
"q" => $search,
|
||||
"bookmark" => $json["resource_response"]["bookmark"],
|
||||
"cookies" => $cookies
|
||||
]),
|
||||
"images",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
foreach(
|
||||
$json
|
||||
["resource_response"]
|
||||
|
@ -150,6 +368,7 @@ class pinterest{
|
|||
switch($item["type"]){
|
||||
|
||||
case "pin":
|
||||
case "board":
|
||||
|
||||
/*
|
||||
Handle image object
|
||||
|
@ -206,42 +425,15 @@ class pinterest{
|
|||
"height" => (int)$thumb["height"]
|
||||
]
|
||||
],
|
||||
"url" => "https://www.pinterest.com/pin/" . $item["id"]
|
||||
"url" =>
|
||||
$item["link"] === null ?
|
||||
"https://ca.pinterest.com/pin/" . $item["id"] :
|
||||
$item["link"]
|
||||
];
|
||||
break;
|
||||
|
||||
case "board":
|
||||
if(isset($item["cover_pin"]["image_url"])){
|
||||
|
||||
$image = [
|
||||
"url" => $item["cover_pin"]["image_url"],
|
||||
"width" => (int)$item["cover_pin"]["size"][0],
|
||||
"height" => (int)$item["cover_pin"]["size"][1]
|
||||
];
|
||||
}elseif(isset($item["image_cover_url_hd"])){
|
||||
/*
|
||||
$image = [
|
||||
"url" =>
|
||||
"width" => null,
|
||||
"height" => null
|
||||
];*/
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
private function getfullresimage($image, $has_og){
|
||||
|
||||
$has_og = $has_og ? "1200x" : "originals";
|
||||
|
||||
return
|
||||
preg_replace(
|
||||
'/https:\/\/i\.pinimg\.com\/[^\/]+\//',
|
||||
"https://i.pinimg.com/" . $has_og . "/",
|
||||
$image
|
||||
);
|
||||
}
|
||||
}
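Review note on the request built in the pinterest hunks above: the `data` parameter is a JSON document that is URL-encoded alongside `source_url`. The following is a minimal sketch of that encoding step only, not the scraper's actual code. The function name `build_pinterest_query` is hypothetical, and only a subset of the option keys from the diff is shown.

```php
<?php

// Hypothetical helper (not part of the diff): builds the query string
// for the BaseSearchResource endpoint from a search term.
function build_pinterest_query(string $search): string {
	
	// Same expression as in the diff
	$source_url = "/search/pins/?q=" . urlencode($search) . "&rs=" . urlencode($search);
	
	$filter = [
		"source_url" => $source_url,
		// Nested options are serialized to JSON first, then URL-encoded
		// together with the other parameters by http_build_query()
		"data" => json_encode([
			"options" => [
				"query" => $search,
				"rs" => "ac",
				"scope" => "pins",
				"source_url" => $source_url
			],
			"context" => []
		])
	];
	
	return http_build_query($filter);
}

// Round trip: decode the query string again to inspect what was sent
$query = build_pinterest_query("higurashi when they cry");
parse_str($query, $decoded);
$options = json_decode($decoded["data"], true)["options"];

echo $options["query"] . "\n";
```

Decoding the query string back with `parse_str()` and `json_decode()` is a cheap way to check that the double encoding (JSON inside a URL parameter) survives intact before sending anything over the network.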
|
||||
|
|
|
@ -410,10 +410,7 @@ class qwant{
|
|||
"thumb" =>
|
||||
$answer["data"]["result"]["thumbnail"]["landscape"] == null ?
|
||||
null :
|
||||
$this->unshitimage(
|
||||
$answer["data"]["result"]["thumbnail"]["landscape"],
|
||||
false
|
||||
),
|
||||
$this->unshitimage($answer["data"]["result"]["thumbnail"]["landscape"]),
|
||||
"table" => [],
|
||||
"sublink" => []
|
||||
];
|
||||
|
@ -770,7 +767,7 @@ class qwant{
|
|||
}else{
|
||||
|
||||
$thumb = [
|
||||
"url" => $this->unshitimage($video["thumbnail"], false),
|
||||
"url" => $this->unshitimage($video["thumbnail"]),
|
||||
"ratio" => "16:9"
|
||||
];
|
||||
}
|
||||
|
@ -870,7 +867,7 @@ class qwant{
|
|||
}else{
|
||||
|
||||
$thumb = [
|
||||
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"], false),
|
||||
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"]),
|
||||
"ratio" => "16:9"
|
||||
];
|
||||
}
|
||||
|
@ -920,18 +917,77 @@ class qwant{
|
|||
return trim($text, ". ");
|
||||
}
|
||||
|
||||
private function unshitimage($url, $is_bing = true){
|
||||
private function unshitimage($url){
|
||||
|
||||
// https://s1.qwant.com/thumbr/0x0/8/d/f6de4deb2c2b12f55d8bdcaae576f9f62fd58a05ec0feeac117b354d1bf5c2/th.jpg?u=https%3A%2F%2Fwww.bing.com%2Fth%3Fid%3DOIP.vvDWsagzxjoKKP_rOqhwrQAAAA%26w%3D160%26h%3D160%26c%3D7%26pid%3D5.1&q=0&b=1&p=0&a=0
|
||||
parse_str(parse_url($url)["query"], $parts);
|
||||
// https://s2.qwant.com/thumbr/474x289/7/f/412d13b3fe3a03eb2b89633c8e88b609b7d0b93cdd9a5e52db3c663e41e65e/th.jpg?u=https%3A%2F%2Ftse.mm.bing.net%2Fth%3Fid%3DOIP.9Tm_Eo6m7V7ltN19mxduDgHaEh%26pid%3DApi&q=0&b=1&p=0&a=0
|
||||
|
||||
if($is_bing){
|
||||
$parse = parse_url($parts["u"]);
|
||||
parse_str($parse["query"], $parts);
|
||||
$image = parse_url($url);
|
||||
|
||||
if(
|
||||
!isset($image["host"]) ||
|
||||
!isset($image["query"])
|
||||
){
|
||||
|
||||
return "https://" . $parse["host"] . "/th?id=" . urlencode($parts["id"]);
|
||||
// can't do anything
|
||||
return $url;
|
||||
}
|
||||
|
||||
return $parts["u"];
|
||||
$id = null;
|
||||
|
||||
if(
|
||||
preg_match(
|
||||
'/s[0-9]+\.qwant\.com$/',
|
||||
$image["host"]
|
||||
)
|
||||
){
|
||||
|
||||
parse_str($image["query"], $str);
|
||||
|
||||
// we're being served a proxy URL
|
||||
if(isset($str["u"])){
|
||||
|
||||
$bing_url = $str["u"];
|
||||
}else{
|
||||
|
||||
// give up
|
||||
return $url;
|
||||
}
|
||||
}
|
||||
|
||||
// parse bing URL
|
||||
$id = null;
|
||||
$image = parse_url($bing_url);
|
||||
|
||||
if(isset($image["query"])){
|
||||
|
||||
parse_str($image["query"], $str);
|
||||
|
||||
if(isset($str["id"])){
|
||||
|
||||
$id = $str["id"];
|
||||
}
|
||||
}
|
||||
|
||||
if($id === null){
|
||||
|
||||
$id = explode("/th/id/", $image["path"], 2);
|
||||
|
||||
if(count($id) !== 2){
|
||||
|
||||
// malformed
|
||||
return $url;
|
||||
}
|
||||
|
||||
$id = $id[1];
|
||||
}
|
||||
|
||||
if(is_array($id)){
|
||||
|
||||
// fuck off, let proxy.php deal with it
|
||||
return $url;
|
||||
}
|
||||
|
||||
return "https://" . $image["host"] . "/th?id=" . rawurlencode($id);
|
||||
}
|
||||
}
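Review note on the rewritten `unshitimage()` above: the new version unwraps a Qwant thumbnail proxy URL, reads the Bing image ID from either the `id` query parameter or a `/th/id/<ID>` path, and rebuilds a bare Bing thumbnail URL. Below is a standalone sketch of that flow. The function name `extract_bing_thumb` is hypothetical, and returning non-proxy URLs unchanged is an assumption about the intended behavior rather than something the diff states.

```php
<?php

// Hypothetical standalone version of the helper. Assumption: anything
// that is not a Qwant thumbnail proxy URL is returned unchanged.
function extract_bing_thumb(string $url): string {
	
	$proxy = parse_url($url);
	
	if(
		!isset($proxy["host"], $proxy["query"]) ||
		!preg_match('/s[0-9]+\.qwant\.com$/', $proxy["host"])
	){
		
		return $url;
	}
	
	// The wrapped Bing URL travels in the "u" parameter
	parse_str($proxy["query"], $params);
	
	if(!isset($params["u"]) || !is_string($params["u"])){
		
		return $url;
	}
	
	$bing = parse_url($params["u"]);
	
	if(!isset($bing["host"])){
		
		return $url;
	}
	
	$id = null;
	
	// Form 1: https://host/th?id=<ID>&...
	if(isset($bing["query"])){
		
		parse_str($bing["query"], $bing_params);
		
		if(isset($bing_params["id"]) && is_string($bing_params["id"])){
			
			$id = $bing_params["id"];
		}
	}
	
	// Form 2: https://host/th/id/<ID>
	if($id === null){
		
		$parts = explode("/th/id/", $bing["path"] ?? "", 2);
		
		if(count($parts) !== 2){
			
			return $url;
		}
		
		$id = $parts[1];
	}
	
	return "https://" . $bing["host"] . "/th?id=" . rawurlencode($id);
}

echo extract_bing_thumb(
	"https://s2.qwant.com/thumbr/474x289/7/f/412d13b3fe3a03eb2b89633c8e88b609b7d0b93cdd9a5e52db3c663e41e65e/th.jpg?u=https%3A%2F%2Ftse.mm.bing.net%2Fth%3Fid%3DOIP.9Tm_Eo6m7V7ltN19mxduDgHaEh%26pid%3DApi&q=0&b=1&p=0&a=0"
) . "\n";
```

The sketch returns early for non-proxy URLs, so the wrapped URL is always defined before it is parsed. In the diff, `$bing_url` is only assigned inside the Qwant host check.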
|
||||
|
|
|
@ -0,0 +1,541 @@
|
|||
<?php
|
||||
|
||||
class sepiasearch{
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include "lib/backend.php";
|
||||
$this->backend = new backend("sepiasearch");
|
||||
}
|
||||
|
||||
public function getfilters($page){
|
||||
|
||||
return [
|
||||
"nsfw" => [
|
||||
"display" => "NSFW",
|
||||
"option" => [
|
||||
"yes" => "Yes", // &sensitiveContent=both
|
||||
"no" => "No" // &sensitiveContent=false
|
||||
]
|
||||
],
|
||||
"language" => [
|
||||
"display" => "Language", // &language=
|
||||
"option" => [
|
||||
"any" => "Any language",
|
||||
"en" => "English",
|
||||
"fr" => "Français",
|
||||
"ar" => "العربية",
|
||||
"ca" => "Català",
|
||||
"cs" => "Čeština",
|
||||
"de" => "Deutsch",
|
||||
"el" => "ελληνικά",
|
||||
"eo" => "Esperanto",
|
||||
"es" => "Español",
|
||||
"eu" => "Euskara",
|
||||
"fa" => "فارسی",
|
||||
"fi" => "Suomi",
|
||||
"gd" => "Gàidhlig",
|
||||
"gl" => "Galego",
|
||||
"hr" => "Hrvatski",
|
||||
"hu" => "Magyar",
|
||||
"is" => "Íslenska",
|
||||
"it" => "Italiano",
|
||||
"ja" => "日本語",
|
||||
"kab" => "Taqbaylit",
|
||||
"nl" => "Nederlands",
|
||||
"no" => "Norsk",
|
||||
"oc" => "Occitan",
|
||||
"pl" => "Polski",
|
||||
"pt" => "Português (Brasil)",
|
||||
"pt-PT" => "Português (Portugal)",
|
||||
"ru" => "Русский",
|
||||
"sk" => "Slovenčina",
|
||||
"sq" => "Shqip",
|
||||
"sv" => "Svenska",
|
||||
"th" => "ไทย",
|
||||
"tok" => "Toki Pona",
|
||||
"tr" => "Türkçe",
|
||||
"uk" => "українська мова",
|
||||
"vi" => "Tiếng Việt",
|
||||
"zh-Hans" => "简体中文(中国)",
|
||||
"zh-Hant" => "繁體中文(台灣)"
|
||||
]
|
||||
],
|
||||
"type" => [
|
||||
"display" => "Result type", // i handle this
|
||||
"option" => [
|
||||
"videos" => "Videos",
|
||||
"playlists" => "Playlists",
|
||||
"channels" => "Channels"
|
||||
]
|
||||
],
|
||||
"sort" => [
|
||||
"display" => "Sort by",
|
||||
"option" => [
|
||||
"best" => "Best match", // no filter
|
||||
"-publishedAt" => "Newest", // sort=-publishedAt
|
||||
"publishedAt" => "Oldest" // sort=publishedAt
|
||||
]
|
||||
],
|
||||
"newer" => [ // &startDate=2025-07-26T04:00:00.000Z
|
||||
"display" => "Newer than",
|
||||
"option" => "_DATE"
|
||||
],
|
||||
"duration" => [
|
||||
"display" => "Duration",
|
||||
"option" => [
|
||||
"any" => "Any duration",
|
||||
"short" => "Short (0-4 mins)", // &durationRange=short
|
||||
"medium" => "Medium (4-10 mins)",
|
||||
"long" => "Long (10+ mins)",
|
||||
]
|
||||
],
|
||||
"category" => [
|
||||
"display" => "Category", // &categoryOneOf[]=
|
||||
"option" => [
|
||||
"any" => "Any category",
|
||||
"1" => "Music",
|
||||
"2" => "Films",
|
||||
"3" => "Vehicles",
|
||||
"4" => "Art",
|
||||
"5" => "Sports",
|
||||
"6" => "Travels",
|
||||
"7" => "Gaming",
|
||||
"8" => "People",
|
||||
"9" => "Comedy",
|
||||
"10" => "Entertainment",
|
||||
"11" => "News & Politics",
|
||||
"12" => "How To",
|
||||
"13" => "Education",
|
||||
"14" => "Activism",
|
||||
"15" => "Science & Technology",
|
||||
"16" => "Animals",
|
||||
"17" => "Kids",
|
||||
"18" => "Food"
|
||||
]
|
||||
],
|
||||
"display" => [
|
||||
"display" => "Display",
|
||||
"option" => [
|
||||
"any" => "Everything",
|
||||
"true" => "Live videos", // &isLive=true
|
||||
"false" => "VODs" // &isLive=false
|
||||
]
|
||||
],
|
||||
"license" => [
|
||||
"display" => "License", // &license=
|
||||
"option" => [
|
||||
"any" => "Any license",
|
||||
"1" => "Attribution",
|
||||
"2" => "Attribution - Share Alike",
|
||||
"3" => "Attribution - No Derivatives",
|
||||
"4" => "Attribution - Non Commercial",
|
||||
"5" => "Attribution - Non Commercial - Share Alike",
|
||||
"6" => "Attribution - Non Commercial - No Derivatives",
|
||||
"7" => "Public Domain Dedication"
|
||||
]
|
||||
]
|
||||
];
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = []){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get = http_build_query($get);
|
||||
$url .= "?" . $get;
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
|
||||
curl_setopt(
|
||||
$curlproc,
|
||||
CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: application/json, text/plain, */*",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip, deflate, br, zstd",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Referer: https://sepiasearch.org/search",
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Priority: u=0",
|
||||
"TE: trailers"]
|
||||
);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
|
||||
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
|
||||
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
|
||||
|
||||
$this->backend->assign_proxy($curlproc, $proxy);
|
||||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
|
||||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function video($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$npt, $proxy] =
|
||||
$this->backend
|
||||
->get(
|
||||
$get["npt"],
|
||||
"videos"
|
||||
);
|
||||
|
||||
$npt = json_decode($npt, true);
|
||||
$type = $npt["type"];
|
||||
$npt = $npt["npt"];
|
||||
}else{
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
$npt = [
|
||||
"search" => $get["s"],
|
||||
"start" => 0,
|
||||
"count" => 20
|
||||
];
|
||||
|
||||
if($get["type"] == "videos"){
|
||||
|
||||
//
|
||||
// Parse video filters
|
||||
//
|
||||
switch($get["nsfw"]){
|
||||
|
||||
case "yes": $npt["nsfw"] = "both"; break;
|
||||
case "no": $npt["nsfw"] = "false"; break;
|
||||
}
|
||||
|
||||
$npt["boostLanguages[]"] = "en";
|
||||
if($get["language"] != "any"){
|
||||
|
||||
$npt["languageOneOf[]"] = $get["language"];
|
||||
}
|
||||
|
||||
if($get["sort"] != "best"){
|
||||
|
||||
$npt["sort"] = $get["sort"];
|
||||
}
|
||||
|
||||
if($get["newer"] !== false){
|
||||
|
||||
$date = new DateTime("@{$get["newer"]}");
|
||||
$date->setTimezone(new DateTimeZone("UTC"));
|
||||
$formatted = $date->format("Y-m-d\TH:i:s.000\Z");
|
||||
|
||||
$npt["startDate"] = $formatted;
|
||||
}
|
||||
|
||||
switch($get["duration"]){
|
||||
|
||||
case "short":
|
||||
$npt["durationMax"] = 240;
|
||||
break;
|
||||
|
||||
case "medium":
|
||||
$npt["durationMin"] = 240;
|
||||
$npt["durationMax"] = 600;
|
||||
break;
|
||||
|
||||
case "long":
|
||||
$npt["durationMin"] = 600;
|
||||
break;
|
||||
}
|
||||
|
||||
if($get["category"] != "any"){
|
||||
|
||||
$npt["categoryOneOf[]"] = $get["category"];
|
||||
}
|
||||
|
||||
if($get["display"] != "any"){
|
||||
|
||||
$npt["isLive"] = $get["display"];
|
||||
}
|
||||
|
||||
if($get["license"] != "any"){
|
||||
|
||||
// the API parameter uses the British spelling "licence"
|
||||
$npt["licenceOneOf[]"] = $get["license"];
|
||||
}
|
||||
}
|
||||
|
||||
$type = $get["type"];
|
||||
}
|
||||
|
||||
switch($type){
|
||||
|
||||
case "videos":
|
||||
$url = "https://sepiasearch.org/api/v1/search/videos";
|
||||
break;
|
||||
|
||||
case "channels":
|
||||
$url = "https://sepiasearch.org/api/v1/search/video-channels";
|
||||
break;
|
||||
|
||||
case "playlists":
|
||||
$url = "https://sepiasearch.org/api/v1/search/video-playlists";
|
||||
break;
|
||||
}
|
||||
|
||||
//$json = file_get_contents("scraper/sepia.json");
|
||||
try{
|
||||
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
$url,
|
||||
$npt
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
}
|
||||
|
||||
$json = json_decode($json, true);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to parse JSON");
|
||||
}
|
||||
|
||||
if(isset($json["errors"])){
|
||||
|
||||
$msg = [];
|
||||
foreach($json["errors"] as $error){
|
||||
|
||||
if(isset($error["msg"])){
|
||||
|
||||
$msg[] = $error["msg"];
|
||||
}
|
||||
}
|
||||
|
||||
throw new Exception("Sepia Search returned error(s): " . implode(", ", $msg));
|
||||
}
|
||||
|
||||
if(!isset($json["data"])){
|
||||
|
||||
throw new Exception("Sepia Search did not return a data object");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"npt" => null,
|
||||
"video" => [],
|
||||
"author" => [],
|
||||
"livestream" => [],
|
||||
"playlist" => [],
|
||||
"reel" => []
|
||||
];
|
||||
|
||||
|
||||
switch($type){
|
||||
|
||||
case "videos":
|
||||
foreach($json["data"] as $video){
|
||||
|
||||
if(count($video["account"]["avatars"]) !== 0){
|
||||
|
||||
$avatar =
|
||||
$video["account"]["avatars"][count($video["account"]["avatars"]) - 1]["url"];
|
||||
}else{
|
||||
|
||||
$avatar = null;
|
||||
}
|
||||
|
||||
if($video["thumbnailUrl"] === null){
|
||||
|
||||
$thumb = [
|
||||
"ratio" => null,
|
||||
"url" => null
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"ratio" => "16:9",
|
||||
"url" => $video["thumbnailUrl"]
|
||||
];
|
||||
}
|
||||
|
||||
if($video["isLive"]){
|
||||
|
||||
$append = "livestream";
|
||||
}else{
|
||||
|
||||
$append = "video";
|
||||
}
|
||||
|
||||
$out[$append][] = [
|
||||
"title" => $video["name"],
|
||||
"description" =>
|
||||
$this->limitstrlen(
|
||||
$this->titledots(
|
||||
$video["description"]
|
||||
)
|
||||
),
|
||||
"author" => [
|
||||
"name" => $video["account"]["displayName"] . " ({$video["account"]["name"]})",
|
||||
"url" => $video["account"]["url"],
|
||||
"avatar" => $avatar
|
||||
],
|
||||
"date" => strtotime($video["publishedAt"]),
|
||||
"duration" => $video["isLive"] ? "_LIVE" : $video["duration"],
|
||||
"views" => $video["views"],
|
||||
"thumb" => $thumb,
|
||||
"url" => $video["url"]
|
||||
];
|
||||
}
|
||||
break;
|
||||
|
||||
case "playlists":
|
||||
foreach($json["data"] as $playlist){
|
||||
|
||||
if(count($playlist["ownerAccount"]["avatars"]) !== 0){
|
||||
|
||||
$avatar =
|
||||
$playlist["ownerAccount"]["avatars"][count($playlist["ownerAccount"]["avatars"]) - 1]["url"];
|
||||
}else{
|
||||
|
||||
$avatar = null;
|
||||
}
|
||||
|
||||
if($playlist["thumbnailUrl"] === null){
|
||||
|
||||
$thumb = [
|
||||
"ratio" => null,
|
||||
"url" => null
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"ratio" => "16:9",
|
||||
"url" => $playlist["thumbnailUrl"]
|
||||
];
|
||||
}
|
||||
|
||||
$out["playlist"][] = [
|
||||
"title" => $playlist["displayName"],
|
||||
"description" =>
|
||||
$this->limitstrlen(
|
||||
$this->titledots(
|
||||
$playlist["description"]
|
||||
)
|
||||
),
|
||||
"author" => [
|
||||
"name" => $playlist["ownerAccount"]["displayName"] . " ({$playlist["ownerAccount"]["name"]})",
|
||||
"url" => $playlist["ownerAccount"]["url"],
|
||||
"avatar" => $avatar
|
||||
],
|
||||
"date" => strtotime($playlist["createdAt"]),
|
||||
"duration" => $playlist["videosLength"],
|
||||
"views" => null,
|
||||
"thumb" => $thumb,
|
||||
"url" => $playlist["url"]
|
||||
];
|
||||
}
|
||||
break;
|
||||
|
||||
case "channels":
|
||||
foreach($json["data"] as $channel){
|
||||
|
||||
if(count($channel["avatars"]) !== 0){
|
||||
|
||||
$thumb = [
|
||||
"ratio" => "1:1",
|
||||
"url" => $channel["avatars"][count($channel["avatars"]) - 1]["url"]
|
||||
];
|
||||
}else{
|
||||
|
||||
$thumb = [
|
||||
"ratio" => null,
|
||||
"url" => null
|
||||
];
|
||||
}
|
||||
|
||||
$out["author"][] = [
|
||||
"title" => $channel["displayName"] . " ({$channel["name"]})",
|
||||
"followers" => $channel["followersCount"],
|
||||
"description" =>
|
||||
$channel["videosCount"] . " videos. " .
|
||||
$this->limitstrlen(
|
||||
$this->titledots(
|
||||
$channel["description"]
|
||||
)
|
||||
),
|
||||
"thumb" => $thumb,
|
||||
"url" => $channel["url"]
|
||||
];
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
// get next page
|
||||
if($json["total"] - 20 > $npt["start"]){
|
||||
|
||||
$npt["start"] += 20;
|
||||
|
||||
$npt = [
|
||||
"type" => $type,
|
||||
"npt" => $npt
|
||||
];
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend
|
||||
->store(
|
||||
json_encode($npt),
|
||||
"videos",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
	private function titledots($title){
		
		$substr = substr($title, -3);
		
		if(
			$substr == "..." ||
			$substr == "…"
		){
			
			return trim(substr($title, 0, -3), " \n\r\t\v\x00\0\x0B\xc2\xa0");
		}
		
		return trim($title, " \n\r\t\v\x00\0\x0B\xc2\xa0");
	}
	
	private function limitstrlen($text){
		
		return
			explode(
				"\n",
				wordwrap(
					str_replace(
						["\n\r", "\r\n", "\n", "\r"],
						" ",
						$text
					),
					300,
					"\n"
				),
				2
			)[0];
	}
|
||||
}
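Review note on the new sepiasearch scraper above: two small pieces of logic in `video()` are easy to get wrong, namely the `startDate` formatting and the next-page check. The sketch below isolates both. The function names `sepia_start_date` and `sepia_next_start` are hypothetical, and the page size of 20 is taken from the `count` value in the diff.

```php
<?php

// Hypothetical helper: formats a Unix timestamp the same way the
// scraper builds its startDate parameter (UTC, fixed ".000" millis).
function sepia_start_date(int $timestamp): string {
	
	$date = new DateTime("@" . $timestamp);
	$date->setTimezone(new DateTimeZone("UTC"));
	
	// \T and \Z are escaped so format() emits them literally
	return $date->format("Y-m-d\TH:i:s.000\Z");
}

// Hypothetical helper: returns the offset of the next page, or null
// when the current page is the last one.
function sepia_next_start(int $total, int $start, int $count = 20): ?int {
	
	return $total - $count > $start ? $start + $count : null;
}

echo sepia_start_date(1753502400) . "\n"; // 2025-07-26T04:00:00.000Z

// 45 results in total: offsets 0, 20 and 40, then no further page
var_dump(sepia_next_start(45, 0));
var_dump(sepia_next_start(45, 20));
var_dump(sepia_next_start(45, 40));
```

Returning `null` for the last page mirrors how the scraper leaves `$out["npt"]` unset when there is nothing more to fetch.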
|
|
@ -1226,7 +1226,12 @@ class startpage{
|
|||
// get results
|
||||
foreach($json["render"]["presenter"]["regions"]["mainline"] as $category){
|
||||
|
||||
if($category["display_type"] == "video-youtube"){
|
||||
if(
|
||||
preg_match(
|
||||
'/^video-/i',
|
||||
$category["display_type"]
|
||||
)
|
||||
){
|
||||
|
||||
foreach($category["results"] as $video){
|
||||
|
||||
|
@ -1248,7 +1253,7 @@ class startpage{
|
|||
}
|
||||
|
||||
$out["video"][] = [
|
||||
"title" => $video["title"],
|
||||
"title" => str_replace(["", ""], "", $video["title"]),
|
||||
"description" => $this->limitstrlen($video["description"]),
|
||||
"author" => [
|
||||
"name" => $video["channelTitle"],
|
||||
|
@ -1256,7 +1261,7 @@ class startpage{
|
|||
"avatar" => null
|
||||
],
|
||||
"date" => strtotime($video["publishDate"]),
|
||||
"duration" => $this->hms2int($video["duration"]),
|
||||
"duration" => $this->hms2int($category["display_type"] == "video-youtube" ? $video["duration"] : $video["duration"] / 1000),
|
||||
"views" => (int)$video["viewCount"],
|
||||
"thumb" => $thumb,
|
||||
"url" => $video["clickUrl"]
|
||||
|
|
|
@ -0,0 +1,257 @@
|
|||
<?php
|
||||
|
||||
class vsco{
|
||||
|
||||
public function __construct(){
|
||||
|
||||
include "lib/backend.php";
|
||||
$this->backend = new backend("vsco");
|
||||
}
|
||||
|
||||
public function getfilters($page){
|
||||
|
||||
return [];
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = [], $bearer = null){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
if($get !== []){
|
||||
$get_tmp = http_build_query($get);
|
||||
$url .= "?" . $get_tmp;
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
|
||||
|
||||
if($bearer === null){
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"Accept-Encoding: gzip",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
"Sec-Fetch-Dest: document",
|
||||
"Sec-Fetch-Mode: navigate",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Sec-Fetch-User: ?1",
|
||||
"Priority: u=0, i",
|
||||
"TE: trailers"]
|
||||
);
|
||||
}else{
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: */*",
|
||||
"Accept-Language: en-US",
|
||||
"Accept-Encoding: gzip",
|
||||
"Referer: https://vsco.co/search/images/" . urlencode($get["query"]),
|
||||
"authorization: Bearer " . $bearer,
|
||||
"content-type: application/json",
|
||||
"x-client-build: 1",
|
||||
"x-client-platform: web",
|
||||
"DNT: 1",
|
||||
"Sec-GPC: 1",
|
||||
"Connection: keep-alive",
|
||||
"Sec-Fetch-Dest: empty",
|
||||
"Sec-Fetch-Mode: cors",
|
||||
"Sec-Fetch-Site: same-origin",
|
||||
"Priority: u=0",
|
||||
"TE: trailers"]
|
||||
);
|
||||
}
|
||||
|
||||
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
|
||||
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
|
||||
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
|
||||
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
|
||||
|
||||
// http2 bypass
|
||||
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
|
||||
|
||||
$this->backend->assign_proxy($curlproc, $proxy);
|
||||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
|
||||
throw new Exception(curl_error($curlproc));
|
||||
}
|
||||
|
||||
curl_close($curlproc);
|
||||
return $data;
|
||||
}
|
||||
|
||||
public function image($get){
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$data, $proxy] =
|
||||
$this->backend->get(
|
||||
$get["npt"], "images"
|
||||
);
|
||||
|
||||
$data = json_decode($data, true);
|
||||
|
||||
}else{
|
||||
|
||||
$search = $get["s"];
|
||||
if(strlen($search) === 0){
|
||||
|
||||
throw new Exception("Search term is empty!");
|
||||
}
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
// get bearer token
|
||||
try{
|
||||
|
||||
$html =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://vsco.co/feed"
|
||||
);
|
||||
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch feed page");
|
||||
}
|
||||
|
||||
preg_match(
|
||||
'/"tkn":"([A-z0-9]+)"/',
|
||||
$html,
|
||||
$bearer
|
||||
);
|
||||
|
||||
if(!isset($bearer[1])){
|
||||
|
||||
throw new Exception("Failed to grep bearer token");
|
||||
}
|
||||
|
||||
$data = [
|
||||
"pagination" => [
|
||||
"query" => $search,
|
||||
"page" => 0,
|
||||
"size" => 100
|
||||
],
|
||||
"bearer" => $bearer[1]
|
||||
];
|
||||
}
|
||||
|
||||
try{
|
||||
|
||||
$json =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://vsco.co/api/2.0/search/images",
|
||||
$data["pagination"],
|
||||
$data["bearer"]
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
throw new Exception("Failed to fetch JSON");
|
||||
}
|
||||
|
||||
$json = json_decode($json, true);
|
||||
|
||||
if($json === null){
|
||||
|
||||
throw new Exception("Failed to decode JSON");
|
||||
}
|
||||
|
||||
$out = [
|
||||
"status" => "ok",
|
||||
"npt" => null,
|
||||
"image" => []
|
||||
];
|
||||
|
||||
if(!isset($json["results"])){
|
||||
|
||||
throw new Exception("Failed to access results object");
|
||||
}
|
||||
|
||||
foreach($json["results"] as $image){
|
||||
|
||||
$image_domain = parse_url("https://" . $image["responsive_url"], PHP_URL_HOST);
|
||||
$thumbnail = explode($image_domain, $image["responsive_url"], 2)[1];
|
||||
|
||||
if(substr($thumbnail, 0, 3) != "/1/"){
|
||||
|
||||
$thumbnail =
|
||||
preg_replace(
|
||||
'/^\/[^\/]+/',
|
||||
"",
|
||||
$thumbnail
|
||||
);
|
||||
}
|
||||
|
||||
$thumbnail = "https://img.vsco.co/cdn-cgi/image/width=480,height=360" . $thumbnail;
|
||||
$size =
|
||||
$this->image_ratio(
|
||||
(int)$image["dimensions"]["width"],
|
||||
(int)$image["dimensions"]["height"]
|
||||
);
|
||||
|
||||
$out["image"][] = [
|
||||
"title" => $image["description"],
|
||||
"source" => [
|
||||
[
|
||||
"url" => "https://" . $image["responsive_url"],
|
||||
"width" => (int)$image["dimensions"]["width"],
|
||||
"height" => (int)$image["dimensions"]["height"]
|
||||
],
|
||||
[
|
||||
"url" => $thumbnail,
|
||||
"width" => $size[0],
|
||||
"height" => $size[1]
|
||||
]
|
||||
],
|
||||
"url" => "https://" . $image["grid"]["domain"] . "/media/" . $image["imageId"]
|
||||
];
|
||||
}
|
||||
|
||||
// get NPT
|
||||
$max_page = ceil($json["total"] / 100);
|
||||
$data["pagination"]["page"]++;
|
||||
|
||||
if($max_page > $data["pagination"]["page"]){
|
||||
|
||||
$out["npt"] =
|
||||
$this->backend->store(
|
||||
json_encode($data),
|
||||
"images",
|
||||
$proxy
|
||||
);
|
||||
}
|
||||
|
||||
return $out;
|
||||
}
|
||||
|
||||
	private function image_ratio($width, $height){
		
		$ratio = [
			480 / $width,
			360 / $height
		];
		
		if($ratio[0] < $ratio[1]){
			
			$ratio = $ratio[0];
		}else{
			
			$ratio = $ratio[1];
		}
		
		return [
			floor($width * $ratio),
			floor($height * $ratio)
		];
	}
|
||||
}
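Review note on `image_ratio()` in the new vsco scraper above: it scales an image to fit inside a 480x360 box while keeping the aspect ratio, by choosing the smaller of the two scale factors. A compact sketch of the same idea follows. The function name `fit_in_box` and its box-size parameters are hypothetical. Like the original, it also scales small images up, and it assumes non-zero dimensions.

```php
<?php

// Hypothetical standalone equivalent of image_ratio(); the box size is
// passed in instead of being hard-coded.
function fit_in_box(int $width, int $height, int $max_w = 480, int $max_h = 360): array {
	
	// The smaller factor is the one that makes both sides fit
	$scale = min($max_w / $width, $max_h / $height);
	
	return [
		(int)floor($width * $scale),
		(int)floor($height * $scale)
	];
}

print_r(fit_in_box(1920, 1080)); // landscape: limited by width
print_r(fit_in_box(960, 1440));  // portrait: limited by height
```

Casting to `int` is a deliberate difference from the diff, where `floor()` leaves the thumbnail dimensions as floats.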
|
|
@ -14,7 +14,7 @@ class yandex{
|
|||
// backend included in the scraper functions
|
||||
}
|
||||
|
||||
private function get($proxy, $url, $get = [], $nsfw){
|
||||
private function get($proxy, $url, $get = [], $nsfw, $get_cookie = 1){
|
||||
|
||||
$curlproc = curl_init();
|
||||
|
||||
|
@ -25,19 +25,55 @@ class yandex{
|
|||
|
||||
curl_setopt($curlproc, CURLOPT_URL, $url);
|
||||
|
||||
// extract "i" cookie
|
||||
if($get_cookie === 0){
|
||||
|
||||
$cookies_tmp = [];
|
||||
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
|
||||
|
||||
$length = strlen($header);
|
||||
|
||||
$header = explode(":", $header, 2);
|
||||
|
||||
if(trim(strtolower($header[0])) == "set-cookie"){
|
||||
|
||||
$cookie_tmp = explode("=", trim($header[1]), 2);
|
||||
|
||||
$cookies_tmp[trim($cookie_tmp[0])] =
|
||||
explode(";", $cookie_tmp[1], 2)[0];
|
||||
}
|
||||
|
||||
return $length;
|
||||
});
|
||||
}
|
||||
|
||||
switch($nsfw){
|
||||
case "yes": $nsfw = "0"; break;
|
||||
case "maybe": $nsfw = "1"; break;
|
||||
case "no": $nsfw = "2"; break;
|
||||
}
|
||||
|
||||
switch($get_cookie){
|
||||
|
||||
case 0:
|
||||
$cookie = "";
|
||||
break;
|
||||
|
||||
case 1:
|
||||
$cookie = "Cookie: yp=" . (time() - 4000033) . ".szm.1:1920x1080:876x1000#" . time() . ".sp.family:" . $nsfw;
|
||||
break;
|
||||
|
||||
default:
|
||||
$cookie = "Cookie: i=" . $get_cookie;
|
||||
}
|
||||
|
||||
$headers =
|
||||
["User-Agent: " . config::USER_AGENT,
|
||||
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
|
||||
"Accept-Encoding: gzip",
|
||||
"Accept-Language: en-US,en;q=0.5",
|
||||
"DNT: 1",
|
||||
"Cookie: yp=1716337604.sp.family%3A{$nsfw}#1685406411.szm.1:1920x1080:1920x999",
|
||||
$cookie,
|
||||
"Referer: https://yandex.com/images/search",
|
||||
"Connection: keep-alive",
|
||||
"Upgrade-Insecure-Requests: 1",
|
||||
|
@ -59,6 +95,17 @@ class yandex{
|
|||
|
||||
$data = curl_exec($curlproc);
|
||||
|
||||
if($get_cookie === 0){
|
||||
|
||||
if(isset($cookies_tmp["i"])){
|
||||
|
||||
return $cookies_tmp["i"];
|
||||
}else{
|
||||
|
||||
throw new Exception("Failed to get Yandex clearance cookie");
|
||||
}
|
||||
}
|
||||
|
||||
if(curl_errno($curlproc)){
|
||||
|
||||
throw new Exception(curl_error($curlproc));
|
||||
|
@ -217,6 +264,23 @@ class yandex{
|
|||
// https://yandex.com/search/site/?text=minecraft&web=1&frame=1&v=2.0&searchid=3131712
|
||||
// &within=777&from_day=26&from_month=8&from_year=2023&to_day=26&to_month=8&to_year=2023
|
||||
|
||||
// get clearance cookie
|
||||
if(($cookie = apcu_fetch("yandexweb_cookie")) === false){
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
|
||||
$cookie =
|
||||
$this->get(
|
||||
$proxy,
|
||||
"https://yandex.ru/support2/smart-captcha/ru/",
|
||||
[],
|
||||
false,
|
||||
0
|
||||
);
|
||||
|
||||
apcu_store("yandexweb_cookie", $cookie);
|
||||
}
|
||||
|
||||
if($get["npt"]){
|
||||
|
||||
[$npt, $proxy] = $this->backend->get($get["npt"], "web");
|
||||
|
@ -226,7 +290,8 @@ class yandex{
|
|||
$proxy,
|
||||
"https://yandex.com" . $npt,
|
||||
[],
|
||||
"yes"
|
||||
"yes",
|
||||
$cookie
|
||||
);
|
||||
}else{
|
||||
|
||||
|
@ -236,7 +301,7 @@ class yandex{
|
|||
throw new Exception("Search term is empty!");
|
||||
}
|
||||
|
||||
$proxy = $this->backend->get_ip();
|
||||
$proxy = !isset($proxy) ? $this->backend->get_ip() : $proxy;
|
||||
$lang = $get["lang"];
|
||||
$older = $get["older"];
|
||||
$newer = $get["newer"];
|
||||
|
@ -283,7 +348,8 @@ class yandex{
|
|||
$proxy,
|
||||
"https://yandex.com/search/site/",
|
||||
$params,
|
||||
"yes"
|
||||
"yes",
|
||||
$cookie
|
||||
);
|
||||
}catch(Exception $error){
|
||||
|
||||
|
@ -314,6 +380,19 @@ class yandex{
|
|||
|
||||
$this->fuckhtml->load($html);
|
||||
|
||||
// Scrape page blocked error
|
||||
$title =
|
||||
$this->fuckhtml
|
||||
->getElementsByTagName("title");
|
||||
|
||||
if(
|
||||
count($title) !== 0 &&
|
||||
$title[0]["innerHTML"] == "403"
|
||||
){
|
||||
|
||||
throw new Exception("Yandex blocked this proxy or 4get instance.");
|
||||
}
|
||||
|
||||
// get nextpage
|
||||
$npt =
|
||||
$this->fuckhtml
|
||||
|
@ -668,7 +747,6 @@ class yandex{
|
|||
foreach($json["blocks"] as $block){
|
||||
|
||||
$html .= $block["html"];
|
||||
|
||||
// get next page
|
||||
if(
|
||||
isset($block["params"]["nextPageUrl"]) &&
|
||||
|
|
|
@ -1209,15 +1209,16 @@ class yt{
|
|||
|
||||
$reel =
|
||||
$reel
|
||||
->reelItemRenderer;
|
||||
->shortsLockupViewModel;
|
||||
|
||||
array_push(
|
||||
$this->out["reel"],
|
||||
[
|
||||
"title" =>
|
||||
$reel
|
||||
->headline
|
||||
->simpleText,
|
||||
->overlayMetadata
|
||||
->primaryText
|
||||
->content,
|
||||
"description" => null,
|
||||
"author" => [
|
||||
"name" => null,
|
||||
|
@ -1225,30 +1226,22 @@ class yt{
|
|||
"avatar" => null
|
||||
],
|
||||
"date" => null,
|
||||
"duration" =>
|
||||
$this->textualtime2int(
|
||||
$reel
|
||||
->accessibility
|
||||
->accessibilityData
|
||||
->label
|
||||
),
|
||||
"views" =>
|
||||
$this->truncatedcount2int(
|
||||
$reel
|
||||
->viewCountText
|
||||
->simpleText
|
||||
),
|
||||
"duration" => null,
|
||||
"views" => null,
|
||||
"thumb" => [
|
||||
"url" =>
|
||||
$reel
|
||||
->thumbnail
|
||||
->thumbnails[0]
|
||||
->sources[0]
|
||||
->url,
|
||||
"ratio" => "9:16"
|
||||
],
|
||||
"url" =>
|
||||
"https://www.youtube.com/watch?v=" .
|
||||
$reel
|
||||
->onTap
|
||||
->innertubeCommand
|
||||
->reelWatchEndpoint
|
||||
->videoId
|
||||
]
|
||||
);
|
||||
|
|
52
settings.php
|
@ -133,6 +133,10 @@ $settings = [
|
|||
"value" => "google",
|
||||
"text" => "Google"
|
||||
],
|
||||
[
|
||||
"value" => "google_cse",
|
||||
"text" => "Google CSE"
|
||||
],
|
||||
[
|
||||
"value" => "startpage",
|
||||
"text" => "Startpage"
|
||||
|
@ -166,8 +170,12 @@ $settings = [
|
|||
"text" => "Mojeek"
|
||||
],
|
||||
[
|
||||
"value" => "solofield",
|
||||
"text" => "Solofield"
|
||||
"value" => "baidu",
|
||||
"text" => "Baidu"
|
||||
],
|
||||
[
|
||||
"value" => "coccoc",
|
||||
"text" => "Cốc Cốc"
|
||||
],
|
||||
[
|
||||
"value" => "marginalia",
|
||||
|
@ -203,6 +211,10 @@ $settings = [
|
|||
"value" => "google",
|
||||
"text" => "Google"
|
||||
],
|
||||
[
|
||||
"value" => "google_cse",
|
||||
"text" => "Google CSE"
|
||||
],
|
||||
[
|
||||
"value" => "startpage",
|
||||
"text" => "Startpage"
|
||||
|
@ -216,13 +228,25 @@ $settings = [
|
|||
"text" => "Yep"
|
||||
],
|
||||
[
|
||||
"value" => "solofield",
|
||||
"text" => "Solofield"
|
||||
"value" => "baidu",
|
||||
"text" => "Baidu"
|
||||
],
|
||||
/*[
|
||||
[
|
||||
"value" => "pinterest",
|
||||
"text" => "Pinterest"
|
||||
],*/
|
||||
],
|
||||
[
|
||||
"value" => "flickr",
|
||||
"text" => "Flickr"
|
||||
],
|
||||
[
|
||||
"value" => "fivehpx",
|
||||
"text" => "500px"
|
||||
],
|
||||
[
|
||||
"value" => "vsco",
|
||||
"text" => "VSCO"
|
||||
],
|
||||
[
|
||||
"value" => "imgur",
|
||||
"text" => "Imgur"
|
||||
|
@ -241,6 +265,10 @@ $settings = [
|
|||
"value" => "yt",
|
||||
"text" => "YouTube"
|
||||
],
|
||||
[
|
||||
"value" => "sepiasearch",
|
||||
"text" => "Sepia Search"
|
||||
],
|
||||
[
|
||||
"value" => "ddg",
|
||||
"text" => "DuckDuckGo"
|
||||
|
@ -266,8 +294,12 @@ $settings = [
|
|||
"text" => "Qwant"
|
||||
],
|
||||
[
|
||||
"value" => "solofield",
|
||||
"text" => "Solofield"
|
||||
"value" => "baidu",
|
||||
"text" => "Baidu"
|
||||
],
|
||||
[
|
||||
"value" => "coccoc",
|
||||
"text" => "Cốc Cốc"
|
||||
]
|
||||
]
|
||||
],
|
||||
|
@ -302,6 +334,10 @@ $settings = [
|
|||
[
|
||||
"value" => "mojeek",
|
||||
"text" => "Mojeek"
|
||||
],
|
||||
[
|
||||
"value" => "baidu",
|
||||
"text" => "Baidu"
|
||||
]
|
||||
]
|
||||
],
|
||||
|
|
|
@ -1,47 +1,45 @@
|
|||
:root{
|
||||
--1d2021: #1d2021;
|
||||
--282828: #282828;
|
||||
--3c3836: #3c3836;
|
||||
--504945: #504945;
|
||||
|
||||
--1d2021:#1d2021;
|
||||
--282828:#282828;
|
||||
--3c3836:#3c3836;
|
||||
--504945:#504945;
|
||||
|
||||
/* font */
|
||||
--928374: #928374;
|
||||
--a89984: #c9c5bf;
|
||||
--bdae93: #bdae93;
|
||||
--8ec07c: #8ec07c;
|
||||
--ebdbb2: #ebdbb2;
|
||||
--928374:#928374;
|
||||
--a89984:#c9c5bf;
|
||||
--bdae93:#bdae93;
|
||||
--8ec07c:#8ec07c;
|
||||
--ebdbb2:#ebdbb2;
|
||||
}
|
||||
|
||||
|
||||
|
||||
body{
|
||||
padding:15px 4% 40px;
|
||||
margin:unset;
|
||||
}
|
||||
|
||||
h1,h2,h3,h4,h5,h6{
|
||||
h1, h2, h3, h4, h5, h6{
|
||||
padding:0;
|
||||
margin:0 0 7px 0;
|
||||
line-height:initial;
|
||||
color:var(--bdae93);
|
||||
}
|
||||
|
||||
h3,h4,h5,h6{
|
||||
h3, h4, h5, h6{
|
||||
margin-bottom:14px;
|
||||
}
|
||||
|
||||
/*
|
||||
Web styles
|
||||
Web styles
|
||||
*/
|
||||
|
||||
.searchbox input[type="submit"]{
|
||||
float:right;
|
||||
cursor:pointer;
|
||||
padding:0 10px;
|
||||
border-left: 1px solid var(--504945);
|
||||
background: #723c0b;
|
||||
border-left:1px solid var(--504945);
|
||||
background:#723c0b;
|
||||
}
|
||||
|
||||
|
||||
.searchbox input{
|
||||
all:unset;
|
||||
line-height:36px;
|
||||
|
@ -96,7 +94,6 @@ h3,h4,h5,h6{
|
|||
display:inline-block;
|
||||
}
|
||||
|
||||
|
||||
.tabs .tab.selected{
|
||||
border-bottom:2px solid #fc92a5;
|
||||
}
|
||||
|
@ -106,7 +103,7 @@ h3,h4,h5,h6{
|
|||
padding-bottom:12px;
|
||||
padding-top:7px;
|
||||
margin-bottom:7px;
|
||||
background-color:#232525
|
||||
background-color:#232525;
|
||||
}
|
||||
|
||||
.filters .filter{
|
||||
|
@ -169,7 +166,6 @@ h3,h4,h5,h6{
|
|||
font-size:12px;
|
||||
}
|
||||
|
||||
|
||||
.web .hover{
|
||||
display:block;
|
||||
text-decoration:none;
|
||||
|
@ -193,16 +189,13 @@ h3,h4,h5,h6{
|
|||
color:#9760b1 !important;
|
||||
}
|
||||
|
||||
|
||||
.web .text-result .greentext{
|
||||
font-size:14px;
|
||||
color:var(--bdae93);
|
||||
}
|
||||
|
||||
|
||||
/* favicon */
|
||||
|
||||
|
||||
.favicon-dropdown a{
|
||||
text-decoration:none;
|
||||
color:#d3d0c1;
|
||||
|
@ -211,39 +204,33 @@ h3,h4,h5,h6{
|
|||
font-size:13px;
|
||||
}
|
||||
|
||||
|
||||
.web .favicon img,
|
||||
.favicon-dropdown img{
|
||||
.web .favicon img, .favicon-dropdown img{
|
||||
margin:3px 7px 0 0;
|
||||
height:16px;
|
||||
font-size:12px;
|
||||
line-height:16px;;
|
||||
line-height:16px;
|
||||
display:block;
|
||||
text-align:left;
|
||||
}
|
||||
|
||||
|
||||
.web .sublinks{
|
||||
padding:17px 10px;
|
||||
font-size:15px;
|
||||
color:var(--928374);
|
||||
}
|
||||
|
||||
|
||||
.web .text-result .sublinks:last-child{
|
||||
padding-bottom:0;
|
||||
}
|
||||
|
||||
|
||||
/* Wikipedia head */
|
||||
.wiki-head{
|
||||
padding:5px;
|
||||
background-color: #322f2b
|
||||
background-color:#322f2b;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Images tab
|
||||
Images tab
|
||||
*/
|
||||
|
||||
#images{
|
||||
|
@ -257,17 +244,14 @@ h3,h4,h5,h6{
|
|||
float:left;
|
||||
}
|
||||
|
||||
|
||||
|
||||
#images .image .title{
|
||||
white-space:nowrap;
|
||||
overflow:hidden;
|
||||
margin-bottom:7px;
|
||||
font-weight:bold;
|
||||
color:var(--bdae93);
|
||||
color:var(--bdae93);
|
||||
}
|
||||
|
||||
|
||||
#popup-status{
|
||||
display:none;
|
||||
position:fixed;
|
||||
|
@ -280,43 +264,59 @@ h3,h4,h5,h6{
|
|||
}
|
||||
|
||||
/*
|
||||
Settings page
|
||||
Settings page
|
||||
*/
|
||||
|
||||
|
||||
.web .settings-submit a{
|
||||
margin-right:17px;
|
||||
color:#bdae93;
|
||||
}
|
||||
|
||||
/*
|
||||
Responsive image
|
||||
*/
|
||||
@media only screen and (max-width:1454px){
|
||||
#images .image-wrapper{
|
||||
width:25%;
|
||||
}
|
||||
}
|
||||
|
||||
@media only screen and (max-width:1161px){
|
||||
#images .image-wrapper{
|
||||
width:25%;
|
||||
}
|
||||
}
|
||||
|
||||
@media only screen and (max-width:750px){
|
||||
#images .image-wrapper{
|
||||
width:50%;
|
||||
}
|
||||
}
|
||||
|
||||
@media only screen and (max-width:450px){
|
||||
#images .image-wrapper{
|
||||
width:100%;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
Responsive image
|
||||
Responsive design
|
||||
*/
|
||||
@media only screen and (max-width: 1454px){ #images .image-wrapper{ width:25%; } }
|
||||
@media only screen and (max-width: 1161px){ #images .image-wrapper{ width:25%; } }
|
||||
@media only screen and (max-width: 750px){ #images .image-wrapper{ width:50%; } }
|
||||
@media only screen and (max-width: 450px){ #images .image-wrapper{ width:100%; } }
|
||||
|
||||
|
||||
/*
|
||||
Responsive design
|
||||
*/
|
||||
@media only screen and (max-width: 1550px){
|
||||
|
||||
|
||||
.web .left,
|
||||
@media only screen and (max-width:1550px){
|
||||
.web .left,
|
||||
.searchbox{
|
||||
width:60%;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
@media only screen and (max-width: 1000px){
|
||||
|
||||
@media only screen and (max-width:1100px){
|
||||
.web .left,
|
||||
.searchbox{
|
||||
width:100%;
|
||||
}
|
||||
}
|
||||
|
||||
.type{
|
||||
color:var(--bdae93);
|
||||
}
|
||||
.type{
|
||||
color:var(--bdae93);
|
||||
}
|
||||
|
|
|
@ -0,0 +1,17 @@
:root{
--1d2021: #101010;
--282828: #1a1a1a;
--3c3836: #1e1e1e;
--504945: #232323;

--928374: #949494;
--a89984: #d2d2d2;
--bdae93: #d2d2d2;
--8ec07c: #99c794;
--ebdbb2: #d2d2d2;

--comment: #5f6364;
--default: #cccece;
--keyword: #c594c5;
--string: #99c794;
}
@ -0,0 +1,40 @@
:root
{
--accent : #f79e98;
--1d2021 : #180d0c;
--282828 : #180d0c;
--3c3836 : #251615;
--504945 : #251615;
--928374 : var(--accent);
--a89984 : #d8c5c4;
--bdae93 : #d8c5c4;
--8ec07c : var(--accent);
--ebdbb2 : #d8c5c4;
--comment: #928374;
--default: #DCC9BC;
--keyword: #F07342;
--string : var(--accent);
--green : #959A6B;
--yellow : #E39C45;
--red : #CF223E;
--white : var(--a89984);
--black : var(--1d2021);
--hover : #b18884;
}

a.link, a { color: var(--accent); text-decoration: none; }
.searchbox { width: 23%; }
.filters .filter select { color: #E39C45; }
.web .separator::before { color: var(--white); }
.searchbox input[type="text"]::placeholder { color: var(--white); }
a.link:hover
{
color: var(--hover);
text-shadow: 0 0 .2rem var(--hover);
}
.code-inline
{ border-color: var(--default); font-family: monospace; }
.home #center a
{ color: var(--accent); }
.home .subtext
{ color: var(--white); }