94 Commits

Author SHA1 Message Date
46e6ed12e3 fix invalid sublinks on google scraper 2025-10-21 00:35:20 -04:00
ce75cbda81 google engineers on suicide watch 2025-10-19 03:56:37 -04:00
560b9b04da handle google api no results 2025-10-10 00:15:39 -04:00
56ea4811d7 woops 2025-10-08 22:18:03 -04:00
036a097d4d fix offset on next page 2025-10-08 01:32:47 -04:00
a4a44709b4 google quote on quote fix 2025-10-08 00:42:36 -04:00
4b16fd5897 filter ads from the html endpoint 2025-09-30 19:56:30 -04:00
61deefb75b fucking operators are blocked too 2025-09-28 15:27:07 -04:00
8198287ec0 fallback to html endpoint when user searches with quotes 2025-09-28 15:23:57 -04:00
bf6319839e chrome ua does nothing, nevermind 2025-09-28 13:18:06 -04:00
fa4aa9a0fd bypass ddg challenge 2025-09-28 13:14:11 -04:00
c69abf41b0 google scraper fix 2025-09-11 02:05:21 -04:00
1b6182bc3c now scraping image resolutions on brave 2025-09-10 01:09:48 -04:00
1cfeabeb7f forgot the fucking frontend duhhhhh 2025-09-07 14:32:28 -04:00
73f8472eec re-added solofield, added mullvad for brave and google 2025-09-07 14:26:51 -04:00
0c90c4bc9e remove deep_simple_no_results nag 2025-09-06 16:24:02 -04:00
cfcc4ec8d1 please stop fucking up everything you piece of shit extract_json function 2025-09-06 16:11:51 -04:00
6d34d43a01 fixed greppr, again 2025-09-06 11:25:09 -04:00
c44d6292a0 holy shit can i stop fucking up this function for fucks sake 2025-09-06 09:44:05 -04:00
0b350d4d6e messed up json extractor again 2025-09-06 01:50:20 -04:00
de328fff1b messed up json extractor 2025-09-06 01:49:22 -04:00
8613c1e0f4 brave crash fix 2025-09-05 01:59:01 -04:00
d4aaebcd80 remove that thing 2025-08-30 23:41:43 -04:00
a2e056b47b workaround to get bigger wiki images 2025-08-30 23:32:01 -04:00
8a0a8359a8 handle captcha on coc coc 2025-08-30 19:21:19 -04:00
dcf5901809 fix warning in goolag scraper 2025-08-24 11:22:43 -04:00
aa9806300a doc changes 2025-08-19 23:45:05 -04:00
91ce5c1563 remove ssl ciphers they bug out the new method 2025-08-19 23:28:54 -04:00
cdf958d293 fix wikipedia crash 2025-08-10 21:55:15 -04:00
2d63475b07 fix MDN answers not rendering properly 2025-08-10 21:49:51 -04:00
ae31274db9 these comments were too long 2025-08-10 21:39:19 -04:00
20ef5b3e3a re-added stackoverflow instant answers 2025-08-10 21:37:45 -04:00
7c970031d0 fix #2 for real this time 2025-08-10 17:22:58 -04:00
2c2bd28a9f fix syntax highlighter 2025-08-10 17:15:42 -04:00
dea8b0a362 this was so much pain to figure out 2025-08-10 16:39:11 -04:00
da1ea1d6e8 add more hacks 2025-08-10 15:53:02 -04:00
2ca8fb0006 remove backtick 2025-08-10 15:44:46 -04:00
362cf61508 doc config changes number twoo 2025-08-10 15:43:49 -04:00
1828c63233 doc config changes 2025-08-10 15:42:14 -04:00
4215f2678d i always forget the fucking config 2025-08-09 13:34:45 -04:00
acd02d83d4 added cara.app 2025-08-09 13:28:36 -04:00
319640cd77 greppr fix 2025-08-09 11:00:48 -04:00
ad535a1609 fix broken mojeek instant answer iamge 2025-08-07 17:56:39 -04:00
7ac53c6e11 bypass mojeek bot check 2025-08-07 17:49:45 -04:00
f33f02e816 so fucking dumb 2025-08-06 21:10:58 -04:00
27b8509ac0 fix ddg answer image 404 2025-08-06 21:09:33 -04:00
1e52982cb9 fix ddg returning weird reresults when no match is found 2025-08-06 20:19:13 -04:00
3f2bfcb8c7 aaaaaa part 3 2025-08-04 00:39:47 -04:00
d3ef1a67c0 aaaaaa part 2 2025-08-04 00:39:13 -04:00
9afef55d89 aaaaaa 2025-08-04 00:37:12 -04:00
706b490bf3 bypass captcha??? hopefullly........ 2025-08-04 00:34:06 -04:00
74f7c920f6 handle vimeo captchas 2025-08-04 00:18:38 -04:00
3dbcf60a3e added vimeo 2025-08-04 00:00:39 -04:00
f30872134f handle mojeek block 2025-08-03 12:28:57 -04:00
2c4dc7da84 lol forgot the request is still for yt 2025-08-02 17:44:10 -04:00
5a0f5b868a i hate git 2025-08-02 16:48:58 -04:00
7cf403e125 forgot the fucking config 2025-08-02 16:39:34 -04:00
73b7922898 added sepia search 2025-08-02 16:38:36 -04:00
336cb49d98 im so retarded 2025-08-01 20:54:08 -04:00
8cd8e7380f readme 2025-07-30 23:21:12 -04:00
3d9d95db34 ffffffffffukccccccccc 2025-07-30 15:49:05 -04:00
eb73b1f357 dementia 2025-07-30 15:47:29 -04:00
f7499294de added cock cock, removed solofield 2025-07-30 15:35:27 -04:00
60f7150008 readme 2025-07-28 18:53:21 -04:00
2b8d90af12 forgot the config fucking dementiacatwill 2025-07-27 21:53:20 -04:00
0f803804a4 forgot the settings damn it 2025-07-27 21:48:10 -04:00
f43feff0aa added baidu, the best search engine 2025-07-27 21:46:03 -04:00
0bdd5e73df added kuuro theme 2025-07-27 11:52:09 -04:00
430c0a2f0f fix potential xss woops 2025-07-08 23:10:13 -04:00
1a00bf8069 duckduckgo spelling fix 2025-07-08 23:08:12 -04:00
502f6d12e4 marginalia crash fix 2025-07-03 19:43:58 -04:00
a2bc1e6190 bypass anubis bullshit on marginalia 2025-06-20 01:18:57 -04:00
f73b5f0298 fix yandex web 2025-06-18 10:30:31 -04:00
3e1487e614 maybe we should call the function chucknuts 2025-06-12 10:30:47 -04:00
037566bbba handle google captcha 2025-06-11 18:44:41 -04:00
b61bc6d07c fix google image crash 2025-06-01 13:03:39 -04:00
8d50667b0d handle imgur ip block 2025-05-27 20:03:40 -04:00
a0545b6006 fix startpage videos 2025-05-24 20:49:49 -04:00
78aa2e198f fuiwhwehfuiewuf 2025-04-19 10:42:21 -04:00
b85820cbcd hopefully this fixes bing images my fucking god 2025-04-19 10:37:35 -04:00
4b85841a3e forgot nsfw filter god damn 2025-04-17 21:41:12 -04:00
8d07e72dfe forgot settings, god damn i have dementia 2025-04-17 20:08:15 -04:00
566680fe36 ok its unfucked now 2025-04-17 20:06:42 -04:00
077692db49 i fucking hate bing 2025-04-17 20:05:58 -04:00
e4bf53cdaa forgot config 2025-04-17 20:02:02 -04:00
4489bb21e5 forgot flickr 2025-04-17 20:00:33 -04:00
ff8b1addf7 fixed bing images failing to load, added flickr 2025-04-17 19:54:34 -04:00
3e2c3fc5d9 fixed google videos 2025-04-02 21:40:53 -04:00
49ddd1a216 duckduckgo images nsfw fix 2025-03-20 21:05:36 -04:00
81ca8eaddc gore theme fix 2025-03-16 03:39:58 -04:00
c9c8d578f3 Merge branch 'master' of https://git.lolcat.ca/lolcat/4get 2025-03-02 21:58:34 -05:00
b2203804c7 path traversal exploit (this is what you get for using free software) 2025-03-02 21:58:18 -05:00
13dfa9240c Merge pull request 'fix Dockerfile build.' (#67) from Fijxu/4get:dockerfile-fix into master
Reviewed-on: lolcat/4get#67
2025-02-04 07:05:12 +00:00
0a53c3605a fix Dockerfile build.
The `alpine:latest` image do not longer include php83 on their repos.
Using a specific image tag is better to prevent breakages on the future.

Ref: https://github.com/dnaprawa/dockerfile-best-practices?tab=readme-ov-file#the-latest-is-an-evil-choose-specific-image-tag
2025-02-02 01:37:14 -03:00
39 changed files with 9513 additions and 2694 deletions

@@ -1,48 +0,0 @@
name: '4get CI'
on:
workflow_dispatch:
push:
branches:
- '*'
paths-ignore:
- 'README.md'
- 'docker-compose.yaml'
- '.gitignore'
- 'docs/**'
jobs:
build:
runs-on: docker
steps:
- uses: actions/checkout@v4
name: Checkout 4get repository
- uses: docker/setup-buildx-action@v3
name: Setup Docker BuildX system
- name: Login to Docker Container Registry
uses: docker/login-action@v3
with:
registry: git.lolcat.ca
username: ${{ secrets.USERNAME }}
password: ${{ secrets.TOKEN }}
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: git.lolcat.ca/lolcat/4get
tags: |
type=sha,format=short,prefix={{date 'YYYY.MM.DD'}}-,enable=${{ github.ref == format('refs/heads/{0}', 'master') }}
type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'master') }}
- uses: docker/build-push-action@v6
name: Build images
with:
context: .
file: Dockerfile
tags: ${{ steps.meta.outputs.tags }}
platforms: linux/amd64
push: true

@@ -1,8 +1,8 @@
FROM alpine:latest
FROM alpine:3.21
WORKDIR /var/www/html/4get
RUN apk update && apk upgrade
RUN apk add php apache2-ssl php83-fileinfo php83-openssl php83-iconv php83-common php83-dom php83-sodium php83-curl curl php83-pecl-apcu php83-apache2 imagemagick php83-pecl-imagick php-mbstring imagemagick-webp imagemagick-jpeg
RUN apk add php apache2-ssl php84-fileinfo php84-openssl php84-iconv php84-common php84-dom php84-sodium php84-curl curl php84-pecl-apcu php84-apache2 imagemagick php84-pecl-imagick php84-mbstring imagemagick-webp imagemagick-jpeg
COPY . .

@@ -32,20 +32,21 @@ tl;dr 4get is the best way to browse for shit.
# Supported websites
| Web | Images | Videos | News | Music | Autocompleter |
|------------|--------------|------------|------------|------------|---------------|
|------------|--------------|--------------|------------|------------|---------------|
| DuckDuckGo | DuckDuckGo | YouTube | DuckDuckGo | Soundcloud | Brave |
| Brave | Brave | DuckDuckGo | Brave | | DuckDuckGo |
| Yandex | Yandex | Brave | Google | | Yandex |
| Google | Google | Yandex | Startpage | | Google |
| Startpage | Startpage | Google | Qwant | | Startpage |
| Qwant | Qwant | Startpage | Mojeek | | Kagi |
| Ghostery | Yep | Qwant | | | Qwant |
| Yep | Solofield | Solofield | | | Ghostery |
| Greppr | Pinterest | | | | Yep |
| Crowdview | 500px | | | | Marginalia |
| Brave | Brave | Sepia Search | Brave | | DuckDuckGo |
| Yandex | Yandex | DuckDuckGo | Google | | Yandex |
| Google | Google | Brave | Startpage | | Google |
| Startpage | Startpage | Yandex | Qwant | | Startpage |
| Qwant | Qwant | Google | Mojeek | | Kagi |
| Ghostery | Yep | Startpage | Baidu | | Qwant |
| Yep | Baidu | Qwant | | | Ghostery |
| Greppr | Pinterest | Baidu | | | Yep |
| Crowdview | 500px | Coc Coc | | | Marginalia |
| Mwmbl | VSCO | | | | YouTube |
| Mojeek | Imgur | | | | Soundcloud |
| Solofield | FindThatMeme | | | | |
| Baidu | FindThatMeme | | | | |
| Coc Coc | | | | | |
| Marginalia | | | | | |
| wiby | | | | | |
| Curlie | | | | | |

@@ -0,0 +1,8 @@
# Specify API keys for the Google API in the following format:
# <key>
#
# Generate keys here:
# https://developers.google.com/custom-search/v1/overview
# Make sure to use a different Google account for each key, cause I'm
# pretty sure the ratelimit is on a per-account basis :P
#

@@ -43,7 +43,7 @@ class config{
// If this regex matches the user agent, the request is blocked
// Not useful at all against a targeted attack
const HEADER_REGEX = '/bot|wget|curl|python-requests|scrapy|go-http-client|ruby|yahoo|spider|qwant/i';
const HEADER_REGEX = '/bot|wget|curl|python-requests|scrapy|go-http-client|ruby|yahoo|spider|qwant|meta/i';
// Block clients who present any of the following headers in their request (SPECIFY IN !!lowercase!!)
// Eg: ["x-forwarded-for", "x-via", "forwarded-for", "via"];
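As a rough illustration of how a pattern like `HEADER_REGEX` behaves, the sketch below applies the new value from this hunk to a few user agent strings. The helper name `is_blocked_user_agent` is hypothetical and is not taken from the 4get codebase; only the pattern itself comes from the diff.

```php
<?php
// Same pattern as the new HEADER_REGEX value shown in the hunk above.
const HEADER_REGEX = '/bot|wget|curl|python-requests|scrapy|go-http-client|ruby|yahoo|spider|qwant|meta/i';

// Hypothetical helper: true when the user agent matches the blocklist pattern.
function is_blocked_user_agent(string $user_agent): bool {
    return preg_match(HEADER_REGEX, $user_agent) === 1;
}

var_dump(is_blocked_user_agent("curl/8.5.0"));
var_dump(is_blocked_user_agent("meta-externalagent/1.1"));
var_dump(is_blocked_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:142.0) Gecko/20100101 Firefox/142.0"));
```

Because the pattern is an unanchored, case-insensitive substring match, any user agent that merely contains one of the words (for example "meta") is rejected, which is probably worth keeping in mind when adding short terms.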
@@ -100,7 +100,6 @@ class config{
"https://4get.sijh.net",
"https://4get.hbubli.cc",
"https://4get.plunked.party",
"https://4get.seitan-ayoub.lol",
"https://4get.etenie.pl",
"https://4get.lunar.icu",
"https://4get.dcs0.hu",
@@ -119,7 +118,7 @@ class config{
// Default user agent to use for scraper requests. Sometimes ignored to get specific webpages
// Changing this might break things.
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0";
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:142.0) Gecko/20100101 Firefox/142.0";
// Proxy pool assignments for each scraper
// false = Use server's raw IP
@@ -129,9 +128,14 @@ class config{
const PROXY_BRAVE = false;
const PROXY_FB = false; // facebook
const PROXY_GOOGLE = false;
const PROXY_GOOGLE_API = false;
const PROXY_GOOGLE_CSE = false;
const PROXY_MULLVAD_GOOGLE = false;
const PROXY_MULLVAD_BRAVE = false;
const PROXY_STARTPAGE = false;
const PROXY_QWANT = false;
const PROXY_BAIDU = false;
const PROXY_COCCOC = false;
const PROXY_GHOSTERY = false;
const PROXY_MARGINALIA = false;
const PROXY_MOJEEK = false;
@@ -141,8 +145,14 @@ class config{
const PROXY_WIBY = false;
const PROXY_CURLIE = false;
const PROXY_YT = false; // youtube
const PROXY_ARCHIVEORG = false;
const PROXY_SEPIASEARCH = false;
const PROXY_ODYSEE = false;
const PROXY_VIMEO = false;
const PROXY_YEP = false;
const PROXY_PINTEREST = false;
const PROXY_SANKAKUCOMPLEX = false;
const PROXY_FLICKR = false;
const PROXY_FIVEHPX = false;
const PROXY_VSCO = false;
const PROXY_SEZNAM = false;
@@ -152,6 +162,7 @@ class config{
const PROXY_MWMBL = false;
const PROXY_FTM = false; // findthatmeme
const PROXY_IMGUR = false;
const PROXY_CARA = false;
const PROXY_YANDEX_W = false; // yandex web
const PROXY_YANDEX_I = false; // yandex images
const PROXY_YANDEX_V = false; // yandex videos
@@ -160,7 +171,7 @@ class config{
// Scraper-specific parameters
//
// GOOGLE CSE
// GOOGLE CSE & GOOGLE API
const GOOGLE_CX_ENDPOINT = "d4e68b99b876541f0";
// MARGINALIA

@@ -7,27 +7,35 @@ Then, install the following dependencies:
```sh
apt update
apt upgrade
apt install php-mbstring apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php
apt install php-mbstring apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-fcgid php-fpm
```
Enable the required modules:
```sh
a2dismod mpm_prefork
a2enmod mpm_event
a2enmod ssl
a2enmod rewrite
a2enmod proxy_fcgi setenvif actions alias
a2enmod http2
a2enmod headers
a2enmod proxy
```
And enable these optional ones, which might be useful to you later on. The `proxy` module is useful for setting up reverse proxies to services like gitea, and `headers` is useful to tweak global header values:
Tune the performance of php-fpm according to your server specs and number of users. Edit the file at `/etc/php/8.4/fpm/pool.d/www.conf` (adjust the version number to match your PHP install):
```sh
a2enmod proxy
a2enmod headers
pm = static
pm.max_children = 50
```
These values are what I currently use on 4get.ca, but for personal use, you can set `pm` to `ondemand` and `pm.max_children` to `20` (if you want those thumbnails to load fast!)
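If you are unsure what to pick for `pm.max_children`, a common rule of thumb is to divide the memory you can spare for PHP by the average memory used by one worker. The sketch below is only that rule of thumb; the function name and the numbers are hypothetical, and you would need to measure both inputs on your own server.

```php
<?php
// Rule-of-thumb estimate, not a measurement: how many workers fit in the
// memory you are willing to give PHP. Both inputs are values you measure yourself.
function estimate_max_children(int $memory_for_php_mb, int $avg_worker_mb): int {
    if($memory_for_php_mb <= 0 || $avg_worker_mb <= 0){
        throw new InvalidArgumentException("Both values must be positive");
    }
    // Always allow at least one worker
    return max(1, intdiv($memory_for_php_mb, $avg_worker_mb));
}

// Hypothetical numbers: 2048 MB reserved for PHP, about 40 MB per worker.
echo "pm.max_children = " . estimate_max_children(2048, 40) . "\n";
```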
Now, restart apache2:
```sh
service apache2 restart
```
Just for good measure, please check if your webserver is running. Access it through HTTP, not HTTPS. You should see the apache2 default landing page.
Just for good measure, please check if your webserver is running. Access it through HTTP, not HTTPS. You should see the apache2 default landing page. Note that HTTP/2 won't work yet, since SSL isn't set up at this point.
## 000-default.conf
Now, edit the following file: `/etc/apache2/sites-available/000-default.conf`, remove everything and carefully add each rule specified here, while making sure to replace my domains with your own:
@@ -74,12 +82,27 @@ Now, edit the following file: `/etc/apache2/sites-available/000-default.conf`, r
DocumentRoot /var/www/4get
<FilesMatch \.php$>
SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/"
</FilesMatch>
Options -MultiViews
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]
<Directory /var/www/4get>
Options -MultiViews
AllowOverride All
Require all granted
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]
</Directory>
# deny access to private resources
<Directory /var/www/4get/data/>
Order Deny,allow
@@ -116,6 +139,7 @@ Make sure to replace `4get.ca` with your own domain under the `SSLCertificate*`
ServerAdmin will@lolcat.ca
DocumentRoot /var/www/4get
Protocols h2 http/1.1
SSLEngine On
SSLOptions +StdEnvVars
@@ -128,6 +152,10 @@ Make sure to replace `4get.ca` with your own domain under the `SSLCertificate*`
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/css
<FilesMatch \.php$>
SetHandler "proxy:unix:/run/php/php8.1-fpm.sock|fcgi://localhost/"
</FilesMatch>
SSLCertificateFile /etc/letsencrypt/live/4get.ca/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/4get.ca/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/4get.ca/chain.pem

@@ -9,34 +9,47 @@ Welcome! This guide assumes that you have a working 4get instance. This will hel
4. The captcha font is located in `data/fonts/captcha.ttf`
# Cloudflare bypass (TLS check)
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed.**
>These instructions have been updated to work with Debian 13 Trixie.
Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** & the **Mwmbl** search engines. Please be aware that APT will fight against you and will re-install the openSSL-version of curl constantly when updating.
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed by this program!**
First, follow these instructions. Only install the Firefox modules:
Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** search engine.
https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build
Once you did this, you should be able to run the following inside your terminal:
To come up with this set of instructions, I used [this guide](https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build) as a reference, but trust me, you probably want to stick to what's written on this page.
First, compile curl-impersonate (the firefox flavor).
```sh
$ curl_ff117 --version
curl 8.1.1 (x86_64-pc-linux-gnu) libcurl/8.1.1 NSS/3.92 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 nghttp2/1.56.0
Release-Date: 2023-05-23
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe UnixSockets zstd
```
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine (on my debian install) is located at `/usr/local/lib/libcurl-impersonate-ff.so`.
Find the `libcurl.so.4` file used by your current installation of curl. For me, this file is located at `/usr/lib/x86_64-linux-gnu/libcurl.so.4`
Now comes the sketchy part: replace `libcurl.so.4` with `libcurl-impersonate-ff.so`. You can do this in the following way:
```sh
sudo rm /usr/lib/x86_64-linux-gnu/libcurl.so.4
sudo cp /usr/local/lib/libcurl-impersonate-ff.so /usr/lib/x86_64-linux-gnu/libcurl.so.4
git clone https://github.com/lwthiker/curl-impersonate/
cd curl-impersonate
sudo apt install build-essential pkg-config cmake ninja-build curl autoconf automake libtool python3-pip libnss3 libnss3-dev
mkdir build
cd build
../configure
make firefox-build
sudo make firefox-install
sudo ldconfig
```
Make sure to restart your webserver and/or PHP daemon, otherwise it will keep using the old library. You should now be able to bypass Cloudflare's shitty checks!!
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine is located at `/usr/local/lib/libcurl-impersonate-ff.so`. Patch your PHP install so that it loads the right library:
```sh
sudo systemctl edit php8.4-fpm.service
```
^This will open a text editor. Add the following shit in there, in between those 2 comments I pasted for ya just for reference:
```sh
### Editing /etc/systemd/system/php8.4-fpm.service.d/override.conf
### Anything between here and the comment below will become the contents of the>
[Service]
Environment="LD_PRELOAD=/usr/local/lib/libcurl-impersonate-ff.so"
Environment="CURL_IMPERSONATE=firefox117"
### Edits below this comment will be discarded
```
Restart php8.4-fpm (`sudo service php8.4-fpm restart`). To test things out, try making a search on "Yep", since they check for SSL. If you get results (or a timeout, this piece of shit engine is slow as fuck), that means it works!
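Another way to check the override, sketched here under the assumption that the Firefox flavor of curl-impersonate reports an NSS TLS backend (as in the `curl_ff117 --version` output from the older instructions), is to look at what PHP's `curl_version()` reports. The helper name `uses_nss_backend` is made up for this example. Serve the script through php-fpm rather than running it from the CLI, since the `LD_PRELOAD` override only applies to the php-fpm service.

```php
<?php
// Hypothetical helper: the Firefox build of curl-impersonate is expected to
// report NSS as its TLS backend, while stock Debian curl reports OpenSSL.
function uses_nss_backend(string $ssl_version): bool {
    return stripos($ssl_version, "NSS") !== false;
}

// Only runs when the curl extension is available.
if(function_exists("curl_version")){
    $ssl = curl_version()["ssl_version"];
    echo $ssl . " => " .
        (uses_nss_backend($ssl) ? "impersonate library loaded" : "stock libcurl") . "\n";
}
```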
# Robots.txt
Make sure you configure this right to optimize your search engine presence! Head over to `/robots.txt` and change the 4get.ca domain to your own domain.

@@ -15,7 +15,12 @@ class favicon{
header("Content-Type: image/png");
if(substr_count($url, "/") !== 2){
if(
preg_match(
'/^https?:\/\/[A-Za-z0-9.-]+$/',
$url
) === 0
){
header("X-Error: Only provide the protocol and domain");
$this->defaulticon();
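To see what the stricter check accepts, here is a small sketch that wraps the same pattern in a standalone function. The name `is_bare_origin` is hypothetical; only the regex comes from the hunk above.

```php
<?php
// Pattern copied from the new check in the hunk above: protocol plus a host
// made of letters, digits, dots and dashes, and nothing else.
function is_bare_origin(string $url): bool {
    return preg_match('/^https?:\/\/[A-Za-z0-9.-]+$/', $url) === 1;
}

var_dump(is_bare_origin("https://lolcat.ca"));         // accepted
var_dump(is_bare_origin("https://lolcat.ca/favicon")); // rejected: has a path
var_dump(is_bare_origin("https://lolcat.ca:8080"));    // rejected: colon is not in the class
var_dump(is_bare_origin("ftp://lolcat.ca"));           // rejected: wrong scheme
```

One detail to be aware of: without the `D` modifier, `$` in PCRE also matches just before a single trailing newline, so `"https://lolcat.ca\n"` passes this check. If that matters, a variant using `\z` instead of `$` would reject it.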

100
lib/anubis.php Normal file

@@ -0,0 +1,100 @@
<?php
//
// Reference
// https://github.com/TecharoHQ/anubis/blob/ecc716940e34ebe7249974f2789a99a2c7115e4e/web/js/proof-of-work.mjs
//
class anubis{
public function __construct(){
include_once "fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function scrape($html){
$this->fuckhtml->load($html);
$script =
$this->fuckhtml
->getElementById(
"anubis_challenge",
"script"
);
if($script === false){
throw new Exception("Failed to scrape anubis challenge data");
}
$script =
json_decode(
$this->fuckhtml
->getTextContent(
$script
),
true
);
if($script === null){
throw new Exception("Failed to decode anubis challenge data");
}
if(
!isset($script["challenge"]) ||
!isset($script["rules"]["difficulty"]) ||
!is_int($script["rules"]["difficulty"]) ||
!is_string($script["challenge"])
){
throw new Exception("Found invalid challenge data");
}
return $this->rape($script["challenge"], $script["rules"]["difficulty"]);
}
private function is_valid_hash($hash, $difficulty){
for ($i=0; $i<$difficulty; $i++) {
$index = (int)floor($i / 2);
$nibble = $i % 2;
$byte = ord($hash[$index]);
$nibble = ($byte >> ($nibble === 0 ? 4 : 0)) & 0x0f;
if($nibble !== 0){
return false;
}
}
return true;
}
public function rape($data, $difficulty = 5){
$nonce = 0;
while(true){
$hash_binary = hash("sha256", $data . $nonce, true);
if($this->is_valid_hash($hash_binary, $difficulty)){
$hash_hex = bin2hex($hash_binary);
return [
"response" => $hash_hex,
//"data" => $data,
//"difficulty" => $difficulty,
"nonce" => $nonce
];
}
$nonce++;
}
}
}
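The proof-of-work loop in this file can be tried in isolation. The sketch below is a standalone restatement of the same idea (SHA-256 over challenge plus nonce, accepted once the first N hex digits are zero); the function names and the challenge string are made up for the example, and the difficulty is kept low so it finishes almost instantly.

```php
<?php
// True when the first $difficulty hex digits (nibbles) of a raw binary hash are zero.
function has_leading_zero_nibbles(string $hash_binary, int $difficulty): bool {
    for($i = 0; $i < $difficulty; $i++){
        $byte = ord($hash_binary[intdiv($i, 2)]);
        // even index = high nibble of the byte, odd index = low nibble
        $nibble = ($i % 2 === 0) ? ($byte >> 4) : ($byte & 0x0f);
        if($nibble !== 0){
            return false;
        }
    }
    return true;
}

// Brute-force nonces until the hash satisfies the difficulty.
function solve(string $challenge, int $difficulty): array {
    for($nonce = 0; ; $nonce++){
        $hash = hash("sha256", $challenge . $nonce, true);
        if(has_leading_zero_nibbles($hash, $difficulty)){
            return ["response" => bin2hex($hash), "nonce" => $nonce];
        }
    }
}

// Hypothetical challenge string, difficulty 2 (about 256 attempts on average).
$solution = solve("example-challenge", 2);
echo $solution["nonce"] . " " . $solution["response"] . "\n";
```

Each extra nibble of difficulty multiplies the expected number of attempts by 16, so the loop above gets slow quickly at higher values.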

@@ -9,7 +9,7 @@ class backend{
/*
Proxy stuff
*/
public function get_ip(){
public function get_ip($proxy_index_raw = null){
$pool = constant("config::PROXY_" . strtoupper($this->scraper));
if($pool === false){
@@ -19,7 +19,10 @@ class backend{
}
// indent
if($proxy_index_raw === null){
$proxy_index_raw = apcu_inc("p." . $this->scraper);
}
$proxylist = file_get_contents("data/proxies/" . $pool . ".txt");
$proxylist = explode("\n", $proxylist);
@@ -32,6 +35,12 @@ class backend{
$proxylist = array_values($proxylist);
if(count($proxylist) === 0){
throw new Exception("A proxy list was specified but it's empty!");
}
//echo $proxylist[$proxy_index_raw % count($proxylist)];
return $proxylist[$proxy_index_raw % count($proxylist)];
}
@@ -88,6 +97,30 @@ class backend{
}
}
// API key rotation
public function get_key(){
$keys = file_get_contents("data/api_keys/" . $this->scraper . ".txt");
$keys = explode("\n", $keys);
$keys = array_filter($keys, function($entry){
$entry = ltrim($entry);
return strlen($entry) > 0 && substr($entry, 0, 1) != "#";
});
$keys = array_values($keys);
if(count($keys) === 0){
throw new Exception("Please specify API keys in data/api_keys/" . $this->scraper . ".txt");
}
$increment = apcu_inc("s." . $this->scraper) % count($keys);
return [
"key" => $keys[$increment],
"increment" => $increment
];
}
/*
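The key rotation added in `get_key()` boils down to two steps: parse the key file while skipping blank lines and `#` comments, then pick an entry with a counter modulo the number of keys. The sketch below separates those steps so they can be run without APCu or a key file on disk; `parse_keys` and `pick_key` are hypothetical names, and `$counter` stands in for the value `apcu_inc()` would return.

```php
<?php
// Parse a key file body: one key per line, blank lines and "#" comments skipped.
function parse_keys(string $contents): array {
    $keys = array_filter(
        explode("\n", $contents),
        function($entry){
            $entry = ltrim($entry);
            return strlen($entry) > 0 && substr($entry, 0, 1) != "#";
        }
    );
    // Re-index so the modulo lookup below hits existing offsets
    return array_values($keys);
}

// Round-robin pick. $counter stands in for the value apcu_inc() would return.
function pick_key(array $keys, int $counter): array {
    if(count($keys) === 0){
        throw new Exception("No API keys configured");
    }
    $index = $counter % count($keys);
    return ["key" => $keys[$index], "increment" => $index];
}

$keys = parse_keys("# comment\nkey-a\n\nkey-b\n");
for($counter = 1; $counter <= 3; $counter++){
    echo pick_key($keys, $counter)["key"] . "\n";
}
```

Note that, as in the hunk above, the filter only trims a local copy of each line, so a key file saved with Windows line endings or indented keys would keep that whitespace in the returned key.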

@@ -1,144 +0,0 @@
<?php
// https://www.bing.com/search?q=url%3Ahttps%3A%2F%2Flolcat.ca
// https://cc.bingj.com/cache.aspx?q=url%3ahttps%3a%2f%2flolcat.ca&d=4769685974291356&mkt=en-CA&setlang=en-US&w=tEsWuE7HW3Z5AIPQMVkDH4WaotS4LrK-
// <div class="b_attribution" u="0N|5119|4769685974291356|tEsWuE7HW3Z5AIPQMVkDH4WaotS4LrK-" tabindex="0">
new bingcache();
class bingcache{
public function __construct(){
if(
!isset($_GET["s"]) ||
$this->validate_url($_GET["s"]) === false
){
var_dump($this->validate_url($_GET["s"]));
$this->do404("Please provide a valid URL.");
}
$url = $_GET["s"];
$curlproc = curl_init();
curl_setopt(
$curlproc,
CURLOPT_URL,
"https://www.bing.com/search?q=url%3A" .
urlencode($url)
);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
$this->do404("Failed to connect to bing servers. Please try again later.");
}
curl_close($curlproc);
preg_match(
'/<div class="b_attribution" u="(.*)" tabindex="0">/',
$data,
$keys
);
print_r($keys);
if(count($keys) === 0){
$this->do404("Bing has not archived this URL.");
}
$keys = explode("|", $keys[1]);
$count = count($keys);
//header("Location: https://cc.bingj.com/cache.aspx?d=" . $keys[$count - 2] . "&w=" . $keys[$count - 1]);
echo("Location: https://cc.bingj.com/cache.aspx?d=" . $keys[$count - 2] . "&w=" . $keys[$count - 1]);
}
public function do404($text){
include "lib/frontend.php";
$frontend = new frontend();
echo
$frontend->load(
"error.html",
[
"title" => "Shit",
"text" => $text
]
);
die();
}
public function validate_url($url){
$url_parts = parse_url($url);
// check if required parts are there
if(
!isset($url_parts["scheme"]) ||
!(
$url_parts["scheme"] == "http" ||
$url_parts["scheme"] == "https"
) ||
!isset($url_parts["host"])
){
return false;
}
if(
// if its not an RFC-valid URL
!filter_var($url, FILTER_VALIDATE_URL)
){
return false;
}
$ip =
str_replace(
["[", "]"], // handle ipv6
"",
$url_parts["host"]
);
// if its not an IP
if(!filter_var($ip, FILTER_VALIDATE_IP)){
// resolve domain's IP
$ip = gethostbyname($url_parts["host"] . ".");
}
// check if its localhost
return filter_var(
$ip,
FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
);
}
}

@@ -403,27 +403,28 @@ class frontend{
$text =
trim(
preg_replace(
'/<\/span>$/',
"", // remove stray ending span because of the <?php stuff
'/<code [^>]+>/',
"",
str_replace(
[
'<br />',
'&nbsp;'
],
[
"\n", // replace <br> with newlines
" " // replace html entity to space
],
str_replace(
[
// leading <?php garbage
"<span style=\"color: c-default\">\n&lt;?php&nbsp;",
"<code>",
"<br />",
"&nbsp;",
"<pre>",
"</pre>",
"</code>"
],
[
"\n",
" ",
highlight_string("<?php " . $text, true)
)
"",
"",
""
],
explode(
"&lt;?php",
highlight_string("<?php " . $text, true),
2
)[1]
)
)
);
@@ -936,10 +937,14 @@ class frontend{
"display" => "Scraper",
"option" => [
"ddg" => "DuckDuckGo",
//"yahoo" => "Yahoo!",
"brave" => "Brave",
"mullvad_brave" => "Mullvad (Brave)",
"yandex" => "Yandex",
"google" => "Google",
"google_api" => "Google API",
"google_cse" => "Google CSE",
"mullvad_google" => "Mullvad (Google)",
"startpage" => "Startpage",
"qwant" => "Qwant",
"ghostery" => "Ghostery",
@@ -948,6 +953,8 @@ class frontend{
"crowdview" => "Crowdview",
"mwmbl" => "Mwmbl",
"mojeek" => "Mojeek",
"baidu" => "Baidu",
"coccoc" => "Cốc Cốc",
"solofield" => "Solofield",
"marginalia" => "Marginalia",
"wiby" => "wiby",
@@ -968,12 +975,16 @@ class frontend{
"startpage" => "Startpage",
"qwant" => "Qwant",
"yep" => "Yep",
"baidu" => "Baidu",
"solofield" => "Solofield",
"pinterest" => "Pinterest",
"cara" => "Cara",
"flickr" => "Flickr",
"fivehpx" => "500px",
"vsco" => "VSCO",
"imgur" => "Imgur",
"ftm" => "FindThatMeme"
"ftm" => "FindThatMeme",
//"sankakucomplex" => "SankakuComplex"
]
];
break;
@@ -983,6 +994,10 @@ class frontend{
"display" => "Scraper",
"option" => [
"yt" => "YouTube",
//"archiveorg" => "Archive.org",
"vimeo" => "Vimeo",
//"odysee" => "Odysee",
"sepiasearch" => "Sepia Search",
//"fb" => "Facebook videos",
"ddg" => "DuckDuckGo",
"brave" => "Brave",
@@ -990,6 +1005,8 @@ class frontend{
"google" => "Google",
"startpage" => "Startpage",
"qwant" => "Qwant",
"baidu" => "Baidu",
"coccoc" => "Cốc Cốc",
"solofield" => "Solofield"
]
];
@@ -1005,7 +1022,8 @@ class frontend{
"startpage" => "Startpage",
"qwant" => "Qwant",
"yep" => "Yep",
"mojeek" => "Mojeek"
"mojeek" => "Mojeek",
"baidu" => "Baidu"
]
];
break;
@@ -1330,6 +1348,7 @@ class frontend{
return htmlspecialchars($image);
}
//return "https://4get.ca/proxy?i=" . urlencode($image) . "&s=" . $format;
return "/proxy?i=" . urlencode($image) . "&s=" . $format;
}

@@ -240,12 +240,13 @@ class fuckhtml{
public function getElementsByFuzzyAttributeValue(string $name, string $value, $collection = null){
$elems = $this->getElementsByAttributeName($name, $collection);
$value =
explode(
" ",
trim(
preg_replace(
'/ +/',
'/\s+/',
" ",
$value
)
@@ -258,7 +259,18 @@ class fuckhtml{
foreach($elem["attributes"] as $attrib_name => $attrib_value){
$attrib_value = explode(" ", $attrib_value);
$attrib_value =
explode(
" ",
trim(
preg_replace(
'/\s+/',
" ",
$attrib_value
)
)
);
$ac = count($attrib_value);
$nc = count($value);
$cr = 0;
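The switch from `/ +/` to `/\s+/` (and applying it to the attribute value as well) matters when class lists contain tabs or newlines. A minimal sketch of the normalization step, using a hypothetical `split_tokens` helper:

```php
<?php
// Collapse any run of whitespace to one space, trim, then split into tokens.
function split_tokens(string $value): array {
    return explode(" ", trim(preg_replace('/\s+/', " ", $value)));
}

var_dump(split_tokens("  result\n\tcard   wide "));
```

With the old `/ +/` pattern, `"result\n\tcard"` would have stayed a single token, so a fuzzy match on `card` could miss it. An empty attribute value still yields one empty token here, since `explode(" ", "")` returns `[""]`.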
@@ -539,6 +551,36 @@ class fuckhtml{
switch($json[$i]){
case "\"":
case "'":
if(
$i !== 0 && // only check if a quote could be there
(
(
$json[$i - 1] === "\\" &&
(
$i === 2 ||
$json[$i - 2] === "\\"
)
) ||
$json[$i - 1] !== "\\"
)
){
// found a non-escaped quote
if($in_quote === null){
// open quote
$in_quote = $json[$i];
}elseif($in_quote === $json[$i]){
// close quote
$in_quote = null;
}
}
break;
case "[":
if($in_quote === null){
@@ -574,37 +616,20 @@ class fuckhtml{
$object_level--;
}
break;
case "\"":
case "'":
if(
$i !== 0 &&
$json[$i - 1] !== "\\"
){
// found a non-escaped quote
if($in_quote === null){
// open quote
$in_quote = $json[$i];
}elseif($in_quote === $json[$i]){
// close quote
$in_quote = null;
}
}
break;
}
if(
$start !== null &&
$array_level === 0 &&
$object_level === 0
$object_level === 0 &&
$start !== null
){
return substr($json, $start, $i - $start + 1);
break;
}
}
// fallback
return "[]";
}
}
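The quote handling changed above is the fiddly part of pulling a JSON blob out of surrounding JavaScript: a quote only counts as escaped when it is preceded by an odd number of backslashes. The sketch below is a simplified, hypothetical extractor (not the `fuckhtml` implementation) that shows the idea. It only starts tracking quotes once the first bracket is found and does not verify that bracket types match.

```php
<?php
// Hypothetical, simplified extractor: returns the first balanced {...} or [...]
// in $text, ignoring brackets that appear inside quoted strings.
function extract_balanced(string $text): string {
    $start = null;
    $depth = 0;
    $in_quote = null;
    $len = strlen($text);

    for($i = 0; $i < $len; $i++){
        $char = $text[$i];

        if($start === null){
            // Skip everything until the first opening bracket
            if($char === "{" || $char === "["){
                $start = $i;
                $depth = 1;
            }
            continue;
        }

        if($char === "\"" || $char === "'"){
            // Count the backslashes directly before the quote:
            // an even count means the quote itself is not escaped
            $backslashes = 0;
            for($k = $i - 1; $k >= 0 && $text[$k] === "\\"; $k--){
                $backslashes++;
            }
            if($backslashes % 2 === 0){
                if($in_quote === null){
                    $in_quote = $char; // open quote
                }elseif($in_quote === $char){
                    $in_quote = null;  // close quote
                }
            }
            continue;
        }

        if($in_quote !== null){
            continue; // brackets inside strings do not count
        }

        if($char === "{" || $char === "["){
            $depth++;
        }elseif($char === "}" || $char === "]"){
            $depth--;
            if($depth === 0){
                return substr($text, $start, $i - $start + 1);
            }
        }
    }

    return "[]"; // same fallback value as the hunk above
}

$input = 'window.data = {"a": "he said \"}\"", "b": [1, {"c": "]"}]}; init();';
var_dump(json_decode(extract_balanced($input), true));
```

The `"C:\\"` style case, where a string ends in an escaped backslash right before its closing quote, is the one a plain "previous character is a backslash" check gets wrong.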

@@ -34,22 +34,46 @@ try{
)
){
if(
!isset($image["query"]) ||
!isset($image["path"]) ||
$image["path"] != "/th"
){
if(!isset($image["path"])){
header("X-Error: Invalid bing image path");
header("X-Error: Missing bing image path");
$proxy->do404();
die();
}
//
// get image ID
// formations:
// https://tse2.mm.bing.net/th/id/OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
// https://tse2.mm.bing.net/th?id=OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3
//
$id = null;
if(isset($image["query"])){
parse_str($image["query"], $str);
if(!isset($str["id"])){
if(isset($str["id"])){
header("X-Error: Missing bing ID");
$id = $str["id"];
}
}
if($id === null){
$id = explode("/th/id/", $image["path"], 2);
if(count($id) !== 2){
// malformed
return $url;
}
$id = $id[1];
}
if(is_array($id)){
header("X-Error: Missing bing id parameter");
$proxy->do404();
die();
}
@@ -63,7 +87,7 @@ try{
case "cover": $req = "&w=207&h=270&p=0&qlt=90"; break;
}
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . urlencode($str["id"]) . $req, "https://www.bing.com");
$proxy->stream_linear_image("https://" . $image["host"] . "/th?id=" . rawurlencode($id) . $req, "https://www.bing.com");
die();
}
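The two URL shapes listed in the comment above (`/th?id=...` and `/th/id/...`) can be handled by one small lookup. This is a sketch with a hypothetical helper name, `bing_image_id`, rather than the code from the proxy script; it returns `null` where the original sends an error header.

```php
<?php
// Hypothetical helper: returns the image ID from either URL form, or null.
function bing_image_id(string $url): ?string {
    $parts = parse_url($url);
    if($parts === false || !isset($parts["path"])){
        return null;
    }

    // Form 1: https://host/th?id=OIP....
    if(isset($parts["query"])){
        parse_str($parts["query"], $query);
        // id[]=x would produce an array, so only accept plain strings
        if(isset($query["id"]) && is_string($query["id"])){
            return $query["id"];
        }
    }

    // Form 2: https://host/th/id/OIP....
    $split = explode("/th/id/", $parts["path"], 2);
    if(count($split) === 2 && $split[1] !== ""){
        return $split[1];
    }

    return null;
}

var_dump(bing_image_id("https://tse2.mm.bing.net/th/id/OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3"));
var_dump(bing_image_id("https://tse2.mm.bing.net/th?id=OIP.3yLBkUPn8EXA1wlhWP2BHwHaE3"));
```

Either way, the ID is then placed back into a `/th?id=` URL with `rawurlencode()`, as the last changed line of the hunk does.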

2229
scraper/baidu.php Normal file

File diff suppressed because it is too large

@@ -352,7 +352,6 @@ class brave{
$html = fread($handle, filesize("scraper/brave.html"));
fclose($handle);*/
try{
$html =
$this->get(
@@ -1290,13 +1289,13 @@ class brave{
"source" => [
[
"url" => $result["properties"]["url"],
"width" => null,
"height" => null
"width" => (int)$result["properties"]["width"],
"height" => (int)$result["properties"]["height"]
],
[
"url" => $result["thumbnail"]["src"],
"width" => null,
"height" => null
"width" => (int)$result["thumbnail"]["width"],
"height" => (int)$result["thumbnail"]["height"]
]
],
"url" => $result["url"]

847
scraper/cara.php Normal file

@@ -0,0 +1,847 @@
<?php
class cara{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("cara");
}
public function getfilters($page){
return [
"sort" => [
"display" => "Sort by",
"option" => [
"Top" => "Top",
"MostRecent" => "Most Recent"
]
],
"type" => [
"display" => "Post type",
"option" => [
"any" => "Any type",
"portfolio" => "Portfolio", // {"posts":["portfolio"]}
"timeline" => "Timeline" // {"posts":["timeline"]}
]
],
"fields" => [
"display" => "Field/Medium",
"option" => [
"any" => "Any field",
"2D" => "2D Work",
"3D" => "3D Work",
"3DPrinting" => "3D Printing",
"Acrylic" => "Acrylic",
"AlcoholMarkers" => "Alcohol Markers",
"Animation" => "Animation",
"Chalk" => "Chalk",
"Charcoal" => "Charcoal",
"Colored pencil" => "Colored pencil",
"Conte" => "Conte",
"Crayon" => "Crayon",
"Digital" => "Digital",
"Gouache" => "Gouache",
"Ink" => "Ink",
"MixedMedia" => "Mixed-Media",
"Oil" => "Oil",
"Oil-based Markers" => "Oil-based Markers",
"Other" => "Other",
"Pastels" => "Pastels",
"Photography" => "Photography",
"Sculpture" => "Sculpture",
"Sketches" => "Sketches",
"Tattoos" => "Tattoos",
"Traditional" => "Traditional",
"VFX" => "VFX",
"Watercolor" => "Watercolor"
]
],
"category" => [
"display" => "Category",
"option" => [
"any" => "Any category",
"3DScanning" => "3D Scanning",
"Abstract" => "Abstract",
"Adoptable" => "Adoptable",
"Anatomy" => "Anatomy",
"Animals" => "Animals",
"Anime" => "Anime",
"App" => "App",
"ArchitecturalConcepts" => "Architectural Concepts",
"ArchitecturalVisualization" => "Architectural Visualization",
"AugmentedReality" => "Augmented Reality",
"Automotive" => "Automotive",
"BoardGameArt" => "Board Game Art",
"BookIllustration" => "Book Illustration",
"CardGameArt" => "Card Game Art",
"CeramicsPottery" => "Ceramics/Pottery",
"CharacterAnimation" => "Character Animation",
"CharacterDesign" => "Character Design",
"CharacterModeling" => "Character Modeling",
"ChildrensArt" => "Children's Illustration",
"Collectibles" => "Collectibles",
"ColoringPage" => "Coloring Page",
"ComicArt" => "Comic Art",
"ConceptArt" => "Concept Art",
"Cosplay" => "Cosplay",
"CostumeDesign" => "Costume Design",
"CoverArt" => "Cover Art",
"Creatures" => "Creatures",
"Diorama" => "Diorama",
"EditorialIllustration" => "Editorial Illustration",
"EmbroiderySewing" => "Embroidery/Sewing",
"EnvironmentalConceptArt" => "Environmental Concept Art",
"EnvironmentalConceptDesign" => "Environmental Concept Design",
"FanArt" => "Fan Art",
"Fantasy" => "Fantasy",
"Fashion" => "Fashion",
"FashionStyling" => "Fashion Styling",
"FiberArts" => "Fiber Arts",
"Furry" => "Furry",
"GameArt" => "Game Art",
"GameplayDesign" => "Gameplay Design",
"GamesEnvironmentArt" => "Games Environment Art",
"Gem" => "Gem",
"GraphicDesign" => "Graphic Design",
"Handicraft" => "Handicraft",
"HairStyling" => "Hair Styling",
"HardSurface" => "Hard Surface",
"Horror" => "Horror",
"Illustration" => "Illustration",
"IllustrationVisualization" => "Illustration Visualization",
"IndustrialDesign" => "Industrial Design",
"Jewelry" => "Jewelry",
"KnittingCrochet" => "Knitting/Crochet",
"Landscape" => "Landscape",
"LevelDesign" => "Level Design",
"Lighting" => "Lighting",
"Makeup" => "Makeup",
"Manga" => "Manga",
"MapsCartography" => "Maps/Cartography",
"MattePainting" => "Matte Painting",
"Materials" => "Materials",
"MechanicalDesign" => "Mechanical Design",
"Medical" => "Medical",
"Mecha" => "Mecha",
"MiniatureArt" => "Miniature Art",
"MotionGraphics" => "Motion Graphics",
"FrescoMurals" => "Fresco/Murals",
"Natural" => "Natural",
"Original Character" => "Original Character",
"Overlay" => "Overlay",
"PleinAir" => "Plein Air",
"Photogrammetry" => "Photogrammetry",
"PixelArt" => "Pixel Art",
"Portraits" => "Portraits",
"Props" => "Props",
"ProductDesign" => "Product Design",
"PublicDomain" => "Public Domain or Royalty Free",
"Real-Time3DEnvironmentArt" => "Real-Time 3D Environment Art",
"Realism" => "Realism",
"ScienceFiction" => "Science Fiction",
"ScientificVisualization" => "Scientific Visualization",
"Scripts" => "Scripts",
"StillLife" => "Still Life",
"Storyboards" => "Storyboards",
"Stylized" => "Stylized",
"Surreal" => "Surreal",
"TechnicalArt" => "Technical Art",
"Textures" => "Textures",
"Tools" => "Tools",
"Toys" => "Toys",
"ToyPackaging" => "Toy Packaging",
"Tutorials" => "Tutorials",
"UIArt" => "User Interface (UI) Art",
"UrbanSketch" => "Urban Sketch",
"VFXforAnimation" => "VFX for Animation",
"VFXforFilm" => "VFX for Film",
"VFXforGames" => "VFX for Games",
"VFXforRealTime" => "VFX for Real-Time",
"VFXforTV" => "VFX for TV",
"Vehicles" => "Vehicles",
"VirtualReality" => "Virtual Reality",
"VisualDevelopment" => "Visual Development",
"VoxelArt" => "Voxel Art",
"Vtubers" => "Vtubers",
"WIP" => "WIP (Work in Progress)",
"Web" => "Web",
"Weapons" => "Weapons",
"Wildlife" => "Wildlife",
"Woodcutting" => "Woodcutting"
]
],
"software" => [
"display" => "Software",
"option" => [
"any" => "Any software",
"123D" => "123D",
"123DCatch" => "123D Catch",
"3DBee" => "3DBee",
"3DCoat" => "3DCoat",
"3DCoatPrint" => "3DCoatPrint",
"3DCoatTextura" => "3DCoatTextura",
"3DEqualizer" => "3DEqualizer",
"3DFZephyr" => "3DF Zephyr",
"3Delight" => "3Delight",
"3dpeople" => "3dpeople",
"3dsMax" => "3ds Max",
"3DSPaint" => "3DS Paint",
"ACDSeeCanvas" => "ACDSee Canvas",
"AbletonLive" => "Ableton Live",
"Acrobat" => "Acrobat",
"AdobeDraw" => "Adobe Draw",
"AdobeFlash" => "Adobe Flash",
"AdobeFresco" => "Adobe Fresco",
"AdobeSubstance3Dassets" => "Adobe Substance 3D assets",
"AdobeXD" => "Adobe XD",
"AffinityDesigner" => "Affinity Designer",
"AffinityPhoto" => "Affinity Photo",
"AfterEffects" => "After Effects",
"Akeytsu" => "Akeytsu",
"Alchemy" => "Alchemy",
"AliasDesign" => "Alias Design",
"AlightMotion" => "Alight Motion",
"Amadine" => "Amadine",
"Amberlight" => "Amberlight",
"Animate" => "Animate",
"AnimationMaster" => "Animation:Master",
"AnimeStudio" => "Anime Studio",
"Apophysis" => "Apophysis",
"ArchiCAD" => "ArchiCAD",
"Arion" => "Arion",
"ArionFX" => "ArionFX",
"Arnold" => "Arnold",
"ArtEngine" => "ArtEngine",
"ArtFlow" => "ArtFlow",
"ArtRage" => "ArtRage",
"ArtstudioPro" => "Artstudio Pro",
"Artweaver" => "Artweaver",
"Aseprite" => "Aseprite",
"Audition" => "Audition",
"AutoCAD" => "AutoCAD",
"AutodeskSketchBook" => "Autodesk SketchBook",
"AvidMediaComposer" => "Avid Media Composer",
"AzPainter" => "AzPainter",
"babylonjs" => "babylon.js",
"BalsamiqMockup" => "Balsamiq Mockup",
"Bforartists" => "Bforartists",
"BlackInk" => "Black Ink",
"BlackmagicDesignFusion" => "Blackmagic Design Fusion",
"Blender" => "Blender",
"Blender DeepPaint" => "Blender DeepPaint",
"BlenderGreasePencil" => "Blender Grease Pencil",
"Blockbench" => "Blockbench",
"BodyPaint" => "BodyPaint",
"Boxcutter" => "Boxcutter",
"BraidMaker" => "Braid Maker",
"BrickLinkStudio" => "BrickLink Studio",
"Bridge" => "Bridge",
"Brushifyio" => "Brushify.io",
"C" => "C",
"C#" => "C#",
"C++" => "C++",
"CACANi" => "CACANi",
"CLIPSTUDIOPAINT" => "CLIP STUDIO PAINT",
"CLO" => "CLO",
"CRYENGINE" => "CRYENGINE",
"Callipeg" => "Callipeg",
"Canva" => "Canva",
"CaptureOne" => "Capture One",
"CartoonAnimator" => "Cartoon Animator",
"Carveco" => "Carveco",
"Cavalry" => "Cavalry",
"Chaotica" => "Chaotica",
"CharacterAnimator" => "Character Animator",
"CharacterCreator" => "Character Creator",
"Cinema4D" => "Cinema 4D",
"ClarisseiFX" => "Clarisse iFX",
"Coiffure" => "Coiffure",
"ColorsLive" => "Colors Live",
"Combustion" => "Combustion",
"Construct2" => "Construct 2",
"Core" => "Core",
"CorelPainter" => "Corel Painter",
"CorelDRAWGraphicsSuite" => "CorelDRAW Graphics Suite",
"CoronaRenderer" => "Corona Renderer",
"ProMotionNG" => "Cosmigo Pro Motion NG",
"CrazyBump" => "CrazyBump",
"Crocotile3D" => "Crocotile 3D",
"Curvy3D" => "Curvy 3D",
"Cycles4D" => "Cycles 4D",
"Darkroom" => "Darkroom",
"DAZStudio" => "DAZ Studio",
"DDO" => "DDO",
"DECIMA" => "DECIMA",
"Darktable" => "Darktable",
"DaVinciResolve" => "DaVinci Resolve",
"Dimension" => "Dimension",
"DragonBones" => "DragonBones",
"Dragonframe" => "Dragonframe",
"Drawpile" => "Drawpile",
"Dreams" => "Dreams",
"Dreamweaver" => "Dreamweaver",
"DxOPhotoLab" => "DxO PhotoLab",
"ECycles" => "E-Cycles",
"EmberGen" => "EmberGen",
"Encore" => "Encore",
"Expresii" => "Expresii",
"FStorm" => "FStorm",
"FadeIn" => "FadeIn",
"Feather3D" => "Feather 3D",
"FiberShop" => "FiberShop",
"Figma" => "Figma",
"FilmoraWondershare" => "Filmora Wondershare",
"FilterForge" => "Filter Forge",
"FinalCutPro" => "Final Cut Pro",
"FinalDraft" => "Final Draft",
"finalRender" => "finalRender",
"FireAlpaca" => "FireAlpaca",
"Fireworks" => "Fireworks",
"FlamePainter" => "Flame Painter",
"Flash" => "Flash",
"FlipaClip" => "FlipaClip",
"FlipnoteStudio" => "Flipnote Studio",
"Fluent" => "Fluent",
"ForestPack" => "Forest Pack",
"FormZ" => "Form-Z",
"Fractorium" => "Fractorium",
"FreeCAD" => "FreeCAD",
"FreeHand" => "FreeHand",
"Forger" => "Forger",
"FrostbiteEngine" => "Frostbite Engine",
"fSpy" => "fSpy",
"FumeFX" => "FumeFX",
"Fusion360" => "Fusion 360",
"GIMP" => "GIMP",
"GSCurveTools" => "GS CurveTools",
"GSToolbox" => "GS Toolbox",
"Gaea" => "Gaea",
"GameTextures" => "Game Textures",
"GameMakerStudio" => "GameMaker: Studio",
"GarageFarmNET" => "GarageFarm.NET",
"GeoGlyph" => "GeoGlyph",
"GigapixelAl" => "Gigapixel Al",
"Glaxnimate" => "Glaxnimate",
"GnomePaint" => "Gnome Paint",
"Godot" => "Godot",
"Goxel" => "Goxel",
"Graphite" => "Graphite",
"Graswald" => "Graswald",
"GravitySketch" => "Gravity Sketch",
"GuerillaRender" => "GuerillaRender",
"HDRLightStudio" => "HDR Light Studio",
"HairStrandDesigner" => "Hair Strand Designer",
"HairTGHairFur" => "HairTG - Hair &amp; Fur",
"HairTGSurfaceFeatherEdition" => "HairTG - Surface, Feather Edition",
"HairTGSurfaceHairEdition" => "HairTG - Surface, Hair Edition",
"Handplane" => "Handplane",
"Hansoft" => "Hansoft",
"HardOps" => "Hard Ops",
"HardMesh" => "HardMesh",
"Harmony" => "Harmony",
"HeavypaintWebbypaint" => "Heavypaint/Webbypaint",
"HelloPaint" => "HelloPaint",
"HeliconFocus" => "Helicon Focus",
"Hexels" => "Hexels",
"HiPaint" => "HiPaint",
"Houdini" => "Houdini",
"HydraRenderer" => "Hydra Renderer",
"iArtbook" => "iArtbook",
"IbisPaint" => "ibisPaint",
"Ideas" => "Ideas",
"IllustStudio" => "Illust Studio",
"Illustrator" => "Illustrator",
"IllustratorDraw" => "Illustrator Draw",
"InDesign" => "InDesign",
"Inochi2D" => "Inochi2D",
"InVision" => "InVision",
"InVisionCraft" => "InVision Craft",
"InfinitePainter" => "Infinite Painter",
"Inkscape" => "Inkscape",
"Inspirit" => "Inspirit",
"InstaLOD" => "InstaLOD",
"InstaMAT" => "InstaMAT",
"InstantLightRealtimePBR" => "Instant Light Realtime PBR",
"InstantMeshes" => "Instant Meshes",
"InstantTerra" => "Instant Terra",
"Inventor" => "Inventor",
"Iray" => "Iray",
"JWildfire" => "JWildfire",
"Java" => "Java",
"Jira" => "Jira",
"JumpPaint" => "Jump Paint by MediBang",
"JSPaint" => "JS Paint",
"Katana" => "Katana",
"Keyshot" => "Keyshot",
"KidPix" => "Kid Pix",
"KitBash3D" => "KitBash3D",
"Knald" => "Knald",
"Kodon" => "Kodon",
"KolourPaint" => "KolourPaint",
"Krakatoa" => "Krakatoa",
"KRESKA" => "KRESKA",
"Krita" => "Krita",
"LensStudio" => "Lens Studio",
"LibreSprite" => "LibreSprite",
"LightWave3D" => "LightWave 3D",
"Lightroom" => "Lightroom",
"Linearity" => "Linearity",
"LiquiGen" => "LiquiGen",
"Live2DCubism" => "Live2D Cubism",
"LookatmyHair" => "Look at my Hair",
"Lotpixel" => "Lotpixel",
"Lumion" => "Lumion",
"LuxRender" => "LuxRender",
"MacPaint" => "MacPaint",
"MagicaCSG" => "MagicaCSG",
"MagicaVoxel" => "MagicaVoxel",
"Magma" => "Magma",
"MakeHuman" => "MakeHuman",
"Malmal" => "Malmal",
"Mandelbulb3D" => "Mandelbulb 3D",
"Mandelbulber" => "Mandelbulber",
"MangaStudio" => "Manga Studio",
"Mari" => "Mari",
"MarmosetToolbag" => "Marmoset Toolbag",
"MarvelousDesigner" => "Marvelous Designer",
"MasterpieceStudioPro" => "Masterpiece Studio Pro",
"MasterpieceVR" => "MasterpieceVR",
"Maverick" => "Maverick",
"MaxwellRender" => "Maxwell Render",
"Maya" => "Maya",
"MediBangPaint" => "MediBang Paint",
"MediumbyAdobe" => "Medium by Adobe",
"Megascans" => "Megascans",
"mentalray" => "mental ray",
"MeshLab" => "MeshLab",
"Meshroom" => "Meshroom",
"MetaHumanCreator" => "MetaHuman Creator",
"Metashape" => "Metashape",
"MightyBake" => "MightyBake",
"MikuMikuDance" => "MikuMikuDance",
"Minecraft" => "Minecraft",
"Mischief" => "Mischief",
"Mixamo" => "Mixamo",
"Mixer" => "Mixer",
"MoI3D" => "MoI3D",
"Mocha" => "Mocha",
"Modo" => "Modo",
"Moho" => "Moho",
"MotionBuilder" => "MotionBuilder",
"Mudbox" => "Mudbox",
"Muse" => "Muse",
"MSPaint" => "MS Paint",
"MyPaint" => "MyPaint",
"NDO" => "NDO",
"NX" => "NX",
"NdotCAD" => "NdotCAD",
"NintendoNotes" => "Nintendo Notes",
"NomadSculpt" => "Nomad Sculpt",
"Notability" => "Notability",
"Nuke" => "Nuke",
"Nvil" => "Nvil",
"OctaneRender" => "Octane Render",
"Omniverse" => "Omniverse",
"OmniverseCreate" => "Omniverse Create",
"ON1PhotoRAW" => "ON1 Photo RAW",
"Open3DEngine" => "Open 3D Engine",
"OpenCanvas" => "OpenCanvas",
"OpenGL" => "OpenGL",
"OpenToonz" => "OpenToonz",
"Ornatrix" => "Ornatrix",
"OsciRender" => "Osci-Render",
"OurPaint" => "Our Paint",
"PBRMAX" => "PBRMAX",
"PFTrack" => "PFTrack",
"PTGui" => "PTGui",
"Paintbrush" => "Paintbrush",
"PaintNET" => "Paint.NET",
"PaintShopPro" => "PaintShop Pro",
"PaintToolSAI" => "Paint Tool SAI",
"PaintstormStudio" => "Paintstorm Studio",
"Paper" => "Paper",
"Pencil2D" => "Pencil2D",
"Penpot" => "Penpot",
"PhoenixFD" => "Phoenix FD",
"Phonto" => "Phonto",
"PhotoLab2" => "PhotoLab 2",
"Photopea" => "Photopea",
"Photoscan" => "Photoscan",
"Photoshop" => "Photoshop",
"PhotoshopElements" => "Photoshop Elements",
"PicoCAD" => "picoCAD",
"PicoCAD2" => "picoCAD 2",
"Pinta" => "Pinta",
"Piskel" => "Piskel",
"Pixilart" => "Pixilart",
"Pixelitor" => "Pixelitor",
"Pixelmator" => "Pixelmator",
"Pixelorama" => "Pixelorama",
"PixivSketch" => "pixiv Sketch",
"Pixquare" => "Pixquare",
"PlantCatalog" => "PlantCatalog",
"PlantFactory" => "PlantFactory",
"Plasticity" => "Plasticity",
"PNGtuberPlus" => "PNGtuber Plus",
"Poliigon" => "Poliigon",
"Polybrush" => "Polybrush",
"PopcornFx" => "PopcornFx",
"Poser" => "Poser",
"Premiere" => "Premiere",
"PremiereElements" => "Premiere Elements",
"PresagisCreator" => "Presagis Creator",
"ProTools" => "Pro Tools",
"Procreate" => "Procreate",
"ProcreateDreams" => "Procreate Dreams",
"Producer" => "Producer",
"PrometheanAI" => "Promethean AI",
"PureRef" => "PureRef",
"Python" => "Python",
"PyxelEdit" => "PyxelEdit",
"QuadRemesher" => "Quad Remesher",
"QuarkXPress" => "QuarkXPress",
"Qubicle" => "Qubicle",
"Quill" => "Quill",
"QuixelBridge" => "Quixel Bridge",
"QuixelMegascans" => "Quixel Megascans",
"QuixelMixer" => "Quixel Mixer",
"QuixelSuite" => "Quixel Suite",
"R3DSWrap" => "R3DS Wrap",
"R3DSZWRAP" => "R3DS ZWRAP",
"RDTextures" => "RD-Textures",
"RailClone" => "RailClone",
"RealFlow" => "RealFlow",
"RealisticPaintStudio" => "Realistic Paint Studio",
"RealityCapture" => "RealityCapture",
"RealityScan" => "RealityScan",
"RealtimeBoard" => "Realtime Board",
"Rebelle" => "Rebelle",
"Redshift" => "Redshift",
"RenderMan" => "RenderMan",
"RenderNetwork" => "Render Network",
"Revit" => "Revit",
"Rhino" => "Rhino",
"Rhinoceros" => "Rhinoceros",
"RizomUV" => "RizomUV",
"RoughAnimator" => "Rough Animator",
"SamsungNotes" => "Samsung Notes",
"SamsungPENUP" => "Samsung PENUP",
"ScansLibrary" => "ScansLibrary",
"Scrivener" => "Scrivener",
"Sculpt+" => "Sculpt+",
"Sculptris" => "Sculptris",
"ShaveandaHaircut" => "Shave and a Haircut",
"ShiVa3D" => "ShiVa3D",
"Shotgun" => "Shotgun",
"Silo" => "Silo",
"Silugen" => "Silugen",
"Sketch" => "Sketch",
"SketchApp" => "Sketch App",
"SketchBookPro" => "SketchBook Pro",
"SketchClub" => "SketchClub",
"SketchUp" => "SketchUp",
"Sketchable" => "Sketchable",
"Sketchfab" => "Sketchfab",
"Skyshop" => "Skyshop",
"Snapseed" => "Snapseed",
"Snowdrop" => "Snowdrop",
"Softimage" => "Softimage",
"SolidWorks" => "SolidWorks",
"SonySketch" => "Sony Sketch",
"Soundbooth" => "Soundbooth",
"Source2" => "Source 2",
"SourceControl" => "Source Control",
"SourceFilmmaker" => "Source Filmmaker",
"SpeedTree" => "SpeedTree",
"Speedgrade" => "Speedgrade",
"SpeedyPainter" => "SpeedyPainter",
"Spine2D" => "Spine 2D",
"Spriter" => "Spriter",
"Stingray" => "Stingray",
"Storyboarder" => "Storyboarder",
"StoryboardPro" => "Storyboard Pro",
"SublimeText" => "Sublime Text",
"Substance3DDesigner" => "Substance 3D Designer",
"Substance3DModeler" => "Substance 3D Modeler",
"Substance3DPainter" => "Substance 3D Painter",
"Substance3DSampler" => "Substance 3D Sampler",
"Substance3DStager" => "Substance 3D Stager",
"SubstanceB2M" => "Substance B2M",
"SweetHome3D" => "Sweet Home 3D",
"SynthEyes" => "SynthEyes",
"TTools" => "TTools",
"TVPaint" => "TVPaint",
"TVPaintAnimation" => "TVPaint Animation",
"TayasuiSketches" => "Tayasui Sketches",
"TayasuiSketchesMobileApp" => "Tayasui Sketches Mobile App",
"TayasuiSketchesPro" => "Tayasui Sketches Pro",
"Terragen" => "Terragen",
"Texturescom" => "Textures.com",
"Texturingxyz" => "Texturingxyz",
"TeyaConceptor" => "Teya Conceptor",
"TheGrove3D" => "The Grove 3D",
"TheaRender" => "Thea Render",
"Threejs" => "Three.js",
"Tiled" => "Tiled",
"TiltBrush" => "Tilt Brush",
"Tooll3" => "Tooll3",
"ToonBoomHarmony" => "Toon Boom Harmony",
"ToonBoomStudio" => "Toon Boom Studio",
"ToonSquid" => "ToonSquid",
"TopoGun" => "TopoGun",
"TuxPaint" => "Tux Paint",
"Tvori" => "Tvori",
"Twinmotion" => "Twinmotion",
"UNIGINEEngine" => "UNIGINE Engine",
"UVLayout" => "UVLayout",
"UltraFractal" => "Ultra Fractal",
"uMake" => "uMake",
"Unfold3D" => "Unfold 3D",
"Unity" => "Unity",
"UnrealEngine" => "Unreal Engine",
"Vengi" => "vengi",
"VRay" => "V-Ray",
"VRED" => "VRED",
"VTubeStudio" => "VTube Studio",
"Vectary" => "Vectary",
"VectorayGen" => "VectorayGen",
"Vectorworks" => "Vectorworks",
"VegasPro" => "Vegas Pro",
"VisualDesigner3D" => "Visual Designer 3D",
"VisualStudio" => "Visual Studio",
"VRoidStudio" => "VRoid Studio",
"Vue" => "Vue",
"Vuforia" => "Vuforia",
"WebGL" => "WebGL",
"WhiteboardFox" => "Whiteboard Fox",
"WickEditor" => "Wick Editor",
"Wings3D" => "Wings 3D",
"Word" => "Word",
"WorldCreator" => "World Creator",
"WorldMachine" => "World Machine",
"XParticles" => "X-Particles",
"Xfrog" => "Xfrog",
"Xgen" => "Xgen",
"xNormal" => "xNormal",
"xTex" => "xTex",
"XoliulShader" => "Xoliul Shader",
"Yafaray" => "Yafaray",
"Yeti" => "Yeti",
"ZBrush" => "ZBrush",
"ZBrushCore" => "ZBrushCore",
"ZenBrush" => "Zen Brush"
]
]
];
}
private function get($proxy, $url, $get = [], $search = ""){ // required parameter after an optional one is deprecated in PHP 8
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/plain, */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
//"sentry-trace: 72b0318a7141fe18cbacbd905572eddf-a60de161b66b1e6f-1
//"baggage: sentry-environment=vercel-production,sentry-release=251ff5179b4de94974f36d9b8659a487bbb8a819,sentry-public_key=2b87af2b44c84643a011838ad097735f,sentry-trace_id=72b0318a7141fe18cbacbd905572eddf,sentry-transaction=GET%20%2Fsearch,sentry-sampled=true,sentry-sample_rand=0.09967130764937493,sentry-sample_rate=0.5",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
//"Referer: https://cara.app/search?q=jak+and+daxter&type=&sortBy=Top&filters=%7B%7D",
"Referer: https://cara.app/search?q=" . urlencode($search),
//"Cookie: __Host-next-auth.csrf-token=b752c4296375bccb7b480ff010e1e916c65c35c311a4a57ac6cd871468730578%7C4d3783cfb72a98f390e534abd149806432b6cf8d50555a52d00e99216a516911; __Secure-next-auth.callback-url=https%3A%2F%2Fcara.app; crumb=BV0HDt87G5+fOWE0ZDQ5MWM0ZTQ3YTZmMzM4MGU5MGNjNDNmMzY2",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$npt, $proxy] =
$this->backend->get(
$get["npt"],
"images"
);
$npt = json_decode($npt, true);
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
$npt = [
"q" => $get["s"],
"sortBy" => $get["sort"],
"take" => 24,
"skip" => 0,
"filters" => []
];
// parse filters
if($get["type"] != "any"){
$npt["filters"]["posts"] = [$get["type"]];
}
if($get["fields"] != "any"){
$npt["filters"]["fields"] = [$get["fields"]];
}
if($get["category"] != "any"){
$npt["filters"]["categories"] = [$get["category"]];
}
if($get["software"] != "any"){
$npt["filters"]["softwares"] = [$get["software"]];
}
if($npt["filters"] == []){
$npt["filters"] = "{}";
}else{
$npt["filters"] = json_encode($npt["filters"]);
}
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
// https://cara.app/api/search/portfolio-posts?q=jak+and+daxter&sortBy=Top&take=24&skip=0&filters=%7B%7D
try{
$json =
$this->get(
$proxy,
"https://cara.app/api/search/posts",
$npt,
$npt["q"]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$imagecount = 0;
foreach($json as $image){
if(count($image["images"]) === 0){
// sometimes the api returns no images for an object
$imagecount++;
continue;
}
$cover = null;
$sources = [];
foreach($image["images"] as $source){
if($source["isCoverImg"]){
$cover = [
"url" => "https://images.cara.app/" . $this->fix_url($source["src"]),
"width" => 500,
"height" => 500
];
}else{
$sources[] = [
"url" => "https://images.cara.app/" . $this->fix_url($source["src"]),
"width" => null,
"height" => null
];
}
}
if($cover !== null){
$sources[] = $cover;
}
$out["image"][] = [
"title" => str_replace("\n", " ", $image["content"]),
"source" => $sources,
"url" => "https://cara.app/post/" . $image["id"]
];
$imagecount++;
}
if($imagecount === 24){
$npt["skip"] += 24;
$out["npt"] =
$this->backend->store(
json_encode($npt),
"images",
$proxy
);
}
return $out;
}
private function fix_url($url){
return
str_replace(
[" "],
["%20"],
$url
);
}
}
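
Review note: `cara::image()` builds the query for `https://cara.app/api/search/posts` by collecting the non-"any" filters and JSON-encoding them, falling back to the literal string `{}` because an empty PHP array would encode as `[]`. The sketch below restates that step in isolation. The function name `build_cara_query`, the `$skip` parameter, and the `"Top"` default for `sortBy` are assumptions made for illustration, not code from this commit.

```php
<?php
// Hypothetical helper mirroring the filter handling in cara::image().
function build_cara_query(array $get, int $skip = 0): array {
    // scraper parameter => key expected inside the "filters" JSON
    $map = [
        "type" => "posts",
        "fields" => "fields",
        "category" => "categories",
        "software" => "softwares"
    ];

    $filters = [];
    foreach($map as $param => $key){
        if(isset($get[$param]) && $get[$param] !== "any"){
            $filters[$key] = [$get[$param]];
        }
    }

    return [
        "q" => $get["s"],
        "sortBy" => $get["sort"] ?? "Top", // assumed default
        "take" => 24,
        "skip" => $skip,
        // json_encode([]) yields "[]", so force an empty object
        "filters" => $filters === [] ? "{}" : json_encode($filters)
    ];
}

$query = build_cara_query([
    "s" => "jak and daxter",
    "sort" => "Top",
    "type" => "portfolio"
]);

// query string that would be appended to the API URL
echo http_build_query($query), "\n";
```

Pagination then only needs `skip` to advance by 24 per page, which is what the stored `npt` token carries in the scraper.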

680
scraper/coccoc.php Normal file

@@ -0,0 +1,680 @@
<?php
class coccoc{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("coccoc");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
curl_setopt($curlproc, CURLOPT_HTTPHEADER, [
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
//"Cookie: _contentAB_15040_vi=V-06_01; split_test_search=new_search; uid=L_bauXyZBY1B; vid=uCVQJQSTgb9QGT3o; ls=1753742684; serp_version=29223843/7621a70; savedS=direct",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: cross-site",
"Priority: u=0, i"
]);
$this->backend->assign_proxy($curlproc, $proxy);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function getfilters($pagetype){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes", // nsfw by default????
"no" => "No" // &safe=1
]
],
"time" => [
"display" => "Time posted",
"option" => [
"any" => "Any time",
"1w" => "1 week ago",
"2w" => "2 weeks ago",
"1m" => "1 month ago",
"3m" => "3 months ago",
"6m" => "6 months ago",
"1Y" => "1 year ago"
]
],
"filter" => [
"display" => "Remove duplicates",
"option" => [
"no" => "No",
"yes" => "Yes" // &filter=0
]
]
];
}
public function web($get){
if($get["npt"]){
[$query, $proxy] =
$this->backend->get(
$get["npt"],
"web"
);
$query = json_decode($query, true);
}else{
$proxy = $this->backend->get_ip();
$query = [
"query" => $get["s"]
];
// add filters
if($get["nsfw"] == "no"){
$query["safe"] = 1;
}
if($get["time"] != "any"){
$query["tbs"] = $get["time"];
}
if($get["filter"] == "yes"){
$query["filter"] = 0;
}
}
try{
$html =
$this->get(
$proxy,
"https://coccoc.com/search",
$query
);
}catch(Exception $error){
throw new Exception("Failed to get search page");
}
//$html = file_get_contents("scraper/coccoc.html");
$html = explode("window.composerResponse", $html, 2);
if(count($html) !== 2){
throw new Exception("Failed to grep window.composerResponse");
}
$html =
json_decode(
$this->fuckhtml
->extract_json(
ltrim($html[1], " =")
),
true
);
if($html === null){
throw new Exception("Failed to decode JSON");
}
if(
isset($html["captcha"]) &&
(int)$html["captcha"] === 1
){
throw new Exception("Coc Coc returned a Captcha");
}
if(!isset($html["search"]["search_results"])){
throw new Exception("Coc Coc did not return a search_results object");
}
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
// word correction
foreach($html["top"] ?? [] as $element){ // "top" may be absent from the response
if(isset($element["spellChecker"][0]["query"])){
$out["spelling"] = [
"type" => "not_many",
"using" => $html["search"]["query"],
"correction" => $element["spellChecker"][0]["query"]
];
}
}
foreach($html["search"]["search_results"] as $result){
if(isset($result["type"])){
switch($result["type"]){
//
// Related searches
//
case "related_queries":
$out["related"] = $result["queries"];
continue 2;
//
// Videos
//
case "video_hits":
foreach($result["results"] as $video){
if(
isset($video["image_url"]) &&
!empty($video["image_url"])
){
$thumb = [
"ratio" => "16:9",
"url" => $video["image_url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["video"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["title"]
)
),
"description" => null,
"author" => [
"name" => $video["uploader"],
"url" => null,
"avatar" => null
],
"date" => (int)$video["date"],
"duration" => (int)$video["duration"],
"views" => null,
"thumb" => $thumb,
"url" => $video["url"]
];
}
continue 2;
}
}
if(
!isset($result["title"]) ||
!isset($result["url"])
){
// should not happen
continue;
}
if(isset($result["rich"]["data"]["image_url"])){
$thumb = [
"url" => $result["rich"]["data"]["image_url"],
"ratio" => "16:9"
];
}else{
$thumb = [
"url" => null,
"ratio" => null
];
}
$sublinks = [];
if(isset($result["rich"]["data"]["linked_docs"])){
foreach($result["rich"]["data"]["linked_docs"] as $sub){
$sublinks[] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$sub["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$sub["content"]
)
),
"date" => null,
"url" => $sub["url"]
];
}
}
// get date
if(isset($result["date"])){
$date = (int)$result["date"];
}else{
$date = null;
}
// probe for metadata
$table = [];
if(isset($result["rich"]["data"]["rating"])){
$table["Rating"] = $result["rich"]["data"]["rating"];
if(isset($result["rich"]["data"]["num_rating"])){
$table["Rating"] .= " (" . number_format($result["rich"]["data"]["num_rating"]) . " ratings)";
}
}
if(isset($result["rich"]["data"]["views"])){
$table["Views"] = number_format($result["rich"]["data"]["views"]);
}
if(isset($result["rich"]["data"]["duration"])){
$table["Duration"] = $this->int2hms($result["rich"]["data"]["duration"]);
}
if(isset($result["rich"]["data"]["channel_name"])){
$table["Author"] = $result["rich"]["data"]["channel_name"];
}
if(isset($result["rich"]["data"]["video_quality"])){
$table["Quality"] = $result["rich"]["data"]["video_quality"];
}
if(isset($result["rich"]["data"]["category"])){
$table["Category"] = $result["rich"]["data"]["category"];
}
$out["web"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$result["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$result["content"]
)
),
"url" => $result["url"],
"date" => $date,
"type" => "web",
"thumb" => $thumb,
"sublink" => $sublinks,
"table" => $table
];
}
//
// Get wikipedia head
//
if(isset($html["right"])){
foreach($html["right"] as $wiki){
$description = [];
if(isset($wiki["short_intro"])){
$description[] =
[
"type" => "quote",
"value" => $wiki["short_intro"],
];
}
if(isset($wiki["intro"])){
$description[] =
[
"type" => "text",
"value" => $wiki["intro"],
];
}
// get table elements
$table = [];
if(isset($wiki["fields"])){
foreach($wiki["fields"] as $element){
$table[$element["title"]] = implode(", ", $element["value"]);
}
}
// get sublinks
$sublinks = [];
if(isset($wiki["website"])){
if(
preg_match(
'/^http/',
$wiki["website"]
) === 0
){
$sublinks["Website"] = "https://" . $wiki["website"];
}else{
$sublinks["Website"] = $wiki["website"];
}
}
foreach($wiki["profiles"] ?? [] as $sitename => $url){ // "profiles" is optional, like "website" above
$sitename = explode("_", $sitename);
$sitename = ucfirst($sitename[count($sitename) - 1]);
$sublinks[$sitename] = $url;
}
$out["answer"][] = [
"title" =>
$this->titledots(
$wiki["title"]
),
"description" => $description,
"url" => null,
"thumb" => isset($wiki["image"]["contentUrl"]) ? $wiki["image"]["contentUrl"] : null,
"table" => $table,
"sublink" => $sublinks
];
}
}
// get next page
if((int)$html["search"]["page"] < (int)$html["search"]["max_page"]){
// https://coccoc.com/composer?_=1754021153532&p=0&q=zbabduiqwhduwqhdnwq&reqid=bwcAs00q&s=direct&apiV=1
// ^json endpoint, but we can just do &page=2 lol
if(!isset($query["page"])){
$query["page"] = 2;
}else{
$query["page"]++;
}
$out["npt"] =
$this->backend
->store(
json_encode($query),
"web",
$proxy
);
}
return $out;
}
public function video($get){
//$html = file_get_contents("scraper/coccoc.html");
if($get["npt"]){
[$query, $proxy] =
$this->backend->get(
$get["npt"],
"videos"
);
$query = json_decode($query, true);
}else{
$proxy = $this->backend->get_ip();
$query = [
"query" => $get["s"],
"tbm" => "vid"
];
// add filters
if($get["nsfw"] == "no"){
$query["safe"] = 1;
}
if($get["time"] != "any"){
$query["tbs"] = $get["time"];
}
if($get["filter"] == "yes"){
$query["filter"] = 0;
}
}
try{
$html =
$this->get(
$proxy,
"https://coccoc.com/search",
$query
);
}catch(Exception $error){
throw new Exception("Failed to get search page");
}
$html = explode("window.composerResponse", $html, 2);
if(count($html) !== 2){
throw new Exception("Failed to grep window.composerResponse");
}
$html =
json_decode(
$this->fuckhtml
->extract_json(
ltrim($html[1], " =")
),
true
);
if($html === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
if(!isset($html["search_video"]["search_results"])){
if(isset($html["search_video"]["error"]["title"])){
if($html["search_video"]["error"]["title"] == "Không tìm thấy kết quả nào"){
return $out;
}
throw new Exception("Coc Coc returned an error: " . $html["search_video"]["error"]["title"]);
}
throw new Exception("Coc Coc did not supply a search_results object");
}
foreach($html["search_video"]["search_results"] as $video){
if(isset($video["rich"]["data"]["image_url"])){
$thumb = [
"ratio" => "16:9",
"url" => $video["rich"]["data"]["image_url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["video"][] = [
"title" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["title"]
)
),
"description" =>
$this->titledots(
$this->fuckhtml
->getTextContent(
$video["content"]
)
),
"author" => [
"name" =>
isset($video["rich"]["data"]["channel_name"]) ?
$video["rich"]["data"]["channel_name"] : null,
"url" => null,
"avatar" => null
],
"date" =>
isset($video["date"]) ?
$video["date"] : null,
"duration" =>
isset($video["rich"]["data"]["duration"]) ?
(int)$video["rich"]["data"]["duration"] : null,
"views" => null,
"thumb" => $thumb,
"url" => $video["url"]
];
}
// get next page
if((int)$html["search_video"]["page"] < (int)$html["search_video"]["max_page"]){
if(!isset($query["page"])){
$query["page"] = 2;
}else{
$query["page"]++;
}
$out["npt"] =
$this->backend
->store(
json_encode($query),
"videos",
$proxy
);
}
return $out;
}
private function titledots($title){
return trim($title, " .\t\n\r\0\x0B");
}
private function int2hms($seconds){
$hours = floor($seconds / 3600);
$minutes = floor(($seconds % 3600) / 60);
$seconds = $seconds % 60;
return sprintf("%02d:%02d:%02d", $hours, $minutes, $seconds);
}
}
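Review note: both `web()` and `video()` above build the next-page token the same way. The first request carries no `page` key, so the stored follow-up query has to ask for page 2, and later tokens simply increment. A minimal sketch of that step, assuming a standalone helper (the name `next_page_query` is hypothetical and does not exist in the scraper):

```php
<?php
// Hypothetical helper mirroring the page-increment logic in web() and video().
function next_page_query(array $query): array {
    // No "page" key means we just fetched page 1, so the next one is page 2.
    $query["page"] = isset($query["page"]) ? $query["page"] + 1 : 2;
    return $query;
}

$first = ["query" => "example", "tbm" => "vid"];
$second = next_page_query($first);  // what would be stored after page 1
$third = next_page_query($second);  // what would be stored after page 2
```

The returned array is what would be passed through `json_encode()` before being handed to the backend store.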


@@ -285,6 +285,7 @@ class ddg{
"display" => "NSFW",
"option" => [
"yes" => "Yes",
"maybe" => "Maybe",
"no" => "No"
]
],
@@ -354,6 +355,36 @@ class ddg{
public function web($get){
if($get["npt"]){
[$raw_data, $proxy] = $this->backend->get($get["npt"], "web");
$raw_data = explode(",", $raw_data, 2);
if($raw_data[0] == "0"){
return $this->web_html($get, [$raw_data[1], $proxy]);
}
return $this->web_full($get, [$raw_data[1], $proxy]);
}else{
// we have $get["s"]
if(
strpos($get["s"], "\"") !== false || // contains quotes
strpos($get["s"], ":") !== false // contains potential site: operator or whatever the fuck
){
return $this->web_html($get);
}
// no quotes sent, do full web search
return $this->web_full($get);
}
}
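Review note: the dispatch in `web()` relies on a one-character prefix in the stored token payload, `0,` for the HTML endpoint and `1,` for the full d.js flow. Because `explode()` is called with a limit of 2, commas inside the JSON payload survive. A small sketch of that split, assuming a hypothetical helper named `split_npt_payload` (not part of the scraper):

```php
<?php
// Hypothetical helper: separates the endpoint prefix from the stored payload.
function split_npt_payload(string $raw): array {
    // Limit of 2 keeps any commas inside the payload untouched.
    $parts = explode(",", $raw, 2);
    if(count($parts) !== 2){
        throw new Exception("Malformed next-page payload");
    }
    return [
        "endpoint" => $parts[0] === "0" ? "html" : "full",
        "payload" => $parts[1]
    ];
}

$html_token = split_npt_payload('0,{"q":"a,b","s":"30"}');
$full_token = split_npt_payload("1,/d.js?q=test&s=30");
```

Unlike the diff, the sketch rejects payloads without a prefix; whether the scraper should do the same for tokens stored before this change is a separate decision.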
public function web_html($get, $npt = null){
$out = [
"status" => "ok",
"spelling" => [
@@ -370,9 +401,374 @@ class ddg{
"related" => []
];
if($get["npt"]){
if($npt !== null){
[$js_link, $proxy] = $this->backend->get($get["npt"], "web");
[$get_filters, $proxy] = $npt;
$get_filters = json_decode($get_filters, true);
}else{
if(strlen($get["s"]) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
// generate filters
$get_filters = [
"q" => $get["s"]
];
if($get["country"] == "any"){
$get_filters["kl"] = "wt-wt";
}else{
$get_filters["kl"] = $get["country"];
}
switch($get["nsfw"]){
case "yes": $get_filters["kp"] = "-2"; break;
case "maybe": $get_filters["kp"] = "-1"; break;
case "no": $get_filters["kp"] = "1"; break;
}
$df = true;
if($get["newer"] === false){
if($get["older"] !== false){
$start = 36000;
$end = $get["older"];
}else{
$df = false;
}
}else{
$start = $get["newer"];
if($get["older"] !== false){
$end = $get["older"];
}else{
$end = time();
}
}
if($df === true){
$get_filters["df"] = date("Y-m-d", $start) . ".." . date("Y-m-d", $end);
}
}
//
// Get HTML
//
try{
$html = $this->get(
$proxy,
"https://html.duckduckgo.com/html/",
$get_filters
);
}catch(Exception $e){
throw new Exception("Failed to fetch search page");
}
//$html = file_get_contents("scraper/ddg.html");
$this->fuckhtml->load($html);
//
// Get next page token
//
$forms =
$this->fuckhtml
->getElementsByTagName(
"form"
);
foreach(array_reverse($forms) as $form){
$this->fuckhtml->load($form);
$input_probe =
$this->fuckhtml
->getElementsByClassName(
"btn--alt",
"input"
);
if(count($input_probe) !== 0){
// found next page!
$inputs =
$this->fuckhtml
->getElementsByAttributeValue(
"type",
"hidden",
"input"
);
$query = [];
foreach($inputs as $q){
$query[
$this->fuckhtml
->getTextContent(
$q["attributes"]["name"]
)
] =
$this->fuckhtml
->getTextContent(
$q["attributes"]["value"]
);
}
$out["npt"] =
$this->backend->store(
"0," . json_encode($query),
"web",
$proxy
);
break;
}
}
// reset
$this->fuckhtml->load($html);
//
// parse wikipedia answer
//
$wiki_wrapper =
$this->fuckhtml
->getElementsByClassName(
"zci-wrapper",
"div"
);
if(count($wiki_wrapper) !== 0){
$this->fuckhtml->load($wiki_wrapper[0]);
$a =
$this->fuckhtml
->getElementsByTagName(
"a"
);
if(count($a) !== 0){
$link =
$this->unshiturl(
$this->fuckhtml
->getTextContent(
$a[0]["attributes"]["href"]
)
);
}else{
$link = null;
}
$title =
$this->fuckhtml
->getElementsByTagName(
"h1"
);
if(count($title) !== 0){
$title =
$this->fuckhtml
->getTextContent(
$title[0]
);
}else{
$title = null;
}
$description =
$this->fuckhtml
->getElementById(
"zero_click_abstract",
"div"
);
if($description !== false){
$this->fuckhtml->load($description);
$thumb =
$this->fuckhtml
->getElementsByTagName(
"img"
);
if(count($thumb) !== 0){
$thumb =
$this->fuckhtml
->getTextContent(
$thumb[0]["attributes"]["src"]
);
}else{
$thumb = null;
}
$as =
$this->fuckhtml
->getElementsByTagName(
"a"
);
foreach($as as $a){
$description["innerHTML"] =
str_replace(
$a["outerHTML"],
"",
$description["innerHTML"]
);
}
$description =
$this->fuckhtml
->getTextContent(
$description
);
$out["answer"][] = [
"title" => $title,
"description" => [
[
"type" => "text",
"value" => $description
]
],
"url" => $link,
"thumb" => $thumb,
"table" => [],
"sublink" => []
];
}
// reset
$this->fuckhtml->load($html);
}
//
// Get results
//
$results =
$this->fuckhtml
->getElementsByClassName(
"result",
"div"
);
foreach($results as $result){
$this->fuckhtml->load($result);
if(stripos($result["attributes"]["class"], "result--ad") !== false){
// found an ad
continue;
}
$title =
$this->fuckhtml
->getElementsByTagName(
"h2"
);
if(count($title) === 0){
// should not happen
continue;
}
$title =
$this->fuckhtml
->getTextContent(
$title[0]
);
$description_obj =
$this->fuckhtml
->getElementsByClassName(
"result__snippet",
"a"
);
if(count($description_obj) === 0){
$description = null;
}else{
$description =
$this->titledots(
$this->fuckhtml
->getTextContent(
$description_obj[0]
)
);
}
$url =
$this->fuckhtml
->getTextContent(
$description_obj[0]["attributes"]["href"]
);
$out["web"][] = [
"title" => $this->titledots($title),
"description" => $description,
"url" => $this->unshiturl($url),
"date" => null,
"type" => "web",
"thumb" => [
"ratio" => null,
"url" => null
],
"sublink" => [],
"table" => []
];
}
return $out;
}
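Review note: the `df` filter in `web_html()` is easier to follow when pulled out on its own. The sketch below reproduces the same branching (no bounds means no filter, a missing lower bound falls back to timestamp 36000, a missing upper bound falls back to the current time). The function name `build_df` is hypothetical, and it uses `gmdate()` instead of the diff's `date()` so that the output does not depend on the server timezone:

```php
<?php
// Hypothetical helper; $newer and $older are UNIX timestamps or false.
function build_df($newer, $older, ?int $now = null): ?string {
    $now = $now ?? time();
    if($newer === false && $older === false){
        // Neither bound given: omit the df filter entirely.
        return null;
    }
    // Open lower bound: same early timestamp the diff uses.
    $start = $newer === false ? 36000 : $newer;
    // Open upper bound: the current time.
    $end = $older === false ? $now : $older;
    // gmdate() is an assumption of this sketch; the diff calls date().
    return gmdate("Y-m-d", $start) . ".." . gmdate("Y-m-d", $end);
}

$none = build_df(false, false);
$both = build_df(1704067200, 1706745600);
$older_only = build_df(false, 1706745600);
```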
public function web_full($get, $npt = null){
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
if($npt !== null){
[$js_link, $proxy] = $npt;
$js_link = "https://links.duckduckgo.com" . $js_link;
$html = "";
@@ -489,6 +885,7 @@ class ddg{
throw new Exception("Failed to fetch d.js");
}
//$js = file_get_contents("scraper/fuck.js");
//echo htmlspecialchars($js);
$js_tmp =
@@ -500,6 +897,139 @@ class ddg{
if(count($js_tmp) <= 1){
//
// Detect javascript challenge
//
if(
preg_match(
'/DDG\.deep\.initialize\(\'([^\']+)\'\ *\+ *jsa/i',
$js,
$challenge_url
)
){
throw new Exception("DuckDuckGo returned a JSA challenge");
// get JSA initial token
if(
!preg_match(
'/let jsa *= *([0-9]+)/',
$js,
$jsa
)
){
$jsa = 0;
}else{
$jsa = (int)$jsa[1];
}
// get function bodies
preg_match_all(
'/let *([A-Za-z0-9]+) *= *function\(.*\) *{(.*)};/sU',
$js,
$functions
);
$parsed_functions = [];
for($i=0; $i<count($functions[0]); $i++){
$functions[2][$i] = trim($functions[2][$i]);
if(
preg_match(
'/return num *\* *([0-9]+)/i',
$functions[2][$i],
$num
)
){
$parsed_functions[$functions[1][$i]] = [
"type" => "multiplication",
"num" => (int)$num[1]
];
continue;
}
if(
preg_match(
'/innerHTML *= *`([^`]+)`/i',
$functions[2][$i],
$challenge
)
){
$challenge[1] =
preg_replace(
'/<\/(br)>/',
'<$1>',
$challenge[1]
);
$parsed_functions[$functions[1][$i]] = [
"type" => "challenge",
"text" => $challenge[1]
];
}
}
// get function call order
preg_match_all(
'/jsa *= *([A-Za-z0-9]+)\(jsa\)/i',
$js,
$call_order
);
foreach($call_order[1] as $order){
if(!isset($parsed_functions[$order])){
throw new Exception("JS challenge solve failure: DuckDuckGo called an unknown function");
}
if($parsed_functions[$order]["type"] == "multiplication"){
$jsa = $jsa * $parsed_functions[$order]["num"];
continue;
}
if($parsed_functions[$order]["type"] == "challenge"){
// @TODO get parsed length
//$parsed_functions[$order]["text"]
$jsa = $jsa + strlen($parsed_functions[$order]["text"]);
}
}
try{
$js = $this->get(
$proxy,
"https://links.duckduckgo.com" . $challenge_url[1] . $jsa,
[],
ddg::req_xhr
);
}catch(Exception $error){
throw new Exception("Failed to get challenged d.js");
}
}
//
// Detect JavaScript anomaly failure thingy
//
if(
preg_match(
'/DDG.deep.anomalyDetectionBlock\({/',
$js
)
){
throw new Exception("DuckDuckGo detected an anomaly in the JavaScript challenge response");
}
throw new Exception("Failed to grep pageLayout(d)");
}
@@ -524,6 +1054,18 @@ class ddg{
if(isset($item["c"])){
if(
!isset($item["s"]) &&
isset($item["t"]) &&
(
$item["t"] == "DEEP_ERROR_NO_RESULTS" ||
$item["t"] == "DEEP_SIMPLE_NO_RESULTS"
)
){
return $out;
}
$table = [];
// get youtube video information
@@ -665,7 +1207,7 @@ class ddg{
// get NPT
$out["npt"] =
$this->backend->store(
$item["n"],
"1," . $item["n"],
"web",
$proxy
);
@@ -717,7 +1259,7 @@ class ddg{
->getTextContent(
$json["suggestion"]
),
"correction" => $json["recourseText"]
"correction" => html_entity_decode($json["recourseText"])
];
}
}
@@ -1036,20 +1578,38 @@ class ddg{
if(isset($json["Abstract"])){
$description[] =
[
"type" => "text",
"value" => $json["Abstract"]
];
$description = $this->parse_rich_text($json["Abstract"]);
}
if(
!isset($json["Image"]) ||
$json["Image"] == "" ||
$json["Image"] === null ||
$json["Image"] == "https://duckduckgo.com/i/"
){
$image = null;
}else{
if(
preg_match(
'/^https?:\/\//',
$json["Image"]
)
){
$image = $json["Image"];
}else{
$image = "https://duckduckgo.com" . $json["Image"];
}
}
$out["answer"][] = [
"title" => $json["Heading"],
"description" => $description,
"url" => $json["AbstractURL"],
"thumb" =>
(isset($json["Image"]) && $json["Image"]) !== null ?
"https://duckduckgo.com" . $json["Image"] : null,
"thumb" => $image,
"table" => $table,
"sublink" => $sublinks
];
@@ -1062,11 +1622,11 @@ class ddg{
}
//
// Get wordnik definition
// Parse additional data endpoints
//
//nrj('/js/spice/dictionary/definition/create', null, null, null, null, 'dictionary_definition');
preg_match(
preg_match_all(
'/nrj\(\s*\'([^\']+)\'/',
$js,
$nrj
@@ -1074,11 +1634,14 @@ class ddg{
if(isset($nrj[1])){
$nrj = $nrj[1];
foreach($nrj[1] as $potential_endpoint){
//
// Probe for wordnik definition
//
preg_match(
'/\/js\/spice\/dictionary\/definition\/([^\/]+)/',
$nrj,
$potential_endpoint,
$word
);
@@ -1304,6 +1867,87 @@ class ddg{
];
}
}
//
// Parse stackoverflow answer
//
if(
preg_match(
'/^\/a\.js.*src_id=stack_overflow/',
$potential_endpoint
)
){
// found stackoverflow answer
try{
$json =
$this->get(
$proxy,
"https://duckduckgo.com" . $potential_endpoint,
[],
ddg::req_xhr
);
}catch(Exception $e){
// fail gracefully
return $out;
}
$json = explode("DDG.duckbar.add_array(", $json, 2);
if(count($json) === 2){
$json =
json_decode(
$this->fuckhtml
->extract_json(
$json[1]
),
true
);
if(
$json !== null &&
isset($json[0]["data"])
){
$json = $json[0]["data"];
foreach($json as $answer){
if(isset($answer["Heading"])){
$title = $answer["Heading"];
}elseif(isset($answer["title"])){
$title = $answer["title"];
}else{
$title = null;
}
if(
$title !== null &&
isset($answer["Abstract"])
){
$description = $this->parse_rich_text($answer["Abstract"]);
$out["answer"][] = [
"title" => $title,
"description" => $description,
"url" => $answer["AbstractURL"],
"thumb" => null,
"table" => [],
"sublink" => []
];
}
}
}
}
}
}
}
return $out;
@@ -1345,7 +1989,7 @@ class ddg{
$get_filters["iaf"] = $filters;
}
$nsfw = $get["nsfw"] == "yes" ? "-2" : "-1";
$nsfw = $get["nsfw"] == "yes" ? "-1" : "1";
$get_filters["kp"] = $nsfw;
try{
@@ -1498,8 +2142,12 @@ class ddg{
"ia" => "videos"
];
$nsfw = $get["nsfw"] == "yes" ? "-2" : "-1";
$get_filters["kp"] = $nsfw;
switch($get["nsfw"]){
case "yes": $nsfw = "-2"; break;
case "maybe": $nsfw = "-1"; break;
case "no": $nsfw = "1"; break;
}
$filters = [];
@@ -1827,6 +2475,146 @@ class ddg{
return $out;
}
private function parse_rich_text($html){
$description = [];
// pre-process the html, remove useless elements
$html =
strip_tags(
$html,
[
"h1", "h2", "h3", "h4", "h5", "h6", "h7",
"pre", "code"
]
);
$html =
preg_replace(
'/<(\/?)pre *[^>]*>\s*<\/?code *[^>]*>/i',
'<$1pre>',
$html
);
$this->fuckhtml->load($html);
$tags =
$this->fuckhtml
->getElementsByTagName(
"*"
);
if(count($tags) === 0){
$description[] = [
"type" => "text",
"value" =>
trim(
$this->fuckhtml
->getTextContent(
$html,
true,
false
)
)
];
}else{
$start = 0;
$was_code_block = true;
foreach($tags as $tag){
$text =
$this->fuckhtml
->getTextContent(
substr(
$html,
$start,
$tag["startPos"] - $start
),
true,
false
);
if($was_code_block){
$text = ltrim($text);
$was_code_block = false;
}
$description[] = [
"type" => "text",
"value" => $text
];
switch($tag["tagName"]){
case "pre":
$append = "code";
$was_code_block = true;
$c = count($description) - 1;
$description[$c]["value"] =
rtrim($description[$c]["value"]);
break;
case "code":
$append = "inline_code";
$c = count($description) - 1;
$description[$c]["value"] =
rtrim($description[$c]["value"]) . " ";
break;
case "h1":
case "h2":
case "h3":
case "h4":
case "h5":
case "h6":
case "h7":
$append = "title";
$c = count($description) - 1;
$description[$c]["value"] =
rtrim($description[$c]["value"]);
break;
}
$description[] = [
"type" => $append,
"value" =>
trim(
$this->fuckhtml
->getTextContent(
$tag,
true,
false
)
)
];
$start = $tag["endPos"];
}
// shit out remainder
$description[] = [
"type" => "text",
"value" =>
trim(
$this->fuckhtml
->getTextContent(
substr(
$html,
$start
),
true,
false
)
)
];
}
return $description;
}
private function titledots($title){
$substr = substr($title, -3);
@@ -1870,10 +2658,24 @@ class ddg{
private function unshiturl($url){
// check for domains w/out first short subdomain (ex: www.)
// remove tracking redirect
// yes, the privacy search engine has click-out tracking. great!
$domain = parse_url($url, PHP_URL_HOST);
if($domain == "duckduckgo.com"){
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $query);
if(isset($query["uddg"])){
$url = $query["uddg"];
$domain = parse_url($url, PHP_URL_HOST);
}
}
// check for domains w/out first short subdomain (ex: www.)
$subdomain = preg_replace(
'/^[A-z0-9]{1,3}\./',
"",
@@ -1938,10 +2740,33 @@ class ddg{
private function bingimg($url){
$parse = parse_url($url);
parse_str($parse["query"], $parts);
$image = parse_url($url);
return "https://" . $parse["host"] . "/th?id=" . urlencode($parts["id"]);
$id = null;
if(isset($image["query"])){
parse_str($image["query"], $str);
if(isset($str["id"])){
$id = $str["id"];
}
}
if($id === null){
$id = explode("/th/id/", $image["path"], 2);
if(count($id) !== 2){
// malformed
return $url;
}
$id = $id[1];
}
return "https://" . $image["host"] . "/th?id=" . rawurlencode($id);
}
private function bingratio($width, $height){
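Review note: the new block in `unshiturl()` strips DuckDuckGo's click-out redirect by reading the `uddg` query parameter. A standalone sketch of just that step, assuming a hypothetical function name `unwrap_ddg_redirect` and an example URL invented for illustration:

```php
<?php
// Hypothetical helper: returns the target of a duckduckgo.com redirect link,
// or the input unchanged when it is not such a link.
function unwrap_ddg_redirect(string $url): string {
    if(parse_url($url, PHP_URL_HOST) !== "duckduckgo.com"){
        return $url;
    }
    $query = parse_url($url, PHP_URL_QUERY);
    if(!is_string($query)){
        // No query string, nothing to unwrap.
        return $url;
    }
    // parse_str() also URL-decodes the parameter values.
    parse_str($query, $params);
    if(isset($params["uddg"]) && is_string($params["uddg"])){
        return $params["uddg"];
    }
    return $url;
}

$clean = unwrap_ddg_redirect(
    "https://duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fpage%3Fa%3D1&rut=abc"
);
$untouched = unwrap_ddg_redirect("https://example.com/");
```

The extra `is_string()` checks guard against URLs without a query string, which the diff's version passes straight to `parse_str()`.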

415
scraper/flickr.php Normal file

@@ -0,0 +1,415 @@
<?php
class flickr{
const req_web = 0;
const req_xhr = 1;
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("flickr");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function getfilters($page){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes",
"maybe" => "Maybe",
"no" => "No",
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"relevance" => "Relevance",
"date-posted-desc" => "Newest uploads",
"date-posted-asc" => "Oldest uploads",
"date-taken-desc" => "Newest taken",
"date-taken-asc" => "Oldest taken",
"interestingness-desc" => "Interesting"
]
],
"color" => [
"display" => "Color",
"option" => [
"any" => "Any color",
// color_codes=
"0" => "Red",
"1" => "Brown",
"2" => "Orange",
"b" => "Pink",
"4" => "Yellow",
"3" => "Golden",
"5" => "Lime",
"6" => "Green",
"7" => "Sky blue",
"8" => "Blue",
"9" => "Purple",
"a" => "Hot pink",
"c" => "White",
"d" => "Gray",
"e" => "Black",
// styles= override
"blackandwhite" => "Black & white",
]
],
"style" => [ // styles=
"display" => "Style",
"option" => [
"any" => "Any style",
"depthoffield" => "Depth of field",
"minimalism" => "Minimalism",
"pattern" => "Patterns"
]
],
"license" => [
"display" => "License",
"option" => [
"any" => "Any license",
"1,2,3,4,5,6,9,11,12,13,14,15,16" => "All creative commons",
"4,5,6,9,10,11,12,13" => "Commercial use allowed",
"1,2,4,5,9,10,11,12,14,15" => "Modifications allowed",
"4,5,9,10,11,12" => "Commercial use & mods allowed",
"7,9,10" => "No known copyright restrictions",
"8" => "U.S Government works"
]
]
];
}
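// $reqtype needs a default: PHP 8 deprecates required parameters after optional ones
private function get($proxy, $url, $get = [], $reqtype = flickr::req_web){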
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($reqtype === flickr::req_web){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i",
"TE: trailers"]
);
}else{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Origin: https://www.flickr.com",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://www.flickr.com/",
// Cookie:
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site",
"TE: trailers"]
);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function image($get){
if($get["npt"]){
[$filters, $proxy] =
$this->backend->get(
$get["npt"], "images"
);
$filters = json_decode($filters, true);
// Workaround for the future, if flickr deprecates &page argument on html page
/*
try{
$json =
$this->get(
$proxy,
"https://api.flickr.com/services/rest",
[
"sort" => $data["sort"],
"parse_tags" => 1,
// url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o
"extras" => "can_comment,can_print,count_comments,count_faves,description,isfavorite,license,media,needs_interstitial,owner_name,path_alias,realname,rotation,url_sq,url_q,url_t,url_s,url_n,url_w,url_m,url_z,url_c,url_l",
"per_page" => 100,
"page" => $data["page"],
"lang" => "en-US",
"text" => $data["search"],
"viewerNSID" => "",
"method" => "flickr.photos.search",
"csrf" => "",
"api_key" => $data["api_key"],
"format" => "json",
"hermes" => 1,
"hermesClient" => 1,
"reqId" => $data["reqId"],
"nojsoncallback" => 1
]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}*/
}else{
if(strlen($get["s"]) === 0){
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
// compute filters
$filters = [
"page" => 1,
"sort" => $get["sort"]
];
if($get["style"] != "any"){
$filters["styles"] = $get["style"];
}
if($get["color"] != "any"){
if($get["color"] != "blackandwhite"){
$filters["color_codes"] = $get["color"];
}else{
$filters["styles"] = "blackandwhite";
}
}
if($get["license"] != "any"){
$filters["license"] = $get["license"];
}
switch($get["nsfw"]){
case "yes": $filters["safe_search"] = 0; break;
case "maybe": $filters["safe_search"] = 2; break;
case "no": $filters["safe_search"] = 1; break;
}
}
$get_params = [
"text" => $get["s"],
"per_page" => 50,
// scrape highest resolution
"extras" => "url_s,url_n,url_w,url_m,url_z,url_c,url_l,url_h,url_k,url_3k,url_4k,url_5k,url_6k,url_o",
"view_all" => 1
];
$get_params = array_merge($get_params, $filters);
$html =
$this->get(
$proxy,
"https://www.flickr.com/search/",
$get_params,
flickr::req_web
);
// @TODO
// get api_key and reqId, if flickr deprecates &page
$this->fuckhtml->load($html);
//
// get response JSON
//
$scripts =
$this->fuckhtml
->getElementsByClassName(
"modelExport",
"script"
);
$found = false;
foreach($scripts as $script){
$json =
preg_split(
'/modelExport: ?/',
$script["innerHTML"],
2
);
// preg_split always returns at least one element; two means the marker was found
if(count($json) === 2){
$found = true;
$json = $json[1];
break;
}
}
if($found === false){
throw new Exception("Failed to grep JSON");
}
$json =
json_decode(
$this->fuckhtml
->extract_json(
$json
),
true
);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
if(!isset($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"])){
throw new Exception("Failed to access data object");
}
foreach($json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["_data"] as $image){
if(!isset($image["data"])){
// flickr likes to give us empty array objects
continue;
}
$image = $image["data"];
$title = [];
if(isset($image["title"])){
$title[] =
$this->fuckhtml
->getTextContent(
$image["title"]
);
}
if(isset($image["description"])){
$title[] =
$this->fuckhtml
->getTextContent(
str_replace(
"\n",
" ",
$image["description"]
)
);
}
$title = implode(": ", $title);
$sources = array_values($image["sizes"]["data"]);
$suitable_sizes = ["n", "m", "w", "s"];
$thumb = &$sources[0]["data"];
foreach($suitable_sizes as $testing_size){
if(isset($image["sizes"]["data"][$testing_size])){
$thumb = &$image["sizes"]["data"][$testing_size]["data"];
break;
}
}
$og = &$sources[count($sources) - 1]["data"];
$out["image"][] = [
"title" => $title,
"source" => [
[
"url" => "https:" . $og["displayUrl"],
"width" => (int)$og["width"],
"height" => (int)$og["height"]
],
[
"url" => "https:" . $thumb["displayUrl"],
"width" => (int)$thumb["width"],
"height" => (int)$thumb["height"]
]
],
"url" => "https://www.flickr.com/photos/" . $image["ownerNsid"] . "/" . $image["id"] . "/"
];
}
$total_items = (int)$json["main"]["search-photos-lite-models"][0]["data"]["photos"]["data"]["totalItems"];
if(($filters["page"]) * 50 < $total_items){
$filters["page"]++;
$out["npt"] =
$this->backend->store(
json_encode($filters),
"images",
$proxy
);
}
return $out;
}
}
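Review note: the size selection in `image()` picks a thumbnail from a preference list (`n`, `m`, `w`, `s`), falls back to the first listed size, and treats the last listed size as the full-resolution source. The sketch below isolates that choice. It assumes the sizes arrive ordered smallest to largest, as the diff does, and flattens away the nested `"data"` keys of the real response; `pick_flickr_sizes` is a hypothetical name:

```php
<?php
// Hypothetical helper: $sizes maps Flickr size labels to size descriptors.
function pick_flickr_sizes(array $sizes): array {
    if(count($sizes) === 0){
        throw new Exception("No sizes supplied");
    }
    $sources = array_values($sizes);
    // Default thumbnail: the first (assumed smallest) entry.
    $thumb = $sources[0];
    // Prefer a medium-sized rendition when one is available.
    foreach(["n", "m", "w", "s"] as $label){
        if(isset($sizes[$label])){
            $thumb = $sizes[$label];
            break;
        }
    }
    return [
        "thumb" => $thumb,
        // Last entry is assumed to be the largest rendition.
        "full" => $sources[count($sources) - 1]
    ];
}

$picked = pick_flickr_sizes([
    "sq" => ["width" => 75, "height" => 75],
    "n" => ["width" => 320, "height" => 213],
    "l" => ["width" => 1024, "height" => 683]
]);
```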

File diff suppressed because it is too large

738
scraper/google_api.php Normal file

@@ -0,0 +1,738 @@
<?php
// @TODO check for consent.google.com page, if need be
class google_api{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("google_api");
}
public function getfilters($page){
$base = [
"country" => [ // gl=<country> (image: cr=countryAF)
"display" => "Country",
"option" => [
"any" => "Instance's country",
"af" => "Afghanistan",
"al" => "Albania",
"dz" => "Algeria",
"as" => "American Samoa",
"ad" => "Andorra",
"ao" => "Angola",
"ai" => "Anguilla",
"aq" => "Antarctica",
"ag" => "Antigua and Barbuda",
"ar" => "Argentina",
"am" => "Armenia",
"aw" => "Aruba",
"au" => "Australia",
"at" => "Austria",
"az" => "Azerbaijan",
"bs" => "Bahamas",
"bh" => "Bahrain",
"bd" => "Bangladesh",
"bb" => "Barbados",
"by" => "Belarus",
"be" => "Belgium",
"bz" => "Belize",
"bj" => "Benin",
"bm" => "Bermuda",
"bt" => "Bhutan",
"bo" => "Bolivia",
"ba" => "Bosnia and Herzegovina",
"bw" => "Botswana",
"bv" => "Bouvet Island",
"br" => "Brazil",
"io" => "British Indian Ocean Territory",
"bn" => "Brunei Darussalam",
"bg" => "Bulgaria",
"bf" => "Burkina Faso",
"bi" => "Burundi",
"kh" => "Cambodia",
"cm" => "Cameroon",
"ca" => "Canada",
"cv" => "Cape Verde",
"ky" => "Cayman Islands",
"cf" => "Central African Republic",
"td" => "Chad",
"cl" => "Chile",
"cn" => "China",
"cx" => "Christmas Island",
"cc" => "Cocos (Keeling) Islands",
"co" => "Colombia",
"km" => "Comoros",
"cg" => "Congo",
"cd" => "Congo, the Democratic Republic",
"ck" => "Cook Islands",
"cr" => "Costa Rica",
"ci" => "Cote D'ivoire",
"hr" => "Croatia",
"cu" => "Cuba",
"cy" => "Cyprus",
"cz" => "Czech Republic",
"dk" => "Denmark",
"dj" => "Djibouti",
"dm" => "Dominica",
"do" => "Dominican Republic",
"ec" => "Ecuador",
"eg" => "Egypt",
"sv" => "El Salvador",
"gq" => "Equatorial Guinea",
"er" => "Eritrea",
"ee" => "Estonia",
"et" => "Ethiopia",
"fk" => "Falkland Islands (Malvinas)",
"fo" => "Faroe Islands",
"fj" => "Fiji",
"fi" => "Finland",
"fr" => "France",
"gf" => "French Guiana",
"pf" => "French Polynesia",
"tf" => "French Southern Territories",
"ga" => "Gabon",
"gm" => "Gambia",
"ge" => "Georgia",
"de" => "Germany",
"gh" => "Ghana",
"gi" => "Gibraltar",
"gr" => "Greece",
"gl" => "Greenland",
"gd" => "Grenada",
"gp" => "Guadeloupe",
"gu" => "Guam",
"gt" => "Guatemala",
"gn" => "Guinea",
"gw" => "Guinea-Bissau",
"gy" => "Guyana",
"ht" => "Haiti",
"hm" => "Heard Island and Mcdonald Islands",
"va" => "Holy See (Vatican City State)",
"hn" => "Honduras",
"hk" => "Hong Kong",
"hu" => "Hungary",
"is" => "Iceland",
"in" => "India",
"id" => "Indonesia",
"ir" => "Iran, Islamic Republic",
"iq" => "Iraq",
"ie" => "Ireland",
"il" => "Israel",
"it" => "Italy",
"jm" => "Jamaica",
"jp" => "Japan",
"jo" => "Jordan",
"kz" => "Kazakhstan",
"ke" => "Kenya",
"ki" => "Kiribati",
"kp" => "Korea, Democratic People's Republic",
"kr" => "Korea, Republic",
"kw" => "Kuwait",
"kg" => "Kyrgyzstan",
"la" => "Lao People's Democratic Republic",
"lv" => "Latvia",
"lb" => "Lebanon",
"ls" => "Lesotho",
"lr" => "Liberia",
"ly" => "Libyan Arab Jamahiriya",
"li" => "Liechtenstein",
"lt" => "Lithuania",
"lu" => "Luxembourg",
"mo" => "Macao",
"mk" => "Macedonia, the Former Yugosalv Republic",
"mg" => "Madagascar",
"mw" => "Malawi",
"my" => "Malaysia",
"mv" => "Maldives",
"ml" => "Mali",
"mt" => "Malta",
"mh" => "Marshall Islands",
"mq" => "Martinique",
"mr" => "Mauritania",
"mu" => "Mauritius",
"yt" => "Mayotte",
"mx" => "Mexico",
"fm" => "Micronesia, Federated States",
"md" => "Moldova, Republic",
"mc" => "Monaco",
"mn" => "Mongolia",
"ms" => "Montserrat",
"ma" => "Morocco",
"mz" => "Mozambique",
"mm" => "Myanmar",
"na" => "Namibia",
"nr" => "Nauru",
"np" => "Nepal",
"nl" => "Netherlands",
"an" => "Netherlands Antilles",
"nc" => "New Caledonia",
"nz" => "New Zealand",
"ni" => "Nicaragua",
"ne" => "Niger",
"ng" => "Nigeria",
"nu" => "Niue",
"nf" => "Norfolk Island",
"mp" => "Northern Mariana Islands",
"no" => "Norway",
"om" => "Oman",
"pk" => "Pakistan",
"pw" => "Palau",
"ps" => "Palestinian Territory, Occupied",
"pa" => "Panama",
"pg" => "Papua New Guinea",
"py" => "Paraguay",
"pe" => "Peru",
"ph" => "Philippines",
"pn" => "Pitcairn",
"pl" => "Poland",
"pt" => "Portugal",
"pr" => "Puerto Rico",
"qa" => "Qatar",
"re" => "Reunion",
"ro" => "Romania",
"ru" => "Russian Federation",
"rw" => "Rwanda",
"sh" => "Saint Helena",
"kn" => "Saint Kitts and Nevis",
"lc" => "Saint Lucia",
"pm" => "Saint Pierre and Miquelon",
"vc" => "Saint Vincent and the Grenadines",
"ws" => "Samoa",
"sm" => "San Marino",
"st" => "Sao Tome and Principe",
"sa" => "Saudi Arabia",
"sn" => "Senegal",
"cs" => "Serbia and Montenegro",
"sc" => "Seychelles",
"sl" => "Sierra Leone",
"sg" => "Singapore",
"sk" => "Slovakia",
"si" => "Slovenia",
"sb" => "Solomon Islands",
"so" => "Somalia",
"za" => "South Africa",
"gs" => "South Georgia and the South Sandwich Islands",
"es" => "Spain",
"lk" => "Sri Lanka",
"sd" => "Sudan",
"sr" => "Suriname",
"sj" => "Svalbard and Jan Mayen",
"sz" => "Swaziland",
"se" => "Sweden",
"ch" => "Switzerland",
"sy" => "Syrian Arab Republic",
"tw" => "Taiwan, Province of China",
"tj" => "Tajikistan",
"tz" => "Tanzania, United Republic",
"th" => "Thailand",
"tl" => "Timor-Leste",
"tg" => "Togo",
"tk" => "Tokelau",
"to" => "Tonga",
"tt" => "Trinidad and Tobago",
"tn" => "Tunisia",
"tr" => "Turkey",
"tm" => "Turkmenistan",
"tc" => "Turks and Caicos Islands",
"tv" => "Tuvalu",
"ug" => "Uganda",
"ua" => "Ukraine",
"ae" => "United Arab Emirates",
"uk" => "United Kingdom",
"us" => "United States",
"um" => "United States Minor Outlying Islands",
"uy" => "Uruguay",
"uz" => "Uzbekistan",
"vu" => "Vanuatu",
"ve" => "Venezuela",
"vn" => "Viet Nam",
"vg" => "Virgin Islands, British",
"vi" => "Virgin Islands, U.S.",
"wf" => "Wallis and Futuna",
"eh" => "Western Sahara",
"ye" => "Yemen",
"zm" => "Zambia",
"zw" => "Zimbabwe"
]
],
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes", // safe=active
"no" => "No" // safe=off
]
]
];
switch($page){
case "web":
return array_merge(
$base,
[
"lang" => [ // lr=<lang> (prefix lang with "lang_")
"display" => "Language",
"option" => [
"any" => "Any language",
"ar" => "Arabic",
"bg" => "Bulgarian",
"ca" => "Catalan",
"cs" => "Czech",
"da" => "Danish",
"de" => "German",
"el" => "Greek",
"en" => "English",
"es" => "Spanish",
"et" => "Estonian",
"fi" => "Finnish",
"fr" => "French",
"hr" => "Croatian",
"hu" => "Hungarian",
"id" => "Indonesian",
"is" => "Icelandic",
"it" => "Italian",
"iw" => "Hebrew",
"ja" => "Japanese",
"ko" => "Korean",
"lt" => "Lithuanian",
"lv" => "Latvian",
"nl" => "Dutch",
"no" => "Norwegian",
"pl" => "Polish",
"pt" => "Portuguese",
"ro" => "Romanian",
"ru" => "Russian",
"sk" => "Slovak",
"sl" => "Slovenian",
"sr" => "Serbian",
"sv" => "Swedish",
"tr" => "Turkish",
"zh-CN" => "Chinese (Simplified)",
"zh-TW" => "Chinese (Traditional)"
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"any" => "Any order",
"date:d" => "Oldest",
"date:a" => "Newest"
]
],
"newer" => [
"display" => "Newer than",
"option" => "_DATE"
],
"rm_dupes" => [
"display" => "Remove duplicates",
"option" => [
"yes" => "Yes",
"no" => "No"
]
]
]
);
break;
/*
case "images":
return array_merge(
$base,
[
"time" => [ // tbs=qdr:<time>
"display" => "Time posted",
"option" => [
"any" => "Any time",
"d" => "Past 24 hours",
"w" => "Past week",
"m" => "Past month",
"y" => "Past year"
]
],
"size" => [ // imgsz
"display" => "Size",
"option" => [
"any" => "Any size",
"l" => "Large",
"m" => "Medium",
"i" => "Icon",
"qsvga" => "Larger than 400x300",
"vga" => "Larger than 640x480",
"svga" => "Larger than 800x600",
"xga" => "Larger than 1024x768",
"2mp" => "Larger than 2MP",
"4mp" => "Larger than 4MP",
"6mp" => "Larger than 6MP",
"8mp" => "Larger than 8MP",
"10mp" => "Larger than 10MP",
"12mp" => "Larger than 12MP",
"15mp" => "Larger than 15MP",
"20mp" => "Larger than 20MP",
"40mp" => "Larger than 40MP",
"70mp" => "Larger than 70MP"
]
],
"ratio" => [ // imgar
"display" => "Aspect ratio",
"option" => [
"any" => "Any ratio",
"t|xt" => "Tall",
"s" => "Square",
"w" => "Wide",
"xw" => "Panoramic"
]
],
"color" => [ // imgc
"display" => "Color",
"option" => [
"any" => "Any color",
"color" => "Full color",
"bnw" => "Black & white",
"trans" => "Transparent",
// from here, imgcolor
"red" => "Red",
"orange" => "Orange",
"yellow" => "Yellow",
"green" => "Green",
"teal" => "Teal",
"blue" => "Blue",
"purple" => "Purple",
"pink" => "Pink",
"white" => "White",
"gray" => "Gray",
"black" => "Black",
"brown" => "Brown"
]
],
"type" => [ // tbs=itp:<type>
"display" => "Type",
"option" => [
"any" => "Any type",
"clipart" => "Clip Art",
"lineart" => "Line Drawing",
"animated" => "Animated"
]
],
"format" => [ // as_filetype
"display" => "Format",
"option" => [
"any" => "Any format",
"jpg" => "JPG",
"gif" => "GIF",
"png" => "PNG",
"bmp" => "BMP",
"svg" => "SVG",
"webp" => "WEBP",
"ico" => "ICO",
"craw" => "RAW"
]
],
"rights" => [ // tbs=sur:<rights>
"display" => "Usage rights",
"option" => [
"any" => "Any license",
"cl" => "Creative Commons licenses",
"ol" => "Commercial & other licenses"
]
]
]
);
break;*/
}
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
$headers = [
"Accept: application/json",
"Accept-Encoding: gzip"
];
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
// follow redirects
curl_setopt($curlproc, CURLOPT_FOLLOWLOCATION, true);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function web($get){
// rotate proxy + key on EVERY request
$keydata = $this->backend->get_key();
$proxy = $this->backend->get_ip($keydata["increment"]);
if($get["npt"]){
// $p is never used
[$params, $p] = $this->backend->get(
$get["npt"],
"web"
);
$params = json_decode($params, true);
$params["key"] = $keydata["key"];
}else{
//$json = file_get_contents("scraper/google.json");
$params = [
"q" => $get["s"],
"cx" => config::GOOGLE_CX_ENDPOINT,
"num" => 10,
"start" => 1,
"key" => $keydata["key"]
];
//
// parse filters
//
if($get["newer"] !== false){
// dateRestrict takes whole days; a day is 86400 seconds
$params["dateRestrict"] = "d" . (round((time() - $get["newer"]) / 86400));
}
if($get["rm_dupes"] == "no"){ $params["filter"] = "0"; }
if($get["country"] != "any"){ $params["gl"] = $get["country"]; }
if($get["lang"] != "any"){ $params["lr"] = "lang_" . $get["lang"]; }
if($get["nsfw"] == "yes"){
$params["safe"] = "off";
}else{
$params["safe"] = "active";
}
if($get["sort"] != "any"){ $params["sort"] = $get["sort"]; }
}
try{
$json =
$this->get(
$proxy,
"https://www.googleapis.com/customsearch/v1",
$params
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
}
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
if(isset($json["error"]["message"])){
throw new Exception(
"API returned an error: " .
$json["error"]["message"] .
" (key #" . $keydata["increment"] . ")"
);
}
if(!isset($json["items"])){
// google just doesnt return items when theres no results
return $out;
}
foreach($json["items"] as $result){
//
// probe for thumbnail
//
$probes = [
isset($result["pagemap"]["cse_thumbnail"][0]["src"]) ? $result["pagemap"]["cse_thumbnail"][0]["src"] : null,
isset($result["pagemap"]["cse_image"][0]["src"]) ? $result["pagemap"]["cse_image"][0]["src"] : null,
isset($result["pagemap"]["metatags"][0]["twitter:image"]) ? $result["pagemap"]["metatags"][0]["twitter:image"] : null,
isset($result["pagemap"]["metatags"][0]["og:image"]) ? $result["pagemap"]["metatags"][0]["og:image"] : null
];
$thumb = [
"url" => null,
"ratio" => null
];
foreach($probes as $probe){
if($probe !== null){
$thumb = [
"url" => $probe,
"ratio" => "16:9"
];
break;
}
}
//
// probe for page format
//
$mime = "web";
if(isset($result["mime"])){
$result["mime"] =
explode(
"/",
$result["mime"],
2
);
if(count($result["mime"]) === 2){
$mime = strtoupper($result["mime"][1]);
}
}
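$description = $result["snippet"];
// reset per result: $date is otherwise undefined (or stale from the
// previous iteration) when the snippet has no short date prefix
$date = null;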
//
// Get date
//
$description_split =
explode(
"...", $description, 2
);
if(count($description_split) === 1){
$description = $result["snippet"];
}elseif(strlen($description_split[0]) < 17){
$date = trim($description_split[0]);
$date_probe = strtotime($date);
if($date_probe !== false){
$description = $description_split[1];
}else{
//
// fallback to getting date from meta tags
//
if(isset($result["pagemap"]["metatags"][0]["creationdate"])){
$date = $result["pagemap"]["metatags"][0]["creationdate"];
}elseif(isset($result["pagemap"]["metatags"][0]["moddate"])){
$date = $result["pagemap"]["metatags"][0]["moddate"];
}else{
$date = null;
}
$description = $result["snippet"];
}
}
if($date !== null){
$date =
strtotime(
trim(
str_replace(
["D:", "'"],
"",
$date
)
)
);
if($date === false){
$date = null;
}
}
$out["web"][] = [
"title" =>
$this->titledots(
$result["title"]
),
"description" =>
$this->titledots(
$description
),
"url" => $result["link"],
"date" => $date,
"type" => $mime,
"thumb" => $thumb,
"sublink" => [],
"table" => []
];
}
// get npt
if(isset($json["queries"]["nextPage"][0]["startIndex"])){
unset($params["key"]);
$params["start"] = (int)$json["queries"]["nextPage"][0]["startIndex"];
$out["npt"] =
$this->backend->store(
json_encode($params),
"web",
$proxy
);
}
return $out;
}
private function titledots($title){
return trim($title, " .\t\n\r\0\x0B");
}
}
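
The date probing in web() above relies on snippets that start with a short date followed by "...". A minimal sketch of that idea in isolation is shown below. The function name split_snippet_date() and the sample snippet are assumptions made for illustration; they are not part of the scraper.

```php
<?php
// Hypothetical helper, not part of the scraper: splits a leading
// "Jan 12, 2017 ... " style date off a search snippet.
function split_snippet_date(string $snippet): array {
    $parts = explode("...", $snippet, 2);

    // No separator, or a prefix too long to plausibly be a date:
    // return the snippet untouched
    if(count($parts) !== 2 || strlen($parts[0]) >= 17){
        return [null, $snippet];
    }

    // strtotime() returns false when it cannot parse the prefix
    $timestamp = strtotime(trim($parts[0]));
    if($timestamp === false){
        return [null, $snippet];
    }

    return [$timestamp, trim($parts[1])];
}

// Pin the timezone so the parsed timestamp is reproducible
date_default_timezone_set("UTC");

[$date, $text] = split_snippet_date("Jan 12, 2017 ... An example result");
echo date("Y-m-d", $date) . " | " . $text . "\n";
```

Returning the date and the remaining text together means the date always starts out as null for each result, which avoids carrying a value over between loop iterations.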


@@ -1,4 +1,6 @@
<?php
// greppr dev probably monitors 4get code, lol
// hello greppr dude, add an API you moron
class greppr{
@@ -16,20 +18,30 @@ class greppr{
return [];
}
private function get($proxy, $url, $get = [], $cookie = false){
private function get($proxy, $url, $get = [], $cookies = [], $post = false){
$curlproc = curl_init();
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
$cookie = [];
foreach($cookies as $k => $v){
$cookie[] = "{$k}={$v}";
}
$cookie = implode("; ", $cookie);
if($post === false){
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($cookie === false){
if($cookie == ""){
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
@@ -48,17 +60,48 @@ class greppr{
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Cookie: PHPSESSID=" . $cookie,
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://greppr.org/search",
"Cookie: {$cookie}",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i"]
);
}
}else{
$get = http_build_query($get);
curl_setopt($curlproc, CURLOPT_POST, true);
curl_setopt($curlproc, CURLOPT_POSTFIELDS, $get);
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"Content-Type: application/x-www-form-urlencoded",
"Content-Length: " . strlen($get),
"Origin: https://greppr.org",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://greppr.org/",
"Cookie: {$cookie}",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: same-origin",
"Sec-Fetch-User: ?1",
"Priority: u=0, i"]
);
}
@@ -86,7 +129,7 @@ class greppr{
return $len;
}
$headers[strtolower(trim($header[0]))] = trim($header[1]);
$headers[strtolower(trim($header[0]))][] = trim($header[1]);
return $len;
}
@@ -113,7 +156,24 @@ class greppr{
[$q, $proxy] = $this->backend->get($get["npt"], "web");
$q = json_decode($q, true);
$tokens = json_decode($q, true);
//
// Get paginated page
//
try{
$html = $this->get(
$proxy,
"https://greppr.org" . $tokens["get"],
[],
$tokens["cookies"],
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
}else{
@@ -124,88 +184,121 @@ class greppr{
}
$proxy = $this->backend->get_ip();
}
//
// get token
// token[0] = static token that changes once a day
// token[1] = dynamic token that changes on every request
// token[1] = PHPSESSID cookie
$tokens = apcu_fetch("greppr_token");
if(
$tokens === false ||
$first_attempt === false // force token fetch
){
// we haven't gotten the token yet, get it
//
try{
$response =
$html =
$this->get(
$proxy,
"https://greppr.org",
[]
[],
[],
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch search tokens");
throw new Exception("Failed to fetch homepage");
}
$tokens = $this->parse_token($response);
//
// Parse token
//
$this->fuckhtml->load($html["data"]);
if($tokens === false){
$tokens = [
"req" => null,
"data" => null,
"cookies" => null
];
throw new Exception("Failed to grep search tokens");
$inputs =
$this->fuckhtml
->getElementsByTagName(
"input"
);
foreach($inputs as $input){
if(!isset($input["attributes"]["name"])){
continue;
}
if(
isset($input["attributes"]["value"]) &&
!empty($input["attributes"]["value"])
){
$tokens
["data"]
[$this->fuckhtml
->getTextContent(
$input["attributes"]["name"]
)] =
$this->fuckhtml
->getTextContent(
$input["attributes"]["value"]
);
}else{
$tokens["req"] =
$this->fuckhtml
->getTextContent(
$input["attributes"]["name"]
);
}
}
if($tokens["req"] === null){
throw new Exception("Failed to get request ID");
}
if(isset($html["headers"]["set-cookie"])){
foreach($html["headers"]["set-cookie"] as $cookie){
if(
preg_match(
'/([^=]+)=([^;]+)/',
$cookie,
$matches
)
){
$tokens["cookies"][$matches[1]] = $matches[2];
}
}
}
//
// Get initial search page
//
$tokens_req = $tokens["data"];
$tokens_req[$tokens["req"]] = $search;
try{
if($get["npt"]){
$params = [
$tokens[0] => $q["q"],
"s" => $q["s"],
"l" => 30,
"n" => $tokens[1]
];
}else{
$params = [
$tokens[0] => $search,
"n" => $tokens[1]
];
}
$searchresults = $this->get(
$html = $this->get(
$proxy,
"https://greppr.org/search",
$params,
$tokens[2]
$tokens_req,
$tokens["cookies"],
true
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
if(strlen($searchresults["data"]) === 0){
// redirected to main page, which means we got old token
// generate a new one
// ... unless we just tried to do that
if($first_attempt === false){
throw new Exception("Failed to get a new search token");
}
return $this->web($get, false);
}
//$html = file_get_contents("scraper/greppr.html");
//$this->fuckhtml->load($html);
$this->fuckhtml->load($html["data"]);
// refresh the token with new data (this also triggers fuckhtml load)
$this->parse_token($searchresults, $tokens[2]);
// response object
$out = [
"status" => "ok",
"spelling" => [
@@ -254,24 +347,16 @@ class greppr{
if($break === true){
parse_str(
$out["npt"] =
$this->backend->store(
json_encode([
"get" =>
$this->fuckhtml
->getTextContent(
$a["attributes"]["href"]
),
$values
);
$values = array_values($values);
$out["npt"] =
$this->backend->store(
json_encode(
[
"q" => $values[0],
"s" => $values[1]
]
),
"cookies" => $tokens["cookies"]
]),
"web",
$proxy
);
@@ -360,74 +445,6 @@ class greppr{
return $out;
}
private function parse_token($response, $cookie = false){
$this->fuckhtml->load($response["data"]);
$scripts =
$this->fuckhtml
->getElementsByTagName("script");
$found = false;
foreach($scripts as $script){
preg_match(
'/window\.location ?= ?\'\/search\?([^=]+).*&n=([0-9]+)/',
$script["innerHTML"],
$tokens
);
if(isset($tokens[1])){
$found = true;
break;
}
}
if($found === false){
return false;
}
$tokens = [
$tokens[1],
$tokens[2]
];
if($cookie !== false){
// we already specified a cookie, so use the one we have already
$tokens[] = $cookie;
apcu_store("greppr_token", $tokens);
return $tokens;
}
if(!isset($response["headers"]["set-cookie"])){
// server didn't send a cookie
return false;
}
// get cookie
preg_match(
'/PHPSESSID=([^;]+)/',
$response["headers"]["set-cookie"],
$cookie
);
if(!isset($cookie[1])){
// server sent an unexpected cookie
return false;
}
$tokens[] = $cookie[1];
apcu_store("greppr_token", $tokens);
return $tokens;
}
private function limitstrlen($text){
return explode("\n", wordwrap($text, 300, "\n"))[0];
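
The greppr change above stops tracking a single PHPSESSID value and instead collects every Set-Cookie header into a map that is later joined into one Cookie header. A rough, standalone sketch of that round trip follows. The function names parse_set_cookies() and build_cookie_header() and the sample header values are made up for this example.

```php
<?php
// Hypothetical helper: turns raw Set-Cookie header values into a
// name => value map, keeping only the part before the first ";".
function parse_set_cookies(array $set_cookie_headers): array {
    $cookies = [];
    foreach($set_cookie_headers as $header){
        if(preg_match('/([^=]+)=([^;]+)/', $header, $matches)){
            $cookies[trim($matches[1])] = $matches[2];
        }
    }
    return $cookies;
}

// Hypothetical helper: builds the value of a "Cookie:" request
// header from that map.
function build_cookie_header(array $cookies): string {
    $pairs = [];
    foreach($cookies as $name => $value){
        $pairs[] = "{$name}={$value}";
    }
    return implode("; ", $pairs);
}

// Sample header values, invented for the example
$jar = parse_set_cookies([
    "PHPSESSID=abc123; path=/; HttpOnly",
    "theme=dark; Max-Age=3600"
]);

echo build_cookie_header($jar) . "\n";
```

Storing the map rather than the finished header string keeps it easy to serialize into the next-page token, which is what the diff does with $tokens["cookies"].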


@@ -182,6 +182,23 @@ class imgur{
throw new Exception("Failed to fetch HTML");
}
$json = json_decode($html, true);
if($json){
// {"data":{"error":"Imgur is temporarily over capacity. Please try again later."},"success":false,"status":403}
if(isset($json["data"]["error"])){
// stripos() returns 0 for a match at the start of the string, so compare against false
if(stripos($json["data"]["error"], "capacity") !== false){
throw new Exception("Imgur IP blocked this 4get instance or request proxy. Try again");
}
}
throw new Exception("Imgur returned an unknown error (IP ban?)");
}
$this->fuckhtml->load($html);
$posts =
@@ -197,7 +214,14 @@ class imgur{
$image =
$this->fuckhtml
->getElementsByTagName("img")[0];
->getElementsByTagName("img");
if(count($image) === 0){
continue;
}
$image = $image[0];
$image_url = "https:" . substr($this->fuckhtml->getTextContent($image["attributes"]["src"]), 0, -5);


@@ -3,7 +3,10 @@
class marginalia{
public function __construct(){
include "lib/fuckhtml.php";
include "lib/anubis.php";
$this->anubis = new anubis();
include_once "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
include "lib/backend.php";
@@ -102,7 +105,40 @@ class marginalia{
);
}
private function get($proxy, $url, $get = []){
private function get($proxy, $url, $get = [], $get_cookies = 1){
$curlproc = curl_init();
switch($get_cookies){
case 0:
$cookies = "";
$cookies_tmp = [];
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
$length = strlen($header);
$header = explode(":", $header, 2);
if(trim(strtolower($header[0])) == "set-cookie"){
$cookie_tmp = explode("=", trim($header[1]), 2);
$cookies_tmp[trim($cookie_tmp[0])] =
explode(";", $cookie_tmp[1], 2)[0];
}
return $length;
});
break;
case 1:
$cookies = "";
break;
default:
$cookies = "Cookie: " . $get_cookies;
}
$headers = [
"User-Agent: " . config::USER_AGENT,
@@ -110,6 +146,7 @@ class marginalia{
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
$cookies,
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
@@ -118,8 +155,6 @@ class marginalia{
"Sec-Fetch-User: ?1"
];
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
@@ -145,7 +180,19 @@ class marginalia{
throw new Exception(curl_error($curlproc));
}
if($get_cookies === 0){
$cookie = [];
foreach($cookies_tmp as $key => $value){
$cookie[] = $key . "=" . $value;
}
curl_close($curlproc);
return implode(";", $cookie);
}
return $data;
}
@@ -267,6 +314,60 @@ class marginalia{
// HTML parser
$proxy = $this->backend->get_ip();
//
// Bypass anubis check
//
/*
if(($anubis_key = apcu_fetch("marginalia_cookie")) === false){
try{
$html =
$this->get(
$proxy,
"https://old-search.marginalia.nu/search",
[
"query" => $search
]
);
}catch(Exception $error){
throw new Exception("Failed to get anubis challenge");
}
try{
$anubis_data = $this->anubis->scrape($html);
}catch(Exception $error){
throw new Exception($error);
}
// send anubis response & get cookies
// https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge?response=0000018966b086834f738bacba6031028adb5aa875974ead197a8b75778baf3a&nonce=39947&redir=https%3A%2F%2Fold-search.marginalia.nu%2F&elapsedTime=1164
try{
$anubis_key =
$this->get(
$proxy,
"https://old-search.marginalia.nu/.within.website/x/cmd/anubis/api/pass-challenge",
[
"response" => $anubis_data["response"],
"nonce" => $anubis_data["nonce"],
"redir" => "https://old-search.marginalia.nu/",
"elapsedTime" => random_int(1000, 2000)
],
0
);
}catch(Exception $error){
throw new Exception("Failed to submit anubis challenge");
}
apcu_store("marginalia_cookie", $anubis_key);
}*/
if($get["npt"]){
[$params, $proxy] =
@@ -279,7 +380,9 @@ class marginalia{
$html =
$this->get(
$proxy,
"https://old-search.marginalia.nu/search?" . $params
"https://old-search.marginalia.nu/search?" . $params,
[],
//$anubis_key
);
}catch(Exception $error){
@@ -309,7 +412,8 @@ class marginalia{
$this->get(
$proxy,
"https://old-search.marginalia.nu/search",
$params
$params,
//$anubis_key
);
}catch(Exception $error){


@@ -457,7 +457,7 @@ class mojeek{
"tn" => 7, // number of news results/page
"date" => 1, // show date
"tlen" => 128, // max length of title
"dlen" => 511, // max length of description
//"dlen" => 511, // max length of description
"arc" => ($country == "any" ? "none" : $country) // location. don't use autodetect!
];
@@ -501,11 +501,6 @@ class mojeek{
throw new Exception("Failed to get HTML");
}
/*
$handle = fopen("scraper/mojeek.html", "r");
$html = fread($handle, filesize("scraper/mojeek.html"));
fclose($handle);*/
}
$out = [
@@ -526,6 +521,8 @@ class mojeek{
$this->fuckhtml->load($html);
$this->detect_block();
$results =
$this->fuckhtml
->getElementsByClassName("results-standard", "ul");
@@ -695,17 +692,18 @@ class mojeek{
preg_match(
'/\/image\?img=([^&]+)/i',
$thumb[0]["attributes"]["src"],
$thumb
$matches
);
if(count($thumb) === 2){
if(count($matches) === 2){
// for some reason, if we dont get the image from mojeek
// it sometimes fail to fetch the right image URL
$answer["thumb"] =
urldecode(
"https://mojeek.com" .
$this->fuckhtml
->getTextContent(
$thumb[1]
)
$thumb[0]["attributes"]["src"]
);
}
}
@@ -1034,6 +1032,8 @@ class mojeek{
$this->fuckhtml->load($html);
$this->detect_block();
$articles =
$this->fuckhtml->getElementsByTagName("article");
@@ -1166,6 +1166,26 @@ class mojeek{
return $out;
}
private function detect_block(){
$title =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($title) !== 0 &&
$this->fuckhtml
->getTextContent(
$title[0]["innerHTML"]
) == "403 - Forbidden"
){
throw new Exception("Mojeek blocked this instance or request proxy.");
}
}
private function titledots($title){
return trim($title, ". \t\n\r\0\x0B");

342
scraper/mullvad.php Normal file

@@ -0,0 +1,342 @@
<?php
class mullvad{
public function __construct($engine){
$this->engine = $engine;
include "lib/backend.php";
$this->backend = new backend("mullvad_{$this->engine}");
}
public function getfilters($page){
return [
"country" => [ // &country=
"display" => "Country",
"option" => [
"any" => "Any country",
"ar" => "Argentina",
"au" => "Australia",
"at" => "Austria",
"be" => "Belgium",
"br" => "Brazil",
"ca" => "Canada",
"cl" => "Chile",
"cn" => "China",
"dk" => "Denmark",
"fi" => "Finland",
"fr" => "France",
"de" => "Germany",
"hk" => "Hong Kong",
"in" => "India",
"id" => "Indonesia",
"it" => "Italy",
"jp" => "Japan",
"kr" => "Korea, Republic",
"my" => "Malaysia",
"mx" => "Mexico",
"nl" => "Netherlands",
"nz" => "New Zealand",
"no" => "Norway",
"ph" => "Philippines",
"pl" => "Poland",
"pt" => "Portugal",
"ru" => "Russian Federation",
"sa" => "Saudi Arabia",
"za" => "South Africa",
"es" => "Spain",
"se" => "Sweden",
"ch" => "Switzerland",
"tw" => "Taiwan",
"tr" => "Turkey",
"uk" => "United Kingdom",
"us" => "United States"
]
],
"language" => [ // &language=
"display" => "Language",
"option" => [
"any" => "Any language",
"ar" => "Arabic",
"bg" => "Bulgarian",
"ca" => "Catalan",
"zh-hans" => "Chinese (Simplified)",
"zh-hant" => "Chinese (Traditional)",
"hr" => "Croatian",
"cs" => "Czech",
"da" => "Danish",
"nl" => "Dutch",
"en" => "English",
"et" => "Estonian",
"fi" => "Finnish",
"fr" => "French",
"de" => "German",
"he" => "Hebrew",
"hu" => "Hungarian",
"is" => "Icelandic",
"it" => "Italian",
"jp" => "Japanese",
"ko" => "Korean",
"lv" => "Latvian",
"lt" => "Lithuanian",
"nb" => "Norwegian",
"pl" => "Polish",
"pt" => "Portuguese",
"ro" => "Romanian",
"ru" => "Russian",
"sr" => "Serbian",
"sk" => "Slovak",
"sl" => "Slovenian",
"es" => "Spanish",
"sv" => "Swedish",
"tr" => "Turkish"
]
],
"time" => [ // &lastUpdated=
"display" => "Time posted",
"option" => [
"any" => "Any time",
"d" => "Past day",
"w" => "Past week",
"m" => "Past month",
"y" => "Past year"
]
]
];
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
// http2 bypass
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"Referer: https://leta.mullvad.net/search",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Cookie: engine=brave",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=0",
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function web($get){
if($get["npt"]){
[$params, $proxy] = $this->backend->get($get["npt"], "web");
$params = json_decode($params, true);
}else{
if(strlen($get["s"]) === 0){
throw new Exception("Search term is empty!");
}
// generate filters
$params = [
"q" => $get["s"],
"engine" => $this->engine,
"page" => 1
];
if($get["country"] != "any"){
$params["country"] = $get["country"];
}
if($get["language"] != "any"){
$params["language"] = $get["language"];
}
if($get["time"] != "any"){
$params["lastUpdated"] = $get["time"];
}
$proxy = $this->backend->get_ip();
}
try{
$json = $this->get(
$proxy,
"https://leta.mullvad.net/search/__data.json",
$params
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to decode JSON");
}
if(!isset($json["nodes"])){
throw new Exception("Mullvad did not return a nodes object");
}
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null, // filled in below once a next page pointer is found
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
// parse json payload
foreach($json["nodes"] as $node){
if(!isset($node["data"][0]["q"])){
// not iterating through the query object
continue;
}
// node 0 contains pointers to what we need to iterate through
$node0 = &$node["data"][0];
if(!isset($node["data"][$node0["success"]])){
throw new Exception("Mullvad did not return a success object");
}
$success = &$node["data"][$node0["success"]];
if($success === false){
throw new Exception("Mullvad flagged the response as unsuccessful");
}
if(!isset($node["data"][$node0["items"]])){
throw new Exception("Mullvad did not return an items object");
}
$search_pointers = &$node["data"][$node0["items"]];
//
// Iterate over results
//
foreach($search_pointers as $pointer){
$pointer = &$node["data"][$pointer];
$link = &$node["data"][$pointer["link"]];
$title = &$node["data"][$pointer["title"]];
$description = &$node["data"][$pointer["snippet"]];
$date = null;
if($this->engine == "google"){
// attempt to extract date
// Jan 12, 2017
$date_parts = explode(" ... ", $description, 2);
if(
count($date_parts) === 2 &&
strlen($date_parts[0]) < 15
){
$date = strtotime(trim($date_parts[0]));
if($date === false){
$date = null;
}else{
$description = trim($date_parts[1]);
}
}
}
$out["web"][] = [
"title" => $this->titledots($title),
"description" => $this->titledots($description),
"url" => $link,
"date" => $date,
"type" => "web",
"thumb" => [
"url" => null,
"ratio" => null
],
"sublink" => [],
"table" => []
];
}
//
// Get nextpage
//
if(isset($node["data"][$node0["next"]])){
$params["page"] = (int)$node["data"][$node0["next"]];
$out["npt"] =
$this->backend->store(
json_encode($params),
"web",
$proxy
);
}
}
return $out;
}
private function titledots($title){
return trim($title, " .\t\n\r\0\x0B");
}
}
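
The parser above treats each node's "data" as a flat array in which entry 0 maps field names to indexes of other entries. The sketch below walks a small hand-written payload of that shape. The payload values and the function name resolve_results() are assumptions for illustration only; the real response may carry more fields.

```php
<?php
// Hypothetical, hand-written payload in the flattened shape the
// scraper expects: $data[0] maps field names to indexes into $data.
$data = [
    ["q" => 1, "success" => 2, "items" => 3, "next" => 8],
    "example query",                              // 1: q
    true,                                         // 2: success
    [4],                                          // 3: items (list of pointers)
    ["link" => 5, "title" => 6, "snippet" => 7],  // 4: first result
    "https://example.com/",                       // 5
    "Example title",                              // 6
    "Example snippet",                            // 7
    2                                             // 8: next page number
];

// Hypothetical helper: follows the pointers and returns plain arrays.
function resolve_results(array $data): array {
    $root = $data[0];
    $results = [];

    foreach($data[$root["items"]] as $pointer){
        $item = $data[$pointer];
        $results[] = [
            "url" => $data[$item["link"]],
            "title" => $data[$item["title"]],
            "description" => $data[$item["snippet"]]
        ];
    }

    return [
        "results" => $results,
        "next_page" =>
            isset($data[$root["next"]]) ?
            (int)$data[$root["next"]] :
            null
    ];
}

print_r(resolve_results($data));
```

Every value is reached through two lookups (field name to index, index to value), which is why the scraper checks isset() on each pointer before dereferencing it.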

20
scraper/mullvad_brave.php Normal file

@@ -0,0 +1,20 @@
<?php
class mullvad_brave{
public function __construct(){
include "scraper/mullvad.php";
$this->mullvad = new mullvad("brave");
}
public function getfilters($page){
return $this->mullvad->getfilters($page);
}
public function web($get){
return $this->mullvad->web($get);
}
}


@@ -0,0 +1,20 @@
<?php
class mullvad_google{
public function __construct(){
include "scraper/mullvad.php";
$this->mullvad = new mullvad("google");
}
public function getfilters($page){
return $this->mullvad->getfilters($page);
}
public function web($get){
return $this->mullvad->web($get);
}
}


@@ -410,10 +410,7 @@ class qwant{
"thumb" =>
$answer["data"]["result"]["thumbnail"]["landscape"] == null ?
null :
$this->unshitimage(
$answer["data"]["result"]["thumbnail"]["landscape"],
false
),
$this->unshitimage($answer["data"]["result"]["thumbnail"]["landscape"]),
"table" => [],
"sublink" => []
];
@@ -770,7 +767,7 @@ class qwant{
}else{
$thumb = [
"url" => $this->unshitimage($video["thumbnail"], false),
"url" => $this->unshitimage($video["thumbnail"]),
"ratio" => "16:9"
];
}
@@ -870,7 +867,7 @@ class qwant{
}else{
$thumb = [
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"], false),
"url" => $this->unshitimage($news["media"][0]["pict_big"]["url"]),
"ratio" => "16:9"
];
}
@@ -920,18 +917,77 @@ class qwant{
return trim($text, ". ");
}
private function unshitimage($url, $is_bing = true){
private function unshitimage($url){
// https://s1.qwant.com/thumbr/0x0/8/d/f6de4deb2c2b12f55d8bdcaae576f9f62fd58a05ec0feeac117b354d1bf5c2/th.jpg?u=https%3A%2F%2Fwww.bing.com%2Fth%3Fid%3DOIP.vvDWsagzxjoKKP_rOqhwrQAAAA%26w%3D160%26h%3D160%26c%3D7%26pid%3D5.1&q=0&b=1&p=0&a=0
parse_str(parse_url($url)["query"], $parts);
// https://s2.qwant.com/thumbr/474x289/7/f/412d13b3fe3a03eb2b89633c8e88b609b7d0b93cdd9a5e52db3c663e41e65e/th.jpg?u=https%3A%2F%2Ftse.mm.bing.net%2Fth%3Fid%3DOIP.9Tm_Eo6m7V7ltN19mxduDgHaEh%26pid%3DApi&q=0&b=1&p=0&a=0
if($is_bing){
$parse = parse_url($parts["u"]);
parse_str($parse["query"], $parts);
$image = parse_url($url);
return "https://" . $parse["host"] . "/th?id=" . urlencode($parts["id"]);
if(
!isset($image["host"]) ||
!isset($image["query"])
){
// cant do anything
return $url;
}
return $parts["u"];
$id = null;
if(
preg_match(
'/s[0-9]+\.qwant\.com$/',
$image["host"]
)
){
parse_str($image["query"], $str);
// we're being served a proxy URL
if(isset($str["u"])){
$bing_url = $str["u"];
}else{
// give up
return $url;
}
}
// parse bing URL
$id = null;
$image = parse_url($bing_url);
if(isset($image["query"])){
parse_str($image["query"], $str);
if(isset($str["id"])){
$id = $str["id"];
}
}
if($id === null){
$id = explode("/th/id/", $image["path"], 2);
if(count($id) !== 2){
// malformed
return $url;
}
$id = $id[1];
}
if(is_array($id)){
// fuck off, let proxy.php deal with it
return $url;
}
return "https://" . $image["host"] . "/th?id=" . rawurlencode($id);
}
}
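
The rewritten unshitimage() unwraps a Qwant proxy URL to recover the Bing thumbnail it points at. A simplified sketch of the query-string case is shown below, using the second sample URL from the comments above. The function name bing_thumb_from_qwant() is invented for the example, and the "/th/id/" path form handled by the real code is left out.

```php
<?php
// Hypothetical helper: pulls the Bing thumbnail ID out of a Qwant
// "thumbr" proxy URL; returns the input unchanged when that fails.
function bing_thumb_from_qwant(string $url): string {
    $proxy = parse_url($url);

    if(
        !isset($proxy["host"], $proxy["query"]) ||
        !preg_match('/^s[0-9]+\.qwant\.com$/', $proxy["host"])
    ){
        // not a Qwant proxy URL
        return $url;
    }

    // the original image URL travels in the "u" parameter
    parse_str($proxy["query"], $query);
    if(!isset($query["u"]) || !is_string($query["u"])){
        return $url;
    }

    $bing = parse_url($query["u"]);
    if(!isset($bing["host"], $bing["query"])){
        return $url;
    }

    parse_str($bing["query"], $bing_query);
    if(!isset($bing_query["id"]) || !is_string($bing_query["id"])){
        return $url;
    }

    return "https://" . $bing["host"] . "/th?id=" . rawurlencode($bing_query["id"]);
}

$sample =
    "https://s2.qwant.com/thumbr/474x289/7/f/412d13b3fe3a03eb2b89633c8e88b609b7d0b93cdd9a5e52db3c663e41e65e/th.jpg" .
    "?u=https%3A%2F%2Ftse.mm.bing.net%2Fth%3Fid%3DOIP.9Tm_Eo6m7V7ltN19mxduDgHaEh%26pid%3DApi&q=0&b=1&p=0&a=0";

echo bing_thumb_from_qwant($sample) . "\n";
```

Falling back to the original URL on every failed check mirrors the approach in the diff: a URL that cannot be unwrapped is passed through rather than treated as an error.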

541
scraper/sepiasearch.php Normal file

@@ -0,0 +1,541 @@
<?php
class sepiasearch{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("sepiasearch");
}
public function getfilters($page){
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes", // &sensitiveContent=both
"no" => "No" // &sensitiveContent=false
]
],
"language" => [
"display" => "Language", // &language=
"option" => [
"any" => "Any language",
"en" => "English",
"fr" => "Français",
"ar" => "العربية",
"ca" => "Català",
"cs" => "Čeština",
"de" => "Deutsch",
"el" => "ελληνικά",
"eo" => "Esperanto",
"es" => "Español",
"eu" => "Euskara",
"fa" => "فارسی",
"fi" => "Suomi",
"gd" => "Gàidhlig",
"gl" => "Galego",
"hr" => "Hrvatski",
"hu" => "Magyar",
"is" => "Íslenska",
"it" => "Italiano",
"ja" => "日本語",
"kab" => "Taqbaylit",
"nl" => "Nederlands",
"no" => "Norsk",
"oc" => "Occitan",
"pl" => "Polski",
"pt" => "Português (Brasil)",
"pt-PT" => "Português (Portugal)",
"ru" => "Pусский",
"sk" => "Slovenčina",
"sq" => "Shqip",
"sv" => "Svenska",
"th" => "ไทย",
"tok" => "Toki Pona",
"tr" => "Türkçe",
"uk" => "украї́нська мо́ва",
"vi" => "Tiếng Việt",
"zh-Hans" => "简体中文(中国)",
"zh-Hant" => "繁體中文(台灣)"
]
],
"type" => [
"display" => "Result type", // i handle this
"option" => [
"videos" => "Videos",
"playlists" => "Playlists",
"channels" => "Channels"
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"best" => "Best match", // no filter
"-publishedAt" => "Newest", // sort=-publishedAt
"publishedAt" => "Oldest" // sort=publishedAt
]
],
"newer" => [ // &startDate=2025-07-26T04:00:00.000Z
"display" => "Newer than",
"option" => "_DATE"
],
"duration" => [
"display" => "Duration",
"option" => [
"any" => "Any duration",
"short" => "Short (0-4 mins)", // &durationRange=short
"medium" => "Medium (4-10 mins)",
"long" => "Long (10+ mins)",
]
],
"category" => [
"display" => "Category", // &categoryOneOf[]=
"option" => [
"any" => "Any category",
"1" => "Music",
"2" => "Films",
"3" => "Vehicles",
"4" => "Art",
"5" => "Sports",
"6" => "Travels",
"7" => "Gaming",
"8" => "People",
"9" => "Comedy",
"10" => "Entertainment",
"11" => "News & Politics",
"12" => "How To",
"13" => "Education",
"14" => "Activism",
"15" => "Science & Technology",
"16" => "Animals",
"17" => "Kids",
"18" => "Food"
]
],
"display" => [
"display" => "Display",
"option" => [
"any" => "Everything",
"true" => "Live videos", // &isLive=true
"false" => "VODs" // &isLive=false
]
],
"license" => [
"display" => "License", // &license=
"option" => [
"any" => "Any license",
"1" => "Attribution",
"2" => "Attribution - Share Alike",
"3" => "Attribution - No Derivatives",
"4" => "Attribution - Non Commercial",
"5" => "Attribution - Non Commercial - Share Alike",
"6" => "Attribution - Non Commercial - No Derivatives",
"7" => "Public Domain Dedication"
]
]
];
}
private function get($proxy, $url, $get = []){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/json, text/plain, */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Referer: https://sepiasearch.org/search",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=0",
"TE: trailers"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function video($get){
if($get["npt"]){
[$npt, $proxy] =
$this->backend
->get(
$get["npt"],
"videos"
);
$npt = json_decode($npt, true);
$type = $npt["type"];
$npt = $npt["npt"];
}else{
$proxy = $this->backend->get_ip();
$npt = [
"search" => $get["s"],
"start" => 0,
"count" => 20
];
if($get["type"] == "videos"){
//
// Parse video filters
//
switch($get["nsfw"]){
case "yes": $npt["nsfw"] = "both"; break;
case "no": $npt["nsfw"] = "false"; break;
}
$npt["boostLanguages[]"] = "en";
if($get["language"] != "any"){
$npt["languageOneOf[]"] = $get["language"];
}
if($get["sort"] != "best"){
$npt["sort"] = $get["sort"];
}
if($get["newer"] !== false){
$date = new DateTime("@{$get["newer"]}");
$date->setTimezone(new DateTimeZone("UTC"));
$formatted = $date->format("Y-m-d\TH:i:s.000\Z");
$npt["startDate"] = $formatted;
}
switch($get["duration"]){
case "short":
$npt["durationMax"] = 240;
break;
case "medium":
$npt["durationMin"] = 240;
$npt["durationMax"] = 600;
break;
case "long":
$npt["durationMin"] = 600;
break;
}
if($get["category"] != "any"){
$npt["categoryOneOf[]"] = $get["category"];
}
if($get["display"] != "any"){
$npt["isLive"] = $get["display"];
}
if($get["license"] != "any"){
// typo in license, lol
$npt["licenceOneOf[]"] = $get["license"];
}
}
$type = $get["type"];
}
switch($type){
case "videos":
$url = "https://sepiasearch.org/api/v1/search/videos";
break;
case "channels":
$url = "https://sepiasearch.org/api/v1/search/video-channels";
break;
case "playlists":
$url = "https://sepiasearch.org/api/v1/search/video-playlists";
break;
}
//$json = file_get_contents("scraper/sepia.json");
try{
$json =
$this->get(
$proxy,
$url,
$npt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to parse JSON");
}
if(isset($json["errors"])){
$msg = [];
foreach($json["errors"] as $error){
if(isset($error["msg"])){
$msg[] = $error["msg"];
}
}
throw new Exception("Sepia Search returned error(s): " . implode(", ", $msg));
}
if(!isset($json["data"])){
throw new Exception("Sepia Search did not return a data object");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
switch($type){
case "videos":
foreach($json["data"] as $video){
if(count($video["account"]["avatars"]) !== 0){
$avatar =
$video["account"]["avatars"][count($video["account"]["avatars"]) - 1]["url"];
}else{
$avatar = null;
}
if($video["thumbnailUrl"] === null){
$thumb = [
"ratio" => null,
"url" => null
];
}else{
$thumb = [
"ratio" => "16:9",
"url" => $video["thumbnailUrl"]
];
}
if($video["isLive"]){
$append = "livestream";
}else{
$append = "video";
}
$out[$append][] = [
"title" => $video["name"],
"description" =>
$this->limitstrlen(
$this->titledots(
$video["description"]
)
),
"author" => [
"name" => $video["account"]["displayName"] . " ({$video["account"]["name"]})",
"url" => $video["account"]["url"],
"avatar" => $avatar
],
"date" => strtotime($video["publishedAt"]),
"duration" => $video["isLive"] ? "_LIVE" : $video["duration"],
"views" => $video["views"],
"thumb" => $thumb,
"url" => $video["url"]
];
}
break;
case "playlists":
foreach($json["data"] as $playlist){
if(count($playlist["ownerAccount"]["avatars"]) !== 0){
$avatar =
$playlist["ownerAccount"]["avatars"][count($playlist["ownerAccount"]["avatars"]) - 1]["url"];
}else{
$avatar = null;
}
if($playlist["thumbnailUrl"] === null){
$thumb = [
"ratio" => null,
"url" => null
];
}else{
$thumb = [
"ratio" => "16:9",
"url" => $playlist["thumbnailUrl"]
];
}
$out["playlist"][] = [
"title" => $playlist["displayName"],
"description" =>
$this->limitstrlen(
$this->titledots(
$playlist["description"]
)
),
"author" => [
"name" => $playlist["ownerAccount"]["displayName"] . " ({$playlist["ownerAccount"]["name"]})",
"url" => $playlist["ownerAccount"]["url"],
"avatar" => $avatar
],
"date" => strtotime($playlist["createdAt"]),
"duration" => $playlist["videosLength"],
"views" => null,
"thumb" => $thumb,
"url" => $playlist["url"]
];
}
break;
case "channels":
foreach($json["data"] as $channel){
if(count($channel["avatars"]) !== 0){
$thumb = [
"ratio" => "1:1",
"url" => $channel["avatars"][count($channel["avatars"]) - 1]["url"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $channel["displayName"] . " ({$channel["name"]})",
"followers" => $channel["followersCount"],
"description" =>
$channel["videosCount"] . " videos. " .
$this->limitstrlen(
$this->titledots(
$channel["description"]
)
),
"thumb" => $thumb,
"url" => $channel["url"]
];
}
break;
}
// get next page
if($json["total"] - 20 > $npt["start"]){
$npt["start"] += 20;
$npt = [
"type" => $type,
"npt" => $npt
];
$out["npt"] =
$this->backend
->store(
json_encode($npt),
"videos",
$proxy
);
}
return $out;
}
private function titledots($title){
$substr = substr($title, -3);
if(
$substr == "..." ||
$substr == "\u{2026}" // UTF-8 horizontal ellipsis (3 bytes)
){
return trim(substr($title, 0, -3), " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
return trim($title, " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
private function limitstrlen($text){
return
explode(
"\n",
wordwrap(
str_replace(
["\n\r", "\r\n", "\n", "\r"],
" ",
$text
),
300,
"\n"
),
2
)[0];
}
}
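
The `limitstrlen` helper at the end of this class can be tried in isolation. The sketch below is a standalone copy under the hypothetical name `limitstrlen_demo` (not part of the commit); the `$width` parameter is an assumption added for illustration, and the intent is assumed to be "flatten line breaks, then keep the first wrapped line".

```php
<?php
// Hypothetical standalone copy of the limitstrlen() helper, for illustration only.
function limitstrlen_demo(string $text, int $width = 300): string {
	// Flatten every kind of line break into a single space.
	$flat = str_replace(["\n\r", "\r\n", "\n", "\r"], " ", $text);
	// Wrap at word boundaries, then keep only the first wrapped line.
	return explode("\n", wordwrap($flat, $width, "\n"), 2)[0];
}

$short = limitstrlen_demo("first line\r\nsecond line");
$long = limitstrlen_demo(str_repeat("word ", 100));

echo $short . "\n";
echo strlen($long) . "\n";
```

Because `wordwrap` is called without its fourth argument, a single word longer than the width is not cut, so the result can exceed the width for text without spaces.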

@@ -1226,7 +1226,12 @@ class startpage{
// get results
foreach($json["render"]["presenter"]["regions"]["mainline"] as $category){
if($category["display_type"] == "video-youtube"){
if(
preg_match(
'/^video-/i',
$category["display_type"]
)
){
foreach($category["results"] as $video){
@@ -1248,7 +1253,7 @@ class startpage{
}
$out["video"][] = [
"title" => $video["title"],
"title" => str_replace(["", ""], "", $video["title"]),
"description" => $this->limitstrlen($video["description"]),
"author" => [
"name" => $video["channelTitle"],
@@ -1256,7 +1261,7 @@ class startpage{
"avatar" => null
],
"date" => strtotime($video["publishDate"]),
"duration" => $this->hms2int($video["duration"]),
"duration" => $this->hms2int($category["display_type"] == "video-youtube" ? $video["duration"] : $video["duration"] / 1000),
"views" => (int)$video["viewCount"],
"thumb" => $thumb,
"url" => $video["clickUrl"]

scraper/vimeo.php Normal file

@@ -0,0 +1,754 @@
<?php
class vimeo{
public function __construct(){
include "lib/backend.php";
$this->backend = new backend("vimeo");
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
}
public function getfilters($page){
return [
"time" => [
"display" => "Date uploaded", // &filter_uploaded=
"option" => [
"any" => "Any time",
"today" => "Last 24 hours",
"this-week" => "Last 7 days",
"this-month" => "Last 30 days",
"this-year" => "Last 365 days",
]
],
"display" => [
"display" => "Display",
"option" => [
"video" => "Videos",
"ondemand" => "On-Demand ($$)",
"people" => "People",
"channel" => "Channels",
"group" => "Groups"
]
],
"sort" => [
"display" => "Sort by",
"option" => [
"relevance" => "Relevance", // no param
"recent" => "Newest", // &sort=latest&direction=desc
"popular" => "Most popular", // &sort=popularity&direction=desc
"a_z" => "Title, A to Z", // &sort=alphabetical&direction=asc
"z_a" => "Title, Z to A", // &sort=alphabetical&direction=desc
"longest" => "Longest", // &sort=duration&direction=desc
"shortest" => "Shortest", // &sort=duration&direction=asc
]
],
"duration" => [
"display" => "Duration", // &filter_duration=
"option" => [
"any" => "Any duration",
"short" => "Short (less than 4 minutes)",
"medium" => "Medium (4-10 minutes)",
"long" => "Long (over 10 minutes)"
]
],
"resolution" => [
"display" => "Resolution",
"option" => [
"any" => "Any resolution",
"4k" => "4K" // &filter_resolution=4k
]
],
"category" => [
"display" => "Category", // &filter_category=
"option" => [
"any" => "Any category",
"animation" => "Animation",
"comedy" => "Comedy",
"music" => "Music",
"experimental" => "Experimental",
"documentary" => "Documentary",
"identsandanimatedlogos" => "Idents and Animated Logos",
"industry" => "Industry",
"instructionals" => "Instructionals",
"narrative" => "Narrative",
"personal" => "Personal"
]
],
"live" => [
"display" => "Live events",
"option" => [
"any" => "Any",
"yes" => "Live now" // &filter_live=now
]
],
"hdr" => [
"display" => "HDR", // &filter_hdr=
"option" => [
"any" => "Any",
"hdr" => "Any HDR",
"dolby_vision" => "Dolby Vision",
"hdr10" => "HDR10",
"hdr10+" => "HDR10+"
]
],
"vimeo_360" => [
"display" => "Vimeo 360°", // &filter_vimeo_360
"option" => [
"any" => "Any",
"spatial" => "Spatial",
"360" => "360°"
]
],
"price" => [ // &filter_price=
"display" => "Price",
"option" => [
"any" => "Any price",
"free" => "Free",
"paid" => "Paid"
]
],
"collection" => [
"display" => "Vimeo collections",
"option" => [
"any" => "Any collection",
"staff_pick" => "Staff picks" // &filter_staffpicked=true
]
],
"license" => [ // &filter_license=
"display" => "License",
"option" => [
"any" => "Any license",
"by-nc-nd" => "CC BY-NC-ND",
"by" => "CC BY",
"by-nc" => "CC BY-NC",
"by-nc-sa" => "CC BY-NC-SA",
"by-nd" => "CC BY-ND",
"by-sa" => "CC BY-SA",
"cc0" => "CC0"
]
]
];
}
private function get($proxy, $url, $get = [], $jwt = false){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
if($jwt === false){
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: */*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br, zstd",
"Referer: https://vimeo.com/search",
"X-Requested-With: XMLHttpRequest",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-origin",
"Priority: u=4"]
);
}else{
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
"Accept: application/vnd.vimeo.*+json;version=3.3",
"Accept-Language: en",
"Accept-Encoding: gzip, deflate, br, zstd",
"Referer: https://vimeo.com/",
"Content-Type: application/json",
"Authorization: jwt $jwt",
"Vimeo-Page: /search/[[...slug]]",
"Origin: https://vimeo.com",
"DNT: 1",
"Sec-GPC: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site",
"Priority: u=4"]
);
}
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$this->backend->assign_proxy($curlproc, $proxy);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function video($get){
// resolve the next-page token, or build the search query
if($get["npt"]){
[$npt, $proxy] =
$this->backend
->get(
$get["npt"],
"videos"
);
$npt = json_decode($npt, true);
$pagetype = $npt["pagetype"];
$npt = $npt["npt"];
$jwt = $this->get_jwt($proxy);
try{
$json =
$this->get(
$proxy,
"https://api.vimeo.com" . $npt,
[],
$jwt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}else{
$proxy = null;
$jwt = $this->get_jwt($proxy); // this gives us a proxy by reference
// parse filters
$npt = [
"query" => $get["s"],
"page" => 1,
"per_page" => 24,
"facets" => "type"
];
switch($get["display"]){
case "video":
$npt["filter_type"] = "clip";
$npt["fields"] = "clip.name,stats.plays,clip.pictures,clip.user.name,clip.user.link,clip.user.pictures.sizes,clip.uri,clip.stats.plays,clip.duration,clip.created_time,clip.link,clip.description";
break;
case "ondemand":
$npt["filter_type"] = "ondemand";
$npt["sizes"] = "296x744";
$npt["fields"] = "ondemand.link,ondemand.name,ondemand.pictures.sizes,ondemand.metadata.interactions.buy,ondemand.metadata.interactions.rent,ondemand.uri";
break;
case "people":
$npt["filter_type"] = "people";
$npt["fetch_user_profile"] = "1";
$npt["fields"] = "people.name,people.location_details.formatted_address,people.metadata.public_videos.total,people.pictures.sizes,people.link,people.metadata.connections.followers.total,people.skills.name,people.skills.uri,people.background_video,people.uri";
break;
case "channel":
$npt["filter_type"] = "channel";
$npt["fields"] = "channel.name,channel.metadata.connections.users.total,channel.metadata.connections.videos.total,channel.pictures.sizes,channel.link,channel.uri";
break;
case "group":
$npt["filter_type"] = "group";
$npt["fields"] = "group.name,group.metadata.connections.users.total,group.metadata.connections.videos.total,group.pictures.sizes,group.link,group.uri";
break;
}
// only apply filters if we're searching for videos
if($get["display"] == "video"){
switch($get["sort"]){
case "relevance": break; // do nothing
case "recent":
$npt["sort"] = "latest";
$npt["direction"] = "desc";
break;
case "popular":
$npt["sort"] = "popularity";
$npt["direction"] = "desc";
break;
case "a_z":
$npt["sort"] = "alphabetical";
$npt["direction"] = "asc";
break;
case "z_a":
$npt["sort"] = "alphabetical";
$npt["direction"] = "desc";
break;
case "longest":
$npt["sort"] = "duration";
$npt["direction"] = "desc";
break;
case "shortest":
$npt["sort"] = "duration";
$npt["direction"] = "asc";
break;
}
if($get["time"] != "any"){
$npt["filter_uploaded"] = $get["time"];
}
if($get["duration"] != "any"){
$npt["filter_duration"] = $get["duration"];
}
if($get["resolution"] != "any"){
$npt["filter_resolution"] = $get["resolution"];
}
if($get["category"] != "any"){
$npt["filter_category"] = $get["category"];
}
if($get["live"] != "any"){
$npt["filter_live"] = "now";
}
if($get["hdr"] != "any"){
$npt["filter_hdr"] = $get["hdr"];
}
if($get["vimeo_360"] != "any"){
$npt["filter_vimeo_360"] = $get["vimeo_360"];
}
if($get["price"] != "any"){
$npt["filter_price"] = $get["price"];
}
if($get["collection"] == "staff_pick"){
$npt["filter_staffpicked"] = "true";
}
if($get["license"] != "any"){
$npt["filter_license"] = $get["license"];
}
}
$pagetype = $npt["filter_type"];
try{
$json =
$this->get(
$proxy,
"https://api.vimeo.com/search",
$npt,
$jwt
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");
}
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("Failed to parse JSON");
}
$out = [
"status" => "ok",
"npt" => null,
"video" => [],
"author" => [],
"livestream" => [],
"playlist" => [],
"reel" => []
];
if(isset($json["error"])){
$error = $json["error"];
if(isset($json["developer_message"])){
$error .= " ({$json["developer_message"]})";
}
throw new Exception("Vimeo returned an error: " . $error);
}
if(!isset($json["data"])){
throw new Exception("Vimeo did not return a data object");
}
switch($pagetype){
case "clip":
foreach($json["data"] as $video){
$video = $video["clip"];
if(
isset($video["user"]["pictures"]["sizes"]) &&
count($video["user"]["pictures"]["sizes"]) !== 0
){
$avatar = $video["user"]["pictures"]["sizes"][count($video["user"]["pictures"]["sizes"]) - 1]["link"];
}else{
$avatar = null;
}
$out["video"][] = [
"title" => $video["name"],
"description" =>
$this->limitstrlen(
$video["description"]
),
"author" => [
"name" => $video["user"]["name"],
"url" => $video["user"]["link"],
"avatar" => $avatar
],
"date" => strtotime($video["created_time"]),
"duration" => (int)$video["duration"],
"views" => (int)$video["stats"]["plays"],
"thumb" => [
"ratio" => "16:9",
"url" => $video["pictures"]["base_link"]
],
"url" => $video["link"]
];
}
break;
case "ondemand":
foreach($json["data"] as $video){
$video = $video["ondemand"];
$description = [];
if(isset($video["metadata"]["interactions"]["rent"]["display_price"])){
$description[] = "Rent for " . $video["metadata"]["interactions"]["rent"]["display_price"];
}
if(isset($video["metadata"]["interactions"]["buy"]["display_price"])){
$description[] = "Buy for " . $video["metadata"]["interactions"]["buy"]["display_price"];
}
$description = implode(", ", $description);
$out["video"][] = [
"title" => $video["name"],
"description" => $description,
"author" => [
"name" => null,
"url" => null,
"avatar" => null
],
"date" => null,
"duration" => null,
"views" => null,
"thumb" => [
"ratio" => "9:16",
"url" => $video["pictures"]["sizes"][0]["link"]
],
"url" => $video["link"]
];
}
break;
case "people":
foreach($json["data"] as $user){
$user = $user["people"];
if(
isset($user["pictures"]["sizes"]) &&
count($user["pictures"]["sizes"]) !== 0
){
$thumb = [
"ratio" => "1:1",
"url" => $user["pictures"]["sizes"][count($user["pictures"]["sizes"]) - 1]["link"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $user["name"],
"followers" => (int)$user["metadata"]["connections"]["followers"]["total"],
"description" => $user["metadata"]["public_videos"]["total"] . " videos.",
"thumb" => $thumb,
"url" => $user["link"]
];
}
break;
case "channel":
case "group":
foreach($json["data"] as $channel){
$channel = $channel[$pagetype]; // $npt is a string on paginated requests
if(
isset($channel["pictures"]["sizes"]) &&
count($channel["pictures"]["sizes"]) !== 0
){
$thumb = [
"ratio" => "16:9",
"url" => $channel["pictures"]["sizes"][count($channel["pictures"]["sizes"]) - 1]["link"]
];
}else{
$thumb = [
"ratio" => null,
"url" => null
];
}
$out["author"][] = [
"title" => $channel["name"],
"followers" => (int)$channel["metadata"]["connections"]["users"]["total"],
"description" => $channel["metadata"]["connections"]["videos"]["total"] . " videos.",
"thumb" => $thumb,
"url" => $channel["link"]
];
}
break;
}
//
// get next page
//
if(
isset($json["paging"]["next"]) &&
is_string($json["paging"]["next"])
){
$out["npt"] =
$this->backend
->store(
json_encode([
"npt" => $json["paging"]["next"],
"pagetype" => $pagetype
]),
"videos",
$proxy
);
}
return $out;
}
private function get_jwt(&$proxy){
//
// get jwt token
// it's probably safe to cache this across proxies, since the jwt doesn't contain a userID,
// only an appID
// we can only cache it for 5 minutes though, otherwise vimeo complains about it
//
if($proxy === null){
$proxy = $this->backend->get_ip();
}
$jwt = apcu_fetch("vimeo_jwt");
if($jwt === false){
/*
$html =
$this->get(
$proxy,
"https://vimeo.com/search",
[],
false
);
$this->fuckhtml->load($html);
$captcha =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($captcha) !== 0 &&
$this->fuckhtml
->getTextContent(
$captcha[0]
) == "Vimeo / CAPTCHA Challenge"
){
throw new Exception("Vimeo returned a Captcha");
}
$html =
explode(
'<script id="viewer-bootstrap" type="application/json">',
$html,
2
);
if(count($html) !== 2){
throw new Exception("Failed to find JWT json");
}
$jwt =
json_decode(
$this->fuckhtml
->extract_json(
$html[1]
),
true
);
if($jwt === null){
throw new Exception("Failed to decode JWT json");
}
if(!isset($jwt["jwt"])){
throw new Exception("Failed to grep JWT");
}
$jwt = $jwt["jwt"];
*/
try{
$json =
$this->get(
$proxy,
"https://vimeo.com/_next/jwt",
[],
false
);
}catch(Exception $error){
throw new Exception("Failed to fetch JWT token");
}
$this->fuckhtml->load($json);
$captcha =
$this->fuckhtml
->getElementsByTagName(
"title"
);
if(
count($captcha) !== 0 &&
$this->fuckhtml
->getTextContent(
$captcha[0]
) == "Vimeo / CAPTCHA Challenge"
){
throw new Exception("Vimeo returned a Captcha");
}
$json = json_decode($json, true);
if($json === null){
throw new Exception("The JWT object could not be decoded");
}
if(!isset($json["token"])){
throw new Exception("Vimeo did not return a JWT");
}
$jwt = $json["token"];
apcu_store("vimeo_jwt", $jwt, 300);
}
return $jwt;
}
private function titledots($title){
$substr = substr($title, -3);
if(
$substr == "..." ||
$substr == "\u{2026}" // UTF-8 horizontal ellipsis (3 bytes)
){
return trim(substr($title, 0, -3), " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
return trim($title, " \n\r\t\v\x00\0\x0B\xc2\xa0");
}
private function limitstrlen($text){
return
explode(
"\n",
wordwrap(
str_replace(
["\n\r", "\r\n", "\n", "\r"],
" ",
$text
),
300,
"\n"
),
2
)[0];
}
}
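
The fetch-or-refresh pattern in `get_jwt` (look in the cache, request a new token on a miss, store it for 300 seconds) can be sketched without APCu. Everything below is hypothetical and only for illustration: `TtlCacheDemo` is an in-memory stand-in for `apcu_fetch`/`apcu_store`, `cached_token_demo` is not part of the commit, and the clock is passed in explicitly so that expiry is easy to follow.

```php
<?php
// Hypothetical in-memory stand-in for the apcu_fetch()/apcu_store() pair.
final class TtlCacheDemo {
	private array $items = [];

	// Store a value that expires $ttl seconds after $now.
	public function store(string $key, string $value, int $ttl, int $now): void {
		$this->items[$key] = [$value, $now + $ttl];
	}

	// Return the cached value, or false if it expired or was never set.
	public function fetch(string $key, int $now) {
		if(!isset($this->items[$key])){
			return false;
		}
		[$value, $expires] = $this->items[$key];
		if($now >= $expires){
			unset($this->items[$key]);
			return false;
		}
		return $value;
	}
}

// Fetch-or-refresh: only call $refresh when nothing valid is cached.
function cached_token_demo(TtlCacheDemo $cache, callable $refresh, int $now): string {
	$token = $cache->fetch("vimeo_jwt", $now);
	if($token === false){
		$token = $refresh();
		$cache->store("vimeo_jwt", $token, 300, $now);
	}
	return $token;
}

$cache = new TtlCacheDemo();
$calls = 0;
$refresh = function() use (&$calls){ $calls++; return "token-" . $calls; };

$first = cached_token_demo($cache, $refresh, 1000);  // miss, refreshes
$second = cached_token_demo($cache, $refresh, 1299); // still cached
$third = cached_token_demo($cache, $refresh, 1300);  // expired, refreshes again
```

The exact expiry boundary here is a choice of the sketch; APCu's own TTL handling may differ in detail.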

@@ -14,7 +14,7 @@ class yandex{
// backend included in the scraper functions
}
private function get($proxy, $url, $get = [], $nsfw){
private function get($proxy, $url, $get = [], $nsfw, $get_cookie = 1){
$curlproc = curl_init();
@@ -25,19 +25,55 @@ class yandex{
curl_setopt($curlproc, CURLOPT_URL, $url);
// extract "i" cookie
if($get_cookie === 0){
$cookies_tmp = [];
curl_setopt($curlproc, CURLOPT_HEADERFUNCTION, function($curlproc, $header) use (&$cookies_tmp){
$length = strlen($header);
$header = explode(":", $header, 2);
if(trim(strtolower($header[0])) == "set-cookie"){
$cookie_tmp = explode("=", trim($header[1]), 2);
$cookies_tmp[trim($cookie_tmp[0])] =
explode(";", $cookie_tmp[1], 2)[0];
}
return $length;
});
}
switch($nsfw){
case "yes": $nsfw = "0"; break;
case "maybe": $nsfw = "1"; break;
case "no": $nsfw = "2"; break;
}
switch($get_cookie){
case 0:
$cookie = "";
break;
case 1:
$cookie = "Cookie: yp=" . (time() - 4000033) . ".szm.1:1920x1080:876x1000#" . time() . ".sp.family:" . $nsfw;
break;
default:
$cookie = "Cookie: i=" . $get_cookie;
}
$headers =
["User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Encoding: gzip",
"Accept-Language: en-US,en;q=0.5",
"DNT: 1",
"Cookie: yp=1716337604.sp.family%3A{$nsfw}#1685406411.szm.1:1920x1080:1920x999",
$cookie,
"Referer: https://yandex.com/images/search",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
@@ -59,6 +95,17 @@ class yandex{
$data = curl_exec($curlproc);
if($get_cookie === 0){
if(isset($cookies_tmp["i"])){
return $cookies_tmp["i"];
}else{
throw new Exception("Failed to get Yandex clearance cookie");
}
}
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
@@ -217,6 +264,23 @@ class yandex{
// https://yandex.com/search/site/?text=minecraft&web=1&frame=1&v=2.0&searchid=3131712
// &within=777&from_day=26&from_month=8&from_year=2023&to_day=26&to_month=8&to_year=2023
// get clearance cookie
if(($cookie = apcu_fetch("yandexweb_cookie")) === false){
$proxy = $this->backend->get_ip();
$cookie =
$this->get(
$proxy,
"https://yandex.ru/support2/smart-captcha/ru/",
[],
false,
0
);
apcu_store("yandexweb_cookie", $cookie);
}
if($get["npt"]){
[$npt, $proxy] = $this->backend->get($get["npt"], "web");
@@ -226,7 +290,8 @@ class yandex{
$proxy,
"https://yandex.com" . $npt,
[],
"yes"
"yes",
$cookie
);
}else{
@@ -236,7 +301,7 @@ class yandex{
throw new Exception("Search term is empty!");
}
$proxy = $this->backend->get_ip();
$proxy = !isset($proxy) ? $this->backend->get_ip() : $proxy;
$lang = $get["lang"];
$older = $get["older"];
$newer = $get["newer"];
@@ -283,7 +348,8 @@ class yandex{
$proxy,
"https://yandex.com/search/site/",
$params,
"yes"
"yes",
$cookie
);
}catch(Exception $error){
@@ -314,6 +380,19 @@ class yandex{
$this->fuckhtml->load($html);
// Scrape page blocked error
$title =
$this->fuckhtml
->getElementsByTagName("title");
if(
count($title) !== 0 &&
$title[0]["innerHTML"] == "403"
){
throw new Exception("Yandex blocked this proxy or 4get instance.");
}
// get nextpage
$npt =
$this->fuckhtml
@@ -668,7 +747,6 @@ class yandex{
foreach($json["blocks"] as $block){
$html .= $block["html"];
// get next page
if(
isset($block["params"]["nextPageUrl"]) &&
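
The `CURLOPT_HEADERFUNCTION` callback in the Yandex hunks above picks cookies out of raw response header lines. The parsing step can be sketched on its own as below; `parse_set_cookie_demo` is a hypothetical helper, not part of the commit, and it adds guards for lines without a colon or without `=` that the original callback does not have.

```php
<?php
// Hypothetical helper mirroring the Set-Cookie parsing in the header callback.
function parse_set_cookie_demo(string $header): ?array {
	$parts = explode(":", $header, 2);
	if(count($parts) !== 2 || strtolower(trim($parts[0])) !== "set-cookie"){
		return null; // not a Set-Cookie header line
	}
	// "name=value; Path=/; ..." -> keep the name and the value before the first ";"
	$pair = explode("=", trim($parts[1]), 2);
	if(count($pair) !== 2){
		return null; // malformed cookie without "="
	}
	return [trim($pair[0]), explode(";", $pair[1], 2)[0]];
}

$cookie = parse_set_cookie_demo("Set-Cookie: i=abc123; Path=/; Secure\r\n");
$other = parse_set_cookie_demo("Content-Type: text/html\r\n");
$status = parse_set_cookie_demo("HTTP/1.1 200 OK\r\n");
```

Returning `null` for non-cookie lines keeps the caller free to ignore status lines and unrelated headers.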

@@ -255,13 +255,6 @@ class yep{
// use http2
curl_setopt($curlproc, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
// set ciphers
curl_setopt(
$curlproc,
CURLOPT_SSL_CIPHER_LIST,
"aes_128_gcm_sha_256,chacha20_poly1305_sha_256,aes_256_gcm_sha_384,ecdhe_ecdsa_aes_128_gcm_sha_256,ecdhe_rsa_aes_128_gcm_sha_256,ecdhe_ecdsa_chacha20_poly1305_sha_256,ecdhe_rsa_chacha20_poly1305_sha_256,ecdhe_ecdsa_aes_256_gcm_sha_384,ecdhe_rsa_aes_256_gcm_sha_384,ecdhe_ecdsa_aes_256_sha,ecdhe_ecdsa_aes_128_sha,ecdhe_rsa_aes_128_sha,ecdhe_rsa_aes_256_sha,rsa_aes_128_gcm_sha_256,rsa_aes_256_gcm_sha_384,rsa_aes_128_sha,rsa_aes_256_sha"
);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: " . config::USER_AGENT,
@@ -351,6 +344,7 @@ class yep{
"type" => "web"
]
);
}catch(Exception $error){
throw new Exception("Failed to fetch JSON");

@@ -125,6 +125,10 @@ $settings = [
"value" => "brave",
"text" => "Brave"
],
[
"value" => "mullvad_brave",
"text" => "Mullvad (Brave)"
],
[
"value" => "yandex",
"text" => "Yandex"
@@ -133,10 +137,18 @@ $settings = [
"value" => "google",
"text" => "Google"
],
[
"value" => "google_api",
"text" => "Google API"
],
[
"value" => "google_cse",
"text" => "Google CSE"
],
[
"value" => "mullvad_google",
"text" => "Mullvad (Google)"
],
[
"value" => "startpage",
"text" => "Startpage"
@@ -169,6 +181,14 @@ $settings = [
"value" => "mojeek",
"text" => "Mojeek"
],
[
"value" => "baidu",
"text" => "Baidu"
],
[
"value" => "coccoc",
"text" => "Cốc Cốc"
],
[
"value" => "solofield",
"text" => "Solofield"
@@ -223,6 +243,10 @@ $settings = [
"value" => "yep",
"text" => "Yep"
],
[
"value" => "baidu",
"text" => "Baidu"
],
[
"value" => "solofield",
"text" => "Solofield"
@@ -231,6 +255,14 @@ $settings = [
"value" => "pinterest",
"text" => "Pinterest"
],
[
"value" => "cara",
"text" => "Cara"
],
[
"value" => "flickr",
"text" => "Flickr"
],
[
"value" => "fivehpx",
"text" => "500px"
@@ -257,6 +289,14 @@ $settings = [
"value" => "yt",
"text" => "YouTube"
],
[
"value" => "vimeo",
"text" => "Vimeo"
],
[
"value" => "sepiasearch",
"text" => "Sepia Search"
],
[
"value" => "ddg",
"text" => "DuckDuckGo"
@@ -281,6 +321,14 @@ $settings = [
"value" => "qwant",
"text" => "Qwant"
],
[
"value" => "baidu",
"text" => "Baidu"
],
[
"value" => "coccoc",
"text" => "Cốc Cốc"
],
[
"value" => "solofield",
"text" => "Solofield"
@@ -318,6 +366,10 @@ $settings = [
[
"value" => "mojeek",
"text" => "Mojeek"
],
[
"value" => "baidu",
"text" => "Baidu"
]
]
],

@@ -12,8 +12,6 @@
--ebdbb2:#ebdbb2;
}
body{
padding:15px 4% 40px;
margin:unset;
@@ -42,7 +40,6 @@ h3,h4,h5,h6{
background:#723c0b;
}
.searchbox input{
all:unset;
line-height:36px;
@@ -97,7 +94,6 @@ h3,h4,h5,h6{
display:inline-block;
}
.tabs .tab.selected{
border-bottom:2px solid #fc92a5;
}
@@ -107,7 +103,7 @@ h3,h4,h5,h6{
padding-bottom:12px;
padding-top:7px;
margin-bottom:7px;
background-color:#232525
background-color:#232525;
}
.filters .filter{
@@ -170,7 +166,6 @@ h3,h4,h5,h6{
font-size:12px;
}
.web .hover{
display:block;
text-decoration:none;
@@ -194,16 +189,13 @@ h3,h4,h5,h6{
color:#9760b1 !important;
}
.web .text-result .greentext{
font-size:14px;
color:var(--bdae93);
}
/* favicon */
.favicon-dropdown a{
text-decoration:none;
color:#d3d0c1;
@@ -212,37 +204,31 @@ h3,h4,h5,h6{
font-size:13px;
}
.web .favicon img,
.favicon-dropdown img{
.web .favicon img, .favicon-dropdown img{
margin:3px 7px 0 0;
height:16px;
font-size:12px;
line-height:16px;;
line-height:16px;
display:block;
text-align:left;
}
.web .sublinks{
padding:17px 10px;
font-size:15px;
color:var(--928374);
}
.web .text-result .sublinks:last-child{
padding-bottom:0;
}
/* Wikipedia head */
.wiki-head{
padding:5px;
background-color: #322f2b
background-color:#322f2b;
}
/*
Images tab
*/
@@ -258,8 +244,6 @@ h3,h4,h5,h6{
float:left;
}
#images .image .title{
white-space:nowrap;
overflow:hidden;
@@ -268,7 +252,6 @@ h3,h4,h5,h6{
color:var(--bdae93);
}
#popup-status{
display:none;
position:fixed;
@@ -284,40 +267,56 @@ h3,h4,h5,h6{
Settings page
*/
.web .settings-submit a{
margin-right:17px;
color:#bdae93;
}
/*
Responsive image
*/
@media only screen and (max-width: 1454px){ #images .image-wrapper{ width:25%; } }
@media only screen and (max-width: 1161px){ #images .image-wrapper{ width:25%; } }
@media only screen and (max-width: 750px){ #images .image-wrapper{ width:50%; } }
@media only screen and (max-width: 450px){ #images .image-wrapper{ width:100%; } }
@media only screen and (max-width:1454px){
#images .image-wrapper{
width:25%;
}
}
@media only screen and (max-width:1161px){
#images .image-wrapper{
width:25%;
}
}
@media only screen and (max-width:750px){
#images .image-wrapper{
width:50%;
}
}
@media only screen and (max-width:450px){
#images .image-wrapper{
width:100%;
}
}
/*
Responsive design
*/
@media only screen and (max-width:1550px){
.web .left,
.searchbox{
width:60%;
}
}
@media only screen and (max-width: 1000px){
@media only screen and (max-width:1100px){
.web .left,
.searchbox{
width:100%;
}
}
.type{
color:var(--bdae93);
}
}

static/themes/Kuuro.css Normal file

@@ -0,0 +1,17 @@
:root{
--1d2021: #101010;
--282828: #1a1a1a;
--3c3836: #1e1e1e;
--504945: #232323;
--928374: #949494;
--a89984: #d2d2d2;
--bdae93: #d2d2d2;
--8ec07c: #99c794;
--ebdbb2: #d2d2d2;
--comment: #5f6364;
--default: #cccece;
--keyword: #c594c5;
--string: #99c794;
}

@@ -89,7 +89,7 @@ if($results["spelling"]["type"] != "no_correction"){
'&' .
$frontend->buildquery($get, true) .
'&spellcheck=no">' .
$results["spelling"]["correction"] .
htmlspecialchars($results["spelling"]["correction"]) .
'</a>?' .
'</div>';
}