still missing things on google scraper

This commit is contained in:
lolcat 2023-07-22 14:41:14 -04:00
commit bca265aea6
90 changed files with 17559 additions and 0 deletions

72
README.md Normal file

@ -0,0 +1,72 @@
# 4get
4get is a metasearch engine that doesn't suck (they live in our walls!)
## About 4get
https://4get.ca/about
## Try it out
https://4get.ca
# Setup
Log in as root.
```sh
apt install apache2 certbot php-dom php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache
service apache2 start
a2enmod rewrite
```
For all of the files in `/etc/apache2/sites-enabled/`, you must apply the following changes:
- Uncomment `ServerName` directive, put your domain name there
- Change `ServerAdmin` to your email
- Change `DocumentRoot` to `/var/www/html/4get`
- Change `ErrorLog` and `CustomLog` directives to log stuff out to `/dev/null`
Now open `/etc/apache2/apache2.conf` and change `ErrorLog` and `CustomLog` directives to have `/dev/null` as a value
This *should* disable logging completely, but I'm not 100% sure since I sort of had to troubleshoot a lot of shit while writing this. So after we're done check if `/var/log/apache2/*` contains any personal info, and if it does, let me know through email.
Blindly run the following shit
```sh
cd /var/www/html
git clone https://git.lolcat.ca/lolcat/4get
cd 4get
mkdir icons
chmod 777 -R icons/
```
Restart the service for good measure... `service apache2 restart`
## Setup encryption
I'm schizoid (as you should be) so I'm gonna set up a 4096-bit RSA key. To complete this step, you need a domain or subdomain in your possession. Make sure that the DNS shit for your domain has propagated properly before continuing, because certbot is a piece of shit that will error out the ass once you reach 5 attempts under an hour.
```sh
certbot --apache --rsa-key-size 4096 -d www.yourdomain.com -d yourdomain.com
```
When it asks to choose a vhost, choose the option with "HTTPS" listed. Don't set up HTTPS for tor, we don't need it (it doesn't even work with Let's Encrypt anyway)
Edit `000-default-le-ssl.conf`
Add this at the end:
```apache
<Directory /var/www/html/4get>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.*) $1.php [L]
Options Indexes FollowSymLinks
AllowOverride All
Require all granted
</Directory>
```
Now since this file is located in `/etc/apache2/sites-enabled/`, you must change all of the logging shit as to make it not log anything, like we did earlier.
Restart again
```sh
service apache2 restart
```
You'll probably want to set up a tor address at this point, but I'm too lazy to put instructions here.
Ok bye!!!

130
about.php Normal file

@ -0,0 +1,130 @@
<?php
include "lib/frontend.php";
$frontend = new frontend();
echo
'<!DOCTYPE html>' .
'<html lang="en">' .
'<head>' .
'<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' .
'<title>About</title>' .
'<link rel="stylesheet" href="/static/style.css">' .
'<meta name="viewport" content="width=device-width,initial-scale=1">' .
'<meta name="robots" content="index,follow">' .
'<link rel="icon" type="image/x-icon" href="/favicon.ico">' .
'<meta name="description" content="4get.ca: About">' .
'<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">' .
'</head>' .
'<body class="' . $frontend->getthemeclass(false) . 'about">';
$left =
'<a href="/" class="link">&lt; Go back</a>
<h1>Set as default search engine</h1>
<a href="#firefox"><h2 id="firefox">On Firefox and other Gecko based browsers</h2></a>
To set this as your default search engine on Firefox, right click the URL bar and select <div class="code-inline">Add "4get"</div>. Then, visit <a href="about:preferences#search" target="_BLANK" class="link">about:preferences#search</a> and select <div class="code-inline">4get</div> in the dropdown menu.
<a href="#chrome"><h2 id="chrome">On Chromium and Blink based browsers</h2></a>
Right click the URL bar and click <div class="code-inline">Manage search engines and site search</div>, or visit <a href="chrome://settings/searchEngines" target="_BLANK" class="link">chrome://settings/searchEngines</a>. Then, create a new entry under <div class="code-inline">Search engines</div> and fill in the following details:
<table>
<tr>
<td><b>Field</b></td>
<td><b>Value</b></td>
</tr>
<tr>
<td>Search engine</td>
<td>4get</td>
</tr>
<tr>
<td>Shortcut</td>
<td>4get.ca</td>
</tr>
<tr>
<td>URL with %s in place of query</td>
<td>https://4get.ca/web?q=%s</td>
</tr>
</table>
Once that\'s done, click <div class="code-inline">Save</div>. Then, on the right-hand side of the newly created entry, open the dropdown menu and select <div class="code-inline">Make default</div>.
<a href="#other-browsers"><h2 id="other-browsers">Other browsers</h2></a>
Get a real browser.
<h1>Frequently asked questions</h1>
<a href="#what-is-this"><h2 id="what-is-this">What is this?</h2></a>
This is a metasearch engine that gets results from other engines, and strips away all of the tracking parameters and Microsoft/globohomo bullshit they add. Most of the other alternatives to Google jack themselves off about being ""privacy respecting"" or whatever the fuck but it always turns out to be a total lie, and I just got fed up with their shit honestly. Alternatives like Searx or YaCy all fucking sucks so I made my own thing.
<a href="#goal"><h2 id="goal">My goal</h2></a>
Provide users with a privacy oriented, extremely lightweight, ad free, free as in freedom (and free beer!) way to search for documents around the internet, with minimal, optional javascript code. My long term goal would be to build my own index (that doesn\'t suck) and provide users with an unbiased search engine, with no political inclinations.
<a href="#logs"><h2 id="logs">Do you keep logs?</h2></a>
I store data temporarily to get the next page of results. This might include search queries, tokens and other parameters. These parameters are encrypted using <div class="code-inline">aes-256-gcm</div> on the serber, for which I give you a key (also known internally as <div class="code-inline">npt</div> token). When you make a request to get the next page, you supply the token, the data is decrypted and the request is fulfilled. This encrypted data is deleted after 7 minutes, or after it\'s used, whichever comes first.<br><br>
I <b>don\'t</b> log IP addresses, user agents, or anything else. The <div class="code-inline">npt</div> tokens are the only thing that is stored (in RAM, mind you), temporarily, encrypted.
<a href="#information-sharing"><h2 id="information-sharing">Do you share information with third parties?</h2></a>
Your search queries and supplied filters are shared with the scraper you chose (so I can get the search results, duh). I don\'t share anything else (that means I don\'t share your IP address, location, or anything of this kind). There is no way that site can know you\'re the one searching for something, <u>unless you send out a search query that de-anonymises you.</u> For example, a search query like "hello my full legal name is jonathan gallindo and i want pictures of cloacas" would definitely blow your cover. 4get doesn\'t contain ads or any third party javascript applets or trackers. I don\'t profile you, and quite frankly, I don\'t give a shit about what you search on there.<br><br>
TL;DR assume those websites can see what you search for, but can\'t see who you are (unless you\'re really dumb).
<a href="#hosting"><h2 id="hosting">Where is this website hosted?</h2></a>
This website is hosted on a Contabo shitbox in the United States.
<a href="#keyboard-shortcuts"><h2 id="keyboard-shortcuts">Keyboard shortcuts?</h2></a>
Use <div class="code-inline">/</div> to focus the search box.<br><br>
When the image viewer is open, you can use the following keybinds:<br>
<div class="code-inline">Up</div>, <div class="code-inline">Down</div>, <div class="code-inline">Left</div>, <div class="code-inline">Right</div> to rotate the image.<br>
<div class="code-inline">CTRL+Up</div>, <div class="code-inline">CTRL+Down</div>, <div class="code-inline">CTRL+Left</div>, <div class="code-inline">CTRL+Right</div> to mirror the image.<br>
<div class="code-inline">Escape</div> to exit the image viewer.
<a href="#instances"><h2 id="instances">Instances</h2></a>
4get is open source, anyone can create their own 4get instance! If you wish to add your website to this list, please <a href="https://lolcat.ca/contact">contact me</a>.
<table>
<tr>
<td>Name</td>
<td>Address</td>
</tr>
<tr>
<td>4get</td>
<td><a href="https://4get.ca">4get.ca</a><a href="http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion/">(tor)</a></td>
</tr>
</table>
<a href="#schizo"><h2 id="schizo">How can I trust you?</h2></a>
You just sort of have to take my word for it right now. If you\'d rather trust yourself instead of me (I believe in you!!), all of the code on this website is available through my <a href="https://git.lolcat.ca/lolcat" class="link">git page</a> for you to host on your own machines. Just a reminder: if you\'re the sole user of your instance, it doesn\'t take immense brain power for Microshit to figure out you basically just switched IP addresses. Invite your friends to use your instance!
<a href="#contact"><h2 id="contact">I want to report abuse or have erotic roleplay through email</h2></a>
I don\'t know about that second part but if you want to talk to me, just drop me an email...<br><br>
<b>Message to all DMCA enforcers:</b> I don\'t host any of the content. Everything you see here is <u>proxied</u> through my shitbox with no moderation. Please reach out to the people hosting the infringing content instead.<br><br>
<a href="https://lolcat.ca/contact" rel="dofollow" class="link">Click here to contact me!</a><br><br>
<a href="https://validator.w3.org/nu/?doc=https%3A%2F%2F4get.ca" title="W3 Valid!">
<img src="/static/icon/w3html.png" alt="Valid W3C HTML 4.01" width="88" height="31">
</a>';
// trim out whitespace
$left = explode("\n", $left);
$out = "";
foreach($left as $line){
$out .= trim($line);
}
echo
$frontend->load(
"search.html",
[
"class" => "",
"right-left" => "",
"right-right" => "",
"left" => $out
]
);

289
api.txt Normal file
View File

@ -0,0 +1,289 @@
__ __ __
/ // / ____ ____ / /_
/ // /_/ __ `/ _ \/ __/
/__ __/ /_/ / __/ /_
/_/ \__, /\___/\__/
/____/
+ Welcome to the 4get API documentation +
+ Terms of use
Do NOT misuse the API. Misuses can include... ::
1. Serp SEO scanning
2. Intensive scraping
3. Any other activity that isn't triggered by a human
4. Illegal activities in Canada
5. Constant "test" queries while developing your program
(please cache the API responses!)
Examples of good uses of the API ::
1. A chatroom bot that presents users with search results
2. Personal use
3. Any other activity that is initiated by a human
If you wish to engage in the activities listed under "misuses", feel
free to download the source code of the project and run 4get
under your own terms. Please respect the terms of use listed here so
that this website may be available to all in the far future.
Get your instance running here ::
https://git.lolcat.ca/lolcat/4get
Thanks!
+ Decode the data
All payloads returned by the API are encoded in the JSON format. If
you don't know how to tackle the problem, maybe programming is not
for you.
All of the endpoints use the GET method.
+ Check if an API call was successful
All API responses come with an array index named "status". If the
status is anything other than the string "ok", something went wrong.
The HTTP code will always be 200 so as not to cause issues with CORS.
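A minimal sketch of that check in PHP, assuming the response body has already been fetched into a string. The helper name decode_4get_response() and the sample payloads are made up for illustration; real responses carry more fields.

```php
<?php
// Hypothetical helper: decode an API payload and fail loudly on errors.
function decode_4get_response(string $body): array {
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new RuntimeException("Response is not valid JSON");
    }
    // The HTTP code is documented as always 200, so "status" is the error signal.
    $status = $data["status"] ?? null;
    if ($status !== "ok") {
        throw new RuntimeException(
            "API error: " . (is_string($status) ? $status : "missing status")
        );
    }
    return $data;
}

// Sample payloads (invented for this sketch).
$ok = decode_4get_response('{"status":"ok"}');
try {
    decode_4get_response('{"status":"Unknown endpoint"}');
    $error = null;
} catch (RuntimeException $e) {
    $error = $e->getMessage();
}
```

Throwing on any non-"ok" status keeps the rest of a client free of per-call status checks.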
+ Get the next page of results
All API responses come with an array index named "npt". To get
the next page of results, you must make another API call and pass
that value in &npt.
Example ::
+ First API call
/api/v1/web?s=higurashi
+ Second API call
/api/v1/web?npt=ddg1._rJ2hWmYSjpI2hsXWmYajJx < ... >
You shouldn't specify the search term; the &npt parameter alone
suffices.
The first part of the token before the dot (ddg1) refers to an
array position on the serber's memory. The second part is an
encryption key used to decode the data at that position. This way,
it is impossible to supply invalid pagination data and it is
impossible for a 4get operator to peek at the private data of the
user after a request has been made.
The tokens will expire as soon as they are used or after a 7-minute
inactivity period, whichever comes first.
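A small PHP sketch of building the follow-up request. It assumes the token sits under an index called "npt" in the decoded response (that is the name the frontend code reads); the key is a parameter in case your payload names it differently. next_page_url() is a made-up helper name.

```php
<?php
// Hypothetical helper: build the next-page request from a decoded response.
function next_page_url(string $endpoint, array $response, string $key = "npt"): ?string {
    // A missing or null token means there is no further page.
    if (!isset($response[$key])) {
        return null;
    }
    // Only the token is sent; the search term is not repeated.
    return $endpoint . "?" . http_build_query(["npt" => $response[$key]]);
}

// Sample tokens invented for illustration.
$next = next_page_url("/api/v1/web", ["status" => "ok", "npt" => "ddg1.abc_DEF"]);
$none = next_page_url("/api/v1/web", ["status" => "ok", "npt" => null]);
```

Since a token dies once used, request the next page only when a human actually asks for it.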
+ Beware of null values!
Most fields in the API responses can return "null". You don't need
to worry about unset values.
+ API Parameters
To construct a valid request, craft a search in the 4get web
interface, then replace "/web" with "/api/v1/web" in the URL.
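The rewrite can be sketched in PHP as below. to_api_url() is a hypothetical helper, and treating "/images", "/videos" and "/news" the same way as "/web" is an assumption based on the endpoint list further down.

```php
<?php
// Hypothetical helper: turn a web interface URL into its API equivalent.
function to_api_url(string $frontend_url): string {
    $parts = parse_url($frontend_url);
    if (!is_array($parts) || !isset($parts["scheme"], $parts["host"], $parts["path"])) {
        throw new InvalidArgumentException("Expected an absolute URL");
    }
    // Assumption: every search page maps to /api/v1/<page>.
    if (!in_array($parts["path"], ["/web", "/images", "/videos", "/news"], true)) {
        throw new InvalidArgumentException("Not a search page: " . $parts["path"]);
    }
    $api = $parts["scheme"] . "://" . $parts["host"] . "/api/v1" . $parts["path"];
    if (isset($parts["query"])) {
        // Filters picked in the web interface are carried over untouched.
        $api .= "?" . $parts["query"];
    }
    return $api;
}

$api_url = to_api_url("https://4get.ca/web?s=higurashi");
```

This sketch drops any port number from the URL, which is fine for a public instance on the default HTTPS port.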
+ "date" and "time" parameters
"date" always refers to a calendar date.
"time" always refers to the duration of some media.
They are both integers that use seconds as their unit. The "date"
parameter specifies the number of seconds that have passed since
January 1st 1970 (Unix time).
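A hedged PHP sketch of turning both fields into something readable. The function names are made up, the output formats are a matter of taste, and the "_LIVE" branch covers the special value described under /api/v1/videos.

```php
<?php
// Hypothetical formatter for "date": a Unix timestamp, or null when unset.
function format_date(?int $date): string {
    return $date === null ? "unknown" : gmdate("Y-m-d", $date);
}

// Hypothetical formatter for "time": a duration in seconds, null, or "_LIVE".
function format_duration($time): string {
    if ($time === null) {
        return "unknown";
    }
    if ($time === "_LIVE") {
        return "live";
    }
    $time = (int)$time;
    return sprintf(
        "%d:%02d:%02d",
        intdiv($time, 3600),      // hours
        intdiv($time % 3600, 60), // minutes
        $time % 60                // seconds
    );
}

$day = format_date(86400);
$length = format_duration(3725);
```

gmdate() is used instead of date() so the result does not depend on the timezone configured on the machine running the client.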
______ __ _ __
/ ____/___ ____/ /___ ____ (_)___ / /______
/ __/ / __ \/ __ / __ \/ __ \/ / __ \/ __/ ___/
/ /___/ / / / /_/ / /_/ / /_/ / / / / / /_(__ )
/_____/_/ /_/\__,_/ .___/\____/_/_/ /_/\__/____/
/_/
+ /api/v1/web
+ &extendedsearch
When using the ddg(DuckDuckGo) scraper, you may make use of the
&extendedsearch parameter. If you need rich answer data from
additional sources like StackOverflow, music lyrics sites, etc.,
you need to specify the value of (string)"true".
The default value is "false" for API calls.
+ Parse the "spelling"
The array index named "spelling" contains 3 indexes ::
spelling:
type: "including"
using: "4chan"
correction: '"4cha"'
The "type" may be any of these 3 values. When rendering the
autocorrect text inside your application, it should look like
what follows right after the parameter value ::
no_correction <Empty>
including Including results for %using%. Did you mean
%correction%?
not_many Not many results for %using%. Did you mean
%correction%?
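The mapping above could be rendered with a sketch like this one in PHP. render_spelling() is a made-up name, and falling back to an empty string for unknown types is an assumption, not documented behaviour.

```php
<?php
// Hypothetical helper: turn the "spelling" array into display text.
function render_spelling(array $spelling): string {
    $templates = [
        "no_correction" => "",
        "including" => "Including results for %using%. Did you mean %correction%?",
        "not_many" => "Not many results for %using%. Did you mean %correction%?",
    ];
    // Unknown or missing types render nothing (assumption).
    $template = $templates[$spelling["type"] ?? "no_correction"] ?? "";
    return strtr($template, [
        "%using%" => (string)($spelling["using"] ?? ""),
        "%correction%" => (string)($spelling["correction"] ?? ""),
    ]);
}

$text = render_spelling([
    "type" => "including",
    "using" => "4chan",
    "correction" => '"4cha"',
]);
```

Escape the result before putting it into HTML, since both placeholders come from remote data.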
As of right now, the "spelling" is only available on
"/api/v1/web".
+ Parse the "answer"
The array index named "answer" may contain a list of multiple
answers. The array index "description" contains a linear list of
nodes that can help you construct rich formatted data inside of
your application. The structure is similar to the one below:
answer:
0:
title: "Higurashi"
description:
0:
type: "text"
value: "Higurashi is a great show!"
1:
type: "quote"
value: "Source: my ass"
Each "description" node contains an array index named "type".
Here is a list of them:
text
+ title
italic
+ quote
+ code
inline_code
link
+ image
+ audio
Each node type marked with a "+" in the list above should be
preceded by a newline when constructing the rendered description
object.
There are some nodes that differ from the type-value format.
Please parse them accordingly ::
+ link
type: "link"
url: "https://lolcat.ca"
value: "Visit my website!"
+ image
type: "image"
url: "https://lolcat.ca/static/pixels.png"
+ audio
type: "audio"
url: "https://lolcat.ca/static/whatever.mp3"
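As a rough illustration, the node rules above can be folded into a plain-text renderer. This is a sketch only: render_description() is a hypothetical name, the bracket and angle-bracket notation for links and media is invented here, and skipping the newline when nothing has been printed yet is a design choice of this example.

```php
<?php
// Hypothetical helper: flatten "description" nodes into plain text.
function render_description(array $nodes): string {
    // Node types listed with a "+": they start on a new line.
    $block_types = ["title", "quote", "code", "image", "audio"];
    $out = "";
    foreach ($nodes as $node) {
        $type = $node["type"] ?? "text";
        if (in_array($type, $block_types, true) && $out !== "") {
            $out .= "\n";
        }
        switch ($type) {
            case "link":
                // Links carry both a label and a URL.
                $out .= ($node["value"] ?? "") . " <" . ($node["url"] ?? "") . ">";
                break;
            case "image":
            case "audio":
                // Media nodes only carry a URL.
                $out .= "[" . $type . ": " . ($node["url"] ?? "") . "]";
                break;
            default:
                // Every other type uses the plain type-value format.
                $out .= $node["value"] ?? "";
        }
    }
    return $out;
}

$rendered = render_description([
    ["type" => "text", "value" => "Higurashi is a great show! "],
    ["type" => "link", "url" => "https://lolcat.ca", "value" => "Visit my website!"],
    ["type" => "quote", "value" => "A quoted line"],
]);
```

An HTML renderer would follow the same loop and swap the string pieces for escaped tags.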
The array index named "table" is an associative array. You can
loop over the data using this PHP code, for example ::
foreach($table as $website_name => $url){ // ...
The rest of the JSON is pretty self explanatory.
+ /api/v1/images
All images are contained within "image". The structure looks like
below ::
image:
0:
title: "My awesome Higurashi image"
source:
0:
url: "https://lolcat.ca/static/profile_pix.png"
width: 400
height: 400
1:
url: "https://lolcat.ca/static/pixels.png"
width: 640
height: 640
2:
url: "https://tse1.mm.bing.net/th?id=OIP.VBM3BQg
euf0-xScO1bl1UgHaGG"
width: 194
height: 160
The last image of the "source" array is always the thumbnail, and is
a good fallback to use when other sources fail to load. There can be
more than 1 source; this is especially true when using the Yandex
scraper, but beware of captcha rate limits.
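A short PHP sketch of picking the two sources a client usually needs. pick_sources() is a made-up helper. Treating the first source as the best-quality one mirrors what the web frontend links to, but it is an assumption rather than a documented guarantee.

```php
<?php
// Hypothetical helper: choose a full-size source and a thumbnail for one image.
function pick_sources(array $image): array {
    $sources = $image["source"] ?? [];
    if (count($sources) === 0) {
        return ["full" => null, "thumb" => null];
    }
    return [
        // Assumption: the first source is the preferred full-size image.
        "full" => $sources[0]["url"] ?? null,
        // Documented: the last source is always the thumbnail.
        "thumb" => $sources[count($sources) - 1]["url"] ?? null,
    ];
}

$picked = pick_sources([
    "title" => "My awesome Higurashi image",
    "source" => [
        ["url" => "https://lolcat.ca/static/profile_pix.png", "width" => 400, "height" => 400],
        ["url" => "https://lolcat.ca/static/pixels.png", "width" => 640, "height" => 640],
    ],
]);
```

When there is a single source, both keys point at the same URL, so the fallback logic stays the same.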
+ /api/v1/videos
The "time" parameter for videos may be set to "_LIVE". For live
streams, the amount of people currently watching is passed in
"views".
+ /api/v1/news
Just make a request to "/api/v1/news?s=elon+musk". The payload
has nothing special about it and is very self explanatory, just like
the endpoint above.
+ /favicon
Get the favicon for a website. The only parameter is "s", and must
include the protocol.
Example ::
/favicon?s=https://lolcat.ca
If we had to revert to using Google's favicon cache, it will throw
an error in the X-Error header field. If Google's favicon cache
also failed to return an image, or if you're too retarded to specify
a valid domain name, a default placeholder image will be returned
alongside the "404" HTTP error code.
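To avoid the placeholder, a client can check its input before calling the endpoint. The PHP sketch below is one way to do it; favicon_url() is a hypothetical helper, and rejecting any URL that has a path follows the "protocol and domain only" rule enforced by favicon.php.

```php
<?php
// Hypothetical helper: build a /favicon request from a site URL.
function favicon_url(string $site): string {
    $parts = parse_url($site);
    // Only "scheme://host" is accepted: no path, not even a trailing slash.
    if (!isset($parts["scheme"], $parts["host"]) || isset($parts["path"])) {
        throw new InvalidArgumentException("Only provide the protocol and domain");
    }
    return "/favicon?s=" . $parts["scheme"] . "://" . $parts["host"];
}

$favicon = favicon_url("https://lolcat.ca");
try {
    favicon_url("https://lolcat.ca/about");
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```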
+ /proxy
Get a proxied image. Useful if you don't want to leak your user's IP
address. The parameters are "i" for the image link and "s" for the
size.
Acceptable "s" parameters:
portrait 90x160
landscape 160x90
square 90x90
thumb 236x180
cover 207x270
original <Original resolution>
You can also omit the "s" parameter if you wish to view the
original image. When an error occurs, an "X-Error" header field
is set.
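A minimal PHP sketch for building a proxy link. proxy_url() is a made-up name; the size list is copied from the table above, and the image URL is percent-encoded the same way the web frontend does it.

```php
<?php
// Hypothetical helper: build a /proxy link for an image URL.
function proxy_url(string $image_url, string $size = "original"): string {
    $sizes = ["portrait", "landscape", "square", "thumb", "cover", "original"];
    if (!in_array($size, $sizes, true)) {
        throw new InvalidArgumentException("Unknown size: " . $size);
    }
    // http_build_query() percent-encodes the image URL for us.
    return "/proxy?" . http_build_query(["i" => $image_url, "s" => $size]);
}

$thumb = proxy_url("https://lolcat.ca/static/pixels.png", "thumb");
```

Prefix the result with the address of the instance you are using.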
+ /audio
Get a proxied audio file. Does not support "Range" headers, as it's
only used to proxy small files.
The parameter is "s" for the audio link.
+ Appendix
If you have any questions or need clarifications, please send an
email my way to will at lolcat.ca

10
api/index.php Normal file
View File

@ -0,0 +1,10 @@
<?php
header("Content-Type: application/json");
http_response_code(404);
echo json_encode(
[
"status" => "Unknown endpoint"
]
);

25
api/v1/images.php Normal file
View File

@ -0,0 +1,25 @@
<?php
header("Content-Type: application/json");
chdir("../../");
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters(
"images",
isset($_GET["scraper"]) ? $_GET["scraper"] : null
);
$get = $frontend->parsegetfilters($_GET, $filters);
try{
echo json_encode(
$scraper->image($get)
);
}catch(Exception $e){
echo json_encode(["status" => $e->getMessage()]);
}

10
api/v1/index.php Normal file
View File

@ -0,0 +1,10 @@
<?php
header("Content-Type: application/json");
http_response_code(404);
echo json_encode(
[
"status" => "Unknown endpoint"
]
);

25
api/v1/news.php Normal file
View File

@ -0,0 +1,25 @@
<?php
header("Content-Type: application/json");
chdir("../../");
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters(
"news",
isset($_GET["scraper"]) ? $_GET["scraper"] : null
);
$get = $frontend->parsegetfilters($_GET, $filters);
try{
echo json_encode(
$scraper->news($get)
);
}catch(Exception $e){
echo json_encode(["status" => $e->getMessage()]);
}

25
api/v1/videos.php Normal file
View File

@ -0,0 +1,25 @@
<?php
header("Content-Type: application/json");
chdir("../../");
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters(
"videos",
isset($_GET["scraper"]) ? $_GET["scraper"] : null
);
$get = $frontend->parsegetfilters($_GET, $filters);
try{
echo json_encode(
$scraper->video($get)
);
}catch(Exception $e){
echo json_encode(["status" => $e->getMessage()]);
}

30
api/v1/web.php Normal file
View File

@ -0,0 +1,30 @@
<?php
header("Content-Type: application/json");
chdir("../../");
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters(
"web",
isset($_GET["scraper"]) ? $_GET["scraper"] : null
);
$get = $frontend->parsegetfilters($_GET, $filters);
if(!isset($_GET["extendedsearch"])){
$get["extendedsearch"] = "no";
}
try{
echo json_encode(
$scraper->web($get)
);
}catch(Exception $e){
echo json_encode(["status" => $e->getMessage()]);
}

19
audio.php Normal file
View File

@ -0,0 +1,19 @@
<?php
if(!isset($_GET["s"])){
http_response_code(404);
header("X-Error: No SOUND(s) provided!");
die();
}
include "lib/curlproxy.php";
$proxy = new proxy();
try{
$proxy->stream_linear_audio($_GET["s"]);
}catch(Exception $error){
header("X-Error: " . $error->getMessage());
}

BIN
banner/aves.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 84 KiB

BIN
banner/aves_2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB

BIN
banner/bibblebop.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 42 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 9.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.1 KiB

BIN
banner/deek.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.9 KiB

BIN
banner/deekchat.gif Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.6 KiB

BIN
banner/eagle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

BIN
banner/eagle2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.1 KiB

BIN
banner/eagle3.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 11 KiB

BIN
banner/eddd_1.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 70 KiB

BIN
banner/eddd_2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

BIN
banner/eddd_3.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB

BIN
banner/gnuwu.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

BIN
banner/gnuwu_2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

BIN
banner/horse.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 68 KiB

BIN
banner/linucks.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 62 KiB

BIN
banner/real_nig_3.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 66 KiB

BIN
banner/sec.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 59 KiB

BIN
banner/tagmachine.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

BIN
favicon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 393 B

362
favicon.php Normal file
View File

@ -0,0 +1,362 @@
<?php
if(!isset($_GET["s"])){
header("X-Error: Missing parameter (s)ite");
die();
}
new favicon($_GET["s"]);
class favicon{
// declared explicitly: dynamic properties are deprecated since PHP 8.2
private $proxy;
private $filename;
public function __construct($url){
header("Content-Type: image/png");
if(substr_count($url, "/") !== 2){
header("X-Error: Only provide the protocol and domain");
$this->defaulticon();
}
$filename = str_replace(["https://", "http://"], "", $url);
header("Content-Disposition: inline; filename=\"{$filename}.png\"");
include "lib/curlproxy.php";
$this->proxy = new proxy(false);
$this->filename = parse_url($url, PHP_URL_HOST);
/*
Check if we have the favicon stored locally
*/
if(file_exists("icons/" . $filename . ".png")){
$handle = fopen("icons/" . $filename . ".png", "r");
echo fread($handle, filesize("icons/" . $filename . ".png"));
fclose($handle);
return;
}
/*
Scrape html
*/
try{
$payload = $this->proxy->get($url, $this->proxy::req_web, true);
}catch(Exception $error){
header("X-Error: Could not fetch HTML (" . $error->getMessage() . ")");
$this->favicon404();
}
//$payload["body"] = '<link rel="manifest" id="MANIFEST_LINK" href="/data/manifest/" crossorigin="use-credentials" />';
// get link tags
preg_match_all(
'/< *link +(.*)[\/]?>/Uixs',
$payload["body"],
$linktags
);
/*
Get relevant tags
*/
$linktags = $linktags[1];
$attributes = [];
/*
header("Content-Type: text/plain");
print_r($linktags);
print_r($payload);
die();*/
for($i=0; $i<count($linktags); $i++){
// get attributes
preg_match_all(
'/([A-Za-z0-9]+) *= *("[^"]*"|[^" ]+)/s',
$linktags[$i],
$tags
);
for($k=0; $k<count($tags[1]); $k++){
$attributes[$i][] = [
"name" => $tags[1][$k],
"value" => trim($tags[2][$k], "\" \n\r\t\v\x00")
];
}
}
unset($payload);
unset($linktags);
$href = [];
// filter out the tags we want
foreach($attributes as &$group){
$tmp_href = null;
$tmp_rel = null;
$badtype = false;
foreach($group as &$attribute){
switch($attribute["name"]){
case "rel":
$attribute["value"] = strtolower($attribute["value"]);
if(
(
$attribute["value"] == "icon" ||
$attribute["value"] == "manifest" ||
$attribute["value"] == "shortcut icon" ||
$attribute["value"] == "apple-touch-icon" ||
$attribute["value"] == "mask-icon"
) === false
){
break;
}
$tmp_rel = $attribute["value"];
break;
case "type":
$attribute["value"] = explode("/", $attribute["value"], 2);
if(strtolower($attribute["value"][0]) != "image"){
$badtype = true;
break;
}
break;
case "href":
// must not contain invalid characters
// must not be empty
if(
filter_var($attribute["value"], FILTER_SANITIZE_URL) == $attribute["value"] &&
strlen($attribute["value"]) > 0
){
$tmp_href = $attribute["value"];
break;
}
break;
}
}
if(
$badtype === false &&
$tmp_rel !== null &&
$tmp_href !== null
){
$href[$tmp_rel] = $tmp_href;
}
}
/*
Priority list
*/
/*
header("Content-Type: text/plain");
print_r($href);
die();*/
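if(isset($href["icon"])){ $href = $href["icon"]; }
elseif(isset($href["apple-touch-icon"])){ $href = $href["apple-touch-icon"]; }
elseif(isset($href["manifest"])){
// attempt to parse manifest; on failure keep the other
// candidates so the mask-icon/shortcut icon fallbacks still work
$manifest_icon = $this->parsemanifest($href["manifest"], $url);
if(is_string($manifest_icon)){ $href = $manifest_icon; }
}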
if(is_array($href)){
if(isset($href["mask-icon"])){ $href = $href["mask-icon"]; }
elseif(isset($href["shortcut icon"])){ $href = $href["shortcut icon"]; }
else{
$href = "/favicon.ico";
}
}
$href = $this->proxy->getabsoluteurl($href, $url);
/*
header("Content-type: text/plain");
echo $href;
die();*/
/*
Download the favicon
*/
//$href = "https://git.lolcat.ca/assets/img/logo.svg";
try{
$payload =
$this->proxy->get(
$href,
$this->proxy::req_image,
true,
$url
);
}catch(Exception $error){
header("X-Error: Could not fetch the favicon (" . $error->getMessage() . ")");
$this->favicon404();
}
/*
Parse the file format
*/
$image = null;
$format = $this->proxy->getimageformat($payload, $image);
/*
Convert the image
*/
try{
/*
@todo: fix issues with avif+transparency
maybe using GD as fallback?
*/
if($format !== false){
$image->setFormat($format);
}
$image->setBackgroundColor(new ImagickPixel("transparent"));
$image->readImageBlob($payload["body"]);
$image->resizeImage(16, 16, imagick::FILTER_LANCZOS, 1);
$image->setFormat("png");
$image = $image->getImageBlob();
// save favicon
$handle = fopen("icons/" . $this->filename . ".png", "w");
fwrite($handle, $image, strlen($image));
fclose($handle);
echo $image;
}catch(ImagickException $error){
header("X-Error: Could not convert the favicon: (" . $error->getMessage() . ")");
$this->favicon404();
}
return;
}
private function parsemanifest($href, $url){
if(
// check if base64-encoded JSON manifest
preg_match(
// base64 alphabet also includes "+" and "/"
'/^data:application\/json;base64,([A-Za-z0-9+\/=]*)$/',
$href,
$json
)
){
$json = base64_decode($json[1]);
if($json === false){
// could not decode the manifest regex
return [];
}
}else{
try{
$json =
$this->proxy->get(
$this->proxy->getabsoluteurl($href, $url),
$this->proxy::req_web,
false,
$url
);
$json = $json["body"];
}catch(Exception $error){
// could not fetch the manifest
return [];
}
}
$json = json_decode($json, true);
if($json === null){
// manifest did not return valid json
return [];
}
if(
isset($json["start_url"]) &&
$this->proxy->validateurl($json["start_url"])
){
$url = $json["start_url"];
}
if(!isset($json["icons"][0]["src"])){
// manifest does not contain a path to the favicon
return [];
}
// hooray, return the favicon path
return $json["icons"][0]["src"];
}
private function favicon404(){
// fallback to google favicons
// ... probably blocked by cuckflare
try{
$image =
$this->proxy->get(
"https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=http://{$this->filename}&size=16",
$this->proxy::req_image
);
}catch(Exception $error){
$this->defaulticon();
}
// write favicon from google
$handle = fopen("icons/" . $this->filename . ".png", "w");
fwrite($handle, $image["body"], strlen($image["body"]));
fclose($handle);
echo $image["body"];
die();
}
private function defaulticon(){
// give 404 and fuck off
http_response_code(404);
$handle = fopen("lib/favicon404.png", "r");
echo fread($handle, filesize("lib/favicon404.png"));
fclose($handle);
die();
}
}

BIN
icons/lolcat.ca.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.4 KiB

99
images.php Normal file
View File

@ -0,0 +1,99 @@
<?php
/*
Initialize random shit
*/
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters("images");
$get = $frontend->parsegetfilters($_GET, $filters);
$frontend->loadheader(
$get,
$filters,
"images"
);
$payload = [
"images" => "",
"nextpage" => ""
];
try{
$results = $scraper->image($get);
}catch(Exception $error){
echo
$frontend->drawerror(
"Shit",
'This scraper returned an error:' .
'<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' .
'Things you can try:' .
'<ul>' .
'<li>Use a different scraper</li>' .
'<li>Remove keywords that could cause errors</li>' .
'<li>Use another 4get instance</li>' .
'</ul><br>' .
'If the error persists, please <a href="/about">contact the administrator</a>.'
);
die();
}
if(count($results["image"]) === 0){
$payload["images"] =
'<div class="infobox">' .
"<h1>Nobody here but us chickens!</h1>" .
'Have you tried:' .
'<ul>' .
'<li>Using a different scraper</li>' .
'<li>Using fewer keywords</li>' .
'<li>Defining broader filters (Is NSFW turned off?)</li>' .
'</ul>' .
'</div>';
}
foreach($results["image"] as $image){
$domain = htmlspecialchars(parse_url($image["url"], PHP_URL_HOST));
$c = count($image["source"]) - 1;
if(
preg_match(
'/^data:/',
$image["source"][$c]["url"]
)
){
$src = htmlspecialchars($image["source"][$c]["url"]);
}else{
$src = "/proxy?i=" . urlencode($image["source"][$c]["url"]) . "&s=thumb";
}
$payload["images"] .=
'<div class="image-wrapper" title="' . htmlspecialchars($image["title"]) .'" data-json="' . htmlspecialchars(json_encode($image["source"])) . '">' .
'<div class="image">' .
'<a href="' . htmlspecialchars($image["source"][0]["url"]) . '" rel="noreferrer nofollow" class="thumb">' .
'<img src="' . $src . '" alt="thumbnail">' .
'<div class="duration">' . $image["source"][0]["width"] . 'x' . $image["source"][0]["height"] . '</div>' .
'</a>' .
'<a href="' . htmlspecialchars($image["url"]) . '" rel="noreferrer nofollow">' .
'<div class="title">' . $domain . '</div>' .
'<div class="description">' . $frontend->highlighttext($get["s"], $image["title"]) . '</div>' .
'</a>' .
'</div>' .
'</div>';
}
if($results["npt"] !== null){
$payload["nextpage"] =
'<a href="' . $frontend->htmlnextpage($get, $results["npt"], "images") . '" class="nextpage img">Next page &gt;</a>';
}
echo $frontend->load("images.html", $payload);

14
index.php Normal file
View File

@ -0,0 +1,14 @@
<?php
include "lib/frontend.php";
$frontend = new frontend();
$images = glob("banner/*");
echo $frontend->load(
"home.html",
[
"body_class" => $frontend->getthemeclass(false),
"banner" => $images[rand(0, count($images) - 1)]
]
);

144
lib/bingcache-todo-fix.php Normal file
View File

@ -0,0 +1,144 @@
<?php
// https://www.bing.com/search?q=url%3Ahttps%3A%2F%2Flolcat.ca
// https://cc.bingj.com/cache.aspx?q=url%3ahttps%3a%2f%2flolcat.ca&d=4769685974291356&mkt=en-CA&setlang=en-US&w=tEsWuE7HW3Z5AIPQMVkDH4WaotS4LrK-
// <div class="b_attribution" u="0N|5119|4769685974291356|tEsWuE7HW3Z5AIPQMVkDH4WaotS4LrK-" tabindex="0">
new bingcache();
class bingcache{
public function __construct(){
if(
!isset($_GET["s"]) ||
$this->validate_url($_GET["s"]) === false
){
$this->do404("Please provide a valid URL.");
}
$url = $_GET["s"];
$curlproc = curl_init();
curl_setopt(
$curlproc,
CURLOPT_URL,
"https://www.bing.com/search?q=url%3A" .
urlencode($url)
);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt(
$curlproc,
CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
$this->do404("Failed to connect to bing servers. Please try again later.");
}
curl_close($curlproc);
preg_match(
// [^"]* keeps the capture inside the u="" attribute (.* is greedy and can overrun it)
'/<div class="b_attribution" u="([^"]*)" tabindex="0">/',
$data,
$keys
);
if(count($keys) === 0){
$this->do404("Bing has not archived this URL.");
}
// attribute looks like "0N|5119|<d>|<w>"; the last two fields address the cached copy
$keys = explode("|", $keys[1]);
$count = count($keys);
if($count < 2){
$this->do404("Bing has not archived this URL.");
}
header("Location: https://cc.bingj.com/cache.aspx?d=" . urlencode($keys[$count - 2]) . "&w=" . urlencode($keys[$count - 1]));
}
public function do404($text){
include "lib/frontend.php";
$frontend = new frontend();
echo
$frontend->load(
"error.html",
[
"title" => "Shit",
"text" => $text
]
);
die();
}
public function validate_url($url){
$url_parts = parse_url($url);
// check if required parts are there
if(
!isset($url_parts["scheme"]) ||
!(
$url_parts["scheme"] == "http" ||
$url_parts["scheme"] == "https"
) ||
!isset($url_parts["host"])
){
return false;
}
if(
// if its not an RFC-valid URL
!filter_var($url, FILTER_VALIDATE_URL)
){
return false;
}
$ip =
str_replace(
["[", "]"], // handle ipv6
"",
$url_parts["host"]
);
// if its not an IP
if(!filter_var($ip, FILTER_VALIDATE_IP)){
// resolve domain's IP
$ip = gethostbyname($url_parts["host"] . ".");
}
// check if its localhost
return filter_var(
$ip,
FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
);
}
}
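
The key extraction at the end of the constructor can be exercised without contacting Bing. This is a minimal sketch, assuming the `u` attribute keeps the `0N|5119|<d>|<w>` shape shown in the comment at the top of the file. `bingcache_url()` is a hypothetical helper, not part of 4get.

```php
<?php
// Hypothetical helper: build the cc.bingj.com cache URL from a result page.
// Returns null when no b_attribution element with a usable "u" value is found.
function bingcache_url(string $html): ?string {
	if(
		!preg_match(
			'/<div class="b_attribution" u="([^"]*)" tabindex="0">/',
			$html,
			$match
		)
	){
		return null;
	}
	$keys = explode("|", $match[1]);
	$count = count($keys);
	if($count < 2){
		return null;
	}
	// the last two fields become the d= and w= parameters
	return
		"https://cc.bingj.com/cache.aspx?d=" . urlencode($keys[$count - 2]) .
		"&w=" . urlencode($keys[$count - 1]);
}

// sample markup copied from the comment at the top of the file
$sample = '<div class="b_attribution" u="0N|5119|4769685974291356|tEsWuE7HW3Z5AIPQMVkDH4WaotS4LrK-" tabindex="0">';
echo bingcache_url($sample) . "\n";
```

Keeping the parsing in a pure function like this would let the redirect logic be checked against saved markup while the scraper itself is still marked todo.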

BIN
lib/classic.png Normal file

Binary file not shown.

Size: 7.4 KiB

652
lib/curlproxy.php Normal file
View File

@ -0,0 +1,652 @@
<?php
class proxy{
public const req_web = 0;
public const req_image = 1;
public function __construct($cache = true){
$this->cache = $cache;
}
public function do404(){
http_response_code(404);
header("Content-Type: image/png");
// send the placeholder image and stop
readfile("lib/img404.png");
die();
}
public function getabsoluteurl($path, $relative){
if($this->validateurl($path)){
return $path;
}
if(substr($path, 0, 2) == "//"){
return "https:" . $path;
}
$url = null;
$relative = parse_url($relative);
$url = $relative["scheme"] . "://";
if(
isset($relative["user"]) &&
isset($relative["pass"])
){
$url .= $relative["user"] . ":" . $relative["pass"] . "@";
}
$url .= $relative["host"];
if(isset($relative["path"])){
$relative["path"] = explode(
"/",
$relative["path"]
);
unset($relative["path"][count($relative["path"]) - 1]);
$relative["path"] = implode("/", $relative["path"]);
$url .= $relative["path"];
}
if(
strlen($path) !== 0 &&
$path[0] !== "/"
){
$url .= "/";
}
$url .= $path;
return $url;
}
public function validateurl($url){
$url_parts = parse_url($url);
// check if required parts are there
if(
!isset($url_parts["scheme"]) ||
!(
$url_parts["scheme"] == "http" ||
$url_parts["scheme"] == "https"
) ||
!isset($url_parts["host"])
){
return false;
}
$ip =
str_replace(
["[", "]"], // handle ipv6
"",
$url_parts["host"]
);
// if its not an IP
if(!filter_var($ip, FILTER_VALIDATE_IP)){
// resolve domain's IP
$ip = gethostbyname($url_parts["host"] . ".");
}
// check if its localhost
if(
filter_var(
$ip,
FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
) === false
){
return false;
}
return true;
}
public function get($url, $reqtype = self::req_web, $acceptallcodes = false, $referer = null, $redirectcount = 0){
if($redirectcount === 5){
throw new Exception("Too many redirects");
}
// sanitize URL (validateurl() returns a boolean, it never throws)
if($this->validateurl($url) === false){
throw new Exception("Invalid or forbidden URL");
}
$this->clientcache();
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curl, CURLOPT_HEADER, 1);
switch($reqtype){
case self::req_web:
curl_setopt(
$curl,
CURLOPT_HTTPHEADER,
[
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"
]
);
break;
case self::req_image:
if($referer === null){
$referer = explode("/", $url, 4);
array_pop($referer);
$referer = implode("/", $referer);
}
curl_setopt(
$curl,
CURLOPT_HTTPHEADER,
[
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0",
"Accept: image/avif,image/webp,*/*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"DNT: 1",
"Connection: keep-alive",
"Referer: {$referer}"
]
);
break;
}
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
// limit size of payloads
curl_setopt($curl, CURLOPT_BUFFERSIZE, 1024);
curl_setopt($curl, CURLOPT_NOPROGRESS, false);
curl_setopt(
$curl,
CURLOPT_PROGRESSFUNCTION,
function($downloadsize, $downloaded, $uploadsize, $uploaded
){
// if $downloaded exceeds 100MB, fuck off
return ($downloaded > 100000000) ? 1 : 0;
});
$body = curl_exec($curl);
if(curl_errno($curl)){
throw new Exception(curl_error($curl));
}
curl_close($curl);
$headers = [];
$http = null;
while(true){
$header = explode("\n", $body, 2);
$body = $header[1];
if($http === null){
// http/1.1 200 ok
$header = explode("/", $header[0], 2);
$header = explode(" ", $header[1], 3);
$http = [
"version" => (float)$header[0],
"code" => (int)$header[1]
];
continue;
}
if(trim($header[0]) == ""){
// reached end of headers
break;
}
$header = explode(":", $header[0], 2);
// malformed headers
if(count($header) !== 2){ continue; }
$headers[strtolower(trim($header[0]))] = trim($header[1]);
}
// check http code
if(
$http["code"] >= 300 &&
$http["code"] <= 309
){
// redirect
if(!isset($headers["location"])){
throw new Exception("Broken redirect");
}
$redirectcount++;
return $this->get($this->getabsoluteurl($headers["location"], $url), $reqtype, $acceptallcodes, $referer, $redirectcount);
}else{
if(
$acceptallcodes === false &&
$http["code"] > 300
){
throw new Exception("Remote server returned an error code! ({$http["code"]})");
}
}
// check if data is okay
switch($reqtype){
case self::req_image:
$format = false;
if(isset($headers["content-type"])){
// content-type may carry parameters, e.g. "text/html; charset=utf-8"
if(stripos($headers["content-type"], "text/html") === 0){
throw new Exception("Server returned an html document instead of image");
}
$tmp = explode(";", $headers["content-type"]);
for($i=0; $i<count($tmp); $i++){
if(
preg_match(
'/^image\/([^ ]+)/i',
$tmp[$i],
$match
)
){
$format = strtolower($match[1]);
if(substr($format, 0, 2) == "x-"){
$format = substr($format, 2);
}
break;
}
}
}
return [
"http" => $http,
"format" => $format,
"headers" => $headers,
"body" => $body
];
break;
default:
return [
"http" => $http,
"headers" => $headers,
"body" => $body
];
break;
}
return;
}
public function stream_linear_image($url, $referer = null){
$this->stream($url, $referer, "image");
}
public function stream_linear_audio($url, $referer = null){
$this->stream($url, $referer, "audio");
}
private function stream($url, $referer, $format){
$this->url = $url;
$this->format = $format;
// sanitize URL (validateurl() returns a boolean, it never throws)
if($this->validateurl($url) === false){
throw new Exception("Invalid or forbidden URL");
}
$this->clientcache();
$curl = curl_init();
// set headers
if($referer === null){
$referer = explode("/", $url, 4);
array_pop($referer);
$referer = implode("/", $referer);
}
switch($format){
case "image":
curl_setopt(
$curl,
CURLOPT_HTTPHEADER,
[
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
"Accept: image/avif,image/webp,*/*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br",
"DNT: 1",
"Connection: keep-alive",
"Referer: {$referer}"
]
);
break;
case "audio":
curl_setopt(
$curl,
CURLOPT_HTTPHEADER,
[
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
"Accept: audio/webm,audio/ogg,audio/wav,audio/*;q=0.9,application/ogg;q=0.7,video/*;q=0.6,*/*;q=0.5",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br",
"DNT: 1",
"Connection: keep-alive",
"Referer: {$referer}"
]
);
break;
}
// follow redirects
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
// set url
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_ENCODING, ""); // default encoding
// timeouts + ssl verification (kept enabled)
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt(
$curl,
CURLOPT_WRITEFUNCTION,
function($c, $data){
if(curl_getinfo($c, CURLINFO_HTTP_CODE) !== 200){
throw new Exception("Server returned a non-200 code");
}
echo $data;
return strlen($data);
}
);
$this->empty_header = false;
$this->cont = false;
$this->headers_tmp = [];
$this->headers = [];
curl_setopt(
$curl,
CURLOPT_HEADERFUNCTION,
function($c, $header){
$head = trim($header);
$len = strlen($head);
if($len === 0){
$this->empty_header = true;
$this->headers_tmp = [];
}else{
$this->empty_header = false;
$this->headers_tmp[] = $head;
}
foreach($this->headers_tmp as $h){
// parse headers
$h = explode(":", $h, 2);
if(count($h) !== 2){
if(curl_getinfo($c, CURLINFO_HTTP_CODE) !== 200){
// not HTTP 200, probably a redirect
$this->cont = false;
}else{
$this->cont = true;
}
// is HTTP 200, just ignore that line
continue;
}
$this->headers[strtolower(trim($h[0]))] = trim($h[1]);
}
if(
$this->cont &&
$this->empty_header
){
// get content type
if(isset($this->headers["content-type"])){
$filetype = explode("/", $this->headers["content-type"]);
if(strtolower($filetype[0]) != $this->format){
throw new Exception("Resource is not an {$this->format} (Found {$filetype[0]} instead)");
}
}else{
throw new Exception("Resource is not an {$this->format} (no Content-Type)");
}
header("Content-Type: {$this->format}/{$filetype[1]}");
// give payload size
if(isset($this->headers["content-length"])){
header("Content-Length: {$this->headers["content-length"]}");
}
// give filename
$this->getfilenameheader($this->headers, $this->url, $filetype[1]);
}
return strlen($header);
}
);
curl_exec($curl);
if(curl_errno($curl)){
throw new Exception(curl_error($curl));
}
curl_close($curl);
}
public function getfilenameheader($headers, $url, $filetype = "jpg"){
// get filename from content-disposition header
if(isset($headers["content-disposition"])){
preg_match(
'/filename=([^;]+)/',
$headers["content-disposition"],
$filename
);
if(isset($filename[1])){
header("Content-Disposition: inline; filename=" . $filename[1] . "." . $filetype);
return;
}
}
// get filename from URL
$filename = parse_url($url, PHP_URL_PATH);
if($filename === null){
// everything failed! rename file to domain name
header("Content-Disposition: inline; filename=" . parse_url($url, PHP_URL_HOST) . "." . $filetype);
return;
}
// remove extension from filename
$filename =
explode(
".",
basename($filename)
);
if(count($filename) > 1){
array_pop($filename);
}
$filename = implode(".", $filename);
header("Content-Disposition: inline; filename=" . $filename . "." . $filetype);
return;
}
public function getimageformat($payload, &$imagick){
$finfo = new finfo(FILEINFO_MIME_TYPE);
$format = $finfo->buffer($payload["body"]);
if($format === false){
if($payload["format"] === false){
header("X-Error: Could not parse format");
// this class has no favicon404(), use its own 404 handler
$this->do404();
}
$format = $payload["format"];
}else{
$format_tmp = explode("/", $format, 2);
if($format_tmp[0] == "image"){
$format_tmp = strtolower($format_tmp[1]);
if(substr($format_tmp, 0, 2) == "x-"){
$format_tmp = substr($format_tmp, 2);
}
$format = $format_tmp;
}
}
switch($format){
case "tiff": $format = "gif"; break;
case "vnd.microsoft.icon": $format = "ico"; break;
case "icon": $format = "ico"; break;
case "svg+xml": $format = "svg"; break;
}
$imagick = new Imagick();
if(
!in_array(
$format,
array_map("strtolower", $imagick->queryFormats())
)
){
// format could not be found, but imagemagick can
// sometimes detect it? shit's fucked
$format = false;
}
return $format;
}
public function clientcache(){
if($this->cache === false){
return;
}
header("Last-Modified: Thu, 01 Oct 1970 00:00:00 GMT");
$headers = getallheaders();
if(
isset($headers["If-Modified-Since"]) ||
isset($headers["If-Unmodified-Since"])
){
http_response_code(304); // 304: Not Modified
die();
}
}
}
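
The header loop in `proxy::get()` is easier to follow on a fixed input. Below is a simplified sketch of the same idea, not the code the class runs: `split_http_response()` is a hypothetical helper, and it assumes a single header block (no `100 Continue` or chained redirect responses in front of it).

```php
<?php
// Hypothetical helper: split a raw response (status line + headers + body)
// into the same three parts that proxy::get() returns.
function split_http_response(string $raw): array {
	// headers end at the first empty line
	$parts = preg_split('/\r?\n\r?\n/', $raw, 2);
	$lines = preg_split('/\r?\n/', $parts[0]);
	$body = $parts[1] ?? "";

	// status line, e.g. "HTTP/1.1 200 OK"
	$status = explode(" ", array_shift($lines), 3);
	$http = [
		"version" => (float)substr($status[0], 5),
		"code" => (int)($status[1] ?? 0)
	];

	$headers = [];
	foreach($lines as $line){
		$header = explode(":", $line, 2);
		// skip malformed header lines
		if(count($header) !== 2){ continue; }
		$headers[strtolower(trim($header[0]))] = trim($header[1]);
	}
	return ["http" => $http, "headers" => $headers, "body" => $body];
}

$res = split_http_response(
	"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\nX-Broken\r\n\r\nPNGDATA"
);
echo $res["http"]["code"] . " " . $res["headers"]["content-type"] . "\n";
```

Header names are lowercased on the way in, which is why the rest of the class can look up `content-type` and `location` without caring how the remote server capitalised them.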

BIN
lib/favicon404.png Normal file

Binary file not shown.

Size: 807 B

1282
lib/frontend.php Normal file

File diff suppressed because it is too large

361
lib/fuckhtml.php Normal file

@ -0,0 +1,361 @@
<?php
class fuckhtml{
public function __construct($html = null, $isfile = false){
if($html !== null){
$this->load($html, $isfile);
}
}
public function load($html, $isfile = false){
if(is_array($html)){
if(!isset($html["innerHTML"])){
throw new Exception("(load) Supplied array doesn't contain an innerHTML index");
}
$html = $html["innerHTML"];
}
if($isfile){
$handle = fopen($html, "r");
$fetch = fread($handle, filesize($html));
fclose($handle);
$this->html = $fetch;
}else{
$this->html = $html;
}
$this->strlen = strlen($this->html);
}
public function getElementsByTagName(string $tagname){
$out = [];
/*
Scrape start of the tag. Example
<div class="mydiv"> ...
*/
if($tagname == "*"){
$tagname = '[^\/<>\s]+';
}else{
// pass the delimiter so a "/" in the tag name cannot end the pattern
$tagname = preg_quote(strtolower($tagname), "/");
}
preg_match_all(
'/<\s*(' . $tagname . ')(\s(?:[^>\'"]*|"[^"]*"|\'[^\']*\')+)?\s*>/i',
/* '/<\s*(' . $tagname . ')(\s[\S\s]*?)?>/i', */
$this->html,
$starting_tags,
PREG_OFFSET_CAPTURE
);
for($i=0; $i<count($starting_tags[0]); $i++){
/*
Parse attributes
*/
$attributes = [];
preg_match_all(
'/([^\/\s\\=]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s]*))?/',
$starting_tags[2][$i][0],
$regex_attributes
);
for($k=0; $k<count($regex_attributes[0]); $k++){
if(trim($regex_attributes[2][$k]) == ""){
$attributes[$regex_attributes[1][$k]] =
"true";
continue;
}
$attributes[$regex_attributes[1][$k]] =
trim($regex_attributes[2][$k], "'\" \n\r\t\v\x00");
}
$out[] = [
"tagName" => strtolower($starting_tags[1][$i][0]),
"startPos" => $starting_tags[0][$i][1],
"endPos" => 0,
"startTag" => $starting_tags[0][$i][0],
"attributes" => $attributes,
"innerHTML" => null
];
}
/*
Get innerHTML
*/
// get closing tag positions
preg_match_all(
'/<\s*\/\s*(' . $tagname . ')\s*>/i',
$this->html,
$regex_closing_tags,
PREG_OFFSET_CAPTURE
);
// merge opening and closing tags together
for($i=0; $i<count($regex_closing_tags[1]); $i++){
$out[] = [
"tagName" => strtolower($regex_closing_tags[1][$i][0]),
"endTag" => $regex_closing_tags[0][$i][0],
"startPos" => $regex_closing_tags[0][$i][1]
];
}
usort(
$out,
function($a, $b){
// usort() expects an integer, returning a boolean is deprecated in PHP 8
return $a["startPos"] <=> $b["startPos"];
}
);
// compute the indent level for each element
$level = [];
$count = count($out);
for($i=0; $i<$count; $i++){
if(!isset($level[$out[$i]["tagName"]])){
$level[$out[$i]["tagName"]] = 0;
}
if(isset($out[$i]["startTag"])){
// encountered starting tag
$level[$out[$i]["tagName"]]++;
$out[$i]["level"] = $level[$out[$i]["tagName"]];
}else{
// encountered closing tag
$out[$i]["level"] = $level[$out[$i]["tagName"]];
$level[$out[$i]["tagName"]]--;
}
}
// if the indent level is the same for a div,
// we encountered _THE_ closing tag
for($i=0; $i<$count; $i++){
if(!isset($out[$i]["startTag"])){
continue;
}
for($k=$i; $k<$count; $k++){
if(
isset($out[$k]["endTag"]) &&
$out[$i]["tagName"] == $out[$k]["tagName"] &&
$out[$i]["level"]
=== $out[$k]["level"]
){
$startlen = strlen($out[$i]["startTag"]);
$endlen = strlen($out[$k]["endTag"]);
$out[$i]["endPos"] = $out[$k]["startPos"] + $endlen;
$out[$i]["innerHTML"] =
substr(
$this->html,
$out[$i]["startPos"] + $startlen,
$out[$k]["startPos"] - ($out[$i]["startPos"] + $startlen)
);
$out[$i]["outerHTML"] =
substr(
$this->html,
$out[$i]["startPos"],
$out[$k]["startPos"] - $out[$i]["startPos"] + $endlen
);
break;
}
}
}
// filter out ending divs
for($i=0; $i<$count; $i++){
if(isset($out[$i]["endTag"])){
unset($out[$i]);
}
unset($out[$i]["startTag"]);
}
return array_values($out);
}
public function getElementsByAttributeName(string $name, $collection = null){
if($collection === null){
$collection = $this->getElementsByTagName("*");
}elseif(is_string($collection)){
$collection = $this->getElementsByTagName($collection);
}
$return = [];
foreach($collection as $elem){
foreach($elem["attributes"] as $attrib_name => $attrib_value){
if($attrib_name == $name){
$return[] = $elem;
continue 2;
}
}
}
return $return;
}
public function getElementsByFuzzyAttributeValue(string $name, string $value, $collection = null){
$elems = $this->getElementsByAttributeName($name, $collection);
$value = explode(" ", $value);
$return = [];
foreach($elems as $elem){
foreach($elem["attributes"] as $attrib_name => $attrib_value){
// only compare the attribute that was asked for
if($attrib_name != $name){ continue; }
$attrib_value = explode(" ", $attrib_value);
$ac = count($attrib_value);
$nc = count($value);
$cr = 0;
for($i=0; $i<$nc; $i++){
for($k=0; $k<$ac; $k++){
if($value[$i] == $attrib_value[$k]){
$cr++;
}
}
}
if($cr === $nc){
$return[] = $elem;
continue 2;
}
}
}
return $return;
}
public function getElementsByAttributeValue(string $name, string $value, $collection = null){
$elems = $this->getElementsByAttributeName($name, $collection);
$return = [];
foreach($elems as $elem){
foreach($elem["attributes"] as $attrib_name => $attrib_value){
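// match on the requested attribute only, not on any attribute with that value
if($attrib_name == $name && $attrib_value == $value){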
$return[] = $elem;
continue 2;
}
}
}
return $return;
}
public function getElementById(string $idname, $collection = null){
$id = $this->getElementsByAttributeValue("id", $idname, $collection);
if(count($id) !== 0){
return $id[0];
}
return false;
}
public function getElementsByClassName(string $classname, $collection = null){
return $this->getElementsByFuzzyAttributeValue("class", $classname, $collection);
}
public function getTextContent($html, $whitespace = false, $trim = true){
if(is_array($html)){
if(!isset($html["innerHTML"])){
throw new Exception("(getTextContent) Supplied array doesn't contain an innerHTML index");
}
$html = $html["innerHTML"];
}
$html =
preg_split('/\n|<\/?br>/i', $html);
$out = "";
for($i=0; $i<count($html); $i++){
$tmp =
html_entity_decode(
strip_tags(
$html[$i]
),
ENT_QUOTES | ENT_XML1, "UTF-8"
);
if($trim){
$tmp = trim($tmp);
}
$out .= $tmp;
if($whitespace === true){
$out .= "\n";
}else{
$out .= " ";
}
}
if($trim){
return trim($out);
}
return $out;
}
}
?>
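
The attribute pass inside `getElementsByTagName()` can be tried on its own. The sketch below reuses the same regular expression and the same `"true"` default for value-less attributes; `parse_attributes()` is a hypothetical standalone function, not a method of the class.

```php
<?php
// Hypothetical standalone version of the attribute pass in
// getElementsByTagName(): same regex, same "true" default for bare attributes.
function parse_attributes(string $raw): array {
	$attributes = [];
	preg_match_all(
		'/([^\/\s\\=]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s]*))?/',
		$raw,
		$match
	);
	for($i=0; $i<count($match[0]); $i++){
		if(trim($match[2][$i]) == ""){
			// attribute without a value, e.g. <input hidden>
			$attributes[$match[1][$i]] = "true";
			continue;
		}
		// strip surrounding quotes and whitespace from the value
		$attributes[$match[1][$i]] = trim($match[2][$i], "'\" \n\r\t\v\x00");
	}
	return $attributes;
}

print_r(parse_attributes(' class="mydiv big" hidden id=x'));
```

Because the value is kept as one string, class lookups still need the space-splitting done in `getElementsByFuzzyAttributeValue()`.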

BIN
lib/img404.png Normal file

Binary file not shown.

Size: 4.4 KiB

106
lib/nextpage.php Normal file

@ -0,0 +1,106 @@
<?php
class nextpage{
public function __construct($scraper){
$this->scraper = $scraper;
}
public function store($payload, $page){
$page = $page[0];
$password = random_bytes(256); // 2048 bit
$salt = random_bytes(16);
$key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true);
$iv =
random_bytes(
openssl_cipher_iv_length("aes-256-gcm")
);
$tag = "";
$out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16);
$key = apcu_inc("key", 1);
apcu_store(
$page . "." .
$this->scraper .
(string)($key),
gzdeflate($salt.$iv.$out.$tag),
420 // cache information for 7 minutes blaze it
);
return
$this->scraper . $key . "." .
rtrim(strtr(base64_encode($password), '+/', '-_'), '=');
}
public function get($npt, $page){
$page = $page[0];
$explode = explode(".", $npt, 2);
if(count($explode) !== 2){
throw new Exception("Malformed nextPageToken!");
}
$apcu = $page . "." . $explode[0];
$key = $explode[1];
$payload = apcu_fetch($apcu);
if($payload === false){
throw new Exception("The nextPageToken is invalid or has expired!");
}
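// undo the URL-safe alphabet; base64_decode() tolerates the stripped "=" padding
// (the old str_pad() call used strlen % 4 as target length, so it never padded)
$key =
base64_decode(
strtr($key, '-_', '+/')
);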
$payload = gzinflate($payload);
$key =
hash_pbkdf2(
"sha512",
$key,
substr($payload, 0, 16), // salt
20000,
32,
true
);
$ivlen = openssl_cipher_iv_length("aes-256-gcm");
$payload =
openssl_decrypt(
substr(
$payload,
16 + $ivlen,
-16
),
"aes-256-gcm",
$key,
OPENSSL_RAW_DATA,
substr($payload, 16, $ivlen),
substr($payload, -16)
);
if($payload === false){
throw new Exception("The nextPageToken is invalid or has expired!");
}
// remove the key after using
apcu_delete($apcu);
return $payload;
}
}
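
The token returned by `store()` carries the password as URL-safe base64 without padding. A minimal round-trip sketch follows, assuming nothing beyond PHP's built-in `base64_encode()` and `base64_decode()`; `base64url_encode()` and `base64url_decode()` are hypothetical helper names, not functions from 4get or PHP.

```php
<?php
// Hypothetical helpers for the URL-safe base64 used in the nextPageToken.
function base64url_encode(string $bin): string {
	return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function base64url_decode(string $text): string {
	$text = strtr($text, '-_', '+/');
	// restore the "=" padding that was stripped on encode
	$text .= str_repeat('=', (4 - strlen($text) % 4) % 4);
	$out = base64_decode($text, true);
	if($out === false){
		throw new Exception("Malformed token");
	}
	return $out;
}

$password = random_bytes(256); // same size as in store()
$token = base64url_encode($password);
var_dump(strlen($token), base64url_decode($token) === $password);
```

Padding by hand only matters when decoding in strict mode, as this sketch does; the non-strict `base64_decode()` call in `get()` accepts unpadded input.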

132
lib/type-todo.php Normal file

@ -0,0 +1,132 @@
public function type($get){
$search = $get["s"];
$bang = $get["bang"];
if(empty($search)){
if(!empty($bang)){
// !youtube
$conn = pg_connect("host=localhost dbname=4get user=postgres password=postgres");
pg_prepare($conn, "bang_get", "SELECT bang,name FROM bangs WHERE bang LIKE $1 ORDER BY bang ASC LIMIT 8");
$q = pg_execute($conn, "bang_get", ["$bang%"]);
$results = [];
while($row = pg_fetch_array($q, null, PGSQL_ASSOC)){
$results[] = [
"s" => "!" . $row["bang"],
"n" => $row["name"]
];
}
return $results;
}else{
// everything is empty
// lets just return a bang list
return [
[
"s" => "!w",
"n" => "Wikipedia",
"u" => "https://en.wikipedia.org/wiki/Special:Search?search={%q%}"
],
[
"s" => "!4ch",
"n" => "4chan Board",
"u" => "https://find.4chan.org/?q={%q%}"
],
[
"s" => "!a",
"n" => "Amazon",
"u" => "https://www.amazon.com/s?k={%q%}"
],
[
"s" => "!e",
"n" => "eBay",
"u" => "https://www.ebay.com/sch/items/?_nkw={%q%}"
],
[
"s" => "!so",
"n" => "Stack Overflow",
"u" => "http://stackoverflow.com/search?q={%q%}"
],
[
"s" => "!gh",
"n" => "GitHub",
"u" => "https://github.com/search?utf8=%E2%9C%93&q={%q%}"
],
[
"s" => "!tw",
"n" => "Twitter",
"u" => "https://twitter.com/search?q={%q%}"
],
[
"s" => "!r",
"n" => "Reddit",
"u" => "https://www.reddit.com/search?q={%q%}"
],
];
}
}
// now we know search isnt empty
if(!empty($bang)){
// check if the bang exists
$conn = pg_connect("host=localhost dbname=4get user=postgres password=postgres");
pg_prepare($conn, "bang_get_single", "SELECT bang,name FROM bangs WHERE bang = $1 LIMIT 1");
$q = pg_execute($conn, "bang_get_single", [$bang]);
$row = pg_fetch_array($q, null, PGSQL_ASSOC);
if(isset($row["bang"])){
$bang = "!$bang ";
}else{
$bang = "";
}
}
try{
$res = $this->get(
"https://duckduckgo.com/ac/",
[
"q" => strtolower($search)
],
ddg::req_xhr
);
$res = json_decode($res, true);
}catch(Exception $e){
throw new Exception("Failed to get /ac/");
}
$arr = [];
for($i=0; $i<count($res); $i++){
if($i === 8){break;}
if(empty($bang)){
$arr[] = [
"s" => $res[$i]["phrase"]
];
}else{
$arr[] = [
"s" => $bang . $res[$i]["phrase"],
"n" => $row["name"]
];
}
}
return $arr;
}

96
news.php Normal file

@ -0,0 +1,96 @@
<?php
/*
Initialize random shit
*/
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters("news");
$get = $frontend->parsegetfilters($_GET, $filters);
$frontend->loadheader(
$get,
$filters,
"news"
);
$payload = [
"class" => "",
"right-left" => "",
"right-right" => "",
"left" => ""
];
try{
$results = $scraper->news($get);
}catch(Exception $error){
echo
$frontend->drawerror(
"Shit",
'This scraper returned an error:' .
'<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' .
'Things you can try:' .
'<ul>' .
'<li>Use a different scraper</li>' .
'<li>Remove keywords that could cause errors</li>' .
'<li>Use another 4get instance</li>' .
'</ul><br>' .
'If the error persists, please <a href="/about">contact the administrator</a>.'
);
die();
}
/*
Populate links
*/
if(count($results["news"]) === 0){
$payload["left"] =
'<div class="infobox">' .
"<h1>Nobody here but us chickens!</h1>" .
'Have you tried:' .
'<ul>' .
'<li>Using a different scraper</li>' .
'<li>Using fewer keywords</li>' .
'<li>Defining broader filters (Is NSFW turned off?)</li>' .
'</ul>' .
'</div>';
}
foreach($results["news"] as $news){
$greentext = [];
if($news["date"] !== null){
$greentext[] = date("jS M y @ g:ia", $news["date"]);
}
if($news["author"] !== null){
$greentext[] = htmlspecialchars($news["author"]);
}
if(count($greentext) !== 0){
$greentext = implode("", $greentext);
}else{
$greentext = null;
}
$n = null;
$payload["left"] .= $frontend->drawtextresult($news, $greentext, $n, $get["s"]);
}
if($results["npt"] !== null){
$payload["left"] .=
'<a href="' . $frontend->htmlnextpage($get, $results["npt"], "news") . '" class="nextpage">Next page &gt;</a>';
}
echo $frontend->load("search.html", $payload);

9
opensearch.xml Normal file

@ -0,0 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>4get</ShortName>
<InputEncoding>UTF-8</InputEncoding>
<Image height="16" width="16">https://4get.ca/favicon.ico</Image>
<Url type="text/html" method="GET" template="https://4get.ca/web?s={searchTerms}"/>
</OpenSearchDescription>

130
proxy.php Normal file

@ -0,0 +1,130 @@
<?php
include "lib/curlproxy.php";
$proxy = new proxy();
if(!isset($_GET["i"])){
header("X-Error: No URL(i) provided!");
$proxy->do404();
die();
}
try{
// original size request, stream file to browser
if(
!isset($_GET["s"]) ||
$_GET["s"] == "original"
){
$proxy->stream_linear_image($_GET["i"]);
die();
}
// bing request, ask bing to resize and stream to browser
if(
preg_match(
// escape the dot and anchor on a label boundary so "notbing.net" does not match
'/(^|\.)bing\.net$/i',
(string)parse_url($_GET["i"], PHP_URL_HOST)
)
){
switch($_GET["s"]){
case "portrait": $req = "&w=50&h=90&p=0&qlt=99"; break;
case "landscape": $req = "&w=160&h=90&p=0&qlt=99"; break;
case "square": $req = "&w=90&h=90&p=0&qlt=99"; break;
case "thumb": $req = "&w=236&h=180&p=0&qlt=99"; break;
case "cover": $req = "&w=207&h=270&p=0&qlt=99"; break;
default: throw new Exception("Invalid size(s) provided!");
}
$proxy->stream_linear_image($_GET["i"] . $req, "https://bing.net");
die();
}
// resize image ourselves
$payload = $proxy->get($_GET["i"], $proxy::req_image, true);
// get image format & set imagick
$image = null;
$format = $proxy->getimageformat($payload, $image);
try{
if($format !== false){
$image->setFormat($format);
}
$image->readImageBlob($payload["body"]);
$image_width = $image->getImageWidth();
$image_height = $image->getImageHeight();
switch($_GET["s"]){
case "portrait":
$width = 50;
$height = 90;
break;
case "landscape":
$width = 160;
$height = 90;
break;
case "square":
$width = 90;
$height = 90;
break;
case "thumb":
$width = 236;
$height = 180;
break;
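case "cover":
$width = 207;
$height = 270;
break;
default:
// unknown size keyword: $width and $height would be undefined below
throw new Exception("Invalid size(s) provided!");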
}
$ratio = $image_width / $image_height;
if($image_width > $width){
$image_width = $width;
$image_height = (int)round($image_width / $ratio);
}
if($image_height > $height){
$ratio = $image_width / $image_height;
$image_height = $height;
// resizeImage() expects whole pixels
$image_width = (int)round($image_height * $ratio);
}
$image->setImageBackgroundColor(new ImagickPixel("#504945"));
$image->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
$image->resizeImage($image_width, $image_height, Imagick::FILTER_LANCZOS, 1);
$image->stripImage();
$image->setFormat("jpeg");
// output format is plain jpeg, so use the matching compression constant
$image->setImageCompression(Imagick::COMPRESSION_JPEG);
$proxy->getfilenameheader($payload["headers"], $_GET["i"]);
header("Content-Type: image/jpeg");
echo $image->getImageBlob();
}catch(ImagickException $error){
header("X-Error: Could not convert the image: (" . $error->getMessage() . ")");
$proxy->do404();
}
}catch(Exception $error){
header("X-Error: " . $error->getMessage());
$proxy->do404();
die();
}
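
The thumbnail size arithmetic in proxy.php can be checked in isolation. This is a sketch of the same two-step clamp (width first, then height), assuming a non-zero source height; `fit_within()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: shrink ($w, $h) to fit inside ($maxw, $maxh),
// keeping the aspect ratio and never enlarging, as proxy.php does inline.
// Assumes $h > 0.
function fit_within(int $w, int $h, int $maxw, int $maxh): array {
	$ratio = $w / $h;
	if($w > $maxw){
		// too wide: clamp the width, derive the height
		$w = $maxw;
		$h = (int)round($w / $ratio);
	}
	if($h > $maxh){
		// still too tall: clamp the height, derive the width
		$ratio = $w / $h;
		$h = $maxh;
		$w = (int)round($h * $ratio);
	}
	return [$w, $h];
}

// a 1920x1080 source requested with s=landscape (160x90 box)
print_r(fit_within(1920, 1080, 160, 90));
```

Images that are already smaller than the box pass through unchanged, so the proxy only ever downsizes.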

28
robots.txt Normal file

@ -0,0 +1,28 @@
# When the robots.txt is sus
# ⠀⠀⠀⡯⡯⡾⠝⠘⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢊⠘⡮⣣⠪⠢⡑⡌
# ⠀⠀⠀⠟⠝⠈⠀⠀⠀⠡⠀⠠⢈⠠⢐⢠⢂⢔⣐⢄⡂⢔⠀⡁⢉⠸⢨⢑⠕⡌
# ⠀⠀⡀⠁⠀⠀⠀⡀⢂⠡⠈⡔⣕⢮⣳⢯⣿⣻⣟⣯⣯⢷⣫⣆⡂⠀⠀⢐⠑⡌
# ⢀⠠⠐⠈⠀⢀⢂⠢⡂⠕⡁⣝⢮⣳⢽⡽⣾⣻⣿⣯⡯⣟⣞⢾⢜⢆⠀⡀⠀⠪
# ⣬⠂⠀⠀⢀⢂⢪⠨⢂⠥⣺⡪⣗⢗⣽⢽⡯⣿⣽⣷⢿⡽⡾⡽⣝⢎⠀⠀⠀⢡
# ⣿⠀⠀⠀⢂⠢⢂⢥⢱⡹⣪⢞⡵⣻⡪⡯⡯⣟⡾⣿⣻⡽⣯⡻⣪⠧⠑⠀⠁⢐
# ⣿⠀⠀⠀⠢⢑⠠⠑⠕⡝⡎⡗⡝⡎⣞⢽⡹⣕⢯⢻⠹⡹⢚⠝⡷⡽⡨⠀⠀⢔
# ⣿⡯⠀⢈⠈⢄⠂⠂⠐⠀⠌⠠⢑⠱⡱⡱⡑⢔⠁⠀⡀⠐⠐⠐⡡⡹⣪⠀⠀⢘
# ⣿⣽⠀⡀⡊⠀⠐⠨⠈⡁⠂⢈⠠⡱⡽⣷⡑⠁⠠⠑⠀⢉⢇⣤⢘⣪⢽⠀⢌⢎
# ⣿⢾⠀⢌⠌⠀⡁⠢⠂⠐⡀⠀⢀⢳⢽⣽⡺⣨⢄⣑⢉⢃⢭⡲⣕⡭⣹⠠⢐⢗
# ⣿⡗⠀⠢⠡⡱⡸⣔⢵⢱⢸⠈⠀⡪⣳⣳⢹⢜⡵⣱⢱⡱⣳⡹⣵⣻⢔⢅⢬⡷
# ⣷⡇⡂⠡⡑⢕⢕⠕⡑⠡⢂⢊⢐⢕⡝⡮⡧⡳⣝⢴⡐⣁⠃⡫⡒⣕⢏⡮⣷⡟
# ⣷⣻⣅⠑⢌⠢⠁⢐⠠⠑⡐⠐⠌⡪⠮⡫⠪⡪⡪⣺⢸⠰⠡⠠⠐⢱⠨⡪⡪⡰
# ⣯⢷⣟⣇⡂⡂⡌⡀⠀⠁⡂⠅⠂⠀⡑⡄⢇⠇⢝⡨⡠⡁⢐⠠⢀⢪⡐⡜⡪⡊
# ⣿⢽⡾⢹⡄⠕⡅⢇⠂⠑⣴⡬⣬⣬⣆⢮⣦⣷⣵⣷⡗⢃⢮⠱⡸⢰⢱⢸⢨⢌
# ⣯⢯⣟⠸⣳⡅⠜⠔⡌⡐⠈⠻⠟⣿⢿⣿⣿⠿⡻⣃⠢⣱⡳⡱⡩⢢⠣⡃⠢⠁
# ⡯⣟⣞⡇⡿⣽⡪⡘⡰⠨⢐⢀⠢⢢⢄⢤⣰⠼⡾⢕⢕⡵⣝⠎⢌⢪⠪⡘⡌⠀
# ⡯⣳⠯⠚⢊⠡⡂⢂⠨⠊⠔⡑⠬⡸⣘⢬⢪⣪⡺⡼⣕⢯⢞⢕⢝⠎⢻⢼⣀⠀
# ⠁⡂⠔⡁⡢⠣⢀⠢⠀⠅⠱⡐⡱⡘⡔⡕⡕⣲⡹⣎⡮⡏⡑⢜⢼⡱⢩⣗⣯⣟
# ⢀⢂⢑⠀⡂⡃⠅⠊⢄⢑⠠⠑⢕⢕⢝⢮⢺⢕⢟⢮⢊⢢⢱⢄⠃⣇⣞⢞⣞⢾
# ⢀⠢⡑⡀⢂⢊⠠⠁⡂⡐⠀⠅⡈⠪⠪⠪⠣⠫⠑⡁⢔⠕⣜⣜⢦⡰⡎⡯⡾⡽
User-agent: *
Disallow:
host: 4get.ca
sitemap: https://4get.ca/sitemap.xml

2287
scraper/brave.php Normal file

File diff suppressed because it is too large

2722
scraper/ddg.php Normal file

File diff suppressed because it is too large

1562
scraper/google.php Normal file

File diff suppressed because it is too large

242
scraper/marginalia.php Normal file

@ -0,0 +1,242 @@
<?php
class marginalia{
public function __construct(){
$this->key = "public";
}
public function getfilters($page){
switch($page){
case "web":
return [
"profile" => [
"display" => "Profile",
"option" => [
"any" => "Default",
"modern" => "Modern"
]
],
"format" => [
"display" => "Format",
"option" => [
"any" => "Any",
"html5" => "html5",
"xhtml" => "xhtml",
"html123" => "html123"
]
],
"file" => [
"display" => "File",
"option" => [
"any" => "Any",
"nomedia" => "Deny media",
"media" => "Contains media",
"audio" => "Contains audio",
"video" => "Contains video",
"archive" => "Contains archive",
"document" => "Contains document"
]
],
"javascript" => [
"display" => "Javascript",
"option" => [
"any" => "Allow JS",
"deny" => "Deny JS",
"require" => "Require JS"
]
],
"trackers" => [
"display" => "Trackers",
"option" => [
"any" => "Allow trackers",
"deny" => "Deny trackers",
"require" => "Require trackers"
]
],
"cookies" => [
"display" => "Cookies",
"option" => [
"any" => "Allow cookies",
"deny" => "Deny cookies",
"require" => "Require cookies"
]
],
"affiliate" => [
"display" => "Affiliate links in body",
"option" => [
"any" => "Allow affiliate links",
"deny" => "Deny affiliate links",
"require" => "Require affiliate links"
]
]
];
}
}
private function get($url, $get = []){
$headers = [
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"
];
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function web($get){
$search = [$get["s"]];
$profile = $get["profile"];
$format = $get["format"];
$file = $get["file"];
foreach(
[
"javascript" => $get["javascript"],
"trackers" => $get["trackers"],
"cookies" => $get["cookies"],
"affiliate" => $get["affiliate"]
]
as $key => $value
){
if($value == "any"){ continue; }
switch($key){
case "javascript": $str = "js:true"; break;
case "trackers": $str = "special:tracking"; break;
case "cookies": $str = "special:cookies"; break;
case "affiliate": $str = "special:affiliate"; break;
}
if($value == "deny"){
$str = "-" . $str;
}
$search[] = $str;
}
if($format != "any"){
$search[] = "format:$format";
}
switch($file){
case "any": break;
case "nomedia": $search[] = "-special:media"; break;
case "media": $search[] = "special:media"; break;
default:
$search[] = "file:$file";
}
$search = implode(" ", $search);
$params = [
"count" => 20
];
if($profile == "modern"){
$params["index"] = 1;
}
try{
$json =
$this->get(
"https://api.marginalia.nu/{$this->key}/search/" . urlencode($search),
$params
);
}catch(Exception $error){
throw new Exception("Failed to get JSON");
}
if($json == "Slow down"){
throw new Exception("The API key used is rate limited. Please try again in a few minutes.");
}
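$json = json_decode($json, true);
// bail out early instead of iterating over a missing "results" key
if(
$json === null ||
!isset($json["results"])
){
throw new Exception("Failed to decode JSON");
}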
/*
$handle = fopen("scraper/marginalia.json", "r");
$json = json_decode(fread($handle, filesize("scraper/marginalia.json")), true);
fclose($handle);*/
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => null,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
foreach($json["results"] as $result){
$out["web"][] = [
"title" => $result["title"],
"description" => str_replace("\n", " ", $result["description"]),
"url" => $result["url"],
"date" => null,
"type" => "web",
"thumb" => [
"url" => null,
"ratio" => null
],
"sublink" => [],
"table" => []
];
}
return $out;
}
}
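
The query-building part of `web()` above can be tried on its own. The following is a minimal sketch, assuming a hypothetical helper named `build_marginalia_query()` that is not part of 4get; it only mirrors the filter handling shown above, and the `html5` format value is an illustrative input rather than a confirmed filter option.

```php
<?php
// Hypothetical helper, not part of 4get: mirrors the query-building
// steps of marginalia::web() so they can be run in isolation.
function build_marginalia_query(array $get): string{
	$search = [$get["s"]];

	// Filter name => search operator, as used in web()
	$operators = [
		"javascript" => "js:true",
		"trackers" => "special:tracking",
		"cookies" => "special:cookies",
		"affiliate" => "special:affiliate"
	];

	foreach($operators as $key => $operator){
		$value = $get[$key] ?? "any";
		if($value == "any"){ continue; }
		// "deny" negates the operator, "require" keeps it as-is
		$search[] = ($value == "deny" ? "-" : "") . $operator;
	}

	$format = $get["format"] ?? "any";
	if($format != "any"){
		$search[] = "format:$format";
	}

	$file = $get["file"] ?? "any";
	switch($file){
		case "any": break;
		case "nomedia": $search[] = "-special:media"; break;
		case "media": $search[] = "special:media"; break;
		default: $search[] = "file:$file";
	}

	return implode(" ", $search);
}

echo build_marginalia_query([
	"s" => "plan9",
	"javascript" => "deny",
	"trackers" => "any",
	"cookies" => "require",
	"affiliate" => "any",
	"format" => "html5",
	"file" => "nomedia"
]) . PHP_EOL;
```

Keeping the operators in a lookup table instead of a `switch` is only a readability choice for the sketch; the resulting string is what gets URL-encoded into the API path.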

1182
scraper/mojeek.php Normal file

File diff suppressed because it is too large

244
scraper/wiby.php Normal file

@ -0,0 +1,244 @@
<?php
class wiby{
public function __construct(){
include "lib/nextpage.php";
$this->nextpage = new nextpage("wiby");
}
public function getfilters($page){
if($page != "web"){
return [];
}
return [
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes",
"no" => "No"
]
],
"date" => [
"display" => "Time posted",
"option" => [
"any" => "Any time",
"day" => "Past day",
"week" => "Past week",
"month" => "Past month",
"year" => "Past year",
]
]
];
}
private function get($url, $get, $nsfw){
$curlproc = curl_init();
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"Cookie: ws={$nsfw}",
"DNT: 1",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: none",
"Sec-Fetch-User: ?1"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function web($get){
if($get["npt"]){
$q =
json_decode(
$this->nextpage->get($get["npt"], "web"),
true
);
$nsfw = $q["nsfw"];
unset($q["nsfw"]);
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$date = $get["date"];
$nsfw = $get["nsfw"] == "yes" ? "0" : "1";
$search =
str_replace(
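[
// longer bangs first: str_replace() applies the needles in order,
// so "!g" would otherwise turn "!gi" into a stray "i"
"!gi",
"!gv",
"!gm",
"!g",
"!bi",
"!bv",
"!bm",
"!b",
"!td",
"!tw",
"!tm",
"!ty",
"&gi",
"&gv",
"&gm",
"&g",
"&bi",
"&bv",
"&bm",
"&b",
"&td",
"&tw",
"&tm",
"&ty",
],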
"",
$search
);
switch($date){
case "day": $search = "!td " . $search; break;
case "week": $search = "!tw " . $search; break;
case "month": $search = "!tm " . $search; break;
case "year": $search = "!ty " . $search; break;
}
$q = [
"q" => $search
];
}
try{
$html = $this->get(
"https://wiby.me/",
$q,
$nsfw
);
}catch(Exception $error){
throw new Exception("Failed to fetch search page");
}
preg_match(
'/<p class="pin"><blockquote>(?:<\/p>)?<br><a class="more" href="\/\?q=[^"]+&p=([0-9]+)">Find more\.\.\.<\/a><\/blockquote>/',
$html,
$nextpage
);
if(count($nextpage) === 0){
$nextpage = null;
}else{
$nextpage =
$this->nextpage->store(
json_encode([
"q" => $q["q"],
"p" => (int)$nextpage[1],
"nsfw" => $nsfw
]),
"web"
);
}
$out = [
"status" => "ok",
"spelling" => [
"type" => "no_correction",
"using" => null,
"correction" => null
],
"npt" => $nextpage,
"answer" => [],
"web" => [],
"image" => [],
"video" => [],
"news" => [],
"related" => []
];
preg_match_all(
'/<blockquote>[\s]*<a .* href="(.*)">(.*)<\/a>.*<p>(.*)<\/p>[\s]*<\/blockquote>/Ui',
$html,
$links
);
for($i=0; $i<count($links[0]); $i++){
$out["web"][] = [
"title" => $this->unescapehtml(trim($links[2][$i])),
"description" => $this->unescapehtml(trim(strip_tags($links[3][$i]))),
"url" => trim($links[1][$i]),
"date" => null,
"type" => "web",
"thumb" => [
"url" => null,
"ratio" => null
],
"sublink" => [],
"table" => []
];
}
return $out;
}
private function unescapehtml($str){
return html_entity_decode(
str_replace(
[
"<br>",
"<br/>",
"</br>",
"<BR>",
"<BR/>",
"</BR>",
],
"\n",
$str
),
ENT_QUOTES | ENT_XML1, 'UTF-8'
);
}
}
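
The two regular expressions in `web()` above are easier to follow when run against a small input. This is a sketch only: the HTML fragment below is made up to fit the shape the patterns expect, and real wiby.me markup may differ.

```php
<?php
// Made-up fragment shaped like the markup the two patterns expect;
// real wiby.me output may differ.
$html =
	'<blockquote><a class="tlink" href="http://example.com/">Example &amp; co</a><br>' .
	'<p>A small page about nothing</p></blockquote>' .
	'<p class="pin"><blockquote></p><br>' .
	'<a class="more" href="/?q=example&p=2">Find more...</a></blockquote>';

// Result entries: capture URL, title and description
preg_match_all(
	'/<blockquote>[\s]*<a .* href="(.*)">(.*)<\/a>.*<p>(.*)<\/p>[\s]*<\/blockquote>/Ui',
	$html,
	$links
);

// "Find more..." link: capture the next page number
preg_match(
	'/<p class="pin"><blockquote>(?:<\/p>)?<br><a class="more" href="\/\?q=[^"]+&p=([0-9]+)">Find more\.\.\.<\/a><\/blockquote>/',
	$html,
	$nextpage
);

// Same entity decoding flags as unescapehtml()
$title = html_entity_decode(trim($links[2][0]), ENT_QUOTES | ENT_XML1, "UTF-8");
$page = (int)$nextpage[1];

echo $links[1][0] . " | " . $title . " | next page: " . $page . PHP_EOL;
```

The `U` modifier makes every quantifier lazy, which is why `(.*)` stops at the first closing quote or tag instead of swallowing the rest of the page.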

530
scraper/yandex.php Normal file

@ -0,0 +1,530 @@
<?php
class yandex{
/*
curl functions
*/
public function __construct(){
include "lib/fuckhtml.php";
$this->fuckhtml = new fuckhtml();
include "lib/nextpage.php";
$this->nextpage = new nextpage("yandex");
}
private function get($url, $get, $nsfw){
$curlproc = curl_init();
// encode the query before it is placed in the Referer header
$search = isset($get["text"]) ? urlencode($get["text"]) : "";
if($get !== []){
$get = http_build_query($get);
$url .= "?" . $get;
}
curl_setopt($curlproc, CURLOPT_URL, $url);
switch($nsfw){
case "yes": $nsfw = "0"; break;
case "maybe": $nsfw = "1"; break;
case "no": $nsfw = "2"; break;
}
$headers =
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Encoding: gzip",
"Accept-Language: en-US,en;q=0.5",
"DNT: 1",
"Cookie: yp=1716337604.sp.family%3A{$nsfw}#1685406411.szm.1:1920x1080:1920x999",
"Referer: https://yandex.com/images/search?text={$search}",
"Connection: keep-alive",
"Upgrade-Insecure-Requests: 1",
"Sec-Fetch-Dest: document",
"Sec-Fetch-Mode: navigate",
"Sec-Fetch-Site: cross-site"];
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
}
public function getfilters($pagetype){
switch($pagetype){
case "images":
return
[
"nsfw" => [
"display" => "NSFW",
"option" => [
"yes" => "Yes",
"maybe" => "Maybe",
"no" => "No"
]
],
"time" => [
"display" => "Time posted",
"option" => [
"any" => "Any time",
"week" => "Last week"
]
],
"size" => [
"display" => "Size",
"option" => [
"any" => "Any size",
"small" => "Small",
"medium" => "Medium",
"large" => "Large",
"wallpaper" => "Wallpaper"
]
],
"color" => [
"display" => "Colors",
"option" => [
"any" => "All colors",
"color" => "Color images only",
"gray" => "Black and white",
"red" => "Red",
"orange" => "Orange",
"yellow" => "Yellow",
"cyan" => "Cyan",
"green" => "Green",
"blue" => "Blue",
"violet" => "Purple",
"white" => "White",
"black" => "Black"
]
],
"type" => [
"display" => "Type",
"option" => [
"any" => "All types",
"photo" => "Photos",
"clipart" => "White background",
"lineart" => "Drawings and sketches",
"face" => "People",
"demotivator" => "Demotivators"
]
],
"layout" => [
"display" => "Layout",
"option" => [
"any" => "All layouts",
"horizontal" => "Horizontal",
"vertical" => "Vertical",
"square" => "Square"
]
],
"format" => [
"display" => "Format",
"option" => [
"any" => "Any format",
"jpeg" => "JPEG",
"png" => "PNG",
"gif" => "GIF"
]
]
];
break;
default:
return [];
break;
}
}
public function image($get){
if($get["npt"]){
$request =
json_decode(
$this->nextpage->get(
$get["npt"],
"images"
),
true
);
$nsfw = $request["nsfw"];
unset($request["nsfw"]);
}else{
$search = $get["s"];
if(strlen($search) === 0){
throw new Exception("Search term is empty!");
}
$nsfw = $get["nsfw"];
$time = $get["time"];
$size = $get["size"];
$color = $get["color"];
$type = $get["type"];
$layout = $get["layout"];
$format = $get["format"];
/*
$handle = fopen("scraper/yandex.json", "r");
$json = fread($handle, filesize("scraper/yandex.json"));
fclose($handle);*/
// SIZE
// large
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&isize=large&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// medium
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&isize=medium&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// small
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&isize=small&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// ORIENTATION
// Horizontal
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&iorient=horizontal&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Vertical
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&iorient=vertical&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Square
// 227.0=1;203.0=1;76fe94.0=1;41d251.0=1;75.0=1;371.0=1;291.0=1;307.0=1;f797ee.0=1;1cf7c2.0=1;deca32.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&iorient=square&suggest_reqid=486139416166165501540886508227485&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// TYPE
// Photos
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&text=minecraft&type=photo&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// White background
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&text=minecraft&type=clipart&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Drawings and sketches
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&text=minecraft&type=lineart&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// People
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&text=minecraft&type=face&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Demotivators
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&text=minecraft&type=demotivator&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// COLOR
// Color images only
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=color&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Black and white
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=gray&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Red
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=red&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Orange
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=orange&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Yellow
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=yellow&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Cyan
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=cyan&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Green
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=green&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Blue
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=blue&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Purple
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=violet&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// White
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=white&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// Black
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&icolor=black&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// FORMAT
// jpeg
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&itype=jpg&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// png
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&itype=png&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// gif
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&itype=gifan&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// RECENT
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&recent=7D&text=minecraft&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
// WALLPAPER
// 307.0=1;371.0=1;291.0=1;203.0=1;deca32.0=1;f797ee.0=1;1cf7c2.0=1;41d251.0=1;267.0=1;bde197.0=1"},"extraContent":{"names":["i-react-ajax-adapter"]}}}&yu=4861394161661655015&isize=wallpaper&text=minecraft&wp=wh16x9_1920x1080&uinfo=sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080
$request = [
"format" => "json",
"request" => [
"blocks" => [
[
"block" => "extra-content",
"params" => (object)[],
"version" => 2
],
[
"block" => "i-global__params:ajax",
"params" => (object)[],
"version" => 2
],
[
"block" => "search2:ajax",
"params" => (object)[],
"version" => 2
],
[
"block" => "preview__isWallpaper",
"params" => (object)[],
"version" => 2
],
[
"block" => "content_type_search",
"params" => (object)[],
"version" => 2
],
[
"block" => "serp-controller",
"params" => (object)[],
"version" => 2
],
[
"block" => "cookies_ajax",
"params" => (object)[],
"version" => 2
],
[
"block" => "advanced-search-block",
"params" => (object)[],
"version" => 2
]
],
"metadata" => [
"bundles" => [
"lb" => "AS?(E<X120"
],
"assets" => [
// las base
"las" => "justifier-height=1;justifier-setheight=1;fitimages-height=1;justifier-fitincuts=1;react-with-dom=1;"
// las default
//"las" => "justifier-height=1;justifier-setheight=1;fitimages-height=1;justifier-fitincuts=1;react-with-dom=1;227.0=1;203.0=1;76fe94.0=1;215f96.0=1;75.0=1"
],
"extraContent" => [
"names" => [
"i-react-ajax-adapter"
]
]
]
]
];
/*
Apply filters
*/
if($time == "week"){
$request["recent"] = "7D";
}
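if($size != "any"){
$request["isize"] = $size;
if($size == "wallpaper"){
// the captured wallpaper request in the notes above also carries "wp"
$request["wp"] = "wh16x9_1920x1080";
}
}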
if($type != "any"){
$request["type"] = $type;
}
if($color != "any"){
$request["icolor"] = $color;
}
if($layout != "any"){
$request["iorient"] = $layout;
}
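if($format != "any"){
// Yandex's itype values differ from our filter names (see the FORMAT notes above)
switch($format){
case "jpeg": $request["itype"] = "jpg"; break;
case "gif": $request["itype"] = "gifan"; break;
default: $request["itype"] = $format;
}
}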
$request["text"] = $search;
$request["uinfo"] = "sw-1920-sh-1080-ww-1125-wh-999-pd-1-wp-16x9_1920x1080";
$request["request"] = json_encode($request["request"]);
}
try{
$json = $this->get(
"https://yandex.com/images/search",
$request,
$nsfw
);
}catch(Exception $err){
throw new Exception("Failed to get JSON");
}
/*
$handle = fopen("scraper/yandex.json", "r");
$json = fread($handle, filesize("scraper/yandex.json"));
fclose($handle);*/
$json = json_decode($json, true);
if(
isset($json["type"]) &&
$json["type"] == "captcha"
){
throw new Exception("Yandex blocked this 4get instance. Yandex blocks don't last very long, but the block timer gets reset every time you make another unsuccessful request. Please try again in ~7 minutes.");
}
if($json === null){
throw new Exception("Failed to decode JSON");
}
// get html
$html = "";
foreach($json["blocks"] as $block){
$html .= $block["html"];
}
$this->fuckhtml->load($html);
$div = $this->fuckhtml->getElementsByTagName("div");
$out = [
"status" => "ok",
"npt" => null,
"image" => []
];
// check for next page
if(
count(
$this->fuckhtml
->getElementsByClassName(
"more more_direction_next",
$div
)
) !== 0
){
$request["nsfw"] = $nsfw;
if(isset($request["p"])){
$request["p"]++;
}else{
$request["p"] = 1;
}
$out["npt"] = $this->nextpage->store(json_encode($request), "images");
}
// get search results
foreach(
$this->fuckhtml
->getElementsByClassName(
"serp-item serp-item_type_search",
$div
)
as $image
){
$image =
json_decode(
$image
["attributes"]
["data-bem"],
true
)["serp-item"];
$title = [html_entity_decode($image["snippet"]["title"], ENT_QUOTES | ENT_HTML5)];
if(isset($image["snippet"]["text"])){
$title[] = html_entity_decode($image["snippet"]["text"], ENT_QUOTES | ENT_HTML5);
}
$tmp = [
"title" =>
$this->fuckhtml
->getTextContent(
$this->titledots(
implode(": ", $title)
)
),
"source" => [],
"url" => htmlspecialchars_decode($image["snippet"]["url"])
];
foreach($image["dups"] as $dup){
$tmp["source"][] = [
"url" => htmlspecialchars_decode($dup["url"]),
"width" => (int)$dup["w"],
"height" => (int)$dup["h"],
];
}
$tmp["source"][] = [
"url" =>
preg_replace(
'/^\/\//',
"https://",
htmlspecialchars_decode($image["thumb"]["url"])
),
"width" => (int)$image["thumb"]["size"]["width"],
"height" => (int)$image["thumb"]["size"]["height"]
];
$out["image"][] = $tmp;
}
return $out;
}
private function titledots($title){
$substr = substr($title, -3);
if(
$substr == "..." ||
$substr == "…" // U+2026 is 3 bytes in UTF-8, same length as "..."
){
return trim(substr($title, 0, -3));
}
return trim($title);
}
}
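
The way `image()` above turns one result into a list of sources can be sketched in isolation. The sample array below is hypothetical: its field names follow what `image()` reads from the decoded `serp-item` data, but the values are made up.

```php
<?php
// Hypothetical sample of one decoded "serp-item" entry; field names
// follow what image() reads, the values are invented.
$image = [
	"snippet" => [
		"title" => "Minecraft &amp; friends",
		"url" => "https://example.com/page?a=1&amp;b=2"
	],
	"dups" => [
		["url" => "https://example.com/full.png", "w" => "1920", "h" => "1080"]
	],
	"thumb" => [
		"url" => "//thumbs.example.com/t.jpg?x=1&amp;y=2",
		"size" => ["width" => 320, "height" => 180]
	]
];

$sources = [];

// Full-size duplicates come first
foreach($image["dups"] as $dup){
	$sources[] = [
		"url" => htmlspecialchars_decode($dup["url"]),
		"width" => (int)$dup["w"],
		"height" => (int)$dup["h"]
	];
}

// The thumbnail goes last; a protocol-relative "//host/..." URL gets https://
$sources[] = [
	"url" => preg_replace(
		'/^\/\//',
		"https://",
		htmlspecialchars_decode($image["thumb"]["url"])
	),
	"width" => (int)$image["thumb"]["size"]["width"],
	"height" => (int)$image["thumb"]["size"]["height"]
];

print_r($sources);
```

Appending the thumbnail after the duplicates means the first entry is a full-size candidate whenever Yandex reports one.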

1723
scraper/youtube.php Normal file

File diff suppressed because it is too large

316
settings.php Normal file

@ -0,0 +1,316 @@
<?php
/*
Define settings
*/
$settings = [
[
"name" => "General",
"settings" => [
[
"description" => "Allow NSFW content",
"parameter" => "nsfw",
"options" => [
[
"value" => "yes",
"text" => "Yes"
],
[
"value" => "maybe",
"text" => "Maybe"
],
[
"value" => "no",
"text" => "No"
]
]
],
[
"description" => "Theme",
"parameter" => "theme",
"options" => [
[
"value" => "dark",
"text" => "Gruvbox dark"
],
[
"value" => "cream",
"text" => "Gruvbox cream"
]
]
],
[
"description" => "Prevent clicking background elements when image viewer is open",
"parameter" => "bg_noclick",
"options" => [
[
"value" => "no",
"text" => "No"
],
[
"value" => "yes",
"text" => "Yes"
]
]
]
]
],
[
"name" => "Scrapers to use",
"settings" => [
[
"description" => "Web",
"parameter" => "scraper_web",
"options" => [
[
"value" => "ddg",
"text" => "DuckDuckGo"
],
[
"value" => "brave",
"text" => "Brave"
],
[
"value" => "google",
"text" => "Google"
],
[
"value" => "mojeek",
"text" => "Mojeek"
],
[
"value" => "marginalia",
"text" => "Marginalia"
],
[
"value" => "wiby",
"text" => "wiby"
]
]
],
[
"description" => "Images",
"parameter" => "scraper_images",
"options" => [
[
"value" => "ddg",
"text" => "DuckDuckGo"
],
[
"value" => "yandex",
"text" => "Yandex"
],
[
"value" => "google",
"text" => "Google"
]
]
],
[
"description" => "Videos",
"parameter" => "scraper_videos",
"options" => [
[
"value" => "yt",
"text" => "YouTube"
],
[
"value" => "ddg",
"text" => "DuckDuckGo"
],
[
"value" => "google",
"text" => "Google"
]
]
],
[
"description" => "News",
"parameter" => "scraper_news",
"options" => [
[
"value" => "ddg",
"text" => "DuckDuckGo"
],
[
"value" => "brave",
"text" => "Brave"
],
[
"value" => "google",
"text" => "Google"
],
[
"value" => "mojeek",
"text" => "Mojeek"
]
]
]
]
]
];
/*
Set cookies
*/
if($_POST){
$loop = &$_POST;
}else{
// refresh cookie dates
$loop = &$_COOKIE;
}
foreach($loop as $key => $value){
foreach($settings as $title){
foreach($title["settings"] as $list){
if(
$list["parameter"] == $key &&
$list["options"][0]["value"] == $value
){
unset($_COOKIE[$key]);
setcookie(
$key,
"",
[
"expires" => -1, // removes cookie
"samesite" => "Strict"
]
);
continue 3;
}
}
}
if(!is_string($value)){
continue;
}
$key = trim($key);
$value = trim($value);
$_COOKIE[$key] = $value;
setcookie(
$key,
$value,
[
"expires" => strtotime("+400 days"), // maximum cookie lifetime accepted by Chrome
"samesite" => "Strict"
]
);
}
include "lib/frontend.php";
$frontend = new frontend();
echo
'<!DOCTYPE html>' .
'<html lang="en">' .
'<head>' .
'<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' .
'<title>Settings</title>' .
'<link rel="stylesheet" href="/static/style.css">' .
'<meta name="viewport" content="width=device-width,initial-scale=1">' .
'<meta name="robots" content="index,follow">' .
'<link rel="icon" type="image/x-icon" href="/favicon.ico">' .
'<meta name="description" content="4get.ca: Settings">' .
'<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">' .
'</head>' .
'<body' . $frontend->getthemeclass() . '>';
$left =
'<h1>Settings</h1>' .
'<form method="post" autocomplete="off">' .
'By clicking <div class="code-inline">Update settings!</div>, a plaintext <div class="code-inline">key=value</div> cookie will be stored in your browser. When you select a default setting, the parameter is removed from your cookies.';
$c = count($_COOKIE);
if($c !== 0){
$left .=
'<br><br>Your current cookie looks like this:' .
'<div class="code">';
$code = "";
$ca = 0;
foreach($_COOKIE as $key => $value){
$code .= $key . "=" . $value;
$ca++;
if($ca !== $c){
$code .= "; ";
}
}
$left .= $frontend->highlightcode($code);
$left .= '</div>';
}else{
$left .=
'<br><br>You currently don\'t have any cookies set.';
}
$left .=
'<div class="settings">';
foreach($settings as $title){
$left .= '<h2>' . $title["name"] . '</h2>';
foreach($title["settings"] as $setting){
$left .=
'<div class="setting">' .
'<div class="title">' . $setting["description"] . '</div>' .
'<select name="' . $setting["parameter"] . '">';
foreach($setting["options"] as $option){
$left .=
'<option value="' . $option["value"] . '"';
if(
isset($_COOKIE[$setting["parameter"]]) &&
$_COOKIE[$setting["parameter"]] == $option["value"]
){
$left .= ' selected';
}
$left .= '>' . $option["text"] . '</option>';
}
$left .= '</select></div>';
}
}
$left .=
'</div>' .
'<div class="settings-submit">' .
'<input type="submit" value="Update settings!">' .
'<a href="../">&lt; Return to main page</a>' .
'</div>' .
'</form>';
echo
$frontend->load(
"search.html",
[
"class" => "",
"right-left" => "",
"right-right" => "",
"left" => $left
]
);
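
The cookie loop near the top of this file decides, for each submitted key, whether to remove or store a cookie. Here is a minimal sketch of that decision as a pure function, assuming a hypothetical name `plan_cookie_updates()` and a trimmed-down settings array; it returns a list of actions instead of calling `setcookie()`.

```php
<?php
// Hypothetical helper, not part of 4get: same decision logic as the
// cookie loop, but it returns the planned actions instead of sending headers.
function plan_cookie_updates(array $settings, array $input): array{
	$plan = [];

	foreach($input as $key => $value){
		foreach($settings as $title){
			foreach($title["settings"] as $list){
				if(
					$list["parameter"] == $key &&
					$list["options"][0]["value"] == $value
				){
					// first option is the default: drop the cookie
					$plan[] = ["remove", $key];
					continue 3;
				}
			}
		}

		if(!is_string($value)){
			continue;
		}

		// anything else is stored as-is (after trimming)
		$plan[] = ["set", trim($key), trim($value)];
	}

	return $plan;
}

// Trimmed-down settings with two parameters only
$settings = [
	[
		"name" => "General",
		"settings" => [
			[
				"parameter" => "nsfw",
				"options" => [["value" => "yes"], ["value" => "maybe"], ["value" => "no"]]
			],
			[
				"parameter" => "theme",
				"options" => [["value" => "dark"], ["value" => "cream"]]
			]
		]
	]
];

print_r(plan_cookie_updates($settings, ["theme" => "cream", "nsfw" => "yes"]));
```

Like the loop it mirrors, the sketch stores keys that do not match any known parameter; restricting keys to the ones listed in `$settings` would be a separate design decision.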

19
sitemap.xml Normal file

@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://4get.ca</loc>
<lastmod>2023-07-31T07:56:12+03:00</lastmod>
</url>
<url>
<loc>https://4get.ca/about</loc>
<lastmod>2023-07-31T07:56:12+03:00</lastmod>
</url>
<url>
<loc>https://4get.ca/settings</loc>
<lastmod>2023-07-31T07:56:12+03:00</lastmod>
</url>
<url>
<loc>https://4get.ca/api.txt</loc>
<lastmod>2023-07-31T07:56:12+03:00</lastmod>
</url>
</urlset>

682
static/client.js Normal file

@ -0,0 +1,682 @@
/*
Global functions
*/
function htmlspecialchars(str){
var map = {
'&': '&amp;',
'<': '&lt;',
'>': '&gt;',
'"': '&quot;',
"'": '&#039;'
}
return str.replace(/[&<>"']/g, function(m){return map[m];});
}
function htmlspecialchars_decode(str){
var map = {
'&amp;': '&',
'&lt;': '<',
'&gt;': '>',
'&quot;': '"',
'&#039;': "'"
}
return str.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function(m){return map[m];});
}
function is_click_within(elem, classname, is_id = false){
while(true){
if(elem === null){
return false;
}
if(
(
is_id === false &&
elem.className == classname
) ||
(
is_id === true &&
elem.id == classname
)
){
return elem;
}
elem = elem.parentElement;
}
}
/*
Prevent GET parameter pollution
*/
var form = document.getElementsByTagName("form");
if(
form.length !== 0 &&
window.location.pathname != "/" &&
window.location.pathname != "/settings.php" &&
window.location.pathname != "/settings"
){
form = form[0];
var scraper_dropdown = document.getElementsByName("scraper")[0];
scraper_dropdown.addEventListener("change", function(choice){
submit(form);
});
form.addEventListener("submit", function(e){
e.preventDefault();
submit(e.target); // srcElement is a deprecated alias of target
});
}
function submit(e){
var GET = "";
var first = true;
var s = document.getElementsByName("s");
if(s.length !== 0){
GET += "?s=" + encodeURIComponent(s[0].value).replaceAll("%20", "+");
first = false;
}
Array.from(
e.getElementsByTagName("select")
).concat(
Array.from(
e.getElementsByTagName("input")
)
).forEach(function(el){
var firstelem = el.getElementsByTagName("option");
if(
(
(
firstelem.length === 0 ||
firstelem[0].value != el.value
) &&
el.name != "" &&
el.value != "" &&
el.name != "s"
) ||
el.name == "scraper" ||
el.name == "nsfw"
){
if(first){
GET += "?";
first = false;
}else{
GET += "&";
}
GET += encodeURIComponent(el.name).replaceAll("%20", "+") + "=" + encodeURIComponent(el.value).replaceAll("%20", "+");
}
});
window.location.href = GET;
}
/*
Hide show more button when it's not needed on answers
*/
var answer_div = document.getElementsByClassName("answer");
if(answer_div.length !== 0){
answer_div = Array.from(answer_div);
var spoiler_button_div = Array.from(document.getElementsByClassName("spoiler-button"));
// execute on pageload
hide_show_more();
window.addEventListener("resize", hide_show_more);
function hide_show_more(){
var height = window.innerWidth >= 1000 ? 600 : 200;
for(var i=0; i<answer_div.length; i++){
if(answer_div[i].scrollHeight < height){
spoiler_button_div[i].style.display = "none";
document.getElementById(spoiler_button_div[i].htmlFor).checked = true;
}else{
spoiler_button_div[i].style.display = "block";
}
}
}
}
switch(document.location.pathname){
case "/web":
case "/web.php":
var image_class = "image";
break;
case "/images":
case "/images.php":
var image_class = "thumb";
break;
default:
var image_class = null;
}
if(image_class !== null){
/*
Add popup to document
*/
var popup_bg = document.createElement("div");
popup_bg.id = "popup-bg";
document.body.appendChild(popup_bg);
// enable/disable pointer events
if(!document.cookie.includes("bg_noclick=yes")){
popup_bg.style.pointerEvents = "none";
}
var popup_status = document.createElement("div");
popup_status.id = "popup-status";
document.body.appendChild(popup_status);
var popup_body = document.createElement("div");
popup_body.id = "popup";
document.body.appendChild(popup_body);
// import popup
var popup_body = document.getElementById("popup");
var popup_status = document.getElementById("popup-status");
var popup_image = null; // is set later on popup click
// image metadata
var collection = []; // will contain width, height, image URL
var collection_index = 0;
// event handling helper variables
var is_popup_shown = false;
var mouse_down = false;
var mouse_move = false;
var move_x = 0;
var move_y = 0;
var target_is_popup = false;
var mirror_x = false;
var mirror_y = false;
var rotation = 0;
var scale = 60; // viewer zoom level, reset whenever a popup opens
/*
Image dragging (mousedown)
*/
document.addEventListener("mousedown", function(div){
if(div.buttons !== 1){
return;
}
mouse_down = true;
mouse_move = false;
if(is_click_within(div.target, "popup", true) === false){
target_is_popup = false;
}else{
target_is_popup = true;
var pos = popup_body.getBoundingClientRect();
move_x = div.x - pos.x;
move_y = div.y - pos.y;
}
});
/*
Image dragging (mousemove)
*/
document.addEventListener("mousemove", function(pos){
if(
target_is_popup &&
mouse_down
){
mouse_move = true;
movepopup(popup_body, pos.clientX - move_x, pos.clientY - move_y);
}
});
/*
Image dragging (mouseup)
*/
document.addEventListener("mouseup", function(){
mouse_down = false;
});
/*
Image popup open
*/
document.addEventListener("click", function(click){
// should our click trigger image open?
// elem is either the matching parent element, or true for a plain .openimg image
var elem;
if(
elem = is_click_within(click.target, image_class) ||
click.target.classList.contains("openimg")
){
click.preventDefault();
is_popup_shown = true;
// reset position params
mirror_x = false;
mirror_y = false;
rotation = 0;
scale = 60;
collection_index = 0;
// get popup data
if(elem === true){
// we clicked a simple image preview
elem = click.target;
var image_url = elem.getAttribute("src");
if(image_url.startsWith("/proxy")){
var match = image_url.match(/i=([^&]+)/);
if(match !== null){
image_url = decodeURIComponent(match[1]);
}
}else{
image_url = htmlspecialchars_decode(image_url);
}
collection = [
{
"url": image_url,
"width": Math.round(click.target.naturalWidth),
"height": Math.round(click.target.naturalHeight)
}
];
var title = "No description provided";
if(click.target.title != ""){
title = click.target.title;
}else{
if(click.target.alt != ""){
title = click.target.alt;
}
}
}else{
if(image_class == "thumb"){
// we're inside images.php
elem =
elem
.parentElement
.parentElement;
var image_url = elem.getElementsByTagName("a")[1].href;
}else{
// we're inside web.php
var image_url = elem.href;
}
collection =
JSON.parse(
elem.getAttribute("data-json")
);
var title = elem.title;
}
// prepare HTML
var html =
'<div id="popup-num">(' + collection.length + ')</div>' +
'<div id="popup-dropdown">' +
'<select name="viewer-res" onchange="changeimage(event)">';
for(var i=0; i<collection.length; i++){
if(collection[i].url.startsWith("data:")){
var domain = "&lt;Base64 Data&gt;";
}else{
var domain = new URL(collection[i].url).hostname;
}
html += '<option value="' + i + '">' + '(' + collection[i].width + 'x' + collection[i].height + ') ' + domain + '</option>';
}
popup_status.innerHTML =
html + '</select></div>' +
'<a href="' + htmlspecialchars(image_url) + '" rel="noreferrer nofollow" id="popup-title">' + htmlspecialchars(title) + '</a>';
popup_body.innerHTML =
'<img src="' + getproxylink(collection[0].url) + '" draggable="false" id="popup-image">';
// make changes to DOM
popup_body.style.display = "block";
popup_bg.style.display = "block";
popup_status.style.display = "table";
// store for rotation functions & changeimage()
popup_image = document.getElementById("popup-image");
scalepopup(collection[collection_index], scale);
centerpopup();
}else{
// click inside the image viewer
// resize image
if(is_click_within(click.target, "popup", true)){
if(mouse_move === false){
scale = 80;
scalepopup(collection[collection_index], scale);
centerpopup();
}
}else{
if(is_click_within(click.target, "popup-status", true) === false){
// click outside the popup while it's open
// close it
if(is_popup_shown){
hidepopup();
}
}
}
}
});
/*
Scale image viewer
*/
popup_body.addEventListener("wheel", function(scroll){
scroll.preventDefault();
if(
scroll.altKey ||
scroll.ctrlKey ||
scroll.shiftKey
){
var increment = 7;
}else{
var increment = 14;
}
if(scroll.deltaY < 0){
// scrolling up
scale = scale + increment;
}else{
// scrolling down
if(scale - increment > 7){
scale = scale - increment;
}
}
// calculate relative size before scroll
var pos = popup_body.getBoundingClientRect();
var x = (scroll.x - pos.x) / pos.width;
var y = (scroll.y - pos.y) / pos.height;
scalepopup(collection[collection_index], scale);
// move popup to % we found
pos = popup_body.getBoundingClientRect();
movepopup(
popup_body,
scroll.clientX - (x * pos.width),
scroll.clientY - (y * pos.height)
);
});
/*
Keyboard controls
*/
document.addEventListener("keydown", function(key){
// close popup
if(
is_popup_shown &&
key.keyCode === 27
){
hidepopup();
return;
}
if(is_popup_shown === false){
return;
}
if(
key.altKey ||
key.ctrlKey ||
key.shiftKey
){
// mirror image
switch(key.keyCode){
case 37:
// left
key.preventDefault();
mirror_x = true;
break;
case 38:
// up
key.preventDefault();
mirror_y = false;
break;
case 39:
// right
key.preventDefault();
mirror_x = false;
break;
case 40:
// down
key.preventDefault();
mirror_y = true;
break;
}
}else{
// rotate image
switch(key.keyCode){
case 37:
// left
key.preventDefault();
rotation = -90;
break;
case 38:
// up
key.preventDefault();
rotation = 0;
break;
case 39:
// right
key.preventDefault();
rotation = 90;
break;
case 40:
// down
key.preventDefault();
rotation = -180;
break;
}
}
popup_image.style.transform =
"scale(" +
(mirror_x ? "-1" : "1") +
", " +
(mirror_y ? "-1" : "1") +
") " +
"rotate(" +
rotation + "deg" +
")";
});
}
function getproxylink(url){
if(url.startsWith("data:")){
return htmlspecialchars(url);
}else{
return '/proxy?i=' + encodeURIComponent(url);
}
}
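/*
	Sketch (not part of the original commit): round trip of a proxy link.
	getproxylink() above wraps a remote URL with encodeURIComponent(), and
	the click handler unwraps a "/proxy" src again with the /i=([^&]+)/
	match. The helper names sketch_wrap_proxy and sketch_unwrap_proxy are
	hypothetical; they only illustrate that the two steps undo each other.
*/
function sketch_wrap_proxy(url){
	// same shape as the non-"data:" branch of getproxylink()
	return "/proxy?i=" + encodeURIComponent(url);
}
function sketch_unwrap_proxy(link){
	// any "&" inside the wrapped URL was encoded as %26, so the capture
	// group only stops at the next real query parameter (e.g. "&s=square")
	var match = link.match(/i=([^&]+)/);
	if(match === null){
		// not a proxy link we understand, hand it back untouched
		return link;
	}
	return decodeURIComponent(match[1]);
}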
function hidepopup(){
is_popup_shown = false;
popup_status.style.display = "none";
popup_body.style.display = "none";
popup_bg.style.display = "none";
}
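/*
	Sketch (not part of the original commit): the CSS transform string that
	the keydown handler assigns to popup_image.style.transform, rebuilt as a
	pure function so the mirror/rotate combination is easy to check by hand.
	sketch_build_transform is a hypothetical name, not an existing helper.
*/
function sketch_build_transform(mirror_x, mirror_y, rotation){
	// scale(-1, 1) flips horizontally, scale(1, -1) flips vertically
	return "scale(" +
		(mirror_x ? "-1" : "1") +
		", " +
		(mirror_y ? "-1" : "1") +
		") " +
		"rotate(" + rotation + "deg)";
}
// e.g. sketch_build_transform(true, false, 90) gives "scale(-1, 1) rotate(90deg)"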
function scalepopup(size, scale){
// largest ratio that keeps the image within scale% of the viewport
var ratio =
Math.min(
(window.innerWidth * (scale / 100)) / size.width,
(window.innerHeight * (scale / 100)) / size.height
);
popup_body.style.width = size.width * ratio + "px";
popup_body.style.height = size.height * ratio + "px";
}
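/*
	Sketch (not part of the original commit): the "fit inside the viewport"
	arithmetic of scalepopup(), without touching the DOM. The viewport is
	passed in instead of being read from window, and sketch_fit_size is a
	hypothetical name used for illustration only.
*/
function sketch_fit_size(image, viewport, scale){
	// pick the smaller of the two ratios so neither side overflows
	var ratio =
		Math.min(
			(viewport.width * (scale / 100)) / image.width,
			(viewport.height * (scale / 100)) / image.height
		);
	return {
		"width": image.width * ratio,
		"height": image.height * ratio
	};
}
// an 800x400 image in an 800x600 viewport at scale 50 comes out as 400x200:
// the width ratio (0.5) is smaller than the height ratio (0.75), so it wins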
function centerpopup(){
var rect = popup_body.getBoundingClientRect();
var size = {
"width": parseInt(rect.width),
"height": parseInt(rect.height)
};
movepopup(
popup_body,
(window.innerWidth / 2) - (size.width / 2),
(window.innerHeight / 2) - (size.height / 2)
);
}
function movepopup(popup_body, x, y){
popup_body.style.left = x + "px";
popup_body.style.top = y + "px";
}
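/*
	Sketch (not part of the original commit): how the wheel handler keeps the
	point under the cursor fixed while zooming. It records the cursor position
	as a fraction of the popup before resizing, then moves the resized popup so
	the same fraction lands under the cursor again. sketch_zoom_anchor is a
	hypothetical helper; the real handler does this inline with movepopup().
*/
function sketch_zoom_anchor(cursor, old_rect, new_size){
	// relative position of the cursor inside the popup, 0..1 on each axis
	var x = (cursor.x - old_rect.x) / old_rect.width;
	var y = (cursor.y - old_rect.y) / old_rect.height;
	// new top-left corner that puts that same fraction under the cursor
	return {
		"left": cursor.x - (x * new_size.width),
		"top": cursor.y - (y * new_size.height)
	};
}
// cursor at (150, 100) over a 200x100 popup placed at (100, 50), zoomed to
// 400x200: the cursor sits at 25% / 50%, so the popup moves to left 50, top 0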
function changeimage(event){
// reset rotation params
mirror_x = false;
mirror_y = false;
rotation = 0;
scale = 60;
collection_index = parseInt(event.target.value);
// we set innerHTML otherwise old image lingers a little
popup_body.innerHTML =
'<img src="' + getproxylink(collection[collection_index].url) + '" draggable="false" id="popup-image">';
// store for rotation functions & changeimage()
popup_image = document.getElementById("popup-image");
scalepopup(collection[collection_index], scale);
centerpopup();
}
/*
Shortcuts
*/
var searchbox_wrapper = document.getElementsByClassName("searchbox");
if(searchbox_wrapper.length !== 0){
searchbox_wrapper = searchbox_wrapper[0];
var searchbox = searchbox_wrapper.getElementsByTagName("input")[1];
document.addEventListener("keydown", function(key){
switch(key.keyCode){
case 191:
// 191 = /
if(document.activeElement.tagName == "INPUT"){
// already focused, ignore
break;
}
if(
typeof is_popup_shown != "undefined" &&
is_popup_shown
){
hidepopup();
}
window.scrollTo(0, 0);
searchbox.focus();
key.preventDefault();
break;
}
});
}

BIN
static/icon/amazon.png Normal file
Binary file not shown. Size: 2.4 KiB

BIN
static/icon/appstore.png Normal file
Binary file not shown. Size: 2.8 KiB

BIN
static/icon/facebook.png Normal file
Binary file not shown. Size: 1.5 KiB

BIN
static/icon/gamespot.png Normal file
Binary file not shown. Size: 2.4 KiB

BIN
static/icon/github.png Normal file
Binary file not shown. Size: 1.2 KiB

BIN
static/icon/googleplay.png Normal file
Binary file not shown. Size: 1.4 KiB

BIN
static/icon/imdb.png Normal file
Binary file not shown. Size: 2.6 KiB

BIN
static/icon/instagram.png Normal file
Binary file not shown. Size: 3.2 KiB

BIN
static/icon/itunes.png Normal file
Binary file not shown. Size: 2.9 KiB

BIN
static/icon/microsoft.png Normal file
Binary file not shown. Size: 2.1 KiB

BIN
static/icon/quora.png Normal file
Binary file not shown. Size: 1.7 KiB

BIN
static/icon/reddit.png Normal file
Binary file not shown. Size: 2.0 KiB

Binary file not shown. Size: 1.1 KiB

Binary file not shown. Size: 585 B

BIN
static/icon/soundcloud.png Normal file
Binary file not shown. Size: 1.3 KiB

BIN
static/icon/spotify.png Normal file
Binary file not shown. Size: 2.1 KiB

BIN
static/icon/steam.png Normal file
Binary file not shown. Size: 1.8 KiB

BIN
static/icon/twitter.png Normal file
Binary file not shown. Size: 1.3 KiB

BIN
static/icon/w3html.png Normal file
Binary file not shown. Size: 1.5 KiB

BIN
static/icon/website.png Normal file
Binary file not shown. Size: 2.8 KiB

BIN
static/icon/wikipedia.png Normal file
Binary file not shown. Size: 1.3 KiB

BIN
static/icon/youtube.png Normal file
Binary file not shown. Size: 2.3 KiB

1176
static/style.css Normal file

File diff suppressed because it is too large

28
template/header.html Normal file

@ -0,0 +1,28 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>{%title%}</title>
<link rel="stylesheet" href="/static/style.css">
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="robots" content="{%index%}index,{%index%}follow">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
<meta name="description" content="4get.ca: {%description%}">
<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">
</head>
<body{%body_class%}>
<form method="GET" autocomplete="off">
<div class="searchbox">
<input type="submit" value="Search" tabindex="-1">
<div class="wrapper">
<input type="text" value="{%search%}" maxlength="500" name="s" placeholder="Proxy search..." required>
</div>
<div class="autocomplete"></div>
</div>
<div class="tabs">
{%tabs%}
</div>
<div class="filters">
{%filters%}
</div>
</form>

36
template/home.html Normal file

@ -0,0 +1,36 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>4get</title>
<meta name="viewport" content="width=device-width,initial-scale=1">
<link rel="stylesheet" href="/static/style.css">
<meta name="robots" content="index,follow">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
<meta name="description" content="4get.ca: They live in our walls!">
<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">
</head>
<body class="home {%body_class%}">
<div id="center">
<form method="GET" autocomplete="off" action="web">
<div class="logo">
<img src="{%banner%}" alt="4get">
</div>
<div class="searchbox">
<input type="submit" value="Search" tabindex="-1">
<div class="wrapper">
<input type="text" maxlength="500" name="s" placeholder="Proxy search..." required autofocus>
</div>
<div class="autocomplete"></div>
</div>
</form>
<a href="settings">Settings</a><a href="api.txt">API</a><a href="about">About</a><a href="https://git.lolcat.ca/lolcat/4get">Source</a>
<div class="subtext">
Clearnet: <a href="https://4get.ca">4get.ca</a><br>
Tor: <a href="http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion">4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion</a><br>
Report a problem: <a href="https://lolcat.ca/contact">lolcat.ca/contact</a>
</div>
</div>
<script src="/static/client.js"></script>
</body>
</html>

7
template/images.html Normal file
View File

@ -0,0 +1,7 @@
<div id="images">
{%images%}
</div>
{%nextpage%}
<script src="/static/client.js"></script>
</body>
</html>

16
template/search.html Normal file
View File

@ -0,0 +1,16 @@
<div id="overflow" class="web{%class%}">
<div class="right-wrapper">
<div class="right-right">
{%right-right%}
</div>
<div class="right-left">
{%right-left%}
</div>
</div>
<div class="left">
{%left%}
</div>
</div>
<script src="/static/client.js"></script>
</body>
</html>

241
videos.php Normal file
View File

@ -0,0 +1,241 @@
<?php
/*
Initialize random shit
*/
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters("videos");
$get = $frontend->parsegetfilters($_GET, $filters);
$frontend->loadheader(
$get,
$filters,
"videos"
);
$payload = [
"class" => "",
"right-left" => "",
"right-right" => "",
"left" => ""
];
try{
$results = $scraper->video($get);
}catch(Exception $error){
echo
$frontend->drawerror(
"Shit",
'This scraper returned an error:' .
'<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' .
'Things you can try:' .
'<ul>' .
'<li>Use a different scraper</li>' .
'<li>Remove keywords that could cause errors</li>' .
'<li>Use another 4get instance</li>' .
'</ul><br>' .
'If the error persists, please <a href="/about">contact the administrator</a>.'
);
die();
}
$categories = [
"video" => "",
"author" => "",
"livestream" => "",
"playlist" => "",
"reel" => ""
];
/*
Set the main container
*/
$main = null;
if(count($results["video"]) !== 0){
$main = "video";
}elseif(count($results["playlist"]) !== 0){
$main = "playlist";
}elseif(count($results["livestream"]) !== 0){
$main = "livestream";
}elseif(count($results["author"]) !== 0){
$main = "author";
}elseif(count($results["reel"]) !== 0){
$main = "reel";
}else{
// No results found!
echo
$frontend->drawerror(
"Nobody here but us chickens!",
'Have you tried:' .
'<ul>' .
'<li>Using a different scraper</li>' .
'<li>Using fewer keywords</li>' .
'<li>Defining broader filters (Is NSFW turned off?)</li>' .
'</ul>'
);
die();
}
/*
Generate list of videos
*/
foreach($categories as $name => $data){
foreach($results[$name] as $item){
$greentext = [];
if(
isset($item["date"]) &&
$item["date"] !== null
){
$greentext[] = date("jS M y @ g:ia", $item["date"]);
}
if(
isset($item["views"]) &&
$item["views"] !== null
){
$views = number_format($item["views"]);
if($name != "livestream"){
$views .= " views";
}else{
$views .= " watching";
}
$greentext[] = $views;
}
if(
isset($item["followers"]) &&
$item["followers"] !== null
){
$greentext[] = number_format($item["followers"]) . " followers";
}
if(
isset($item["author"]["name"]) &&
$item["author"]["name"] !== null
){
$greentext[] = $item["author"]["name"];
}
$greentext = implode(" • ", $greentext);
if(
isset($item["duration"]) &&
$item["duration"] !== null
){
$duration = $frontend->s_to_timestamp($item["duration"]);
}else{
$duration = null;
}
$tabindex = $name === $main;
$categories[$name] .= $frontend->drawtextresult($item, $greentext, $duration, $get["s"], $tabindex);
}
}
$payload["left"] = $categories[$main];
// don't re-draw the category
unset($categories[$main]);
/*
Populate right-hand side
*/
$i = 1;
foreach($categories as $name => $value){
if($value == ""){
continue;
}
if($i % 2 === 1){
$write = "right-left";
}else{
$write = "right-right";
}
$payload[$write] .=
'<div class="answer-wrapper">' .
'<input id="answer' . $i . '" class="spoiler" type="checkbox">' .
'<div class="answer">' .
'<div class="answer-title">' .
'<a class="answer-title" href="?s=' . urlencode($get["s"]);
switch($name){
case "playlist":
$payload[$write] .=
'&type=playlist"><h2>Playlists</h2></a>';
break;
case "livestream":
$payload[$write] .=
'&feature=live"><h2>Livestreams</h2></a>';
break;
case "author":
$payload[$write] .=
'&type=channel"><h2>Authors</h2></a>';
break;
case "reel":
$payload[$write] .=
'&duration=short"><h2>Reels</h2></a>';
break;
}
$payload[$write] .=
'</div>' .
$categories[$name] .
'</div>' .
'<label class="spoiler-button" for="answer' . $i . '"></label></div>';
$i++;
}
if($i !== 1){
$payload["class"] = " has-answer";
}
if($results["npt"] !== null){
$payload["left"] .=
'<a href="' . $frontend->htmlnextpage($get, $results["npt"], "videos") . '" class="nextpage">Next page &gt;</a>';
}
echo $frontend->load("search.html", $payload);

496
web.php Normal file
View File

@ -0,0 +1,496 @@
<?php
/*
Initialize random shit
*/
include "lib/frontend.php";
$frontend = new frontend();
[$scraper, $filters] = $frontend->getscraperfilters("web");
$get = $frontend->parsegetfilters($_GET, $filters);
$frontend->loadheader(
$get,
$filters,
"web"
);
$payload = [
"class" => "",
"right-left" => "",
"right-right" => "",
"left" => ""
];
try{
$results = $scraper->web($get);
}catch(Exception $error){
echo
$frontend->drawerror(
"Shit",
'This scraper returned an error:' .
'<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' .
'Things you can try:' .
'<ul>' .
'<li>Use a different scraper</li>' .
'<li>Remove keywords that could cause errors</li>' .
'<li>Use another 4get instance</li>' .
'</ul><br>' .
'If the error persists, please <a href="/about">contact the administrator</a>.'
);
die();
}
$answerlen = 0;
/*
Spelling checker
*/
if($results["spelling"]["type"] != "no_correction"){
switch($results["spelling"]["type"]){
case "including":
$type = "Including results for";
break;
case "not_many":
$type = "Not many results contain";
break;
}
$payload["left"] .=
'<div class="infobox">' .
$type . ' <b>' . htmlspecialchars($results["spelling"]["using"]) . '</b>.<br>' .
'Did you mean <a href="?s=' . urlencode($results["spelling"]["correction"]) . '">' . htmlspecialchars($results["spelling"]["correction"]) . '</a>?' .
'</div>';
}
/*
Populate links
*/
if(count($results["web"]) === 0){
$payload["left"] .=
'<div class="infobox">' .
"<h1>Nobody here but us chickens!</h1>" .
'Have you tried:' .
'<ul>' .
'<li>Using a different scraper</li>' .
'<li>Using fewer keywords</li>' .
'<li>Defining broader filters (Is NSFW turned off?)</li>' .
'</ul>' .
'</div>';
}
foreach($results["web"] as $site){
$n = null;
if($site["date"] !== null){
$date = date("jS M y @ g:ia", $site["date"]);
}else{
$date = null;
}
$payload["left"] .= $frontend->drawtextresult($site, $date, $n, $get["s"]);
}
$right = [];
/*
Generate images
*/
if(count($results["image"]) !== 0){
$answerlen++;
$right["image"] =
'<div class="answer-wrapper">' .
'<input id="answer' . $answerlen . '" class="spoiler" type="checkbox">' .
'<div class="answer">' .
'<div class="answer-title">' .
'<a class="answer-title" href="/images?s=' . urlencode($get["s"]) . '"><h2>Images</h2></a>' .
'</div>' .
'<div class="images">';
foreach($results["image"] as $image){
$c = count($image["source"]) - 1;
if(
preg_match(
'/^data:/',
$image["source"][$c]["url"]
)
){
$src = htmlspecialchars($image["source"][$c]["url"]);
}else{
$src = "/proxy?i=" . urlencode($image["source"][$c]["url"]) . "&s=square";
}
$right["image"] .=
'<a class="image" href="' . htmlspecialchars($image["url"]) . '" rel="noreferrer nofollow" title="' . htmlspecialchars($image["title"]) . '" data-json="' . htmlspecialchars(json_encode($image["source"])) . '" tabindex="-1">' .
'<img src="' . $src . '" alt="thumb">' .
'<div class="duration">' . $image["source"][0]["width"] . 'x' . $image["source"][0]["height"] . '</div>' .
'</a>';
}
$right["image"] .=
'</div></div>' .
'<label class="spoiler-button" for="answer' . $answerlen . '"></label></div>';
}
/*
Generate videos
*/
if(count($results["video"]) !== 0){
$answerlen++;
$right["video"] =
'<div class="answer-wrapper">' .
'<input id="answer' . $answerlen . '" class="spoiler" type="checkbox">' .
'<div class="answer">' .
'<div class="answer-title">' .
'<a class="answer-title" href="/videos?s=' . urlencode($get["s"]) . '"><h2>Videos</h2></a>' .
'</div>';
foreach($results["video"] as $video){
if($video["views"] !== null){
$greentext = number_format($video["views"]) . " views";
}else{
$greentext = null;
}
if($video["date"] !== null){
if($greentext !== null){
$greentext .= " • ";
}
$greentext .= date("jS M y @ g:ia", $video["date"]);
}
if($video["duration"] !== null){
if($video["duration"] === "_LIVE"){
$duration = 'LIVE';
}else{
$duration = $frontend->s_to_timestamp($video["duration"]);
}
}else{
$duration = null;
}
$right["video"] .= $frontend->drawtextresult($video, $greentext, $duration, $get["s"], false);
}
$right["video"] .=
'</div>' .
'<label class="spoiler-button" for="answer' . $answerlen . '"></label></div>';
}
/*
Generate news
*/
if(count($results["news"]) !== 0){
$answerlen++;
$right["news"] =
'<div class="answer-wrapper">' .
'<input id="answer' . $answerlen . '" class="spoiler" type="checkbox">' .
'<div class="answer">' .
'<div class="answer-title">' .
'<a class="answer-title" href="/news?s=' . urlencode($get["s"]) . '"><h2>News</h2></a>' .
'</div>';
foreach($results["news"] as $news){
if($news["date"] !== null){
$greentext = date("jS M y @ g:ia", $news["date"]);
}else{
$greentext = null;
}
$right["news"] .= $frontend->drawtextresult($news, $greentext, null, $get["s"], false);
}
$right["news"] .=
'</div>' .
'<label class="spoiler-button" for="answer' . $answerlen . '"></label></div>';
}
/*
Generate answers
*/
if(count($results["answer"]) !== 0){
$right["answer"] = "";
foreach($results["answer"] as $answer){
$answerlen++;
$right["answer"] .=
'<div class="answer-wrapper">' .
'<input id="answer' . $answerlen . '" class="spoiler" type="checkbox">' .
'<div class="answer"><div class="wiki-head">';
if(!empty($answer["title"])){
$right["answer"] .=
'<div class="answer-title">';
if(!empty($answer["url"])){
$right["answer"] .= '<a class="answer-title" href="' . htmlspecialchars($answer["url"]) . '" rel="noreferrer nofollow">';
}
$right["answer"] .= '<h1>' . htmlspecialchars($answer["title"]) . '</h1>';
if(!empty($answer["url"])){
$right["answer"] .= '</a>';
}
$right["answer"] .= '</div>';
}
if(!empty($answer["url"])){
$right["answer"] .=
$frontend->drawlink($answer["url"]);
}
$right["answer"] .= '<div class="description">';
if(!empty($answer["thumb"])){
$right["answer"] .=
'<a href="' . htmlspecialchars($answer["thumb"]) . '" rel="noreferrer nofollow" class="photo">' .
'<img src="/proxy?i=' . urlencode($answer["thumb"]) . '&s=cover" alt="thumb" class="openimg">' .
'</a>';
}
foreach($answer["description"] as $description){
switch($description["type"]){
case "text":
$right["answer"] .= $frontend->highlighttext($get["s"], $description["value"]);
break;
case "title":
$right["answer"] .=
'<h2>' .
htmlspecialchars($description["value"]) .
'</h2>';
break;
case "italic":
$right["answer"] .=
'<i>' .
$frontend->highlighttext($get["s"], $description["value"]) .
'</i>';
break;
case "quote":
$right["answer"] .=
'<div class="quote">' .
$frontend->highlighttext($get["s"], $description["value"]) .
'</div>';
break;
case "code":
$right["answer"] .=
'<div class="code" tabindex="-1">' .
$frontend->highlightcode($description["value"], true) .
'</div>';
break;
case "inline_code":
$right["answer"] .=
'<div class="code-inline">' .
htmlspecialchars($description["value"]) .
'</div>';
break;
case "link":
$right["answer"] .=
'<a href="' . htmlspecialchars($description["url"]) . '" rel="noreferrer nofollow" class="underline" tabindex="-1">' . htmlspecialchars($description["value"]) . '</a>';
break;
case "image":
$right["answer"] .=
'<a href="' . htmlspecialchars($description["url"]) . '" rel="noreferrer nofollow" tabindex="-1"><img src="/proxy?i=' . urlencode($description["url"]) . '&s=thumb" alt="image" class="fullimg openimg"></a>';
break;
case "audio":
$right["answer"] .=
'<audio src="/audio?s=' . urlencode($description["url"]) . '" controls><a href="/audio?s=' . urlencode($description["url"]) . '">Listen to the pronunciation audio</a></audio>';
break;
}
}
$right["answer"] .= '</div>';
if(count($answer["table"]) !== 0){
$right["answer"] .= '<table>';
foreach($answer["table"] as $info => $value){
$right["answer"] .=
'<tr>' .
'<td>' . $info . '</td>' .
'<td>' . $value . '</td>' .
'</tr>';
}
$right["answer"] .= '</table>';
}
if(count($answer["sublink"]) !== 0){
$right["answer"] .= '<div class="socials">';
$icons = glob("static/icon/*");
foreach($answer["sublink"] as $website => $url){
$flag = false;
$icon = str_replace(" ", "", strtolower($website));
foreach($icons as $path){
if(pathinfo($path, PATHINFO_FILENAME) == $icon){
$flag = true;
break;
}
}
if($flag === false){
$icon = "website";
}
$right["answer"] .=
'<a href="' . htmlspecialchars($url) . '" rel="noreferrer nofollow" tabindex="-1">' .
'<div class="center">' .
'<img src="/static/icon/' . $icon . '.png" alt="icon">' .
'<div class="title">' . $website . '</div>' .
'</div>' .
'</a>';
}
$right["answer"] .= '</div>';
}
$right["answer"] .=
'</div></div>' .
'<label class="spoiler-button" for="answer' . $answerlen . '"></label></div>';
}
}
/*
Add right containers
*/
if(isset($right["answer"])){
if(count($right) >= 2){
$payload["right-right"] = $right["answer"];
unset($right["answer"]);
}
}
$c = 0;
foreach($right as $snippet){
if($c % 2 === 0){
$payload["right-left"] .= $snippet;
}else{
$payload["right-right"] .= $snippet;
}
$c++;
}
if($c !== 0){
$payload["class"] = " has-answer";
}
/*
Generate related searches
*/
$c = count($results["related"]);
if($c !== 0){
$payload["left"] .= '<h3>Related searches</h3><table class="related">';
$opentr = false;
for($i=0; $i<$c; $i++){
if(($i % 2) === 0){
$opentr = true;
$payload["left"] .= '<tr>';
}else{
$opentr = false;
}
$payload["left"] .=
'<td>' .
'<a href="/web?s=' .
urlencode($results["related"][$i]) . "&" .
$frontend->buildquery($get, true) .
'">' .
htmlspecialchars($results["related"][$i]) .
'</a>';
$payload["left"] .= '</td>';
if($opentr === false){
$payload["left"] .= '</tr>';
}
}
if($opentr === true){
$payload["left"] .= '<td></td></tr>';
}
$payload["left"] .= '</table>';
}
/*
Load next page
*/
if($results["npt"] !== null){
$payload["left"] .=
'<a href="' . $frontend->htmlnextpage($get, $results["npt"], "web") . '" class="nextpage">Next page &gt;</a>';
}
echo $frontend->load("search.html", $payload);