commit bca265aea67ec62499aaa113a6490ce9ec7fe730 Author: lolcat Date: Sat Jul 22 14:41:14 2023 -0400 still missing things on google scraper diff --git a/README.md b/README.md new file mode 100644 index 0000000..039d6a0 --- /dev/null +++ b/README.md @@ -0,0 +1,72 @@ +# 4get +4get is a metasearch engine that doesn't suck (they live in our walls!) + +## About 4get +https://4get.ca/about + +## Try it out +https://4get.ca + +# Setup +Login as root. + +```sh +apt install apache2 certbot php-dom php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache +service apache2 start +a2enmod rewrite +``` + +For all of the files in `/etc/apache2/sites-enabled/`, you must apply the following changes: +- Uncomment `ServerName` directive, put your domain name there +- Change `ServerAdmin` to your email +- Change `DocumentRoot` to `/var/www/html/4get` +- Change `ErrorLog` and `CustomLog` directives to log stuff out to `/dev/null/` + +Now open `/etc/apache2/apache2.conf` and change `ErrorLog` and `CustomLog` directives to have `/dev/null/` as a value + +This *should* disable logging completely, but I'm not 100% sure since I sort of had to troubleshoot alot of shit while writing this. So after we're done check if `/var/log/apache2/*` contains any personal info, and if it does, call me retarded trough email exchange. + +Blindly run the following shit + +```sh +cd /var/www/html +git clone https://git.lolcat.ca/lolcat/4get +cd 4get +mkdir icons +chmod 777 -R icons/ +``` + +Restart the service for good measure... `service apache2 restart` + +## Setup encryption +I'm schizoid (as you should) so I'm gonna setup 4096bit key encryption. To complete this step, you need a domain or subdomain in your possession. Make sure that the DNS shit for your domain has propagated properly before continuing, because certbot is a piece of shit that will error out the ass once you reach 5 attempts under an hour. 
+ +```sh +certbot --apache --rsa-key-size 4096 -d www.yourdomain.com -d yourdomain.com +``` +When it asks to choose a vhost, choose the option with "HTTPS" listed. Don't setup HTTPS for tor, we don't need it (it doesn't even work anyways with let's encrypt) + +Edit `000-default-le-ssl.conf` + +Add this at the end: +```xml + + RewriteEngine On + RewriteCond %{REQUEST_FILENAME}.php -f + RewriteRule (.*) $1.php [L] + Options Indexes FollowSymLinks + AllowOverride All + Require all granted + +``` + +Now since this file is located in `/etc/apache2/sites-enabled/`, you must change all of the logging shit as to make it not log anything, like we did earlier. + +Restart again +```sh +service apache2 restart +``` + +You'll probably want to setup a tor address at this point, but I'm too lazy to put instructions here. + +Ok bye!!! diff --git a/about.php b/about.php new file mode 100644 index 0000000..fdc4812 --- /dev/null +++ b/about.php @@ -0,0 +1,130 @@ +' . + '' . + '' . + '' . + 'About' . + '' . + '' . + '' . + '' . + '' . + '' . + '' . + ''; + +$left = + '< Go back + +

Set as default search engine

+

On Firefox and other Gecko based browsers

+ To set this as your default search engine on Firefox, right click the URL bar and select
Add "4get"
. Then, visit about:preferences#search and select
4get
in the dropdown menu. + +

On Chromium and Blink based browsers

+ Right click the URL bar and click
Manage search engines and site search
, or visit chrome://settings/searchEngines. Then, create a new entry under
Search engines
and fill in the following details: + + + + + + + + + + + + + + + + + + +
FieldValue
Search engine4get
Shortcut4get.ca
URL with %s in place of queryhttps://4get.ca/web?q=%s
+ + Once that\'s done, click
Save
. Then, on the right-hand side of the newly created entry, open the dropdown menu and select
Make default
. + +

Other browsers

+ Get a real browser. + +

Frequently asked questions

+

What is this?

+ This is a metasearch engine that gets results from other engines, and strips away all of the tracking parameters and Microsoft/globohomo bullshit they add. Most of the other alternatives to Google jack themselves off about being "privacy respecting" or whatever the fuck but it always turns out to be a total lie, and I just got fed up with their shit honestly. Alternatives like Searx or YaCy all fucking suck so I made my own thing. + +

My goal

+ Provide users with a privacy oriented, extremely lightweight, ad free, free as in freedom (and free beer!) way to search for documents around the internet, with minimal, optional javascript code. My long term goal would be to build my own index (that doesn\'t suck) and provide users with an unbiased search engine, with no political inclinations. + +

Do you keep logs?

+ I store data temporarily to get the next page of results. This might include search queries, tokens and other parameters. These parameters are encrypted using
aes-256-gcm
on the serber, for which I give you a key (also known internally as
npt
token). When you make a request to get the next page, you supply the token, the data is decrypted and the request is fulfilled. This encrypted data is deleted after 7 minutes, or after it\'s used, whichever comes first.

+ + I don\'t log IP addresses, user agents, or anything else. The
npt
tokens are the only things that are stored (in RAM, mind you), temporarily, encrypted. + +

Do you share information with third parties?

+ Your search queries and supplied filters are shared with the scraper you chose (so I can get the search results, duh). I don\'t share anything else (that means I don\'t share your IP address, location, or anything of this kind). There is no way that site can know you\'re the one searching for something, unless you send out a search query that de-anonymises you. For example, a search query like "hello my full legal name is jonathan gallindo and i want pictures of cloacas" would definitely blow your cover. 4get doesn\'t contain ads or any third party javascript applets or trackers. I don\'t profile you, and quite frankly, I don\'t give a shit about what you search on there.

+ + TL;DR assume those websites can see what you search for, but can\'t see who you are (unless you\'re really dumb). + +

Where is this website hosted?

+ This website is hosted on a Contabo shitbox in the United States. + +

Keyboard shortcuts?

+ Use
/
to focus the search box.

+ + When the image viewer is open, you can use the following keybinds:
+
Up
,
Down
,
Left
,
Right
to rotate the image.
+
CTRL+Up
,
CTRL+Down
,
CTRL+Left
,
CTRL+Right
to mirror the image.
+
Escape
to exit the image viewer. + +

Instances

+ 4get is open source, anyone can create their own 4get instance! If you wish to add your website to this list, please contact me. + + + + + + + + + + +
NameAddress
4get4get.ca(tor)
+ +

How can I trust you?

+ You just sort of have to take my word for it right now. If you\'d rather trust yourself instead of me (I believe in you!!), all of the code on this website is available through my git page for you to host on your own machines. Just a reminder: if you\'re the sole user of your instance, it doesn\'t take immense brain power for Microshit to figure out you basically just switched IP addresses. Invite your friends to use your instance! + +

I want to report abuse or have erotic roleplay through email

+ I don\'t know about that second part but if you want to talk to me, just drop me an email...

+ + Message to all DMCA enforcers: I don\'t host any of the content. Everything you see here is proxied through my shitbox with no moderation. Please reach out to the people hosting the infringing content instead.

+ + Click here to contact me!

+ + + Valid W3C HTML 4.01 + '; + +// trim out whitespace +$left = explode("\n", $left); + +$out = ""; + +foreach($left as $line){ + + $out .= trim($line); +} + +echo + $frontend->load( + "search.html", + [ + "class" => "", + "right-left" => "", + "right-right" => "", + "left" => $out + ] + ); diff --git a/api.txt b/api.txt new file mode 100644 index 0000000..d63269f --- /dev/null +++ b/api.txt @@ -0,0 +1,289 @@ + __ __ __ + / // / ____ ____ / /_ + / // /_/ __ `/ _ \/ __/ + /__ __/ /_/ / __/ /_ + /_/ \__, /\___/\__/ + /____/ + + + Welcome to the 4get API documentation + + ++ Terms of use + Do NOT misuse the API. Misuses can include... :: + + 1. Serp SEO scanning + 2. Intensive scraping + 3. Any other activity that isn't triggered by a human + 4. Illegal activities in Canada + 5. Constant "test" queries while developping your program + (please cache the API responses!) + + + Examples of good uses of the API :: + + 1. A chatroom bot that presents users with search results + 2. Personal use + 3. Any other activity that is initiated by a human + + + If you wish to engage in the activities listed under "misuses", feel + free to download the source code of the project and running 4get + under your own terms. Please respect the terms of use listed here so + that this website may be available to all in the far future. + + Get your instance running here :: + https://git.lolcat.ca/lolcat/4get + + Thanks! + + ++ Decode the data + All payloads returned by the API are encoded in the JSON format. If + you don't know how to tackle the problem, maybe programming is not + for you. + + All of the endpoints use the GET method. + + ++ Check if an API call was successful + All API responses come with an array index named "status". If the + status is something else than the string "ok", something went wrong. + + The HTTP code will always be 200 as to not cause issues with CORS. + + ++ Get the next page of results + All API responses come with an array index named "nextpage". 
To get + the next page of results, you must make another API call with &npt. + + Example :: + + + First API call + /api/v1/web?s=higurashi + + + Second API call + /api/v1/web?npt=ddg1._rJ2hWmYSjpI2hsXWmYajJx < ... > + + You shouldn't specify the search term, only the &npt parameter + suffices. + + The first part of the token before the dot (ddg1) refers to an + array position on the serber's memory. The second part is an + encryption key used to decode the data at that position. This way, + it is impossible to supply invalid pagination data and it is + impossible for a 4get operator to peek at the private data of the + user after a request has been made. + + The tokens will expire as soon as they are used or after a 7 minutes + inactivity period, whichever comes first. + + ++ Beware of null values! + Most fields in the API responses can return "null". You don't need + to worry about unset values. + + ++ API Parameters + To construct a valid request, you can use the 4get web interface + to craft a valid request, and replace "/web" with "/api/v1/web". + + ++ "date" and "time" parameters + "date" always refer to a calendar date. + "time" always refer to the duration of some media. + + They are both integers that uses seconds as its unit. The "date" + parameter specifies the number of seconds that passed since January + 1st 1970. + + + ______ __ _ __ + / ____/___ ____/ /___ ____ (_)___ / /______ + / __/ / __ \/ __ / __ \/ __ \/ / __ \/ __/ ___/ + / /___/ / / / /_/ / /_/ / /_/ / / / / / /_(__ ) + /_____/_/ /_/\__,_/ .___/\____/_/_/ /_/\__/____/ + /_/ + ++ /api/v1/web + + &extendedsearch + When using the ddg(DuckDuckGo) scraper, you may make use of the + &extendedsearch parameter. If you need rich answer data from + additional sources like StackOverflow, music lyrics sites, etc., + you need to specify the value of (string)"true". + + The default value is "false" for API calls. 
+ + + + Parse the "spelling" + The array index named "spelling" contains 3 indexes :: + + spelling: + type: "including" + using: "4chan" + correction: '"4cha"' + + + The "type" may be any of these 3 values. When rendering the + autocorrect text inside your application, it should look like + what follows right after the parameter value :: + + no_correction + including Including results for %using%. Did you mean + %correction%? + + not_many Not many results for %using%. Did you mean + %correction%? + + + As of right now, the "spelling" is only available on + "/api/v1/web". + + + + Parse the "answer" + The array index named "answer" may contain a list of multiple + answers. The array index "description" contains a linear list of + nodes that can help you construct rich formatted data inside of + your application. The structure is similar to the one below: + + answer: + 0: + title: "Higurashi" + description: + 0: + type: "text" + value: "Higurashi is a great show!" + 1: + type: "quote" + value: "Source: my ass" + + + Each "description" node contains an array index named "type". + Here is a list of them: + + text + + title + italic + + quote + + code + inline_code + link + + image + + audio + + + Each individual node prepended with a "+" should be prepended by + a newline when constructing the rendered description object. + + There are some nodes that differ from the type-value format. + Please parse them accordingly :: + + + link + type: "link" + url: "https://lolcat.ca" + value: "Visit my website!" + + + + image + type: "image" + url: "https://lolcat.ca/static/pixels.png" + + + + audio + type: "audio" + url: "https://lolcat.ca/static/whatever.mp3" + + + The array index named "table" is an associative array. You can + loop over the data using this PHP code, for example :: + + foreach($table as $website_name => $url){ // ... + + + The rest of the JSON is pretty self explanatory. + + ++ /api/v1/images + All images are contained within "image". 
The structure looks like + below :: + + image: + 0: + title: "My awesome Higurashi image" + source: + 0: + url: "https://lolcat.ca/static/profile_pix.png" + width: 400 + height: 400 + 1: + url: "https://lolcat.ca/static/pixels.png" + width: 640 + height: 640 + 2: + url: "https://tse1.mm.bing.net/th?id=OIP.VBM3BQg + euf0-xScO1bl1UgHaGG" + width: 194 + height: 160 + + + The last image of the "source" array is always the thumbnail, and is + a good fallback to use when other sources fail to load. There can be + more than 1 source; this is especially true when using the Yandex + scraper, but beware of captcha rate limits. + + ++ /api/v1/videos + The "time" parameter for videos may be set to "_LIVE". For live + streams, the amount of people currently watching is passed in + "views". + + ++ /api/v1/news + Just make a request to "/api/v1/news?s=elon+musk". The payload + has nothing special about it and is very self explanatory, just like + the endpoint above. + + ++ /favicon + Get the favicon for a website. The only parameter is "s", and must + include the protocol. + + Example :: + + /favicon?s=https://lolcat.ca + + + If we had to revert to using Google's favicon cache, it will throw + an error in the X-Error header field. If Google's favicon cache + also failed to return an image, or if you're too retarded to specify + a valid domain name, a default placeholder image will be returned + alongside the "404" HTTP error code. + + ++ /proxy + Get a proxied image. Useful if you don't want to leak your user's IP + address. The parameters are "i" for the image link and "s" for the + size. + + Acceptable "s" parameters: + + portrait 90x160 + landscape 160x90 + square 90x90 + thumb 236x180 + cover 207x270 + original + + You can also ommit the "s" parameter if you wish to view the + original image. When an error occurs, an "X-Error" header field + is set. + + ++ /audio + Get a proxied audio file. Does not support "Range" headers, as it's + only used to proxy small files. 
+ + The parameter is "s" for the audio link. + + ++ Appendix + If you have any questions or need clarifications, please send an + email my way to will at lolcat.ca diff --git a/api/index.php b/api/index.php new file mode 100644 index 0000000..dae86ab --- /dev/null +++ b/api/index.php @@ -0,0 +1,10 @@ + "Unknown endpoint" + ] +); diff --git a/api/v1/images.php b/api/v1/images.php new file mode 100644 index 0000000..e05ba26 --- /dev/null +++ b/api/v1/images.php @@ -0,0 +1,25 @@ +getscraperfilters( + "images", + isset($_GET["scraper"]) ? $_GET["scraper"] : null +); + +$get = $frontend->parsegetfilters($_GET, $filters); + +try{ + echo json_encode( + $scraper->image($get) + ); + +}catch(Exception $e){ + + echo json_encode(["status" => $e->getMessage()]); +} diff --git a/api/v1/index.php b/api/v1/index.php new file mode 100644 index 0000000..dae86ab --- /dev/null +++ b/api/v1/index.php @@ -0,0 +1,10 @@ + "Unknown endpoint" + ] +); diff --git a/api/v1/news.php b/api/v1/news.php new file mode 100644 index 0000000..7e24247 --- /dev/null +++ b/api/v1/news.php @@ -0,0 +1,25 @@ +getscraperfilters( + "news", + isset($_GET["scraper"]) ? $_GET["scraper"] : null +); + +$get = $frontend->parsegetfilters($_GET, $filters); + +try{ + echo json_encode( + $scraper->news($get) + ); + +}catch(Exception $e){ + + echo json_encode(["status" => $e->getMessage()]); +} diff --git a/api/v1/videos.php b/api/v1/videos.php new file mode 100644 index 0000000..60c105a --- /dev/null +++ b/api/v1/videos.php @@ -0,0 +1,25 @@ +getscraperfilters( + "videos", + isset($_GET["scraper"]) ? $_GET["scraper"] : null +); + +$get = $frontend->parsegetfilters($_GET, $filters); + +try{ + echo json_encode( + $scraper->video($get) + ); + +}catch(Exception $e){ + + echo json_encode(["status" => $e->getMessage()]); +} diff --git a/api/v1/web.php b/api/v1/web.php new file mode 100644 index 0000000..7895183 --- /dev/null +++ b/api/v1/web.php @@ -0,0 +1,30 @@ +getscraperfilters( + "web", + isset($_GET["scraper"]) ? 
$_GET["scraper"] : null +); + +$get = $frontend->parsegetfilters($_GET, $filters); + +if(!isset($_GET["extendedsearch"])){ + + $get["extendedsearch"] = "no"; +} + +try{ + echo json_encode( + $scraper->web($get) + ); + +}catch(Exception $e){ + + echo json_encode(["status" => $e->getMessage()]); +} diff --git a/audio.php b/audio.php new file mode 100644 index 0000000..bb018da --- /dev/null +++ b/audio.php @@ -0,0 +1,19 @@ +stream_linear_audio($_GET["s"]); +}catch(Exception $error){ + + header("X-Error: " . $error->getMessage()); +} diff --git a/banner/aves.png b/banner/aves.png new file mode 100644 index 0000000..ace604f Binary files /dev/null and b/banner/aves.png differ diff --git a/banner/aves_2.png b/banner/aves_2.png new file mode 100644 index 0000000..c78839f Binary files /dev/null and b/banner/aves_2.png differ diff --git a/banner/bibblebop.png b/banner/bibblebop.png new file mode 100644 index 0000000..0c061e0 Binary files /dev/null and b/banner/bibblebop.png differ diff --git a/banner/birds birds birdsw_4.jpg b/banner/birds birds birdsw_4.jpg new file mode 100644 index 0000000..ba7d637 Binary files /dev/null and b/banner/birds birds birdsw_4.jpg differ diff --git a/banner/birds_birds_birdsw.jpg b/banner/birds_birds_birdsw.jpg new file mode 100644 index 0000000..ff04b23 Binary files /dev/null and b/banner/birds_birds_birdsw.jpg differ diff --git a/banner/birds_birds_birdsw_2.jpg b/banner/birds_birds_birdsw_2.jpg new file mode 100644 index 0000000..dcd6125 Binary files /dev/null and b/banner/birds_birds_birdsw_2.jpg differ diff --git a/banner/birds_birds_birdsw_3.jpg b/banner/birds_birds_birdsw_3.jpg new file mode 100644 index 0000000..1446207 Binary files /dev/null and b/banner/birds_birds_birdsw_3.jpg differ diff --git a/banner/deek.png b/banner/deek.png new file mode 100644 index 0000000..ef80354 Binary files /dev/null and b/banner/deek.png differ diff --git a/banner/deekchat.gif b/banner/deekchat.gif new file mode 100644 index 0000000..bba01da Binary files 
/dev/null and b/banner/deekchat.gif differ diff --git a/banner/eagle.png b/banner/eagle.png new file mode 100644 index 0000000..f074341 Binary files /dev/null and b/banner/eagle.png differ diff --git a/banner/eagle2.png b/banner/eagle2.png new file mode 100644 index 0000000..175366b Binary files /dev/null and b/banner/eagle2.png differ diff --git a/banner/eagle3.jpg b/banner/eagle3.jpg new file mode 100644 index 0000000..1e65b59 Binary files /dev/null and b/banner/eagle3.jpg differ diff --git a/banner/eddd_1.png b/banner/eddd_1.png new file mode 100644 index 0000000..fab460b Binary files /dev/null and b/banner/eddd_1.png differ diff --git a/banner/eddd_2.png b/banner/eddd_2.png new file mode 100644 index 0000000..5ce4c2c Binary files /dev/null and b/banner/eddd_2.png differ diff --git a/banner/eddd_3.png b/banner/eddd_3.png new file mode 100644 index 0000000..b4ca48d Binary files /dev/null and b/banner/eddd_3.png differ diff --git a/banner/gnuwu.png b/banner/gnuwu.png new file mode 100644 index 0000000..634b59d Binary files /dev/null and b/banner/gnuwu.png differ diff --git a/banner/gnuwu_2.png b/banner/gnuwu_2.png new file mode 100644 index 0000000..493a6d9 Binary files /dev/null and b/banner/gnuwu_2.png differ diff --git a/banner/horse.png b/banner/horse.png new file mode 100644 index 0000000..0075a9c Binary files /dev/null and b/banner/horse.png differ diff --git a/banner/linucks.jpg b/banner/linucks.jpg new file mode 100644 index 0000000..8874451 Binary files /dev/null and b/banner/linucks.jpg differ diff --git a/banner/real_nig_3.jpg b/banner/real_nig_3.jpg new file mode 100644 index 0000000..8091146 Binary files /dev/null and b/banner/real_nig_3.jpg differ diff --git a/banner/sec.png b/banner/sec.png new file mode 100644 index 0000000..3c1a49e Binary files /dev/null and b/banner/sec.png differ diff --git a/banner/tagmachine.png b/banner/tagmachine.png new file mode 100644 index 0000000..c8b82a0 Binary files /dev/null and b/banner/tagmachine.png differ diff 
--git a/favicon.ico b/favicon.ico new file mode 100644 index 0000000..e5c1fbc Binary files /dev/null and b/favicon.ico differ diff --git a/favicon.php b/favicon.php new file mode 100644 index 0000000..dadb923 --- /dev/null +++ b/favicon.php @@ -0,0 +1,362 @@ +defaulticon(); + } + + $filename = str_replace(["https://", "http://"], "", $url); + header("Content-Disposition: inline; filename=\"{$filename}.png\""); + + include "lib/curlproxy.php"; + $this->proxy = new proxy(false); + + $this->filename = parse_url($url, PHP_URL_HOST); + + /* + Check if we have the favicon stored locally + */ + if(file_exists("icons/" . $filename . ".png")){ + + $handle = fopen("icons/" . $filename . ".png", "r"); + echo fread($handle, filesize("icons/" . $filename . ".png")); + fclose($handle); + return; + } + + /* + Scrape html + */ + try{ + + $payload = $this->proxy->get($url, $this->proxy::req_web, true); + + }catch(Exception $error){ + + header("X-Error: Could not fetch HTML (" . $error->getMessage() . ")"); + $this->favicon404(); + } + //$payload["body"] = ''; + + // get link tags + preg_match_all( + '/< *link +(.*)[\/]?>/Uixs', + $payload["body"], + $linktags + ); + + /* + Get relevant tags + */ + + $linktags = $linktags[1]; + $attributes = []; + + /* + header("Content-Type: text/plain"); + print_r($linktags); + print_r($payload); + die();*/ + + for($i=0; $i $tags[1][$k], + "value" => trim($tags[2][$k], "\" \n\r\t\v\x00") + ]; + } + } + + unset($payload); + unset($linktags); + + $href = []; + + // filter out the tags we want + foreach($attributes as &$group){ + + $tmp_href = null; + $tmp_rel = null; + $badtype = false; + + foreach($group as &$attribute){ + + switch($attribute["name"]){ + + case "rel": + + $attribute["value"] = strtolower($attribute["value"]); + + if( + ( + $attribute["value"] == "icon" || + $attribute["value"] == "manifest" || + $attribute["value"] == "shortcut icon" || + $attribute["value"] == "apple-touch-icon" || + $attribute["value"] == "mask-icon" + ) === 
false + ){ + + break; + } + + $tmp_rel = $attribute["value"]; + break; + + case "type": + $attribute["value"] = explode("/", $attribute["value"], 2); + + if(strtolower($attribute["value"][0]) != "image"){ + + $badtype = true; + break; + } + break; + + case "href": + + // must not contain invalid characters + // must be bigger than 1 + if( + filter_var($attribute["value"], FILTER_SANITIZE_URL) == $attribute["value"] && + strlen($attribute["value"]) > 0 + ){ + + $tmp_href = $attribute["value"]; + break; + } + break; + } + } + + if( + $badtype === false && + $tmp_rel !== null && + $tmp_href !== null + ){ + + $href[$tmp_rel] = $tmp_href; + } + } + + /* + Priority list + */ + /* + header("Content-Type: text/plain"); + print_r($href); + die();*/ + + if(isset($href["icon"])){ $href = $href["icon"]; } + elseif(isset($href["apple-touch-icon"])){ $href = $href["apple-touch-icon"]; } + elseif(isset($href["manifest"])){ + + // attempt to parse manifest, but fallback to [] + $href = $this->parsemanifest($href["manifest"], $url); + } + + if(is_array($href)){ + + if(isset($href["mask-icon"])){ $href = $href["mask-icon"]; } + elseif(isset($href["shortcut icon"])){ $href = $href["shortcut icon"]; } + else{ + + $href = "/favicon.ico"; + } + } + + $href = $this->proxy->getabsoluteurl($href, $url); + /* + header("Content-type: text/plain"); + echo $href; + die();*/ + + + /* + Download the favicon + */ + //$href = "https://git.lolcat.ca/assets/img/logo.svg"; + + try{ + $payload = + $this->proxy->get( + $href, + $this->proxy::req_image, + true, + $url + ); + + }catch(Exception $error){ + + header("X-Error: Could not fetch the favicon (" . $error->getMessage() . ")"); + $this->favicon404(); + } + + /* + Parse the file format + */ + $image = null; + $format = $this->proxy->getimageformat($payload, $image); + + /* + Convert the image + */ + try{ + + /* + @todo: fix issues with avif+transparency + maybe using GD as fallback? 
+ */ + if($format !== false){ + $image->setFormat($format); + } + + $image->setBackgroundColor(new ImagickPixel("transparent")); + $image->readImageBlob($payload["body"]); + $image->resizeImage(16, 16, imagick::FILTER_LANCZOS, 1); + $image->setFormat("png"); + + $image = $image->getImageBlob(); + + // save favicon + $handle = fopen("icons/" . $this->filename . ".png", "w"); + fwrite($handle, $image, strlen($image)); + fclose($handle); + + echo $image; + + }catch(ImagickException $error){ + + header("X-Error: Could not convert the favicon: (" . $error->getMessage() . ")"); + $this->favicon404(); + } + + return; + } + + private function parsemanifest($href, $url){ + + if( + // check if base64-encoded JSON manifest + preg_match( + '/^data:application\/json;base64,([A-Za-z0-9=]*)$/', + $href, + $json + ) + ){ + + $json = base64_decode($json[1]); + + if($json === false){ + + // could not decode the manifest regex + return []; + } + + }else{ + + try{ + $json = + $this->proxy->get( + $this->proxy->getabsoluteurl($href, $url), + $this->proxy::req_web, + false, + $url + ); + + $json = $json["body"]; + + }catch(Exception $error){ + + // could not fetch the manifest + return []; + } + } + + $json = json_decode($json, true); + + if($json === null){ + + // manifest did not return valid json + return []; + } + + if( + isset($json["start_url"]) && + $this->proxy->validateurl($json["start_url"]) + ){ + + $url = $json["start_url"]; + } + + if(!isset($json["icons"][0]["src"])){ + + // manifest does not contain a path to the favicon + return []; + } + + // horay, return the favicon path + return $json["icons"][0]["src"]; + } + + private function favicon404(){ + + // fallback to google favicons + // ... 
probably blocked by cuckflare + try{ + + $image = + $this->proxy->get( + "https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=http://{$this->filename}&size=16", + $this->proxy::req_image + ); + }catch(Exception $error){ + + $this->defaulticon(); + } + + // write favicon from google + $handle = fopen("icons/" . $this->filename . ".png", "w"); + fwrite($handle, $image["body"], strlen($image["body"])); + fclose($handle); + + echo $image["body"]; + die(); + } + + private function defaulticon(){ + + // give 404 and fuck off + http_response_code(404); + + $handle = fopen("lib/favicon404.png", "r"); + echo fread($handle, filesize("lib/favicon404.png")); + fclose($handle); + + die(); + } +} diff --git a/icons/lolcat.ca.png b/icons/lolcat.ca.png new file mode 100644 index 0000000..c7e4785 Binary files /dev/null and b/icons/lolcat.ca.png differ diff --git a/images.php b/images.php new file mode 100644 index 0000000..67a50e8 --- /dev/null +++ b/images.php @@ -0,0 +1,99 @@ +getscraperfilters("images"); + +$get = $frontend->parsegetfilters($_GET, $filters); + +$frontend->loadheader( + $get, + $filters, + "images" +); + +$payload = [ + "images" => "", + "nextpage" => "" +]; + +try{ + $results = $scraper->image($get); + +}catch(Exception $error){ + + echo + $frontend->drawerror( + "Shit", + 'This scraper returned an error:' . + '
' . htmlspecialchars($error->getMessage()) . '
' . + 'Things you can try:' . + '
    ' . + '
  • Use a different scraper
  • ' . + '
  • Remove keywords that could cause errors
  • ' . + '
  • Use another 4get instance
  • ' . + '

' . + 'If the error persists, please contact the administrator.' + ); + die(); +} + +if(count($results["image"]) === 0){ + + $payload["images"] = + '
' . + "

Nobody here but us chickens!

" . + 'Have you tried:' . + '
    ' . + '
  • Using a different scraper
  • ' . + '
  • Using fewer keywords
  • ' . + '
  • Defining broader filters (Is NSFW turned off?)
  • ' . + '
' . + '
'; +} + +foreach($results["image"] as $image){ + + $domain = htmlspecialchars(parse_url($image["url"], PHP_URL_HOST)); + + $c = count($image["source"]) - 1; + + if( + preg_match( + '/^data:/', + $image["source"][$c]["url"] + ) + ){ + + $src = htmlspecialchars($image["source"][$c]["url"]); + }else{ + + $src = "/proxy?i=" . urlencode($image["source"][$c]["url"]) . "&s=thumb"; + } + + $payload["images"] .= + ''; +} + +if($results["npt"] !== null){ + + $payload["nextpage"] = + 'Next page >'; +} + +echo $frontend->load("images.html", $payload); diff --git a/index.php b/index.php new file mode 100644 index 0000000..be9897f --- /dev/null +++ b/index.php @@ -0,0 +1,14 @@ +load( + "home.html", + [ + "body_class" => $frontend->getthemeclass(false), + "banner" => $images[rand(0, count($images) - 1)] + ] +); diff --git a/lib/bingcache-todo-fix.php b/lib/bingcache-todo-fix.php new file mode 100644 index 0000000..a4acb5b --- /dev/null +++ b/lib/bingcache-todo-fix.php @@ -0,0 +1,144 @@ + + +new bingcache(); + +class bingcache{ + + public function __construct(){ + + if( + !isset($_GET["s"]) || + $this->validate_url($_GET["s"]) === false + ){ + + var_dump($this->validate_url($_GET["s"])); + $this->do404("Please provide a valid URL."); + } + + $url = $_GET["s"]; + + $curlproc = curl_init(); + + curl_setopt( + $curlproc, + CURLOPT_URL, + "https://www.bing.com/search?q=url%3A" . 
+ urlencode($url) + ); + + curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding + curl_setopt( + $curlproc, + CURLOPT_HTTPHEADER, + ["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0", + "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip", + "DNT: 1", + "Connection: keep-alive", + "Upgrade-Insecure-Requests: 1", + "Sec-Fetch-Dest: document", + "Sec-Fetch-Mode: navigate", + "Sec-Fetch-Site: none", + "Sec-Fetch-User: ?1"] + ); + + curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); + curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 5); + + $data = curl_exec($curlproc); + + if(curl_errno($curlproc)){ + + $this->do404("Failed to connect to bing servers. Please try again later."); + } + + curl_close($curlproc); + + preg_match( + '/
/', + $data, + $keys + ); + + print_r($keys); + + if(count($keys) === 0){ + + $this->do404("Bing has not archived this URL."); + } + + $keys = explode("|", $keys[1]); + $count = count($keys); + + //header("Location: https://cc.bingj.com/cache.aspx?d=" . $keys[$count - 2] . "&w=" . $keys[$count - 1]); + echo("Location: https://cc.bingj.com/cache.aspx?d=" . $keys[$count - 2] . "&w=" . $keys[$count - 1]); + } + + public function do404($text){ + + include "lib/frontend.php"; + $frontend = new frontend(); + + echo + $frontend->load( + "error.html", + [ + "title" => "Shit", + "text" => $text + ] + ); + + die(); + } + + public function validate_url($url){ + + $url_parts = parse_url($url); + + // check if required parts are there + if( + !isset($url_parts["scheme"]) || + !( + $url_parts["scheme"] == "http" || + $url_parts["scheme"] == "https" + ) || + !isset($url_parts["host"]) + ){ + return false; + } + + if( + // if its not an RFC-valid URL + !filter_var($url, FILTER_VALIDATE_URL) + ){ + return false; + } + + $ip = + str_replace( + ["[", "]"], // handle ipv6 + "", + $url_parts["host"] + ); + + // if its not an IP + if(!filter_var($ip, FILTER_VALIDATE_IP)){ + + // resolve domain's IP + $ip = gethostbyname($url_parts["host"] . 
"."); + } + + // check if its localhost + return filter_var( + $ip, + FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE + ); + } +} diff --git a/lib/classic.png b/lib/classic.png new file mode 100644 index 0000000..3d2b8fc Binary files /dev/null and b/lib/classic.png differ diff --git a/lib/curlproxy.php b/lib/curlproxy.php new file mode 100644 index 0000000..846fbb7 --- /dev/null +++ b/lib/curlproxy.php @@ -0,0 +1,652 @@ +cache = $cache; + } + + public function do404(){ + + http_response_code(404); + header("Content-Type: image/png"); + + $handle = fopen("lib/img404.png", "r"); + echo fread($handle, filesize("lib/img404.png")); + fclose($handle); + + die(); + return; + } + + public function getabsoluteurl($path, $relative){ + + if($this->validateurl($path)){ + + return $path; + } + + if(substr($path, 0, 2) == "//"){ + + return "https:" . $path; + } + + $url = null; + + $relative = parse_url($relative); + $url = $relative["scheme"] . "://"; + + if( + isset($relative["user"]) && + isset($relative["pass"]) + ){ + + $url .= $relative["user"] . ":" . $relative["pass"] . 
"@"; + } + + $url .= $relative["host"]; + + if(isset($relative["path"])){ + + $relative["path"] = explode( + "/", + $relative["path"] + ); + + unset($relative["path"][count($relative["path"]) - 1]); + $relative["path"] = implode("/", $relative["path"]); + + $url .= $relative["path"]; + } + + if( + strlen($path) !== 0 && + $path[0] !== "/" + ){ + + $url .= "/"; + } + + $url .= $path; + + return $url; + } + + public function validateurl($url){ + + $url_parts = parse_url($url); + + // check if required parts are there + if( + !isset($url_parts["scheme"]) || + !( + $url_parts["scheme"] == "http" || + $url_parts["scheme"] == "https" + ) || + !isset($url_parts["host"]) + ){ + return false; + } + + $ip = + str_replace( + ["[", "]"], // handle ipv6 + "", + $url_parts["host"] + ); + + // if its not an IP + if(!filter_var($ip, FILTER_VALIDATE_IP)){ + + // resolve domain's IP + $ip = gethostbyname($url_parts["host"] . "."); + } + + // check if its localhost + if( + filter_var( + $ip, + FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE + ) === false + ){ + + return false; + } + + return true; + } + + public function get($url, $reqtype = self::req_web, $acceptallcodes = false, $referer = null, $redirectcount = 0){ + + if($redirectcount === 5){ + + throw new Exception("Too many redirects"); + } + + // sanitize URL + // (validateurl() returns a boolean, it never throws) + if( + $this->validateurl($url) === false + ){ + + throw new Exception("Invalid URL"); + } + + $this->clientcache(); + + $curl = curl_init(); + + curl_setopt($curl, CURLOPT_URL, $url); + curl_setopt($curl, CURLOPT_ENCODING, ""); // default encoding + curl_setopt($curl, CURLOPT_HEADER, 1); + + switch($reqtype){ + case self::req_web: + curl_setopt( + $curl, + CURLOPT_HTTPHEADER, + [ + "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", + "Accept-Language: en-US,en;q=0.5", + 
"Accept-Encoding: gzip, deflate", + "DNT: 1", + "Connection: keep-alive", + "Upgrade-Insecure-Requests: 1", + "Sec-Fetch-Dest: document", + "Sec-Fetch-Mode: navigate", + "Sec-Fetch-Site: none", + "Sec-Fetch-User: ?1" + ] + ); + break; + + case self::req_image: + + if($referer === null){ + $referer = explode("/", $url, 4); + array_pop($referer); + + $referer = implode("/", $referer); + } + + curl_setopt( + $curl, + CURLOPT_HTTPHEADER, + [ + "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0", + "Accept: image/avif,image/webp,*/*", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip, deflate", + "DNT: 1", + "Connection: keep-alive", + "Referer: {$referer}" + ] + ); + break; + } + + curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); + curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2); + curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true); + curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 30); + curl_setopt($curl, CURLOPT_TIMEOUT, 30); + + // limit size of payloads + curl_setopt($curl, CURLOPT_BUFFERSIZE, 1024); + curl_setopt($curl, CURLOPT_NOPROGRESS, false); + curl_setopt( + $curl, + CURLOPT_PROGRESSFUNCTION, + function($downloadsize, $downloaded, $uploadsize, $uploaded + ){ + + // if $downloaded exceeds 100MB, fuck off + return ($downloaded > 100000000) ? 
1 : 0; + }); + + $body = curl_exec($curl); + + if(curl_errno($curl)){ + + throw new Exception(curl_error($curl)); + } + + curl_close($curl); + + $headers = []; + $http = null; + + while(true){ + + $header = explode("\n", $body, 2); + $body = $header[1]; + + if($http === null){ + + // http/1.1 200 ok + $header = explode("/", $header[0], 2); + $header = explode(" ", $header[1], 3); + + $http = [ + "version" => (float)$header[0], + "code" => (int)$header[1] + ]; + + continue; + } + + if(trim($header[0]) == ""){ + + // reached end of headers + break; + } + + $header = explode(":", $header[0], 2); + + // malformed headers + if(count($header) !== 2){ continue; } + + $headers[strtolower(trim($header[0]))] = trim($header[1]); + } + + // check http code + if( + $http["code"] >= 300 && + $http["code"] <= 309 + ){ + + // redirect + if(!isset($headers["location"])){ + + throw new Exception("Broken redirect"); + } + + $redirectcount++; + + return $this->get($this->getabsoluteurl($headers["location"], $url), $reqtype, $acceptallcodes, $referer, $redirectcount); + }else{ + if( + $acceptallcodes === false && + $http["code"] > 300 + ){ + + throw new Exception("Remote server returned an error code! 
({$http["code"]})"); + } + } + + // check if data is okay + switch($reqtype){ + + case self::req_image: + + $format = false; + + if(isset($headers["content-type"])){ + + if($headers["content-type"] == "text/html"){ + + throw new Exception("Server returned an html document instead of image"); + } + + $tmp = explode(";", $headers["content-type"]); + + for($i=0; $i $http, + "format" => $format, + "headers" => $headers, + "body" => $body + ]; + break; + + default: + + return [ + "http" => $http, + "headers" => $headers, + "body" => $body + ]; + break; + } + + return; + } + + public function stream_linear_image($url, $referer = null){ + + $this->stream($url, $referer, "image"); + } + + public function stream_linear_audio($url, $referer = null){ + + $this->stream($url, $referer, "audio"); + } + + private function stream($url, $referer, $format){ + + $this->url = $url; + $this->format = $format; + + // sanitize URL + // (validateurl() returns a boolean, it never throws) + if( + $this->validateurl($url) === false + ){ + + throw new Exception("Invalid URL"); + } + + $this->clientcache(); + + $curl = curl_init(); + + // set headers + if($referer === null){ + $referer = explode("/", $url, 4); + array_pop($referer); + + $referer = implode("/", $referer); + } + + switch($format){ + + case "image": + curl_setopt( + $curl, + CURLOPT_HTTPHEADER, + [ + "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "Accept: image/avif,image/webp,*/*", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip, deflate, br", + "DNT: 1", + "Connection: keep-alive", + "Referer: {$referer}" + ] + ); + break; + + case "audio": + curl_setopt( + $curl, + CURLOPT_HTTPHEADER, + [ + "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "Accept: audio/webm,audio/ogg,audio/wav,audio/*;q=0.9,application/ogg;q=0.7,video/*;q=0.6,*/*;q=0.5", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip, deflate, br", + "DNT: 1", + "Connection: 
keep-alive", + "Referer: {$referer}" + ] + ); + break; + } + + // follow redirects + curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); + curl_setopt($curl, CURLOPT_MAXREDIRS, 5); + curl_setopt($curl, CURLOPT_AUTOREFERER, true); + + // set url + curl_setopt($curl, CURLOPT_URL, $url); + curl_setopt($curl, CURLOPT_ENCODING, ""); // default encoding + + // timeout + ssl verification + curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2); + curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true); + curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10); + curl_setopt($curl, CURLOPT_TIMEOUT, 30); + + curl_setopt( + $curl, + CURLOPT_WRITEFUNCTION, + function($c, $data){ + + if(curl_getinfo($c, CURLINFO_HTTP_CODE) !== 200){ + + throw new Exception("Server returned a non-200 code"); + } + + echo $data; + return strlen($data); + } + ); + + $this->empty_header = false; + $this->cont = false; + $this->headers_tmp = []; + $this->headers = []; + curl_setopt( + $curl, + CURLOPT_HEADERFUNCTION, + function($c, $header){ + + $head = trim($header); + $len = strlen($head); + + if($len === 0){ + + $this->empty_header = true; + $this->headers_tmp = []; + }else{ + + $this->empty_header = false; + $this->headers_tmp[] = $head; + } + + foreach($this->headers_tmp as $h){ + + // parse headers + $h = explode(":", $h, 2); + + if(count($h) !== 2){ + + if(curl_getinfo($c, CURLINFO_HTTP_CODE) !== 200){ + + // not HTTP 200, probably a redirect + $this->cont = false; + }else{ + + $this->cont = true; + } + + // is HTTP 200, just ignore that line + continue; + } + + $this->headers[strtolower(trim($h[0]))] = trim($h[1]); + } + + if( + $this->cont && + $this->empty_header + ){ + + // get content type + if(isset($this->headers["content-type"])){ + + $filetype = explode("/", $this->headers["content-type"]); + + if(strtolower($filetype[0]) != $this->format){ + + throw new Exception("Resource is not an {$this->format} (Found {$filetype[0]} instead)"); + } + + }else{ + + throw new Exception("Resource is not an {$this->format} (no 
Content-Type)"); + } + + header("Content-Type: {$this->format}/{$filetype[1]}"); + + // give payload size + if(isset($this->headers["content-length"])){ + + header("Content-Length: {$this->headers["content-length"]}"); + } + + // give filename + $this->getfilenameheader($this->headers, $this->url, $filetype[1]); + } + + return strlen($header); + } + ); + + curl_exec($curl); + + if(curl_errno($curl)){ + + throw new Exception(curl_error($curl)); + } + + curl_close($curl); + } + + public function getfilenameheader($headers, $url, $filetype = "jpg"){ + + // get filename from content-disposition header + if(isset($headers["content-disposition"])){ + + preg_match( + '/filename=([^;]+)/', + $headers["content-disposition"], + $filename + ); + + if(isset($filename[1])){ + + header("Content-Disposition: filename=" . $filename[1] . "." . $filetype); + return; + } + } + + // get filename from URL + $filename = parse_url($url, PHP_URL_PATH); + + if($filename === null){ + + // everything failed! rename file to domain name + header("Content-Disposition: filename=" . parse_url($url, PHP_URL_HOST) . "." . $filetype); + return; + } + + // remove extension from filename + $filename = + explode( + ".", + basename($filename) + ); + + if(count($filename) > 1){ + array_pop($filename); + } + + $filename = implode(".", $filename); + + header("Content-Disposition: inline; filename=" . $filename . "." . 
$filetype); + return; + } + + public function getimageformat($payload, &$imagick){ + + $finfo = new finfo(FILEINFO_MIME_TYPE); + $format = $finfo->buffer($payload["body"]); + + if($format === false){ + + if($payload["format"] === false){ + + header("X-Error: Could not parse format"); + $this->do404(); + } + + $format = $payload["format"]; + }else{ + + $format_tmp = explode("/", $format, 2); + + if($format_tmp[0] == "image"){ + + $format_tmp = strtolower($format_tmp[1]); + + if(substr($format_tmp, 0, 2) == "x-"){ + + $format_tmp = substr($format_tmp, 2); + } + + $format = $format_tmp; + } + } + + switch($format){ + + case "tiff": $format = "gif"; break; + case "vnd.microsoft.icon": $format = "ico"; break; + case "icon": $format = "ico"; break; + case "svg+xml": $format = "svg"; break; + } + + $imagick = new Imagick(); + + if( + !in_array( + $format, + array_map("strtolower", $imagick->queryFormats()) + ) + ){ + + // format could not be found, but imagemagick can + // sometimes detect it? 
shit's fucked + $format = false; + } + + return $format; + } + + public function clientcache(){ + + if($this->cache === false){ + + return; + } + + header("Last-Modified: Thu, 01 Oct 1970 00:00:00 GMT"); + $headers = getallheaders(); + + if( + isset($headers["If-Modified-Since"]) || + isset($headers["If-Unmodified-Since"]) + ){ + + http_response_code(304); // 304: Not Modified + die(); + } + } +} diff --git a/lib/favicon404.png b/lib/favicon404.png new file mode 100644 index 0000000..7540694 Binary files /dev/null and b/lib/favicon404.png differ diff --git a/lib/frontend.php b/lib/frontend.php new file mode 100644 index 0000000..3be912b --- /dev/null +++ b/lib/frontend.php @@ -0,0 +1,1282 @@ + $value){ + + $html = + str_replace( + "{%{$key}%}", + $value, + $html + ); + } + + return trim($html); + } + + public function getthemeclass($raw = true){ + + if( + isset($_COOKIE["theme"]) && + $_COOKIE["theme"] == "cream" + ){ + + $body_class = "theme-white "; + }else{ + + $body_class = ""; + } + + if( + $raw && + $body_class != "" + ){ + + return ' class="' . rtrim($body_class) . '"'; + } + + return $body_class; + } + + public function loadheader(array $get, array $filters, string $page){ + + echo + $this->load("header.html", [ + "title" => trim($get["s"] . " ({$page})"), + "description" => ucfirst($page) . ' search results for "' . htmlspecialchars($get["s"]) . '"', + "index" => "no", + "search" => htmlspecialchars($get["s"]), + "tabs" => $this->generatehtmltabs($page, $get["s"]), + "filters" => $this->generatehtmlfilters($filters, $get), + "body_class" => $this->getthemeclass() + ]); + + if( + preg_match( + '/bot|wget|curl|python-requests|scrapy|feedfetcher|go-http-client|ruby|universalfeedparser|yahoo\! slurp|spider|rss/i', + $_SERVER["HTTP_USER_AGENT"] + ) + ){ + + // bot detected !! + echo + $this->drawerror( + "Tshh, blocked!", + 'You were blocked from viewing this page. 
If you wish to scrape data from 4get, please consider running your own 4get instance or using the API.', + ); + die(); + } + } + + public function drawerror($title, $error){ + + return + $this->load("search.html", [ + "class" => "", + "right-left" => "", + "right-right" => "", + "left" => + '
' . + '

' . htmlspecialchars($title) . '

' . + $error . + '
' + ]); + } + + public function drawtextresult($site, $greentext = null, $duration = null, $keywords, $tabindex = true){ + + $payload = + '
'; + + // add favicon, link and archive links + $payload .= $this->drawlink($site["url"]); + + /* + Draw title + description + filetype + */ + $payload .= + '' . + 'thumb'; + + if($duration !== null){ + + $payload .= + '
' . + htmlspecialchars($duration) . + '
'; + } + + $payload .= + '
'; + } + + $payload .= + '
'; + + if( + isset($site["type"]) && + $site["type"] != "web" + ){ + + $payload .= '
' . strtoupper($site["type"]) . '
'; + } + + $payload .= + htmlspecialchars($site["title"]) . + '
'; + + if($greentext !== null){ + + $payload .= + '
' . + htmlspecialchars($greentext) . + '
'; + } + + if($site["description"] !== null){ + + $payload .= + '
' . + $this->highlighttext($keywords, $site["description"]) . + '
'; + } + + $payload .= '
'; + + /* + Sublinks + */ + if( + isset($site["sublink"]) && + !empty($site["sublink"]) + ){ + + usort($site["sublink"], function($a, $b){ + + return strlen($a["description"]) <=> strlen($b["description"]); + }); + + $payload .= + ''; + } + + if( + isset($site["table"]) && + !empty($site["table"]) + ){ + + $payload .= ''; + + foreach($site["table"] as $title => $value){ + + $payload .= + '' . + '' . + '' . + ''; + } + + $payload .= '
' . htmlspecialchars($title) . '' . htmlspecialchars($value) . '
'; + } + + return $payload . '
'; + } + + public function highlighttext($keywords, $text){ + + $text = htmlspecialchars($text); + + $keywords = explode(" ", $keywords); + $regex = []; + + foreach($keywords as $word){ + + $regex[] = "\b" . preg_quote($word, "/") . "\b"; + } + + $regex = "/" . implode("|", $regex) . "/i"; + + return + preg_replace( + $regex, + '${0}', + $text + ); + } + + function highlightcode($text){ + + // https://www.php.net/highlight_string + ini_set("highlight.comment", "c-comment"); + ini_set("highlight.default", "c-default"); + ini_set("highlight.html", "c-default"); + ini_set("highlight.keyword", "c-keyword"); + ini_set("highlight.string", "c-string"); + + $text = + trim( + preg_replace( + '/<\/span>$/', + "", // remove stray ending span because of the ', + ' ' + ], + [ + "\n", // replace
with newlines + " " // replace html entity to space + ], + str_replace( + [ + // leading \n<?php ", + "", + "" + ], + "", + highlight_string("', '', $text); + } + + return $text; + } + + public function drawlink($link){ + + /* + Add favicon + */ + $host = parse_url($link); + $esc = + explode( + ".", + $host["host"], + 2 + ); + + if( + count($esc) === 2 && + $esc[0] == "www" + ){ + + $esc = $esc[1]; + }else{ + + $esc = $esc[0]; + } + + $esc = substr($esc, 0, 2); + + $urlencode = urlencode($link); + + $payload = + '
' . + '' . + '
'; + + /* + Add archive links + */ + if( + $host["host"] == "boards.4chan.org" || + $host["host"] == "boards.4channel.org" + ){ + + $archives = []; + $path = explode("/", $host["path"]); + $count = count($path); + // /pol/thread/417568063/post-shitty-memes-if-you-want-to + + if($count !== 0){ + + $isboard = true; + + switch($path[1]){ + + case "con": + break; + + case "q": + $archives[] = "desuarchive.org"; + break; + + case "qa": + $archives[] = "desuarchive.org"; + break; + + case "qb": + $archives[] = "arch.b4k.co"; + break; + + case "trash": + $archives[] = "desuarchive.org"; + break; + + case "a": + $archives[] = "desuarchive.org"; + break; + + case "c": + $archives[] = "desuarchive.org"; + break; + + case "w": + break; + + case "m": + $archives[] = "desuarchive.org"; + break; + + case "cgl": + $archives[] = "desuarchive.org"; + $archives[] = "warosu.org"; + break; + + case "cm": + $archives[] = "boards.fireden.net"; + break; + + case "f": + $archives[] = "archive.4plebs.org"; + break; + + case "n": + break; + + case "jp": + $archives[] = "warosu.org"; + break; + + case "vt": + $archives[] = "warosu.org"; + break; + + case "v": + $archives[] = "boards.fireden.net"; + $archives[] = "arch.b4k.co"; + break; + + case "vg": + $archives[] = "boards.fireden.net"; + $archives[] = "arch.b4k.co"; + break; + + case "vm": + $archives[] = "arch.b4k.co"; + break; + + case "vmg": + $archives[] = "arch.b4k.co"; + break; + + case "vp": + $archives[] = "arch.b4k.co"; + break; + + case "vr": + $archives[] = "desuarchive.org"; + $archives[] = "warosu.org"; + break; + + case "vrpg": + $archives[] = "arch.b4k.co"; + break; + + case "vst": + $archives[] = "arch.b4k.co"; + break; + + case "co": + $archives[] = "desuarchive.org"; + break; + + case "g": + $archives[] = "desuarchive.org"; + $archives[] = "arch.b4k.co"; + break; + + case "tv": + $archives[] = "archive.4plebs.org"; + break; + + case "k": + $archives[] = "desuarchive.org"; + break; + + case "o": + $archives[] = 
"archive.4plebs.org"; + break; + + case "an": + $archives[] = "desuarchive.org"; + break; + + case "tg": + $archives[] = "desuarchive.org"; + $archives[] = "archive.4plebs.org"; + break; + + case "sp": + $archives[] = "archive.4plebs.org"; + break; + + case "xs": + $archives[] = "eientei.xyz"; + break; + + case "pw": + break; + + case "sci": + $archives[] = "boards.fireden.net"; + $archives[] = "warosu.org"; + $archives[] = "eientei.xyz"; + break; + + case "his": + $archives[] = "desuarchive.org"; + break; + + case "int": + $archives[] = "desuarchive.org"; + break; + + case "out": + break; + + case "toy": + break; + + case "i": + $archives[] = "archiveofsins.com"; + $archives[] = "eientei.xyz"; + break; + + case "po": + break; + + case "p": + break; + + case "ck": + $archives[] = "warosu.org"; + break; + + case "ic": + $archives[] = "boards.fireden.net"; + $archives[] = "warosu.org"; + break; + + case "wg": + break; + + case "lit": + $archives[] = "warosu.org"; + break; + + case "mu": + $archives[] = "desuarchive.org"; + break; + + case "fa": + $archives[] = "warosu.org"; + break; + + case "3": + $archives[] = "warosu.org"; + $archives[] = "eientei.xyz"; + break; + + case "gd": + break; + + case "diy": + $archives[] = "warosu.org"; + break; + + case "wsg": + $archives[] = "desuarchive.org"; + break; + + case "qst": + break; + + case "biz": + $archives[] = "warosu.org"; + break; + + case "trv": + $archives[] = "archive.4plebs.org"; + break; + + case "fit": + $archives[] = "desuarchive.org"; + break; + + case "x": + $archives[] = "archive.4plebs.org"; + break; + + case "adv": + $archives[] = "archive.4plebs.org"; + break; + + case "lgbt": + $archives[] = "archiveofsins.com"; + break; + + case "mlp": + $archives[] = "desuarchive.org"; + $archives[] = "arch.b4k.co"; + break; + + case "news": + break; + + case "wsr": + break; + + case "vip": + break; + + case "b": + $archives[] = "thebarchive.com"; + break; + + case "r9k": + $archives[] = "desuarchive.org"; + break; + + 
case "pol": + $archives[] = "archive.4plebs.org"; + break; + + case "bant": + $archives[] = "thebarchive.com"; + break; + + case "soc": + $archives[] = "archiveofsins.com"; + break; + + case "s4s": + $archives[] = "archive.4plebs.org"; + break; + + case "s": + $archives[] = "archiveofsins.com"; + break; + + case "hc": + $archives[] = "archiveofsins.com"; + break; + + case "hm": + $archives[] = "archiveofsins.com"; + break; + + case "h": + $archives[] = "archiveofsins.com"; + break; + + case "e": + break; + + case "u": + $archives[] = "archiveofsins.com"; + break; + + case "d": + $archives[] = "desuarchive.org"; + break; + + case "y": + $archives[] = "boards.fireden.net"; + break; + + case "t": + $archives[] = "archiveofsins.com"; + break; + + case "hr": + $archives[] = "archive.4plebs.org"; + break; + + case "gif": + break; + + case "aco": + $archives[] = "desuarchive.org"; + break; + + case "r": + $archives[] = "archiveofsins.com"; + break; + + default: + $isboard = false; + break; + } + + if($isboard === true){ + + $archives[] = "archived.moe"; + } + + $trail = ""; + + if( + isset($path[2]) && + isset($path[3]) && + $path[2] == "thread" + ){ + + $trail .= "/" . $path[1] . "/thread/" . $path[3]; + }elseif($isboard){ + + $trail = "/" . $path[1] . "/"; + } + + for($i=0; $i' . + '' . $archives[$i][0] . $archives[$i][1] . '' . + $archives[$i] . + ''; + } + } + } + + $payload .= + 'goGoogle cache' . + 'arArchive.org' . + 'arArchive.is' . + 'biBing cache' . + 'meMegalodon' . + '
'; + + /* + Draw link + */ + $parts = explode("/", $link); + $clickurl = ""; + + // remove trailing / + $c = count($parts) - 1; + if($parts[$c] == ""){ + + $parts[$c - 1] = $parts[$c - 1] . "/"; + unset($parts[$c]); + } + + // merge https://site together + $parts = [ + $parts[0] . $parts[1] . '//' . $parts[2], + ...array_slice($parts, 3, count($parts) - 1) + ]; + + $c = count($parts); + for($i=0; $i<$c; $i++){ + + if($i !== 0){ $clickurl .= "/"; } + + $clickurl .= $parts[$i]; + + if($i === $c - 1){ + + $parts[$i] = rtrim($parts[$i], "/"); + } + + $payload .= + '' . + htmlspecialchars(urldecode($parts[$i])) . + ''; + + if($i !== $c - 1){ + + $payload .= ''; + } + } + + return $payload . '
'; + } + + public function getscraperfilters($page){ + + $get_scraper = null; + + switch($page){ + + case "web": + $get_scraper = isset($_COOKIE["scraper_web"]) ? $_COOKIE["scraper_web"] : null; + break; + + case "images": + $get_scraper = isset($_COOKIE["scraper_images"]) ? $_COOKIE["scraper_images"] : null; + break; + + case "videos": + $get_scraper = isset($_COOKIE["scraper_videos"]) ? $_COOKIE["scraper_videos"] : null; + break; + + case "news": + $get_scraper = isset($_COOKIE["scraper_news"]) ? $_COOKIE["scraper_news"] : null; + break; + } + + if( + isset($_GET["scraper"]) && + is_string($_GET["scraper"]) + ){ + + $get_scraper = $_GET["scraper"]; + }else{ + + if( + isset($_GET["npt"]) && + is_string($_GET["npt"]) + ){ + + $get_scraper = explode(".", $_GET["npt"], 2)[0]; + + $get_scraper = + preg_replace( + '/[0-9]+$/', + "", + $get_scraper + ); + } + } + + // add search field + $filters = + [ + "s" => [ + "option" => "_SEARCH" + ] + ]; + + // define default scrapers + switch($page){ + + case "web": + $filters["scraper"] = [ + "display" => "Scraper", + "option" => [ + "ddg" => "DuckDuckGo", + "brave" => "Brave", + "google" => "Google", + "mojeek" => "Mojeek", + "marginalia" => "Marginalia", + "wiby" => "wiby" + ] + ]; + break; + + case "images": + $filters["scraper"] = [ + "display" => "Scraper", + "option" => [ + "ddg" => "DuckDuckGo", + "yandex" => "Yandex", + "google" => "Google" + ] + ]; + break; + + case "videos": + $filters["scraper"] = [ + "display" => "Scraper", + "option" => [ + "yt" => "YouTube", + "ddg" => "DuckDuckGo", + "google" => "Google" + ] + ]; + break; + + case "news": + $filters["scraper"] = [ + "display" => "Scraper", + "option" => [ + "ddg" => "DuckDuckGo", + "brave" => "Brave", + "google" => "Google", + "mojeek" => "Mojeek" + ] + ]; + break; + } + + // get scraper name from user input, or default out to preferred scraper + $scraper_out = null; + $first = true; + + foreach($filters["scraper"]["option"] as $scraper_name => $scraper_pretty){ 
+ + if($first === true){ + + $first = $scraper_name; + } + + if($scraper_name == $get_scraper){ + + $scraper_out = $scraper_name; + } + } + + if($scraper_out === null){ + + $scraper_out = $first; + } + + switch($scraper_out){ + + case "ddg": + include "scraper/ddg.php"; + $lib = new ddg(); + break; + + case "brave": + include "scraper/brave.php"; + $lib = new brave(); + break; + + case "yt"; + include "scraper/youtube.php"; + $lib = new youtube(); + break; + + case "yandex": + include "scraper/yandex.php"; + $lib = new yandex(); + break; + + case "google": + include "scraper/google.php"; + $lib = new google(); + break; + + case "mojeek": + include "scraper/mojeek.php"; + $lib = new mojeek(); + break; + + case "marginalia": + include "scraper/marginalia.php"; + $lib = new marginalia(); + break; + + case "wiby": + include "scraper/wiby.php"; + $lib = new wiby(); + break; + } + + // set scraper on $_GET + $_GET["scraper"] = $scraper_out; + + // set nsfw on $_GET + if( + isset($_COOKIE["nsfw"]) && + !isset($_GET["nsfw"]) + ){ + + $_GET["nsfw"] = $_COOKIE["nsfw"]; + } + + return + [ + $lib, + array_merge_recursive( + $filters, + $lib->getfilters($page) + ) + ]; + } + + public function parsegetfilters($parameters, $whitelist){ + + $sanitized = []; + + // add npt token + if( + isset($parameters["npt"]) && + is_string($parameters["npt"]) + ){ + + $sanitized["npt"] = $parameters["npt"]; + }else{ + + $sanitized["npt"] = false; + } + + // we're iterating over $whitelist, so + // you can't polluate $sanitized with useless + // parameters + foreach($whitelist as $parameter => $value){ + + if(isset($parameters[$parameter])){ + + if(!is_string($parameters[$parameter])){ + + $sanitized[$parameter] = null; + continue; + } + + // parameter is already set, use that value + $sanitized[$parameter] = $parameters[$parameter]; + }else{ + + // parameter is not set, add it + if(is_string($value["option"])){ + + // special field: set default value manually + switch($value["option"]){ + + 
case "_DATE": + // no date set + $sanitized[$parameter] = false; + break; + + case "_SEARCH": + // no search set + $sanitized[$parameter] = ""; + break; + } + + }else{ + + // set a default value + $sanitized[$parameter] = array_keys($value["option"])[0]; + } + } + + // sanitize input + if(is_array($value["option"])){ + if( + !in_array( + $sanitized[$parameter], + $keys = array_keys($value["option"]) + ) + ){ + + $sanitized[$parameter] = $keys[0]; + } + }else{ + + // sanitize search & string + switch($value["option"]){ + + case "_DATE": + if($sanitized[$parameter] !== false){ + + $sanitized[$parameter] = strtotime($sanitized[$parameter]); + if($sanitized[$parameter] <= 0){ + + $sanitized[$parameter] = false; + } + } + break; + + case "_SEARCH": + + // get search string & bang + $sanitized[$parameter] = trim($sanitized[$parameter]); + $sanitized["bang"] = ""; + + if( + strlen($sanitized[$parameter]) !== 0 && + $sanitized[$parameter][0] == "!" + ){ + + $sanitized[$parameter] = explode(" ", $sanitized[$parameter], 2); + + $sanitized["bang"] = trim($sanitized[$parameter][0]); + + if(count($sanitized[$parameter]) === 2){ + + $sanitized[$parameter] = trim($sanitized[$parameter][1]); + }else{ + + $sanitized[$parameter] = ""; + } + + $sanitized["bang"] = ltrim($sanitized["bang"], "!"); + } + + $sanitized[$parameter] = ltrim($sanitized[$parameter], "! \n\r\t\v\x00"); + } + } + } + + // invert dates if needed + if( + isset($sanitized["older"]) && + isset($sanitized["newer"]) && + $sanitized["newer"] !== false && + $sanitized["older"] !== false && + $sanitized["newer"] > $sanitized["older"] + ){ + + // invert + [ + $sanitized["older"], + $sanitized["newer"] + ] = [ + $sanitized["newer"], + $sanitized["older"] + ]; + } + + return $sanitized; + } + + public function s_to_timestamp($seconds){ + + if(is_string($seconds)){ + + return "LIVE"; + } + + return ($seconds >= 60) ? 
ltrim(gmdate("H:i:s", $seconds), ":0") : gmdate("0:s", $seconds); + } + + public function generatehtmltabs($page, $query){ + + $html = null; + + foreach(["web", "images", "videos", "news"] as $type){ + + $html .= '' . ucfirst($type) . ''; + } + + return $html; + } + + public function generatehtmlfilters($filters, $params){ + + $html = null; + + foreach($filters as $filter_name => $filter_values){ + + if(!isset($filter_values["display"])){ + + continue; + } + + $output = true; + $tmp = + '
' . + '
' . htmlspecialchars($filter_values["display"]) . '
'; + + if(is_array($filter_values["option"])){ + + $tmp .= ''; + }else{ + + switch($filter_values["option"]){ + + case "_DATE": + $tmp .= ' $value){ + + if( + $value == null || + $value == false || + $key == "npt" || + $key == "extendedsearch" || + $value == "any" || + $value == "all" || + ( + $ommit === true && + $key == "s" + ) + ){ + + continue; + } + + $out[$key] = $value; + } + + return http_build_query($out); + } + + public function htmlnextpage($gets, $npt, $page){ + + $query = $this->buildquery($gets); + + return $page . "?" . $query . "&npt=" . $npt; + } +} diff --git a/lib/fuckhtml.php b/lib/fuckhtml.php new file mode 100644 index 0000000..8802511 --- /dev/null +++ b/lib/fuckhtml.php @@ -0,0 +1,361 @@ +load($html, $isfile); + } + } + + public function load($html, $isfile = false){ + + if(is_array($html)){ + + if(!isset($html["innerHTML"])){ + + throw new Exception("(load) Supplied array doesn't contain a innerHTML index"); + } + $html = $html["innerHTML"]; + } + + if($isfile){ + + $handle = fopen($html, "r"); + $fetch = fread($handle, filesize($html)); + fclose($handle); + + $this->html = $fetch; + }else{ + + $this->html = $html; + } + + $this->strlen = strlen($this->html); + } + + public function getElementsByTagName(string $tagname){ + + $out = []; + + /* + Scrape start of the tag. Example +
... + */ + + if($tagname == "*"){ + + $tagname = '[^\/<>\s]+'; + }else{ + + $tagname = preg_quote(strtolower($tagname)); + } + + preg_match_all( + '/<\s*(' . $tagname . ')(\s(?:[^>\'"]*|"[^"]*"|\'[^\']*\')+)?\s*>/i', + /* '/<\s*(' . $tagname . ')(\s[\S\s]*?)?>/i', */ + $this->html, + $starting_tags, + PREG_OFFSET_CAPTURE + ); + + for($i=0; $i strtolower($starting_tags[1][$i][0]), + "startPos" => $starting_tags[0][$i][1], + "endPos" => 0, + "startTag" => $starting_tags[0][$i][0], + "attributes" => $attributes, + "innerHTML" => null + ]; + } + + /* + Get innerHTML + */ + // get closing tag positions + preg_match_all( + '/<\s*\/\s*(' . $tagname . ')\s*>/i', + $this->html, + $regex_closing_tags, + PREG_OFFSET_CAPTURE + ); + + // merge opening and closing tags together + for($i=0; $i strtolower($regex_closing_tags[1][$i][0]), + "endTag" => $regex_closing_tags[0][$i][0], + "startPos" => $regex_closing_tags[0][$i][1] + ]; + } + + usort( + $out, + function($a, $b){ + + return $a["startPos"] > $b["startPos"]; + } + ); + + // computer the indent level for each element + $level = []; + $count = count($out); + + for($i=0; $i<$count; $i++){ + + if(!isset($level[$out[$i]["tagName"]])){ + + $level[$out[$i]["tagName"]] = 0; + } + + if(isset($out[$i]["startTag"])){ + + // encountered starting tag + $level[$out[$i]["tagName"]]++; + $out[$i]["level"] = $level[$out[$i]["tagName"]]; + }else{ + + // encountered closing tag + $out[$i]["level"] = $level[$out[$i]["tagName"]]; + $level[$out[$i]["tagName"]]--; + } + } + + // if the indent level is the same for a div, + // we encountered _THE_ closing tag + for($i=0; $i<$count; $i++){ + + if(!isset($out[$i]["startTag"])){ + + continue; + } + + for($k=$i; $k<$count; $k++){ + + if( + isset($out[$k]["endTag"]) && + $out[$i]["tagName"] == $out[$k]["tagName"] && + $out[$i]["level"] + === $out[$k]["level"] + ){ + + $startlen = strlen($out[$i]["startTag"]); + $endlen = strlen($out[$k]["endTag"]); + + $out[$i]["endPos"] = $out[$k]["startPos"] + 
$endlen; + + $out[$i]["innerHTML"] = + substr( + $this->html, + $out[$i]["startPos"] + $startlen, + $out[$k]["startPos"] - ($out[$i]["startPos"] + $startlen) + ); + + $out[$i]["outerHTML"] = + substr( + $this->html, + $out[$i]["startPos"], + $out[$k]["startPos"] - $out[$i]["startPos"] + $endlen + ); + + break; + } + } + } + + // filter out ending divs + for($i=0; $i<$count; $i++){ + + if(isset($out[$i]["endTag"])){ + + unset($out[$i]); + } + + unset($out[$i]["startTag"]); + } + + return array_values($out); + } + + public function getElementsByAttributeName(string $name, $collection = null){ + + if($collection === null){ + + $collection = $this->getElementsByTagName("*"); + }elseif(is_string($collection)){ + + $collection = $this->getElementsByTagName($collection); + } + + $return = []; + foreach($collection as $elem){ + + foreach($elem["attributes"] as $attrib_name => $attrib_value){ + + if($attrib_name == $name){ + + $return[] = $elem; + continue 2; + } + } + } + + return $return; + } + + public function getElementsByFuzzyAttributeValue(string $name, string $value, $collection = null){ + + $elems = $this->getElementsByAttributeName($name, $collection); + $value = explode(" ", $value); + + $return = []; + + foreach($elems as $elem){ + + foreach($elem["attributes"] as $attrib_name => $attrib_value){ + + $attrib_value = explode(" ", $attrib_value); + $ac = count($attrib_value); + $nc = count($value); + $cr = 0; + + for($i=0; $i<$nc; $i++){ + + for($k=0; $k<$ac; $k++){ + + if($value[$i] == $attrib_value[$k]){ + + $cr++; + } + } + } + + if($cr === $nc){ + + $return[] = $elem; + continue 2; + } + } + } + + return $return; + } + + public function getElementsByAttributeValue(string $name, string $value, $collection = null){ + + $elems = $this->getElementsByAttributeName($name, $collection); + + $return = []; + + foreach($elems as $elem){ + + foreach($elem["attributes"] as $attrib_name => $attrib_value){ + + if($attrib_value == $value){ + + $return[] = $elem; + continue 2; 
+ } + } + } + + return $return; + } + + public function getElementById(string $idname, $collection = null){ + + $id = $this->getElementsByAttributeValue("id", $idname, $collection); + + if(count($id) !== 0){ + + return $id[0]; + } + + return false; + } + + public function getElementsByClassName(string $classname, $collection = null){ + + return $this->getElementsByFuzzyAttributeValue("class", $classname, $collection); + } + + public function getTextContent($html, $whitespace = false, $trim = true){ + + if(is_array($html)){ + + if(!isset($html["innerHTML"])){ + + throw new Exception("(getTextContent) Supplied array doesn't contain a innerHTML index"); + } + $html = $html["innerHTML"]; + } + + $html = + preg_split('/\n|<\/?br>/i', $html); + + $out = ""; + for($i=0; $i diff --git a/lib/img404.png b/lib/img404.png new file mode 100644 index 0000000..4549dee Binary files /dev/null and b/lib/img404.png differ diff --git a/lib/nextpage.php b/lib/nextpage.php new file mode 100644 index 0000000..a883e49 --- /dev/null +++ b/lib/nextpage.php @@ -0,0 +1,106 @@ +scraper = $scraper; + } + + public function store($payload, $page){ + + $page = $page[0]; + $password = random_bytes(256); // 2048 bit + $salt = random_bytes(16); + $key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true); + $iv = + random_bytes( + openssl_cipher_iv_length("aes-256-gcm") + ); + + $tag = ""; + $out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16); + + $key = apcu_inc("key", 1); + + apcu_store( + $page . "." . + $this->scraper . + (string)($key), + gzdeflate($salt.$iv.$out.$tag), + 420 // cache information for 7 minutes blaze it + ); + + return + $this->scraper . $key . "." . + rtrim(strtr(base64_encode($password), '+/', '-_'), '='); + } + + public function get($npt, $page){ + + $page = $page[0]; + $explode = explode(".", $npt, 2); + + if(count($explode) !== 2){ + + throw new Exception("Malformed nextPageToken!"); + } + + $apcu = $page . "." . 
$explode[0]; + $key = $explode[1]; + + $payload = apcu_fetch($apcu); + + if($payload === false){ + + throw new Exception("The nextPageToken is invalid or has expired!"); + } + + $key = + base64_decode( + str_pad( + strtr($key, '-_', '+/'), + strlen($key) % 4, + '=', + STR_PAD_RIGHT + ) + ); + + $payload = gzinflate($payload); + + $key = + hash_pbkdf2( + "sha512", + $key, + substr($payload, 0, 16), // salt + 20000, + 32, + true + ); + $ivlen = openssl_cipher_iv_length("aes-256-gcm"); + + $payload = + openssl_decrypt( + substr( + $payload, + 16 + $ivlen, + -16 + ), + "aes-256-gcm", + $key, + OPENSSL_RAW_DATA, + substr($payload, 16, $ivlen), + substr($payload, -16) + ); + + if($payload === false){ + + throw new Exception("The nextPageToken is invalid or has expired!"); + } + + // remove the key after using + apcu_delete($apcu); + + return $payload; + } +} diff --git a/lib/type-todo.php b/lib/type-todo.php new file mode 100644 index 0000000..f813543 --- /dev/null +++ b/lib/type-todo.php @@ -0,0 +1,132 @@ + + public function type($get){ + + $search = $get["s"]; + $bang = $get["bang"]; + + if(empty($search)){ + + if(!empty($bang)){ + + // !youtube + $conn = pg_connect("host=localhost dbname=4get user=postgres password=postgres"); + + pg_prepare($conn, "bang_get", "SELECT bang,name FROM bangs WHERE bang LIKE $1 ORDER BY bang ASC LIMIT 8"); + $q = pg_execute($conn, "bang_get", ["$bang%"]); + + $results = []; + while($row = pg_fetch_array($q, null, PGSQL_ASSOC)){ + + $results[] = [ + "s" => "!" . 
$row["bang"], + "n" => $row["name"] + ]; + } + + return $results; + }else{ + + // everything is empty + // lets just return a bang list + return [ + [ + "s" => "!w", + "n" => "Wikipedia", + "u" => "https://en.wikipedia.org/wiki/Special:Search?search={%q%}" + ], + [ + "s" => "!4ch", + "n" => "4chan Board", + "u" => "https://find.4chan.org/?q={%q%}" + ], + [ + "s" => "!a", + "n" => "Amazon", + "u" => "https://www.amazon.com/s?k={%q%}" + ], + [ + "s" => "!e", + "n" => "eBay", + "u" => "https://www.ebay.com/sch/items/?_nkw={%q%}" + ], + [ + "s" => "!so", + "n" => "Stack Overflow", + "u" => "http://stackoverflow.com/search?q={%q%}" + ], + [ + "s" => "!gh", + "n" => "GitHub", + "u" => "https://github.com/search?utf8=%E2%9C%93&q={%q%}" + ], + [ + "s" => "!tw", + "n" => "Twitter", + "u" => "https://twitter.com/search?q={%q%}" + ], + [ + "s" => "!r", + "n" => "Reddit", + "u" => "https://www.reddit.com/search?q={%q%}" + ], + ]; + } + } + + // now we know search isnt empty + if(!empty($bang)){ + + // check if the bang exists + $conn = pg_connect("host=localhost dbname=4get user=postgres password=postgres"); + + pg_prepare($conn, "bang_get_single", "SELECT bang,name FROM bangs WHERE bang = $1 LIMIT 1"); + $q = pg_execute($conn, "bang_get_single", [$bang]); + + $row = pg_fetch_array($q, null, PGSQL_ASSOC); + + if(isset($row["bang"])){ + + $bang = "!$bang "; + }else{ + + $bang = ""; + } + } + + try{ + $res = $this->get( + "https://duckduckgo.com/ac/", + [ + "q" => strtolower($search) + ], + ddg::req_xhr + ); + + $res = json_decode($res, true); + + }catch(Exception $e){ + + throw new Exception("Failed to get /ac/"); + } + + $arr = []; + for($i=0; $i $res[$i]["phrase"] + ]; + }else{ + + $arr[] = [ + "s" => $bang . 
$res[$i]["phrase"], + "n" => $row["name"] + ]; + } + } + + return $arr; + } diff --git a/news.php b/news.php new file mode 100644 index 0000000..eb5817f --- /dev/null +++ b/news.php @@ -0,0 +1,96 @@ +getscraperfilters("news"); + +$get = $frontend->parsegetfilters($_GET, $filters); + +$frontend->loadheader( + $get, + $filters, + "news" +); + +$payload = [ + "class" => "", + "right-left" => "", + "right-right" => "", + "left" => "" +]; + +try{ + $results = $scraper->news($get); + +}catch(Exception $error){ + + echo + $frontend->drawerror( + "Shit", + 'This scraper returned an error:' . + '
' . htmlspecialchars($error->getMessage()) . '
' . + 'Things you can try:' . + '
<ul>' . + '<li>Use a different scraper</li>' . + '<li>Remove keywords that could cause errors</li>' . + '<li>Use another 4get instance</li>' . + '</ul>
' . + 'If the error persists, please contact the administrator.' + ); + die(); +} + +/* + Populate links +*/ +if(count($results["news"]) === 0){ + + $payload["left"] = + '
' . + "

Nobody here but us chickens!

" . + 'Have you tried:' . + '
<ul>' . + '<li>Using a different scraper</li>' . + '<li>Using fewer keywords</li>' . + '<li>Defining broader filters (Is NSFW turned off?)</li>' . + '</ul>
' . + '
'; +} + +foreach($results["news"] as $news){ + + $greentext = []; + + if($news["date"] !== null){ + + $greentext[] = date("jS M y @ g:ia", $news["date"]); + } + + if($news["author"] !== null){ + + $greentext[] = htmlspecialchars($news["author"]); + } + + if(count($greentext) !== 0){ + + $greentext = implode(" • ", $greentext); + }else{ + + $greentext = null; + } + + $n = null; + $payload["left"] .= $frontend->drawtextresult($news, $greentext, $n, $get["s"]); +} + +if($results["npt"] !== null){ + + $payload["left"] .= + 'Next page >'; +} + +echo $frontend->load("search.html", $payload); diff --git a/opensearch.xml b/opensearch.xml new file mode 100644 index 0000000..efce4b4 --- /dev/null +++ b/opensearch.xml @@ -0,0 +1,9 @@ + + +4get + +UTF-8 +https://4get.ca/favicon.ico + + + diff --git a/proxy.php b/proxy.php new file mode 100644 index 0000000..edefd77 --- /dev/null +++ b/proxy.php @@ -0,0 +1,130 @@ +do404(); + die(); +} + +try{ + + // original size request, stream file to browser + if( + !isset($_GET["s"]) || + $_GET["s"] == "original" + ){ + + $proxy->stream_linear_image($_GET["i"]); + die(); + } + + // bing request, ask bing to resize and stream to browser + if( + preg_match( + '/bing.net$/', + parse_url($_GET["i"], PHP_URL_HOST) + ) + ){ + + switch($_GET["s"]){ + + case "portrait": $req = "&w=50&h=90&p=0&qlt=99"; break; + case "landscape": $req = "&w=160&h=90&p=0&qlt=99"; break; + case "square": $req = "&w=90&h=90&p=0&qlt=99"; break; + case "thumb": $req = "&w=236&h=180&p=0&qlt=99"; break; + case "cover": $req = "&w=207&h=270&p=0&qlt=99"; break; + } + + $proxy->stream_linear_image($_GET["i"] . 
$req, "https://bing.net"); + die(); + } + + // resize image ourselves + $payload = $proxy->get($_GET["i"], $proxy::req_image, true); + + // get image format & set imagick + $image = null; + $format = $proxy->getimageformat($payload, $image); + + try{ + + if($format !== false){ + $image->setFormat($format); + } + + $image->readImageBlob($payload["body"]); + $image_width = $image->getImageWidth(); + $image_height = $image->getImageHeight(); + + switch($_GET["s"]){ + + case "portrait": + $width = 50; + $height = 90; + break; + + case "landscape": + $width = 160; + $height = 90; + break; + + case "square": + $width = 90; + $height = 90; + break; + + case "thumb": + $width = 236; + $height = 180; + break; + + case "cover": + $width = 207; + $height = 270; + break; + } + + $ratio = $image_width / $image_height; + + if($image_width > $width){ + + $image_width = $width; + $image_height = round($image_width / $ratio); + } + if($image_height > $height){ + + $ratio = $image_width / $image_height; + $image_height = $height; + $image_width = $image_height * $ratio; + } + + $image->setImageBackgroundColor(new ImagickPixel("#504945")); + $image->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN); + + $image->resizeImage($image_width, $image_height, Imagick::FILTER_LANCZOS, 1); + + $image->stripImage(); + $image->setFormat("jpeg"); + $image->setImageCompression(Imagick::COMPRESSION_JPEG2000); + + $proxy->getfilenameheader($payload["headers"], $_GET["i"]); + + header("Content-Type: image/jpeg"); + echo $image->getImageBlob(); + + }catch(ImagickException $error){ + + header("X-Error: Could not convert the image: (" . $error->getMessage() . ")"); + $proxy->do404(); + } + +}catch(Exception $error){ + + header("X-Error: " . 
$error->getMessage()); + $proxy->do404(); + die(); +} diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..3e608cc --- /dev/null +++ b/robots.txt @@ -0,0 +1,28 @@ +# When the robots.txt is sus + +# ⠀⠀⠀⡯⡯⡾⠝⠘⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢊⠘⡮⣣⠪⠢⡑⡌ +# ⠀⠀⠀⠟⠝⠈⠀⠀⠀⠡⠀⠠⢈⠠⢐⢠⢂⢔⣐⢄⡂⢔⠀⡁⢉⠸⢨⢑⠕⡌ +# ⠀⠀⡀⠁⠀⠀⠀⡀⢂⠡⠈⡔⣕⢮⣳⢯⣿⣻⣟⣯⣯⢷⣫⣆⡂⠀⠀⢐⠑⡌ +# ⢀⠠⠐⠈⠀⢀⢂⠢⡂⠕⡁⣝⢮⣳⢽⡽⣾⣻⣿⣯⡯⣟⣞⢾⢜⢆⠀⡀⠀⠪ +# ⣬⠂⠀⠀⢀⢂⢪⠨⢂⠥⣺⡪⣗⢗⣽⢽⡯⣿⣽⣷⢿⡽⡾⡽⣝⢎⠀⠀⠀⢡ +# ⣿⠀⠀⠀⢂⠢⢂⢥⢱⡹⣪⢞⡵⣻⡪⡯⡯⣟⡾⣿⣻⡽⣯⡻⣪⠧⠑⠀⠁⢐ +# ⣿⠀⠀⠀⠢⢑⠠⠑⠕⡝⡎⡗⡝⡎⣞⢽⡹⣕⢯⢻⠹⡹⢚⠝⡷⡽⡨⠀⠀⢔ +# ⣿⡯⠀⢈⠈⢄⠂⠂⠐⠀⠌⠠⢑⠱⡱⡱⡑⢔⠁⠀⡀⠐⠐⠐⡡⡹⣪⠀⠀⢘ +# ⣿⣽⠀⡀⡊⠀⠐⠨⠈⡁⠂⢈⠠⡱⡽⣷⡑⠁⠠⠑⠀⢉⢇⣤⢘⣪⢽⠀⢌⢎ +# ⣿⢾⠀⢌⠌⠀⡁⠢⠂⠐⡀⠀⢀⢳⢽⣽⡺⣨⢄⣑⢉⢃⢭⡲⣕⡭⣹⠠⢐⢗ +# ⣿⡗⠀⠢⠡⡱⡸⣔⢵⢱⢸⠈⠀⡪⣳⣳⢹⢜⡵⣱⢱⡱⣳⡹⣵⣻⢔⢅⢬⡷ +# ⣷⡇⡂⠡⡑⢕⢕⠕⡑⠡⢂⢊⢐⢕⡝⡮⡧⡳⣝⢴⡐⣁⠃⡫⡒⣕⢏⡮⣷⡟ +# ⣷⣻⣅⠑⢌⠢⠁⢐⠠⠑⡐⠐⠌⡪⠮⡫⠪⡪⡪⣺⢸⠰⠡⠠⠐⢱⠨⡪⡪⡰ +# ⣯⢷⣟⣇⡂⡂⡌⡀⠀⠁⡂⠅⠂⠀⡑⡄⢇⠇⢝⡨⡠⡁⢐⠠⢀⢪⡐⡜⡪⡊ +# ⣿⢽⡾⢹⡄⠕⡅⢇⠂⠑⣴⡬⣬⣬⣆⢮⣦⣷⣵⣷⡗⢃⢮⠱⡸⢰⢱⢸⢨⢌ +# ⣯⢯⣟⠸⣳⡅⠜⠔⡌⡐⠈⠻⠟⣿⢿⣿⣿⠿⡻⣃⠢⣱⡳⡱⡩⢢⠣⡃⠢⠁ +# ⡯⣟⣞⡇⡿⣽⡪⡘⡰⠨⢐⢀⠢⢢⢄⢤⣰⠼⡾⢕⢕⡵⣝⠎⢌⢪⠪⡘⡌⠀ +# ⡯⣳⠯⠚⢊⠡⡂⢂⠨⠊⠔⡑⠬⡸⣘⢬⢪⣪⡺⡼⣕⢯⢞⢕⢝⠎⢻⢼⣀⠀ +# ⠁⡂⠔⡁⡢⠣⢀⠢⠀⠅⠱⡐⡱⡘⡔⡕⡕⣲⡹⣎⡮⡏⡑⢜⢼⡱⢩⣗⣯⣟ +# ⢀⢂⢑⠀⡂⡃⠅⠊⢄⢑⠠⠑⢕⢕⢝⢮⢺⢕⢟⢮⢊⢢⢱⢄⠃⣇⣞⢞⣞⢾ +# ⢀⠢⡑⡀⢂⢊⠠⠁⡂⡐⠀⠅⡈⠪⠪⠪⠣⠫⠑⡁⢔⠕⣜⣜⢦⡰⡎⡯⡾⡽ + +User-agent: * +Disallow: +host: 4get.ca +sitemap: https://4get.ca/sitemap.xml diff --git a/scraper/brave.php b/scraper/brave.php new file mode 100644 index 0000000..4d48c33 --- /dev/null +++ b/scraper/brave.php @@ -0,0 +1,2287 @@ +bypasscaptcha($html, "yes", "ca");*/ + +class brave{ + + public function __construct(){ + + include "lib/fuckhtml.php"; + $this->fuckhtml = new fuckhtml(); + + include "lib/nextpage.php"; + $this->nextpage = new nextpage("brave"); + } + + public function getfilters($page){ + + switch($page){ + + case "web": + return [ + "country" => [ + "display" => "Country", + "option" => [ + "all" => "All Regions", + "ar" => "Argentina", + "au" => "Australia", + "at" => "Austria", + "be" => "Belgium", + "br" => "Brazil", + "ca" => "Canada", + "cl" => "Chile", + "cn" => "China", + "dk" => "Denmark", + "fi" => "Finland", + "fr" => "France", + "de" => "Germany", + "hk" => "Hong Kong", + "in" => "India", + "id" => "Indonesia", + "it" => "Italy", + "jp" => "Japan", + "kr" => "Korea", + "my" => "Malaysia", + "mx" => "Mexico", + "nl" => 
"Netherlands", + "nz" => "New Zealand", + "no" => "Norway", + "pl" => "Poland", + "pt" => "Portugal", + "ph" => "Philippines", + "ru" => "Russia", + "sa" => "Saudi Arabia", + "za" => "South Africa", + "es" => "Spain", + "se" => "Sweden", + "ch" => "Switzerland", + "tw" => "Taiwan", + "tr" => "Turkey", + "gb" => "United Kingdom", + "us" => "United States" + ] + ], + "nsfw" => [ + "display" => "NSFW", + "option" => [ + "yes" => "Yes", + "maybe" => "Maybe", + "no" => "No" + ] + ], + "newer" => [ + "display" => "Newer than", + "option" => "_DATE" + ], + "older" => [ + "display" => "Older than", + "option" => "_DATE" + ] + ]; + break; + + case "news": + return [ + "country" => [ + "display" => "Country", + "option" => [ + "all" => "All regions", + "ar" => "Argentina", + "au" => "Australia", + "at" => "Austria", + "be" => "Belgium", + "br" => "Brazil", + "ca" => "Canada", + "cl" => "Chile", + "cn" => "China", + "dk" => "Denmark", + "fi" => "Finland", + "fr" => "France", + "de" => "Germany", + "hk" => "Hong Kong", + "in" => "India", + "id" => "Indonesia", + "it" => "Italy", + "jp" => "Japan", + "kr" => "Korea", + "my" => "Malaysia", + "mx" => "Mexico", + "nl" => "Netherlands", + "nz" => "New Zealand", + "no" => "Norway", + "pl" => "Poland", + "pt" => "Portugal", + "ph" => "Philippines", + "ru" => "Russia", + "sa" => "Saudi Arabia", + "za" => "South Africa", + "es" => "Spain", + "se" => "Sweden", + "ch" => "Switzerland", + "tw" => "Taiwan", + "tr" => "Turkey", + "gb" => "United Kingdom", + "us" => "United States" + ] + ], + "nsfw" => [ + "display" => "NSFW", + "option" => [ + "yes" => "Yes", + "maybe" => "Maybe", + "no" => "No" + ] + ] + ]; + break; + } + } + + private function get($url, $get = [], $nsfw, $country/*, $is_post = false, $additional_cookies = null*/){ + + switch($nsfw){ + + case "yes": $nsfw = "off"; break; + case "maybe": $nsfw = "moderate"; break; + case "no": $nsfw = "strict"; break; + } + + //$cookie = "safesearch={$nsfw}; country={$country}; 
useLocation=0"; + /* + if($additional_cookies !== null){ + + $cookie = $additional_cookies . "; " . $cookie; + }*/ + + $headers = [ + "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/110.0", + "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", + "Accept-Language: en-US,en;q=0.5", + "Accept-Encoding: gzip", + "Cookie: safesearch={$nsfw}; country={$country}; useLocation=0; summarizer=0", + "DNT: 1", + "Connection: keep-alive", + "Upgrade-Insecure-Requests: 1", + "Sec-Fetch-Dest: document", + "Sec-Fetch-Mode: navigate", + "Sec-Fetch-Site: none", + "Sec-Fetch-User: ?1"//, + //"Content-Type: application/json" + ]; + + if($country == "any"){ + + $country = "all"; + } + + $curlproc = curl_init(); + + /*if($is_post){ + + curl_setopt($curlproc, CURLOPT_POST, true); + curl_setopt( + $curlproc, + CURLOPT_POSTFIELDS, + json_encode($get) + ); + + }else{ + */ + if($get !== []){ + $get = http_build_query($get); + $url .= "?" . 
$get; + } + //} + + curl_setopt($curlproc, CURLOPT_URL, $url); + + curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding + curl_setopt($curlproc, CURLOPT_HTTPHEADER, $headers); + + curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2); + curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true); + curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30); + curl_setopt($curlproc, CURLOPT_TIMEOUT, 30); + + $data = curl_exec($curlproc); + + if(curl_errno($curlproc)){ + + throw new Exception(curl_error($curlproc)); + } + + curl_close($curlproc); + return $data; + } + + public function web($get){ + + if($get["npt"]){ + + // get next page data + $q = json_decode($this->nextpage->get($get["npt"], "web"), true); + + $search = $q["q"]; + $q["spellcheck"] = 0; + + $nsfw = $q["nsfw"]; + unset($q["nsfw"]); + + $country = $q["country"]; + unset($q["country"]); + + }else{ + + // get _GET data instead + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + if(strlen($search) > 2048){ + + throw new Exception("Search query is too long!"); + } + + $nsfw = $get["nsfw"]; + $country = $get["country"]; + $older = $get["older"]; + $newer = $get["newer"]; + + $q = [ + "q" => $search + ]; + + /* + Pass older/newer filters to brave + */ + if($newer !== false){ + + $newer = date("Y-m-d", $newer); + + if($older === false){ + + $older = date("Y-m-d", time()); + } + } + + if( + is_string($older) === false && + $older !== false + ){ + + $older = date("Y-m-d", $older); + + if($newer === false){ + + $newer = "1970-01-02"; + } + } + + if($older !== false){ + + $q["tf"] = "{$newer}to{$older}"; + } + } + /* + $handle = fopen("scraper/brave.html", "r"); + $html = fread($handle, filesize("scraper/brave.html")); + fclose($handle); + */ + try{ + $html = + $this->get( + "https://search.brave.com/search", + $q, + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not 
fetch search page"); + } + + $out = [ + "status" => "ok", + "spelling" => [ + "type" => "no_correction", + "using" => null, + "correction" => null + ], + "npt" => null, + "answer" => [], + "web" => [], + "image" => [], + "video" => [], + "news" => [], + "related" => [] + ]; + + // load html + $this->fuckhtml->load($html); + + /* + Get next page "token" + */ + $nextpage = + $this->fuckhtml + ->getElementsByClassName( + "btn ml-15", + "a" + ); + + if(count($nextpage) !== 0){ + + preg_match( + '/offset=([0-9]+)/', + $this->fuckhtml->getTextContent($nextpage[0]["attributes"]["href"]), + $nextpage + ); + + $q["offset"] = (int)$nextpage[1]; + $q["nsfw"] = $nsfw; + $q["country"] = $country; + + $out["npt"] = + $this->nextpage->store( + json_encode($q), + "web" + ); + } + + /* + Get discussions (and append them to web results) + */ + + // they're loaded using javascript!! + $discussion = + $this->fuckhtml + ->getElementById( + "js-discussions", + "script" + ); + + if( + $discussion && + isset($discussion["attributes"]["data"]) + ){ + + $discussion = + json_decode( + $this->fuckhtml + ->getTextContent( + $discussion["attributes"]["data"] + ), + true + ); + + foreach($discussion["results"] as $result){ + + $data = [ + "title" => $this->titledots($result["title"]), + "description" => null, + "url" => $result["url"], + "date" => null, + "type" => "web", + "thumb" => [ + "url" => null, + "ratio" => null + ], + "sublink" => [], + "table" => [] + ]; + + // description + $data["description"] = + $this->limitstrlen( + $this->limitwhitespace( + $this->titledots( + $this->fuckhtml->getTextContent( + $result["description"] + ) + ) + ) + ); + + if($result["age"] != ""){ + $data["date"] = strtotime($result["age"]); + } + + // populate table + + if($result["data"]["num_answers"] != ""){ + $data["table"]["Replies"] = (int)$result["data"]["num_answers"]; + } + + if($result["data"]["score"] != ""){ + + $score = explode("|", $result["data"]["score"]); + + if(count($score) === 2){ + + $score 
= ((int)$score[1]) . " (" . trim($score[0]) . ")"; + }else{ + + $score = (int)$score[0]; + } + + $data["table"]["Votes"] = $score; + } + + if($result["thumbnail"] != ""){ + + $data["thumb"]["url"] = $result["thumbnail"]; + $data["thumb"]["ratio"] = "16:9"; + } + + $out["web"][] = $data; + } + } + + /* + Get related searches + */ + $faq = + $this->fuckhtml + ->getElementById("js-faq", "script"); + + if( + $faq && + isset($faq["attributes"]["data"]) + ){ + + $faq = + json_decode( + $this->fuckhtml + ->getTextContent( + $faq["attributes"]["data"] + ), + true + ); + + foreach($faq["items"] as $related){ + + $out["related"][] = $related["question"]; + } + } + + /* + Get spelling autocorrect + */ + $altered = + $this->fuckhtml + ->getElementById("altered-query", "div"); + + if($altered){ + + $this->fuckhtml->load($altered); + + $altered = + $this->fuckhtml + ->getElementsByTagName("a"); + + if(count($altered) === 2){ + + $out["spelling"] = [ + "type" => "including", + "using" => + $this->fuckhtml + ->getTextContent($altered[0]), + "correction" => + $this->fuckhtml + ->getTextContent($altered[1]) + ]; + } + + $this->fuckhtml->load($html); + } + + /* + Get web results + */ + $resulthtml = + $this->fuckhtml + ->getElementById( + "results", + "div" + ); + + $this->fuckhtml->load($resulthtml); + $items = 0; + foreach( + $this->fuckhtml + ->getElementsByClassName("snippet fdb") + as $result + ){ + + $data = [ + "title" => null, + "description" => null, + "url" => null, + "date" => null, + "type" => "web", + "thumb" => [ + "url" => null, + "ratio" => null + ], + "sublink" => [], + "table" => [] + ]; + + if( + isset($result["attributes"]["data-type"]) && + $result["attributes"]["data-type"] == "ad" + ){ + + // is an ad, skip + continue; + } + + $this->fuckhtml->load($result); + + /* + Get title + */ + $title = + $this->fuckhtml + ->getElementsByClassName( + "snippet-title", + "span" + ); + + if(count($title) === 0){ + + // encountered AI summarizer + // or misspelling indicator 
@TODO + continue; + } + + if(isset($title[0]["attributes"]["title"])){ + + $data["title"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $title[0]["attributes"]["title"] + ) + ); + }else{ + + $data["title"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $title[0] + ) + ); + } + + /* + Get description + */ + $description = + $this->fuckhtml + ->getElementsByClassName( + "snippet-description", + "p" + ); + + if(count($description) !== 0){ + $data["description"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $description[0] + ) + ); + + // also check for thumbnail in here + $img = + $this->fuckhtml + ->getElementsByClassName( + "thumb", + "img" + ); + + if(count($img) !== 0){ + + $data["thumb"] = [ + "url" => $this->unshiturl($img[0]["attributes"]["src"]), + "ratio" => "16:9" + ]; + }else{ + + // might be a video thumbnail wrapper? + $wrapper = + $this->fuckhtml + ->getElementsByClassName( + "video-thumb", + "a" + ); + + if(count($wrapper) !== 0){ + + // we found a video + $this->fuckhtml->load($wrapper[0]); + + $img = + $this->fuckhtml + ->getElementsByTagName("img"); + + $data["thumb"] = [ + "url" => $this->unshiturl($img[0]["attributes"]["src"]), + "ratio" => "16:9" + ]; + + // get the video length, if its there + $duration = + $this->fuckhtml + ->getElementsByClassName( + "duration", + "div" + ); + + if(count($duration) !== 0){ + + $data["table"]["Duration"] = $duration[0]["innerHTML"]; + } + + // reset html load + $this->fuckhtml->load($result); + } + } + + }else{ + + // is a steam/shop listing + $description_alt = + $this->fuckhtml + ->getElementsByClassName( + "text-sm", + "div" + ); + + if(count($description_alt) !== 0){ + + switch($description_alt[0]["attributes"]["class"]){ + + case "text-sm text-gray": + case "description text-sm": + + $data["description"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $description_alt[0] + ) + ); + break; + } + + // get table sublink + $sublink = + $this->fuckhtml 
+ ->getElementsByClassName( + "r-attr text-sm", + "div" + ); + + if(count($sublink) !== 0){ + + $this->tablesublink($sublink, $data); + } + + // check for thumb element + $data["thumb"] = $this->getimagelinkfromstyle("thumb"); + }else{ + + // ok... finally... + // maybe its the instant answer thingy + $answer = + $this->fuckhtml + ->getElementsByClassName("answer"); + + if(count($answer) !== 0){ + + $data["description"] = + $this->titledots( + $this->fuckhtml + ->getTextContent($answer[0]) + ); + } + } + } + + // finally, fix brave's date format sucking balls + $data["description"] = explode(" - ", $data["description"], 2); + + if(count($data["description"]) === 0){ + + // nothing to do + $data["description"] = $data["description"][0]; + }else{ + + // attempt to parse + $time = strtotime($data["description"][0]); + + if($time !== false){ + + // got response + $data["date"] = $time; + + array_shift($data["description"]); + } + + // merge back + $data["description"] = + implode(" - ", $data["description"]); + } + + /* + Check content type + */ + $content_type = + $this->fuckhtml + ->getElementsByClassName( + "content-type", + "span" + ); + + if(count($content_type) !== 0){ + + $data["type"] = + strtolower($this->fuckhtml->getTextContent($content_type[0])); + } + + /* + Check subtext table thingy + */ + $table_items = + array_merge( + $this->fuckhtml + ->getElementsByClassName( + "item-attributes", + "div" + ), + $this->fuckhtml + ->getElementsByClassName( + "r", + "div" + ) + ); + + /* + DIV: item-attributes + */ + if(count($table_items) !== 0){ + + foreach($table_items as $table){ + + $this->fuckhtml->load($table); + + $span = + $this->fuckhtml + ->getElementsByClassName( + "text-sm", + "*" + ); + + foreach($span as $item){ + + $item = + explode( + ":", + $this->fuckhtml->getTextContent(preg_replace('/\n/', " ", $item["innerHTML"])), + 2 + ); + + if(count($item) === 2){ + + $data["table"][trim($item[0])] = trim($this->limitwhitespace($item[1])); + } + } + } + + 
$this->fuckhtml->load($result); + } + + // get video sublinks + $table_items = + $this->fuckhtml + ->getElementsByClassName( + "snippet-description published-time", + "p" + ); + + if(count($table_items) !== 0){ + + $table_items = + explode( + '', + $table_items[0]["innerHTML"], + 2 + ); + if(count($table_items) === 2){ + + $item2 = []; + + $item2[] = explode(":", $this->fuckhtml->getTextContent($table_items[0])); + + if(trim($table_items[1]) != ""){ + $item2[] = explode(":", $this->fuckhtml->getTextContent($table_items[1])); + } + + foreach($item2 as $it){ + + $data["table"][trim($it[0])] = trim($it[1]); + } + } + } + + /* + Get URL + */ + $data["url"] = + $this->fuckhtml->getTextContent( + $this->fuckhtml + ->getElementsByTagName("a") + [0] + ["attributes"] + ["href"] + ); + + /* + Get sublinks + */ + $sublinks_elems = + $this->fuckhtml + ->getElementsByClassName( + "snippet", + "div" + ); + + $sublinks = []; + + foreach($sublinks_elems as $sublink){ + + $this->fuckhtml->load($sublink); + + $a = + $this->fuckhtml + ->getElementsByTagName("a")[0]; + + $title = + $this->fuckhtml + ->getTextContent($a); + + $url = $a["attributes"]["href"]; + + $description = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByTagName("p")[0] + ) + ); + + $sublinks[] = [ + "title" => $title, + "date" => null, + "description" => $description, + "url" => $url + ]; + } + + /* + Get smaller sublinks + */ + $sublinks_elems = + $this->fuckhtml + ->getElementsByClassName( + "deep-link", + "a" + ); + + foreach($sublinks_elems as $sublink){ + + $sublinks[] = [ + "title" => $this->fuckhtml->getTextContent($sublink), + "date" => null, + "description" => null, + "url" => $sublink["attributes"]["href"] + ]; + } + + // append sublinks to $data !! 
+ $data["sublink"] = $sublinks; + + // append first result to start of $out["web"] + // other results are after + if($items === 0){ + + $out["web"] = [$data, ...$out["web"]]; + }else{ + + $out["web"][] = $data; + } + $items++; + } + + /* + Get news + */ + $this->fuckhtml->load($resulthtml); + $news_carousel = $this->fuckhtml->getElementById("news-carousel"); + + $this->fuckhtml->load($news_carousel); + + if($news_carousel){ + + $a = + $this->fuckhtml + ->getElementsByClassName( + "card fdb", + "a" + ); + + foreach($a as $news){ + + $this->fuckhtml->load($news); + + $out["news"][] = [ + "title" => + $this->titledots( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "title", + "div" + )[0] + ) + ), + "description" => null, + "date" => + strtotime( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "card-footer__timestamp", + "span" + )[0] + ) + ), + "thumb" => $this->getimagelinkfromstyle("img-bg"), + "url" => $this->fuckhtml->getTextContent($news["attributes"]["href"]) + ]; + } + } + + + + /* + Get videos + */ + $this->fuckhtml->load($resulthtml); + $news_carousel = $this->fuckhtml->getElementById("video-carousel"); + + $this->fuckhtml->load($news_carousel); + + if($news_carousel){ + + $a = + $this->fuckhtml + ->getElementsByClassName( + "card fdb", + "a" + ); + + foreach($a as $video){ + + $this->fuckhtml->load($video); + + $date = null; + + $date_o = + $this->fuckhtml + ->getElementsByClassName( + "text-gray text-xs", + "span" + ); + + if(count($date_o) !== 0){ + + $date = + strtotime( + $this->fuckhtml + ->getTextContent( + $date_o[0] + ) + ); + } + + $out["video"][] = [ + "title" => + $this->titledots( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "title", + "div" + )[0] + ) + ), + "description" => null, + "date" => $date, + "duration" => null, + "views" => null, + "thumb" => $this->getimagelinkfromstyle("img-bg"), + "url" => 
$this->fuckhtml->getTextContent($video["attributes"]["href"]) + ]; + } + } + + + /* + Get DEFINITION snippet + */ + $this->fuckhtml->load($html); + $infobox = $this->fuckhtml->getElementById("rh-definitions", "div"); + + if($infobox !== false){ + + $answer = [ + "title" => null, + "description" => [], + "url" => null, + "thumb" => null, + "table" => [], + "sublink" => [] + ]; + + $this->fuckhtml->load($infobox); + + $answer["title"] = + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "header", + "h5" + )[0] + ); + + $sections = + $this->fuckhtml + ->getElementsByTagName("section"); + + $i = -1; + foreach($sections as $section){ + + $this->fuckhtml->load($section); + $items = + $this->fuckhtml + ->getElementsByTagName("*"); + + $li = 1; + $pronounce = false; + foreach($items as $item){ + + switch($item["tagName"]){ + + case "h6": + + if( + isset($item["attributes"]["class"]) && + $item["attributes"]["class"] == "h6 pronunciation" + ){ + + if($pronounce){ + + break; + } + + $answer["description"][] = [ + "type" => "quote", + "value" => + $this->fuckhtml + ->getTextContent( + $item + ) + ]; + + $answer["description"][] = + [ + "type" => "audio", + "url" => "https://search.brave.com/api/rhfetch?rhtype=definitions&word={$answer["title"]}&source=ahd-5" + ]; + + $pronounce = true; + $i = $i + 2; + break; + } + + $answer["description"][] = [ + "type" => "title", + "value" => + $this->fuckhtml + ->getTextContent( + $item + ) + ]; + $i++; + break; + + case "li": + + if( + $i !== -1 && + $answer["description"][$i]["type"] == "text" + ){ + + $answer["description"][$i]["value"] .= + "\n" . $li . ". " . + $this->fuckhtml + ->getTextContent( + $item + ); + + }else{ + $answer["description"][] = [ + "type" => "text", + "value" => + $li . ". " . 
+ $this->fuckhtml + ->getTextContent( + $item + ) + ]; + $i++; + } + $li++; + break; + + case "a": + $answer["url"] = + $this->fuckhtml + ->getTextContent( + $item["attributes"]["href"] + ); + break; + } + } + } + + $out["answer"][] = $answer; + } + + + /* + Get instant answer + */ + $this->fuckhtml->load($html); + $infobox = $this->fuckhtml->getElementById("infobox", "div"); + + if($infobox !== false){ + + $answer = [ + "title" => null, + "description" => [], + "url" => null, + "thumb" => null, + "table" => [], + "sublink" => [] + ]; + + $this->fuckhtml->load($infobox); + $div = $this->fuckhtml->getElementsByTagName("div"); + + /* + Get title + url + */ + $title = + $this->fuckhtml + ->getElementsByClassName("infobox-title", "a"); + + if(count($title) !== 0){ + + $answer["title"] = + $this->fuckhtml + ->getTextContent( + $title[0] + ); + + $answer["url"] = + $this->fuckhtml + ->getTextContent( + $title[0]["attributes"]["href"] + ); + } + + /* + Get thumbnail + */ + $thumb = $this->getimagelinkfromstyle("thumb"); + + if($thumb["url"] !== null){ + + $answer["thumb"] = $thumb["url"]; + } + + /* + Get table + */ + $title = + $this->fuckhtml + ->getElementsByClassName( + "infobox-attr-header", + "div" + ); + + $rowhtml = $infobox; + + if(count($title) >= 2){ + + $rowhtml = + explode( + $title[1]["outerHTML"], + $infobox["innerHTML"], + 2 + )[0]; + } + + $this->fuckhtml->load($rowhtml); + + $rows = + $this->fuckhtml + ->getElementsByClassName("infobox-attr", "div"); + + foreach($rows as $row){ + + if(!isset($row["innerHTML"])){ + + continue; + } + + $this->fuckhtml->load($row); + $span = + $this->fuckhtml + ->getElementsByTagName("span"); + + if(count($span) === 2){ + + $answer["table"][ + $this->fuckhtml->getTextContent($span[0]) + ] = str_replace("\n", ", ", $this->fuckhtml->getTextContent($span[1], true)); + } + } + + $this->fuckhtml->load($infobox); + + /* + Parse stackoverflow answers + */ + $code = + $this->fuckhtml + ->getElementById("codebox-answer", $div); + + 
if($code){ + + // this might be standalone text with no paragraphs, check for that + $author = + $this->fuckhtml + ->getElementById("author"); + + $desc_tmp = + str_replace( + $author["outerHTML"], + "", + $code["innerHTML"] + ); + + $this->fuckhtml->load($desc_tmp); + $code = + $this->fuckhtml + ->getElementsByTagName("*"); + + if(count($code) === 0){ + + $answer["description"] = + [ + [ + "type" => "text", + "value" => + $this->fuckhtml + ->getTextContent( + $desc_tmp + ) + ], + [ + "type" => "quote", + "value" => + $this->fuckhtml + ->getTextContent( + $author + ) + ] + ]; + }else{ + + $text = []; + $i = 0; + + foreach($code as $snippet){ + + switch($snippet["tagName"]){ + + case "p": + $this->fuckhtml->load($snippet["innerHTML"]); + + $codetags = + $this->fuckhtml + ->getElementsByTagName("*"); + + $tmphtml = $snippet["innerHTML"]; + + foreach($codetags as $tag){ + + if(!isset($tag["outerHTML"])){ + + continue; + } + + $tmphtml = + explode( + $tag["outerHTML"], + $tmphtml, + 2 + ); + + $value = $this->fuckhtml->getTextContent($tmphtml[0], false, false); + $this->appendtext($value, $text, $i); + + $type = null; + switch($tag["tagName"]){ + + case "code": $type = "inline_code"; break; + case "em": $type = "italic"; break; + case "blockquote": $type = "quote"; break; + default: $type = "text"; + } + + if($type !== null){ + $value = $this->fuckhtml->getTextContent($tag, false, true); + + if(trim($value) != ""){ + + if( + $i !== 0 && + $type == "title" + ){ + + $text[$i - 1]["value"] = rtrim($text[$i - 1]["value"]); + } + + $text[] = [ + "type" => $type, + "value" => $value + ]; + $i++; + } + } + + if(count($tmphtml) === 2){ + + $tmphtml = $tmphtml[1]; + }else{ + + break; + } + } + + if(is_array($tmphtml)){ + + $tmphtml = $tmphtml[0]; + } + + if(strlen($tmphtml) !== 0){ + + $value = $this->fuckhtml->getTextContent($tmphtml, false, false); + $this->appendtext($value, $text, $i); + } + break; + + case "pre": + + switch($text[$i - 1]["type"]){ + + case "text": + case 
"italic": + $text[$i - 1]["value"] = rtrim($text[$i - 1]["value"]); + break; + } + + $text[] = + [ + "type" => "code", + "value" => + rtrim( + $this->fuckhtml + ->getTextContent( + $snippet, + true, + false + ) + ) + ]; + $i++; + + break; + + case "ol": + $o = 0; + + $this->fuckhtml->load($snippet); + $li = + $this->fuckhtml + ->getElementsByTagName("li"); + + foreach($li as $elem){ + $o++; + + $this->appendtext( + $o . ". " . + $this->fuckhtml + ->getTextContent( + $elem + ), + $text, + $i + ); + } + break; + } + } + + if( + $i !== 0 && + $text[$i - 1]["type"] == "text" + ){ + + $text[$i - 1]["value"] = rtrim($text[$i - 1]["value"]); + } + + if($author){ + + $text[] = [ + "type" => "quote", + "value" => $this->fuckhtml->getTextContent($author) + ]; + } + + $answer["description"] = $text; + } + }else{ + + /* + Get normal description + */ + $description = + $this->fuckhtml + ->getElementsByClassName( + "mb-6", + "div" + ); + + if(count($description) !== 0){ + + $description = + [ + [ + "type" => "text", + "value" => + $this->titledots( + preg_replace( + '/ Wikipedia$/', + "", + $this->fuckhtml + ->getTextContent( + $description[0] + ) + ) + ) + ] + ]; + + $ratings = + $this->fuckhtml + ->getElementById("ratings"); + + if($ratings){ + + $this->fuckhtml->load($ratings); + + $ratings = + $this->fuckhtml + ->getElementsByClassName( + "flex-hcenter mb-10", + "div" + ); + + $description[] = [ + "type" => "title", + "value" => "Ratings" + ]; + + foreach($ratings as $rating){ + + $this->fuckhtml->load($rating); + + $num = + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "r-num", + "div" + )[0] + ); + + $href = + $this->fuckhtml + ->getElementsByClassName( + "mr-10", + "a" + )[0]; + + $votes = + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "text-sm", + "span" + )[0] + ); + + $c = count($description) - 1; + + if( + $c !== -1 && + $description[$c]["type"] == "text" + ){ + + $description[$c]["value"] .= 
$num . " "; + }else{ + + $description[] = [ + "type" => "text", + "value" => $num . " " + ]; + } + + $description[] = [ + "type" => "link", + "value" => $this->fuckhtml->getTextContent($href), + "url" => $this->fuckhtml->getTextContent($href["attributes"]["href"]) + ]; + + $description[] = [ + "type" => "text", + "value" => " (" . $votes . ")\n" + ]; + } + } + + $answer["description"] = $description; + } + } + + /* + Get sublinks + */ + $this->fuckhtml->load($infobox); + + $profiles = + $this->fuckhtml + ->getElementById("profiles"); + + if($profiles){ + $profiles = + $this->fuckhtml + ->getElementsByClassName( + "chip", + "a" + ); + + foreach($profiles as $profile){ + + $name = $this->fuckhtml->getTextContent($profile["attributes"]["title"]); + + if(strtolower($name) == "steampowered"){ + + $name = "Steam"; + } + + $answer["sublink"][$name] = + $this->fuckhtml->getTextContent($profile["attributes"]["href"]); + } + } + + $actors = + $this->fuckhtml + ->getElementById("panel-movie-cast"); + + if($actors){ + + $this->fuckhtml->load($actors); + + $actors = + $this->fuckhtml + ->getElementsByClassName("card"); + + $answer["description"][] = [ + "type" => "title", + "value" => "Cast" + ]; + + foreach($actors as $actor){ + + $this->fuckhtml->load($actor); + + $answer["description"][] = [ + "type" => "text", + "value" => + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName("card-body") + [0] + ) + ]; + + $answer["description"][] = [ + "type" => "image", + "url" => $this->getimagelinkfromstyle("person-thumb")["url"] + ]; + } + } + + $out["answer"][] = $answer; + } + + /* + Get actor standalone thingy + */ + $this->fuckhtml->load($resulthtml); + $actors = + $this->fuckhtml + ->getElementById("predicate-entity"); + + if($actors){ + + $this->fuckhtml->load($actors); + + $cards = + $this->fuckhtml + ->getElementsByClassName("card"); + + $url = + $this->fuckhtml + ->getElementsByClassName( + "disclaimer", + "div" + )[0]; + + 
$this->fuckhtml->load($url); + + $url = + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByTagName("a") + [0] + ["attributes"] + ["href"] + ); + + $this->fuckhtml->load($actors); + + $answer = [ + "title" => + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "entity", + "span" + )[0] + ) . " (Cast)", + "description" => [], + "url" => $url, + "sublink" => [], + "thumb" => null, + "table" => [] + ]; + + foreach($cards as $card){ + + $this->fuckhtml->load($card); + + $answer["description"][] = [ + "type" => "title", + "value" => + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "title" + )[0] + ) + ]; + + $answer["description"][] = [ + "type" => "text", + "value" => + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "text-xs desc" + )[0] + ) + ]; + + $answer["description"][] = [ + "type" => "image", + "url" => $this->getimagelinkfromstyle("img-bg")["url"] + ]; + } + + $out["answer"][] = $answer; + } + + return $out; + } + + public function news($get){ + + $search = $get["s"]; + if(strlen($search) === 0){ + + throw new Exception("Search term is empty!"); + } + + $nsfw = $get["nsfw"]; + $country = $get["country"]; + + if(strlen($search) > 2048){ + + throw new Exception("Search query is too long!"); + } + /* + $handle = fopen("scraper/brave-news.html", "r"); + $html = fread($handle, filesize("scraper/brave-news.html")); + fclose($handle);*/ + try{ + $html = + $this->get( + "https://search.brave.com/news", + [ + "q" => $search + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch search page"); + } + + $out = [ + "status" => "ok", + "npt" => null, + "news" => [] + ]; + + // load html + $this->fuckhtml->load($html); + + $news = + $this->fuckhtml + ->getElementsByClassName( + "snippet inline gap-standard", + "div" + ); + + foreach($news as $article){ + + $data = [ + "title" => null, + "author" => null, 
+ "description" => null, + "date" => null, + "thumb" => + [ + "url" => null, + "ratio" => null + ], + "url" => null + ]; + + $this->fuckhtml->load($article); + $elems = + $this->fuckhtml + ->getElementsByTagName("*"); + + // get title + $data["title"] = + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "snippet-title", + $elems + ) + [0] + ["innerHTML"] + ); + + // get description + $data["description"] = + $this->titledots( + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "snippet-description", + $elems + ) + [0] + ["innerHTML"] + ) + ); + + // get date + $date = + explode( + "•", + $this->fuckhtml + ->getTextContent( + $this->fuckhtml + ->getElementsByClassName( + "snippet-url", + $elems + )[0] + ) + ); + + if( + count($date) !== 1 && + trim($date[1]) != "" + ){ + + $data["date"] = + strtotime( + $date[1] + ); + } + + // get URL + $data["url"] = + $this->fuckhtml->getTextContent( + $this->unshiturl( + $this->fuckhtml + ->getElementsByClassName( + "result-header", + $elems + ) + [0] + ["attributes"] + ["href"] + ) + ); + + // get thumbnail + $thumb = + $this->fuckhtml + ->getElementsByTagName( + "img" + ); + + if( + count($thumb) === 2 && + trim( + $thumb[1] + ["attributes"] + ["src"] + ) != "" + ){ + + $data["thumb"] = [ + "url" => + $this->fuckhtml->getTextContent( + $this->unshiturl( + $thumb[1] + ["attributes"] + ["src"] + ) + ), + "ratio" => "16:9" + ]; + } + + $out["news"][] = $data; + } + + return $out; + } + + /* + public function bypasscaptcha($html, $nsfw, $country){ + + // @TODO figure out why I still cant go trough + // the captcha wall even after breaking it + + try{ + $html = + $this->get( + "https://search.brave.com/goggles", + [ + "q" => "site:dailymotion.com my bloody valentine" + ], + $nsfw, + $country + ); + + }catch(Exception $error){ + + throw new Exception("Could not fetch html"); + } + + // Bypass brave search captcha + // this captcha only appears on the goggles page + 
preg_match( + '/this\.img\.src = "(.*)"/', + $html, + $image + ); + + $image = + base64_decode( + explode( + "data:image/png;base64,", + $image[1] + )[1] + ); + + $im = new Imagick(); + $im->readImageBlob($image); + + $im->blurImage(20, 20); + $im->posterizeImage(2, imagick::IMGTYPE_COLORSEPARATION); + + // if we encounter a white line that's longer than 45px + // we found the circle position + $iterator = $im->getPixelRegionIterator(0, 77, 310, 1); + + $found = null; + foreach( + $iterator as $row + ){ + + $whitecount = 0; + $count = 0; + + foreach($row as $pixel){ + + if($pixel->getColor()["r"] === 255){ + + $whitecount++; + $pixel->setColor("rgba(255,0,0,0)"); + + if($whitecount === 45){ + + $found = $count - 45; + break 2; + } + }else{ + + $whitecount = 0; + } + + $count++; + $iterator->syncIterator(); + } + } + + //header("Content-Type: image/png"); + //echo $im; + //die(); + + // check for failure before applying the offset: + // null + 10 evaluates to 10 and would hide a failed detection + if($found === null){ + + throw new Exception("Could not bypass captcha"); + } + + $found = $found + 10; + + preg_match( + '/data="{"captcha_id":"([0-9A-z-]+)"}"/', + $html, + $key + ); + + $key = $key[1]; + // we bypassed captcha, send POST data + $order = + $this->get( + "https://search.brave.com/api/captcha?brave=0&captcha_id={$key}", + [ + "solution" => (string)$found + ], + $nsfw, + $country, + true + ); + + $order = json_decode($order, true)["orderId"]; + + $orderpayload = + $this->get( + "https://search.brave.com/api/rewards/v1/orders/{$order}", + [], + $nsfw, + $country + ); + + $orderpayload = json_decode($orderpayload, true); + + $creds = + $this->get( + "https://search.brave.com/api/rewards/v1/orders/{$order}/credentials", + [ + "itemId" => $orderpayload["items"][0]["id"], + "blindedCreds" => [ + "fuYAVcB/m7BU66vf3wkNGxJCSaRhshB9o+8km3F1h2c=", + "uswvcWJuPK/1qFlVdzBP3eQd0+V1EQgfAtnEoMIK+Uk=", + "fJWKGLBxl3Gyn4n9FjTLq1PjupfABT7Ni8MeB+iGzUs=", + "Aq9enJ/VZP9GxQIza3n65ZK7xQhY4VwDxv53BCb/Txg=", + "FMJA9eSLHq71K+Pcwgm4gIQOmdR/6KMy5cMgXhpd5Ro=", + 
"2NVhIAbvI317SP9/xXbVe/U57eWgvHyqVbHL/5+Gdmw=", + "6mpjsjSCmYEzK2xlbL8DI2P4LuhWUOxjTLvsTAL9l24=", + "kAn4wuHvIlKWhfuFfPTSfD4tZ5le9t7/61YbdEc/L3k=", + "BjjUyG16aTfd1c0h4oBzgQQOekrH1f+a5CmcXqMPTR4=", + "SBNgpCt4/V44yaQTfh+D027Yv1GJFHkjUEpPw6rAwRI=", + "XDENAtdQ7PyYx+Qx1wQGQtDWgg8WpIMgWGmd4RDOVWE=", + "tF7rB4sqamsiUk3K7fojdQSI0Q6iip72yKyhnvg/bC0=", + "VsAqflirAd/u4VsLdfRS2UvnH24ZNkFh6YN3DctLjzQ=", + "MntLbXkoI0LdcisCbNazmooiHXJyX91L1KERDAu1JRU=", + "TH6Zs8JBvFDbTDWgKbfGE4M5/cSwCtHD8ms5Y/U8zHQ=", + "jsZg0Z+qDPHymrbhdnesodhLNJ26QdunyMko1aVe4So=", + "rpKsyj6/vdnuMgLI2BApeijtGq9g5USRDL0w6X2bnlQ=", + "vCzliGT8A9vcLXj2sFf2kavOuYw69d70NpfgA22B4lI=", + "7OWoxSCtYXWcaBSifF7AXNBif/sjcuO0IelzXG/3PFk=", + "iiXtByNlT6nDMN9De5B58Jl8J0p6LCjnZ9aS3w2FEQU=", + "zDhd7gsJ4h4JkDeGK0Y0mfFd8IBdkLhMOANzwO+4Dig=", + "qANZ+AikwFReEA61JF009d/c3IHM/aSfIYwljckhJWE=", + "nNC30pDLxtXvUr+WDwfDSrAInNBpfSZkPsV2JlpheWI=", + "kGXE1pkt25P71kdJzmKIg4+yMR1VA5wNmbpBb/FhJQ8=", + "aLqPsY1Qiz2UCa2Jx3YNNt8r4JINMphks/43EiyZfXU=", + "bHGYZoQARZEM5LdFF6B74PkRqNd9EKxzuTvGYxjq+hk=", + "JOsYQjfE/9Y1u29hR+GvEkNyxUI8blgLhX1iJI/aGRQ=", + "yKjHjH5j600TJD/3WPsA1N3OmItDLifdjlysq4H6NV0=", + "9lTnUbsPp7BJ7XVN5/T4yGfzD9DJdqWB7xk72s19MAA=", + "5KHG8iY45em7zDhO/HlI0ydcZ0Ubn+XSyjifMmy7qXM=" + ] + ], + $nsfw, + $country, + true + ); + + var_dump($creds); + + sleep(2); + $test = + $this->get( + "https://search.brave.com/api/rewards/v1/orders/{$order}/credentials", + [], + $nsfw, + $country + ); + + var_dump($test); + + $html = + $this->get( + "https://search.brave.com/goggles", + [ + "q" => "site:dailymotion.com my bloody valentine" + ], + $nsfw, + $country, + false, + 
"__Secure-sku#brave-search-captcha=eyJ0eXBlIjoic2luZ2xlLXVzZSIsInZlcnNpb24iOjEsInNrdSI6ImJyYXZlLXNlYXJjaC1jYXB0Y2hhIiwicHJlc2VudGF0aW9uIjoiZXlKcGMzTjFaWElpT2lKaWNtRjJaUzVqYjIwL2MydDFQV0p5WVhabExYTmxZWEpqYUMxallYQjBZMmhoSWl3aWMybG5ibUYwZFhKbElqb2lNRzl0VDBneWQxZ3dTazkzU0VFMVJ6QTJaR1V5WjFOQ1dDdGhSM3B2Y2xsTVQwVTJZVVJtTUc5a1IweG1Wa3RhZEd0cU4xbHdia3BPT0VOVGNGbE5lVWR2YmpGRlNTOUhhMlZYU1RWNGQxTjJPWGxJTTNjOVBTSXNJblFpT2lKWlJWWldaVzR5TTJwQ01tSnZkakJ2U1hGNGJtSndUMGxEUW5Kd1drRjBRbWQxVnpoRlNURTNVREY2UVRaQlpUTXJSVGRFYm5NeVFqUmhka0pGYTFWM2FGY3JWRVZJVjNWcE9TdFllRU1yYlVSTVkyMTBRVDA5SW4wPSJ9" + ); + + var_dump($html); + }*/ + + private function appendtext($payload, &$text, &$index){ + + if(trim($payload) == ""){ + + return; + } + + if( + $index !== 0 && + $text[$index - 1]["type"] == "text" + ){ + + $text[$index - 1]["value"] .= "\n\n" . preg_replace('/ $/', " ", $payload); + }else{ + + $text[] = [ + "type" => "text", + "value" => preg_replace('/ $/', " ", $payload) + ]; + $index++; + } + } + + private function tablesublink($html_collection, &$data){ + + foreach($html_collection as $html){ + + $html["innerHTML"] = preg_replace( + '/