# 4get configuration options

Welcome! This guide assumes that you have a working 4get instance. This will help you configure your instance to the best it can be!

# Files location
1. The main configuration file is located at `data/config.php`
2. The proxies are located in `data/proxies/*.txt`
3. The captcha imagesets are located in `data/captcha/your_image_set/*.png`
4. The captcha font is located in `data/fonts/captcha.ttf`

# Bypass Cloudflare
**Note: These instructions won't help you get past Cloudflare captchas, only through their firewall!!**

If you want to get proxied images from websites behind Cloudflare (or search engine data from the `Yep` search engine), you need to set up `curl-impersonate`. To do this, follow the install instructions listed here:

https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build

Only install the Firefox module. This is needed to fool Cloudflare into thinking we're making requests from the Firefox browser. Cloudflare is a piece of shit that checks TLS fingerprints, and it can detect `curl` this way. By pretending to have Firefox's signatures, we can get past their firewall.

Anyway, once you've compiled all of this, you should be able to execute `curl_ff117 --version` and get an output similar to this:

```sh
$ curl_ff117 --version
curl 8.1.1 (x86_64-pc-linux-gnu) libcurl/8.1.1 NSS/3.92 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 nghttp2/1.56.0
Release-Date: 2023-05-23
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe UnixSockets zstd
```
Once you've managed to do that, you still need to tell PHP to use the new library. This can be difficult depending on your distribution, but here is how I did it on Debian. First, you need to replace the `libcurl.so.4` file. You can use `find` to locate it, although it will return multiple matches.

```sh
$ sudo find / -name libcurl.so.4
/usr/lib/i386-linux-gnu/libcurl.so.4
/usr/lib/x86_64-linux-gnu/libcurl.so.4 # on debian, this is the one I must replace!
```
Once you've found it, check that the library you compiled is present at `/usr/local/lib/libcurl-impersonate-ff.so`. If so, use the following commands:
```sh
sudo rm /usr/lib/x86_64-linux-gnu/libcurl.so.4
sudo cp /usr/local/lib/libcurl-impersonate-ff.so /usr/lib/x86_64-linux-gnu/libcurl.so.4
```
Yes, this is cursed. Yes, this might break `curl` updates. Let me know if you find a better way.

**Important: after you've done this, make sure to restart apache2, otherwise it will still use the old library!**
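
If you'd rather keep a way back, here is a minimal sketch of the same swap that saves a copy of the original library first. `swap_lib` is a hypothetical helper name (it is not part of 4get or curl-impersonate), and the `.orig` suffix is just a convention I picked for the backup.

```sh
# Hypothetical helper: replace a shared library, keeping a backup of the original.
# $1 = library to replace, $2 = replacement library
swap_lib() {
	target=$1
	replacement=$2

	# refuse to continue if either file is missing
	[ -f "$target" ] || { echo "missing: $target" >&2; return 1; }
	[ -f "$replacement" ] || { echo "missing: $replacement" >&2; return 1; }

	# only back up once, so re-running never overwrites the real original
	[ -e "$target.orig" ] || cp -p "$target" "$target.orig" || return 1

	# same rm + cp as above
	rm -f "$target" && cp "$replacement" "$target"
}
```

Run it from a root shell, for example `swap_lib /usr/lib/x86_64-linux-gnu/libcurl.so.4 /usr/local/lib/libcurl-impersonate-ff.so`. To undo, copy the `.orig` file back over `libcurl.so.4` and restart apache2 again. To check which library PHP picked up, `php -r 'echo curl_version()["ssl_version"], PHP_EOL;'` should mention NSS, like the `curl_ff117 --version` output above. This assumes your command-line PHP loads the same `libcurl.so.4` as apache2 does.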

# Robots.txt
Make sure you configure this right to optimize your search engine presence! Head over to `/robots.txt` and change the 4get.ca domain to your own domain.
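
If you don't feel like editing the file by hand, the domain swap can be sketched with `sed`. `set_robots_domain` is a hypothetical helper name, and `example.org` stands in for your own domain. It assumes the domain contains no `/` or `&` characters, which would confuse `sed`.

```sh
# Hypothetical helper: point robots.txt at your own domain.
# $1 = path to robots.txt, $2 = your domain (for example "example.org")
set_robots_domain() {
	file=$1
	domain=$2
	tmp=$(mktemp) || return 1

	# the dot in "4get.ca" is escaped so it only matches a literal dot
	sed "s/4get\.ca/$domain/g" "$file" > "$tmp" && cat "$tmp" > "$file"
	status=$?

	rm -f "$tmp"
	return $status
}
```

Usage, from the root of your 4get install: `set_robots_domain robots.txt example.org`.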

# Server listing
To be listed on https://4get.ca/instances , you must contact *any* of the people in the server list and ask them to add you to their list of instances in their configuration. The instance list is distributed, and I don't have control over it.

If you see spammy entries in your instances list, simply remove the instance from your list that pushes the offending entries.

# Proxies
4get supports rotating proxies for scrapers! Configuring one is really easy.

1. Head over to the **proxies** folder and create a new file. Give it any name you want, like `myproxy`, but make sure it has the `txt` extension.
2. Add your proxies to the file. Examples:
	```conf
	# format -> <protocol>:<address>:<port>:<username>:<password>
	# protocol list:
	# raw_ip, http, https, socks4, socks5, socks4a, socks5_hostname
	# (ports 1080 and 8080 below are placeholder values)
	socks5:1.1.1.1:1080:juicy:cloaca00
	http:1.3.3.7:8080::
	raw_ip::::
	```
3. Go to the **main configuration file**. Then, find the website you want to set up a proxy for.
4. Replace the value `false` with `"myproxy"`, quotes included, and keep the semicolon at the end of the line.

Done! The scraper you chose should now be using the rotating proxies. When asking for the next page of results, it will use the same proxy to avoid detection!
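
A malformed line in the proxy file is easy to miss, so here is a rough linter sketch for the format described in step 2. `check_proxy_file` is a hypothetical helper, not something 4get ships. It assumes that lines starting with `#` are comments, and that extra colons after the fifth field are allowed (in case a password contains one).

```sh
# Hypothetical linter for a 4get proxy list.
# Prints every line that does not look like
# <protocol>:<address>:<port>:<username>:<password>
# and returns non-zero if it found any.
check_proxy_file() {
	awk -F: '
		/^[ \t]*(#|$)/ { next }    # skip comments and blank lines
		{
			proto = $1
			sub(/^[ \t]+/, "", proto)    # tolerate leading indentation
			ok = NF >= 5 && proto ~ /^(raw_ip|http|https|socks4|socks5|socks4a|socks5_hostname)$/
			if (!ok) { print FILENAME ":" FNR ": " $0; bad = 1 }
		}
		END { exit bad + 0 }
	' "$1"
}
```

Usage: `check_proxy_file data/proxies/myproxy.txt`. No output means every line has at least five fields and a known protocol. It does not check that the proxies actually work.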

## Important!
If you ever test a `socks5` proxy locally on your machine and find that it works there but not on your server, try supplying the `socks5_hostname` protocol instead. Hopefully this tip can save you 3 hours of your life!
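
The likely reason, assuming 4get's protocol names map onto libcurl's proxy types of the same name: with `socks5` the hostname is resolved on your server before the request goes to the proxy, while with `socks5_hostname` the proxy does the DNS lookup itself. To try a proxy line with plain `curl` on the server, this sketch converts it into a URL for `curl -x`. `proxy_to_curl_url` is a hypothetical helper, and it skips `raw_ip` since that one isn't a curl proxy scheme.

```sh
# Hypothetical helper: turn one 4get proxy line into a URL for `curl -x`.
# $1 = <protocol>:<address>:<port>:<username>:<password>
proxy_to_curl_url() {
	# split the line on ":" into its five fields
	IFS=: read -r proto addr port user pass <<EOF
$1
EOF

	case $proto in
		socks5_hostname) scheme=socks5h ;;    # the proxy resolves the hostname
		socks5|socks4|socks4a|http|https) scheme=$proto ;;
		*) echo "unsupported protocol: $proto" >&2; return 1 ;;
	esac

	# only include credentials when a username is set
	if [ -n "$user" ]; then
		printf '%s://%s:%s@%s:%s\n' "$scheme" "$user" "$pass" "$addr" "$port"
	else
		printf '%s://%s:%s\n' "$scheme" "$addr" "$port"
	fi
}
```

Usage: `curl -x "$(proxy_to_curl_url 'socks5_hostname:1.1.1.1:1080:juicy:cloaca00')" https://example.com/`, then run the same thing with `socks5` and compare. The address, port and credentials here are placeholders.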