Request: add startpage scraper #23

Closed
opened 2024-06-21 13:21:42 +00:00 by Evhorizon · 15 comments

First of all, thanks for all the great work you're doing :) I really want to use the Docker 4get+Tor container, but as I said elsewhere I mostly use the Google scraper, which relentlessly blocks my searches over Tor. If a Startpage engine were added, it would fix my problem, because Startpage proxies Google results without blocking Tor. Thanks in advance. I hope this is possible, though I'm aware that each scraper is some thousands of lines of code :)

Owner

I'm planning to add more Google sources soon, please stay tuned

Author

Thank you, I'll be on the lookout

Evhorizon changed title from Request: add startpage to Request: add startpage scraper 2024-06-22 18:04:11 +00:00
Author

In the meantime I tried a workaround using the !sp bang in DDG, but I was out of luck :) I get `Failed to get d.js URL`

Owner

I'm working on a Startpage scraper, please stay tuned.

Just telling you now, their image search uses Bing. I haven't checked the video/news tabs yet.

Author

Thanks:) huh, Bing, really? Damn, so which engine is left sourcing Google results? Gibiru?

Owner

I don't know why people keep mentioning Gibiru as an option. It's literally just reskinned Google CSE. Open the network tab, you will see it makes calls to cse.google.com to pull results, it's literal garbage.

There are a few Google options out there, like MyPrivateSearch which has a hidden Google API endpoint and some other ones with shitty reputation. Once I scrape Startpage, I'm thinking about writing scrapers for Korean and Chinese search engines, they have interesting results.

Author

Yes, Gibiru is known to be a bare Google clone, that's why I mentioned it, hoping it would not block me :) However, Asian engines sound interesting. Are you talking about something like Baidu?

Owner

I have a few in my list:

Baidu, Seznam, Daum, Sese, m.sm.cn, Toutiao, Naver, Sogou, So.com, Coc Coc, Goo, Weibo, Solofield.net

Just to name a few.

Author

They will all come as a surprise, honestly I haven't ever heard of any of those :)

Owner

I've been hunting for them autistically

Owner

I've been doing some work on it lately, expect an update on this sooner or later

Owner

I have added partial support for startpage with my latest commit.

Image, videos and news will follow soon...

still ironing out some issues with the instant answers that suddenly stopped showing up. what a pain

edit: ^ just fixed whatever bs that was

Author

Great job man, felt like something exciting was going to happen this weekend :) I'm waiting for the docker code to be updated so I can try it asap

Author

> Image, videos and news will follow soon...

Good, my personal priority would go to images and news, because YouTube is already a satisfying choice for videos

Owner

I fully implemented startpage in my latest commit, thank you for your time

Reference: lolcat/4get#23