Nobody here but us chickens! #6
I get this error:
Nobody here but us chickens!
And also the scraper error:
This scraper returned an error:
Google returned an unsupported page format (will fix)
Maybe 5-10% of the time I search. If I click search again (or a few times) eventually it will return results. I know the scraper error is due to google changes, but the "chickens" error seems to be because no results are returned.
Do you think that is also scraper-related, or could there be something else going on that could cause that error?
Also, a feature request (or idea) for consideration: an option to display the proxy being used somewhere in the search results. If there's an issue, it's helpful to know which proxy is in use, and it's interesting info to have. I added this to backend.php and frontend.php for myself, but I know it's not the best way to do it.
Added to lib/backend.php assign_proxy function before switch($type):
Added to lib/frontend.php to display proxy:
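For reference, a minimal sketch of the idea (the function names, the `$GLOBALS` key, and the markup here are all illustrative, not the actual code I added):

```php
<?php
// Sketch: record which proxy was assigned and render it on the results page.
// remember_proxy() would be called in lib/backend.php's assign_proxy(),
// before switch($type); render_proxy_info() would be echoed somewhere in
// lib/frontend.php. All names here are hypothetical.

function remember_proxy(string $proxy): void {
    // stash the chosen proxy string so the frontend can show it later
    $GLOBALS["librex_last_proxy"] = $proxy;
}

function render_proxy_info(): string {
    $proxy = $GLOBALS["librex_last_proxy"] ?? "";
    if ($proxy === "") {
        return "";
    }
    // escape it, since proxy strings come from a user-edited config file
    return "<p class=\"proxy-info\">Proxy: " .
           htmlspecialchars($proxy) . "</p>";
}
```

A session variable or template parameter would probably be cleaner than a global, but this shows the shape of it.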
Yes... The Google scraper is due for a rewrite. Let me explain the errors you've been getting:
The "chickens" error is caused by Google returning some miscellaneous error page. I don't detect that case, so I attempt to extract search results from the body even though the page doesn't contain any.
The scraper error is because Google has started doing A/B testing and sometimes returns a newer interface, which my parser doesn't handle yet. My scraper works by selecting nodes by their CSS style attributes, so it is very prone to breakage; I do this because most of the node names on the Google page are random strings that change on every page load. I also opted to scrape the mobile version of Google because it returns more sublinks (despite not returning sublink descriptions) and shows results faster. Historically, the mobile page for old browsers hadn't changed in almost a decade, but it seems the old interface is slowly being replaced.
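To illustrate the style-attribute approach, here's a generic sketch (the style fragment and markup are made up, not the scraper's actual selectors):

```php
<?php
// Generic sketch: class names are randomized on every page load, but the
// inline CSS on result containers tends to stay recognizable, so select
// by the style attribute instead of by class. The style fragment passed
// in below is illustrative; the real page uses different values.
function grep_results_by_style(string $html, string $style_fragment): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from messy real-world markup
    $xpath = new DOMXPath($doc);
    // NOTE: fine for a hard-coded fragment; don't interpolate untrusted
    // input into an XPath expression like this.
    $nodes = $xpath->query(
        "//div[contains(@style, '" . $style_fragment . "')]"
    );
    $results = [];
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}
```

The obvious downside, as described above, is that any change to the inline styles breaks the selector, which is exactly the breakage being discussed.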
SearxNG uses some fucky API that returned different results in my testing; their method also doesn't allow me to scrape word definitions, so I won't be using that.
In an upcoming update, I will be spoofing the user agent of a newer Android tablet running Android 4.2, since that gets served the new layout and even returns (albeit small) video descriptions.
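Setting the user agent with PHP's curl extension is straightforward; a sketch is below. The UA string is a plausible Android-4.2-tablet string I made up for illustration, not necessarily the one the update will ship with.

```php
<?php
// Sketch: fetch a page while presenting as an Android 4.2 tablet browser.
// The exact UA string is illustrative only.
function android_tablet_user_agent(): string {
    return "Mozilla/5.0 (Linux; Android 4.2.2; Nexus 7 Build/JDQ39) " .
           "AppleWebKit/537.31 (KHTML, like Gecko) " .
           "Chrome/26.0.1410.58 Safari/537.31";
}

function fetch_as_android_tablet(string $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, android_tablet_user_agent());
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body as string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // string on success, false on failure
}
```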
Yes... I also had to implement my own methods to debug some of my proxies. A complete rewrite of the user interface is coming, and an admin interface to check proxies will also be made at some point; it's just been really hard to find time to work on this stuff lately, as I'm pretty burnt out from working at my dead-end job xoxo
Don't expect the Google scraper to be fixed this month, although I might get to it during May. I also have a week off in August, so expect some movement then.
Thank you for your time.
No worries. Thanks for the info.
Hey, just wanted to keep you updated. I'm working on a new version of the scraper that scrapes the desktop version. Their webpage is a clusterfuck, so it will take time, but expect some movement next weekend.
Sorry for the wait, the update is here. Please let me know of any issues you encounter!
I updated my instance and have been using it with Google yesterday and today, with several different proxies. I've only gotten the "chickens" error once, though I do still sometimes get other scraper errors, like:
Failed to grep result div
Failed to get HTML
It is a good improvement, though. Thanks!