|
Author: chaz
Date: 11-05-03 09:04
Sounds about right for them. I wonder if it's got anything to do with the IRC GoogleBot TCLs. :S Hope they fix it soon, because it's annoying having to open up a remote Opera just to quickly search for something from a browser on a shell.
chaz
|
|
Author: John Meredith
Date: 11-05-03 11:51
Google have been doing this for a little while, I think to cut down on automated search queries from its database, i.e. Perl scripts etc. Saying that, however, it is simple to change the browser identification string and continue as normal.
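For anyone wondering what "changing the browser identification string" looks like in a script, here is a minimal sketch using Python's standard urllib. The URL and the "ExampleBrowser/1.0" string are made-up placeholders, not anything from the article, and the snippet only builds the request without sending it.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL and identification string, for illustration only.
req = build_request("http://example.com/search?q=test", "ExampleBrowser/1.0")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen() would send it with that header. Whether a given site's terms of service allow this is a separate question.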
John
|
|
Author: Dan Langille
Date: 11-05-03 12:47
John Meredith wrote:
>
> Google have been doing this for a little while
A very little while. I've often used links in the past with Google.
> I
> think to cut down on automated search queries from its
> database, i.e. Perl scripts etc.
Agreed.
> Saying that, however, it is
> simple to change the browser identification string and
> continue as normal.
As noted in the article.
|
|
Author: Dan Langille
Date: 11-05-03 12:48
chaz wrote:
>
> Sounds about right for them. I wonder if it's got
> anything to do with the IRC GoogleBot TCLs. :S
What is that?
> hope they fix
> it soon because it's annoying having to open up a remote Opera
> to quickly search for something on a browser on shell
Pardon?
|
|
Author: chaz
Date: 11-05-03 17:57
Author: Dan Langille (---.unixathome.org)
Date: 2003-05-11 05:48
Dan wrote:
>
> Sounds about right for them. I wonder if it's got
> anything to do with the IRC GoogleBot TCLs. :S
>What is that?
Eggdrop has a Tcl script which can query the Google database through a trigger like !google blah, and I'm assuming it has an issue with the browser identity.
> hope they fix
> it soon because it's annoying having to open up a remote Opera
> to quickly search for something on a browser on shell
>Pardon?
I'm talking about when I have to use VNC to search for something because of internet restrictions on the local machine at my college, i.e. it has most search engines blocked due to "pornography searches". As you can tell, my administrator is slightly ... "lost in space". I need to use VNC to reach my remote Unix box just to be able to surf when I'm in college. I used to just use lynx for most of the browsing because it was simpler.
|
|
Author: Cristian Burneci
Date: 12-05-03 09:44
The campaign is specifically targeted against links and wget, which
can "dump" the content of a remote page into a text file. (Should
this be a starting point for performing automated queries?)
Anyway note that links 2.x can't do this anymore, so banning this
browser is hilarious.
|
|
Author: Sniffy McNickles
Date: 20-05-03 13:20
>The campaign is specifically targeted against
>links and wget, which can "dump" the content
>of a remote page into a text file.
What are you talking about? This statement
makes no sense.
Any browser can save the contents of a page
to a text file.
If you're trying to say they're targeting automated
tools, you may be right, although blocking on UA is a
silly way to do it.
Much better would be to throttle repetitive-looking
requests, which is pretty easy to do.
My guess is they're being annoyed by something specific
which happens not to set the UA, and this is a stopgap
until something else is in place.
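To make the throttling idea concrete, here is a rough sketch of a per-client sliding-window limiter. This is only one possible approach and says nothing about what Google actually does; the class name, the limits, and the idea of keying on a client id (an IP address, say) are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.history = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id):
        now = self.clock()
        recent = self.history[client_id]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too many recent requests: throttle this one
        recent.append(now)
        return True
```

A server would call allow() with something like the requester's IP address before answering each query and return an error page when it comes back False. This kind of check runs in software, inside or in front of the web server.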
|
|
Author: mjl
Date: 05-06-03 00:59
I can't see why you would want to use a non-interactive browser for any reason other than violating their TOS. Google provide a SOAP API, which I have used quite successfully for programmatic searching. They even provide excellent documentation and sample scripts.
HTH
|
|
Author: Eli the Bearded
Date: 18-06-03 19:05
I noticed the same thing happening with a page download tool I wrote (bget, available at CPAN in the scripts section). When I used a browser emulation I could get access.
As for mjl, I was doing it because I wanted to save an article I found in Google Groups in the same place that I save all my other news posts. So I copied the URL of the "view original format" link and tried to fetch the page.
By the way, when I did it the forbidden message I got had just a
simple base64-encoded block in the 'code below' section, but the one here is doubly base64-encoded.
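For illustration, this is what single versus double base64 encoding looks like in Python. The payload is a made-up placeholder, not the actual block from the forbidden page, and the helper names are my own.

```python
import base64


def encode_times(data: bytes, rounds: int) -> bytes:
    """Apply base64 encoding to data the given number of times."""
    for _ in range(rounds):
        data = base64.b64encode(data)
    return data


def decode_times(data: bytes, rounds: int) -> bytes:
    """Undo the given number of base64 encoding rounds."""
    for _ in range(rounds):
        data = base64.b64decode(data)
    return data


sample = b"example payload"      # hypothetical content
once = encode_times(sample, 1)   # singly encoded
twice = encode_times(sample, 2)  # doubly encoded

print(once)
print(twice)
print(decode_times(twice, 2))
```

Decoding a doubly encoded block once just gives another base64 string, which is the giveaway that a second round is needed.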
|
|
Author: Doh
Date: 29-07-03 20:09
It is not fixed at all; for 2 days already I have been unable to search the groups.
|
|
Author: Peter Leftwich
Date: 20-08-03 17:52
[1] What's a TCL?
[2] Yahoo is not as good as Google ;) but they have improved
[3] Sniffy McNickles makes a great point - "Much better would be to throttle repetitive looking requests, which is pretty easy to do." Could you provide a URL which explains how this is done and at what level (e.g. as a daemon? in hardware...?)
[4] Most browsers and spiders allow the user to spoof the UA (User-Agent). What is this coming to? A fixed browser ID? As trustable as an IP? :)
[5] What is mjl?
[6] To 'chaz' with the college sysadmin who has "search engines blocked due to 'pornography searches'": That sysadmin needs to be fired and expelled. This makes about as much sense as closing down a city because a criminal lives within its boundaries!
[7] Is there a good write up (URL) about the Google SOAP API and what can be done using it?
[8] Sniffy McNickles is incorrect in his/its argument: "Any browser can save the contents of a page to a text file." There's more to the story! wget is non-interactive, whereas most browsers require clicking or scheduling through a GUI. Also, several instances of wget can run at once, and it acts in a more linear, consecutive, "robotic" manner than PACUs' (point-and-click users') requests to an HTTP or FTP site.
[9] There is no nine. Please email me if this thread changes, it is hostmaster then an at symbol then Video2Video is the dot com domain. Thanks.
|
|