The FreeBSD Diary

 RE: google.com
Author: chaz 
Date:   11-05-03 09:04

Sounds about right for them. I wonder if it's got anything to do with the IRC GoogleBot TCLs. :S Hope they fix it soon, because it's annoying having to open up a remote Opera just to quickly search for something, instead of using a text browser on my shell.

chaz

 RE: google.com
Author: John Meredith 
Date:   11-05-03 11:51

Google have been doing this for a little while. I think it's to cut down on automated search queries against its database, i.e. Perl scripts etc. That said, it is simple to change the browser identification string and continue as normal.

John
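Changing the browser identification string, as John describes, comes down to sending a different User-Agent header. Below is a minimal sketch using Python's standard urllib. The UA value and the URL are placeholders, and nothing here implies that Google accepts them. By default urllib identifies itself as Python-urllib/<version>, a string that a UA-based filter could easily single out.

```python
import urllib.request

# Hypothetical browser-like identification string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; FreeBSD amd64) ExampleBrowser/1.0"


def build_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a Request that carries a custom User-Agent.

    urllib stores header names via str.capitalize(), so the header
    is later retrievable under the key "User-agent".
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

To actually fetch the page, pass the result to `urllib.request.urlopen(build_request(url))`.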

 RE: google.com
Author: Dan Langille 
Date:   11-05-03 12:47

John Meredith wrote:
>
> Google have been doing this for a little while

A very little while. I've often used links (the text-mode browser) with Google in the past.

> I
> think to cut down on automated search queries from it's
> database ie. perl scripts etc.

Agreed.

> Saying that however, it is
> simple to change the browser identification string and
> continue as normal.

As noted in the article.

 RE: google.com
Author: Dan Langille 
Date:   11-05-03 12:48

chaz wrote:
>
> Sounds about right for them. I wonder if its got
> anything to do with the IRC GoogleBot TCL's. :S

What is that?

> hope they fix
> it soon because its annoying having to open up a remote opera
> to quickly search for something on a browser on shell

Pardon?

 RE: google.com
Author: chaz 
Date:   11-05-03 17:57

Author: Dan Langille (---.unixathome.org)
Date: 2003-05-11 05:48

Dan wrote:
>
> Sounds about right for them. I wonder if its got
> anything to do with the IRC GoogleBot TCL's. :S

>What is that?

Eggdrop (the IRC bot) has a Tcl script which can query Google through a trigger like !google blah, and I'm assuming it has an issue with the browser identity.

> hope they fix
> it soon because its annoying having to open up a remote opera
> to quickly search for something on a browser on shell

>Pardon?

I'm talking about when I have to use VNC to search for something because of internet restrictions on the local machine at my college, i.e. it has most search engines blocked due to "pornography searches". As you can tell, my administrator is slightly ... "lost in space". I need to VNC onto my remote Unix box just to be able to surf when I'm in college. I used to just use lynx for most of my browsing because it was simpler.
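A trigger like the one chaz describes (!google blah) essentially converts a channel message into a search URL. Below is a hypothetical Python rendering of that step. It is not the actual Eggdrop Tcl script, which the thread does not show. `trigger_to_url` and the search endpoint are assumptions made for illustration. If such a script then fetched the URL with a non-browser User-Agent, it would plausibly run into the same block discussed here.

```python
from urllib.parse import urlencode

TRIGGER = "!google"  # trigger word mentioned in the thread
SEARCH_BASE = "http://www.google.com/search"  # assumed endpoint, for illustration


def trigger_to_url(message):
    """Turn a channel message like '!google blah' into a search URL.

    Returns None if the message is not the trigger or has no search terms.
    """
    parts = message.strip().split(None, 1)
    if not parts or parts[0].lower() != TRIGGER:
        return None
    if len(parts) < 2 or not parts[1].strip():
        return None
    # urlencode quotes the terms, turning spaces into '+'.
    return SEARCH_BASE + "?" + urlencode({"q": parts[1].strip()})
```

For example, `trigger_to_url("!google freebsd ports")` gives `http://www.google.com/search?q=freebsd+ports`.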

 RE: google.com
Author: Jorge 
Date:   12-05-03 07:34

yahoo.com is as good as Google now (the same????)

:)

 RE: google.com
Author: Cristian Burneci 
Date:   12-05-03 09:44

The campaign is specifically targeted against links and wget, which
can "dump" the content of a remote page into a text file. (Could
this be a starting point for performing automated queries?)
Anyway, note that links 2.x can't do this anymore, so banning this
browser is hilarious.

 RE: google.com
Author: Sniffy McNickles 
Date:   20-05-03 13:20

>The campaign is specifically targeted against
>links and wget, which can "dump" the content
>of a remote page into a text file.

What are you talking about? This statement
makes no sense.

Any browser can save the contents of a page
to a text file.

If you're trying to say they're targeting automated
tools, you may be right, although blocking on UA is a
silly way to do it.

Much better would be to throttle repetitive-looking
requests, which is pretty easy to do.

My guess is they're being annoyed by something specific
which happens not to set the UA, and this is a stopgap
until something else is in place.
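The throttling alternative Sniffy suggests can be sketched as a per-client sliding-window counter. This is a minimal in-memory illustration in Python that rests on a few assumptions. Clients are identified by a key such as an IP address. "Repetitive" is approximated purely by request rate, so query contents are not compared. The class name and the limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the logic can be tested deterministically
        self._hits = defaultdict(deque)  # client key -> times of accepted requests

    def allow(self, client):
        """Record and accept a request, or return False if it should be throttled."""
        now = self.clock()
        hits = self._hits[client]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # too many recent requests from this client
        hits.append(now)
        return True
```

A real deployment would more likely place this logic in the web server or a proxy in front of it. The sketch also never forgets idle clients, so a long-running version would need to prune the table.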

 RE: google.com
Author: mjl 
Date:   05-06-03 00:59

I can't see why you would want to use a non-interactive browser for any reason other than violating their TOS. Google provide a SOAP API, which I have used quite successfully for searching programmatically. They even provide excellent documentation and sample scripts.

HTH

 RE: google.com
Author: Eli the Bearded 
Date:   18-06-03 19:05

I noticed the same thing happening with a page download tool I wrote (bget, available at CPAN in the scripts section). When I used browser emulation, I could get access.

As for mjl's point, I was doing it because I wanted to save an article I found in Google Groups in the same place where I save all my other news posts. So I copied the URL of the "view original format" link and tried to fetch the page.

By the way, when I did it, the forbidden message I got had just a simple base64-encoded block in the "code below" section, but the one here is doubly base64-encoded.
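Undoing the double encoding Eli mentions means running one base64 decode per layer. Below is a minimal Python sketch. `decode_layers` is a hypothetical helper. The default of two layers follows Eli's description and is not based on anything documented about Google's error page. Whitespace is stripped before each decode, because pasted blocks are usually wrapped across lines and strict validation would reject the line breaks.

```python
import base64


def decode_layers(encoded, layers=2):
    """Strip `layers` rounds of base64 encoding and return the raw bytes.

    Raises binascii.Error if any layer is not valid base64.
    """
    data = encoded.encode("ascii") if isinstance(encoded, str) else encoded
    for _ in range(layers):
        # Remove line breaks/spaces, then decode strictly.
        data = base64.b64decode(b"".join(data.split()), validate=True)
    return data
```

Calling `decode_layers(block)` on a pasted, doubly encoded block returns the original bytes. Pass `layers=1` for the singly encoded variant.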

 RE: google.com
Author: Doh 
Date:   29-07-03 20:09

It is not fixed at all: for 2 days now I have not been able to search the groups.

 RE: google.com
Author: test 
Date:   20-08-03 17:38

test

 RE: google.com
Author: Peter Leftwich 
Date:   20-08-03 17:52

[1] What's a TCL?
[2] Yahoo is not as good as Google ;) but they have improved
[3] Sniffy McNickles makes a great point: "Much better would be to throttle repetitive looking requests, which is pretty easy to do." Could you provide a URL that explains how this is done and at what level (e.g. as a daemon? in hardware...?)
[4] Most browsers and spiders allow the user to spoof the UA (User-Agent). What is this coming to? A fixed browser ID? As trustable as an IP? :)
[5] What is mjl?
[6] To 'chaz' with the college sysadmin who has "search engines blocked due to 'pornography searches'": That sysadmin needs to be fired and expelled. This makes about as much sense as closing down a city because a criminal lives within its boundaries!
[7] Is there a good write up (URL) about the Google SOAP API and what can be done using it?
[8] Sniffy McNickles is incorrect in his/its argument: "Any browser can save the contents of a page to a text file." There's more to the story! wget is non-interactive, whereas most browsers require clicking or scheduling through a GUI. Also, several instances of wget can run at once, and it acts in a more linear, consecutive, "robotic" manner than PACUs' (point-and-click users') requests to an HTTP or FTP site.
[9] There is no nine. Please email me if this thread changes; the address is hostmaster, then an at symbol, then Video2Video, which is the dot com domain. Thanks.
