
Spiders - How Are They Detected - Event Date: 12 Jan 2006 - 17 Mar 2006

jmestep
New User
Joined: 12 Jan 2006
Posts: 25
Posted: 12 Jan 2006 at 4:47pm
How do you determine what is a spider in WhosOn?

Stephen
Admin Group
Joined: 21 Oct 2005
Location: Stoke on Trent
Posts: 1415
Posted: 13 Jan 2006 at 12:11pm
Hi, WhosOn detects spiders in 2 ways:
 
- If the visitor requests the file 'robots.txt' then WhosOn assumes it's a spider.
 
- If the useragent field contains any of the following:
google
msnbot
yahoo
inktomi
gulliver
netseer
jeeves
lycos
infoseek
architext
realnames
picsearch
directhit
robozi
sitecheck
pingalink
miragobot
webmon
grub
looksmart
mercator
 
It assigns these to named search engines. Then generic 'bots' are searched for in the useragent text:
spider
robot
crawl
seek
gigabot
seach
email
maxbot
esismart
ip3000
fast-
archiver
webtop
curl
ferret
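
The two checks described above can be sketched in a few lines of Python. This is only an illustration, not WhosOn's actual code: the function name `classify_visitor`, the returned labels, and the order in which the checks run are all assumptions. It also assumes the built-in tokens are matched as case-insensitive substrings, which is what is stated later in this thread for ROBOTS.DAT entries.

```python
# Illustrative sketch only; names and check order are assumptions, not WhosOn internals.

# Tokens that the post says are assigned to named search engines.
NAMED_ENGINES = [
    "google", "msnbot", "yahoo", "inktomi", "gulliver", "netseer", "jeeves",
    "lycos", "infoseek", "architext", "realnames", "picsearch", "directhit",
    "robozi", "sitecheck", "pingalink", "miragobot", "webmon", "grub",
    "looksmart", "mercator",
]

# Generic bot tokens, copied verbatim from the post (including "seach").
GENERIC_BOTS = [
    "spider", "robot", "crawl", "seek", "gigabot", "seach", "email", "maxbot",
    "esismart", "ip3000", "fast-", "archiver", "webtop", "curl", "ferret",
]


def classify_visitor(requested_path, user_agent):
    """Return a label if the visitor looks like a spider, otherwise None."""
    ua = user_agent.lower()

    # Named search engines: substring match against the user agent.
    for name in NAMED_ENGINES:
        if name in ua:
            return name

    # Generic bots: same substring match, but no specific engine name.
    for token in GENERIC_BOTS:
        if token in ua:
            return "generic bot"

    # Any visitor that asks for robots.txt is assumed to be a spider.
    path_only = requested_path.split("?", 1)[0].lower()
    if path_only.endswith("robots.txt"):
        return "robots.txt request"

    return None
```

For example, a user agent containing "Googlebot" would be labelled `google`, while an ordinary browser that only requests `/robots.txt` would still be flagged through the last check.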
 
Many Thanks Steve


Edited by Stephen - 19 Apr 2008 at 7:41pm

Andy.Krafft
New User
Joined: 15 Mar 2006
Posts: 5
Posted: 15 Mar 2006 at 4:09pm
Hi, are the spiders hardcoded, or can they be updated in a file? Andy
quote:
Originally posted by Stephen
Hi, WhosOn detects spiders in 2 ways: - If the visitor requests the file 'robots.txt' then WhosOn assumes it's a spider. - If the useragent field contains any of the following: [.....] Many Thanks Steve

Stephen
Admin Group
Joined: 21 Oct 2005
Location: Stoke on Trent
Posts: 1415
Posted: 15 Mar 2006 at 8:47pm
Hi Andy,
 
The ones listed are hardcoded; however, you can add your own by editing the text file ROBOTS.DAT in the folder:
C:\Documents and settings\All Users\Application Data\Parker Software\WhosOnV4\
 
The format of the file is:
 
IP [space] UserAgent <CRLF>
 
Eg:
131.179.64.242 BruinBot+(+http://webarchive.cs.ucla.edu/bruinbot.html)
216.145.17.190 SurveyBot/2.3+(Whois+Source)
 
WhosOn will read this file when it starts. If a visitor matches the IP OR their useragent contains the user agent text found in this file, the visitor will be marked as a spider.
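
As a rough illustration of the file format and the "IP OR useragent" rule described above, here is a minimal Python sketch. It is not WhosOn's implementation: the function names `parse_robots_dat` and `is_custom_spider` are hypothetical, and the handling of blank lines and surrounding whitespace is an assumption, since the post does not cover it. The user agent comparison is a case-insensitive substring match, as confirmed later in this thread.

```python
# Illustrative sketch only; function names and whitespace handling are assumptions.

def parse_robots_dat(text):
    """Parse 'IP [space] UserAgent' lines into (ip, user_agent_text) pairs."""
    entries = []
    for line in text.splitlines():  # splitlines() also handles CRLF endings
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Everything before the first space is the IP, the rest is the user agent text.
        ip, _, ua_text = line.partition(" ")
        entries.append((ip, ua_text.strip()))
    return entries


def is_custom_spider(entries, visitor_ip, visitor_user_agent):
    """True if the visitor's IP matches OR their user agent contains an entry's text."""
    ua = visitor_user_agent.lower()
    for ip, ua_text in entries:
        if visitor_ip == ip:
            return True
        if ua_text and ua_text.lower() in ua:
            return True
    return False


# The two example entries from the post, one per CRLF-terminated line.
SAMPLE = (
    "131.179.64.242 BruinBot+(+http://webarchive.cs.ucla.edu/bruinbot.html)\r\n"
    "216.145.17.190 SurveyBot/2.3+(Whois+Source)\r\n"
)

entries = parse_robots_dat(SAMPLE)
```

With these entries, a visitor from 216.145.17.190 would be flagged regardless of user agent, and a visitor from any other address would be flagged if their user agent contained one of the listed strings.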
 
Thanks Steve


Edited by Stephen - 19 Apr 2008 at 7:42pm

Andy.Krafft
New User
Joined: 15 Mar 2006
Posts: 5
Posted: 16 Mar 2006 at 7:21pm
Hi Steve, does the UA string need to be an exact match, or can it be partial? And is it case sensitive (eg is "google" enough to match all of Google's spiders)? Thanks, Andy

Stephen
Admin Group
Joined: 21 Oct 2005
Location: Stoke on Trent
Posts: 1415
Posted: 17 Mar 2006 at 9:07am
Hi, it doesn't have to be an exact match. If the UserAgent contains the text specified, then the visitor will be flagged as a spider. So yes, 'google' would match any useragent that contained google. The match is not case sensitive. Thanks Steve

Andy.Krafft
New User
Joined: 15 Mar 2006
Posts: 5
Posted: 12 Jul 2011 at 4:05pm
Hi Stephen

I can get the IP address suppression to work, but no matter what I put in the second field (part of the UA string or the whole of it), I cannot get WO4 to remove the visitor when the User Agent field contains one of the entries in robots.dat.

Cheers
Andy

These are the forums for Parker Software, developers of Live Chat Software: WhosOn and Email Automation Software: Email2DB.