Spiders - How Are They Detected - Event Date: 12 Jan 2006 - 17 Mar 2006 |
jmestep
New User
Joined: 12 Jan 2006 Posts: 25 |
Calendar Event: Spiders - How Are They Detected
Posted: 12 Jan 2006 at 4:47pm |
|
How does WhosOn determine which visitors are spiders?
|
|
|
Stephen
Admin Group
Joined: 21 Oct 2005 Location: Stoke on Trent Posts: 1389 |
Posted: 13 Jan 2006 at 12:11pm |
|
Hi,
WhosOn detects spiders in two ways:
- If the visitor requests the file 'robots.txt', WhosOn assumes it's a spider.
- If the useragent field contains any of the following:
google
msnbot
yahoo
inktomi
gulliver
netseer
jeeves
lycos
infoseek
architext
realnames
picsearch
directhit
robozi
sitecheck
pingalink
miragobot
webmon
grub
looksmart
mercator
It assigns these to named search engines. Then generic 'bots' are searched for in the useragent text:
spider
robot
crawl
seek
gigabot
seach
email
maxbot
esismart
ip3000
fast-
archiver
webtop
curl
ferret
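As a rough illustration of the two checks described above, here is a minimal Python sketch. It is not WhosOn's actual code: the function name `classify_visitor`, the returned labels, and the order in which the checks run are all assumptions made for the example. The keyword lists are copied as posted, and matching is done case-insensitively, which is also an assumption at this point in the thread.

```python
# Hypothetical sketch only; names and check order are assumptions, not WhosOn internals.
NAMED_ENGINES = [
    "google", "msnbot", "yahoo", "inktomi", "gulliver", "netseer", "jeeves",
    "lycos", "infoseek", "architext", "realnames", "picsearch", "directhit",
    "robozi", "sitecheck", "pingalink", "miragobot", "webmon", "grub",
    "looksmart", "mercator",
]

# Generic keywords, copied exactly as listed in the post.
GENERIC_BOTS = [
    "spider", "robot", "crawl", "seek", "gigabot", "seach", "email", "maxbot",
    "esismart", "ip3000", "fast-", "archiver", "webtop", "curl", "ferret",
]


def classify_visitor(requested_path, user_agent):
    """Return a label for a suspected spider, or None for an ordinary visitor."""
    ua = user_agent.lower()

    # Named search engines: any listed keyword contained in the useragent.
    for name in NAMED_ENGINES:
        if name in ua:
            return name

    # Generic bot keywords, checked after the named engines.
    for keyword in GENERIC_BOTS:
        if keyword in ua:
            return "generic bot"

    # A request for robots.txt marks the visitor as a spider.
    path = requested_path.lower().split("?")[0]
    if path.endswith("/robots.txt"):
        return "spider (robots.txt)"

    return None


print(classify_visitor("/index.html", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(classify_visitor("/robots.txt", "Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Checking the useragent before the robots.txt request lets a known engine keep its name even when it fetches robots.txt; the post does not say which check WhosOn applies first.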
Many Thanks
Steve
Edited by Stephen - 19 Apr 2008 at 7:41pm |
|
|
Andy.Krafft
New User
Joined: 15 Mar 2006 Posts: 5 |
Posted: 15 Mar 2006 at 4:09pm |
|
Hi
Are the spiders hardcoded, or can they be updated in a file?
Andy
|
Stephen
Admin Group
Joined: 21 Oct 2005 Location: Stoke on Trent Posts: 1389 |
Posted: 15 Mar 2006 at 8:47pm |
|
Hi Andy,
The ones listed are hardcoded; however, you can add your own by editing the text file ROBOTS.DAT in the folder:
C:\Documents and settings\All Users\Application Data\Parker Software\WhosOnV4\
The format of the file is:
IP [space] UserAgent <CRLF>
Eg:
131.179.64.242 BruinBot+(+http://webarchive.cs.ucla.edu/bruinbot.html)
216.145.17.190 SurveyBot/2.3+(Whois+Source)
WhosOn will read this file when it starts. If a visitor matches the IP OR their useragent contains the user agent text found in this file, the visitor will be marked as a spider.
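To make the file format and the "IP OR useragent" rule concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not how WhosOn itself reads the file: the function names `parse_robots_dat` and `is_listed_spider` are hypothetical, the IP is compared as an exact string, and the useragent comparison is a case-insensitive "contains" check, in line with the later replies in this thread.

```python
# Hypothetical sketch; function names and parsing details are assumptions.
SAMPLE = (
    "131.179.64.242 BruinBot+(+http://webarchive.cs.ucla.edu/bruinbot.html)\r\n"
    "216.145.17.190 SurveyBot/2.3+(Whois+Source)\r\n"
)


def parse_robots_dat(text):
    """Parse 'IP<space>UserAgent' lines into a list of (ip, ua_text) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only; the rest of the line is the useragent text.
        ip, _, ua_text = line.partition(" ")
        entries.append((ip, ua_text.strip()))
    return entries


def is_listed_spider(entries, visitor_ip, visitor_ua):
    """True if the visitor's IP matches OR their useragent contains a listed text."""
    ua = visitor_ua.lower()
    for ip, ua_text in entries:
        if visitor_ip == ip:
            return True
        if ua_text and ua_text.lower() in ua:
            return True
    return False


entries = parse_robots_dat(SAMPLE)
print(is_listed_spider(entries, "131.179.64.242", "Mozilla/4.0"))       # IP match
print(is_listed_spider(entries, "10.0.0.1", "surveybot/2.3+(whois+source)"))  # useragent match
```

The sample entries use `+` where a browser would send a space, exactly as posted; this sketch compares the text literally and does not decode it.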
Thanks
Steve
Edited by Stephen - 19 Apr 2008 at 7:42pm |
|
|
Andy.Krafft
New User
Joined: 15 Mar 2006 Posts: 5 |
Posted: 16 Mar 2006 at 7:21pm |
|
Hi Steve
Does the UA string need to be an exact match, or can it be partial? And is it case sensitive (e.g. is "google" enough to match all of Google's spiders)?
Thanks
Andy
|
|
|
Stephen
Admin Group
Joined: 21 Oct 2005 Location: Stoke on Trent Posts: 1389 |
Posted: 17 Mar 2006 at 9:07am |
|
Hi,
It doesn't have to be an exact match. If the UserAgent contains the text specified, then the visitor will be flagged as a spider.
So yes, 'google' would match any useragent that contains google. The match is not case sensitive.
Thanks
Steve
|
|
|
Andy.Krafft
New User
Joined: 15 Mar 2006 Posts: 5 |
Posted: 12 Jul 2011 at 4:05pm |
|
Hi Stephen
I can get the IP address suppression to work, but no matter what I put in the second field (part of the UA string or the whole of it), I cannot get WO4 to remove the visitor when the User Agent field contains one of the entries in robots.dat.
Cheers
Andy |
|