Web Crawlers Bundle

A bundle of features for managing spider access: hit tracking, a complete listing for information, a specific usergroup for spiders, etc.

It’s basically a merge of some small addons I wrote in the past. As I did not want to release a ton of minimal tools that just fit together, I made a real bundle: 4 or 5 tools together, with activation and permission settings where needed.

Settings:

nex_crawlers_bundle_options.jpg

Spiders List: a little spider tracker for your forum. It does not track each page the engine views, because that is pointless. Instead, it lists the name of each spider that visits your site, the date of its last visit, the number of unique visits and the number of pages viewed. That information is not very important for the indexing of your site, but it helps to see why your site may be busy or not. You can then take action if a crawler keeps visiting but still produces no results on search engines.
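For the curious, here is a rough sketch of that kind of per-spider tracking. It is written in Python for illustration only and is not the bundle's actual code: the signature table, the `SpiderStats` fields and the 30-minute rule for counting a new visit are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical user-agent fragments mapped to display names (illustration only).
SPIDER_SIGNATURES = {"googlebot": "Google", "bingbot": "Bing"}

# Assumption: hits from the same spider more than 30 minutes apart are a new visit.
VISIT_GAP = timedelta(minutes=30)


@dataclass
class SpiderStats:
    name: str
    last_visit: datetime
    visits: int = 1   # number of separate visits
    pages: int = 1    # total pages viewed


def identify_spider(user_agent):
    """Return the spider's display name, or None for a normal visitor."""
    ua = user_agent.lower()
    for needle, name in SPIDER_SIGNATURES.items():
        if needle in ua:
            return name
    return None


def record_hit(stats, user_agent, when):
    """Update one row per spider instead of storing every page view."""
    name = identify_spider(user_agent)
    if name is None:
        return
    entry = stats.get(name)
    if entry is None:
        stats[name] = SpiderStats(name=name, last_visit=when)
        return
    if when - entry.last_visit > VISIT_GAP:
        entry.visits += 1
    entry.pages += 1
    entry.last_visit = when
```

Only one row per spider is kept, which is why the listing stays small no matter how many pages get crawled.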

nex_crawlers_bundle_list.jpg

You can see it in action here: vbEnhancer.com – Crawlers List

Specific Usergroup for Spiders: I released this addon on vb.org a long time ago, and its source was copied, but this version is updated and has more flexibility. You simply choose the proper usergroup in the settings, so that when a spider/crawler visits your site, it is treated as having that usergroup’s permissions. It’s useful if you do not want to fill your robots.txt file with strange access blocks. This lets you give crawlers access to profiles but not to visitor messages, etc.
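To picture the idea, here is a minimal sketch in Python (illustration only, not the bundle's code). The group names, the permission keys and the `effective_permissions` helper are hypothetical; the point is only that a detected spider borrows the permissions of the chosen usergroup instead of the guest ones.

```python
# Hypothetical permission table keyed by usergroup name (illustration only).
PERMISSIONS = {
    "Guests": {"view_profiles": False, "view_visitor_messages": False},
    "Bots": {"view_profiles": True, "view_visitor_messages": False},
}


def effective_permissions(is_spider, spider_group="Bots"):
    """Spiders have no account, so they borrow the configured group's permissions."""
    group = spider_group if is_spider else "Guests"
    return PERMISSIONS[group]


# A crawler may read profiles but not visitor messages; a plain guest sees neither.
crawler_perms = effective_permissions(is_spider=True)
guest_perms = effective_permissions(is_spider=False)
```

The crawler is never moved into the group; it only uses that group's permissions for the current request.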

Also remember to follow the TOS of the search engines you are registered with. Until recently, Google was blocking sites that were ghosting their content.

Display Spiders in WOL: and on any page showing « Currently Active Users » (showthread, forumdisplay, etc.) … that way, you see where these beasts are visiting. 🙂

nex_crawlers_bundle_wol_ug_markup.jpg

As you can see in this listing, the usergroup markup applied to the crawlers gives them some style, which makes them easier to trace.

… some other tools may join the bundle, still to be decided; I’ll see later!

CRON JOB:
To make it easier on the server, a cron job stores the hourly stats about the crawlers. Once the cron job has run once (it’s the cron named Hourly #1), the stats appear in the right place:

nex_crawlers_bundle_info.jpg
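The reasoning behind the hourly cron can be sketched as follows, in Python for illustration only. The `hourly_rollup` function and the list of buffered hits are assumptions, not the bundle's real storage: hits are buffered cheaply, and the cron folds them into cached totals that the pages read without recounting.

```python
from collections import Counter


def hourly_rollup(pending_hits, cached_totals):
    """Fold the hits buffered since the last run into the cached totals."""
    cached_totals.update(pending_hits)  # Counter.update adds one per listed name
    pending_hits.clear()                # start a fresh buffer for the next hour
    return cached_totals


# Hypothetical data: three hits buffered during the last hour.
pending = ["Google", "Google", "Bing"]
totals = Counter({"Google": 10})
hourly_rollup(pending, totals)
```

Until the first run there are no cached totals to display, which matches the stats only appearing after the cron has run once.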

…update: May 1st, 10:50: a small change. The Crawlers listing will now refresh the cached spiders list if the file has changed, so you can update it when needed.
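A common way to do this kind of refresh is to compare the file's modification time with the one remembered at the last load. The sketch below shows that technique in Python; the `SpiderListCache` class and its loader are hypothetical and only illustrate the idea, not how the bundle actually does it.

```python
import os


class SpiderListCache:
    """Reload the spiders list only when the file on disk has changed."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader      # function that parses the file at `path`
        self._mtime = None        # modification time seen at the last load
        self._data = None

    def get(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:  # first call, or the file was replaced
            self._data = self.loader(self.path)
            self._mtime = mtime
        return self._data
```

Uploading a newer list file is then enough: the next call to `get()` notices the new timestamp and reloads.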

…update: May 26th: a change following a request by Calystos here. Since we can apply a usergroup to the crawlers, we can now add some markup to that usergroup, and it will show in the WOL and online.php …

and in the Who’s Online page (demo vbEnhancer.com):

nex_crawlers_bundle_online.jpg

I made it so the « Spider » label in front of each spider is removed on the online.php page, because it’s pointless if you ask me… but you can deactivate the plugin on the hook « online_bit_complete » if you prefer to keep it.

note: 17/06/09: update to 1.1.1, which now shows the proper count and names of web crawlers in the Active Users of Showthread and Forumdisplay pages… thanks to all who reported it, mainly xOBKx… 🙂

note: 19/06/09: no version change, but added the spiders count in the WOL page itself… from xOBKx’s suggestion.

note: 09/07/09: no version change yet. Dream updated his spiders_vbulletin.xml, so I provide it in this first post if you want to upload it to your /includes/xml/ directory… it will update the list instantly when needed.

note: 09/07/09 by night: bundle updated with the latest spiders list from Dream, and applied some bug fixes suggested by xOBKx, like the extra comma when there was nobody online, and the uncached template.
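About that extra comma: one simple way to avoid a leading comma is to merge both name lists first and join them once, instead of gluing a comma in front of the crawlers. This Python sketch only illustrates that approach; the `active_users_line` function is hypothetical and is not the fix shipped in the bundle.

```python
def active_users_line(members, spiders):
    """Join members and spiders so a comma never leads or trails the list."""
    names = list(members) + list(spiders)
    return ", ".join(names)


# With no member online, the line starts directly with the first crawler.
only_bots = active_users_line([], ["Googlebot", "Yahoo! Slurp"])
mixed = active_users_line(["Calystos"], ["Googlebot"])
```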

53 responses to “Web Crawlers Bundle”

  1. Here is some information for people who know nothing about Web Crawlers…

    Wikipedia wrote:
    A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, and worms[1] or Web spider, Web robot, or—especially in the FOAF community—Web scutter[2].
    This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
    A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

    Source: http://en.wikipedia.org/wiki/Web_crawler

  2. About compatibility and replacement for existing products, here is a list of known addons released on vb.org that can have an interaction with this product:

    1- Track Guest Visits – vBulletin.org Forum … basically, you can drop it, but as that product also tracks the activity of guests as well as web crawlers, it’s up to you. Keeping it will be more server-intensive, as it does most of the things this product does, doubles the queries, etc., and does not cache anything.

    2- Spiders on ForumHome [NO FILE EDIT] – vBulletin.org Forum … this one is so basic… drop it, we’re doing it 5 times better.

  3. @nexia 20955 wrote:

    sure, there will be in the next alteration… this version is the first because the goal is done… now, the fixes for prefs etc.

    superb 🙂 really nice hack Nexy … i dropped the others for this … 😉

  4. Hi! My first post here, hopefully not last, 🙂

    Just wanted to let you know this addon is really REALLY useful! I actually got it from the vBulletin.org site some time back but now I noticed it’s in the graveyard area.

    I’ve only got one problem with it. I’m running vBulletin 3.8.2 and for some reason the bots aren’t showing up in or being moved into the « Bots » usergroup I set up an made and told the web crawlers settings to point them to. Any ideas?

    Other than that I love this addon, very useful! Great work!

  5. Ah, you posted your « support » request in the wrong thread then… lol

    it’s not supposed to move the crawlers to the usergroup, but just use its privileges… because the crawlers are not registering an account on the site… they usually act like guests.

  6. Ahh, that’s what I thought but wasn’t sure, 🙂

    Is it possible to do something like marking them as that group for when people see them on the who’s online an stuff? I know forum software such as phpbb and others have this feature as such.

    EDIT: Oopsy, I should really see about making a new thread in support (just re-read what you said).

  7. Heh, not essential as such but would be a nice feature I guess. Specially for forums where they (like me) have the Bots group marked with a different colour so you can see that they are Bots an whatnot.

    I’ve not noticed any other things that could be tweaked or added atm, it all seems to be working really well an looking good. 🙂

  8. actually it is a good suggestion, i’ll see if something is missing to have that markup… i think it is related to the fact that the bots have no userID… i’ll see if i can fix that… it may not be essential but it’s a basic twist.

  9. I was just reviewing the product file, an I think there could be a basic quick (an dirty) hack way of doing it. I’ll give it a whirl tomorrow an if it works I’ll let ya know an post the code. Course theres probably gonna be a better way but if my idea works at least it’d be a good start, 🙂

  10. First post was edited, version 1.1.0 is now available, filling this request… 😉

    @Calystos 21299 wrote:

    I was just reviewing the product file, an I think there could be a basic quick (an dirty) hack way of doing it. I’ll give it a whirl tomorrow an if it works I’ll let ya know an post the code. Course theres probably gonna be a better way but if my idea works at least it’d be a good start, 🙂

  11. Another quick thought, there should also be a check to see if there’s any registered users present, otherwise there’s an unnecessary comma at the start of the active users list with the web crawlers..

    I’ve had a quick look at this myself, but couldn’t get it working. I’ll have another go when I have some time if you haven’t beaten me to it. 🙂

  12. it’s easy, i just never thought of it… 🙂 will fix this.

    it’s easy… the first number has a value; you just have to check whether that number is above zero, and whether the loop has content… 🙂 I’ll fix this in 10, once the laundry is started

  13. I managed to fix it, though possibly not with the most elegant of solutions.. Heh. Took me a while to get the logic right, but:
    If registered users are present, show the comma; else if total spiders AND spider count are greater than one, show the comma; else hide the comma.

    Just thought I’d report another bug, this time an uncached template in misc.php?do=crawlers.. GENERIC_SHELL

  14. bundle updated with the latest spiders list from Dream, and applied some bug fixes suggested by xOBKx, like the extra comma when there was nobody online, and the uncached template.

    .. also edited first post with new screenshots hosted separately to avoid thumbnails.

  15. @zero5854 22170 wrote:

    for some reason after I installed I am not showing the bots stats link under the stats on index? I only show the part after currently active users? Any help please?

    do you have a custom style? there may be a missing hook in your template that makes this error possible.

    btw, by default, the bots count is not shown when there is no bot on your site… (instead of the pathetic « and 0 web crawlers »)

  16. as indicated in this screeny:
    nex_crawlers_bundle_options.jpg

    the last block of settings is ONLY for the « What’s Going On » block, and when there is no bot, there is no count… are they set properly?… if so, maybe just the hook isn’t working…

    do you still have the default style on your site, to verify this?

  17. hum, the text is not based on a hook, only the stats are… the text is related to the phrase $vbphrase[x_members_and_y_guests]. If it is different, in vB 3.7 and lower, I can’t do a thing… you need an updated « FORUMHOME » template. You can PM it to me, I’ll see what I can do.

  18. When no one is online but a bot, this is what happens:

    active-users.png

    I believe Boofo has found a fix for this in his Spider Display for vB3.7 Version 1.0.3*, but it required the config.php to be edited.

    * Quote:
    Version 1.0.3 – Fixed the leading comma issue when there were no members online.
    Version 1.0.4

  19. The screenshot says there’s 2 online members..

    It could be that your update doesn’t check for invisible users?

    I believe Boofo has found a fix for this in his Spider Display for vB3.7 Version 1.0.3*, but it required the config.php to be edited.

    Definitely no need for a config.php edit – I fixed the issue on my forums before nexia released his update.

    See above.

  20. i’ll update this package when Dream is finished with updating the spiders list… in a day or two i suppose…

    i’ll make a new calculation instead of having simply the « xxx days… » because i have it for 160+ days, it’s pathetic to read…