Why are you having trouble crawling my site?
Written by Laurence O'Toole
Updated over a year ago

We use our own spidering software to automatically crawl each website on the platform. If you're not seeing any data in the Content 360 module, there are a few possible causes:

  1. IP blocking - your server may be blocking our crawler by blocking its IP address. If this is the case, the only way around it is to remove the block. Our bot shows up in your server logs as follows:

    185.184.157.70 Mozilla/5.0 (compatible; linkdexbot/2.1; +http://www.linkdex.com/bots/)
  2. Robots.txt blocking - as you can see from the log entry above, our crawler is called linkdexbot. If your robots.txt contains a rule blocking this user agent (for example, "Disallow: /" under "User-agent: linkdexbot"), we follow good practice, obey that instruction and do not crawl your site. You would need to remove this restriction for us to be able to crawl the site. Once you have fixed your robots.txt, please let us know by logging a ticket with our support team (support@authoritas.com) and we will restart both jobs.
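
To check whether our crawler is reaching your server at all, you could search your access logs for the IP address and user agent shown in point 1. The sketch below is a minimal Python example, not a tool we provide. It assumes your logs use the common Apache/nginx "combined" format, and the function name crawler_status_counts is hypothetical. A run of 403 responses suggests your server is refusing the crawler; no matching lines at all may mean the block sits further upstream (for example, in a firewall).

```python
import re
from collections import Counter
from typing import Iterable

# IP address and user-agent token taken from the log entry in point 1.
CRAWLER_IP = "185.184.157.70"
CRAWLER_TOKEN = "linkdexbot"

# Assumption: "combined" log format, where the status code follows the
# quoted request line, e.g. ... "GET / HTTP/1.1" 403 199 ...
STATUS_RE = re.compile(r'"[A-Z]+ [^"]*" (\d{3}) ')


def crawler_status_counts(lines: Iterable[str]) -> Counter:
    """Count the HTTP status codes returned to requests from the crawler.

    Lines that mention the crawler but don't match the assumed format
    are counted under "unknown".
    """
    counts: Counter = Counter()
    for line in lines:
        if CRAWLER_IP not in line and CRAWLER_TOKEN not in line:
            continue  # not a request from our crawler
        match = STATUS_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

For example, opening a (hypothetical) access.log with `with open("access.log", encoding="utf-8", errors="replace") as log:` and passing `log` to crawler_status_counts returns a Counter such as one mapping "403" to the number of refused requests.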
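
Before logging a ticket, you can test whether your robots.txt still blocks linkdexbot using Python's standard urllib.robotparser module. This is a minimal sketch; the sample robots.txt content, the example.com URLs and the function name is_crawl_allowed are hypothetical. Pass just the token "linkdexbot" rather than the full user-agent string, because Python's parser only compares the part of the string before the first "/".

```python
from urllib.robotparser import RobotFileParser

BOT_TOKEN = "linkdexbot"

# Hypothetical robots.txt that blocks linkdexbot everywhere
# and blocks all other bots from /private/ only.
SAMPLE_ROBOTS = """\
User-agent: linkdexbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = BOT_TOKEN) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)
```

To check a live site instead of a string, create a RobotFileParser, call `set_url("https://www.example.com/robots.txt")` and then `read()` before calling `can_fetch`.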

NB: if you're comparing our crawl count with the number of pages Google reports as indexed (found with a "site:" query, e.g. search Google for "site:authoritas.com"), there are several reasons why these numbers can vary considerably. Apart from the points already outlined above, you also need to consider the following:

  • 404s - we won't count 404s as pages crawled, but will report those in the Deadlinks module. However, Google may include these in its total number of indexed pages.

  • Orphaned pages - Google may index these if it has another way of finding them (for example, backlinks from other domains). We won't see these, because we spider internal links only.

  • JavaScript links - Google may follow these, but we won't.

  • A "site:" query may return pages on subdomains, which we ignore: our spidering software treats all subdomains as off-domain links and does not follow them, whereas Google may be reporting the number of pages indexed for the whole domain. You can easily verify this by running a "site:" query and looking through the sample results Google returns.
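
If you copy the URLs from a sample of "site:" results into a list, a short script can show how many of them sit on subdomains that our crawler won't follow. This Python sketch assumes the crawl scope is the exact hostname of your start URL, so anything else (including other subdomains) is treated as off-domain, as described above. The function name split_by_crawl_scope and the example.com URLs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def split_by_crawl_scope(urls: Iterable[str], start_url: str) -> Tuple[List[str], Counter]:
    """Split URLs into those on the start URL's host and a count per other host.

    Assumption: only the exact hostname of start_url is in scope;
    every other hostname (e.g. a subdomain) counts as off-domain.
    """
    start_host = urlsplit(start_url).hostname  # .hostname is lower-cased
    in_scope: List[str] = []
    off_domain: Counter = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host == start_host:
            in_scope.append(url)
        else:
            off_domain[host] += 1  # e.g. pages Google counts but we don't crawl
    return in_scope, off_domain
```

The per-host Counter makes it easy to see which subdomains account for most of the gap between the two numbers.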
