Latest discoveries about deindexations and Search Engine crawling

Every time a blog gets deindexed on EBN, we save all the information we have about it so we can analyze it later. We then do batch analysis to see if there are any patterns or footprints that we can report to the community.

In the last few weeks, we found a few interesting things. We’re still discussing how to build them into Blog Health to improve deindexation prevention, but they’re useful insights nonetheless.

Here’s what we found.

Search Engine Crawler is visiting old URLs all the time

If you buy an expired domain with history, its old URLs get crawled often and for a long time. This can go on for months, even if the URLs return a 404 error.
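If you want to see this on your own blog, the server access log is the place to look. The snippet below is a minimal sketch, not how EBN collects its data. It assumes the log uses the common "combined" format and that the crawler identifies itself with a token such as "Googlebot" in its user agent (which can be spoofed). The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_404_hits(lines, crawler_token="Googlebot"):
    """Count requests per URL that returned 404 and came from the given crawler."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        is_crawler = crawler_token.lower() in match["agent"].lower()
        if is_crawler and match["status"] == "404":
            hits[match["path"]] += 1
    return hits


# Hypothetical log lines, for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Oct/2015:13:55:36 +0000] "GET /old-post/ HTTP/1.1" 404 512 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2015:09:12:01 +0000] "GET /old-post/ HTTP/1.1" 404 512 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2015:09:12:05 +0000] "GET /old-category/ HTTP/1.1" 404 512 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2015:09:13:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [11/Oct/2015:10:00:00 +0000] "GET /old-post/ HTTP/1.1" 404 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = crawler_404_hits(sample_log)
for path, count in hits.most_common():
    print(count, path)
```

URLs that keep showing up in this list are the old URLs the crawler still remembers, which makes them natural candidates for rebuilding with relevant content.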

Unrelated content and/or language can cause deindexation

We found that rebuilding an old domain with unrelated content and/or in a different language can cause deindexation.

Comments increase Search Engine Crawler visits

The comment feed is checked daily. If there are no comments, the blog is crawled less often.

Blocking crawlers can cause deindexation

We did not find any issues with users of Spider Blocker. However, a lot of users install more than one plugin and block additional crawlers. Do NOT do this. Use one blocker and block as few crawlers as possible.
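Our data does not show exactly why stacking blockers hurts. One plausible mechanism, and it is only an assumption, is that a broad pattern from a second plugin ends up matching a search engine crawler as well. The sketch below checks a combined block list for that. It assumes the rules are plain case-insensitive substrings of the user agent, which is not necessarily how Spider Blocker or any other plugin stores them, and the function name and example rules are hypothetical.

```python
# Example user-agent strings of two search engine crawlers.
SEARCH_ENGINE_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}


def blocked_search_engines(block_patterns):
    """Return {crawler name: [patterns that would match its user agent]}.

    Assumption: each pattern is a case-insensitive substring match.
    """
    blocked = {}
    for name, agent in SEARCH_ENGINE_AGENTS.items():
        matches = [p for p in block_patterns if p and p.lower() in agent.lower()]
        if matches:
            blocked[name] = matches
    return blocked


# Hypothetical rule sets: one narrow list, plus a broad rule from a second plugin.
first_plugin = ["AhrefsBot", "MJ12bot"]
second_plugin = ["bot/"]

print(blocked_search_engines(first_plugin))
print(blocked_search_engines(first_plugin + second_plugin))
```

With only the narrow list, nothing matches a search engine crawler. Once the broad "bot/" rule is added, both example crawlers match, which is the kind of accident a single, short block list avoids.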

Some domains are permanently penalized

Some penalized domains never get any crawler traffic and will therefore never get indexed. Unfortunately, we don’t yet have data on how long this penalty can persist or what its root cause is (email spam, malware, phishing, etc.).

Search Engine Crawler still visits the blog after deindexation (!)

When a domain gets removed from the SERPs (deindexed), the old URLs still get crawled regularly; the crawling stops only after 5-7 days. This could mean there is still a chance to save a deindexed blog by rebuilding its URLs with relevant content.

This is also why our indexation status can lag by 7-14 days: we use a passive indexation check, whereas Blog Health is checked daily.

Summary

Here’s a quick recap:

Rebuild URLs with relevant content that would fit on the old domain. Use the same language.
Check domains in spam and malware databases before buying them.
Use only one spider blocker (we recommend our free Spider Blocker plugin) and block only the most important crawlers.

None of this is a complete surprise, but we can now confirm it with data rather than speculation.

In the future, we’re going to start collecting even more information about domains – from social metrics to backlinks and blacklist databases. Once we have that, our analysis and deindexation prevention will greatly improve.

Dejan Murko

Easy Blog Networks
