Recently, my blog got hit badly by Panda. I was really hurt, as I couldn’t understand the reason behind this sudden hit. I had never written a single low-quality article, nor had I accepted any thin article from my guest authors. So why the direct hit and such a huge loss in traffic?
Digging deeper into the problem, I found that my blog was suffering from duplicate content issues, as reported by Google Webmaster Central. Almost all of my posts had lots of duplicate pages within my own blog, as reported by GWT. Another quick check of my blog’s index on Google showed me that thousands of duplicate pages were getting indexed regularly.
I was completely unaware of this problem on my blog. It was like a cancer, slowly eating away at my blog and all my hard work. I had never intentionally copied or duplicated any post on my blog, so I wondered how all this had happened.
The Bitter Story Of replytocom Links
It was a very bitter experience for me, I must say. I searched for “site:dapazze.com” and found about 2K pages of my blog in Google’s index. I was really shocked. I only had about 150 posts and a few other pages, so how could Google index so many pages for my blog? Where did they come from? The majority of them were replytocom links.
I wondered what these replytocom links were and where they had all come from. After some research, I found that any WordPress blog that has “Reply to comment” enabled is going to see these replytocom links. Whenever a reader comments on your blog, WordPress silently generates a replytocom link for that particular comment (the post URL followed by “?replytocom=” and the comment ID). So if you have thousands of comments on your blog, you are going to have thousands of these killer links indexed.
These replytocom links are actually separate URLs that serve exactly the same content as the parent post. Thus, they are definitely duplicate content, and Panda is going to frown on you for having too many duplicate articles.
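To make the duplication concrete, here is a minimal Python sketch that strips the replytocom parameter from a list of URLs and counts how many indexed URLs collapse onto each parent post. The example.com URLs and the helper name `strip_replytocom` are made up for illustration; you would feed in your own list of indexed URLs.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_replytocom(url: str) -> str:
    """Return the URL with any replytocom query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except replytocom.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "replytocom"
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical URLs, for illustration only.
indexed_urls = [
    "http://example.com/my-post/",
    "http://example.com/my-post/?replytocom=101",
    "http://example.com/my-post/?replytocom=102",
    "http://example.com/other-post/?replytocom=7",
]

# Number of indexed URLs that all show the same parent post.
duplicates_per_post = Counter(strip_replytocom(u) for u in indexed_urls)

for post, count in duplicates_per_post.items():
    print(post, count)
```

Any post with a count above 1 has several indexed URLs that all show the same content.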
Why Did Panda Suddenly Start Hating You?
The Google Panda update is all about on-site well-being. It mainly focuses on low-quality content, duplicate content, poorly framed content, weak site structure, poor spelling and grammar, broken links, etc. Now, if Google starts indexing all these replytocom links for your blog, then only God can save you.
When I checked URL Parameters under Google Webmaster Central, I found that Google was set to “Let Googlebot decide” whether to index replytocom links or not. And I wonder why Google decided to index them all. Google should have decided not to index these pages, as they are needlessly generated by WordPress and don’t add any value for users.
But unfortunately, Google showed no mercy and indexed every replytocom page it could. I really don’t know what the canonical tag is for if, even after implementing it, I got into all this mess.
How To Find replytocom Links?
It’s really very simple. The first thing that is going to happen is that you will see your traffic stats fall suddenly. This is not the time to panic. You are not the reason for this problem. You are just a victim of a technical fault, and it can be recovered from very easily. Everything else that you are going to read after this can be implemented only if you stay calm and don’t panic.
Run a quick search on Google for “site:domain.com replytocom”. It will show you approximately how many replytocom links Google has indexed so far. Click on “Repeat the search with omitted results included” and it will show you the Tier 2 results.
You can see that all the comments have been indexed as replytocom links. The posts with more comments have more replytocom links indexed, and so on. Now it’s time to deindex them all and ensure that this never happens again. I got almost all my traffic back after these poisonous replytocom links went away.
How To Remove replytocom Links?
1) Using Google Webmaster Central
This is the first thing that I would recommend you do. The process is simple, but it’s worth it. Just go to Google Webmaster Central, go to URL Parameters, and click on edit for the “replytocom” parameter. From the dropdown menu, select “No URLs” so that Google ignores all replytocom links from now on.
This will also help in deindexing the replytocom links that have already been indexed. It will take some time, but it’s really worth it. Google will slowly re-crawl your site and gradually deindex these poisons.
For me, it took a few weeks to see some links disappear. It may take more time for you, or maybe less. It all depends on the total number of links you have and on how frequently Google crawls your site.
2) Installing SEO by Yoast Plugin
This is my favourite step, and it is a very effective tip that I have not seen anywhere until now. Just install the SEO by Yoast plugin. It is a very powerful SEO plugin that can take care of your entire site’s SEO. It also has a very efficient solution for this replytocom problem.
All you need to do is activate the plugin, go to the Permalinks section, and check the “Remove ?replytocom variables” option.
What this option does is remove the replytocom parameter completely from the reply links and use a “#comment” anchor instead. Now Google has no chance to index them as separate pages, because Google ignores everything after the “#” in a URL and treats such a link as the parent post itself. The previously indexed replytocom links will also get deindexed over time, as all of them are now redirected to the parent post.
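The rewrite described above can be pictured with a small Python sketch. This is not the plugin’s actual code, only an illustration of the idea; the “#comment-<ID>” anchor format and the function name `comment_anchor_url` are assumptions. The second step shows that once the fragment is dropped, only the parent post URL is left.

```python
from urllib.parse import parse_qsl, urldefrag, urlencode, urlsplit, urlunsplit


def comment_anchor_url(url: str) -> str:
    """Turn .../?replytocom=123 into .../#comment-123 (assumed anchor format)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    comment_ids = [value for key, value in params if key == "replytocom"]
    if not comment_ids or not comment_ids[0].isdigit():
        return url  # nothing to rewrite
    # Keep any other query parameters untouched.
    other = [(key, value) for key, value in params if key != "replytocom"]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(other),
         "comment-" + comment_ids[0])
    )


# Hypothetical URL, for illustration only.
rewritten = comment_anchor_url("http://example.com/my-post/?replytocom=101")
print(rewritten)

# Dropping the fragment leaves just the parent post URL.
page_without_fragment = urldefrag(rewritten).url
print(page_without_fragment)
```

Since every reply link for a post now differs only in its fragment, they all resolve to one page.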
I personally consider this to be the best and the most effective preventive as well as curative measure you can take to combat this problem without any hassle.
3) Tell the robots.txt to do the rest
You can use the robots.txt file to stop the crawlers from even crawling these links. You can put “Disallow: /*?replytocom” in the robots.txt file (the path has to start with “/”, and the “*” wildcard covers any post URL). This will block the crawlers from crawling these links any more and, hopefully, the problem will never occur again.
But a word of caution: never apply this tip if you still have replytocom links waiting to be deindexed by Google. It blocks the crawlers from visiting these links, so they will never see the redirect, and the links will have no chance of getting deindexed. That would be mere foolishness, like inviting a guest to your house but keeping the doors locked.
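Before relying on a robots.txt rule, it helps to check which URLs it would actually cover. Below is a minimal Python sketch of Google-style pattern matching, where “*” matches any run of characters and a trailing “$” anchors the end. The function `rule_matches`, the example URLs, and the pattern “/*?replytocom” are assumptions for illustration; this is not a full robots.txt parser.

```python
import re
from urllib.parse import urlsplit


def rule_matches(pattern: str, url: str) -> bool:
    """Check a Disallow pattern against the path and query of a URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # A trailing "$" means the pattern must match up to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any sequence of characters; everything else is literal.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # Patterns are matched from the start of the path.
    return re.match(regex, target) is not None


# Hypothetical URLs, for illustration only.
reply_url = "http://example.com/my-post/?replytocom=101"
post_url = "http://example.com/my-post/"

blocks_reply = rule_matches("/*?replytocom", reply_url)
blocks_post = rule_matches("/*?replytocom", post_url)
# Without the leading "/*" the pattern cannot match, since paths start with "/".
blocks_without_wildcard = rule_matches("?replytocom", reply_url)

print(blocks_reply, blocks_post, blocks_without_wildcard)
```

The last check shows why the leading “/*” matters: a pattern that begins with “?” never lines up with the start of a path.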
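I personally have decided not to implement this technique, as I found that Google tends not to always obey the rules in robots.txt. As you can see, I have put “Disallow: wp-content/plugins”, yet many plugin pages still got indexed. I really do not understand why Google sometimes behaves like this. (Two possible reasons: a Disallow path normally has to start with “/”, and robots.txt only blocks crawling, so a blocked URL can still end up in the index if other pages link to it.)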
My personal recommendation is always to follow step 2 as the perfect preventive measure.
4) If you are lucky enough, it’s time for manual removal
Yes, you heard it right. If you are lucky enough to have only a few hundred replytocom links, or even around 1-2K links, you can go for manual removal.
You can use the URL Removal Tool in GWT to remove the URLs manually. It is not a bad technique in any way; rather, it is the only way to see fast results. But remember not to overdo it. You can add about a hundred links per day without raising any alarm. The links will automatically get deindexed within 24 hours. This is the fastest way to get these poisons removed from your blog.
To make sure that they have all been removed, you can run the “site:domain.com replytocom” search again and check whether your links have been permanently removed or not.
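If you go the manual route, it helps to plan the removals around the limit of roughly a hundred links per day mentioned above. Here is a minimal Python sketch that splits a list of replytocom URLs into daily batches. The URLs are generated for illustration and the helper name `daily_batches` is made up; the batch size is simply the rough daily figure from this post, not an official limit.

```python
from typing import Iterator, List


def daily_batches(urls: List[str], per_day: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most per_day URLs."""
    for start in range(0, len(urls), per_day):
        yield urls[start:start + per_day]


# Hypothetical list of 250 replytocom URLs, for illustration only.
urls = [f"http://example.com/my-post/?replytocom={i}" for i in range(1, 251)]

schedule = list(daily_batches(urls))
for day, batch in enumerate(schedule, start=1):
    print(f"Day {day}: {len(batch)} URLs to submit")
```

With 250 links this gives three days of work: two full batches and one smaller batch at the end.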
Featured Image Source: blog.postling.com