If you are a blogger, chances are you have either dealt with spammers already or will be doing so in the future when your blog becomes more popular. These days, spammers are using any means necessary to get their links on your blog. These tactics include link-filled comments, bogus pingbacks and bogus trackbacks. What I’m going to focus on in this article is deciding whether a pingback or trackback is coming from a legitimate blog or not.
The example I use in this post comes from a random site behind a bogus trackback URL that was found on a Mashable.com post. I won’t be directly linking to the example site because that is what those spamming bastards want. Determining whether a blog is fake or real is easy once you figure out the patterns. Granted, these patterns change from time to time, but here is a collection of tactics I use to determine if a blog is fake or not.
What Is The Difference Between A Splog And Scraper?
Special thanks to Lorelle who stopped by and added her definition for these two terms in the comments section of this post.
A splog is a “spam blog”, a blog with little or no purpose other than to promote or sell something and make the blog owner money. The content is usually made up, or duplicated throughout the different posts, or a collection of post titles and excerpts from a variety of keyword matching posts in a link list.
A scraping blog is a blog that uses an automatic tool, often a WordPress Plugin, that snatches the content from legitimate blogs, called “scraping”, and uses it as its own with no original content. Some present the content in full posts, a big copyright no-no, or as an excerpt, often as you mentioned, with the “Charles wrote something interesting today” lead-in.
Also, according to Lorelle, “A scraping splog is the worst of both types.”
Precautions First:
When you discover that someone has linked to your post, the first thing you should do before visiting the site to check its authenticity is to make sure you have popup blocking software and anti-virus software turned on. I use something called Adblock Plus, which is an awesome Firefox extension. I highly recommend it. The reason for these precautions is that it doesn’t take much for you to be infected with something, especially if you run a Windows-based machine that doesn’t have the latest security updates.
Checking The Theme:
The first thing to check for when visiting the source of the trackback URL is the blog’s theme. A lot of spammers will generate a blog with the default theme, and in the case of WordPress, this theme is called Kubrick. Here is an example of what I’m talking about.
Kubrick is actually a fantastic default theme for WordPress. Quite a lot of people end up using this theme. I also wanted to mention that spammers do use themes other than Kubrick. In fact, I’ve noticed many of the sploggers are now doing exactly that. This is when it’s time to evaluate the content of that particular site. But before we move on, I want to show you something that appears on this blog that should never appear on ANYONE’S blog.
Don’t worry, this is only an image. This is what I found on this particular example of a splog. If you were to click on this banner, you would probably be infected with some sort of adware or trojan even if you were protected by software. No blog should ever have an advertisement like this displayed on it. This is a dead giveaway to get the hell out of there before it’s too late.
Checking Out The Content:
Let’s take a closer look at the content posted within the image up above. That post generated a trackback URL on Mashable.com, a very popular website covering social-networking and all that jazz. A good score for the spammer, as they are sure to receive some sort of traffic through that backlink. Within this image, the title of the post matches the title of the original post on Mashable. The next dead giveaway is the text “By Charles”. There is no one on that blog by the name of Charles. In my experience, the spammer’s software automatically places a random name into the Author Field of the post. This author name usually links to the original post, but in this case, the author name is not linked.
Another sign of a splog is unrelated content. In the screenshot, you can see the title of the blog is Social Sites News. And since they linked to Mashable, you would think this blog is about social-networking and web 2.0 stuff. So why, then, is there a link near the top of the page to an article titled “Great Barrier Reef holds drug key to diseases”? The reason is that these spammers use software that resembles search engine spiders. They crawl content across the internet looking for a predefined list of keywords. Once an article is discovered that contains a keyword, the software scrapes the content and then links to it, generating a trackback or pingback URL. Here is some evidence that further substantiates my claims.
Each keyword this splog is targeting is labeled as a category. This is just a sample of the categories listed on this splog. I recognize the fact that there are bloggers out there that blog about A LOT of different subjects and each one of those subjects can be a category. Thankfully, there are other attributes that play into the matter as to whether the site is legit or a splog.
Checking The URL:
I’ve actually taken some flak for this section of the post. I’ve had numerous people tell me that the question mark and the obscure link text are nothing more than proof that the blogger in question doesn’t know about SEO friendly URLs. The 99% claim was not a general statistic; it was a number based on my own experiences.
The question mark that sometimes appears in the URLs these sploggers generate means only that the blogger either doesn’t know about SEO friendly URLs or hasn’t bothered to change them. At most, it is a potential sign that the blog may be a splog.
I’ve also been told by Jonathan Bailey to look at the actual domain of the blog in question. According to Jonathan, many sploggers are using .info domains because of their cheap price. However, sploggers will use anything they can get their hands on in order to achieve their goal, which usually consists of making a profit.
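For anyone who wants to check these URL clues automatically, here is a rough sketch in Python. It looks for the clues mentioned in this post and in the comments below: the default ?p= permalink, a .info domain, a subdomain, and digits or dashes in the host name. The function name url_signals and the wording of the labels it returns are made up for illustration, and the subdomain test is a crude assumption that will misfire on suffixes like .co.uk. As discussed above, none of these clues proves anything on its own.

```python
from urllib.parse import urlparse, parse_qs


def url_signals(url):
    """Return a list of weak warning signs found in a trackback URL.

    Hypothetical helper: no single signal proves a site is a splog.
    """
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    signals = []

    # Default WordPress permalinks look like ?p=21.
    if "p" in parse_qs(parts.query):
        signals.append("default ?p= permalink")

    # .info domains are cheap, so many sploggers use them.
    if labels[-1] == "info":
        signals.append(".info domain")

    # Crude assumption: more than two labels (ignoring www) means a subdomain.
    if len([label for label in labels if label != "www"]) > 2:
        signals.append("subdomain")

    # Nonsense domains often contain digits or dashes.
    if any(ch.isdigit() for ch in host):
        signals.append("digits in domain")
    if "-" in host:
        signals.append("dash in domain")

    return signals


if __name__ == "__main__":
    # Made-up example URLs, not real sites.
    print(url_signals("http://cheap-pills4u.example.info/?p=21"))
    print(url_signals("http://example.com/2008/01/my-post/"))
```

A legitimate blog can easily trip one or two of these checks, so treat the result as a reason to look closer at the content, not as a verdict.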
The Default Meta:
I’ve been informed that the Meta block that is displayed by default on every fresh install of WordPress is not an indication of anything. At first, I thought the login link was a security issue, but Lorelle has reminded me that if someone wants to try to log in to gain access to your administration panel, they probably already know the login link, which makes my login link security issue a moot point.
Blog Postings With Many Misspelled Or Rearranged Words:
Posts full of words that don’t make sense usually come from scraping splogs that run the stolen content through a spinning process. Spinning “translates” the content to make it “different” from the original while keeping it essentially the same, and it often injects ad links or keywords that match whatever it is they are selling.
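Because spun content stays mostly the same as the original, one way to check a suspicious post is to measure how much of your own wording survives in it. The sketch below uses Python’s standard difflib module to compare two texts word by word. The helper names words and overlap are made up for illustration, and this is only an assumption about how such a check could be done, not a description of any particular anti-splog tool. What score counts as “too similar” is something you would have to judge for yourself.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap(original, suspect):
    """Return a 0.0 to 1.0 score for how much of the word sequence is shared."""
    return SequenceMatcher(None, words(original), words(suspect)).ratio()


if __name__ == "__main__":
    original = "Spammers are using any means necessary to get their links on your blog"
    # A made-up "spun" version with a few words swapped for synonyms.
    spun = "Spammers are utilizing any means required to get their hyperlinks on your weblog"
    unrelated = "The weather in Ohio has been cold and snowy this week"

    print(overlap(original, original))   # identical text
    print(overlap(original, spun))       # spun copy
    print(overlap(original, unrelated))  # unrelated text
```

A word-for-word copy scores at the top of the scale, a spun copy lands somewhere in the middle, and unrelated text scores near the bottom.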
Conclusion:
This is by no means the end-all, be-all of ways to tell a legitimate blog from a splog. These are all tactics that I use for this blog in determining whether a trackback or a pingback is actually legitimate. I will admit, I did comment on a blog one time, thanking them for linking to me. At first glance, they looked pretty legitimate, but I later found out they had scraped the content of a Mashable post and published the entire article word for word. Since the Mashable article linked to me, this splogger also linked to me. After that experience, I told myself that I would closely examine any other site that linked to me to determine its legitimacy.
If you feel up to taking on these bastards head on, you can check out a post that Lorelle ( How to Stop Content Theft: The Best Tips ) published on her blog which has tips and suggestions on how to report these time wasters.
I wanted to take this time to remind you that as a blogger, it is your responsibility to ensure that these crappy spammers don’t fill your blog with porn links, or links that would otherwise put your readers in danger. I’m sure Mashable tries to do a good job at combating spam and deleting bogus trackback URLs, but as my example up above shows, they can’t get every one of them. As a reader, if I were to click a URL on Mashable.com which clearly looked related to the article in question, and that site ended up infecting me, I sure as hell would hold Mashable.com responsible for the infection. Wouldn’t you? If every blogger did their part with their own blogs to combat this problem, I’m pretty sure that spamming blogs would become a business model not worth pursuing.
If you disagree with anything you read in this post, or if you have some additional tips, feel free to post them below.
It can be difficult to tell them apart but they are getting easier to spot.
The problem I notice at the moment though is that many are taking an excerpt which, in the US at least, could fall under fair use. So as distasteful as it all is, there isn’t technically anything wrong with it.
I wonder whether the way to prevent these is to set up splogs of our own, i.e. create fake blogs that publish random nonsense on a daily basis under various categories to see if they get scraped. Enough of these and the scraping will become pointless.
You’ve made some good points, though a little excessive on the bold emphasis :-) and I’d like to add some points to help tell the difference between a legitimate blog and a scraping splog and content theft.
A splog is a “spam blog”, a blog with little or no purpose other than to promote or sell something and make the blog owner money. The content is usually made up, or duplicated throughout the different posts, or a collection of post titles and excerpts from a variety of keyword matching posts in a link list.
A scraping blog is a blog that uses an automatic tool, often a WordPress Plugin, that snatches the content from legitimate blogs, called “scraping”, and uses it as its own with no original content. Some present the content in full posts, a big copyright no-no, or as an excerpt, often as you mentioned, with the “Charles wrote something interesting today” lead-in.
A scraping splog is the worst of both types.
The permalink URL that starts with a question mark is not indicative of anything other than that the blogger doesn’t know about pretty permalinks. While some sploggers use it because they don’t pay attention to the details, many do pay attention. Rarely do I find sploggers and scrapers now-a-days not using textual links. So the 99% claim has no evidence to support it. There are still some web hosts running old versions of Apache which will not allow such text URLs. I know because I just left one.
The odd and cluttered category list can be a clue, but more likely is an archive list that claims 6498 posts in “uncategorized” in November and December, the only months listed. Big clue something’s wrong.
Words that don’t make sense, are scraping splogs which run the stolen content through a spinning process, which “translates” the content to make it “different” from the original while staying the same and often injects ad links into the content or keywords that match whatever it is they are selling.
There are other blatant clues, but these are the biggies. I hope they help.
@Andrew I’m not sure if that would work or not. Seems to me like that would aggravate the problem even further rather than helping.
@Lorelle Sorry about the bold emphasis. I try to make all of the points in bold so the article is easier to read but I guess I went a little over the top in this article. I visited your site and it looks like you have covered this subject pretty well on your own blog. Thanks for stopping by and sharing those tips.
First off, great article with some good ideas. It is important to note that none of these tips are the end all or be all of detecting spam blogs. As you’ve pointed out, many sploggers change themes, URLs, etc.
An additional tip is to look at the posting frequency. If the blog sees dozens of posts a day, it is probably at best an aggregator, at worst outright spam. Either way, no point visiting.
The second tip is to look at the domain. If the splog has its own domain, it will probably either be a nonsense domain with numbers in it, or a .info domain.
A third tip, and final one for now, is to look at the service. Blogspot is used by many legit bloggers, but it is also very popular among spam bloggers. Be more wary of services that attract more spammers.
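The posting-frequency tip above can be roughed out in Python if you have the publish dates of a blog’s recent posts, for example from its feed. This is only a sketch under stated assumptions: the names posts_per_day and looks_automated are made up for illustration, and the threshold of 24 posts a day is an arbitrary stand-in for “dozens”, not a number anyone here has tested.

```python
from datetime import datetime, timedelta


def posts_per_day(timestamps):
    """Average posts per day across the span covered by the timestamps."""
    if len(timestamps) < 2:
        return float(len(timestamps))
    span_days = (max(timestamps) - min(timestamps)).total_seconds() / 86400
    # Count any span shorter than one day as a full day so the rate is not inflated.
    return len(timestamps) / max(span_days, 1.0)


def looks_automated(timestamps, threshold=24):
    """Hypothetical cutoff: flag blogs averaging more than `threshold` posts a day."""
    return posts_per_day(timestamps) > threshold


if __name__ == "__main__":
    start = datetime(2008, 1, 1)
    # Made-up data: one post every 30 minutes versus one post every 15 days.
    busy = [start + timedelta(minutes=30 * i) for i in range(48)]
    quiet = [start + timedelta(days=15 * i) for i in range(3)]
    print(posts_per_day(busy), looks_automated(busy))
    print(posts_per_day(quiet), looks_automated(quiet))
```

A high rate only says the blog is probably an aggregator or worse, so it works best alongside the other clues rather than by itself.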
In a real quick comment to Andrew, the idea sounds nice but won’t work. First, it will increase the amount of junk out there and, second, it will not trick scrapers. Spam bloggers can easily obtain lists of legit RSS feeds, the same as email spammers can obtain lists of email addresses. It might slow them down, but it won’t stop them by any stretch.
All in all, a great article with some good discussion. If you have any cases you need help with, feel free to drop me a line!
@Jonathan Bailey Heh, I feel honored that some big names are stopping by and leaving some input.
I did note that this article was not the end all be all, it was merely a list of tactics that I use for this particular blog.
I also wanted to add that you are right about the domains. Someone on a forum I posted to responded that a lot of spammers seem to enjoy using .info domains because they are cheap. I’ve also noticed that spammers use subdomains, and I think the reason is that most webhosting companies offer customers the ability to create unlimited subdomains, which could be one of the bigger aspects of this entire problem.
Thanks for stopping by Jonathan, I really do appreciate it.
Sorry, I wasn’t clear, wasn’t trying to say that you didn’t say it, just driving home the point. If anyone can find a way to tell splogs from blogs definitely, email me immediately as I will be very eager to hear!
You’re very welcome for the comment though, I only have one question. When did I become a big name? I don’t think anyone has used that to describe me before…
@Jonathan Bailey Venturing around the blogosphere means coming across posts that discuss plagiarism and copyrighting and all that jazz. I see a lot of mentions of you and your blog all the time. If Lorelle mentions you, you’re a big name :P
I’ve been having quite a few problems with “Splogs” lately, and I find most of them are on a subdomain of a domain with a dash in it, but that’s just me..
Recently, one of my articles was featured on 9to5Mac.com, and AppleTell.com, which of course got “sploged”, and since my name was linked in their articles, I got pingbacks for a LOT of splogs.
Oh, and this is my favorite pingback ever… on a MacApper post. :P
Nice Article :)
@ Jeffro
I suppose by those standards I am a big name. I have indeed been graced by the hand of Lorelle. Still, I refuse to call myself as such as I don’t want to wear a cape or have to get a huge ego full of self-importance. Not my style.
I like being little.
@ Chris
That is a classic pingback! Great stuff.
LOL! Hey, I was going to say that!!!
And Jeffro, I’m glad you consider yourself lucky. You’ve made some very good points, and gotten some great help. Flesh out the article a little more and you have a winner I’d be glad to promote.
@Chris Thomson Thanks for submitting the story to Digg and glad you enjoyed the article.
@Jonathan Bailey It’s ok. I see where you’re coming from.
@Lorelle Lorelle, I’m not sure what I should be changing in the article. Whatever wasn’t covered was taken care of within the comments. I’m a bit confused as to what I’m supposed to flesh out.
Ok, Lorelle and to others, I’ve readjusted the post, included things you guys have told me (Lorelle and Jonathan) so I hope the post is A ok now.
Also, delete my comments on how to “fix” the article, the ones here at the end. Our advice in the beginning is fine, but when people come visiting, you want to show off your very best.
Which could also mean removing the strike out and turning that into more of an explanation of the “myth” that many believe the ?p=21 URL is a sign of a splog, but not necessarily. The same holds true for .info and foreign country extensions. While .info domains are cheap, and many sploggers use them, sploggers and scrapers will use anything they can, and many are actually experts in using WordPress. It’s a powerful point that deals with the facts not the assumptions, which is the point of your article.
@Lorelle Ok dokey Lorelle, I’ve deleted the comments and I’ve changed the article around a bit. I’ve changed the sections that deal with the URL and the .info domains, and I’ve removed the struck-out portion of the text. Thanks for your time and I appreciate the tips.
@Jonathan On an unrelated note, congrats Jonathan on your co-hosting gig on the WordPress podcast :) I think you’ll make a fine addition to the show.
Give me time. I’m not a great addition yet but I just got drafted. Still going through basic training :)