Copyright Your RSS Feed With FeedEntryHeader

FeedEntryHeader is a WordPress plugin written by Stephen Cronin. Like so many other bloggers, Stephen has been battling the scraping issue. FeedEntryHeader provides WordPress bloggers with a way of adding a copyright statement and a link to the original article which will show up at the top of your feed entries.

Why the top and not the bottom? According to Stephen, a copyright notice that appears at the bottom of your feed entries has little impact because of its location. Also, if a splogger is scraping only an excerpt of your content, chances are a statement at the bottom won’t make it into the scraped entry.

You can customize the text that is shown in the copyright area by changing the HTML and associated tags. However, the default message should be fine for most people.
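To picture what "a copyright statement and a link at the top of each feed entry" amounts to, here is a minimal Python sketch. It is not the plugin's code (FeedEntryHeader is a WordPress plugin, so the real thing is PHP), and the function name, parameters, and default wording below are all hypothetical.

```python
from html import escape

# Hypothetical default message; the plugin's real default wording may differ.
DEFAULT_TEMPLATE = (
    '<p><small>Copyright &copy; {year} {blog}. '
    'Originally published at <a href="{url}">{title}</a>.</small></p>'
)


def add_entry_header(content_html, title, url, blog, year,
                     template=DEFAULT_TEMPLATE):
    """Return the feed entry with a copyright header placed before the content."""
    header = template.format(
        year=year,
        blog=escape(blog),
        url=escape(url, quote=True),
        title=escape(title),
    )
    # The header goes first so that even a scraped excerpt is likely to carry it.
    return header + "\n" + content_html


entry = add_entry_header(
    "<p>Hello</p>", "My Post", "http://example.com/my-post/",
    "Example Blog", 2007,
)
print(entry)
```

Changing the `template` argument is the sketch's stand-in for customizing the message and its HTML.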

As Stephen and the commenters on his blog note, this will not stop scrapers, but it is a step in the right direction. All WordPress bloggers should install this plugin, as it provides another opportunity for us to fight back against these jerks.

WordPress Plugin Updates

Digging into my Plugin area today, I noticed a few plugins that required updating and figured I’d let you know about them in case you have update notification turned off.

First up is Google XML Sitemaps, which is now up to version 3.0.2.1. The changes for this release are:

Fixed wrong XML Schema Location (Thanks to Emanuele Tessore)
Added Russian Language files by Sergey http://ryvkin.ru

Next up is WP Ajax Edit Comments, which is now up to version 1.1.2.1:

Added Italian Language file. Thanks Piplos
Added Russian Language file. Thanks Sergey.

SimplePie has also released an update: the SimplePie Plugin for WordPress is now up to version 2.1.2.

2.1: Added support for feed post-processing, better error handling, and fixed issues with installing in the wrong location.

Live Comment Preview is now at version 1.8.2

The 1.8.1 release fixes a bug in 1.8 that affects those who have WordPress files setup in a different directory than their site url. If you have any problems with this release, please post a reply with a description of the problem and any error messages you are receiving.

Last but not least, QuickPost, the Tumblr-like plugin for WordPress, has been updated to version 0.6.

0.6 – Finished Safari Support; Added stripslashes for titles that have apostrophes; Minor Change to Blockquote formatting

Splog And Blog – Tell The Difference

If you are a blogger, chances are you have either dealt with spammers already or will be doing so once your blog becomes more popular. These days, spammers use any means necessary to get their links on your blog. Their tactics include link-filled comments, bogus pingbacks and bogus trackbacks. What I’m going to focus on in this article is deciding whether a pingback or trackback is coming from a legitimate blog or not.

The example I use in this post comes from a random site behind a bogus trackback URL that I found on a Mashable.com post. I won’t be linking directly to the example site because that is what those spamming bastards want. Determining whether a blog is fake or real is easy once you figure out the patterns. Granted, these patterns change from time to time, but here is a collection of tactics I use to determine whether a blog is fake or not.

What Is The Difference Between A Splog And Scraper?

Special thanks to Lorelle who stopped by and added her definition for these two terms in the comments section of this post.

A splog is a “spam blog”, a blog with little or no purpose other than to promote or sell something and make the blog owner money. The content is usually made up, or duplicated throughout the different posts, or a collection of post titles and excerpts from a variety of keyword matching posts in a link list.

A scraping blog is a blog that uses an automatic tool, often a WordPress Plugin, that snatches the content from legitimate blogs, called “scraping”, and uses it as its own with no original content. Some present the content in full posts, a big copyright no-no, or as an excerpt, often as you mentioned, with the “Charles wrote something interesting today” lead-in.

Also, according to Lorelle, “A scraping splog is the worst of both types.”

Precautions First:

When you discover that someone has linked to your post, the first thing you should do before visiting the site to check its authenticity is make sure you have popup blocking software turned on, as well as anti-virus software. I use Adblock Plus, which is an awesome Firefox extension. I highly recommend it. The reason for these precautions is that it doesn’t take much for you to be infected with something, especially if you run a Windows-based machine that doesn’t have the latest security updates.

Checking The Theme:

The first thing to check when visiting the source of the trackback URL is the blog’s theme. A lot of spammers will generate a blog with the default theme, which in the case of WordPress is called Kubrick. Here is an example of what I’m talking about.

Default WordPress Theme

Kubrick is actually a fantastic default theme for WordPress, and quite a lot of legitimate people end up using it. Spammers also use other themes; in fact, I’ve noticed that many sploggers have now moved away from Kubrick. That is when it’s time to evaluate the content of the site. But before we move on, I want to show you something that appears on this blog that should never appear on ANYONE’S blog.

Adware On A Blog

Don’t worry, this is only an image. This is what I found on this particular example of a splog. If you were to click on this banner, you would probably be infected with some sort of adware or trojan, even if you were protected by software. No blog should ever display an advertisement like this. It is a dead giveaway to get the hell out of there before it’s too late.

Checking Out The Content:

Let’s take a closer look at the content of the post in the image above. That post generated a trackback URL on Mashable.com, a very popular website covering social networking and all that jazz. That is a good score for the spammer, as they are sure to receive some traffic through that backlink. In the image, the title of the post matches the title of the original post on Mashable. The next dead giveaway is the text “By Charles“. There is no one on that blog by the name of Charles. In my experience, the spammer’s software automatically places a random name into the author field of the post. This author name usually links to the original post, but in this case the author name is not linked.

Another sign of a splog is unrelated content. In the screenshot, you can see the title of the blog is Social Sites News. Since they linked to Mashable, you would think this blog is about social networking and web 2.0 stuff. So why, then, is there a link near the top of the page to an article titled “Great Barrier Reef holds drug key to diseases”? The reason is that these spammers use software that resembles search engine spiders. It crawls the internet for content that contains any of a predefined list of keywords. Once an article containing a keyword is discovered, the software scrapes the content and then links to it, generating a trackback or pingback URL. Here is some evidence that further substantiates my claims.

Categories Of Keywords

Each keyword this splog is targeting is labeled as a category. This is just a sample of the categories listed on this splog. I recognize the fact that there are bloggers out there that blog about A LOT of different subjects and each one of those subjects can be a category. Thankfully, there are other attributes that play into the matter as to whether the site is legit or a splog.

Checking The URL:

I’ve actually taken some flak for this section of the post. Numerous people have told me that the question mark and the obscure link text are nothing more than proof that the blogger in question doesn’t know about SEO-friendly URLs. The 99% claim was not a general figure; it was a number based on my own experiences.

The question mark that sometimes appears in the URLs these sploggers generate may simply mean that the blogger doesn’t know about SEO-friendly URLs or hasn’t bothered to change them. At the very least, though, it is a potential sign that the blog may be a splog.

I’ve also been told by Jonathon Bailey to look at the actual domain of the said blog. According to Jonathon, many sploggers are using .info domains because of their cheap price. However, sploggers will use anything they can get their hands on in order to achieve their goal which usually consists of making a profit.
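The two URL checks above can be sketched in a few lines of Python. This is only an illustration: the function name `url_warning_signs` and the `cheap_tlds` list are hypothetical, and a hit is a hint that a blog might be a splog, never proof.

```python
from urllib.parse import urlparse


def url_warning_signs(url, cheap_tlds=(".info",)):
    """Return a list of weak warning signs found in a trackback URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    signs = []
    # A query string such as ?p=123 means the blog still uses default permalinks.
    if parsed.query:
        signs.append("query-string permalink")
    # Cheap domains are popular with sploggers, but plenty of honest blogs use them too.
    if host.endswith(tuple(cheap_tlds)):
        signs.append("cheap TLD")
    return signs


print(url_warning_signs("http://example.info/?p=123"))
print(url_warning_signs("http://example.com/2007/12/some-post/"))
```

Treat the result as one input among several; the theme, the content and the categories still deserve a look.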

The Default Meta:

I’ve been informed that the Meta block displayed by default on every fresh install of WordPress is not an indication of anything. At first, I thought the login link was a security issue, but Lorelle reminded me that anyone who wants to log in to your administration panel probably already knows the login link, which makes my LOGIN link security issue a moot point.

Blog Postings With Many Misspelled Or Rearranged Words:

Posts full of words that don’t make sense usually come from scraping splogs, which run the stolen content through a spinning process. Spinning “translates” the content to make it look “different” from the original while keeping the meaning the same, and it often injects ad links or keywords that match whatever the splogger is selling.

Conclusion:

This is by no means the be-all and end-all of ways to tell a legitimate blog from a splog. These are simply the tactics I use on this blog to determine whether a trackback or a pingback is legitimate. I will admit, I did comment on a blog one time, thanking them for linking to me. At first glance they looked pretty legitimate, but I found out they had scraped the content of a Mashable post and published the entire article word for word. Since the Mashable article linked to me, this splogger also linked to me. After that experience, I told myself that I would closely examine any other site that linked to me to determine its legitimacy.

If you feel up to taking on these bastards head on, you can check out a post that Lorelle ( How to Stop Content Theft: The Best Tips ) published on her blog which has tips and suggestions on how to report these time wasters.

I want to take this time to remind you that as a blogger, it is your responsibility to ensure that these crappy spammers don’t fill your blog with porn links, or links that would otherwise put your readers in danger. I’m sure Mashable tries to do a good job of combating spam and deleting bogus trackback URLs, but as my example above shows, they can’t get every one of them. As a reader, if I were to click a URL on Mashable.com which clearly looked related to the article in question, and that site ended up infecting me, I sure as hell would hold Mashable.com responsible for the infection. Wouldn’t you? If every blogger did their part on their own blogs to combat this problem, I’m pretty sure that spam blogging would become a business model not worth pursuing.

If you disagree with anything you read in this post, or if you have some additional tips, feel free to post them below.

Letting Spam Loose For A Day

Akismet Logo

Mark your calendars, because on December 15, 2007, WordPress users across the blogosphere will be turning off Akismet. Ok, not really. But Jesper Rønn-Jensen has decided to do it. He calls it Spam Filter Free Day: he will disable the Akismet anti-spam tool on his blog for 24 hours to figure out just how much work Akismet does for him. It’s an ambitious project, and I can only imagine how much time it will take to clean up the mess after the event is over.

I’ve seen numerous bloggers writing posts which state that Akismet is asking for us to disable our spam filter on this day and then report back to them with the results. This is not the case. Akismet merely brought Jesper’s post to the forefront and asked if anyone else would be willing to go through with it. If so, Akismet would love to hear back from you.

I’ve decided not to go through with the project. Like so many others who commented on Lorelle’s article (Are You Willing To Go Naked For One Day For Akismet), I can see just how much work Akismet has saved me by looking at the spam filter statistics. So far, Akismet has protected this site from 4,528 spam comments. When I’ve left my blog alone for more than 24 hours, I come back to 100 or more spam comments that I have to sift through to see if Mike was flagged as a spammer. Akismet is not perfect, but it does a damn fine job of blocking a lot of spam.

So will you be going naked on December 15?

A New Spin On Blog Spam

According to Lorelle, blog spammers have developed a new technique for scraping a blog’s content and then publishing it on their own blog. The technique centers on WordPress plugins that excel at scraping content, combined with software or other plugins that replace certain words with synonyms. The result? The same old same old.

Here is an example of some text from an article that Lorelle wrote.

Yesterday, I wrote an analogy of comparing blogging to dancing, and how it helps to know the steps, but I also addressed the issue of blogging in your native language compared to blogging in English.

Words carry a responsibility. They convey meaning. They reek with intent. Change a word and you change the meaning.

And here is the text scraped from the article, with certain words replaced with synonyms.

Yesterday, I wrote an faith of scrutiny blogging to dancing, and how it helps to undergo the steps, but I also addressed the supply of blogging in your autochthonous module compared to blogging in English.

Words circularize a responsibility. They intercommunicate meaning. They exudate with intent. Change a word and you modify the meaning.

I don’t know about you, but I have never, ever, heard of the word autochthonous before. Does it even exist? At any rate, if you compare the two excerpts, it’s clear that the second one is some sort of spam. I realize there are people out there who write in this fashion because English is not their native language. However, since the text IS in English, it has to be noted that there is no way a human being would write something like that. It comes down to common sense.
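If you want to put a rough number on that comparison, here is a minimal Python sketch that measures how much of the word sequence survived the synonym swap, using the second paragraph of each excerpt. It relies on the standard library's `difflib.SequenceMatcher`; the function name `spin_similarity` is hypothetical, and any cut-off you choose for "suspiciously similar" would be your own assumption.

```python
import re
from difflib import SequenceMatcher


def spin_similarity(original, suspect):
    """Return a 0..1 score for how closely two texts share the same word order."""
    tokenize = lambda text: re.findall(r"[a-z']+", text.lower())
    return SequenceMatcher(None, tokenize(original), tokenize(suspect)).ratio()


original = ("Words carry a responsibility. They convey meaning. "
            "They reek with intent. Change a word and you change the meaning.")
spun = ("Words circularize a responsibility. They intercommunicate meaning. "
        "They exudate with intent. Change a word and you modify the meaning.")

print(round(spin_similarity(original, spun), 2))
```

A spun post keeps most of the original's words in the original order, so it scores far higher than an unrelated post would, even though a plain text search for the original sentences would miss it.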

In the end, this is a new technique that nets the same results: crappy look-alike posts that don’t gain any value for the spammer unless the trackback link makes it through the spam filter.

Near the end of the article, Lorelle goes on to discuss various aspects of copyright law and whether this new spamming technique violates a blogger’s copyright. Here is a quote from Jonathon published on her blog:

Fortunately, the law is very clear on this subject. Copyright is not merely the right to copy one’s own work, but a set of rights that includes the right to create derivative works…This right to create derivative works covers the right to create translations and any other work based on copyrightable portions of the original. Spinning, since it starts with a copyright-protected work and creates a new work based upon it, violates that right.

Fair use arguments fall equally flat in the eyes of the law. Spinning is not transformative as it is designed to replace the original, it offers no commentary or criticism, it is for commercial use, it can greatly harm the market for the original work and usually is unattributed. There is almost no fair use argument left for the spammers who modify the posts they scrape, leaving the door wide open for rightsholders to take action.

Interesting, but here is my point regarding this mess. You’re more likely to waste time and energy going after these sploggers than to accomplish anything worthwhile. Most of these splogs are automated, meaning they can be tracked to a particular location, but the only thing you’ll find there is a machine with a programmed set of instructions. The reality of the situation is that spam, splogging, and feed and content scraping are all part of the game known as blogging. It happens, and there is no PRACTICAL solution to combat the problem.

Here are some tips to help you go up against a content scraper:

Do as my friend Brad of Strangework.com has done and add a text link that says something like “By NAMEOFBLOG“. Because sploggers scrape the entire content of the post, this link will always appear in the spammer’s post, which will not only raise a red flag that the post was stolen but will also allow people to follow the link back to the source.
Instead of publishing the FULL RSS FEED, switch to publishing only a PARTIAL FEED. I don’t like partial feeds and neither do a lot of other people, but it helps in dealing with the spam issue.
If you notice a trackback URL on one of your posts, be sure to visit the blog the link points to. If the offending site has posts covering all sorts of topics with no rhyme or reason, chances are it’s a spam blog. Instead of deleting the trackback URL, submit it to Akismet by selecting the SPAM it option within your commenting admin panel.
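The partial feed tip boils down to cutting each feed entry off after a fixed number of words. Here is a minimal Python sketch of that idea; the function name, the 55-word default and the trailing marker are assumptions for illustration, and WordPress handles this for you through its own feed settings.

```python
def partial_feed_excerpt(text, max_words=55, more=" [...]"):
    """Return the first max_words words of a post for use in a partial feed."""
    words = text.split()
    if len(words) <= max_words:
        # Short posts go out whole; there is nothing to trim.
        return text
    return " ".join(words[:max_words]) + more


print(partial_feed_excerpt("one two three four five", max_words=3))
```

A scraper that pulls from a partial feed only ever gets the excerpt, so readers still have to visit your blog for the full post.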

This post has inspired me to write up another article called, ‘What To Look For On A Blog You Suspect To Be A Splog‘. Look for that in the coming days.

What do you think of the issue of content scraping and splogging in general? If you’re a blogger, let me know how you deal with these issues and what you look for when deciding whether a comment or trackback URL is spammy.

Chat With Me Tonight

I wanted to extend an invitation to join me later tonight as I sit on my blog and do nothing. Ok, not exactly. I’ll be hanging around the blog from 8PM EST until 9 or 10PM. Perhaps you’ve always wanted to ask me a question or get to know me better. You’ll be able to ask questions and talk to whoever else decides to show up by using the ShoutBox located in the rightmost sidebar of the blog. The ShoutBox refreshes in real time, so it’s like a blog-wide IM.

Hope to catch a few of you lurkers in there as well as the regulars.

WordPress Makes Up 0.8 Percent Of The Net

In the grand scheme of things, Mullenweg said he wants the future of the Web to be open source; and he hopes to get more people using open-source platforms to write their blogs, even if it’s not WordPress. But he’s obviously driven competitively, too. (His blog ranks No. 1 on Google because of all the links back to his site from WordPress.) He recently saw a survey from Google, in which the search giant examined all of the http headers of Web. He found that .8 percent of those pages were powered by WordPress. “That’s how far we’ve come, but we have a lot of work to do,” he said.

WordPress founder looks into blogging’s future | Tech news blog – CNET News.com

Isn’t that amazing? If you compute the numbers, this means that one out of every 125 pages on the web is powered by WordPress. That is a VERY general observation, and there is no breakdown of metrics stating which sites use WP only as a front end versus which use WordPress as a full-fledged blogging solution. Still, this particular stat is amazing and gives credence to the idea that WordPress is on top of its game right now!
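For anyone who wants to check the "one out of every 125" figure, the arithmetic fits in a couple of lines of Python. The variable names are just for illustration, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# 0.8 percent, written as an exact fraction of all pages surveyed.
wordpress_share = Fraction("0.8") / 100

# Invert the share to get "one page out of every N".
pages_per_wordpress_page = 1 / wordpress_share

print(wordpress_share)            # 1/125
print(pages_per_wordpress_page)   # 125
```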

Use WordPress Like Tumblr

QuickPost Plugin Logo

While browsing around the WordPress.org plugin database for something completely different, I happened to stumble upon a plugin that was inspired by the Tumblr bookmarklet. It doesn’t have every feature of the Tumblr bookmarklet, but it doesn’t have to.

The plugin is called QuickPost and was developed by a company called Twelve Horses. After downloading, installing, then activating the plugin, you’ll have to go into the QuickPost options area and select a default category for each bookmarklet tab.

Setting Default Categories

The bookmarklet supports the following types of posts: Quotes, Text/Links, Photos, and Videos.

QuickPost Tabs

For those who are using Firefox, installing the bookmarklet is as simple as dragging a button to your bookmark toolbar. The process is a little more complicated if you’re using Internet Explorer.

Dear Internet Explorer Users: Yours is a harder path to walk.. Right click the bookmarklet below and select “Add to favourites”. Your IE will probably tell you that this is an “Unsafe bookmark to add”. Ignore your smart arse browser and click OK. The setup will thus be completed.

Although the bookmarklet doesn’t support everything the Tumblr version does, I don’t feel as if it needs to. I think this bookmarklet covers the majority of content most people post to their Tumblr blog or regular blog. One drawback of this plugin is that it does not allow you to preview the post before it’s published.

There is a checkbox in the plugin options that allows you to use the WYSIWYG text editor, but I’d rather have the option of choosing between the two editors when making a post. The reason is that it’s pretty difficult to paste YouTube embed HTML code into the WYSIWYG editor without it screwing up. I get around this by switching to the CODE view of the post and making sure that embedding the video is the last thing I do when creating the post.

Thanks to this plugin, I’m seriously considering closing my Tumblr account and using this blog as my tumblelog and everything-else blog. After all, lifestreaming seems to be where everything is headed, so perhaps setting the site up this way will give me a head start on the trend.

P.S. The blog post before this one was actually me testing the QUOTE function of the QuickPost plugin. Apparently, it works.

AskApache And Google XML Updates

I finally got the chance to upgrade the AskApache Google 404 Ajax Search plugin along with the Google XML Sitemaps plugin. I’m still trying to determine what is new in the Ajax plugin, though I’m sure AskApache will stop by and let us know. As for the Google XML Sitemaps plugin, here is a short list of the changes.

Changed HTTP client for ping requests to Snoopy
Added “safemode” for SQL which doesn’t use unbuffered results
Added option to run the building process in background using wp-cron
Added links to test the ping if it failed

Make sure you head to AskApache Google 404 and Google XML Sitemaps to download the updated plugins and install them on your blog if you’re using them.

One special note for those who use the AskApache plugin: I noticed that the directory that houses the plugin files had its name changed. When you upload this plugin, make sure to delete the old AskApache directory so the two don’t conflict.

Thoughts and Insanity From Jeff

The Musings Of Jeff Chandler