Back in college, my roommate had a favorite saying. Someone would do something dumb, rapidly followed by the unpleasant consequences of those actions, and he would trot it out:
You done brang it on yourself.
I’ve used that line over the years to great effect (just ask my kids). I’ve also used another, related expression:
Play stupid games, win stupid prizes.
They both speak to the same point: The stuff we do has consequences.
Sometimes, we’re prepared to face those consequences because we can see them coming. Sometimes we can’t. Sometimes, we let ourselves be convinced that we can be careless about the things we do, and there won’t be any consequences.
But there are. There always are.
I’ve ranted in the past about how much I dislike the way that content management systems (CMS) like WordPress (The WebApp Hacker’s BFF™) get marketed:
“Build simply. Create any kind of website. No code, no manuals, no limits.”
You see, you can’t let a chunk of software do your thinkin' for you - because you never know when that software is going to do something stoopid[1]. Worse still, if you go ahead and spew out a WordPress site while having no idea how any of this stuff works, you likely won’t even notice when WordPress does that stoopid thing.
What stoopid thing has WordPress done now?
Simple. By default, the built-in WordPress search returns results pages that aren’t marked with a <meta name='robots' content='noindex'> tag.
If you’re looking at that and thinking hmmm… gibberish then let me explain. HTML, the language used to create web pages, is what is known as a “markup language.” Essentially, as a markup language, it takes the text of the web page, and adds a number of special tags to “markup” the text in a way that tells your browser how to display content on the page. When I want to create italics, the text to be italicized is placed between two special tags: <i>Italicize this</i>. Back in the good old / bad old days, word processors worked this way as well… until the advent of the WYSIWYG (What You See Is What You Get) interface. I have both happy and horrific memories of the tags in WordPerfect (an ancient, non-WYSIWYG word processing program I used 25 or so years ago…)
There are several special markup tags in HTML that have nothing to do with formatting the webpage itself, but are used for other purposes. One of those, the <meta> tag, is used to pass along various types of information about the page itself. These <meta> tags are used to describe keywords about the page, or to give directions to various tools that might “consume” the page for various purposes. One of those tools would be a search engine, like Google, that would scan the page so that it can be indexed in a way that other people can find it.
Which leads us to the “noindex” <meta> tag. Sometimes, you just don’t want a search engine to index a page from your site. There are lots of reasons you might want to do this: perhaps the page is transitory (a page generated from a search or for printing) or a different version of a standard page created only for mobile browsers. Generally speaking, something like the transitory results of a search page should not be indexed. It just doesn’t make much sense for… well… almost every site, so search results should be marked with a <meta name='robots' content='noindex'> tag by default. If you have a specific reason for wanting your search results to be indexed, then you would probably have the ability to figure out how to disable this default behavior.
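If you'd like to see what "marked noindex" means in practice, here is a minimal Python sketch that looks for a robots <meta> tag carrying the noindex directive in a chunk of HTML. The names RobotsMetaParser and is_noindex are made up for this example, and it only looks at the HTML you hand it. It does not fetch anything, and it knows nothing about other ways a site might ask not to be indexed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html_text):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

Feed it the source of a search results page from your own site: if is_noindex comes back False, that page is fair game for indexing.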
tl;dr: The default behavior for search results should be noindex.
But it isn’t.
Not in WordPress.
Thank you, WordPress.
“What’s the big deal?” I hear you ask. “So a bunch of ‘Net yokels have their search result pages indexed by Google…”
But it is a big deal.
A very big deal. Check this out:
This is why we can’t have nice things. What do all of these sites have in common? They’re run on WordPress.
Every time WordPress makes a stoopid mistake like this, there’s a line of scammy, scummy, little bastards salivating to turn it to their advantage. In this case, the part of the bastards is being played by purveyors of “research” papers trying to boost the search engine placement of their site using less-than-legitimate methods. By getting their “buy research papers cheap” site mentioned on LOTS of legitimate sites, they’re seen as more popular and therefore get placed closer to the top of search results. It’s called Search Engine Optimization (SEO), and this is a pretty sleazy way to do it. (Note: there are actually legitimate SEO methods - this isn’t one of them.)
In this case (I believe) here’s how it works.
- The scammer adds a triggering link to a site that gets indexed by Google
- This link is a URL to a mainstream WordPress-based site, with parameters that trigger a search page with the information for the site the scammer wants to boost
- Google spiders the site, sees the triggering link, and adds it to the list of pages to index
- Google eventually spiders the link (with the search parameters attached)
- The default behavior of WordPress is to include the search terms on the generated search page.
- Because the WordPress search results page isn’t tagged as noindex, the page is added to Google’s index for the legitimate site
- The bastards remove the link, rinse and repeat
Because the page content is generated by the URL, every time Google returns to check the page, it will see that content again. As far as Google is concerned, that search page (and all of that SEO stuff) is part of the victim site. FOREVER.
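To make the "generated by the URL" part concrete, here is a rough Python sketch. WordPress's built-in search reads its query from the s parameter in the URL; everything else here is an assumption for illustration. The victim.example site is a placeholder, and render_search_page is a toy stand-in for a theme's search template, not WordPress's actual code.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit


def build_trigger_url(victim_site, spam_text):
    """Build a search URL on the victim site that carries the spam text.

    WordPress's built-in search takes its query from the "s" parameter.
    """
    return victim_site.rstrip("/") + "/?" + urlencode({"s": spam_text})


def render_search_page(url):
    """Toy stand-in for a search results template that echoes the query.

    Note that there is no robots meta tag in the output, which mirrors
    the default behavior described above.
    """
    query = parse_qs(urlsplit(url).query).get("s", [""])[0]
    return (
        "<html><head><title>Search results</title></head>"
        "<body><h1>Search Results for: " + escape(query) + "</h1></body></html>"
    )


if __name__ == "__main__":
    url = build_trigger_url("https://victim.example", "buy research papers cheap")
    print(url)
    print(render_search_page(url))
```

The page is rebuilt from the URL on every request, so the spam text never has to be stored on the victim site. Anyone (or any crawler) holding the link gets the same content back each time.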
So… what’s to be done? Well, obviously, the folks at WordPress need some learnin’. (Please note: As my buddy Don “Cutaway” Weber always says: Some folks need to get education from a book, others need to get learnin' from a stick…) Here’s the stick: Yo! WordPress! Why the hell do you not mark search pages noindex by default? Seriously!?!? You know all those people who you’ve convinced to “create any kind of website with no code, no manuals, and no limits?” You’re potentially placing their reputations at the mercy of any unscrupulous jerk who wants to use their site to boost the search engine placement of their less-than-legitimate business. Not cool. Not cool at all.
As mitigation, you can use a plug-in like Yoast SEO that automatically marks search pages noindex (as God intended) until the WordPress devs get their heads screwed on right and fix[2] this.
Owner, Principal Consultant
Bad Wolf Security, LLC
Senior Technical Engineer
March 2, 2021
[1] This is how we spell "stupid" 'round here. Generally, for effect. There's "stupid" and then there's "stoopid."
[2] UPDATE (3/3/2021): It looks like someone else has already pointed this out to the WordPress folks and they've fixed it in their upcoming (next week!) release, 5.7. See here for the ticket tracking this issue.
I’ve done some checking, and if a noindex meta tag appears on these pages, Google (and other search engines) should remove the pages from their index. That’s good news for the Internet, and bad news for the SEO hackerz.
I’m heartbroken for them.