* I believe it is important to point out that I see myself as a kindred spirit to the Great Lyle Zapato, and that I fully subscribe to his strongly held belief that Belgium doesn’t exist. Therefore, while I will - for convenience's sake - describe the following attack as “having originated from Brussels Hoofdstedelijk Gewest, Belgium,” we all know that Belgium is, and has always been, a leftist ruse.
It all began with some Python code that wouldn’t run…
I have a bunch of Python code that I use to extract various information from my honeypots. One of those scripts dumps out a list of URIs being “advertised” by comment spammers on some of the fake comment pages in my web app honeypot. Those URIs usually point to pages that have been added to unsuspecting websites (mostly those running WordPress, The WebApp Hacker’s BFF™). I try to notify as many of those folks as I can and, one day, I fully expect to be canonized as the Patron Saint of the Hacked Website.
This morning, my script didn’t work. More precisely, it just hung…
After doing a bit of digging, I discovered that one comment in particular was causing things to go awry:
Notice the “Content-Length” in there… Yep, that’s 3 MEGABYTES o'comments… somebody apparently has a lot of stuff to get off their chest. (Kinda like this: I got an Amazon Echo, and three days ago I asked, “Alexa, what does it take to make a woman happy?” and she hasn’t shut up since…)
So… what the heck is that? Well, at first glance, it looks to be a chunk of URL-encoded data - the bulk of which represents non-ASCII values. (If you look closely, there are a few ‘+’ and ‘.’ characters in there…)
A little creative use of the Linux command line tools head and tail with negative parameters to the -c switch and I’d cut out only the URL encoded “comment” portion of the POST (waaaay easier than trying to deal with a 3MB file in a text editor…). I hacked together a little Perl code using URL::Encode, and turned all of those percent-encoded numbers back into a binary file in no time.
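For anyone who would rather stay in Python than reach for Perl, here is a minimal sketch of the same decoding step. It is not the code I actually ran; the function name decode_form_value and the sample bytes are made up for illustration, and it assumes you have already carved the comment field out of the POST body.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: bytes) -> bytes:
    """Turn a percent-encoded form value back into raw bytes."""
    # In application/x-www-form-urlencoded data a '+' stands for a space.
    # unquote_to_bytes() leaves '+' untouched, so swap it out first.
    # (A literal plus sign arrives as %2B and is decoded afterwards.)
    return unquote_to_bytes(raw.replace(b"+", b" "))


if __name__ == "__main__":
    # Hypothetical sample, not taken from the real 3MB comment.
    sample = b"%E4%BD%A0%E5%A5%BD+world.%0D%0A"
    print(decode_form_value(sample))
```

Write the result out with open(path, "wb") and you have a binary file you can poke at with a hex editor.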
I opened up the binary file in a hex editor aaaaaand… nothing. It didn’t look like any file type I’ve seen before.
I tossed it to the Linux file command, and it said: UTF-8 Unicode text, with very long lines, with CRLF line terminators
Seriously?!? CRLF line terminators pretty much always means it originated in Windows-land. Just to be sure that the file command wasn’t pulling my leg, I threw together some Python code and “histogrammed” the byte frequency of the file:
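A byte-frequency histogram along these lines takes only a few lines of Python. This is a sketch rather than the original script; the names byte_histogram and print_histogram and the sample data are assumptions for illustration.

```python
from collections import Counter


def byte_histogram(data: bytes) -> Counter:
    """Count how many times each byte value (0-255) occurs."""
    # Iterating over a bytes object yields integers, so Counter
    # ends up keyed by byte value.
    return Counter(data)


def print_histogram(data: bytes) -> None:
    """Print one line per byte value that actually appears."""
    counts = byte_histogram(data)
    for value in sorted(counts):
        print(f"0x{value:02x}  {counts[value]}")


if __name__ == "__main__":
    # Hypothetical sample with CRLF line endings.
    print_histogram(b"one\r\ntwo\r\n")
```

If the counts for 0x0d (CR) and 0x0a (LF) match, that is at least consistent with CRLF line terminators, although a histogram alone can't prove the two bytes always show up as a pair.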
Hmmmm… So it looks like file is right about the CRLF stuff, but - not to disparage file too much - I’ve had file blow sunshine up my skirt a few too many times in the past to completely trust that this is really a well-formed UTF-8 file. And so, we need to “whip out” a somewhat obscure Linux command just to be sure…
Many of you may never have installed the Linux “moreutils” package (see here for “moreinfo” on “moreutils”). Based on the name, you can probably tell that it contains a whole bunch more Unix utilities… and among them is a little gem called isutf8.
isutf8 does pretty much what you would expect… it’ll tell you if a file is, indeed, well-formed UTF-8.
On most sane Linux distros, you can install the moreutils package using a simple sudo apt-get install moreutils.
Running isutf8 is amazingly complex:
“What the heck is that?,” I hear you cry, “It didn’t do anything!”
Welcome to Unix-land… around here, we tend to be a little terse. Deal with it… (i.e. unless isutf8 bitches about the file NOT being UTF-8, you can assume that it’s UTF-8).
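If you don't have moreutils handy, a strict decode in Python gives you a rough equivalent of the same check. This is a sketch, not how isutf8 itself is implemented, and the function name is_utf8 is mine.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, False otherwise."""
    try:
        # A strict decode raises on invalid start bytes, truncated
        # sequences, overlong encodings and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_utf8("héllo\r\n".encode("utf-8")))  # valid UTF-8
    print(is_utf8(b"\xff\xfe"))  # 0xff can never appear in UTF-8
```

Unlike the command line tool, this one isn't terse about success: you get an explicit True or False either way.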
So! It’s UTF-8 text! I open it up in a UTF-8 capable editor aaaaaaand…
Gibberish… It’s frickin' gibberish:
So I jumped through all of those hoops just to find out some idiot from (the fictional country of) Belgium decided to POST frickin' gibberish as comment spam.
If you have any other notions about what this might be, please tweet me @tliston.