* I believe it is important to point out that I see myself as a kindred spirit to the Great Lyle Zapato, and that I fully subscribe to his strongly held belief that Belgium doesn’t exist. Therefore, while I will - for convenience’s sake - describe the following attack as “having originated from Brussels Hoofdstedelijk Gewest, Belgium,” we all know that Belgium is, and has always been, a leftist ruse.
It all began with some Python code that wouldn’t run…
I have a bunch of Python code that I use to extract various bits of information from my honeypots. One of those scripts dumps out a list of URIs being “advertised” by comment spammers on some of the fake comment pages in my web app honeypot. Generally, those URIs point to pages that have been added to unsuspecting websites (mostly those running WordPress, The WebApp Hacker’s BFF™). I try to notify as many of those folks as I can and, one day, I fully expect to be canonized as the Patron Saint of the Hacked Website.
This morning, my script didn’t work. More precisely, it just hung…
After doing a bit of digging, I discovered that one comment in particular was causing things to go awry:
POST /comments HTTP/1.1\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-GB,en;q=0.5\r\n
Connection: keep-alive\r\n
Content-Length: 3100425\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Dnt: 1\r\n
Host: <redacted>\r\n
Referer: http://<redacted>/index\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0\r\n
\r\n
comment=%C3%81%2F%C3%8C%C3%BA%7D%C3%8F%40%2C%C3%BD%C3%9D%C3%93_%C3%93%C3%89%C3%97_%C3%82%C3%8E%C2%BB%C2%A4
%C2%BD%C2%AA%C3%9C%C3%8F%C2%BA%C3%8B%C3%BE%C2%AC%C3%A5%3B%C2%A5%C2%A4%C3%BE%C3%B3%25%C3%A0%5C%C2%B2%C2%B5
%C2%B5%3E%C3%AA%C3%95%2B%C2%A1%C3%91%2B%C3%AF%C3%80%7B%C3%90%C3%AB%28%3D%C3%A6%C2%AB%C3%92_%C3%9A.%C3%87%C3
%A0%21%29%C3%B9%C3%8A%23%C3%8A%C3%9C%C3%BF%C3%A7%C2%B4%3F%C2%A9%7B%C3%99%C3%A7%C3%99%C2%B1%C2%B6%C3%96%C3%84
%C2%A7%C2%B8%C2%B1*%C2%B8%C3%B7%C3%92%C3%A4%C2%B6%C3%AB%C3%A1+%C2%AB%22%60%C3%94%C2%BD%60%5C%C3%AE%24%C3%BF
%C2%AF%21%C2%B1%C3%A3%C2%BD%C3%BF%24%C3%BB%C3%A8%C2%A8%C2%AC%3F%C2%B8%C2%AC%C2%B2%C2%B4%C2%A8%C3%94%C2%BD
%C2%A7*%C3%BB%60%C3%94%C3%9A%C3%86%C3%BD%3C%C3%A5%C3%B3%C3%8E%3F%C3%B6%C3%90%C3%8B%C3%8F%29%60%C2%BF%27%C3
%B1%C3%83%5C%C2%B8%C3%9D%40%C3%9D%C3%A7%C3%9C%C3%8A%C3%B8%21.%7E%60%C2%B2%C2%A4%7D%C2%BA%C3%A3%3D%C3%B0%C2
%BF%C2%AC%C2%B4%C3%A6%C3%88%7E%C3%9B%C2%B7%C2%A2%C3%A9%3D%C3%90%5E%C2%BB%C3%A6%C3%B0%5E%C3%A5%C3%9D%C2%AC%C3
. . .
%C2%BA%23%C2%A7%C3%AC%C3%B9%5C%C3%85%C2%A1%C3%B0%2C_%40%C3%A3%C3%92%3C%C3%B8%C3%AE%3A%C3%AF%C3%8E%C3%A7%C3
%B9%C3%B7%C3%80%C3%B0%C2%B1%C3%86%5C%3F%2B%C2%BC%60%C2%AA%C3%84%C2%B2%C2%BA%C3%B7%C2%A8%C2%A7%60%C2%BC%C2
%AB%C2%AF*%7D%C2%BE_%C3%96%C3%9A%5E%5D%C2%BD%C3%90%C3%85%C3%89%C3%B0*%C3%8E%C3%AE%C2%AF%21%C3%A0%C3%86%C3
%B0%C3%BA%28%C3%A8%C2%B8%C3%80%C3%92%7D%C3%83%C3%B1%C3%9A%C3%A4%C2%A5%C3%BD%C3%84%C3%B7%C3%99%C2%A6%29%28
%2B_%C3%9A%C3%95%26%C2%A1%C3%8F%C3%8D%C3%94&submit=Submit
Notice the “Content-Length” in there… Yep, that’s 3 MEGABYTES o’comments… somebody apparently has a lot of stuff to get off their chest. (Kinda like this: I got an Amazon Echo, and three days ago I asked, “Alexa, what does it take to make a woman happy?” and she hasn’t shut up since…)
So… what the heck is that? Well, at first glance, it looks to be a chunk of URL-encoded data - the bulk of which represents non-ASCII values. (If you look closely, there are a few literal ‘+’ and ‘.’ characters in there…)
A little creative use of the Linux command-line tools (tail, with negative parameters to the -c switch) and I’d cut out only the URL-encoded “comment” portion of the POST (waaaay easier than trying to deal with a 3MB file in a text editor…). I hacked together a little Perl code using URL::Encode and turned all of those percent-encoded values back into a binary file in no time.
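For anyone who would rather stay in Python, here is a minimal sketch of the same carve-and-decode step. It assumes the raw request was saved byte-for-byte with its CRLF line endings intact; the helper name extract_comment is hypothetical, and it uses the standard library’s urllib.parse.unquote_to_bytes rather than Perl’s URL::Encode. Note that unquote_to_bytes leaves a literal ‘+’ alone, whereas strict form decoding would turn it into a space.

```python
import urllib.parse


def extract_comment(raw_request: bytes) -> bytes:
    """Return the percent-decoded bytes of the 'comment' form field.

    Hypothetical helper: assumes the raw HTTP request (headers + body) was
    saved verbatim, with a blank CRLF line separating headers from body.
    """
    # Everything after the first blank line is the request body.
    _headers, _sep, body = raw_request.partition(b"\r\n\r\n")
    # Form fields are separated by '&'; a literal '&' inside a value would be %26.
    for field in body.split(b"&"):
        name, _eq, value = field.partition(b"=")
        if name == b"comment":
            # Turn %XX escapes back into raw bytes; '+' is left untouched.
            return urllib.parse.unquote_to_bytes(value)
    raise ValueError("no comment field found")
```

Writing the result out with open("evilstuff.bin", "wb") gives you a binary file to poke at. As a quick sanity check, the first few escapes of the comment, %C3%81%2F%C3%8C%C3%BA%7D, decode to the UTF-8 text “Á/Ìú}”.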
I opened up the binary file in a hex editor aaaaaand… nothing. It doesn’t look like any file type I’ve seen before.
I tossed it to the Linux file command, and it said: “UTF-8 Unicode text, with very long lines, with CRLF line terminators”
Seriously?!? CRLF line terminators pretty much always mean it originated in Windows-land. Just to be sure that file wasn’t pulling my leg, I threw together some Python code and “histogrammed” the byte frequency of the file:
0x0A = 29   0x0D = 29
0x21 = 4830   0x22 = 4726   0x23 = 4800   0x24 = 4746   0x25 = 4772   0x26 = 4715   0x27 = 4832   0x28 = 4727
0x29 = 4816   0x2A = 4757   0x2B = 9509   0x2C = 4723   0x2E = 4728   0x2F = 4869   0x3A = 4801   0x3B = 4693
0x3C = 4827   0x3D = 4785   0x3E = 4814   0x3F = 4758   0x40 = 4712   0x5B = 4797   0x5C = 4773   0x5D = 4724
0x5E = 4799   0x5F = 4765   0x60 = 4789   0x7B = 4902   0x7C = 4790   0x7D = 4834   0x7E = 4722
0x80 = 4645   0x81 = 4845   0x82 = 4925   0x83 = 4712   0x84 = 4686   0x85 = 4719   0x86 = 4766   0x87 = 4855
0x88 = 4705   0x89 = 4718   0x8A = 4608   0x8B = 4829   0x8C = 4662   0x8D = 4805   0x8E = 4742   0x8F = 4681
0x90 = 4715   0x91 = 4710   0x92 = 4800   0x93 = 4775   0x94 = 4752   0x95 = 4804   0x96 = 4716   0x97 = 4641
0x98 = 4579   0x99 = 4666   0x9A = 4717   0x9B = 4688   0x9C = 4780   0x9D = 4729   0x9E = 4717   0x9F = 4755
0xA0 = 4693   0xA1 = 9572   0xA2 = 9423   0xA3 = 9610   0xA4 = 9605   0xA5 = 9555   0xA6 = 9452   0xA7 = 9695
0xA8 = 9481   0xA9 = 9300   0xAA = 9562   0xAB = 9653   0xAC = 9464   0xAD = 4702   0xAE = 9557   0xAF = 9500
0xB0 = 9631   0xB1 = 9324   0xB2 = 9501   0xB3 = 9559   0xB4 = 9453   0xB5 = 9411   0xB6 = 9647   0xB7 = 9506
0xB8 = 9584   0xB9 = 9470   0xBA = 9506   0xBB = 9542   0xBC = 9691   0xBD = 9483   0xBE = 9507   0xBF = 9535
0xC2 = 143203   0xC3 = 303418
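If you want to repeat the byte count, a minimal Python sketch along these lines will do it. The helper names are hypothetical, and this is not necessarily the script that produced the numbers above. It also helps explain the two giant counts at the end: in UTF-8, every code point from U+0080 to U+00FF becomes a two-byte sequence whose lead byte is 0xC2 (U+0080 to U+00BF) or 0xC3 (U+00C0 to U+00FF), followed by a continuation byte in 0x80 to 0xBF. Continuation bytes 0xA0 to 0xBF can follow either lead byte, which is consistent with most of them showing up roughly twice as often as 0x80 to 0x9F.

```python
from collections import Counter


def byte_histogram(data: bytes) -> dict[int, int]:
    """Count occurrences of each byte value, ordered by byte value."""
    # Iterating over a bytes object yields ints, so Counter keys are byte values.
    return dict(sorted(Counter(data).items()))


def format_histogram(data: bytes) -> str:
    """Render the histogram in the same "0xNN = count" style as above."""
    return " ".join(f"0x{value:02X} = {count}" for value, count in byte_histogram(data).items())


def crlf_stats(data: bytes) -> tuple[int, int]:
    """Return (CRLF pairs, bare LFs).

    Equal CR and LF totals alone don't prove every LF is preceded by a CR,
    so this counts the pairs directly.
    """
    crlf = data.count(b"\r\n")
    return crlf, data.count(b"\n") - crlf


def latin1_lead_bytes() -> set[int]:
    """First UTF-8 byte of every code point from U+0080 to U+00FF."""
    return {chr(cp).encode("utf-8")[0] for cp in range(0x80, 0x100)}
```

For a file with clean CRLF line endings, crlf_stats should report zero bare LFs, and latin1_lead_bytes() returns just {0xC2, 0xC3}.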
Hmmmm… So it looks like file is right about the CRLF stuff, but - not to disparage file too much - I’ve had file blow sunshine up my skirt a few too many times in the past to completely trust that this is really a well-formed UTF-8 file. And so, we need to “whip out” a somewhat obscure Linux command just to be sure…
Many of you may never have installed the Linux “moreutils” package (its project page has plenty of “moreinfo” on “moreutils”). Based on the name, you can probably tell that it contains a whole bunch more Unix utilities… and among them is a little gem called isutf8, which does pretty much what you would expect: it tells you whether a file is, indeed, well-formed UTF-8.
On Debian-flavored (read: sane) Linux distros, you can install the moreutils package with a simple sudo apt-get install moreutils.
isutf8 is amazingly complex:
localhost ~ » isutf8 evilstuff.bin
localhost ~ »
“What the heck is that?” I hear you cry. “It didn’t do anything!”
Welcome to Unix-land… around here, we tend to be a little terse. Deal with it… (i.e., unless isutf8 bitches about the file NOT being UTF-8, you can assume that it’s UTF-8).
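For the curious, here is a minimal Python sketch of the same check, assuming that Python’s strict UTF-8 decoder is an acceptable stand-in for isutf8 (it rejects truncated sequences, overlong encodings, and encoded surrogates). The function names and the error message are made up for illustration; isutf8’s own output looks different. In keeping with the Unix tradition, check says nothing when all is well and only speaks up on failure, returning a shell-style status.

```python
import sys


def is_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8 under the strict error handler."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def check(name: str, data: bytes) -> int:
    """Stay silent on success; complain on stderr and return 1 on failure."""
    if is_utf8(data):
        return 0
    print(f"{name}: not well-formed UTF-8", file=sys.stderr)
    return 1
```

Calling check("evilstuff.bin", data) on a file that isutf8 accepts should quietly return 0, though the two checkers are not guaranteed to agree on every edge case.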
So! It’s UTF-8 text! I opened it up in a UTF-8-capable editor aaaaaaand…
Gibberish… It’s frickin’ gibberish.
So I jumped through all of those hoops just to find out that some idiot from (the fictional country of) Belgium decided to POST frickin’ gibberish as comment spam.
If you have any other notions about what this might be, please tweet me @tliston.
Owner, Principal Consultant
Bad Wolf Security, LLC
Senior Technical Engineer
May 12, 2016