A funny thing happened one night as I got home and started to get settled for bed. A devsupport ticket from our GitHub issue tracker caught my eye. The title went something like “Strange Characters on Qlink”. Something about strange things has always fascinated me, and this was not one of the usual places where these things would happen. I had to take a look.

The issue reporter went on to say that an image of a cat had appeared on a teacher’s class management page. As it turns out, it wasn’t a cat but that good ole canine from the doge meme. And it was in some form of ASCII art.

Whoa! How did it end up there?

At first I thought someone had found a way to inject a piece of JavaScript into the page. Kids are always finding ways to subvert the system, and if there are some new tricks these pranksters are playing, my job tells me that I should be on top of it. I turned off JavaScript in the browser. The cat was still there.

If it’s not JavaScript, it must be something that’s been hacked into the view. Could it be some kind of Ruby black art? I got a little rush out of the possibility that I might be on to something. I opened up the view for that particular page. Nothing out of the ordinary. I looked at the controllers. Still nothing.

It seemed to be getting rendered out of our data in the database. I fired up the console to try and poke around. Never would I have imagined what I would see next.

Terminal Animation or ASCII Sorcery

What I initially thought was a new style of animation was apparently the terminal printing out a large chunk of data that had been sneaked into the User#first_name field by one crafty user. That data is what ends up getting rendered as our cat/doge.

I got a data dump of the user attributes in a text file so I could inspect it more closely in Ruby.

Turns out that the first_name field held some 79617 characters. I got a brief introduction to Ruby’s unpack and pack methods, which seem useful here.
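For anyone who hasn't met unpack and pack either, here is a minimal sketch of the kind of poking around I mean. The string below is a made-up stand-in for the real first_name value, and the variable names are just for illustration.

```ruby
# Hypothetical stand-in for the oversized first_name value.
first_name = "Doge \u2588\u2580\u2584"

# String#size counts characters, String#bytesize counts encoded bytes.
char_count = first_name.size
byte_count = first_name.bytesize

# unpack("U*") turns the UTF-8 string into an array of Unicode code points.
codepoints = first_name.unpack("U*")

# pack("U*") goes the other way, from code points back to a UTF-8 string.
round_trip = codepoints.pack("U*")

puts "#{char_count} characters, #{byte_count} bytes"
p codepoints
puts round_trip == first_name
```

The gap between the character count and the byte count is already a hint that something other than plain English letters is in there.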

By retrieving the code point of each UTF-8 character, it seems the bulk of the data comes from the 100 - 900 range (I seem to recall most printable English characters don’t go over the 150 mark; see note 1). I also noticed that most of the characters above the 900 mark are in the 9600 and 7800 ranges, so I tried printing those out.
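Roughly, the filtering went something like the sketch below. The sample string is invented (the real input was the giant first_name), and the 900 cutoff is just the number I eyeballed above.

```ruby
# Hypothetical sample; the real data was the oversized first_name value.
sample = "wow \u2588\u2588\u2580\u2584 such \u1E80"
codepoints = sample.unpack("U*")

# Count how often each code point appears.
counts = codepoints.each_with_object(Hash.new(0)) { |cp, tally| tally[cp] += 1 }

# Keep only the code points above the 900 mark, lowest first.
high = counts.keys.select { |cp| cp > 900 }.sort

# Turn each one back into a printable character so we can see the "brush".
high.each do |cp|
  puts format("%5d  U+%04X  %s  x%d", cp, cp, [cp].pack("U"), counts[cp])
end
```

Packing one code point at a time with [cp].pack("U") is a handy way to see what a number actually looks like on screen.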

Ah! These are the brushes used for the Doge art!

The uncharted regions of the UTF-8 universe

Up until this moment my knowledge of UTF-8 had been limited to the Latin character set (and maybe some of the more popular Asian sets like Chinese or Japanese). But there’s a big UTF-8 world out there, possibly big enough to represent all known writing systems.

Japanese (kana) seems to be in the 12000 range.

Thai?

Vietnamese? Even the ancient Filipino script is represented!

It’s also important to note that some characters take up different widths. This single-width-looking character actually has a size of 25.

Tiny dancer?
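One way to get a character like that is to stack combining marks on a single base letter. The sketch below is my own reconstruction, not the actual character from the data: it assumes a plain "a" with 24 combining acute accents piled on top.

```ruby
# Hypothetical example: one base letter followed by 24 combining acute accents.
# Combining marks stack on the base letter instead of taking their own column.
stacked = "a" + "\u0301" * 24

puts stacked.size                    # number of code points
puts stacked.bytesize                # number of UTF-8 bytes
puts stacked.grapheme_clusters.size  # number of user-perceived characters
```

String#grapheme_clusters needs Ruby 2.5 or newer. The point is that size counts code points, so what looks like one character can weigh a lot more than it appears to.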

TL;DR: Put a size limit on your input fields! (See note 2.)
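As a rough idea of what such a limit could look like in plain Ruby, here is a small sketch. The constant, the method name and the limit of 50 are all made up for illustration, and this is not the fix we actually shipped.

```ruby
# Hypothetical limit; pick whatever makes sense for your data.
MAX_NAME_LENGTH = 50

# Returns true when the value is a String that fits within the limit.
# String#size counts code points, so stacked combining marks count too.
def acceptable_name?(value, max = MAX_NAME_LENGTH)
  value.is_a?(String) && value.size <= max
end

puts acceptable_name?("Kabosu")
puts acceptable_name?("\u2588" * 79_617)
```

If your models use ActiveModel validations, something like validates :first_name, length: { maximum: 50 } is probably the more natural home for this check.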

Notes

1: I was actually thinking of ASCII, which has 128 characters
2: MongoDB won’t magically cure your text limit issues

Credit

Banner Photo: Brian Bald