A funny thing happened one night as I got home and started getting settled for bed. A devsupport ticket from our GitHub issue tracker caught my eye. The title went something like “Strange Characters on Qlink”. Strange things have always fascinated me, and this was not one of the usual places where they would happen. I had to take a look.
The issue reporter went on to say that an image of a cat had appeared on a teacher’s class management page. As it turns out, it wasn’t a cat but that good ole canine from the doge meme, drawn in some form of ASCII art.
Whoa! How did it end up there?
It seemed to be rendered straight out of our data in the database. I fired up the console to poke around. Never would I have imagined what I would see next.
What I initially took for a new style of animation was actually the terminal printing out a large chunk of data that had been sneaked into the User#first_name field by one crafty user. That data is what ends up getting rendered as our cat/doge.
I got a data dump of the user attributes in a text file so I could inspect it more closely in Ruby.
Turns out that the first_name field held some 79617 characters.
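Counting this sort of thing takes only a couple of lines. What follows is a minimal sketch, not the exact commands I ran: the filename in the comment is hypothetical, and a short stand-in string is used so the snippet runs on its own. It also shows that the character count and the byte count are not the same thing once non-ASCII characters are involved.

```ruby
# Stand-in for the dumped field. In practice this would be read from the dump,
# e.g. File.read("first_name.txt", encoding: "UTF-8")  # hypothetical filename
first_name = "Do" + ("\u2588" * 5) + "ge"

char_count = first_name.length    # number of characters (code points)
byte_count = first_name.bytesize  # number of bytes in the UTF-8 encoding

puts "characters: #{char_count}, bytes: #{byte_count}"
```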
I got a brief introduction to Ruby’s unpack and pack methods, which seemed useful here.
By retrieving the numeric code of each UTF-8 character, it seems the bulk of the data comes from the 100 - 900 range (I seem to recall most printable English characters don’t go over the 150 mark [1]). I also noticed that most of the characters above the 900 mark are in the 9600 and 7800 ranges, so I tried printing those out.
Ah! These are the brushes used for the Doge art!
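The inspection above can be roughly sketched as follows. This is an illustration under assumptions rather than a record of my session: the sample string stands in for the real 79617-character dump, the bucket boundaries are simplified, and I am assuming the "9600 range" corresponds to the Unicode Block Elements block (code points 9600 to 9631).

```ruby
# Stand-in sample; the real input was the dumped first_name value.
sample = "such wow \u2588\u2580\u2584 \u1E9E"

# unpack("U*") turns a UTF-8 string into an array of code point integers.
codepoints = sample.unpack("U*")

# Bucket the code points into rough ranges to see where the data lives.
buckets = codepoints.group_by do |cp|
  case cp
  when 0..127   then :ascii
  when 128..899 then :under_900
  else               :over_900
  end
end
counts = buckets.transform_values(&:size)
p counts

# pack("U*") goes the other way: code points back into a UTF-8 string.
blocks = (9600..9631).to_a.pack("U*")
puts blocks
```

Printing a whole range like this is a quick way to see what a cluster of unfamiliar code points actually looks like.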
The uncharted regions of the UTF-8 universe
Up until this moment my knowledge of UTF-8 had been limited to the Latin character set (and maybe some of the more popular Asian sets like Chinese or Japanese). But there’s a big UTF-8 world out there, possibly meant to represent all known writing systems.
It’s also important to note that some characters take different widths. This single-width-looking character actually has a size of
TL;DR Put a size limit on your input fields! [2]
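As a framework-agnostic illustration of that advice, here is a minimal sketch in plain Ruby. The constant MAX_NAME_LENGTH, its value of 50, and the helper name_within_limit? are all hypothetical names made up for this example, not something from our codebase. In a real app the check would more likely live in whatever validation layer your framework or ODM provides.

```ruby
MAX_NAME_LENGTH = 50 # hypothetical limit; pick whatever fits your data

# Returns true when the value is a String within the character limit.
# Note that this counts characters, not bytes.
def name_within_limit?(value, max = MAX_NAME_LENGTH)
  value.is_a?(String) && value.length <= max
end

puts name_within_limit?("Kabosu")           # true
puts name_within_limit?("\u2588" * 79_617)  # false
```

Counting characters rather than bytes keeps the limit intuitive for users, but a byte limit (via bytesize) may be the better guard if storage size is the main concern.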
1: I was actually thinking of ASCII, which has 128 characters
2: MongoDB won’t magically cure your text limit issues
Banner Photo: Brian Bald