Cracking Age of Wushu’s chat log encoding

Age of Wushu is a free-to-play* 3D martial arts action MMORPG, developed by Chinese company Snail. The game revolves around the Wuxia-inspired lore surrounding martial arts and adventures in Ming Dynasty China. (* “freemium” is more accurate)

I’ve been playing this game for the past few years and have chatted with a lot of players in the process. One crippling issue with this game is its direct message (DM) history system.

The game keeps the chat logs locally, stored in one XML file per account. Every string within this XML file is encoded in some way. Every time a message arrives or is sent, the game writes to this XML file. This becomes a big issue on slower hard drives once the chat logs grow to a few megabytes in size. A few megabytes sounds trivial, but the game runs this disk write on what I can only assume is the main thread, so every time a new message comes in, the game freezes for a few frames (really). There were days when people spammed your DMs just to cripple your gameplay during a monthly martial arts tournament.

To solve this, people generally just went to their Age of Wushu installation directory and deleted their XML file. But me being me, I never threw mine away. I knew the chat logs were stored in some kind of encoded format and thought it’d be fun to crack it and look back at the chat logs one day.

A small portion of the chat log files.

On the 28th of December, I was determined to finally figure out the format.

The XML format

Upon inspecting an XML file, you are immediately greeted with what looks like garbled characters (to the untrained eye).

During experimentation, I found out that:

  • A “Record” is a chat session with another player
  • An “Item” is a chat message
  • An item’s chat content is always prepended with the string "2z3SMQ473PHc3bZO3Py9KHyZomCR3lyZjmAtKIvnjeLOJzASMzyU2aJcjelcjPueJO0[" for some reason
  • An item’s chat content is also appended with "25yPMeYZ2cxOGOg[" or "[2zvUFsW=".

I also decompiled the game’s Lua files in order to find the functions that handle chat history. The chat window seems to make calls to two functions:

One function converts a widestring to UTF-8, then calls an external function within the game’s DLLs in order to “encrypt” the string. The other function does the opposite.

The padding with the "="-character does suggest this encryption is Base64, but I’d never seen the character "[" used in Base64 before. Running these strings through a regular Base64 decoder gave me wrong results. Take the string "CmvsjXYqMUnIC5D=" for instance. I know for a fact that this says "Ersanio(GD)", yet a Base64 decoder gives me "kv*1I ".

Even more confused by the introduction of “widestring” and the wrong Base64 results, I presented this problem to p4plus2’s Discord guild, snesdev. Although it’s called “snesdev”, the folks there have a great interest in programming-related topics in general.

I gave xfix, randomdude999, Alcaro and p4plus2 some character names along with their encoded “Base64” equivalents. They speculated about “XOR cipher”, “lookup table” and “stream cipher”, and I had no idea what those meant, which served as a reminder that I should look into cryptography one day. Eventually, however, they settled on “Base64 with a shuffled alphabet”.

What is Base64?

Basically speaking (no pun intended), Base64 is a way to encode bits and bytes into a string. Since Base64 works with raw bits, it can encode any data imaginable into a string, such as text or images. When encoding, Base64 processes groups of 6 bits, rather than 8 bits. Every group of 6 bits is matched with a character. For this, Base64 makes use of an index table with 64 entries, also called its alphabet.

The Base64 index table

For example, the string "A" encodes into the Base64 string "QQ==". The character "A" is 65 in UTF-8, thus "01000001" in binary, or "010000 01" when grouped by 6 bits. Pad the second group with four zeroes so that you have two full groups of six bits, "010000 010000", which according to the table above equals "QQ". Because we padded the second group with four zeroes, we communicate this by appending two "=" to the Base64 string: one for every two zeroes added.
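To make the bit shuffling above concrete, here is a minimal Python sketch that encodes bytes by hand, six bits at a time. The names `encode_base64` and `STANDARD_ALPHABET` are made up for this illustration; in practice you would simply use the `base64` module from the standard library.

```python
import base64

# The standard Base64 index table: position 0 is "A", position 63 is "/".
STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes, alphabet: str = STANDARD_ALPHABET) -> str:
    """Encode bytes by hand, six bits at a time (illustrative, not optimized)."""
    # Write every byte as eight binary digits and join them together.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits until the length is a multiple of six.
    padding_bits = (-len(bits)) % 6
    bits += "0" * padding_bits
    # Look up each 6-bit group in the index table.
    encoded = "".join(
        alphabet[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)
    )
    # One "=" for every two zero bits that were appended.
    return encoded + "=" * (padding_bits // 2)


print(encode_base64(b"A"))          # QQ==
print(encode_base64(b"aaaaaaaaa"))  # YWFhYWFhYWFh
```

The `alphabet` parameter is there on purpose: swapping in a different 64-character table changes the output characters without changing anything else about the algorithm.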

Shuffled index table

The thing is, Age of Wushu uses a different index table. I’ll explain why later. I could either reverse-engineer the DLL to try to figure out the index table, or brute-force this shuffled index table from the outside. I chose the latter, as I have no experience with reverse-engineering actual DLLs.

I started sending certain messages to one of my alt characters, so I wouldn’t have to spam a poor random passer-by’s DMs with cryptic chat messages. I checked the encoded messages in Age of Wushu’s chat logs, and compared them to the regular Base64-encoded version.

For example, I sent the message "aaaaaaaaa". In Base64, this is "YWFhYWFhYWFh". In Age of Wushu’s shuffled Base64, this is "jXHnjXHnjXHn". Therefore, I know that Y = j, W = X, F = H, h = n. Repeat this for many more strings and you end up filling a shuffled table. To make things easier, randomdude999 gave me a string which hits every single Base64 character when encoded:

aa@aaAaaBaaCaaDaaEaaFaaGaaHaaIaaJaaKaaLaaMaaNaaOaaPaaQaaRaaSaaTaaUaaVaaWaaXaaYaaZaa[aa\aa]aa^aa_aa`aaaaabaacaadaaeaafaagaahaaiaajaakaalaamaanaaoaapaaqaaraasaataauaavaawaaxaayaazaa{aa|aa}aa~a`?

Encoded, this is:

Base64:
YWFAYWFBYWFCYWFDYWFEYWFFYWFGYWFHYWFIYWFJYWFKYWFLYWFMYWFNYWFOYWFPYWFQYWFRYWFSYWFTYWFUYWFVYWFWYWFXYWFYYWFZYWFaYWFbYWFcYWFdYWFeYWFfYWFgYWFhYWFiYWFjYWFkYWFlYWFmYWFnYWFoYWFpYWFqYWFrYWFsYWFtYWFuYWFvYWFwYWFxYWFyYWFzYWF0YWF1YWF2YWF3YWF4YWF5YWF6YWF7YWF8YWF9YWF+YWA/

Shuffled Base64:
jXH0jXH1jXH5jXH6jXHEjXHHjXHzjXHIjXHJjXHvjXHijXHFjXHujXHAjXHBjXH2jXH4jXHCjXHajXHbjXHLjXHljXHXjXHmjXHjjXH3jXHojXHMjXHGjXHKjXHNjXHkjXH7jXHnjXHOjXHcjXHDjXHwjXHPjXHQjXHpjXHqjXHrjXHtjXHRjXH8jXH9jXHSjXHTjXHxjXHUjXHsjXHZjXHdjXHejXHVjXHWjXHYjXHhjXHfjXHgjXHyjXH[jX0+

Trim the repeating characters and you get a more readable result:

Base64:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Shuffled Base64:
0156EHzIJviFuAB24CabLlXmj3oMGKNk7nOcDwPQpqrtR89STxUsZdeVWYhfgy[+

The string he provided hits every Base64 character in index order! With this, I could simply line the two alphabets up vertically and fill in the shuffled Base64 index table used by Age of Wushu.
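The comparison can also be automated. The following Python sketch rebuilds the probe message, encodes it with regular Base64, and pairs every character with the one the game wrote at the same position. `derive_alphabet` and the variable names are my own for this illustration; the `observed` string is the shuffled output quoted above.

```python
import base64

STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# The probe message: "aa@", "aaA", ... "aa~", followed by "a`?".
probe = "".join("aa" + chr(code) for code in range(0x40, 0x7F)) + "a`?"
standard_encoded = base64.b64encode(probe.encode("ascii")).decode("ascii")

# What the game wrote to the chat log for that message.
observed = (
    "jXH0jXH1jXH5jXH6jXHEjXHHjXHzjXHIjXHJjXHvjXHijXHFjXHujXHAjXHBjXH2"
    "jXH4jXHCjXHajXHbjXHLjXHljXHXjXHmjXHjjXH3jXHojXHMjXHGjXHKjXHNjXHk"
    "jXH7jXHnjXHOjXHcjXHDjXHwjXHPjXHQjXHpjXHqjXHrjXHtjXHRjXH8jXH9jXHS"
    "jXHTjXHxjXHUjXHsjXHZjXHdjXHejXHVjXHWjXHYjXHhjXHfjXHgjXHyjXH[jX0+"
)


def derive_alphabet(standard_text: str, shuffled_text: str) -> str:
    """Pair characters position by position and return the shuffled table."""
    if len(standard_text) != len(shuffled_text):
        raise ValueError("both encodings must have the same length")
    mapping = {}
    for plain_char, shuffled_char in zip(standard_text, shuffled_text):
        if plain_char == "=":
            continue  # padding is not part of the index table
        # A character must always map to the same shuffled character.
        if mapping.setdefault(plain_char, shuffled_char) != shuffled_char:
            raise ValueError(f"conflicting mapping for {plain_char!r}")
    missing = [c for c in STANDARD_ALPHABET if c not in mapping]
    if missing:
        raise ValueError(f"probe did not cover: {''.join(missing)}")
    return "".join(mapping[c] for c in STANDARD_ALPHABET)


wushu_alphabet = derive_alphabet(standard_encoded, observed)
print(wushu_alphabet)
```

The consistency check is the useful part: if the encoding were anything other than a one-to-one character substitution, the same Base64 character would show up with two different partners and the function would raise.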

Age of Wushu’s shuffled Base64 index table

Going back to our earlier example of encoding the string "A": with Age of Wushu’s index table, the bits "010000 010000" encode into "44==" instead of "QQ==". This table also explains why some strings contain the character "[", even though standard Base64 does not have this character.

With this reverse-engineered table, it is finally possible to read the chat logs in a human-readable format! There is no need to write a custom Base64 decoder for this; just use something like Python’s str.translate to map Age of Wushu’s characters back to the standard ones, then run the result through a regular Base64 decoder.
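As a rough sketch of that approach in Python: `str.maketrans` builds the character mapping between the two tables, and `base64` does the rest. The function names `decode_wushu` and `encode_wushu` are hypothetical, and decoding the bytes as UTF-8 is an assumption based on the Lua functions mentioned earlier.

```python
import base64

STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
# The shuffled table recovered above. "=" is not in either table, so
# padding passes through the translation untouched.
WUSHU_ALPHABET = "0156EHzIJviFuAB24CabLlXmj3oMGKNk7nOcDwPQpqrtR89STxUsZdeVWYhfgy[+"

TO_STANDARD = str.maketrans(WUSHU_ALPHABET, STANDARD_ALPHABET)
TO_WUSHU = str.maketrans(STANDARD_ALPHABET, WUSHU_ALPHABET)


def decode_wushu(text: str) -> str:
    """Translate to the standard alphabet, then decode as regular Base64."""
    return base64.b64decode(text.translate(TO_STANDARD)).decode("utf-8")


def encode_wushu(text: str) -> str:
    """Encode as regular Base64, then translate to the shuffled alphabet."""
    standard = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return standard.translate(TO_WUSHU)


print(decode_wushu("CmvsjXYqMUnIC5D="))  # Ersanio(GD)
print(encode_wushu("A"))                 # 44==
```

Both directions are the same trick with the tables swapped, which is also what makes it possible to write edited messages back into a chat log.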

Why is the chat encoded?

We saw earlier that chat messages are prepended and appended with mystery Base64 data. Decoding a full chat line with the shuffled table produces the following:

<font face="font_title_tasktrace" color="#cecbc6" >AAA</font><br/>

The game uses HTML to format chat messages.

This is just my speculation, but I think the developers added this extra hurdle to the Base64 encoding in order to prevent people from editing chat logs and inserting their own custom HTML. The game is hardcoded to filter out the "<" and ">" characters from any textual input in the game. I don’t know what malicious intent people could have in editing their own chat logs, as the other party cannot see these edits, but I cannot think of any other reason. Historically, earlier versions of the game did allow the "<" and ">" characters, which let players use links or even embedded images in the chat.

Just for clarification, the game doesn’t actually use HTML as per HTML standards. Rather, it has its own HTML-parsing engine which only recognizes certain tags. For example, I manually encoded a script tag into the chat logs, and the game did not recognize it.
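For reading old logs, the markup mostly gets in the way. Here is a small Python sketch that reduces a decoded line to plain text. `markup_to_text` is a hypothetical helper, and the approach assumes the only markup is simple tags like the ones in the example above; it leans on the fact that the game filters "<" and ">" out of what players type, so anything between angle brackets should be a tag.

```python
import re


def markup_to_text(markup: str) -> str:
    """Strip the game's HTML-like tags from a decoded chat line."""
    # Treat <br/> as a line break.
    text = re.sub(r"<br\s*/?>", "\n", markup)
    # Drop every remaining tag, such as <font ...> and </font>.
    text = re.sub(r"<[^>]*>", "", text)
    # Remove the trailing line break left over from the final <br/>.
    return text.strip()


decoded = '<font face="font_title_tasktrace" color="#cecbc6" >AAA</font><br/>'
print(markup_to_text(decoded))  # AAA
```

A regular expression is not a real parser, but since the game’s engine is not real HTML either, a full HTML parser would not necessarily be more faithful.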

Conclusion

Age of Wushu uses a shuffled Base64 index table, presumably to prevent players from editing their own chat logs in order to inject (malicious) HTML code. Now that this encoding is cracked, I’d say I opened a can of worms, but honestly, being able to read and edit your own chat logs is harmless.

I will probably write some sort of web interface which accepts an XML file and outputs readable chat on the screen. Imagine something like WhatsApp’s web interface. No backend server will be needed for this and users won’t have to be worried about their data being uploaded somewhere.

Considering I have several chat logs, I might also write a function which merges these chat files into one, sort of zipping them together, based on the recipients’ names and the moment of the sent messages.

This will be a good opportunity to practice TypeScript and Angular. I’ll open source the program on GitHub eventually.