Exploring the Prolog language

This semester, the course Advanced Software Development (ASD for short) included an assignment which requires me to write a blog about exploring a new programming paradigm besides the one typically taught at school (object-oriented).

We had the choice of picking one of the following five languages: Prolog, Scala, Erlang, Clojure and Haskell. Last year I picked Haskell with its functional programming paradigm, but due to time constraints, I couldn’t finish the assignment in time. Instead of trying something I’ve already done once, I thought I should pick something different and see what the other languages are like. Thus, I decided that this time around, I would look into Prolog.

The assignment consisted of three parts:

  • Picking a language and paradigm (already done that)
  • Writing (IT) blog posts about my experiences and opinions related to this selected language and paradigm, while working my way through the Prolog chapter in the book “Seven Languages in Seven Weeks” by Bruce A. Tate (referred to as “the book” from now on). Conveniently enough, I already have this blog in a rather unknown corner of the internet I can use. At times, I will also try to compare various features of the language with the programming languages and paradigms I already know.
  • Defining a challenge befitting the selected language and paradigm and blogging about my hardships and problems along the way.


Prolog is a logic programming language. Various resources on the internet mention that it’s good for Natural Language Processing and Artificial Intelligence, but when I did a quick Google search on Prolog’s real world applications with the query “what can Prolog solve”, I most notably got puzzle-related results such as a Sudoku-solver. Admittedly, I could’ve worded that query better. For this assignment, I will use the same tool as the book uses: “GNU Prolog”.

The book mentions that knowledge in Prolog is expressed through three building blocks: facts, rules, and queries.

A fact is a basic assertion about some world. This is the first time the book introduces the term “world”. I think another word for this could be “domain” or “context” or the likes. A rule is an inference about the facts in that world, while a query is a question about that world.

Example: likes

The book starts with a basic example:
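Reproduced here from the book’s chapter (from memory, so the exact facts may differ slightly):

```prolog
likes(wallace, cheese).
likes(grommit, cheese).
likes(wendolene, sheep).

friend(X, Y) :- \+(X = Y), likes(X, Z), likes(Y, Z).
```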

Lines 1-3 are the “facts” while line 5 is a rule. Lines 1-3 define some very simple facts such as “wallace likes cheese” or “grommit likes cheese”.

I saved this as a .pl file and then double-clicked it to run it in GNU Prolog, because during the installation, GNU Prolog associated .pl files with itself. After running the file, I had the option to give my own input, because I saw an input cursor blinking after the “| ?- ” characters in my Prolog console.

I inputted my query as the following: “likes(wallace, cheese)”, but after hitting return, all that happened was my cursor jumping to a new line. Moments later, I noticed that I was missing a period at the end of the statement. I suppose the “.” means “end of statement”, so that the user can input statements across multiple lines. Once I got the query correct, I got a “yes” as output. Changing the parameters a bit (“cheese” to “cheesea”, which is supposed to be a typo) gave me a “no”. Yes and no – synonyms for true and false in the programming world.

Lines 1-3 aren’t very exciting compared to line 5. There is a lot to say about line 5, which I’ll try to keep short. According to the book, the rule is called “friend/2”, which is shorthand for “friend with 2 parameters”. X and Y are the parameters in this case, while “:-” probably is Prolog’s equivalent of an assignment operator. Then you see three blocks divided by commas: “\+(X = Y)”, “likes(X, Z)” and “likes(Y, Z)”. The book says that X and Y cannot be the same, which is what “\+” is for, it seems.

As for the “likes” goals, I assume they’re like function calls which return a yes or no. When the results of all three blocks are a “yes”, then you get a “yes” as the output of “friend”. However: nowhere in the “friend” call do you specify “Z”. This rule checks that if two persons like the same Z (thing), then they are friends. What Prolog does there is check every single fact about Wallace and Grommit and see if they have a matching object they like. For example, Wallace and Grommit both like cheese. Therefore, “friend(wallace, grommit).” returns a “yes”. You don’t specify “cheese” anywhere; Prolog “just knows” that they both like cheese.

For the sake of experimentation, I changed line 5 to “friend(X, Y) :- (X = Y).” This should mean that a person is friends with themselves. When I called this as “friend(wallace, wallace).”, I got a “yes”. This confirmed to me that a negation indeed is written as a “\+”.

Interestingly enough, the code example can be visualized as a tree of some sorts:

Where an arrow out is the output. And if at some point one of the arrows contains a “no”, then the block the arrow points at would also output a “no”. I imagine this can go infinitely deep. To test this, I changed the example a bit, which could also be represented with a similar tree.

Example: food

The book proceeds to a second example related to food:
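Again reproduced from the book (some facts may differ slightly from the original):

```prolog
food_type(velveeta, cheese).
food_type(ritz, cracker).
food_type(spam, meat).
food_type(sausage, meat).
food_type(jolt, soda).
food_type(twinkie, dessert).

flavor(sweet, dessert).
flavor(savory, meat).
flavor(savory, cheese).
flavor(sweet, soda).

food_flavor(X, Y) :- food_type(X, Z), flavor(Y, Z).
```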

A food has a type, and a flavor belongs to a type. You find out whether a food has a given flavor by querying with the food and the flavor. What the food_flavor rule does is check whether the given food’s type and the given flavor’s type match up.

The book introduces a new keyword: “What”, then proceeds to query the fact as follows: “food_type(What, meat).”, which returns a prompt from the console: “What = spam ?”. From this point on, I can press ; to go to the next alternative, or a to list all alternatives. Prolog is telling me that spam is meat, but that there are more answers. When I hit ; I see “sausage”. But this was all executed against a fact. What about a rule, in this case, “food_flavor”?

When I call “food_flavor(What, sweet).”, I’m essentially asking “what food is sweet?” Then I get the following results: “jolt” and “twinkie”. This is actually pretty interesting, because Prolog finds out whether a food is sweet by looking up which food types have the sweet flavor, then matching those types against the foods in food_type.

I like how this wildcard-like variable is called “What” because you can almost make it a humanly readable inquiry: What food has a sweet food flavor?

Example: Map coloring
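The map coloring program, reproduced from the book’s chapter:

```prolog
different(red, green). different(red, blue).
different(green, red). different(green, blue).
different(blue, red). different(blue, green).

coloring(Alabama, Mississippi, Georgia, Tennessee, Florida) :-
    different(Mississippi, Tennessee),
    different(Mississippi, Alabama),
    different(Alabama, Tennessee),
    different(Alabama, Georgia),
    different(Alabama, Florida),
    different(Georgia, Florida),
    different(Georgia, Tennessee).
```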

The idea is that there are 3 colors: red, green and blue. Each state has a color, but adjacent states may not share the same color. So for example, Mississippi and Tennessee may not have the same color, and Alabama may not have the same color as any of its surrounding states. First, all valid color combinations are defined in “different” as facts; same-color combinations are simply not listed, so they are not allowed. Calling this code with “coloring(Alabama, Mississippi, Georgia, Tennessee, Florida).” produces the following output:

Alabama = blue
Florida = green
Georgia = red
Mississippi = red
Tennessee = green ? ;

And 5 more combinations. The fact that it managed to find valid colorings without me writing any search algorithm is something I have a hard time wrapping my mind around. All this code does is define facts and rules, and it manages to find all the alternatives. Yet I have a hard time understanding how the code really works, because the book doesn’t explain it in depth. Maybe I’ll figure it out later on when I have more experience.

Example: unification
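The unification example’s rule, reproduced from the book:

```prolog
dorothy(X, Y, Z) :- X = lion, Y = tiger, Z = bear.
```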

This final example introduces a new character: the equals sign (“=”). In Java or C#, = is used to assign something to a variable (var x = 1). But in Prolog, it means “find the values that make both sides match”. Calling “dorothy(lion, tiger, bear).” produces the following output: “yes”. But calling “dorothy(One, Two, Three).” produces:

One = lion
Three = bear
Two = tiger

I suppose to “unify” One, Two and Three, Prolog ‘changes’ them into lion, tiger and bear, respectively. The same goes for X, Y, Z in dorothy – those three get changed into lion, tiger and bear when the rule is invoked. I guess that’s also what is happening in the map coloring example: the state names get unified with red, green and blue, hence the [State] = [Color] output. It’s actually unification output.

If I call “dorothy(One, Two, bear).” it only outputs:

One = lion
Two = tiger

There is no line for bear, as it’s already “the same”.

Thoughts so far

Prolog is quite interesting so far. It is somewhat reminiscent of a database where you have a bunch of records and you can execute queries, yet something feels ‘different’ about it. I still can’t wrap my mind around the Map Coloring example. I know one way or another, unification is involved, but unfortunately the book doesn’t explain it very well. It felt like that example was there for the sake of glorifying Prolog like “look at the kind of complicated problems Prolog can solve!”.

Yet, I believe in Prolog’s power. In the self study of day 1, I added my own challenge where I find all genres a specified instrument is used in, which could be done in 2 lines of code or so.
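For reference, the Prolog side looked roughly like this (the facts and predicate names here are reconstructions of mine, not the original code):

```prolog
% facts: which instrument is used in which genre
instrument_genre(guitar, rock).
instrument_genre(guitar, blues).
instrument_genre(synthesizer, techno).

% collect every genre a given instrument is used in
genres(Instrument, Genres) :-
    findall(Genre, instrument_genre(Instrument, Genre), Genres).
```

Querying “genres(guitar, Genres).” then collects all matching genres in one go.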

Compared to that, here’s the C# version I quickly put together.

Quite the difference, isn’t it? In Prolog I don’t need to define data structures or tie together two structures with an algorithm. Instead of that, all I need to do is define a rule.


The chapter starts off with a recursive ancestor function. If it wasn’t recursive, I’m sure it’d be a huge chain of function calls instead, and the length of the chain would depend on just how deep you’d like to search. I imagine it would be a pain to maintain that list.
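The program from the book, trimmed to the facts that matter for the example below:

```prolog
father(zeb, john_boy_sr).
father(john_boy_sr, john_boy_jr).

ancestor(X, Y) :- father(X, Y).
ancestor(X, Y) :- father(X, Z), ancestor(Z, Y).
```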

First thing I notice here is how the rule is defined twice. How does Prolog differentiate between the two? The book mentions that ancestor/2 has two clauses: “When you have multiple clauses that make up a rule, only one of them must be true for the rule to be true.”

I think in this context, a clause simply means an implementation of a rule. So there are two ways to execute the rule: a simple lookup of the father, or a recursive lookup, and once one of them succeeds, there’s no need to execute the other. The book also says: “The second clause is more complex: ancestor(X, Y) :- father(X, Z), ancestor(Z, Y). This clause says X is an ancestor of Y if we can prove that X is the father of Z and we can also prove that same Z is an ancestor of Y.”

“zeb” is an ancestor of “john_boy_jr”. If I call “ancestor(zeb, john_boy_jr).”, I get a “yes”. I think what happens behind the scenes is this: Prolog tries the first clause, which fails; it then tries the second clause. “father(zeb, Z)” yields the children of zeb, in this case just john_boy_sr. This is then used in the recursive ancestor call, “ancestor(john_boy_sr, john_boy_jr).”, which starts evaluating from the first clause again. This time it succeeds; therefore, zeb indeed is john_boy_jr’s ancestor. The first clause in this case feels like the base case of the recursive function.

At first, I had difficulties understanding this. But writing it out step by step like this makes more sense to me. I now know that a rule can have multiple clauses and how recursion works.

Finally, the book also mentions we can use variables in a query, such as “ancestor(zeb, Who).”. It immediately dawned on me that the previous day I mistook “What” for a keyword. Rather, it’s just a variable named “What” for readability’s sake. In this context, the author decided to use “Who” instead because we’re talking about people. Unsurprisingly, I got the descendants of zeb, then the descendants of zeb’s descendants, and so on.

Lists and tuples

There’s not much to be said here. From what I gather from the book, a tuple is like a list but with a fixed size, written with parentheses, e.g. “(1, 2, 3)”. Lists, meanwhile, have a pipe operator which can separate them into a “head” and a “tail”:
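A quick console session showing both (my own transcript, not the book’s):

```prolog
| ?- (1, 2, 3) = (A, B, C).

A = 1
B = 2
C = 3

| ?- [1, 2, 3] = [Head|Tail].

Head = 1
Tail = [2,3]
```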

I imagine that you can “iterate” through a list if you recursively keep splitting it into a “head” and “tail” until you have nothing left. In fact, I did exactly that, printing each head along the way.
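The predicate, plus what it prints (the name print_list is mine):

```prolog
print_list([]).
print_list([Head|Tail]) :-
    write(Head), nl,
    print_list(Tail).

% | ?- print_list([a, b, c]).
% a
% b
% c
```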

What I find interesting about lists is that they are like arrays, except they can be accessed in two parts: the head and the tail. You don’t have that in C#, for example. In C#, you could use index 0 for the head, but getting the tail would require copying the array into a new array without the head.

Challenge: Textonyms

In order to delve deeper into the language (and because the assignment requires me to), I have come up with a challenge: write a ‘textonym finder’. Rosetta Code has a page dedicated to this, but it has no code example for Prolog, which likely means nobody has attempted this in Prolog before. A perfect candidate for an original challenge. I might even submit this to Rosetta Code for fun.

Originally I had an even more complicated challenge, but it had proven to be too challenging: creating a word searching puzzle. You know, the ones where you have a list of words and a grid of letters and you have to find words in it. The way I wanted to approach this problem was as follows:

  • Define the horizontal and vertical width of the grid
  • Define the list of words which can be used
  • Define that each word can be used only once
  • Define that the words can be on a horizontal or vertical grid (not diagonal for now) in either direction
  • Define that the letters of those words can somewhat cross each other

The first two points are easy, but things already became really blurry from the third point on, and I had no idea how to solve this problem.

I have come across a similar problem which was a crossword puzzle solver. Solving crossword puzzles is essentially like generating a word search puzzle, except that word search puzzles don’t have blanks.

A crossword puzzle

Unfortunately I really had no idea where to start even for crossword puzzles, and a bit of searching gave me some semi-popular crossword Prolog problem where they ask you to solve a symmetrical crossword puzzle. Trying to tinker around with the solutions for that problem quickly proved to be in vain, even with the GNU Prolog manual at my side, because that manual doesn’t have any practical code usage examples whatsoever.

Thankfully, I had a backup-challenge in case the first challenge proved to be difficult: textonyms.


In order to find the textonyms of a number, I have come up with the following steps, which in theory *should* work:

  • Create a database of some words: ‘test’, …
  • Create a dictionary of key mappings: (a, ‘2’); (b, ‘2’), …
  • Input the numbers as a string (e.g. ‘43556’ which would result in at least a ‘hello’).
  • Split the numbers into a list: atom_chars(In, OutList)
  • Produce every single combination of the list of numbers. For example, 43556 maps to GDJJM, HEKKM, IFLLO, GELKO, IEJJO, etc.
  • See if any of those combinations exist in the words database.
  • (Optional: load the database of words from a text file)

First off, I mapped each character to a number:
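The mapping, with the digit as the first argument (an ordering I chose here so it plugs directly into maplist/3 later):

```prolog
keymap('2', a). keymap('2', b). keymap('2', c).
keymap('3', d). keymap('3', e). keymap('3', f).
keymap('4', g). keymap('4', h). keymap('4', i).
keymap('5', j). keymap('5', k). keymap('5', l).
keymap('6', m). keymap('6', n). keymap('6', o).
keymap('7', p). keymap('7', q). keymap('7', r). keymap('7', s).
keymap('8', t). keymap('8', u). keymap('8', v).
keymap('9', w). keymap('9', x). keymap('9', y). keymap('9', z).
```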

Then, I defined some words:
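A handful of entries is enough for testing (the word/1 predicate name and its entries are mine):

```prolog
word(test).
word(hello).
```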

What I want to do is basically ‘map’ some numbers to a combination of characters. GNU-Prolog has a function exactly for that: maplist. This website seems to handle mapping extensively and shows many examples. One example that caught my interest is the following:

This is exactly what I needed, because I want to iterate over a list of numbers and ‘convert every character into something‘; applying a function on each element if you will.

In order to do that, I first had to write a conversion predicate. Actually, no, I didn’t have to, because I already have one: keymap. I can simply call it like this: keymap(‘9’, Out). That’ll give me every single character that belongs to the number 9. I then use keymap in maplist: maplist(keymap, [‘4’, ‘3’, ‘5’, ‘5’, ‘6’], Out).
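A sample run, assuming the keymap facts store the digit first:

```prolog
| ?- maplist(keymap, ['4', '3', '5', '5', '6'], Out).

Out = [g,d,j,j,m] ? ;

Out = [g,d,j,j,n] ?
```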

Pretty much what I wanted!

I have now generated a list of all possible characters for a certain number with a single line of code, although it took me three days to get this far, because I spent most of my time trying to write a recursive function instead. Finding that maplist page was a blessing.

Next up, I’ll need to take this output and see if any of it exists in the database of words. For that, I’ll need to write a function which takes a list of chars as input and either puts them together as a string, or splits up the words in the words database into lists of individual characters. The former sounds easier, so I’ll go with that.

After a bit of looking around, I found a built-in predicate called ‘atom_chars/2’. It can either combine characters into an atom or split an atom into characters. It is also handy for splitting the input number into a list of characters for the function above. Here’s an example of how it works both ways:
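A quick demonstration in the console:

```prolog
| ?- atom_chars(hello, X).

X = [h,e,l,l,o]

| ?- atom_chars(X, [h,e,l,l,o]).

X = hello

| ?- atom_chars('43556', X).

X = ['4','3','5','5','6']
```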


I used this function in my own function called charlist(In) in order to check if a list of characters exists in the words database:

charlist(In) :-
    atom_chars(Out, In),
    word(Out).    % word/1 is the words database predicate (name assumed)

It can successfully detect whether ‘hello’ exists or doesn’t exist:


Now the tricky part: putting it all together. So far I’ve made a few building blocks, and it’s time to build something functional out of it. So far, this is what I put together:
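The first version looked roughly like this (reconstructed; it glues the earlier building blocks together):

```prolog
mapchars(In) :-
    atom_chars(In, Digits),           % '43556' -> ['4','3','5','5','6']
    maplist(keymap, Digits, Letters), % one letter per digit; backtracking tries every combination
    charlist(Letters).                % succeeds if the letters spell a word in the database
```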

When I call mapchars with the number combination, I don’t quite get the desired result. Calling it with ‘43556’ (correct) and ‘12345’ (incorrect) gives me a boolean result rather than the matching word:

So close!

Somehow, I need to make it display the matching words. The function returns a boolean instead of the matched word. A eureka moment filled in the final piece of the puzzle: I need to return ValidWord from ‘charlist’ all the way back to ‘mapchars’, so I changed the final two lines. This is what the function looks like now:
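Roughly like this (reconstructed; the idea is to thread ValidWord through both predicates):

```prolog
mapchars(In, ValidWord) :-
    atom_chars(In, Digits),
    maplist(keymap, Digits, Letters),
    charlist(Letters, ValidWord).

charlist(In, ValidWord) :-
    atom_chars(ValidWord, In),
    word(ValidWord).
```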

And the function call:
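Something like this, assuming ‘hello’ is the only database word matching 43556:

```prolog
| ?- mapchars('43556', ValidWord).

ValidWord = hello ?
```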

Exactly what I wanted!

In the end, maplist saves the day! I expanded the database for entry 43556 some more with additional words:

And the result:

Curiously enough, GNU-Prolog is unable to tackle the first two words in the database due to environmental constraints:

I am unable to find a solution for this, even when I search for the exact error message “Atom table full”. Out of curiosity, I experimented to find the limits:

The maximum number of characters I can query is 9. Any more than that, and the program crashes.

Words database

To go the extra mile, I wanted to populate the database of words from a text file. I found a way to read a file line by line into a list, but didn’t find a way to create facts out of those words. Unfortunately, I had to leave this out of the challenge due to time constraints.

This is what the end result looks like.

Interfacing and real-world applications

Prolog is an interesting language, but I find it somewhat constraining that I have to work from the GNU Prolog console. Right now this is more of an academic context, but what if I wanted to actually use Prolog in one of my applications? For example, how do I call Prolog from C# or Java? Or what if I want to make visual applications and not just CLI ones? In this post I will briefly look into various options. Note that I won’t distinguish between Prolog dialects (GNU Prolog, SWI-Prolog, and so on).


Yield Prolog: Not exactly a Prolog compiler in C#, but it adds a “Variable” class which, in combination with C#’s yield statement, simulates Prolog’s unification.

NuGet also seems to have a lot of Prolog-related packages, although their download counts seem to be minimal:


JPL: It seems to allow for embedded Java code in Prolog, and the other way around.

B-Prolog: This also seems to have a bi-directional interface with Java but also C.

SICStus Prolog: According to this page, you can embed SICStus Prolog in Java or .NET application servers.


Tau Prolog: A Prolog interpreter written entirely in Javascript. This means that you can place Prolog code in Javascript files and it’ll execute. There’s also a Node.js NPM package for it.

Use cases

There are various use cases for Prolog, which I will list below.

Closing thoughts

Although there is quite some Prolog-related content on the internet from communities, most of it seems to be tutorials or people asking questions. Some websites I came across also look like they were written pre-2005. I have the feeling that nowadays Prolog is applied more in academic contexts than in modern-day applications, and that it is slowly fading into obscurity.

One major annoyance I had with Prolog is that there are dialects (GNU/SWI) which seem to differ a bit in functionality. SWI-Prolog shows up in searches more often, so it makes me wonder why the Seven Languages in Seven Weeks book decided to go for GNU Prolog. There also doesn’t seem to be an official IDE for Prolog, and debugging Prolog code is something I can only dream of.

It also feels like (at least GNU) Prolog is tailored for older machines, which I noticed when I tried to generate textonyms for more than nine numbers. Some websites also have ancient designs, which kind of gives me the impression that Prolog is an ‘ancient’ language, probably used mostly in legacy systems nowadays.

Besides the annoyances, I found Prolog itself to be quite interesting, especially with unification being able to go both ways. For example, atom_chars can split an atom into chars, or merge chars into an atom. Normally in C# you’d have two separate functions for that. In Prolog you just define a variable and Prolog fills it in for you. That’s mainly what my textonym program uses too, although the opposite doesn’t seem to be possible (finding the numbers for a word) because of the way I used atom_chars.

Truthfully, I just can’t help but have mixed feelings about Prolog as a language. Logical programming as a concept, however, is most interesting.

The making of my SNES assembly tutorials

I’ve been meaning to write a post after releasing the most recent version(s) of my assembly (ASM) tutorials, but never got around to it until now. Here’s a shameless plug of the tutorials in question:

Ersanio’s ASM Tutorial – Assembly for the SNES (v2.3)

Ersanio’s ASM Tutorial – Assembly for Super Mario World (v1.1)

If you’re an SNES ASM hacker, chances are that you’ve heard about these. So yeah, I’m the author (if that wasn’t obvious already, hehe).

Here’s a picture of an outdated physical version which I printed out with the remainder of my high school printer credits before graduating back then. Gotta put that money to use somehow!

History and idea

“Ersanio’s ASM Tutorial” is an idea I came up with all the way back during my high school years. SMWCentral’s archives show that my very first ASM tutorial thread was made in November 2008. I was 16 back then.

I learned ASM because of people like Ghettoyouth and Bio. They were pumping out incredible ASM work and it really fascinated me. I didn’t know a thing about programming but I was set on learning ASM anyway. However, there was a problem: because the ASM field was rather young, documentation was difficult to find.

After a lot of searching on the internet, as well as repeatedly asking for help from people who knew ASM, I finally learned ASM myself. It was a tough journey, but in the end it all worked out. Yet, I was still bothered by how scattered the information was. I saw people who were willing to learn ASM but had a hard time because of that scattered information. At the same time, I felt bad for the people who knew ASM, because I’m sure they were bombarded with ASM-related questions… myself included. That is how I came up with the idea of making an ASM tutorial. In a way, I wanted people to experience my passion as well, so this tutorial would be some kind of helping hand for people who wish to learn ASM.

Writing my ASM tutorials

Writing my ASM tutorial was actually challenging, because the target audience is people who don’t know ASM. It’s very hard to imagine the things the audience doesn’t know, because you yourself already know ASM, so some of the knowledge is “common sense” to you as the writer. This kind of tutorial is one which teaches the audience a “skillset” rather than a “utility”; if I were to write a custom block inserter tutorial instead, that would’ve been easy, because the tutorial would be about how the tool works. It teaches the reader a “routine”. Teaching a skillset such as ASM, on the other hand, is like telling the reader that there’s a bunch of abstract concepts and they need to somehow combine them to achieve a result.

This shouldn’t come as a surprise, but first and foremost, when you write a tutorial, make absolutely sure that there are no mistakes whatsoever. You wouldn’t want to teach the reader false information. This is the most important rule I followed.

When I started writing my ASM tutorial, I simply wrote a bunch of topics as chapters (kind of in the order I learned them), then started filling them with content. I asked for a lot of feedback from people who both know and didn’t know ASM which was a huge help.

There was a balance I needed to keep while writing the tutorial. I wanted to explain exactly what happens behind the scenes when you execute a certain piece of code, but this risks making the tutorial extremely verbose, as well as introducing advanced concepts very early on. Here’s an example of seemingly simple code which could suffer from that problem:

LDA #$01
STA $19

In the context of Super Mario World, this is simply giving the player the Mushroom powerup status. But what happens behind the scenes in ASM is much more complicated:

  • A is in 8-bit mode
  • The direct page register is $00
  • The data bank register is $00
  • What is RTS? Return? To where?
  • The processor flag ‘zero’ is cleared

This is a lot of information to introduce to a beginner, so I had to keep it to the bare essentials, often simplifying such information to things like “when the address is between $7E0000-$7E00FF, you can simplify it to $00 to $FF”, because this is the most common use case of direct page.

Another important thing to keep in mind when teaching a skillset is the use of examples. You can write down a lot of facts, but without examples they won’t stick with the reader. Examples are a chance to get the reader involved and make them think about the code. Visual aids can also be helpful, such as a visual representation of the stack.

Interestingly enough, having a good vocabulary also helped me improve my ASM tutorial a lot. An extensive vocabulary helps you convey information in more understandable language. In the beginning my English wasn’t the greatest, but that didn’t stop me from writing the initial versions of my ASM tutorial. As my English improved, I found that some of the things I had written were of poor quality. I essentially ended up almost entirely rewriting my ASM tutorial because of this, but I think it was well worth it.


Very recently I came across this online book sort of thing: Model-Based Machine Learning. Machine learning is interesting in itself but what interested me more is the format of this book. I really like how you can just click through the book and how it’s organized. I’m not sure if they used some kind of tool to generate this online book, but maybe I’ll put my ASM tutorials in this format as well one day even if I have to do it manually.

Closing words

I would like to believe that my ASM tutorials were a great contribution to SMWCentral. However, I didn’t expect them to be recommended in other communities as well. p4plus2 and Raidenthequick mentioned how my tutorials are being used in other communities too, and I was really amazed by this.

I think it’s safe to say that both of these ASM tutorials are my proudest work in my hobby of ROM hacking.

DEFLATE on the Super Nintendo Entertainment System

I’ve been meaning to write this post for a while now, but I never got around to it until today.

Recently, p4plus2 and I released a new SNES project – an inflate algorithm for the SNES. It is capable of decompressing DEFLATE files, and the code is some serious wizardry.

Without further ado, here’s the C3 thread:
It contains all the data we found as well as a brief explanation of the format and possible applications.

And the repository link:

How the idea came to be

Before last C3, I came up with the idea of porting a certain piece of 6502 code to the 65c816, i.e. the processor of the Super Nintendo Entertainment System. It’s an inflate algorithm written by Piotr Fusik. A few years ago (really), p4plus2 linked me this and I forever stashed it in my “projects to do” list, not really understanding what DEFLATE really does. At least, until very recently.

During my latest semester I learned about graphs and trees, and that is exactly what DEFLATE uses: a Huffman tree. I finally started to understand what DEFLATE really is, and found lightweight tools to compress data, so now I had things to test with. I told p4plus2 about the idea of porting DEFLATE to the SNES and he told me he’d been wanting to do this as well, so this became a kind of collaboration project. Ultimately, I wanted to implement this in Super Mario World, as I know that some hack collabs have suffered from ROM space issues. There are solutions (such as bankswitching using the SA-1), but solving the problem without any enhancement chips sounded more challenging.

The initial approach

At first, I thought it’d be a matter of “porting an old processor’s opcodes to its successor’s opcodes”, which would give the routine some kind of performance boost simply by utilizing newly-added opcodes, so you wouldn’t need to “beat around the bush” anymore to get something done. For example, the 6502 doesn’t have stack instructions for the index registers, so you would need to transfer an index register to A first, then push that using PHA. That’s two steps for a push. On the 65c816, you can simply use PHY or PHX. The first thing I did was to make sure the 6502 code worked on the SNES without any optimizations whatsoever. This worked, but then p4plus2 took the project to the next level.

Benchmarking decompression algorithms

p4plus2 has ways to measure code performance on the SNES by using a custom-built SNES debugger. He measured them in “clocks” and we used a compressed GFX00.bin as our test file.

The ported 6502 code used 7415572 clocks (~21 frames). 21 frames might not seem like a lot, but consider that I want to port this to Super Mario World, which can load up to 11 graphics (GFX) files per level. That’d be roughly 231 frames, give or take a few depending on how well the files are compressed: almost 4 seconds on the level loading screen just for decompressing GFX files. The original SMW decompression algorithm (LC_LZ2) already bothered me to the point of initiating a very successful ASM collab to optimize it.

By p4plus2’s request, I made a bunch of “wrapper” homebrew ROMs which ran various decompression routines on GFX00.bin. I compressed GFX00.bin into LC_LZ2 and LC_LZ3 and ran three decompression algorithms on it: SMW’s original LC_LZ2 decompressor, the highly optimized LC_LZ2 decompressor and the LC_LZ3 decompressor which was based on the highly optimized LC_LZ2 decompressor. p4plus2 benchmarked the ROMs (including the ported 6502 inflate code) and got the following results:

PERFORMANCE: Decompressing GFX00.bin
lz2 (original)	1638070 clocks
lz2 (optimized)	599616 clocks
lz3		1041172 clocks
DEFLATE 	7415572 clocks

The results kind of scared me; the inflate routine would need a LOT of optimization. What’s more, p4plus2 recommended that we just start over from scratch so that we would have full control over the code, as making a small change sometimes required edits in several places in the source. To check whether this project was worth it at all, I decided to compare DEFLATE to other formats.

Comparing various compression formats

(Admittedly I should’ve done this step before even starting the project. What if the compression format had been inferior?)

I compressed all the GFX files of SMW in DEFLATE using zopfli and compared the results against the other compressed formats to see whether DEFLATE is a superior compression format or not.

The results were incredible: DEFLATE beat even LC_LZ3, which is considered an “upgrade” of LC_LZ2:

As you can see, DEFLATE saves hundreds of bytes compared to LC_LZ3. Across all GFX files, DEFLATE saves 0x4A53 (19,027) bytes total compared to LC_LZ3, which is a HUGE improvement within the context of SNES ROM space. It’s funny to see the compression graphically as well:

In LC_LZ2 you can still see some of the original graphics just slightly distorted. LC_LZ3 is even more distorted. Then you have DEFLATE which is just… a garbled mess. You can see that every single byte is processed one way or another, thanks to the Huffman coding.

Code with momentum

p4plus2 decided to rewrite the routine from scratch, and I applaud his fortitude. As I still had trouble understanding DEFLATE, I mostly took a supporting role: finding optimization points in the code, suggesting edits, making minor optimizations to his code, and brainstorming with him in general.

At first, p4plus2 wrote the code unoptimized, but in a form that would be easy to rework into an optimized version later. Many existing implementations generate the static tree dynamically in RAM, but we decided to just put the static tree in the ROM. After applying many optimization tricks, we got the code down to 4753212 clocks, from 7695346. It still doesn’t beat the previously mentioned decompression routines, but it’s one hell of an improvement.
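Putting the static tree in ROM works because DEFLATE’s fixed Huffman code is fully determined by the spec (RFC 1951) and never depends on the input data. As a sketch of what that baked-in table contains, here is the canonical code construction from the spec in Python:

```python
# RFC 1951's fixed literal/length code is defined purely by code lengths:
# symbols 0-143 use 8 bits, 144-255 use 9, 256-279 use 7, 280-287 use 8.
FIXED_LENGTHS = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8

def canonical_codes(lengths):
    """Assign canonical Huffman codes from code lengths (RFC 1951, 3.2.2)."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        bl_count[l] += 1
    bl_count[0] = 0
    # Smallest code for each length
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out codes in symbol order
    codes = []
    for l in lengths:
        codes.append(next_code[l])
        next_code[l] += 1
    return codes

# None of this depends on the compressed input, so the resulting table
# can live entirely in ROM instead of being rebuilt in RAM on every call.
FIXED_CODES = canonical_codes(FIXED_LENGTHS)
```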

Optimization tricks

p4plus2 already had experience optimizing SNES code, as he had worked on a few TASbot projects before, which also required insane amounts of optimization. Most of the optimization was pretty ‘basic’: unrolling loops, inlining functions rather than calling them, DMAing direct copies rather than using block moves, and rearranging branches so that they are taken less often. We also removed some redundant opcodes (such as CLC/SEC), which required a bit of code analysis and experimenting.

The most interesting optimization trick involves messing around with the stack pointer, and honestly I think it enters black magic territory. Take a look at this code:

An inexperienced ASMer would immediately notice that there’s a pull without a push. That would cause the program to crash! …But there’s a catch. First of all, this entire code runs in 16-bit mode as mentioned earlier. Second, code_length_codes is a table of 16-bit values, and the stack pointer is set to its address. As the stack pointer is a 16-bit value, it can also point into bank 00 of the ROM. Finally, PLX increases the stack pointer by 2 (as we’re in 16-bit mode). As a result, this code reads out the bytes and stores them in the RAM addresses defined in the code_length_codes table. Every PLX is basically shorthand for a handful of instructions: read an entry of code_length_codes, transfer it to the X register, then increase the index by 2. This is quite the extreme (and clever!) optimization, but it was only possible because the stack wasn’t used in this area.

The second extreme optimization involves the stack pointer register and a bit of the direct page register. There’s no stack-pulling magic here, however.

This routine is the main bottleneck of inflate. It’s called hundreds of times so it only makes sense to optimize this routine as much as we can, even if it means shaving off 1 cycle at a time.

First, we used the direct page register to clear the accumulator, because it’s always nice to be able to assume the direct page register is set to 0000h. Then we basically used the stack register for scratch RAM purposes; it’s faster than using actual scratch RAM. A TCS/TSC takes two cycles, while an “LDA $dp” or “STA $dp” in 16-bit mode takes four. Likewise, a TDC takes two cycles while an “LDA #$0000” takes three. Because the routine gets called hundreds of times, these optimizations add up to hundreds of cycles saved!
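Some back-of-the-envelope math for those savings, using the 65816 cycle counts quoted above. The call count is a hypothetical stand-in for “hundreds of times”, and the real total depends on how often each replaced instruction appears per call:

```python
# 65816 cycle counts quoted above (16-bit mode, direct page = $0000):
# TCS/TSC = 2 cycles vs LDA/STA $dp = 4; TDC = 2 cycles vs LDA #$0000 = 3.
saved_per_use = {
    "TSC instead of LDA $dp":    4 - 2,
    "TCS instead of STA $dp":    4 - 2,
    "TDC instead of LDA #$0000": 3 - 2,
}

calls = 500  # hypothetical; the routine is called "hundreds of times"
total = sum(saved_per_use.values()) * calls
print(f"~{total} cycles saved over {calls} calls")
```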

Future projects

With the recent developments on the SA-1 chip thanks to Vitor Vilela and his SA-1 pack, the possibility of an SA-1 port is very real. This would mean that the inflate code could run even faster with minimal modifications to the code itself. What’s more, inflate could be optimized even more with a barrel shifter and SA-1 just happens to have one.

Personally, I would also like to make a SuperFX port of inflate so that the Yoshi’s Island community can benefit from the compression, although compression tests show that LC_LZ16’s compression is superior in some cases. I don’t think that will hold me back, though. As I don’t fully understand the DEFLATE format yet, I plan to write inflate in C# first to grasp the essentials of the decompression algorithm. Once I truly understand how inflate works, I can give the SuperFX version a serious attempt. I know that if I were to write one right now, with my current knowledge, all I would do is take the SNES code and port it to SuperFX like some kind of tool.

Closing words

This is possibly the grandest SMW hacking ASM project I’ve worked on so far. I’m really glad that p4plus2 and I worked on this project together. If I had tried this solo, I would’ve simply ported the 6502 code and called it a day (though the project technically would’ve been successful)!

There were so many steps involved. It wasn’t a matter of “oh, I got the idea to code LevelASM, I’ll just code it real quick”; it required careful planning, experimenting, and a solid understanding of the algorithm.

In my opinion, this project is revolutionary for SNES ROM hacking in general, and I hope that other people will find even more optimization points. Furthermore, I hope that FuSoYa officially implements this in Lunar Magic once there’s a working SA-1 port.

Thanks to p4plus2, I learned new (optimization) tricks for the SNES. I also confirmed my suspicions that he is an actual wizard.

A Chip-8 emulator for the Super Nintendo Entertainment System

Update: Code snippets are now in Gist.

Oh hey. It has been ~2 years since my last post. Guess it’s time to change that.

So yeah, I coded a Chip-8 emulator for the SNES. You can view the project here:


And here is the playable SNES ROM!

The namesake is inspired by “Snes9x” (I never knew what the 9x really meant). I didn’t want to simply call it Super Chip-8, because there’s a variant of the Chip-8 called SCHIP (Super Chip) and that could cause confusion. Conveniently enough, the name already had a number in it, so I just stuck an “x” at the end.

How the idea came to be

Writing a (crude) SNES emulator has always been on my wishlist, so I started doing research on the SNES as well as emulation in general. Google seems to have taken notice of my emulation-related searches: during my daily routine of checking the Google app for stories to read, it suggested this article:

Writing a CHIP-8 emulator with Rust and WebAssembly

Upon seeing the title alone, I had three questions in mind: what is Chip-8, what is Rust, and what is WebAssembly? I looked into the latter two and they didn’t really interest me. Looking into Chip-8 was another story, though.

I seem to have a soft spot for emulation, assembly and old/retro consoles in general. So when I read about the Chip-8, I noticed how simple it is (it’s considered the “hello world” of emulation) and I really got into the idea of writing an emulator for it. Then I found out there are already, like, a thousand emulators for it. It was then that I had this impulsive thought: the Chip-8 is so simple, it could probably run even on the SNES. Amused by my own idea, I set up the project directory.

Setting up the project

To work on this project, all I needed was three tools:

  • A text editor: Notepad++
  • An assembler: asar
  • An accurate emulator/debugger: bsnes-plus

I also needed proper Chip-8 documentation, so I used “Cowgod’s Chip-8 Technical Reference”.

And… that’s about it. The rest was up to my coding and problem-solving skills.

Actual coding

Prior to this project, I had never finished a SNES homebrew ROM. The greatest extent of my homebrewing was activating the screen and displaying things; in terms of gameplay or controller input, I had done nothing. This was going to be a whole new experience. I decided to approach this project in my own way: play it really safe and define every RAM address and (almost) every magic number. I also decided not to optimize the code, because I think I would’ve lost control over my code really fast if I had done that from the very beginning.

The display of the Chip-8 is 64×32. The display of the (NTSC) SNES is 256×224 by default. p4plus2 gave me the idea to use mode 7. It has a very simple graphics format (basically 1 byte per pixel) and you can scale the screen, so I could scale the Chip-8 display up to 256×128. At least people won’t have to squint when using the emulator. Because the scaled-up horizontal resolution is a perfect 256 pixels wide, I decided to use HDMA to color the screen boundaries.

In order to emulate the Chip-8, I had to allocate some RAM for its registers.

These are all in the SNES direct page (except for the memory), which would allow for slightly faster access to the registers.

I also allocated some RAM for opcode parameters. Each opcode can be ‘dissected’ into 6 variables: the opcode itself and 5 parameters. All of these variables are filled in regardless of the opcode currently being processed.
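As a sketch of that dissection (the variable names here are illustrative, not the emulator’s actual RAM labels), every 16-bit Chip-8 opcode breaks down into overlapping nibble fields:

```python
def dissect(opcode):
    """Split a 16-bit Chip-8 opcode into six variables:
    the opcode group itself plus five parameters."""
    return {
        "op":  (opcode >> 12) & 0xF,  # high nibble selects the handler
        "x":   (opcode >> 8) & 0xF,   # VX register index
        "y":   (opcode >> 4) & 0xF,   # VY register index
        "n":   opcode & 0xF,          # 4-bit immediate
        "kk":  opcode & 0xFF,         # 8-bit immediate
        "nnn": opcode & 0xFFF,        # 12-bit address
    }
```

For example, `dissect(0x8AB4)` yields `op=8, x=0xA, y=0xB, n=4`: ADD VA, VB.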

Inside the main game loop, I added a subroutine call to an opcode parser. Because every opcode is exactly two bytes, it was a matter of reading an opcode, then increasing the program counter by 2 to get to the next instruction.
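The fetch step itself is about as simple as emulation gets; Chip-8 stores its opcodes big-endian, so it is one two-byte read plus a program counter bump:

```python
def fetch(memory, pc):
    """Read one two-byte, big-endian Chip-8 opcode and advance the PC by 2."""
    opcode = (memory[pc] << 8) | memory[pc + 1]
    return opcode, pc + 2
```

For example, `fetch([0x00, 0xE0], 0)` returns `(0x00E0, 2)`, the CLS (clear screen) instruction.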

I made use of a pointer table indexed by !Opcode. Each opcode has its own function, but certain opcodes act as a ‘container’ for another group of opcodes. The greatest example is opcode $08 – “Arithmetic”. The SNES – in my opinion – handles pointer tables pretty well, and I had no problems making yet another pointer table for those container opcodes.
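The two-level dispatch can be sketched like this (handler names and the dict-based tables are my own illustration; the emulator uses pointer tables in ROM): a main table keyed on the high nibble, and a second table for container opcodes like $8, which dispatch again on the low nibble:

```python
# Handlers for a few of the 8xy_ "Arithmetic" sub-opcodes (illustrative).
def op_ld(x, y, n):  return ("LD",  x, y)   # 8xy0: VX = VY
def op_or(x, y, n):  return ("OR",  x, y)   # 8xy1: VX |= VY
def op_and(x, y, n): return ("AND", x, y)   # 8xy2: VX &= VY

ARITHMETIC = {0x0: op_ld, 0x1: op_or, 0x2: op_and}  # ... up to 8xyE

def op_arithmetic(x, y, n):
    # The 'container' opcode: dispatch a second time on the low nibble.
    return ARITHMETIC[n](x, y, n)

MAIN_TABLE = {0x8: op_arithmetic}  # ... other high nibbles omitted

def execute(opcode):
    op = (opcode >> 12) & 0xF
    x  = (opcode >> 8) & 0xF
    y  = (opcode >> 4) & 0xF
    n  = opcode & 0xF
    return MAIN_TABLE[op](x, y, n)
```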

My biggest problem was coming up with a proper solution for Chip-8 input. Officially, the system has sixteen keys. The SNES only has twelve buttons, and you can’t use all twelve for conventional input. Start and Select sit in the middle of the controller, and reaching them with your thumb forces you to let go of the D-pad or the ABXY buttons. This leaves you with 10 buttons, and there’s no clean way to map 16 keys onto 10 buttons.

Then I got the idea of using button combinations. You could map the 16 keys to the ABXY buttons by using button combinations with the L+R buttons:

  • ABXY
  • L+[ABXY]
  • R+[ABXY]
  • LR+[ABXY]
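The four modifier states times the four face buttons give exactly the 16 keys needed. A sketch of that mapping (the key numbering is my own; the Chip-8 keypad layout assigns keys differently per ROM anyway):

```python
# Four face buttons x four L/R modifier states = 16 Chip-8 keys.
BUTTONS   = ["A", "B", "X", "Y"]
MODIFIERS = ["", "L+", "R+", "LR+"]

KEYMAP = {mod + btn: (m << 2) | b
          for m, mod in enumerate(MODIFIERS)
          for b, btn in enumerate(BUTTONS)}
```

So plain "A" is key 0, "L+B" is key 5, and "LR+Y" is key 15.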

However, using this control scheme for every single ROM would be very awkward and uncomfortable. So I got the idea to make a custom control scheme for every playable ROM.

It involved extra work, but in the end, it was worth it. Technically the emulator supports all 16 keys, but at the same time, you can set intuitive controls for each playable ROM.

Closing words

In the end, I am surprised I was able to complete this project at all. I think the very idea of coding an emulator for the SNES kept me going. Personally, I think it’s a pretty crazy idea.

SMWCentral’s C3 event was also nearing so I thought it was the perfect opportunity to finish a project and show it off to everyone.

The fact that I can tell people that I coded an emulator for the SNES, no matter how simple the Chip-8 system may be, is something that I can be proud of.

Idea: A potential game engine

Most SNES ROM hackers have probably realized how limiting the SNES can sometimes be. You can only have 8 (H)DMA channels, the number of OAM tiles on screen is limited (think of the 33’s Range Over and 35’s Time Over limitations), VRAM is sometimes too small for a level with many varying graphics, etc. I guess that’s understandable: consoles didn’t have much processing power back in the day, as the technology was limited and possibly expensive as well.

Limited ROM space is also a major downer. Nintendo fit an entire game – Super Mario World – into a 512kB ROM using various tricks: LC_LZ2 compression, reducing graphics from 4bpp to 3bpp, RLE compression, etc. They could only afford so much ROM space. As time passed, ROMs got bigger and bigger, but they were still too small to unlock the extreme potential of the SNES (like allowing movie sequences and/or making quick time event games).

The PPU registers are also pretty limiting. Certain registers can only be written to during V-blanks (like writing new graphical data). Write too much and you risk flashing scanlines at the top of the screen.

Sometimes, certain SNES limitations can be bypassed. There were enhancement chips which allowed bankswitching for more ROM space, for example, or chips like the SuperFX which let you write bitmap data directly. However, even with enhancement chips, some limitations just don’t change. You can’t increase the number of audio channels, you can’t increase the number of layers, you can’t increase the number of sprites shown on screen, etc.

Finally, coding for the SNES has always been a pain to begin with. You code in ASM (Assembly). You write a bunch of LDAs and STAs and hope things work out. It’s very unreadable.

A question I’ve had in mind for a long time is: “How do I overcome the SNES limitations?” Of course, there are multiple answers, ranging from something as simple as “just don’t attempt it” to “even if you overcome them, it’ll have limited practical use”. So I decided to look at it from another point of view: emulators.

It should be possible to modify emulators to impose fewer limitations, but then the concept of “ROM hacking” wouldn’t remain the same anymore. Sure, you can allow 512 or more layers, but obviously this won’t run on the actual SNES. It’ll pretty much be like building your own game engine, and you’re still limited to ASM, of course.

That thought led to another idea: how about building a game engine based on the SNES’ hardware, but with its limits gone? You’d have infinite graphics space. You’d have infinite layers. You’d have infinite music channels, infinite palettes, etc. You could have as many HDMA channels as you want affecting layers, brightness registers, and so on. You wouldn’t have to allocate RAM for variables manually (declaring a variable in a high-level language just picks a free address for you). You could display as many sprite/OAM tiles as you want, as long as your graphics card can handle it. And so on.

For example, want a level with very neat parallax scrolling? Pick 5 layers with 4bpp graphics. Or pick 5 layers with 8bpp graphics with one set of palettes for each 8bpp layer. Or you could have a mode 7 background while you have a fully playable level.

The only question is: how would this be built? Take bits and pieces from emulators and make your own engine, or start completely from scratch? Personally I’d attempt the former, but I wouldn’t even know what language to start in. Oh well.