Description
LicLacMoe is a desktop application that allows you to play tic-tac-toe against local Large Language Models (LLMs).
I released the first official version on May 4th, 2025. The name, of course, is a play on tic-tac-toe, replacing the first letter of each word with the initials of "large language model".
The application assumes you have an LLM server running on your machine (this is a deliberate choice), by default on port 8080, though that is configurable. It presents the player with a visual tic-tac-toe grid used for playing: once the player makes their move, a call to the local LLM server is made with the current state of the match, so that the LLM can pick the AI opponent's next move. The entire interaction with the LLM is through playing, with no conversational interface.
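To illustrate the shape of that interaction, here is a minimal sketch in Python of what such a call could look like, assuming the local server exposes an OpenAI-compatible chat completions endpoint (as, for example, llama.cpp's server does). The prompt wording and function name are illustrative, not the project's actual code:

    import json
    import urllib.request

    SERVER_URL = "http://localhost:8080/v1/chat/completions"  # default port; configurable

    def ask_llm_for_move(board: str) -> str:
        """Send the current match state to the local LLM server, return its raw answer."""
        payload = {
            "messages": [
                {"role": "system", "content": "You are playing tic-tac-toe as O. "
                                              "Answer with the cell to play, e.g. 'B2'."},
                {"role": "user", "content": f"Current board:\n{board}\nYour move?"},
            ],
            "temperature": 0.2,
        }
        request = urllib.request.Request(
            SERVER_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["choices"][0]["message"]["content"]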
I developed LicLacMoe as a way to explore using LLMs in a way that does not involve any conversation between the user and the AI system. Chatbots have become almost synonymous with LLMs, in large part due to how they were popularized, so it was an interesting experiment to use them in a completely different manner.
Context
I developed LicLacMoe as I was exploring how to build systems using generative AI technologies for the first time. I blogged about the set of 4 projects that came out of this exercise in my post entitled "First Steps Into AI Engineering". LicLacMoe was the fourth and last of these, and the most purely exploratory one.
As described in the post just mentioned, for LicLacMoe I wanted to write most of the code myself, without relying too much on frameworks, so as to get a better feel for working with these models. The choice to explore only open, locally run AI models is also deliberate: I want to find out how far we can go working only with models that can be fully personal and owned by their users.
Highlights
Interacting with LLMs without chatting
LLMs have caught our full attention due to their uncanny ability to behave like a human being in a conversation. However, the big question on everyone's minds has been whether the models actually have some degree of reasoning intelligence, or whether they are just really good at reproducing our patterns of communication (the obvious philosophical question must be mentioned, of course: could it be that there is no difference?). To a certain degree, the appearance of reasoning models, and the current trend of agentic AI, have shown that LLMs can definitely be exploited for some amount of reasoning intelligence, but by and large the question still remains. My first intent with creating an application that uses LLMs without any chat interface was to see how it would feel to use LLMs purely as a source of thinking, without any verbal communication. While the long time it takes to generate an answer can be a bit frustrating, overall it was a positive experience - it is a really, really weird way of interacting with a computer system.
My second intent was simply to get used to incorporating LLM answers into a bigger system, as part of the user interface. I honestly think chatting (especially when you have to type long messages) is far from the best interface for any complex computer system. In order to make full use of the potential that generative AI systems have, we must learn to incorporate them seamlessly into our flows, and that includes our computerized applications. This was just a first step in that direction; I have several other ideas I want to explore further in this regard.
Not needing to code game rules and strategy
As mentioned previously, using LLMs for pure intelligence is a really weird experience. One of the weirdest parts was that, in order to implement LicLacMoe, I did not have to implement a strategy that knew the rules of tic-tac-toe at all. I still implemented the logic of the game in order to verify the end result of matches, but I think with a little more development time I could have replaced even that with well-crafted prompts.
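That remaining game logic is small; a sketch of what such an end-of-match check could look like (illustrative, not the project's actual code):

    # Board is a list of 9 cells containing "X", "O" or "" (empty).
    WINNING_LINES = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]

    def match_result(board: list[str]) -> str | None:
        """Return "X" or "O" for a winner, "draw" for a full board, None if ongoing."""
        for a, b, c in WINNING_LINES:
            if board[a] and board[a] == board[b] == board[c]:
                return board[a]
        return "draw" if all(board) else None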
I am sure this was in large part due to tic-tac-toe being an extremely simple and popular game. It is reasonable to assume that most (if not all) models will have seen enough examples of matches and descriptions of the game to have memorized a pretty good understanding of how to play it. The same would most likely not be the case for more complex games - I find it very interesting to think about how complex a game it is possible to teach an LLM simply by feeding it enough examples.
Regardless, it felt very odd to rely on a system that "just knew" the rules, and to which I could simply feed the current state of the match and it would produce a next move. Of course, it would not always be a valid move (error handling and retry policies were more essential here than in any other LLM-based system I have implemented so far), nor a particularly brilliant one. But even small models would consistently give something workable in a reasonable amount of time (and number of retries).
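A sketch of what such a retry loop could look like, building on the earlier sketches (render_board and parse_move are hypothetical helpers, not the project's actual code):

    class NoValidMoveError(Exception):
        pass

    def next_ai_move(board: list[str], max_retries: int = 3) -> int:
        """Ask the LLM for a move, retrying until it names an empty cell."""
        for _ in range(max_retries):
            answer = ask_llm_for_move(render_board(board))  # call sketched earlier
            move = parse_move(answer)                       # e.g. "B2" -> cell index
            if move is not None and 0 <= move < 9 and board[move] == "":
                return move
        raise NoValidMoveError("LLM kept producing invalid moves")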
Reasoning vs non-reasoning models
This leads into the final interesting note. While testing the application, I found that non-reasoning models would mostly generate moves that looked a bit random, and could very easily be defeated. I had to make some changes to the logic that parses the answer from the LLM to support reasoning models - however, switching to these models drastically improved the performance of the AI player. I tested it with Qwen 3, 8B parameters, 8-bit quantization - a rather small model as far as LLMs go. In comparison, the non-reasoning model I used was Gemma 3, 27B parameters, 8-bit quantization, a model more than 3 times the size. While I have never been a huge fan of reasoning models (for my common use cases they usually don't offer much improvement, and are considerably slower), in this particular case it was easy to see the value that such models bring.
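The parsing change mostly amounts to ignoring the model's thinking block before extracting the move. Qwen 3, for instance, typically wraps its reasoning in <think>...</think> tags, so a sketch of the adjustment (assuming that output format) could be as small as:

    import re

    def strip_reasoning(answer: str) -> str:
        """Drop a <think>...</think> block, if present, keeping only the final answer."""
        return re.sub(r"<think>.*?</think>", "", answer, flags=re.DOTALL).strip()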
Future Expansions
Benchmark of performances
As mentioned before, while testing the application I used a non-reasoning 27B-parameter model (which performed badly) and an 8B reasoning model (which performed significantly better). One thing I would like to do, if I ever have the time, is put together a more comprehensive benchmark of performance across several models of different families and sizes. I would be especially interested in seeing how small we could go with a reasoning model and still have it avoid defeat in most matches. I would be pleasantly surprised if this is possible with a model smaller than 4B.
Induce reasoning for non-reasoning models
Another interesting exploration would be to craft the base prompt so that even non-reasoning models would think about the current state first, before choosing a move. This is easily done with very popular prompting techniques that force step-by-step thinking, such as chain-of-thought prompting. It would involve changing the base prompt and possibly the parsing of the response as well. This could then be compared with the improvement in performance gained when switching to a reasoning model, to see whether the reasoning training these models receive actually gives them an advantage.
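A sketch of what such a base prompt could look like, with a fixed marker so the parsing stays simple (wording and names illustrative):

    COT_SYSTEM_PROMPT = (
        "You are playing tic-tac-toe as O. Before moving, reason step by step: "
        "list the empty cells, check whether you can win this turn, then check "
        "whether X threatens to win and must be blocked. "
        "End your answer with a single line of the form 'MOVE: <cell>', e.g. 'MOVE: B2'."
    )

    def parse_cot_move(answer: str) -> str | None:
        """Extract the move from the final 'MOVE: ...' line, ignoring the reasoning."""
        for line in reversed(answer.strip().splitlines()):
            if line.upper().startswith("MOVE:"):
                return line.split(":", 1)[1].strip()
        return None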
Model vs model
Finally, the last extension I might make is to change the game to support an AI vs AI mode, with two LLMs playing against each other. This would allow tournaments to be played, and metrics to be gathered on which models perform better against each other. It would be a nice and fun addition, but it probably won't be a priority for me any time soon.
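As a rough idea of the shape this could take, assuming two servers on different ports and reusing the helpers sketched above, extended with a server_url parameter (all names hypothetical):

    def play_ai_vs_ai(url_x: str, url_o: str) -> str:
        """Alternate moves between two LLM endpoints until the match ends."""
        board = [""] * 9
        players = [("X", url_x), ("O", url_o)]
        turn = 0
        while (result := match_result(board)) is None:  # match_result from the earlier sketch
            symbol, url = players[turn % 2]
            board[next_ai_move(board, server_url=url)] = symbol
            turn += 1
        return result  # "X", "O" or "draw"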
Setup
Although LicLacMoe is not one of my portfolio projects (which have a fixed set of quality standards I expect to maintain throughout their entire lifecycle), I did configure most of the foundations I use for those.
I use GitHub Actions to generate a new release for LicLacMoe whenever new code pushed to the main branch alters relevant core files of the project.
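That trigger can be expressed with a paths filter in the workflow; a sketch (the paths are illustrative, not the project's actual layout):

    # .github/workflows/release.yml (sketch)
    on:
      push:
        branches: [main]
        paths:
          - "src/**"   # hypothetical: only changes to core files trigger a release
    jobs:
      release:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # ... build the executable and attach it to a new release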
Beyond the usual readme file, the documentation includes a changelog file, a file with guidelines about contributing, and an architecture.md file (an idea I adapted from this great article).
As this project was done in an exploratory, proof-of-concept spirit, I did not include automated tests. This is the main departure from the quality standards I expect of my portfolio projects.
Links
Source code: Github
Executable: Releases