Thursday, November 28, 2024

Project: Chargen

Description

Chargen is a desktop tool written in Java to generate images and biographies for fantasy characters based on their skills.

I wrote a sketch version of it around May of 2024, and released an official version on November 29th, 2024.

The application assumes you have an LLM server running on your machine (a deliberate choice) on port 8080, and a Stable Diffusion server for image generation also running locally on port 7860. I plan to make both ports configurable in the near future.

The application presents a graphical interface for entering information about the character, such as their name, class and skills, along with two buttons: one to generate an avatar image for the character, and the other to generate a textual biography. Both use the information entered about the character to produce appropriate outputs. The generated biography and avatar can be saved to disk as text and image files.

Chargen was developed in the spirit of a proof-of-concept, so that I could explore interacting with two different generative AI tools in the same application and learn from this experience. It is not intended to be used by a general audience.


Context

I developed Chargen as I was exploring how to build systems using generative AI technologies for the first time. I blogged about the set of 4 projects that came out of this exercise in my post entitled "First Steps Into AI Engineering". Chargen was the second of these 4, and the first one to deal with image generation.

As described in the post just mentioned, for Chargen I wanted to write most of the code myself, without relying too much on frameworks, so as to get a better feel for working with these models. The restriction to open, locally run AI models is also deliberate: I want to find out how far we can go working only with models that can be fully personal and owned by their users.

For Chargen I chose to work with text generation and image generation as separate tools instead of using an LLM that also produces image output, as such multimodal models are not yet widespread among open source options.


Highlights

The biggest benefit of this project to me was learning how to generate images programmatically with Stable Diffusion. Although I had already used it quite a bit, it had always been through some provided UI, such as the classic text-generation-webui by Oobabooga. Getting used to accessing this service through code opened up my mind to a lot of cool ideas that I hopefully will be able to explore in the future.
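To illustrate, accessing the service through code boils down to a single HTTP call. Here is a minimal sketch in Java, assuming the AUTOMATIC1111-style web UI API on port 7860; the endpoint, field names and crude JSON handling are illustrative assumptions, not necessarily what Chargen actually does:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Minimal sketch of calling a local Stable Diffusion server.
// Assumes an AUTOMATIC1111-style /sdapi/v1/txt2img endpoint on port 7860.
class AvatarClient {

    // JSON payload built by hand for brevity; a real app would use a JSON library.
    static String txt2imgPayload(String prompt, int steps) {
        return "{\"prompt\":\"" + prompt.replace("\"", "\\\"") + "\","
             + "\"steps\":" + steps + ",\"width\":512,\"height\":512}";
    }

    // Crude extraction of the first base64 image from the response JSON;
    // a real app would parse the response properly.
    static String extractFirstImage(String json) {
        int start = json.indexOf("\"images\":[\"") + "\"images\":[\"".length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    static void generateAvatar(String prompt) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:7860/sdapi/v1/txt2img"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(txt2imgPayload(prompt, 20)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // The server returns the generated image base64-encoded; decode and save it.
        byte[] image = Base64.getDecoder().decode(extractFirstImage(response.body()));
        Files.write(Path.of("avatar.png"), image);
    }
}
```

Once you have this in code form rather than behind a UI, chaining it with other logic (like a character form) becomes straightforward.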

Another good point was getting familiar with using LLMs as a single-purpose tool. I have always liked Simon Willison's analogy of seeing LLMs as a "calculator for words", and I think one of the reasons this framing requires some mental effort is that the usual way we interact with LLMs - as a chatbot - steers our thinking in the opposite direction. We get lost in the simulation of emotions, the rhetoric and the stealthy handling of conversation state, and that makes it hard to see the system we are interacting with as something that just processes some text to generate other text. Using LLMs in the context of Chargen - where every request has a very specific type of output and fulfills one very specific need, without any context or state from previous messages - reinforces that calculator view.
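A sketch of what such a stateless, single-purpose request can look like, assuming a llama.cpp-style server exposing a /completion endpoint on port 8080 (the endpoint and field names are assumptions for illustration):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Each request is self-contained: one prompt in, one completion out.
// No chat history or conversation state is kept between calls.
// Assumes a llama.cpp-style server exposing /completion on port 8080.
class BiographyClient {

    // Hand-built JSON payload; a real app would use a JSON library.
    static String completionPayload(String prompt) {
        return "{\"prompt\":\"" + prompt.replace("\"", "\\\"")
             + "\",\"n_predict\":512}";
    }

    static String requestCompletion(String prompt) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/completion"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(completionPayload(prompt)))
            .build();
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

The absence of any stored conversation state is the whole point: the model is invoked like a function, not addressed like an interlocutor.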

Chargen also gave me an opportunity to work with prompts in a more nuanced way. When developing a simple chatbot, you usually only think of a good system prompt to guide the tone of the conversation, and from that point onwards just relay to the model the state of the conversation plus the new message from the human interacting with it. In Chargen, however, both prompts (for generating the avatar and the biography) were tailored to produce a very specific result, almost like a function. Thinking about how to integrate the user input (in the form of the character's attributes) with the function-like prompt was an enlightening experience.
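That function-like nature can be sketched as a plain Java function from character attributes to prompt text; the wording and field names below are illustrative, not Chargen's actual prompt:

```java
import java.util.Map;

// The prompt behaves like a function: character attributes in,
// a fixed-format instruction out. The template wording here is a
// made-up example, not what Chargen actually sends.
class BiographyPrompt {

    static String build(String name, String charClass, Map<String, Integer> skills) {
        StringBuilder sb = new StringBuilder();
        sb.append("Write a short biography for a fantasy character.\n");
        sb.append("Name: ").append(name).append('\n');
        sb.append("Class: ").append(charClass).append('\n');
        sb.append("Skills:\n");
        skills.forEach((skill, points) ->
            sb.append("- ").append(skill).append(": ").append(points).append('\n'));
        // Constrain the output shape, since there is no conversation to steer it.
        sb.append("Respond with the biography text only, no preamble.");
        return sb.toString();
    }
}
```

Because the whole interaction lives in this one string, tuning the prompt template is effectively tuning the "function body".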

One thing that ended up being challenging was coming up with a prompt that generated good avatars without falling into the uncanny valley. I believe this was mostly because I chose to restrict myself to the base Stable Diffusion 1.5 model, which is quite old by now and nowhere near the best you can find even in the open source landscape. The fact that I did not want to include any trademarked words in the prompt also had a big impact. I could not find a prompt that would generate excellent images, so I settled for one that generates reasonably acceptable ones most of the time. For any application looking for more robust results, this should be easily fixable by addressing the two points just mentioned.

For this project I wanted to work only with Java and focus on learning about interacting with generative AI models, so I picked a GUI library that does not generate the most beautiful applications ever. Swing has the advantages of requiring no external dependencies, being fairly straightforward to use, and being familiar to me from several past projects. But neither the code that creates the views nor the views themselves end up particularly beautiful, so any serious attempt at creating a similar application for a wider audience should choose a better GUI engine. I do not intend to spend much time improving the UI either.

Chargen requires two generative AI models to be running on your local machine at the same time. That makes it almost impossible to expect anyone else to run it by themselves to check it out, so I am not investing in making it too user-friendly. Instead, I see it as a successful proof-of-concept that gives good indications of how to build a similar system for widespread use if desired - one that should then definitely consume remotely hosted models. I was a little surprised to discover that my machine handled running the whole required setup pretty well - but then again, I am a software developer interested in AI who also likes to play games on his PC, so my machine is not really an average end-user's one.


Future Expansions

As mentioned previously, I do not intend to spend much time improving Chargen, as I already reaped most of the benefits I expected from it during the initial development. It could certainly benefit from some work to make the UI more pleasant and from deeper research into image generation models that could produce better results, but I am content with its current state, as I do not expect it to be used by other people.

I do intend to make it slightly more customizable, though, especially by supporting command-line configuration options similar to those of JenAI. The most important parameters are the ports for the LLM and Stable Diffusion servers. I will also have to make the "model" property configurable in order to support ollama as a backend. I might make the prompts configurable in the future, but do not plan to do so as of now.
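A rough sketch of what that command-line configuration could look like, with the current hard-coded values as defaults; the flag names are hypothetical, as none of this is implemented yet:

```java
// Hypothetical sketch of the planned configuration: command-line flags
// for the two server ports, defaulting to the currently hard-coded values.
class Config {
    int llmPort = 8080; // current hard-coded LLM server port
    int sdPort = 7860;  // current hard-coded Stable Diffusion port

    // Walks the arguments as flag/value pairs; unknown flags are ignored.
    static Config parse(String[] args) {
        Config config = new Config();
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "--llm-port" -> config.llmPort = Integer.parseInt(args[i + 1]);
                case "--sd-port" -> config.sdPort = Integer.parseInt(args[i + 1]);
            }
        }
        return config;
    }
}
```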

Another possible expansion would be to add more constraints to the character attributes, such as a maximum number of total points to be distributed, or drop-down options for details that can be restricted to a set. If I do implement this in the future, I would prefer to make it an optional feature instead of replacing the current, freer, functionality. I do not see a very strong reason to restrict the possibilities that the generative AI models give to a creative user in this context.


Setup

Although Chargen is not one of my portfolio projects (which have a fixed set of quality standards I expect to maintain through their entire lifecycle), I did configure most of the foundations I use for those.

I use GitHub Actions to generate a new release for Chargen whenever new code pushed to the main branch alters relevant core files of the project.

Beyond the usual README file, I have a changelog file, a file with contribution guidelines, and an architecture.md file (an idea I adapted from this great article) as documentation.

As this project was done with an exploratory, proof-of-concept approach, I did not include automated tests. This is the main departure from the quality standards I expect of my portfolio projects.


Links

Source code: Github

Executable: Releases


Tuesday, November 5, 2024

Monthly Recap - 2024-10 October

October was a quieter month than September for me. There were fewer events going on, so I could focus more on my usual habits and routine. A few interesting movements happened in my main job that are not yet ready for disclosure, but they should be officially announced in the next few weeks. I was also able to get a fair bit done on my personal projects. Here is a short summary of what happened:


Achievements


JenAI

JenAI is the main project I have been working on in my spare time; I described it in detail here. This month I was able to complete the initial scope I had planned for it, with the implementation of streaming responses. I also added another minor (but very useful) feature and made some enhancements to the overall quality of the project. In specific terms:

  • Release 1.6.0: Added Streaming Response feature. In this mode, the response from the model is streamed continuously as it is generated, which ensures quicker feedback and an overall better experience. It also prevents the interface from being locked for several minutes (when using a very large model, for instance) while the response is computed. This release did not remove the previous default mode (block response), which can still be activated with a command line option on startup.
  • Release 1.7.0: Added Temperature feature. This release added a new command line option to set the temperature used by the model when computing responses. By tuning this option's value, it is possible to determine how creative/random the answers will be during the conversation.
  • Quality improvements: This month I started adding unit tests to the project. Since the project started as an exploratory exercise, which I used to learn how to use LLM server APIs, I did not feel the need to add unit tests to it (quite the contrary: for explorations they only get in the way). However, as my understanding of the APIs grew and JenAI incorporated more features, it slowly shifted into a project I would like to add to my personal portfolio. For that, I consider it mandatory to include unit tests, so I started adding them this month. It has been a nice experience, as it has allowed me to see opportunities for better design and to apply some techniques I have been studying for dealing with legacy code (and it serves as a reminder that even your own recent code can be legacy if it lacks automated tests). Additionally, I added a CONTRIBUTING.md file with guidelines on how to contribute to the project and created new screenshots for the README using the most recent version.
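Roughly, the streaming and temperature features above can be sketched together in a single request, assuming a llama.cpp-style server where "stream": true delivers the response as server-sent events; the details here are illustrative, not JenAI's actual code:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a streaming completion with a temperature option. Assumes a
// llama.cpp-style server where "stream": true yields server-sent events,
// one "data: {...}" line per generated chunk of text.
class StreamingClient {

    // Hand-built JSON payload; a real app would use a JSON library.
    static String payload(String prompt, double temperature) {
        return "{\"prompt\":\"" + prompt.replace("\"", "\\\"")
             + "\",\"temperature\":" + temperature + ",\"stream\":true}";
    }

    static void streamCompletion(String prompt, double temperature) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/completion"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload(prompt, temperature)))
            .build();
        HttpResponse<InputStream> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(response.body()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("data: ")) {
                    // Each event carries a JSON chunk with the next piece of
                    // text; display it as it arrives instead of waiting for
                    // the whole response to finish.
                    System.out.println(line.substring("data: ".length()));
                }
            }
        }
    }
}
```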

With the new additions this month, I consider the initial scope of JenAI complete. The most likely next step would be to incorporate running inference directly within the project, instead of relying on a locally running LLM server and its API. For that, I will need to find a good and reliable Java library (I have no intention of implementing that myself), and while I do have some candidates shortlisted to investigate further, currently there is none I know for sure would serve. If I do end up implementing that, it would likely bump the major version of the project and effectively start JenAI version 2. That is not something I intend to do before finishing at least three other AI-related projects, so there is little chance of it coming before 2025.


CoIntelligence book review

Earlier this year I read the book CoIntelligence: Living And Working With AI, by Ethan Mollick, as I have been fascinated with the potential of generative AI to have a profound impact on every part of our culture in the near-to-mid future. I picked the book as a hobby study in my personal studies habit. Ever since I finished studying it, way back in June, I had been working on a review of it, and in October I finally got to write and post it. With this done, I can now pick up the next hobby study book (Salman Khan's Brave New Words), which I had been holding off on starting until the review was done, to avoid mixing up the contents of two books on closely related topics.


Personal studies habit

September was so busy that I was not able to make much progress on my personal studies habit, as I mentioned in that month's recap post. With October being quieter, I was able to get back on track. I progressed nicely in the career study book I am working on (Working Effectively With Legacy Code), and I got to write the review of CoIntelligence, as mentioned just above.


Plans for November


Start next AI Engineering project

With the initial scope for JenAI done, I can start the next project in my First Steps Into AI Engineering series. I expect this one to have a shorter development cycle than JenAI, as I will reuse in it much of the API-related work and enhancements I did for that project. My current expectation is to have it in a good enough state to be made public by the middle of the month, and to have it completed by the end of the month - though these are not really hard deadlines. I will be writing about it as I progress in its development.


Keep up with routine habits

Other than that, I intend to keep up the rhythm of my usual habits and routine. I am starting Brave New Words as the next hobby study book, and will probably finish Working Effectively With Legacy Code as the career study book this month. Hopefully I will also be able to share more news about the developments in my daily job.

