Description
Chargen is a desktop tool written in Java to generate images and biographies for fantasy characters based on their skills.
I wrote a sketch version of it around May 2024 and released an official version on November 29th, 2024.
The application assumes you have an LLM server running on your machine (a deliberate choice) on port 8080, and a Stable Diffusion server for image generation, also running locally, on port 7860; both ports should become configurable in the near future. The application presents a graphical interface for entering information about the character, such as their name, class and skills, along with two buttons: one to generate an avatar image for the character, and one to generate a textual biography. Both use the information entered about the character to produce appropriate outputs. The generated biography and avatar can be saved to disk as text and image files.
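Both local services can be reached with plain HTTP. Below is a minimal sketch of how such requests could be built and sent with Java's standard HttpClient; the endpoint paths (a llama.cpp-style /completion on 8080 and AUTOMATIC1111-style /sdapi/v1/txt2img on 7860), the class name and the payload fields are my assumptions for illustration, not necessarily what Chargen itself uses:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocalGenClients {
    // Hypothetical endpoints: adjust to whatever servers you actually run.
    static final String LLM_URL = "http://localhost:8080/completion";
    static final String SD_URL = "http://localhost:7860/sdapi/v1/txt2img";

    // Build the JSON payload for a single, stateless text completion request.
    static String llmBody(String prompt) {
        return "{\"prompt\": \"" + escape(prompt) + "\", \"n_predict\": 256}";
    }

    // Build the JSON payload for a txt2img request; the response carries
    // base64-encoded images in its "images" field.
    static String sdBody(String prompt) {
        return "{\"prompt\": \"" + escape(prompt) + "\", \"steps\": 20}";
    }

    // Minimal JSON string escaping for backslashes and quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String post(String url, String body) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(llmBody("Write a biography for a rogue."));
        System.out.println(sdBody("portrait of a fantasy rogue"));
        // post(LLM_URL, llmBody(...)); // requires the servers to be running
    }
}
```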
Chargen was developed in the spirit of a proof-of-concept, so that I could explore interacting with two different generative AI tools in the same application and learn from this experience. It is not intended to be used by a general audience.
Context
I developed Chargen as I was exploring how to build systems using generative AI technologies for the first time. I blogged about the set of 4 projects that came out of this exercise in my post entitled "First Steps Into AI Engineering". Chargen was the second of these 4, and the first one to deal with image generation.
As described in the post just mentioned, for Chargen I wanted to write most of the code myself, without relying too much on frameworks, so as to get a better feel for working with these models. The choice to explore only open, local AI models is also deliberate: I want to find out how far we can go working only with models that can be fully personal and owned by their users.
For Chargen I chose to work with text generation and image generation as separate tools instead of using an LLM that also produces image output, as such multimodal models are not yet widespread in open source.
Highlights
The biggest benefit of this project to me was learning how to generate images programmatically with Stable Diffusion. Although I had already used it quite a bit, it had always been through some provided UI, such as the classic text-generation-webui by Oobabooga. Getting used to accessing this service through code opened up my mind to a lot of cool ideas that hopefully I will be able to explore in the future.
Another good point was getting familiar with using LLMs as a single-purpose tool. I have always liked Simon Willison's analogy of seeing LLMs as a "calculator for words", and I think one of the reasons this framing requires some mental effort is that the usual way we interact with LLMs - as chatbots - steers our thinking in the opposite direction. We get lost in the simulation of emotions, the rhetoric and the stealthy handling of conversation state, and that makes it hard to see the system we are interacting with as something that just processes some text to generate other text. Using LLMs in the context of Chargen, where every request has a very specific type of output and fulfills one very specific need, without any context or state from previous messages, reinforces the calculator view.
Chargen also gave me an opportunity to work with prompts in a more nuanced way. When developing a simple chatbot, usually you only think of a good system prompt to guide the tone of the conversation, and from that point onwards just relay to the model the state of the conversation plus the new message from the human interacting with it. In Chargen, however, both prompts (for generating the avatar and the biography) were tailored to produce a very specific result, almost as a function. Thinking about how to integrate the user input (in the form of the character's attributes) with the function-like prompt was an enlightening experience.
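The idea of a function-like prompt can be illustrated with a small sketch. The template below is hypothetical - Chargen's actual prompts differ - but it shows the shape: fixed instructions plus the character's attributes interpolated in, producing a complete, stateless request every time:

```java
import java.util.List;

public class BioPrompt {
    // Hypothetical template; the real prompts in Chargen are different.
    // Fixed instructions frame the task, and the user-entered attributes
    // are interpolated in, like arguments passed to a function.
    static String biographyPrompt(String name, String charClass, List<String> skills) {
        return "You are a fantasy writer. Write a short, one-paragraph biography "
             + "for the following character. Output only the biography text.\n"
             + "Name: " + name + "\n"
             + "Class: " + charClass + "\n"
             + "Skills: " + String.join(", ", skills) + "\n";
    }

    public static void main(String[] args) {
        System.out.println(biographyPrompt("Eldrin", "Rogue",
                List.of("Stealth", "Lockpicking")));
    }
}
```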
One thing that ended up being challenging was coming up with a prompt to generate good avatars that did not fall into the uncanny valley. I believe this was mostly because I chose to restrict myself to the base Stable Diffusion 1.5 model, which is quite old by now and nowhere near the best you can find even in the open source landscape. The fact that I did not want to include any trademarked words in the prompt also had a big impact. I could not find a prompt that would generate excellent images, so I settled for one that generates reasonably acceptable ones most of the time. For any application looking for more robust results, this should be easily fixable by revisiting the two constraints I just mentioned.
For this project I wanted to work only with Java and focus on learning about interacting with generative AI models, so I picked a GUI library that does not produce the most beautiful applications ever. Swing had the advantages of requiring no external dependencies, being fairly straightforward to use, and being familiar to me from several past projects. But neither the code that creates the views nor the views themselves end up particularly beautiful, so any serious attempt at creating a similar application for a wider audience should choose a better GUI toolkit. I do not intend to spend much time improving the UI either.
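For reference, a minimal Swing form along the lines described could look like the sketch below; the component names and layout are illustrative, not Chargen's actual code:

```java
import javax.swing.*;
import java.awt.GridLayout;

public class CharacterForm {
    // Build the form panel: name/class/skills fields plus the two action buttons.
    static JPanel buildPanel() {
        JPanel panel = new JPanel(new GridLayout(0, 2, 4, 4));
        panel.add(new JLabel("Name:"));
        panel.add(new JTextField());
        panel.add(new JLabel("Class:"));
        panel.add(new JTextField());
        panel.add(new JLabel("Skills:"));
        panel.add(new JTextField());
        panel.add(new JButton("Generate Avatar"));
        panel.add(new JButton("Generate Biography"));
        return panel;
    }

    public static void main(String[] args) {
        // Only open a window when a display is actually available.
        if (!java.awt.GraphicsEnvironment.isHeadless()) {
            SwingUtilities.invokeLater(() -> {
                JFrame frame = new JFrame("Chargen sketch");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.setContentPane(buildPanel());
                frame.pack();
                frame.setVisible(true);
            });
        }
    }
}
```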
Chargen requires two generative AI models to be running on your local machine at the same time. That makes it unrealistic to expect anyone else to run it just to check it out, so I am not investing in making it particularly user-friendly. Instead, I see it as a successful proof-of-concept that gives good indications on how to build a similar system for widespread use if desired - one that should then definitely consume remotely hosted models. I was a little surprised to discover that my machine handled the full required setup quite well - but then again, I am a software developer interested in AI who also likes to play games on his PC, so my machine is not exactly an average end-user's.
Future Expansions
As mentioned previously, I do not intend to spend too much time improving Chargen, as I already reaped most of the benefits I expected from it in the initial development. It could certainly benefit from some work to make the UI more pleasant and a deeper research for image generation models that could put out better results, but I am content with the current state as I do not expect it to be used by other people.
I do intend to make it slightly more customizable, though, especially by allowing configuration through command-line options similar to what JenAI has; the most important parameters are the ports for the LLM and Stable Diffusion servers. I will also have to make the "model" property configurable in order to support ollama as a backend. I might make the prompts configurable in the future, but do not plan to do so as of now.
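A minimal sketch of what such command-line configuration could look like; the option names (--llm-port, --sd-port) and the Config class are hypothetical, and the defaults match the currently hard-coded ports:

```java
public class Config {
    // Defaults match the current hard-coded values.
    int llmPort = 8080;
    int sdPort = 7860;

    // Parse hypothetical --llm-port/--sd-port flags given as "--flag value" pairs.
    static Config fromArgs(String[] args) {
        Config config = new Config();
        for (int i = 0; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                case "--llm-port" -> config.llmPort = Integer.parseInt(args[i + 1]);
                case "--sd-port" -> config.sdPort = Integer.parseInt(args[i + 1]);
                default -> throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Config c = fromArgs(args);
        System.out.println("LLM port: " + c.llmPort + ", SD port: " + c.sdPort);
    }
}
```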
Another possible expansion would be to add more constraints to the character attributes, such as a maximum number of total points to be distributed, or drop-down options for details that can be restricted to a set. If I do implement this in the future, I would prefer to make it an optional feature instead of replacing the current, freer, functionality. I do not see a very strong reason to restrict the possibilities that the generative AI models give to a creative user in this context.
Setup
Although Chargen is not one of my portfolio projects (which have a fixed set of quality standards I expect to maintain through their entire lifecycle), I did configure most of the foundations I use for those.
I use GitHub Actions to generate a new release for Chargen whenever code pushed to the main branch alters relevant core files of the project.
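That release automation can be sketched as a workflow along these lines - a hypothetical example assuming a Maven build; the actual trigger paths, file names and release action used in Chargen may differ:

```yaml
# Hypothetical sketch; not Chargen's actual workflow file.
name: release
on:
  push:
    branches: [main]
    paths:          # only fire when relevant core files change
      - "src/**"
      - "pom.xml"
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
      - run: mvn --batch-mode package
      - uses: softprops/action-gh-release@v2
        with:
          files: target/*.jar
          tag_name: v${{ github.run_number }}
```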
Beyond the usual readme file, the documentation includes a changelog file, a file with contribution guidelines and an architecture.md file (an idea I adapted from this great article).
As this project took an exploratory, proof-of-concept approach, I did not include automated tests. This is the main point of departure from the quality standards I expect of my portfolio projects.
Links
Source code: GitHub
Executable: Releases