Saturday, August 24, 2024

First Steps Into AI Engineering

Generative AI is the biggest technological development of our generation.

Nothing so transformative has appeared since mobile phones and the cloud - and, arguably, generative AI has the potential for a much bigger impact than either of them in the years to come. It is very exciting to witness something like this happening first-hand.


My interest in the field

I have followed this area's entry into mainstream discourse for a while. I started working on my graduation project at college last year, in 2023, applying an earlier Google language model (BERT) to a specific task (poem classification), so I had just been studying how these models work internally when they exploded in popularity. For the rest of that year (and the first part of this one) I was too busy solving all sorts of issues, but I tried to keep up with the news, reading regularly whenever something came up and trying things out whenever time allowed.

Now that things are more stable, I have started exploring the use of these technologies for building systems. While I am not particularly interested in training models and the whole Machine Learning / Data Science side of the endeavour, I am very interested in how we can take these models and use their capabilities to expand what computing systems can achieve beyond what has been possible until now. This is what is being called AI Engineering.

So over the past three months I have been building a few projects to get familiar with incorporating generative AI models (mostly LLMs, but also image generation models) into deliverable systems. These were mostly exploratory work: private PoCs and experiments meant for practice. Now that I have learned a few things by building them, I am making public, open versions of them to leave as reference. I expect to work on this for a couple of months, as I create public repositories for each on GitHub and polish both the source code and the set of features available.


Axioms

There are a few things I have decided to follow in these projects. They are very important to me at this particular moment in time. Some of them might change in the future, but, for now, I feel comfortable making them axioms for this initial set of projects. They are:


1. I want the projects to work specifically with local models.

2. I want to build these projects using Java.

3. I want the projects to be as minimalistic as possible, avoiding ready-made frameworks like Spring AI or LangChain.


Of these, I consider the first one the most important. At the moment, my interest lies squarely in what we can build using local, private, personal AI models. Even though most of the "frontier" models are closed, I believe the truly transformative nature of this technology expresses itself best when it is in the hands of the final users, much like what happened when personal computers became ubiquitous and broke through the model of large mainframe servers controlled centrally by some entity. I might shift this focus in the future, but I consider that unlikely.

The second and third ones are due to more practical concerns: I want to build the projects with Java because it is my main professional language, so using it serves as a double exercise; and I want to avoid frameworks because, in this first stage of learning, I want to build things myself as much as possible. Both of these might very well change in the short or medium term, but they are set in stone for these specific projects.


Scope

I came up with four projects to develop as the scope for these first steps. Each one allows me to explore a new aspect of the technology in a more or less progressive manner.

Three of those four I have already developed in a sketchy manner; I intend to recreate open versions of them from scratch while making their code cleaner and more organized. Making them usable by a large audience is out of scope, as their purpose is still to be exploratory experiments that give me practice working with generative AI technologies, rather than general-use tools for a community.

The fourth one I have not developed yet, but I have a clear enough idea of what I want from it to be confident that it should not take too long to code up.

I intend to write a blog post for each of them as the public versions come out. I will also update this post with links to each one along the way.


Projects (ongoing)


1. JenAI.

The first project is the simplest application one can think of for using LLMs: a chatbot. And on the simplest platform a developer could think of: the terminal. This project allowed me to learn the basics of consuming LLMs through API calls and managing the state of a continuous conversation. I am having fun using it personally, both for silly humorous purposes and for serious work and learning, but its scope will never grow enough to match production-grade alternatives such as Simon Willison's llm.
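For reference, those basics look roughly like the sketch below: a terminal chat loop that keeps the conversation history in memory and resends it on every turn. This is a minimal illustration, assuming a local OpenAI-compatible server (such as llama.cpp's llama-server) on localhost:8080; the endpoint, the hand-rolled JSON, and the crude reply extraction are illustrative assumptions, not JenAI's actual code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Minimal terminal chat sketch against a local OpenAI-compatible server.
public class ChatSketch {
    private static final String ENDPOINT = "http://localhost:8080/v1/chat/completions";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Conversation state, kept as ready-made JSON message fragments.
        List<String> messages = new ArrayList<>();
        messages.add(message("system", "You are a helpful assistant."));

        try (Scanner in = new Scanner(System.in)) {
            while (true) {
                System.out.print("> ");
                String user = in.nextLine();
                if (user.isBlank()) break;
                messages.add(message("user", user));

                // Resend the whole history each turn; that is the model's "memory".
                String body = "{\"messages\":[" + String.join(",", messages) + "]}";
                HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());

                String reply = extractContent(response.body());
                System.out.println(reply);
                messages.add(message("assistant", reply)); // keep for the next turn
            }
        }
    }

    private static String message(String role, String content) {
        return "{\"role\":\"" + role + "\",\"content\":\"" + escape(content) + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Crude extraction of the first "content" value in the response;
    // a real version would use a proper JSON parser.
    private static String extractContent(String json) {
        int i = json.indexOf("\"content\":");
        int start = json.indexOf('"', i + 10) + 1;
        int end = json.indexOf('"', start);
        while (end > 0 && json.charAt(end - 1) == '\\') end = json.indexOf('"', end + 1);
        return json.substring(start, end).replace("\\n", "\n").replace("\\\"", "\"");
    }
}
```

Hand-rolling the JSON, warts and all, is in the spirit of the no-frameworks axiom, though any serious version would swap the string surgery for a proper JSON library.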

Source code: JenAI @ GitHub

Blog post: JenAI @ Blogger


2. Chargen.

The second project is a desktop application that generates both avatar pictures and biographies for fictional characters. This project allowed me to work with two different types of generative AI model within the same application (it was also the first time I worked with Stable Diffusion programmatically). It required a different approach to prompting: more specific prompts used only once (so no state handling), with a very specific type of expected output.
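To give an idea of the image half, here is a minimal one-shot sketch, assuming an AUTOMATIC1111-style Stable Diffusion web UI running locally with its API enabled (the --api flag), exposing /sdapi/v1/txt2img on port 7860. The prompt, the parameters, and the crude response handling are illustrative assumptions, not Chargen's actual code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// One-shot avatar generation sketch against a local Stable Diffusion API.
public class AvatarSketch {
    public static void main(String[] args) throws Exception {
        String prompt = "portrait of a fictional character, digital painting";
        String body = "{\"prompt\":\"" + prompt
                + "\",\"steps\":25,\"width\":512,\"height\":512}";

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:7860/sdapi/v1/txt2img"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        String json = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();

        // The response carries the image as a base64 string inside an
        // "images" array; crude extraction, then decode and save as PNG.
        int start = json.indexOf("\"images\":[\"") + 11;
        int end = json.indexOf('"', start);
        byte[] png = Base64.getDecoder().decode(json.substring(start, end));
        Files.write(Path.of("avatar.png"), png);
        System.out.println("Saved avatar.png");
    }
}
```

Note how different this is from the chat case: one request, no history, and the output is a base64-encoded image to decode and save rather than text to display.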

Source code: Chargen @ GitHub

Blog post: Chargen @ Blogger


3. Local Language Practice (LLP).

The third project is a desktop application for practicing languages by roleplaying a conversation between two characters. This one was considerably more complex than the two before it, and made me think about and iterate through the main prompts (especially how they are built from extra information about each scene) with much more care. It also allowed me to add an extra usage of LLMs: a built-in translator widget that helps maintain the flow of the conversation by clarifying any part the user does not understand. It was quite interesting to integrate both usages in the same system, as it pushed me to think about LLMs more abstractly, as components of an application instead of the entire application itself (or at least its overwhelming core).
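The translator is a good illustration of that LLM-as-a-component idea: a single-purpose, stateless call wrapped in an ordinary method. Below is a minimal sketch under the same assumptions as the chat example (local OpenAI-compatible endpoint, hand-rolled JSON, crude extraction); the class and the prompt wording are hypothetical, not LLP's actual code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Translator-as-a-component sketch: a single-purpose, stateless LLM call
// wrapped in an ordinary method instead of an open-ended chat.
public class TranslatorWidget {
    private static final String ENDPOINT = "http://localhost:8080/v1/chat/completions";
    private final HttpClient client = HttpClient.newHttpClient();

    public String translate(String text, String targetLanguage) throws Exception {
        // One-off prompt: no history, strict instructions on the output shape.
        String body = "{\"messages\":["
                + "{\"role\":\"system\",\"content\":\"You are a translator. "
                + "Reply with only the translation, no explanations.\"},"
                + "{\"role\":\"user\",\"content\":\"Translate into "
                + targetLanguage + ": " + escape(text) + "\"}]}";
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        String json = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        return extractContent(json);
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Crude extraction of the first "content" value, as in the chat sketch.
    private static String extractContent(String json) {
        int i = json.indexOf("\"content\":");
        int start = json.indexOf('"', i + 10) + 1;
        int end = json.indexOf('"', start);
        while (end > 0 && json.charAt(end - 1) == '\\') end = json.indexOf('"', end + 1);
        return json.substring(start, end).replace("\\n", "\n").replace("\\\"", "\"");
    }
}
```

From the rest of the application's point of view, translate() is just another method that takes a string and returns a string; the LLM behind it is an implementation detail.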

Source code: LLP @ GitHub

Blog post: LLP @ Blogger


(This post will be updated with the other projects as they are released.)

