Research and experimentation are at the heart of any exercise that involves AI, and building LLM applications is no different. Unlike traditional web apps that follow a pre-decided design with little to no variation, AI-based designs rely heavily on experiments and can change depending on early outcomes. The success factor is experimenting against clearly defined expectations in iterations, followed by continuously evaluating each iteration. In LLM-native development, the success criterion is usually the quality of the output, which means the focus is on producing accurate and highly relevant results. This can be a chatbot response, a text summary, a generated image, or even an action determined by the LLM (the agentic approach). Generating quality results consistently requires a deep understanding of the underlying language models, constant fine-tuning of the prompts, and rigorous evaluation to ensure that the application meets the desired standards.
What kind of tech skill set do you need on the team?
You might assume that a team with only a handful of data scientists is sufficient to build an LLM application. In reality, engineering skills are equally or even more important to actually ‘deliver’ the target product, because LLM applications do not follow the classical ML approach. Both data scientists and software engineers need some mindset shifts to get familiar with this development approach. I have seen both roles make this journey: data scientists getting familiar with cloud infrastructure and application deployment, and engineers familiarizing themselves with the intricacies of model usage and the evaluation of LLM outputs. Ultimately, you need AI practitioners on the team who are there not just to ‘code’ but to research, collaborate, and improve the applicability of AI.
Do I really need to ‘experiment’ since we are going to use pre-trained language models?
Popular LLMs like GPT-4o are already trained on large sets of data and are capable of recognizing and generating text, images, and more, so you do not need to ‘train’ these types of models. A few scenarios might require fine-tuning the model, but even that is achievable without the classical ML approach. However, let’s not confuse the term ‘experiment’ with the ‘model training’ methodology used in predictive ML. As I’ve mentioned above, the quality of the application output matters, and setting up iterations of experiments can help us reach the target quality of results. For example, if you’re building a chatbot and want to control how the bot’s output looks to the end user, an iterative, experimental approach to prompt improvement and tuning hyperparameters (such as temperature) will help you find the right way to generate the most accurate and consistent output.
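To make this concrete, below is a minimal sketch of such an experiment loop. It assumes the OpenAI Python SDK, and `score_output` is a hypothetical placeholder for whatever evaluation metric fits your use case (an LLM-as-judge score, similarity to a reference answer, etc.):

```python
# A minimal experiment loop: try prompt variants and temperatures,
# score each output, and keep the best combination.
# Assumes the OpenAI Python SDK; score_output() is a hypothetical
# placeholder for your own evaluation metric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt_variants = [
    "Answer the customer question in two short sentences.",
    "Answer the customer question politely, citing the knowledge base.",
]
temperatures = [0.0, 0.3, 0.7]
question = "How do I reset my password?"

def score_output(text: str) -> float:
    """Hypothetical metric: replace with your own evaluation,
    e.g. an LLM-as-judge score or similarity to a reference answer."""
    return float(len(text) < 300)  # toy check: penalize overly long answers

results = []
for system_prompt in prompt_variants:
    for temp in temperatures:
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=temp,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        output = response.choices[0].message.content
        results.append((score_output(output), temp, system_prompt))

# Pick the prompt/temperature combination with the best score.
best = max(results, key=lambda r: r[0])
print(f"Best score {best[0]:.2f} at temperature={best[1]} with prompt: {best[2]!r}")
```

In practice you would run every combination over a whole set of test questions and average the scores, but the structure of the loop stays the same.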
Build a prototype early in your journey
Build a prototype (also referred to as an MVP, a minimum viable product) with only the core functionalities as early as possible, ideally within 2–4 weeks. If you’re using a knowledge base for a RAG (retrieval-augmented generation) approach, use a subset of the data to avoid extensive data pre-processing (a minimal retrieval sketch follows the list below).
- Gaining quick feedback from a subset of target users helps you understand whether the solution meets their expectations.
- Review with stakeholders: don’t just show the good results, also discuss the limitations and constraints your team discovered while building the prototype. This is crucial for mitigating risks early and for making informed decisions about delivery.
- The team can finalize the tech stack, the security and scalability requirements for moving the prototype to a fully functional product, and the delivery timeline.
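As referenced above, here is a minimal sketch of RAG retrieval over a small in-memory subset of documents. It assumes the OpenAI Python SDK; the documents and the toy similarity search are placeholders for your real knowledge base and vector store:

```python
# Minimal RAG retrieval over a small subset of documents, kept in memory
# to avoid a vector database and heavy pre-processing during prototyping.
# Assumes the OpenAI Python SDK; the document list is a placeholder.
import numpy as np
from openai import OpenAI

client = OpenAI()

# A handful of representative documents instead of the full knowledge base.
docs = [
    "Password resets are handled via the account settings page.",
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am-5pm CET.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vectors = embed(docs)  # embed once, reuse for every query

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed([query])[0]
    sims = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

Keeping the subset in memory lets you validate retrieval quality and prompt design before committing to a vector database and a full ingestion pipeline.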
Determine whether your prototype is ready to be built into the ‘product’
The availability of many AI-focused samples has made it super easy to create a prototype, and initial testing of such prototypes usually delivers promising results. By the time the prototype is ready, the team should have a better understanding of the success criteria, market research, target user base, platform requirements, etc. At this point, considering the following questions can help you decide the direction in which the product should move:
- Do the functionalities developed in the prototype serve the primary need of the end users or the business process?
- Which challenges that the team faced during prototype development might come up again on the way to production? Are there methods to mitigate these risks?
- Does the prototype pose any risk with regard to responsible AI principles? If so, what guardrails can be implemented to avoid these risks? (We’ll discuss this point further in part 2.)
- If the solution is to be integrated into an existing product, what might be a show-stopper for that?
- If the solution handles sensitive data, have effective measures been taken to ensure data privacy and security?
- Do you need to define performance requirements for the product? Are the prototype’s results promising in this respect, or can they be improved further?
- What security requirements does your product need?
- Does your product need a UI? (A common LLM-based use case is a chatbot, so UI requirements need to be defined as early as possible.)
- Do you have a cost estimate for LLM usage based on your MVP? How does it look against the estimated scale of usage in production and your budget? (See the back-of-the-envelope sketch after this list.)
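To make the cost question concrete, here is a back-of-the-envelope estimate scaled from MVP measurements. All figures are hypothetical placeholders; substitute your provider’s current per-token prices and your own traffic numbers:

```python
# Back-of-the-envelope LLM cost estimate, scaled from MVP measurements
# to expected production traffic. All numbers below are hypothetical
# placeholders; use your provider's current pricing and your own metrics.

# Measured from the MVP (averages per request):
input_tokens_per_request = 1_200   # prompt + retrieved context
output_tokens_per_request = 300

# Assumed pricing per 1M tokens (placeholder values, check your provider):
price_per_1m_input = 2.50    # USD
price_per_1m_output = 10.00  # USD

# Estimated production scale:
requests_per_day = 5_000

cost_per_request = (
    input_tokens_per_request / 1_000_000 * price_per_1m_input
    + output_tokens_per_request / 1_000_000 * price_per_1m_output
)
monthly_cost = cost_per_request * requests_per_day * 30

print(f"~${cost_per_request:.4f} per request, ~${monthly_cost:,.0f} per month")
```

With these placeholder numbers the estimate comes out to roughly $0.006 per request and $900 per month; the point is to compare such a figure against your budget before committing to production.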
If you get satisfactory answers to most of these questions after the initial review, coupled with good results from your prototype, then you can move forward with product development.
Stay tuned for part 2, where I will talk about how to approach product development, how to implement responsible AI early in the product, and techniques for cost management.
Please follow me if you want to read more such content about new and exciting technology. If you have any feedback, please leave a comment. Thanks 🙂