In the rapidly evolving landscape of artificial intelligence (AI), Generative AI stands as a beacon of innovation, offering the capability to create new content across various modalities such as text, images, and audio. With its ability to go beyond mere data interpretation and generate novel outputs, Generative AI models have garnered significant attention and interest. In this discussion, we delve into the fundamentals of Generative AI, exploring its applications, challenges, and implications for industries worldwide.
What is Generative AI?
Generative AI is a subset of artificial intelligence focused on the creation of new content rather than solely analyzing existing data. Generative AI models are trained on a large corpus of data and learn to generate new content similar to the data they've been trained on.
There are several types of generative AI models, including large language models (LLMs) for text, diffusion models for images, and audio-generation models.
ChatGPT from OpenAI has been the cynosure of public attention since its release on November 30, 2022. It provides access to several LLMs (the most recent and powerful being GPT-4). Large language models (LLMs) are generative AI models pre-trained on extensive internet data, usually based on the transformer architecture. The advent of LLMs has thus opened up a world of possibilities, driving innovation and efficiency across various sectors.
One common use case of generative AI is known as RAG (Retrieval-Augmented Generation). RAG empowers enterprises to augment LLMs with enterprise or external data and generate responses that are specific to their domain.
At a high level, RAG involves preprocessing enterprise data into small chunks and embedding them with an embedding model, essentially converting text into numeric vectors. These embeddings are stored in a specialized vector database, enabling semantic search. When a user submits a query, the most relevant chunks are retrieved and sent to an LLM along with the query as additional context, which the LLM uses to generate a response.
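To make this flow concrete, here is a minimal sketch, assuming the open-source sentence-transformers library for embeddings; the file name and the call_llm() stub are hypothetical placeholders for whatever data source and model API are actually chosen:

```python
# Minimal RAG sketch: chunk -> embed -> store -> retrieve -> generate.
# Assumes the open-source sentence-transformers library for embeddings;
# the file name and call_llm() are hypothetical placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open-source embedder

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wire this to the chosen LLM provider

# 1. Preprocess enterprise data into small chunks (naive fixed-size split).
def chunk(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = chunk(open("enterprise_doc.txt").read())

# 2. Embed the chunks: each chunk becomes a numeric vector.
doc_vecs = model.encode(docs, normalize_embeddings=True)  # unit-length rows

# 3. Semantic search: an in-memory matrix stands in for a real vector DB.
def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity via dot product
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# 4. Send the retrieved chunks to the LLM as context for a grounded answer.
def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = ("Answer using ONLY the context below; say you don't know "
              f"if it is not covered.\n\nContext:\n{context}\n\nQuestion: {query}")
    return call_llm(prompt)
```

The grounding instruction in the prompt is what keeps the LLM's answer tied to the retrieved enterprise data rather than its general training data.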
Challenges in Implementation
While the integration of generative AI, particularly LLMs, can significantly enhance productivity, it presents a fair number of challenges. The abundance of options and the surrounding noise concerning tools, technologies, and vendors can be overwhelming. Here are some key questions to consider.
Open-source vs. closed: There are a great many open-source models on Hugging Face to choose from, as well as many closed models such as those from OpenAI, Claude from Anthropic, and Gemini. Do we choose smaller models or larger ones? There are cost/benefit trade-offs. If we are running the model on edge devices, does the quantized model perform as advertised and deliver significant performance at reduced cost?
In-context learning vs. fine-tuning: Is the task simple enough that in-context, few-shot inference would suffice, or is it complex enough to require fine-tuning? Fine-tuning is expensive and requires maintaining a substantial tech stack.
Embedding selection: Embedding uses a pre-trained model to convert each text chunk into numbers, and there is a one-time cost associated with whichever model is used for embedding. Several options are available, both open source and closed.
Grounding the model: Making the model grounded means having the LLM answer using only information specific to the use case. It's important to ensure the quality of responses and to have the necessary guardrails in place.
Vector DB selection: There are several vector DBs to choose from: open-source versions, in-memory vector DBs, and SaaS-based offerings. The choice depends on the use case.
Validations: There are several scenarios that validation needs to cover (a small validation sketch follows the Costs list below):
o Retrieval quality – relevance of the retrieved chunks to the user query
o Hallucination – making sure the LLM responses are grounded
o Unclear queries – if a query is not clear, the RAG system should ask clarifying questions and guide users to ask questions within the context the system was created for
o Privacy violations – no PII or other sensitive information
o Context adherence – handling out-of-domain questions
Costs: Typically, the costs can be put into the following broad buckets (a rough cost estimate also appears after the list):
o Vector DB: Several vector DB vendors offer slightly different pricing models, including options for serverless and dedicated single-tenant deployment.
o Embedding cost: Each text chunk needs to be embedded before being saved to the vector DB. Embedding is done via an embedding model, and there is a choice between open-source and closed ones.
o Model selection: Most closed models, like those offered by OpenAI, Anthropic, Cohere, etc., have pricing based on tokens. Model selection would depend on performance, required accuracy, and price.
o Compute cost: Typically, this depends on the scale requirements for the inference server; if an open-source model is hosted on the cloud, it also depends on the size of the model and the hardware chosen. There is also the option of going serverless.
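As one illustration of how some of the validations above can be automated, the sketch below scores retrieval relevance with embedding similarity and flags likely unclear or out-of-domain queries; the embedding model and the 0.3 threshold are illustrative assumptions to be tuned per use case, not recommendations:

```python
# Sketch of two automated RAG validations: retrieval quality and
# out-of-domain detection. The model and the 0.3 threshold are
# illustrative assumptions, to be tuned against real traffic.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def relevance(query: str, chunks: list[str]) -> float:
    """Mean cosine similarity between the query and the retrieved chunks."""
    q = model.encode([query], normalize_embeddings=True)[0]
    c = model.encode(chunks, normalize_embeddings=True)
    return float((c @ q).mean())

def validate(query: str, chunks: list[str], threshold: float = 0.3) -> str:
    if relevance(query, chunks) < threshold:
        # Low similarity suggests an unclear or out-of-domain query:
        # ask a clarifying question instead of letting the LLM guess.
        return "needs_clarification"
    return "ok"
```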
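And to see how the cost buckets add up, here is a back-of-the-envelope estimate; every rate and volume in it is a hypothetical placeholder, not a vendor quote:

```python
# Back-of-the-envelope monthly cost estimate for a token-priced closed model.
# All rates and volumes below are hypothetical placeholders, not vendor quotes.
PRICE_IN_PER_1K = 0.0005   # $ per 1K input tokens (assumed)
PRICE_OUT_PER_1K = 0.0015  # $ per 1K output tokens (assumed)
EMBED_PER_1K = 0.0001      # $ per 1K tokens embedded, one-time (assumed)

queries_per_month = 50_000
tokens_in_per_query = 2_000   # prompt + retrieved chunks
tokens_out_per_query = 300    # generated answer
corpus_tokens = 10_000_000    # enterprise corpus to embed once

llm_cost = queries_per_month * (
    tokens_in_per_query / 1000 * PRICE_IN_PER_1K
    + tokens_out_per_query / 1000 * PRICE_OUT_PER_1K
)
embed_cost = corpus_tokens / 1000 * EMBED_PER_1K  # one-time embedding cost

print(f"monthly LLM cost  : ${llm_cost:,.2f}")    # $72.50 with these numbers
print(f"one-time embedding: ${embed_cost:,.2f}")  # $1.00 with these numbers
```

Even with made-up rates, this kind of arithmetic quickly shows which bucket dominates for a given workload.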
With so many variables, a good decision framework is needed to select the appropriate tools and technologies so that a cost-effective, valuable RAG-based Gen AI solution can be implemented.
The key external dependencies that RAG-based solutions generally have are the foundation models (i.e., LLMs), the vector DB, third-party solutions for tracing and monitoring the effectiveness of the RAG solution, and cloud compute.
The solution should be designed so that it's easy to swap these as needed, using traditional software architecture patterns to avoid hard dependencies. It often takes a few iterations to settle on a choice, and the system should also allow changing it dynamically based on the observed responses.
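One conventional way to achieve this is the strategy pattern: put each external dependency behind a small interface so implementations can be swapped without touching the rest of the pipeline. A minimal sketch, with hypothetical provider classes standing in for real client code:

```python
# Strategy-pattern sketch: hide the LLM vendor behind a small interface.
# OpenAIChat and LocalLlama are illustrative stand-ins, not real client code.
from typing import Protocol

class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...

class OpenAIChat:
    def generate(self, prompt: str) -> str:
        ...  # wrap the vendor SDK call here

class LocalLlama:
    def generate(self, prompt: str) -> str:
        ...  # wrap a self-hosted open-source model here

def answer(llm: LLM, prompt: str) -> str:
    # Callers depend only on the interface; swapping vendors (or choosing
    # one dynamically based on observed quality) is a one-line change.
    return llm.generate(prompt)

llm: LLM = LocalLlama()  # swap to OpenAIChat() without changing answer()
```

The same interface trick applies equally to the vector DB and the embedding model.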
Choosing the LLM
Selecting the appropriate LLM depends on performance, cost, and the complexity of the task. The usual approach is to try out a few and retain the ability to change them.
The decision tree below gives a general guideline on choosing one.
Choosing the Vector DB
There are dozens of vector DB solutions available, both open-source and closed-source. The choice of vector DB depends on speed, scalability, developer experience, and community support. There are in-memory vector DBs as well. Here is a comparison of the top few of them, along with some features:
Vecdbs.com provides a card view of the most popular ones.
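As a concrete taste of the in-memory category, here is a minimal sketch using FAISS, Meta's open-source similarity-search library; the random vectors stand in for real embeddings purely for illustration:

```python
# Minimal in-memory vector index with FAISS (open-source, from Meta).
# Random vectors stand in for real embeddings purely for illustration.
import faiss
import numpy as np

dim = 384                                    # e.g., a small embedding size
vectors = np.random.rand(10_000, dim).astype("float32")
faiss.normalize_L2(vectors)                  # normalize so inner product = cosine

index = faiss.IndexFlatIP(dim)               # exact inner-product search
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)         # top-5 nearest chunks
print(ids[0], scores[0])
```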
There are many open-source vector DB benchmarking tools. There is extensive documentation on GitHub about these tools, along with the results of tests already performed using open datasets, noting the vector dimension and the similarity search performed. Some of them are listed below:
Some performance benchmarking has been done that shares latency and QPS results for different vector DBs with 1 million records. Read more here.
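A rough version of such a measurement can be reproduced locally in a few lines; the numbers depend entirely on hardware and index type, so this is illustrative only:

```python
# Micro-benchmark sketch: rough latency and QPS on an in-memory FAISS index.
# 1M x 384 float32 vectors use roughly 1.5 GB RAM; results vary by hardware.
import time
import faiss
import numpy as np

dim, n_vectors, n_queries = 384, 1_000_000, 1_000
index = faiss.IndexFlatIP(dim)
index.add(np.random.rand(n_vectors, dim).astype("float32"))
queries = np.random.rand(n_queries, dim).astype("float32")

start = time.perf_counter()
for q in queries:                      # one query at a time = worst case
    index.search(q.reshape(1, -1), 10)
elapsed = time.perf_counter() - start

print(f"avg latency: {elapsed / n_queries * 1000:.2f} ms")
print(f"QPS        : {n_queries / elapsed:.0f}")
```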
Choosing the monitoring and analytics tool
RAG-based applications can become a black box unless we have a tracing and analytics agent that traces interactions with LLMs, the quality of responses, the quality of retrieval, etc. This is an evolving area; one good tool is Galileo.
Galileo RAG & Agent Analytics helps track the performance and quality of RAG applications. Quality is measured on context adherence, completeness, chunk utilization, tone, factual accuracy, and many other parameters.
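While tools like Galileo provide this out of the box, the underlying idea can be sketched in a few lines; the snippet below is a home-grown illustration, not the Galileo API. It logs every interaction as a JSON line so retrieval quality, response quality, and latency can be analyzed later:

```python
# Minimal home-grown RAG tracer (illustrative; not the Galileo API):
# log each interaction as a JSON line for later quality analysis.
import json
import time

def traced_answer(query, retrieve, generate, log_path="rag_trace.jsonl"):
    start = time.perf_counter()
    chunks = retrieve(query)             # injected retrieval function
    response = generate(query, chunks)   # injected generation function
    record = {
        "ts": time.time(),
        "query": query,
        "chunks": chunks,                # enables retrieval-quality review
        "response": response,
        "latency_s": round(time.perf_counter() - start, 3),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response
```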
Conclusion
The Generative AI landscape is rapidly evolving, presenting numerous challenges in implementation, tech stack selection, observability, and maintenance.
In today’s dynamic business environment, aligning risk management with strategic decision-making is essential.
At Kaamsha, we lead the way, helping companies navigate the Generative AI landscape with confidence, from identifying strategic use cases to actual deployment. We prioritize a balanced approach to adoption, integrating seamlessly with privacy and security controls. Our commitment is to promote both performance and trust, playing a pivotal role in the responsible and effective advancement of AI technologies.