Hindi and Tamil Question Answer / RAG

In this notebook, we use the new Navarasa LLMs from Telugu-LLM-Labs to create a Hindi and Tamil Question Answering system. Since we’re using a 7B model with PEFT, this notebook is run on Google Colab with an A100. If you’re working with a smaller machine, I’d encourage you to try the 2B model instead.

Time: 25 min | Level: Beginner
Author: Nirant Kasliwal
```shell
!pip install -U fastembed datasets qdrant-client peft transformers accelerate bitsandbytes -qq
```
```python
import numpy as np
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
from transformers import AutoTokenizer
from fastembed import TextEmbedding
```
```python
hf_token = "<your_hf_token_here>"  # Get your token from https://huggingface.co/settings/token, needed for Gemma weights
```

Setting Up

Next, we’ll download the dataset, the LLM weights, and the embedding model weights.

```python
embedding_model = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
model_id = "Telugu-LLM-Labs/Indic-gemma-7b-finetuned-sft-Navarasa"  # use the Indic-gemma-2b-finetuned-sft-Navarasa variant on smaller machines
```

```python
ds = load_dataset("nirantk/chaii-hindi-and-tamil-question-answering", split="train")
ds
```

This dataset has questions and contexts with corresponding answers. The LLM must find each answer within the given context, which makes this an extractive Question Answering problem.
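To make “extractive” concrete, here’s a toy illustration (with made-up question-and-answer strings, not from the dataset) showing that the answer is a literal span of the context rather than free-form generated text:

```python
# Toy example: in extractive QA, the answer is a literal span of the context.
context = "The Taj Mahal is located in Agra, on the banks of the Yamuna river."
answer = "Agra"

# The answer text appears verbatim inside the context...
start = context.find(answer)
assert start != -1

# ...so it can be described by a (start, end) character span.
end = start + len(answer)
print(context[start:end])  # -> Agra
```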

To do this, we’ll set up an embedding model from FastEmbed, and then add the embeddings to Qdrant in in-memory mode, which is powered by NumPy.

```python
embedding_model = TextEmbedding(model_name=embedding_model)
```
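Under the hood, in-memory vector search amounts to ranking stored vectors by cosine similarity against the query vector. A minimal NumPy sketch of that idea (toy 3-dimensional vectors, function name is my own):

```python
import numpy as np

def cosine_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 2) -> np.ndarray:
    # Normalize, then rank by dot product (equivalent to cosine similarity).
    query = query / np.linalg.norm(query)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = vectors @ query
    return np.argsort(-scores)[:k]  # indices of the k most similar vectors

# Toy 3-dimensional "embeddings" standing in for real context vectors
corpus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
print(cosine_top_k(np.array([1.0, 0.0, 0.0]), corpus))  # -> [0 2]
```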

We’ll use the 7B model here; the 2B model isn’t great and suffered from reading-comprehension challenges.

Downloading the Navarasa LLM

We’ll download the Navarasa LLM from Telugu-LLM-Labs. This is a 7B model fine-tuned with PEFT.

```python
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=False,
    token=hf_token,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Embed the Context into Vectors

```python
questions, contexts = list(ds["question"]), list(ds["context"])
```

```python
context_embeddings: list[np.ndarray] = list(
    embedding_model.embed(contexts)
)  # Note the list() call: embed() returns a generator
```

```python
len(context_embeddings[0])
```

```python
def embed_text(text: str) -> np.ndarray:
    return list(embedding_model.embed(text))[0]
```

```python
context_points = [
    PointStruct(id=idx, vector=emb, payload={"text": text})
    for idx, (emb, text) in enumerate(zip(context_embeddings, contexts))
]
```

```python
len(context_points[0].vector)
```

Insert into Qdrant

```python
search_client = QdrantClient(":memory:")
search_client.create_collection(
    collection_name="hindi_tamil_contexts",
    vectors_config=VectorParams(size=len(context_points[0].vector), distance=Distance.COSINE),
)
search_client.upsert(collection_name="hindi_tamil_contexts", points=context_points)
```

Selecting a Question

I’ve randomly selected a question here, and we then find the answer to it. We have the correct answer for it too, so we can compare the two when you run the code.

```python
idx = 997
question = questions[idx]
print(question)
search_context = search_client.search(
    query_vector=embed_text(question), collection_name="hindi_tamil_contexts", limit=2
)
```

```python
search_context_text = search_context[0].payload["text"]
len(search_context_text)
```

Running the Model with a Question & Context

```python
input_prompt = """
Answer the following question based on the context given after it in the same language as the question:
### Question:
{}
### Context:
{}
### Answer:
{}"""

input_text = input_prompt.format(
    questions[idx],  # question
    search_context_text[:2000],  # context
    "",  # output - leave this blank for generation!
)

inputs = tokenizer([input_text], return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
```
```python
response.split(sep="### Answer:")[-1].strip("<eos>").strip()
```

```python
ds[idx]["answer_text"]
```
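A caveat on the parsing above: `str.strip("<eos>")` removes any of the characters `<`, `e`, `o`, `s`, `>` from both ends, not the literal `<eos>` token, so it can clip real answer text. A small helper (names are my own) makes the extraction more robust:

```python
def extract_answer(response: str, marker: str = "### Answer:", eos: str = "<eos>") -> str:
    # Take everything after the last answer marker...
    answer = response.split(marker)[-1]
    # ...drop a trailing end-of-sequence token if present, then trim whitespace.
    answer = answer.strip()
    if answer.endswith(eos):
        answer = answer[: -len(eos)]
    return answer.strip()

print(extract_answer("### Answer: Agra <eos>"))  # -> Agra
```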