LLM KRL Model Sample usecase Google Gemma English to Tamil translation with LLM Respect Language models LLM


This post outlines a multi-level contextual classification model for sentiment analysis of English sentences and their Tamil translations. It is built on Google's Gemma models, which are one option for the Tamil language processing. Here's a structured approach:

1. Data Preparation:

Dataset Creation: Compile a dataset containing English sentences, their Tamil translations, sentiment labels (positive/negative), and, for positive sentiments, an additional label indicating respect.

Example Data Structure:

| English Text | Tamil Translation | Sentiment | Respect (if Positive) |
|-------------------------------------------|------------------------------------------------|-----------|-----------------------|
| I am very happy to meet you | உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி | Positive | Respect |
| I am disappointed with your work | உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன் | Negative | |
| You have done an excellent job, well done | நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது | Positive | Respect |
| This is not good, I expected better | இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன் | Negative | |
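The labeling rule above (a respect label only on positive rows) can be checked before any training starts. The following is a minimal sketch, assuming each row is a plain dictionary with `sentiment` and `respect` keys. The function name `find_label_errors` and the `'not respect'` label value are illustrative assumptions, not part of the original dataset.

```python
def find_label_errors(rows):
    """Return (index, message) pairs for rows that break the labeling rule."""
    allowed_sentiments = {"positive", "negative"}
    allowed_respect = {"respect", "not respect"}  # 'not respect' is an assumed label value
    errors = []
    for i, row in enumerate(rows):
        sentiment = row.get("sentiment")
        respect = row.get("respect")
        if sentiment not in allowed_sentiments:
            errors.append((i, f"unknown sentiment: {sentiment!r}"))
        elif sentiment == "negative" and respect is not None:
            errors.append((i, "negative row must not carry a respect label"))
        elif sentiment == "positive" and respect not in allowed_respect:
            errors.append((i, f"positive row needs a respect label, got {respect!r}"))
    return errors


# Hypothetical rows; the third one is inconsistent on purpose.
rows = [
    {"sentiment": "positive", "respect": "respect"},
    {"sentiment": "negative", "respect": None},
    {"sentiment": "negative", "respect": "respect"},
]
print(find_label_errors(rows))
```

Running a check like this on the full dataset catches rows where the two label columns disagree, which would otherwise feed noisy targets into the second stage.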

2. Model Architecture:

Stage 1: Sentiment Classification (Positive/Negative) using Gemma.

Stage 2: For Positive sentiments, classify as Respect/Not Respect.
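The two stages form a cascade: Stage 2 runs only when Stage 1 returns positive. Here is a rough sketch of that control flow, with the classifiers passed in as plain functions. `classify_two_stage`, `toy_sentiment`, and `toy_respect` are hypothetical names, and the keyword rules are stand-ins for trained models, not a real classification method.

```python
from typing import Callable, Optional, Tuple


def classify_two_stage(
    text: str,
    sentiment_fn: Callable[[str], str],
    respect_fn: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Run Stage 1, then Stage 2 only for positive text."""
    sentiment = sentiment_fn(text)
    if sentiment != "positive":
        # Negative text never reaches the respect classifier.
        return sentiment, None
    return sentiment, respect_fn(text)


# Stand-in classifiers (assumptions for illustration only).
def toy_sentiment(text: str) -> str:
    return "negative" if "not" in text.lower().split() else "positive"


def toy_respect(text: str) -> str:
    return "respect" if "thank" in text.lower() else "not respect"


for sample in ["Thank you very much for your support", "This is not good, I expected better"]:
    print(sample, "->", classify_two_stage(sample, toy_sentiment, toy_respect))
```

Keeping the stages behind function arguments means the Gemma sentiment model and a later respect model can be swapped in without changing the cascade itself.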

3. Implementation Steps:

import random

import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

# Sample dataset
data = {
    'text_english': [
        "I am very happy to meet you",
        "I am disappointed with your work",
        "You have done an excellent job, well done",
        "This is not good, I expected better",
        "Thank you very much for your support",
        "I don't like your attitude",
        "I'm grateful for your guidance",
        "Your work lacks quality",
        "Well done, you've made us proud",
        "I appreciate your effort",
        "You are an inspiration",
        "I regret working with you",
    ],
    'text_tamil': [
        "உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி",
        "உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன்",
        "நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது",
        "இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன்",
        "உங்கள் உதவிக்காக மிகவும் நன்றி",
        "உங்கள் அணுகுமுறை எனக்கு பிடிக்கவில்லை",
        "உங்கள் வழிகாட்டலுக்கு நான் நன்றி கூறுகிறேன்",
        "உங்கள் வேலை தரம் குறைவாக உள்ளது",
        "நல்ல செய்தி, நீங்கள் எங்களை பெருமைப்படுத்தினீர்கள்",
        "உங்கள் முயற்சியை நான் பாராட்டுகிறேன்",
        "நீங்கள் ஒரு பேரனுபவம்",
        "உங்களுடன் பணியாற்றியது வருத்தமாக உள்ளது",
    ],
    'sentiment': [
        'positive', 'negative', 'positive', 'negative',
        'positive', 'negative', 'positive', 'negative',
        'positive', 'positive', 'positive', 'negative',
    ],
    'respect': [
        'respect', None, 'respect', None,
        'respect', None, 'respect', None,
        'respect', 'respect', 'respect', None,
    ],
}

# Convert data to DataFrame
df = pd.DataFrame(data)

# Split the data
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

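# Initialize tokenizer and model
# Note: google/gemma-2b is a gated checkpoint on Hugging Face; accept its license and log in before downloading.
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2b')
model = AutoModelForSequenceClassification.from_pretrained('google/gemma-2b', num_labels=2)

# Batched, padded inputs need a pad token id on the model config
model.config.pad_token_id = tokenizer.pad_token_id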

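# Encoding function: tokenize the texts and map sentiment labels to integers
def encode_data(texts, sentiments):
    inputs = tokenizer(texts.tolist(), return_tensors="pt", padding=True, truncation=True)
    labels = torch.tensor([1 if sentiment == "positive" else 0 for sentiment in sentiments])
    return inputs, labels


# Trainer expects a dataset whose items are dicts holding the model inputs and a "labels" key
class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: tensor[idx] for key, tensor in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


# Encode training and testing data
train_inputs, train_labels = encode_data(train_df['text_tamil'], train_df['sentiment'])
test_inputs, test_labels = encode_data(test_df['text_tamil'], test_df['sentiment'])
train_dataset = SentimentDataset(train_inputs, train_labels)
test_dataset = SentimentDataset(test_inputs, test_labels)

# Training arguments
# Note: recent transformers releases rename evaluation_strategy to eval_strategy.
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
)

# Trainer: pass the datasets, not the bare label tensors
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()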

# Function for multi-level classification
def classify_text(text, model, tokenizer):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Keep the inputs on the same device as the model (it may sit on a GPU after training)
    inputs = inputs.to(model.device)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment = 'positive' if torch.argmax(probs).item() == 1 else 'negative'

    respect = None
    if sentiment == 'positive':
        # Placeholder for respect detection logic: this is a random guess, not a prediction.
        # Replace it with a trained Stage 2 classifier.
        respect_prob = random.choice([0.8, 0.2])
        respect = 'respect' if respect_prob > 0.5 else 'not respect'

    return sentiment, respect

# Example classification
for text in test_df['text_tamil'].tolist():
    sentiment, respect = classify_text(text, model, tokenizer)
    print(f"Text: {text} | Sentiment: {sentiment} | Respect: {respect}")
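The respect step in the code above is still a placeholder. One way to prepare for a real Stage 2 model is to build its training set from the positive rows only. The sketch below assumes rows are dictionaries with the same column names as the sample dataset. `build_stage2_examples` is a hypothetical helper, and the rows shown include an invented `'not respect'` example purely for illustration.

```python
def build_stage2_examples(rows, text_key="text_tamil"):
    """Collect (texts, labels) for the respect classifier from positive rows only."""
    texts, labels = [], []
    for row in rows:
        if row["sentiment"] != "positive":
            continue  # Stage 2 never sees negative text
        texts.append(row[text_key])
        labels.append(1 if row["respect"] == "respect" else 0)
    return texts, labels


# Illustrative rows; the 'not respect' row is made up for this sketch.
rows = [
    {"text_tamil": "உங்கள் உதவிக்காக மிகவும் நன்றி", "sentiment": "positive", "respect": "respect"},
    {"text_tamil": "உங்கள் வேலை தரம் குறைவாக உள்ளது", "sentiment": "negative", "respect": None},
    {"text_tamil": "placeholder casual positive sentence", "sentiment": "positive", "respect": "not respect"},
]

texts, labels = build_stage2_examples(rows)
if len(set(labels)) < 2:
    print("Stage 2 needs examples of both classes before it can be trained")
else:
    print(f"{len(texts)} Stage 2 examples, {sum(labels)} labeled respect")
```

In the sample dataset above every positive row is labeled `respect`, so a Stage 2 model trained on it alone would see only one class. Positive sentences without respectful wording would need to be added first.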

Kumaran1987
