LLM KRL English to Tamil Version 1 - BERT model used

In Kaggle Notebook Published Here

import pandas as pd
import torch
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, mean_squared_error, roc_auc_score, precision_recall_curve, auc, roc_curve
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments, EarlyStoppingCallback
import numpy as np
import matplotlib.pyplot as plt
import random

# Expand dataset with 100 samples by duplicating and adding slight variations
sample_data = {
    "text_english": [
        "I am very happy to meet you", "I am disappointed with your work",
        "You have done an excellent job, well done", "This is not good, I expected better",
        "Thank you very much for your support", "I don't like your attitude",
        "I'm grateful for your guidance", "Your work lacks quality",

LLM KRL Model Sample usecase Google Gemma English to Tamil translation with LLM Respect Language models LLM_KRL models


A multi-level contextual classification model for English-to-Tamil sentiment analysis can be built on one of Google's Gemma models, which may improve results on Tamil text. Here's a structured approach:

1. Data Preparation:

Dataset Creation: Compile a dataset containing English sentences, their Tamil translations, sentiment labels (positive/negative), and, for positive sentences only, an additional label indicating respect.

Example Data Structure:

| English Text                              | Tamil Translation                          | Sentiment | Respect (if Positive) |
|-------------------------------------------|--------------------------------------------|-----------|-----------------------|
| I am very happy to meet you               | உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி        | Positive  | Respect               |
| I am disappointed with your work          | உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன்      | Negative  |                       |
| You have done an excellent job, well done | நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது      | Positive  | Respect               |
| This is not good, I expected better       | இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன்   | Negative  |                       |
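The labeling scheme in the table implies a rule: a respect label exists only on positive rows. A minimal sketch of a consistency check for that rule follows. The row layout and the name `find_inconsistent_rows` are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical rows mirroring the table above (field names are illustrative).
rows = [
    {"english": "I am very happy to meet you", "sentiment": "Positive", "respect": "Respect"},
    {"english": "I am disappointed with your work", "sentiment": "Negative", "respect": None},
    {"english": "You have done an excellent job, well done", "sentiment": "Positive", "respect": "Respect"},
    {"english": "This is not good, I expected better", "sentiment": "Negative", "respect": None},
]


def find_inconsistent_rows(rows):
    """Return the indices of rows whose labels break the two-level scheme."""
    bad = []
    for i, row in enumerate(rows):
        if row["sentiment"] not in ("Positive", "Negative"):
            bad.append(i)  # unknown sentiment value
        elif row["sentiment"] == "Negative" and row["respect"] is not None:
            bad.append(i)  # respect is only defined for positive rows
        elif row["sentiment"] == "Positive" and row["respect"] is None:
            bad.append(i)  # positive rows need a respect label
    return bad


print(find_inconsistent_rows(rows))  # []
```

Running a check like this before training catches rows where the second-level label was filled in by mistake.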


2. Model Architecture:

Stage 1: Sentiment Classification (Positive/Negative) using Gemma.

Stage 2: For Positive sentiments, classify as Respect/Not Respect.
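The two stages form a cascade: stage 2 runs only when stage 1 returns positive. The sketch below shows that control flow with stand-in classifiers. `classify_two_stage`, `toy_sentiment`, and `toy_respect` are hypothetical names, and the keyword rules are placeholders for trained models, not a real detection method.

```python
from typing import Callable, Optional, Tuple


def classify_two_stage(
    text: str,
    sentiment_fn: Callable[[str], str],
    respect_fn: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Run stage 1, then run stage 2 only when stage 1 says 'positive'."""
    sentiment = sentiment_fn(text)
    if sentiment != "positive":
        return sentiment, None  # negative text never reaches stage 2
    return sentiment, respect_fn(text)


# Stand-in classifiers for illustration only; real ones would wrap trained models.
def toy_sentiment(text: str) -> str:
    words = text.lower().split()
    return "negative" if "not" in words or "disappointed" in words else "positive"


def toy_respect(text: str) -> str:
    return "respect" if "thank" in text.lower() else "not respect"


print(classify_two_stage("Thank you very much for your support", toy_sentiment, toy_respect))
# ('positive', 'respect')
print(classify_two_stage("This is not good, I expected better", toy_sentiment, toy_respect))
# ('negative', None)
```

Passing the two classifiers in as functions keeps the cascade logic independent of which model backs each stage.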


3. Implementation Steps:

import random  # used by the placeholder respect logic in classify_text

import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

Sample dataset

data = {
    'text_english': [
        "I am very happy to meet you",
        "I am disappointed with your work",
        "You have done an excellent job, well done",
        "This is not good, I expected better",
        "Thank you very much for your support",
        "I don't like your attitude",
        "I'm grateful for your guidance",
        "Your work lacks quality",
        "Well done, you've made us proud",
        "I appreciate your effort",
        "You are an inspiration",
        "I regret working with you",
    ],
    'text_tamil': [
        "உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி",
        "உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன்",
        "நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது",
        "இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன்",
        "உங்கள் உதவிக்காக மிகவும் நன்றி",
        "உங்கள் அணுகுமுறை எனக்கு பிடிக்கவில்லை",
        "உங்கள் வழிகாட்டலுக்கு நான் நன்றி கூறுகிறேன்",
        "உங்கள் வேலை தரம் குறைவாக உள்ளது",
        "நல்ல செய்தி, நீங்கள் எங்களை பெருமைப்படுத்தினீர்கள்",
        "உங்கள் முயற்சியை நான் பாராட்டுகிறேன்",
        "நீங்கள் ஒரு பேரனுபவம்",
        "உங்களுடன் பணியாற்றியது வருத்தமாக உள்ளது",
    ],
    'sentiment': [
        'positive', 'negative', 'positive', 'negative',
        'positive', 'negative', 'positive', 'negative',
        'positive', 'positive', 'positive', 'negative',
    ],
    'respect': [
        'respect', None, 'respect', None,
        'respect', None, 'respect', None,
        'respect', 'respect', 'respect', None,
    ],
}

Convert data to DataFrame

df = pd.DataFrame(data)

Split the data

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

Initialize tokenizer and model

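# google/gemma-2b is a gated checkpoint on the Hugging Face Hub, so access must be granted first
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2b')
# num_labels=2 attaches a freshly initialized two-class head (negative/positive)
model = AutoModelForSequenceClassification.from_pretrained('google/gemma-2b', num_labels=2)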

Encoding function

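def encode_data(texts, sentiments):
    # Tokenize the whole column at once, padding to the longest sentence
    inputs = tokenizer(texts.tolist(), return_tensors="pt", padding=True, truncation=True)
    # Map sentiment strings to class ids: positive -> 1, anything else -> 0
    labels = torch.tensor([1 if sentiment == "positive" else 0 for sentiment in sentiments])
    return inputs, labels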

Encode training and testing data

train_texts, train_labels = encode_data(train_df['text_tamil'], train_df['sentiment'])
test_texts, test_labels = encode_data(test_df['text_tamil'], test_df['sentiment'])
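The encoding step above covers stage 1 only. A stage 2 training set would keep just the positive rows and encode their respect label. The sketch below does this with the sample labels from this post. The helper name `stage2_labels` is an assumption for illustration.

```python
from collections import Counter

# Sentiment and respect columns copied from the sample dataset in this post.
sentiment = [
    'positive', 'negative', 'positive', 'negative',
    'positive', 'negative', 'positive', 'negative',
    'positive', 'positive', 'positive', 'negative',
]
respect = [
    'respect', None, 'respect', None,
    'respect', None, 'respect', None,
    'respect', 'respect', 'respect', None,
]


def stage2_labels(sentiment, respect):
    """Keep positive rows only and map 'respect' -> 1, anything else -> 0."""
    return [
        1 if r == 'respect' else 0
        for s, r in zip(sentiment, respect)
        if s == 'positive'
    ]


labels = stage2_labels(sentiment, respect)
print(Counter(labels))  # Counter({1: 7})
```

All seven positive samples carry the respect label, so this sample set has no 'not respect' examples. A respect classifier cannot be trained from it until such rows are added.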

Training arguments

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",  # recent transformers releases rename this argument to eval_strategy
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
)

Trainer

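# Trainer expects datasets that yield dicts of model inputs plus labels,
# not bare label tensors, so wrap the encodings and labels together.
class SentimentDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: value[idx] for key, value in self.encodings.items()}
        item['labels'] = self.labels[idx]
        return item

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=SentimentDataset(train_texts, train_labels),
    eval_dataset=SentimentDataset(test_texts, test_labels),
)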

Train the model

trainer.train()
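The Trainer above evaluates every epoch, but without a `compute_metrics` callback it reports the evaluation loss rather than accuracy or F1. A minimal sketch of such a callback follows, written in plain Python so it runs without extra dependencies. The function body is an illustration, not code from the original notebook.

```python
def compute_metrics(eval_pred):
    """Accuracy and positive-class F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]

    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    correct = sum(p == y for p, y in zip(preds, labels))

    accuracy = correct / len(labels) if labels else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Made-up logits and labels, for illustration only
example = ([[0.1, 0.9], [2.0, -1.0], [0.3, 0.4], [1.5, 0.2]], [1, 0, 0, 1])
print(compute_metrics(example))  # {'accuracy': 0.5, 'f1': 0.5}
```

It would be passed as `compute_metrics=compute_metrics` when constructing the `Trainer`. With a test split of only a few sentences, the resulting numbers are a smoke test rather than a meaningful score.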

Function for multi-level classification

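def classify_text(text, model, tokenizer):
    # Stage 1: sentiment from the fine-tuned model
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Move inputs to the model's device (the Trainer may have placed the model on a GPU)
    inputs = {key: value.to(model.device) for key, value in inputs.items()}
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment = 'positive' if torch.argmax(probs).item() == 1 else 'negative'

    # Stage 2: only positive text gets a respect label
    respect = None
    if sentiment == 'positive':
        # Placeholder for respect detection logic: a random stand-in, not a trained classifier
        respect_prob = random.choice([0.8, 0.2])
        respect = 'respect' if respect_prob > 0.5 else 'not respect'

    return sentiment, respect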

Example classification

for text in test_df['text_tamil'].tolist():
    sentiment, respect = classify_text(text, model, tokenizer)
    print(f"Text: {text} | Sentiment: {sentiment} | Respect: {respect}")
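The softmax-and-argmax step inside classify_text can be traced by hand on a pair of made-up logits. This sketch uses plain Python instead of torch, and the numbers are illustrative, not model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-0.4, 1.2]  # made-up scores in [negative, positive] order
probs = softmax(logits)
# Index 1 is the positive class, matching the label encoding used for training
label = 'positive' if probs.index(max(probs)) == 1 else 'negative'
print(label)  # positive
```

Softmax preserves the ordering of the logits, so the argmax is the same with or without it. The probabilities matter only when a confidence threshold is applied, as a trained respect stage might do.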


