LLM KRL Model Sample usecase Google Gemma English to Tamil translation with LLM Respect Language models LLM_KRL models


A multi-level contextual classification model for English-to-Tamil sentiment analysis can be built on Google's Gemma models, which may improve performance on Tamil text in particular. Here's a structured approach:

1. Data Preparation:

Dataset Creation: Compile a dataset containing English sentences, their Tamil translations, sentiment labels (positive/negative), and, for positive sentiments, an additional label indicating respect.

Example Data Structure:

| English Text                              | Tamil Translation                           | Sentiment | Respect (if Positive) |
|-------------------------------------------|---------------------------------------------|-----------|-----------------------|
| I am very happy to meet you               | உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி          | Positive  | Respect               |
| I am disappointed with your work          | உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன்         | Negative  |                       |
| You have done an excellent job, well done | நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது          | Positive  | Respect               |
| This is not good, I expected better       | இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன்      | Negative  |                       |


2. Model Architecture:

Stage 1: Sentiment Classification (Positive/Negative) using Gemma.

Stage 2: For Positive sentiments, classify as Respect/Not Respect.
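The two stages above can be read as a simple cascade: Stage 2 runs only when Stage 1 returns a positive label. The following is a minimal sketch of that control flow, not the trained models themselves. The names `cascade_classify`, `toy_sentiment`, and `toy_respect` are hypothetical, and the two toy functions are keyword stand-ins used only so the sketch runs without a model.

```python
from typing import Callable, Optional, Tuple


def cascade_classify(
    text: str,
    sentiment_fn: Callable[[str], str],
    respect_fn: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Run Stage 1, and run Stage 2 only when Stage 1 says 'positive'."""
    sentiment = sentiment_fn(text)
    if sentiment != "positive":
        return sentiment, None
    return sentiment, respect_fn(text)


# Hypothetical keyword-based stand-ins (assumptions, not real classifiers).
def toy_sentiment(text: str) -> str:
    lowered = text.lower()
    return "negative" if "not" in lowered or "disappointed" in lowered else "positive"


def toy_respect(text: str) -> str:
    return "respect" if "you" in text.lower() else "not respect"


print(cascade_classify("I am very happy to meet you", toy_sentiment, toy_respect))
# ('positive', 'respect')
print(cascade_classify("This is not good, I expected better", toy_sentiment, toy_respect))
# ('negative', None)
```

Passing the stage functions in as arguments keeps the cascade independent of how each stage is implemented, so a fine-tuned model can later replace either stand-in.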


3. Implementation Steps:

import random  # needed by the placeholder respect logic in classify_text

import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

Sample dataset

data = {
    'text_english': [
        "I am very happy to meet you",
        "I am disappointed with your work",
        "You have done an excellent job, well done",
        "This is not good, I expected better",
        "Thank you very much for your support",
        "I don't like your attitude",
        "I'm grateful for your guidance",
        "Your work lacks quality",
        "Well done, you've made us proud",
        "I appreciate your effort",
        "You are an inspiration",
        "I regret working with you",
    ],
    'text_tamil': [
        "உங்களை சந்திப்பதில் மிகவும் மகிழ்ச்சி",
        "உங்கள் வேலைக்கு நான் வருத்தப்படுகிறேன்",
        "நீங்கள் சிறந்த வேலை செய்தீர்கள், நல்லது",
        "இது நல்லதல்ல, நான் நல்லவை எதிர்பார்த்தேன்",
        "உங்கள் உதவிக்காக மிகவும் நன்றி",
        "உங்கள் அணுகுமுறை எனக்கு பிடிக்கவில்லை",
        "உங்கள் வழிகாட்டலுக்கு நான் நன்றி கூறுகிறேன்",
        "உங்கள் வேலை தரம் குறைவாக உள்ளது",
        "நல்ல செய்தி, நீங்கள் எங்களை பெருமைப்படுத்தினீர்கள்",
        "உங்கள் முயற்சியை நான் பாராட்டுகிறேன்",
        "நீங்கள் ஒரு பேரனுபவம்",
        "உங்களுடன் பணியாற்றியது வருத்தமாக உள்ளது",
    ],
    'sentiment': [
        'positive', 'negative', 'positive', 'negative',
        'positive', 'negative', 'positive', 'negative',
        'positive', 'positive', 'positive', 'negative',
    ],
    'respect': [
        'respect', None, 'respect', None,
        'respect', None, 'respect', None,
        'respect', 'respect', 'respect', None,
    ],
}

Convert data to DataFrame

df = pd.DataFrame(data)

Split the data

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
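With only twelve rows, a purely random 20% split can leave the test set with a single sentiment class. A stratified split avoids that. `train_test_split` accepts a `stratify` argument for this purpose; the sketch below shows the same idea in plain Python so the logic is visible. The function name `stratified_split` is hypothetical and not part of the original script.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), sampling each label separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Keep at least one example of every label in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Sentiment column copied from the sample dataset.
sentiments = [
    'positive', 'negative', 'positive', 'negative',
    'positive', 'negative', 'positive', 'negative',
    'positive', 'positive', 'positive', 'negative',
]
train_idx, test_idx = stratified_split(sentiments)
print("test labels:", [sentiments[i] for i in test_idx])
```

On the sample data this yields one positive and one negative test row, which is still far too few for a meaningful evaluation but at least covers both classes.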

Initialize tokenizer and model

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2b')
model = AutoModelForSequenceClassification.from_pretrained('google/gemma-2b', num_labels=2)

Encoding function

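def encode_data(texts, sentiments):
    # Tokenize the whole column at once; padding and truncation give equal-length tensors.
    inputs = tokenizer(texts.tolist(), return_tensors="pt", padding=True, truncation=True)
    # Map sentiment strings to integer class ids: positive -> 1, negative -> 0.
    labels = torch.tensor([1 if sentiment == "positive" else 0 for sentiment in sentiments])
    return inputs, labels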

Encode training and testing data

train_texts, train_labels = encode_data(train_df['text_tamil'], train_df['sentiment'])
test_texts, test_labels = encode_data(test_df['text_tamil'], test_df['sentiment'])

Training arguments

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",  # newer transformers releases name this eval_strategy
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
)

Trainer

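class EncodedDataset(torch.utils.data.Dataset):
    """Pairs the tokenizer output with the labels so Trainer can index examples."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: tensor[idx] for key, tensor in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

# Trainer needs datasets that yield inputs and labels together, not the label tensors alone.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=EncodedDataset(train_texts, train_labels),
    eval_dataset=EncodedDataset(test_texts, test_labels),
)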

Train the model

trainer.train()
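As configured, the per-epoch evaluation reports loss only. `Trainer` also accepts a `compute_metrics` callable, and the sketch below shows one way to report accuracy through it. The helper name `accuracy_from_logits` is hypothetical, and the logits in the usage lines are made-up numbers for illustration, not model output.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest logit is the predicted class id.
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == int(label))
    return correct / len(labels)


def compute_metrics(eval_pred):
    # Trainer passes the predictions and the label ids as a pair.
    logits, labels = eval_pred
    return {"accuracy": accuracy_from_logits(logits, labels)}


# Illustrative values only: three examples, two classes.
example_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]]
example_labels = [1, 0, 0]
print(compute_metrics((example_logits, example_labels)))
```

The function would be passed as `compute_metrics=compute_metrics` when constructing the `Trainer`.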

Function for multi-level classification

def classify_text(text, model, tokenizer):
    # Tokenize and move the tensors to the model's device (it may be on GPU after training).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment = 'positive' if torch.argmax(probs, dim=-1).item() == 1 else 'negative'

    respect = None
    if sentiment == 'positive':
        # Placeholder for respect detection logic: a random stand-in, not a trained Stage 2 model
        respect_prob = random.choice([0.8, 0.2])
        respect = 'respect' if respect_prob > 0.5 else 'not respect'

    return sentiment, respect
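Replacing the placeholder with a real Stage 2 classifier would first require a training set built from the positive rows only. The sketch below shows that filtering step under stated assumptions: `build_respect_examples` is a hypothetical helper, and the third row in the usage lines is an invented example added only to show the 'not respect' case.

```python
def build_respect_examples(texts, sentiments, respects):
    """Keep only positive rows and map 'respect' -> 1, anything else -> 0."""
    examples = []
    for text, sentiment, respect in zip(texts, sentiments, respects):
        if sentiment != "positive":
            continue  # Stage 2 never sees negative rows
        examples.append((text, 1 if respect == "respect" else 0))
    return examples


texts = [
    "I am very happy to meet you",
    "I am disappointed with your work",
    "Nice one, I guess",  # invented row, not from the sample dataset
]
sentiments = ["positive", "negative", "positive"]
respects = ["respect", None, "not respect"]

for text, label in build_respect_examples(texts, sentiments, respects):
    print(label, text)
```

Every positive row in the sample dataset is labeled 'respect', so Stage 2 could not learn anything from it as it stands; some positive rows labeled 'not respect' would have to be collected first.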

Example classification

for text in test_df['text_tamil'].tolist():
    sentiment, respect = classify_text(text, model, tokenizer)
    print(f"Text: {text} | Sentiment: {sentiment} | Respect: {respect}")


