Skip to main content

Guidewire Self Managed version H2 Tables ingest into Databricks Delta Tables

Prerequisite 

1. In All Guidewire xCenters import sample data

2. Create account in Databricks Community edition, In this edition Databricks is Free



Step 1: Query H2 tables

'' failed to upload. Invalid response: RpcError




Step 2: Save Resultsets into CSV files and rename prefix with clm_



Step 3: Upload all these CSV into Databricks Filestore




Click Create Table then Below pop up will open and upload all csv files






Step 4:  Check it in Catalog--> FileStores --> All Clm_ prefix files are available !!



Step 5: Ingested CLM_ csv prefix files now it will be created as delta tables in Databricks as well as migrate to new guidewire raw database

PySpark Code has below 

import os
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

# Initialize Spark session (usually pre-initialized in Databricks)
spark = SparkSession.builder.appName("CreateTablesFromCSV").getOrCreate()

# Define the base path for FileStore
base_path = "dbfs:/FileStore/"

# Define file prefixes to look for
file_prefixes = ["clm_"]

# Iterate over file prefixes and create tables based on file names
for prefix in file_prefixes:
    # List all CSV files that start with the given prefix in the FileStore
    csv_files = dbutils.fs.ls(base_path)
   
    # Filter for CSV files that match the prefix
    prefix_files = [f.path for f in csv_files if f.name.startswith(prefix) and f.name.endswith(".csv")]
   
    # Load each CSV file and create a table
    for file_path in prefix_files:
        # Extract the table name from the file name (remove ".csv")
        table_name = os.path.basename(file_path).replace(".csv", "")
       
        # Load the CSV file into a DataFrame
        df = spark.read.format("csv").option("header", "true").load(file_path)
       
        # Change the ID column data type from string to Integer, if it exists
        if 'ID' in df.columns:
            df = df.withColumn("ID", col("ID").cast(IntegerType()))
       
        # Create a table with the same name as the file (without .csv extension)
        df.write.format("delta").mode("overwrite").saveAsTable(table_name)
       
        # Print success message
        print(f"Table '{table_name}' created successfully from file '{file_path}'.")

 Log info 

Table 'clm_cc_claim_h2' created successfully from file 'dbfs:/FileStore/clm_cc_claim_h2.csv'. Table 'clm_cc_contact_h2' created successfully from file 'dbfs:/FileStore/clm_cc_contact_h2.csv'. Table 'clm_cc_credential_h2' created successfully from file 'dbfs:/FileStore/clm_cc_credential_h2.csv'. Table 'clm_cc_group_h2' created successfully from file 'dbfs:/FileStore/clm_cc_group_h2.csv'. Table 'clm_cc_user_h2' created successfully from file 'dbfs:/FileStore/clm_cc_user_h2.csv'. Table 'clm_cctl_claimstate_h2' created successfully from file 'dbfs:/FileStore/clm_cctl_claimstate_h2.csv'. Table 'clm_cctl_grouptype_h2' created successfully from file 'dbfs:/FileStore/clm_cctl_grouptype_h2.csv'. 

Table 'clm_cctl_lobcode_h2' created successfully from file 'dbfs:/FileStore/clm_cctl_lobcode_h2.csv'.



 

Now these tables are available in default database, let's create new database related to guidewire and then move all these clm prefix table into guidewire claim raw table

PySpark Code below for this

from pyspark.sql import SparkSession

# Initialize Spark session (usually pre-initialized in Databricks)
spark = SparkSession.builder.appName("MoveTablesToNewDatabase").getOrCreate()

# Define the new database name
new_database_name = "gw_claims_raw_db"

# Step 1: Create a new database (if it doesn't exist)
spark.sql(f"CREATE DATABASE IF NOT EXISTS {new_database_name}")

# Step 2: List all tables in the default database
default_tables = [row.name for row in spark.catalog.listTables('default')]

print(f"Tables found in default database: {default_tables}")

# Step 3: Copy each table from default database to the new database
for table_name in default_tables:
    # Read the table from the default database
    df = spark.table(f"default.{table_name}")
   
    # Write the table to the new database
    df.write.format("delta").mode("overwrite").saveAsTable(f"{new_database_name}.{table_name}")
   
    # Print success message
    print(f"Table '{table_name}' moved successfully from 'default' to '{new_database_name}'.")


Log message has below


Tables found in default database: ['clm_cc_claim_h2', 'clm_cc_contact_h2', 'clm_cc_credential_h2', 'clm_cc_group_h2', 'clm_cc_user_h2', 'clm_cctl_claimstate_h2', 'clm_cctl_grouptype_h2', 'clm_cctl_lobcode_h2'] Table 'clm_cc_claim_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cc_contact_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cc_credential_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cc_group_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cc_user_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cctl_claimstate_h2' moved successfully from 'default' to 'gw_claims_raw_db'. Table 'clm_cctl_grouptype_h2' moved successfully from 'default' to 'gw_claims_raw_db'. 

Table 'clm_cctl_lobcode_h2' moved successfully from 'default' to 'gw_claims_raw_db'.



These above Pyspark published in the below link for reusability purpose

GW Tables Created and Moved to Guidewire Raw Database v1.0 Published 

it will be look like this below 


This is done with Guidewire Self Managed version 

from H2 to CSV tables

from CSV to delta tables in databricks notebook 

Finally


Now create new notebook and then exploring guidewire claims raw tables via simple SQL Queries

In Next Blog Post,

will see more Databricks SQL queries for Guidewire Claims Raw Tables and Queries


Additional Notes: It is Guidewire Self-managed version and In Guidewire Cloud we have s3 bucket parquet files and other options like OSS to convert Parquet to csv files !

For enterprise firm versions on Guidewire Cloud it has capabilities like Data Engineer team to load these type csv to delta tables it will be automated where as here we are doing manual process to create csv files

 Data science team able to work on ML experiments like databricks environment




 



Comments

Popular posts from this blog

"How to maintain or retain tabs in same tab after button click events or postback?" using JQuery in ASP.NET C#

In this post I'll share an details about " How to maintain or retain tabs in same tab after button click events or postback? " Step 1: you need to download Jquery and JQueryUI Javascript libraries from this site http://jqueryui.com/ Step 2: As usually you can create ASP.NET website from Visual Studio IDE and add Jquery and JqueryUI plugins in the header section of aspx page. Step 3: Add HiddenField control inside aspx page which is very useful to retain tab in same page Step 4: Use the HiddenField ID in Jquery code to indicate that CurrentTab Index Step 5: In code Behind, using Enumerations concept give the tab index values as user defined variable  Step 6: Use the Enum values in every Button click events on different tabs to check that tab could be retained in the same tab Further, Here I'll give the code details and snap shot pictures, 1. Default.aspx: Design Page First Second Third ...

Login and Registration forms in C# windows application with Back end Microsoft SQL Server for data access

In this article, I'm gonna share about how to make login and register form with MS SQL database; 1. Flow Chart Logic 2. Normal Features 3. Form Designs Login Form Design Sign in Form Design Password Retrieve Form 4. Database Design and SQL queries and Stored Procedure Create new Database as "schooldata" create table registerdata (  ID int identity,  Username nvarchar(100),  Password nvarchar(100),  Fullname  nvarchar(100),  MobileNO nvarchar(100),  EmailID nvarchar(100)  ) select * from registerdata create procedure regis (  @Username as nvarchar(100),  @Password as nvarchar(100),  @Fullname as nvarchar(100),  @MobileNO as nvarchar(100),  @EmailID as nvarchar(100)  ) as begin insert into registerdata (Username, Password, Fullname, MobileNO,EmailID) values (@Username, @Password, @Fullname, @MobileNO, @EmailID) ...

Guidewire Related Interview Question and answers part 1

common Guidewire questions and answers 20 Guidewire BC Q&A Top 100 Guidewire Interview FAQ Guidewire Claimcenter 20 Interview Questions Guidewire Rating concepts