DATA ANALYTICS

Property and Casualty Insurance Industry

 

Identifies drivers of insurance costs, evaluates the economics of customer behavior, and estimates expected losses for segments of risks using advanced statistical and analytical techniques on large data sets.


DESCRIPTION OF PROJECT

PROJECT:  "Comparing adjusters’ performance in the property and casualty insurance industry using a parametric and nonparametric analysis".
The purpose of this project is to compare adjusters’ performance, that is, how much the adjusters settle claims for on average. More precisely, we test whether we can detect statistically significant differences in severities between adjusters. Claim adjusters investigate claims in order to evaluate the extent of the company’s liability. We assume that claims are randomly assigned to adjusters; an experienced adjuster who is assigned more difficult claims may appear to be a poor performer in a test of this type even if his or her performance is above average. The test was limited to bodily injury (BI) adjusters, who handle only BI claims. To reduce heterogeneity in claim assignments, we analyzed the data separately for each geographical area.

We used two methods to evaluate the differences in severities between adjusters. The first is parametric: after testing competing distributions, we assumed that the severity data are lognormally distributed, the lognormal being the closest to the true distribution of our data, and then log-transformed the data to normality in order to apply the two-sample test for a difference in means. The second is nonparametric: no distribution is assumed, and we used the Wilcoxon rank-sum test, the nonparametric equivalent of the two-sample t-test. A limitation of this analysis may lie in the fact that the claims may not be randomly assigned.
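For illustration, here is a minimal sketch of both tests in Python (SciPy), on hypothetical severity data for two BI adjusters in the same geographical area:

    import numpy as np
    from scipy import stats

    # Hypothetical claim severities (settlement amounts) for two adjusters
    rng = np.random.default_rng(42)
    adjuster_a = rng.lognormal(mean=9.0, sigma=1.2, size=150)
    adjuster_b = rng.lognormal(mean=9.2, sigma=1.2, size=140)

    # Parametric method: severities assumed lognormal, so log-severities
    # are normal and a two-sample t-test for a difference in means applies
    t_stat, t_pval = stats.ttest_ind(np.log(adjuster_a), np.log(adjuster_b))

    # Nonparametric method: Wilcoxon rank-sum test, no distribution assumed
    w_stat, w_pval = stats.ranksums(adjuster_a, adjuster_b)

    print(f"t-test on log severities: t = {t_stat:.2f}, p = {t_pval:.4f}")
    print(f"Wilcoxon rank-sum:        z = {w_stat:.2f}, p = {w_pval:.4f}")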

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008


DESCRIPTION OF PROJECT

PROJECT: "Incurred but Not Reported (IBNR) ALAE Estimation: Actuarial Model versus Econometric Model Using Dynamic Panel Data Analysis".

The purpose of this project is to compare IBNR estimates produced by the actuarial method and the econometric method. IBNR refers to claims that have been incurred but whose total cost is not yet reported; ALAE stands for allocated loss adjustment expenses. The data are stored in a triangle structure. The actuarial method uses a set of loss development factors based on weighted average losses, with more weight given to recent accident years; a factor is computed for each quarter (see the sketch after the findings below). The econometric method estimates the IBNR ALAE of BIL (bodily injury) coverage using past values of ALAE and the age of the accident. We use dynamic panel data (multilevel) analysis to build the model. We find that:

  • The econometric method provides a better estimate of paid ALAE than the actuarial method.
  • The IBNR-BIL estimated using the econometric method is lower than the IBNR-BIL estimated using the actuarial method.
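For the actuarial side, here is a minimal sketch of the development-factor calculation on a hypothetical cumulative ALAE triangle; the production model’s quarterly factors and extra weight on recent accident years are simplified to an all-period, volume-weighted average:

    import numpy as np

    # Hypothetical cumulative paid ALAE triangle: rows = accident periods,
    # columns = development ages; np.nan marks cells not yet observed
    triangle = np.array([
        [100.0, 180.0, 220.0, 240.0],
        [110.0, 200.0, 250.0, np.nan],
        [120.0, 215.0, np.nan, np.nan],
        [130.0, np.nan, np.nan, np.nan],
    ])

    n = triangle.shape[1]
    factors = []
    for j in range(n - 1):
        seen = ~np.isnan(triangle[:, j + 1])
        # Volume-weighted age-to-age loss development factor
        factors.append(triangle[seen, j + 1].sum() / triangle[seen, j].sum())

    # Project each accident period to ultimate, then back out IBNR
    ultimate = triangle.copy()
    for j in range(n - 1):
        missing = np.isnan(ultimate[:, j + 1])
        ultimate[missing, j + 1] = ultimate[missing, j] * factors[j]
    latest = np.array([row[~np.isnan(row)][-1] for row in triangle])
    print("age-to-age factors:", np.round(factors, 3))
    print("IBNR by accident period:", np.round(ultimate[:, -1] - latest, 1))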

 

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008


DESCRIPTION OF PROJECT

PROJECT: "Modeling Lifetime value of a Policy Holder using duration (survival) Analysis: An application to the Property and Casualty Insurance Industry."
Casualty and property Insurance firms are increasingly interesting in viewing policy holders in terms of their lifetime value which is the net present value of policy holders’ calculated profit over a certain number of months or years. Four elements are needed in order to estimate the lifetime value: the monthly margin or profit, the duration, the discounted rate and a series of policy holder survival probabilities. This project focuses on the computation of the series of probabilities using a parametric model (Weibull) and a nonparametric model (Cox model). The output also allows us to see how the characteristics ( credit score, location, age, sdip, etc….) of the policy holder affect the retention.
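Once the survival probabilities are in hand, the lifetime value itself is a discounted sum of the monthly margin weighted by those probabilities. A minimal sketch, with a hypothetical margin, discount rate, and survival curve:

    import numpy as np

    def lifetime_value(margin, monthly_rate, survival):
        """Net present value of a policyholder: the monthly margin, weighted
        by the probability the policy is still in force and discounted."""
        t = np.arange(1, len(survival) + 1)
        return np.sum(margin * survival / (1 + monthly_rate) ** t)

    # Hypothetical inputs: $30 monthly margin, 0.5% monthly discount rate,
    # and a stand-in for the survival curve produced by the duration model
    surv = 0.98 ** np.arange(1, 37)
    print(round(lifetime_value(30.0, 0.005, surv), 2))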

Data structure (model data and validation data)
We divide the data into two datasets: the model dataset and the validation dataset. The model dataset contains 90% of the data, while the validation dataset contains the remaining 10%.
Estimation using a parametric method (Weibull model)
We assume that the survival times follow a Weibull distribution. This allows us to estimate the survival and hazard functions as functions of the duration and the characteristics of the policyholder.
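A minimal sketch of the Weibull fit using the lifelines library; the columns and values are hypothetical stand-ins for the policyholder data:

    import pandas as pd
    from lifelines import WeibullAFTFitter

    # Hypothetical retention data: months in force, whether the policy
    # lapsed (1) or is censored (0), and two policyholder characteristics
    df = pd.DataFrame({
        "months":       [12, 36, 7, 24, 60, 18, 5, 48],
        "lapsed":       [1, 0, 1, 1, 0, 1, 1, 0],
        "credit_score": [620, 740, 580, 690, 760, 640, 600, 720],
        "sdip_points":  [2, 0, 4, 1, 0, 3, 5, 0],
    })

    aft = WeibullAFTFitter()
    aft.fit(df, duration_col="months", event_col="lapsed")
    aft.print_summary()

    # Survival probabilities by month for each policyholder profile:
    # the series needed for the lifetime-value calculation above
    surv = aft.predict_survival_function(df)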
Estimation using a semiparametric method (Cox model)
The Cox model assumes no distribution for the survival time. We do assume proportional hazards, meaning that the ratio of the hazard functions is constant over time. The Cox model provides estimates of the hazard function.
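The Cox fit is analogous, again sketched with lifelines on the same hypothetical data:

    import pandas as pd
    from lifelines import CoxPHFitter

    df = pd.DataFrame({  # same hypothetical data as in the Weibull sketch
        "months":       [12, 36, 7, 24, 60, 18, 5, 48],
        "lapsed":       [1, 0, 1, 1, 0, 1, 1, 0],
        "credit_score": [620, 740, 580, 690, 760, 640, 600, 720],
        "sdip_points":  [2, 0, 4, 1, 0, 3, 5, 0],
    })

    # The Cox model leaves the baseline hazard unspecified and estimates
    # hazard ratios for the covariates (the exp(coef) column)
    cph = CoxPHFitter()
    cph.fit(df, duration_col="months", event_col="lapsed")
    cph.print_summary()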
Model Validation (Hosmer and Lemeshow goodness of fit for survival model)
We apply the Cox model to the validation dataset and assess it with the Hosmer and Lemeshow goodness-of-fit test. The test is based on partitions of the survival probability space: we define groups from the estimated survival probabilities at a certain point in time, the choice of which is arbitrary. The Z score is obtained by dividing the difference between the observed and expected counts by the square root of the expected count. At the 1% level of significance, none of the deciles shows a significant difference between the observed and expected counts, so there is agreement between the observed and expected number of events within each of the 10 deciles at risk.
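A minimal sketch of the decile comparison, assuming predicted survival probabilities at the chosen time point and observed event indicators for the validation set:

    import numpy as np
    import pandas as pd

    def hl_deciles(surv_prob, observed_event):
        """Partition the validation set into deciles of predicted event
        probability and compare observed vs. expected event counts,
        z = (obs - exp) / sqrt(exp)."""
        df = pd.DataFrame({"p_event": 1.0 - np.asarray(surv_prob),
                           "y": observed_event})
        df["decile"] = pd.qcut(df["p_event"], 10, labels=False,
                               duplicates="drop")
        out = df.groupby("decile").agg(observed=("y", "sum"),
                                       expected=("p_event", "sum"))
        out["z"] = (out["observed"] - out["expected"]) / np.sqrt(out["expected"])
        return out  # |z| > 2.58 would flag a decile at the 1% level

    # Hypothetical validation data consistent with the fitted probabilities
    rng = np.random.default_rng(0)
    surv = rng.uniform(0.4, 0.95, size=1000)
    events = rng.binomial(1, 1.0 - surv)
    print(hl_deciles(surv, events))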
Test for Proportionality Assumption
To test the proportional hazards assumption, we insert an interaction of each variable with the log of survival time into the model and re-run it. The test rejects the proportional hazards assumption, suggesting that the hazard may not be proportional in some of the covariates. We therefore include in the model the interactions with significant coefficients and compare it to the Cox model without interactions. The model with interactions exhibits a lower AIC (Akaike information criterion) and a lower BIC (Bayesian information criterion) than the model without interactions, which means the model with interactions is the better one.
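lifelines also offers a built-in proportional-hazards diagnostic; note that it is based on scaled Schoenfeld residuals, a standard alternative to the log-time interaction approach described above. A sketch, continuing the hypothetical Cox example:

    # cph and df as fitted in the Cox sketch above
    cph.check_assumptions(df, p_value_threshold=0.01)

    # Candidate specifications are then compared by partial-likelihood AIC;
    # the lower value wins, as with the interaction model in the text
    print("AIC (partial):", cph.AIC_partial_)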

 

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008


DESCRIPTION OF PROJECT

PROJECT: "Estimation of claims severity in the property and casualty insurance industry: Empirical analysis using generalized linear model (GLM-Gamma Distribution)."
The purpose of this project is to estimate the expected loss of a claim given the characteristics of the policyholder. The estimation is conditional on the policyholder having a claim, so we have fewer observations for severity than for claim frequency. Two distributions compete for modeling severity: the gamma and the lognormal. The results of this estimation are combined with the claim frequency model to produce the expected loss of a policyholder.
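A minimal sketch of the gamma severity GLM in Python (statsmodels), with a log link and hypothetical claim-level data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical claim-level data: severity is observed only for policies
    # that actually had a claim
    rng = np.random.default_rng(1)
    n = 500
    df = pd.DataFrame({
        "credit_score": rng.uniform(500, 800, n),
        "driver_age":   rng.integers(18, 80, n),
    })
    mu = np.exp(9.5 - 0.002 * df["credit_score"] - 0.005 * df["driver_age"])
    df["severity"] = rng.gamma(shape=2.0, scale=mu / 2.0)

    # Gamma GLM with log link: E[severity | x] = exp(x'beta)
    res = smf.glm("severity ~ credit_score + driver_age", data=df,
                  family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(res.summary())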

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008


DESCRIPTION OF PROJECT

PROJECT: "Estimation of claims frequency in the property and casualty insurance industry: Empirical analysis using generalized linear model (GLM-Poisson model-Negative Binomial-Zero inflated Poisson model)".
The purpose of this project is to compute the likelihood of a policy holder to have a claim given his characteristics (credit score, sex, age of the primary driver, number of driver, number of vehicles, location of the vehicle, sdip etc…..). The results of this project, combined with the claim severity project 2 are used to compute the pure premium or expected loss generated by a policy holder. The number of claims can be model as a count model. The natural stochastic model for counts is a Poisson point process for the occurrence of the event of interest. This implies the use of a Poisson distribution for the number of claims or frequency of accident. We perform the estimation using a Poisson model. We found for instance that, drivers with low credit score have a high likelihood of having a claim. Older drivers tend to have low claim frequency while policy holders living in poor neighborhood tend to have high frequency claims.
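A minimal sketch of the Poisson frequency regression (statsmodels), with hypothetical policy-level data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical policy-level data: claim counts and two characteristics
    rng = np.random.default_rng(2)
    n = 2000
    df = pd.DataFrame({
        "credit_score": rng.uniform(450, 820, n),
        "driver_age":   rng.integers(18, 85, n),
    })
    lam = np.exp(1.0 - 0.004 * df["credit_score"] - 0.01 * df["driver_age"])
    df["claims"] = rng.poisson(lam)

    # Poisson regression: log E[claims | x] = x'beta
    res = smf.poisson("claims ~ credit_score + driver_age", data=df).fit()
    print(res.summary())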

Limitations of the Poisson model (Excess Zeros and Overdispersion)
We take the analysis further by investigating the limitations of the Poisson model. The first is the excess zeros found in real data; the second is overdispersion.

In many applications, the Poisson density predicts the probability of a zero count to be considerably less than is actually observed in the sample. This is termed the excess zeros problem: there are more zeros in the data than the Poisson predicts. Another deficiency of the Poisson model is that for count data the variance usually exceeds the mean (Var(y|x) > E(y|x)), a feature called overdispersion; the Poisson model instead implies equality of the variance and the mean, a property called equidispersion. To overcome these limitations, we develop three additional models: the negative binomial model, the zero-inflated Poisson (ZIP) model, and the hurdle or two-part model. The ZIP is a modified count model that supplements a count density f2(.) with a binary process with density f1(.). The hurdle or two-part model uses two data-generating processes: the first governs the zeros (no claims) and the second governs the positive counts (one or more claims). Its main drawback is that it has twice as many coefficients as the ZIP or Poisson model.
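A sketch of the negative binomial and zero-inflated Poisson fits on hypothetical data with excess zeros (statsmodels; the hurdle model has no built-in statsmodels implementation, so it is omitted here):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical design matrix (constant + credit score) and claim counts
    # contaminated with extra structural zeros
    rng = np.random.default_rng(3)
    n = 2000
    X = sm.add_constant(rng.uniform(450, 820, (n, 1)))
    lam = np.exp(2.0 - 0.005 * X[:, 1])
    structural_zero = rng.uniform(size=n) < 0.4
    y = np.where(structural_zero, 0, rng.poisson(lam))

    # Negative binomial: adds a dispersion parameter for overdispersion
    nb_res = sm.NegativeBinomial(y, X).fit(disp=False)

    # Zero-inflated Poisson: a logit zero process (f1) supplements the
    # Poisson count density (f2)
    zip_res = sm.ZeroInflatedPoisson(y, X, exog_infl=X).fit(disp=False)

    print("NB llf:", nb_res.llf, " ZIP llf:", zip_res.llf)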

Comparison of the Poisson, zero-inflated Poisson, and hurdle models using the Vuong test
We used Vuong’s non-nested test to compare the Poisson, zero-inflated Poisson, and hurdle models. We find that the zero-inflated Poisson (ZIP) outperforms the Poisson model, which confirms the excess zeros problem: there are more zeros in the data than the Poisson predicts. We also found that the ZIP outperforms the hurdle model, which in turn outperforms the Poisson model, and that the ZIP, Poisson, and hurdle models all outperform the negative binomial model. The ranking is therefore ZIP model > hurdle model > Poisson model > negative binomial model, so the ZIP appears to be the best model for claim frequency. Its limitation may lie in its mathematical complexity.
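statsmodels has no built-in Vuong test; here is a minimal sketch of the statistic, continuing the previous sketch (a positive z favors the first model):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    def vuong(res1, res2):
        """Vuong non-nested test from per-observation log-likelihoods:
        positive z favors model 1, negative z favors model 2."""
        m = (res1.model.loglikeobs(res1.params)
             - res2.model.loglikeobs(res2.params))
        z = np.sqrt(len(m)) * m.mean() / m.std(ddof=1)
        return z, 2 * stats.norm.sf(abs(z))

    # Compare the ZIP fit against a plain Poisson fit on the same y and X
    poisson_res = sm.Poisson(y, X).fit(disp=False)
    z, p = vuong(zip_res, poisson_res)
    print(f"Vuong z = {z:.2f}, p = {p:.4f}")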

 

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008


DESCRIPTION OF PROJECT

PROJECT: "Growth model, Flat rate cancellation and Midterm rate cancellation model estimation."
The purpose of this project is to estimate the growth rate, flat cancellation and midterm cancellation of agents. An insurance agent is an intermediate between the drivers and the insurance firm. The dataset used in this analysis is composed of a survey on agents’ characteristics and three dependent variables (growth, flat cancellation and midterm cancellation). We assumed that these three dependent variables follow a log normal distribution. The log normal seems the closest to the nonparametric distribution (kernel).
The dependent variable growth is constructed by dividing new business by the prior year’s total policies. A flat cancellation occurs when the policyholder does not renew the policy at its expiration date; the flat cancellation rate is obtained by dividing total flat cancellations by the prior year’s total business.
A midterm cancellation occurs when the policyholder cancels the policy before the expiration date; the midterm cancellation rate is computed by dividing total midterm cancellations by total expired policies.
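A minimal sketch of the three rate constructions, with hypothetical per-agent counts:

    # Hypothetical counts for a single agent
    new_business = 180
    prior_year_policies = 1500
    flat_cancellations = 120
    midterm_cancellations = 45
    expired_policies = 1400

    growth_rate = new_business / prior_year_policies         # 0.12
    flat_rate = flat_cancellations / prior_year_policies     # 0.08
    midterm_rate = midterm_cancellations / expired_policies  # ~0.032

    print(growth_rate, flat_rate, round(midterm_rate, 3))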
The results of these models were used to compute the expected growth rate, flat cancellation rate, and midterm cancellation rate of newly appointed agents.

PRINCIPALS: Torna Omar Soro, PhD, Research Economist (Predictive Modeler)

COUNTRY: Massachusetts, USA

SPONSORS/COMPANY: Plymouth Rock Assurance Corporation, Inc. (Boston, MA, USA), www.prac.com

DATES: 7/2006–12/2008