Risk prediction in life insurance industry using supervised learning algorithms
Presented By
Bharat Sharman, Dylan Li, Leonie Lu, Mingdao Li
Introduction
Risk assessment lies at the core of the life insurance industry. A life insurance company must assess the risk of each application accurately so that applications with genuinely low risk are accepted and those with genuinely high risk are rejected. Otherwise, individuals with an unacceptably high risk profile will be issued policies, and when they pass away the company will face large losses from high insurance payouts. This situation is called ‘adverse selection’: individuals who are most likely to suffer losses take out insurance, those who are least likely to suffer losses do not, and the company suffers losses as a result.
Traditionally, underwriting (deciding whether or not to insure the life of an individual) has been done using actuarial calculations. Actuaries group customers according to their estimated levels of risk, determined from historical data (Cummins J, 2013). However, these conventional techniques are time-consuming, and it is not uncommon for a policy to take a month to issue. They are also expensive, because many manual processes need to be executed.
Predictive analytics has emerged as a useful technique for streamlining the underwriting process, reducing the time to policy issuance and improving the accuracy of risk prediction. In this paper, the authors use data from the Prudential Life Insurance company to investigate the most appropriate data extraction method and the most appropriate algorithm for assessing risk.
Literature Review
Before a life insurance company issues a policy, it must execute a series of underwriting-related tasks (Mishr, 2016). These tasks involve gathering extensive information about the applicant. The insurer has to analyze the employment, medical, family and insurance histories of the applicant and factor all of them into a complicated series of calculations to determine the applicant's risk rating. Premiums are then calculated on the basis of this risk rating (Prince, 2016).
In a competitive marketplace, customers need policies to be issued quickly, and long wait times can lead them to switch to other providers (Chen, 2016). In addition, data gathering and analysis can be expensive. The insurance company bears the cost of the medical examinations, and if a policy lapses, the insurer has to absorb all of these costs as losses (J Carson, 2017). If the underwriting process uses predictive analytics, the cost and time associated with many of these steps can be reduced through streamlining.
Methods and Techniques
Figure 1 depicts the process flow of the analytics approach. Its stages are described in the following sections.
Description of the Dataset
The data comes from the Kaggle competition hosted by the Prudential Life Insurance company. It contains 59,381 applications with 128 attributes, which include continuous, discrete and categorical variables. Table 1 below lists the attributes, their types and their descriptions.
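As a rough illustration of how the mix of attribute types described above could be checked, the following Python sketch classifies each column of a CSV file as categorical, discrete or continuous. It is not the authors' procedure. The column names and values in `SAMPLE_CSV` are hypothetical stand-ins rather than rows from the Prudential data, and the classification rule is a simple assumed heuristic: non-numeric values are treated as categorical, all-integer columns as discrete, and other numeric columns as continuous. Categorical attributes that are stored as integer codes would be labeled discrete by this rule, so any real summary would still need the attribute descriptions.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: column names and values are illustrative only,
# not taken from the Prudential dataset.
SAMPLE_CSV = """\
app_id,product_code,age_normalized,medical_history_count
1,D3,0.64,4
2,A1,0.06,
3,E1,0.03,10
"""


def infer_kind(values):
    """Classify one column using an assumed heuristic (not the paper's rule)."""
    present = [v for v in values if v != ""]  # ignore missing cells
    if not present:
        return "empty"
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return "categorical"  # at least one value is not numeric
    if all(n.is_integer() for n in numbers):
        return "discrete"  # every observed value is a whole number
    return "continuous"


def summarize_columns(csv_text):
    """Return a mapping from column name to inferred kind."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_kind([row[name] for row in rows])
        for name in reader.fieldnames
    }


kinds = summarize_columns(SAMPLE_CSV)
kind_counts = Counter(kinds.values())

for name, kind in kinds.items():
    print(f"{name}: {kind}")
print(dict(kind_counts))
```

To apply the same idea to the competition data, the text of the downloaded training file could be passed to `summarize_columns` in place of `SAMPLE_CSV`, and the resulting counts compared against the attribute types listed in Table 1.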