Deep learning: using long short-term memory algorithm in forecast insurance charges | Article in the proceedings of an international scientific conference


Authors: Нгуен Х. К., Тран Т. Т.

Section: 1. Economic theory

Published in

VI International Scientific Conference «Инновационная экономика» (Innovative Economy) (Kazan, June 2019)

Publication date: 04.06.2019


Bibliographic description:

Нгуен Х. К., Тран Т. Т. Deep learning: using long short-term memory algorithm in forecast insurance charges [Text] // Инновационная экономика: proceedings of the VI International Scientific Conference (Kazan, June 2019). Kazan: Молодой ученый, 2019. P. 1-9. URL: https://moluch.ru/conf/econ/archive/335/15120/ (accessed: 19.08.2019).



Forecasting insurance charges for new customers has always been a difficult task for insurance companies because of limited information and computational complexity. Today, advances in computer science and artificial intelligence, and Deep Learning in particular, offer solutions that were not available before. With many neural units and layers, a Deep Learning model can learn complex patterns on its own, so it can handle the forecasting of insurance charges to be paid to customers effectively. This article introduces the Long short-term memory (LSTM) algorithm in Deep Learning and uses it to forecast insurance charges on a set of actual data, so that readers can understand the algorithm and apply it to their own research or business problems.

Keywords: deep learning, forecast, classification, insurance charges, LSTM.

Life insurance is a promising and attractive market, but it is also very challenging and competitive for all insurance businesses. For insurance companies to operate effectively, calculating the correct insurance premium is extremely important. However, premiums are very difficult to determine, because the calculation must rest on a sound scientific basis for pricing products while protecting customers' benefits and ensuring the profitability of the insurer. At the same time, companies have to make sure they can pay out on insurance policies and maintain reserves for other activities. Accurately predicting the insurance charges to be paid to policyholders is therefore the most important input for determining insurance premiums.

Today, advances in computer science and artificial intelligence, and Deep Learning in particular, offer solutions that were not available before. Deep Learning tries to imitate the biological brain, so that computers can not only process information in a way similar to the human brain but also work on big data. With many neural units and layers, a Deep Learning model can learn complex patterns on its own, so it can handle the forecasting of insurance charges to be paid to customers effectively. Using Deep Learning makes forecasting insurance charges faster and easier, saving insurers time, manpower and cost. This article studies the use of the Long short-term memory (LSTM) algorithm, a type of Recurrent Neural Network, to forecast the insurance charges to be paid to customers, using the Python programming language and the libraries NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow and Keras, on a real data-processing example.

Long short-term memory algorithm and the data set

Long short-term memory (LSTM)

Long short-term memory (LSTM) is a Recurrent Neural Network (RNN) architecture used in Deep Learning. Unlike standard feedforward neural networks, LSTM has feedback connections that make it a «general purpose computer» (wikipedia.org, 2019). LSTM can learn tasks that require memory of events that happened thousands or even millions of discrete time steps earlier. LSTM can process not only single data points (such as images) but also entire sequences of data (such as speech or video) (Sepp Hochreiter & Jürgen Schmidhuber, 1997).

A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate:

Fig. 1. Components of a common LSTM unit: x_t: input vector to the LSTM unit; F_t: forget gate's activation vector (sigmoid); I_t: input gate's activation vector (sigmoid); O_t: output gate's activation vector (sigmoid); h_t: hidden state vector, also known as the output vector of the LSTM unit; c_t: cell state vector [Figure 1: An LSTM unit with forget gate. Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
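The gate structure in Fig. 1 can be written out as a single forward step. The following is a minimal NumPy sketch, assuming the standard LSTM formulation with a forget gate. The function name `lstm_step`, the dictionary layout of the weights and the random toy weights are illustrative assumptions and do not come from the article.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM unit with a forget gate.

    W, U and b are dicts keyed by "f", "i", "o", "c" (hypothetical layout):
    W[k] acts on the input, U[k] on the previous hidden state, b[k] is a bias.
    """
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate F_t
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate I_t
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate O_t
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat  # new cell state c_t
    h_t = o_t * np.tanh(c_t)          # new hidden state (output) h_t
    return h_t, c_t


# Toy usage: 6 input features, 4 hidden units, random weights (assumed sizes).
rng = np.random.default_rng(0)
n_in, n_hid = 6, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fioc"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```

The forget gate scales the previous cell state and the input gate scales the new candidate, which is what lets the cell state carry information across many time steps.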

The Data set

The «Sample Insurance Claim Prediction Dataset» is based on the «Medical Cost Personal Datasets». It contains information about policyholders, from which the insurance company calculates the insurance charges. The goal is to forecast the insurance charges to be paid to a new policyholder from the information provided to the insurance company, and from that to calculate the premium.

The information fields include:

age: age of policyholder

sex: gender of policyholder (female=0, male=1)

bmi: body mass index

children: number of children / dependents of policyholder

smoker: smoking status of policyholder (non-smoker=0; smoker=1)

region: the residential area of policyholder in the US (Northeast, Northwest, Southeast, Southwest)

charges: individual medical costs billed by health insurance (the data set also contains an insuranceclaim column)

Source: https://www.kaggle.com/easonlai/sample-insurance-claim-prediction-dataset
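Before records like these can be fed to a neural network, the categorical `region` field has to be turned into numbers, and the numeric fields are usually rescaled. The article does not show its preprocessing, so the following is only a sketch under assumptions: one-hot encoding for `region`, min-max scaling to [0, 1] (suggested, but not confirmed, by the small error values the article reports), and four made-up rows standing in for the real file.

```python
import pandas as pd

# Four made-up rows in the layout described above (not real records).
df = pd.DataFrame({
    "age": [19, 33, 46, 61],
    "sex": [0, 1, 0, 1],
    "bmi": [27.9, 22.7, 33.4, 29.1],
    "children": [0, 1, 2, 3],
    "smoker": [1, 0, 0, 1],
    "region": ["Southwest", "Northwest", "Southeast", "Northeast"],
    "charges": [16000.0, 4500.0, 8200.0, 29000.0],
})

# One-hot encode the categorical region field.
df = pd.get_dummies(df, columns=["region"], dtype=float)

# Min-max scale every column, including the target, to the range [0, 1].
scaled = (df - df.min()) / (df.max() - df.min())

X = scaled.drop(columns="charges").to_numpy(dtype=float)  # features
y = scaled["charges"].to_numpy(dtype=float)               # target

# Keras recurrent layers expect (samples, time steps, features); here each
# policyholder is treated as a sequence of length 1 (an assumption).
X_seq = X.reshape(X.shape[0], 1, X.shape[1])
```

On the real file the same steps would start from `pd.read_csv(...)` instead of the inline table. The scaling minima and maxima should be taken from the training split only, so that the test set stays independent.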

Building forecasting model

The forecasting model was built in a Jupyter notebook.

The network has 5 layers, 577 neural units, and 845,151 parameters.
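A parameter total like the one above follows directly from the layer sizes. As an illustration, the sketch below uses the standard counting formulas for LSTM and fully connected layers. The helper names and the two-layer example are hypothetical and are not meant to reproduce the article's architecture, which is not listed in the text.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer.

    Four weight blocks (forget, input and output gates plus the cell
    candidate), each with input weights, recurrent weights and a bias vector.
    """
    return 4 * (units * input_dim + units * units + units)


def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with bias."""
    return units * input_dim + units


# Hypothetical example: 9 input features -> LSTM with 64 units -> Dense with 1 unit.
total = lstm_params(9, 64) + dense_params(64, 1)
print(total)  # 19009
```

Because the recurrent weights grow with the square of the number of units, a few wide LSTM layers are enough to reach several hundred thousand parameters.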

After training, the model has loss = 0.12 and mean absolute error = 0.0438.

Discussion

To assess how accurate the forecasting model is in practice, it has to be checked on an independent test data set:

On the test data set, the model's results change little compared with the training data set: loss = 0.0050; mean absolute error = 0.0451.

The test R² score of 0.8607 shows that the model explains 86.07 % of the variation in the insurance charges variable. Thus, model_LSTM is a good forecasting model for insurance charges, explaining more than 80 % of the variation in the dependent variable.
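For reference, the three quantities reported here can be computed as follows. This is a minimal NumPy sketch. It assumes that the reported loss is the mean squared error, which the article does not state, and the arrays are toy values, not the model's actual predictions.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values on a [0, 1] scale (not the article's data).
y_true = np.array([0.10, 0.40, 0.35, 0.80])
y_pred = np.array([0.15, 0.35, 0.40, 0.70])

print(mse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

Read this way, an R² of 0.8607 means that the residual sum of squares is about 13.93 % of the total sum of squares of the test charges.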

Conclusion

With only six basic pieces of customer information, the model explains more than 80 % of the variation in insurance charges, which shows that Deep Learning has considerable potential in setting insurance premiums and in business more generally. As science and technology develop and computers become more powerful, Deep Learning models keep improving in accuracy and efficiency. This article introduced the use of the LSTM algorithm in Deep Learning to forecast insurance charges; Deep Learning has many other applications, which the authors will introduce in subsequent articles.

References:

  1. Alex Smola, & S. V. N. Vishwanathan. (2008). Introduction to Machine Learning. Cambridge University Press.
  2. Edouard Duchesnay, & Tommy Löfstedt. (2018). Statistics and Machine Learning in Python, Release 0.2. ftp://ftp.cea.fr.
  3. Frank Rosenblatt. (1961). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan Books.
  4. Guido van Rossum et al. (2018). The Python Language Reference, Release 3.7.1. Python Software Foundation.
  5. Jürgen Schmidhuber. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
  6. NumPy community. (2018). NumPy Reference, Release 1.15.4. SciPy.org.
  7. Keras. (2019). Keras Documentation. keras.io.
  8. Scikit-learn developers. (2018). scikit-learn user guide, Release 0.20.1. scikit-learn.org.
  9. Sepp Hochreiter, & Jürgen Schmidhuber. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
  10. Shai Shalev-Shwartz, & Shai Ben-David. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
  11. Shashi Sathyanarayana. (2014). A Gentle Introduction to Backpropagation. Numeric Insight, Inc Whitepaper.
  12. Wes McKinney et al. (2018). pandas: powerful Python data analysis toolkit, Release 0.23.4. PyData Development Team.
  13. Wikipedia. (2019). Long short-term memory. wikipedia.org/wiki/Long_short-term_memory
