Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Degree

Master of Science

Program

Applied Mathematics

Supervisor

Zitikis, Ricardas

2nd Supervisor

Zou, Xingfu



Abstract

The emergence of peer-to-peer (P2P) lending has opened up a popular channel for microfinance, and the financial lending industry in many countries is growing rapidly. Although P2P lending makes it easier for individuals and for small and medium-sized enterprises to borrow, platforms must improve their ability to identify risk if they are to develop sustainably. Of particular concern is the credit risk caused by information asymmetry between borrowers and lenders, which can be fatal to the industry. To mitigate this problem, this thesis takes real loan data from Lending Club as the object of empirical study. A random forest is used to rank and screen features by importance, and a backpropagation (BP) neural network is used to build a credit risk classification model. With this model, loan applicants can be classified as default or non-default before a loan is granted. The results show that the credit risk model is effective in predicting whether a borrower will default.
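As a rough illustration of the classification step described above, the sketch below trains a one-hidden-layer backpropagation network by full-batch gradient descent on the binary cross-entropy loss and labels an applicant as default (1) or non-default (0). It is a minimal sketch under stated assumptions, not the model built in this thesis: the layer size, learning rate, epoch count, decision threshold, the helper names (`train_bp_network`, `predict_default`, `log_loss`) and the two-feature toy data are all hypothetical, and in practice the inputs would be the features retained after random forest screening. Plain Python is used so that no library API has to be assumed.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(net, x):
    """Return the hidden activations and the predicted probability of default."""
    w1, b1, w2, b2 = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    p = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, p


def log_loss(net, X, y):
    """Mean binary cross-entropy of the network on (X, y)."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        _, p = forward(net, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train_bp_network(X, y, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent with backpropagated gradients.

    All hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gw2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in zip(X, y):
            hidden, p = forward((w1, b1, w2, b2), x)
            # dLoss/d(output pre-activation) for sigmoid output + cross-entropy.
            delta_out = p - t
            gb2 += delta_out
            for j in range(n_hidden):
                gw2[j] += delta_out * hidden[j]
                # Propagate the error back through hidden unit j's sigmoid.
                delta_h = delta_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                gb1[j] += delta_h
                for i in range(n_in):
                    gw1[j][i] += delta_h * x[i]
        # Update all weights with gradients averaged over the batch.
        for j in range(n_hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i] / n
        b2 -= lr * gb2 / n
    return w1, b1, w2, b2


def predict_default(net, x, threshold=0.5):
    """Classify one applicant: 1 = default, 0 = non-default."""
    return 1 if forward(net, x)[1] >= threshold else 0


# Hypothetical toy data: two features already scaled to [0, 1].
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.2],
         [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.85, 0.95]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

net = train_bp_network(X_toy, y_toy)
accuracy = sum(predict_default(net, x) == t for x, t in zip(X_toy, y_toy)) / len(y_toy)
print(f"training log-loss: {log_loss(net, X_toy, y_toy):.3f}, accuracy: {accuracy:.2f}")
```

Writing the gradients out by hand makes the backpropagation step explicit: the output error `p - t` is pushed back through each hidden unit by the sigmoid derivative `h * (1 - h)`. Training accuracy on toy data says nothing about out-of-sample performance; a real credit model would be evaluated on held-out loans.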

Summary for Lay Audience

Peer-to-peer (P2P) lending is an innovative form of finance conducted over the Internet. Its advantages include convenient transactions, accessibility to a wide range of people, and more efficient circulation of financial capital. Alongside traditional banks, P2P lending plays an important role in the lending industry. However, because many P2P platforms lack the ability to identify risk, some P2P borrowers frequently fail to pay on time or repay less than the full principal and interest. This type of risk is known as credit risk, and it arises primarily from information asymmetry between borrowers and lenders. For this form of microfinance to develop sustainably, each P2P platform needs a credit risk model suited to its own circumstances for evaluating loan applicants, so as to minimize losses and provide stable returns for lenders.

In this thesis, we take Lending Club, the largest P2P platform in the United States, as an example and analyze its real lending data from 2018 to 2019. First, missing data and outliers are each preprocessed in a suitable way to reduce their impact on model accuracy; next, a random forest is used to rank the features by importance and screen them; finally, the selected features are used to build a credit risk model with a backpropagation (BP) neural network. Using this model, loan applicants can be classified into two types: default and non-default. Applicants classified as default can then be rejected to lower credit risk. The empirical results show that the model is effective for the Lending Club case, and it may also serve as a useful reference for credit risk analysis on other P2P platforms.
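The preprocessing step above does not specify which treatments were applied to missing data and outliers. As a hypothetical illustration only, the sketch below fills missing values with the column median and caps outliers at Tukey's fences (1.5 times the interquartile range beyond the first and third quartiles). Both choices, the function names, and the sample income column are assumptions for illustration, not the procedure used in this thesis.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def preprocess_column(values, k=1.5):
    """Impute first so the quartiles are computed on a complete column, then cap."""
    return cap_outliers_iqr(impute_median(values), k=k)


# Hypothetical annual-income column with two missing entries and one extreme value.
income = [52000, None, 61000, 58000, 1_000_000, 49000, None, 55000]
cleaned = preprocess_column(income)
print(cleaned)
```

Capping rather than deleting outlying rows keeps every applicant in the sample, which matters when default cases are relatively rare; whether that trade-off suits a given data set is itself a modelling decision.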