Hybrid recommender system based on Apache Spark | Article in the journal «Молодой ученый»

Authors: Юнцзе Ши, С. В. Григорьев

Section: Information Technology

Published in Молодой учёный No. 24 (419), June 2022

Publication date: 17.06.2022

Bibliographic description:

Ши, Юнцзе. Hybrid recommender system based on Apache Spark / Юнцзе Ши, С. В. Григорьев. Text: print // Молодой ученый. 2022. No. 24 (419). P. 77-79. URL: https://moluch.ru/archive/419/93151/ (accessed: 17.12.2024).



An analysis of recommendation systems shows that traditional recommendation algorithms have several shortcomings, and that hybrid recommendation algorithms effectively alleviate these problems.

Keywords: hybrid, recommender, system, Spark, collaborative, filtering

Introduction

Recommendation algorithms are the core of recommendation systems. The most common ones are collaborative filtering, content-based, association rule-based, and model-based recommendation algorithms. Many of them, including the earliest and most classic collaborative filtering algorithm, have been applied in the business of large Internet companies. However, common problems such as data sparsity [1] and cold start [2] remain, so further research on these algorithms is needed to solve them.

With the exponential growth of data volume in the Internet era, traditional recommendation systems can no longer store huge amounts of data or perform high-speed computation on a single server. How to handle web-scale data sets [3] is therefore a problem that recommendation systems must reconsider. Distributed computing techniques that store massive amounts of data on multiple machines have emerged to improve system efficiency and to reduce the information load. In addition, parallelizing the models used further improves the computation speed of the recommender system.

Recommender Systems

Collaborative filtering-based recommendation algorithms

Collaborative filtering algorithms can be classified into user-based, item-based, and model-based algorithms. Most common recommendation systems use this type of algorithm [4]. The main idea is to compute similarities from users' historical information and to use them to make recommendations.
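The similarity calculation described above can be illustrated with a small user-based example. This is only a sketch under stated assumptions: the article does not say which similarity measure it uses, so cosine similarity is assumed here, and the toy ratings and the function names `cosine_similarity` and `predict` are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not data from the article.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob":   {"m1": 4.0, "m2": 3.0, "m4": 5.0},
    "carol": {"m2": 1.0, "m3": 2.0, "m4": 4.0},
}

def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors (missing items count as 0)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    # None signals that no similar user has rated the item.
    return num / den if den else None

print(predict("alice", "m4", ratings))
```

Because the prediction is a weighted average of the neighbours' ratings, it always lies between the lowest and highest rating those neighbours gave the item.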

Hybrid recommender systems

Hybrid recommendation combines multiple recommendation algorithms in order to make better recommendations. Every single recommendation algorithm has its defects; a hybrid can draw on the strengths of each algorithm and integrate them so that they compensate for one another's weaknesses, in the spirit of combining the strengths of a hundred schools of thought. When the combination is chosen well, the advantages can be considerable.
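One common way to combine algorithms is a weighted sum of their per-item scores. The sketch below shows that idea only; the article does not state how its hybrid merges results, so the weighting scheme, the function name `hybrid_scores`, and all scores and weights are assumptions made for illustration.

```python
def hybrid_scores(score_lists, weights):
    """Combine per-algorithm item scores with a weighted sum.

    score_lists: {algorithm_name: {item: score}}
    weights:     {algorithm_name: weight}
    An item missing from one algorithm simply gets no contribution from it.
    """
    combined = {}
    for name, scores in score_lists.items():
        w = weights.get(name, 0.0)
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical normalized scores from three recommenders.
scores = {
    "collaborative": {"m1": 0.9, "m2": 0.4},
    "content":       {"m2": 0.8, "m3": 0.7},
    "statistics":    {"m1": 0.5, "m3": 0.6},
}
weights = {"collaborative": 0.5, "content": 0.3, "statistics": 0.2}

ranking = hybrid_scores(scores, weights)
print(ranking)
```

A weighted sum lets an item that collaborative filtering cannot score (for example, a new item with no ratings) still be ranked through its content-based or statistical score.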

Design and Implementation of Hybrid Recommendation Algorithm

The hybrid recommendation algorithm is designed as follows. First, to address the cold start and data sparsity problems of traditional collaborative filtering, the offline part uses a statistics-based recommendation algorithm to alleviate data sparsity, a content-based recommendation algorithm to alleviate cold start, and an ALS-based matrix factorization model to further optimize the collaborative filtering algorithm. Second, a real recommendation system often cannot noticeably update its results after the latest rating or the last few ratings, and so cannot meet real-time or near-real-time response requirements; this research proposes a model-based real-time recommendation algorithm to address this.
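For reference, the regularized matrix factorization objective that ALS typically minimizes can be written as below. The notation is assumed here and does not come from the article: $r_{ui}$ is the rating of item $i$ by user $u$, $K$ is the set of observed ratings, $x_u$ and $y_i$ are the user and item factor vectors, and $\lambda$ is the regularization weight.

```latex
\min_{X, Y} \sum_{(u,i) \in K} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)

% With the item factors Y held fixed, each user vector has a closed-form update
% (K_u is the set of items rated by user u):
x_u = \left( \sum_{i \in K_u} y_i y_i^{\top} + \lambda I \right)^{-1}
      \sum_{i \in K_u} r_{ui} \, y_i
```

ALS alternates this update with the symmetric one for each $y_i$. Since every $x_u$ (or $y_i$) can be solved independently of the others, the updates parallelize naturally across a cluster.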

Design and implementation of offline recommendation algorithm

For the offline part of the recommendation system, this research mainly uses statistics-based, collaborative filtering-based, and content-based recommendation algorithms. The statistics-based algorithm uses Spark SQL to perform statistical analysis on the data used for offline and real-time recommendation, and prepares that data for them.
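The kind of aggregation a statistics-based recommender relies on can be sketched in plain Python. This is an illustration, not the article's Spark SQL code: the rows, the `min_count` threshold, and the function name `movie_statistics` are hypothetical, and the SQL in the docstring only indicates the roughly equivalent query.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, rating) rows; not data from the article.
rows = [
    (1, "m1", 5.0), (2, "m1", 4.0), (3, "m1", 3.0),
    (1, "m2", 2.0), (2, "m2", 4.0),
    (3, "m3", 5.0),
]

def movie_statistics(rows, min_count=2):
    """Per-movie rating count and mean, keeping movies with enough ratings.

    Roughly equivalent to:
        SELECT movie_id, COUNT(*), AVG(rating) FROM ratings
        GROUP BY movie_id HAVING COUNT(*) >= min_count
        ORDER BY AVG(rating) DESC
    """
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, sum]
    for _user, movie, rating in rows:
        totals[movie][0] += 1
        totals[movie][1] += rating
    stats = [(m, c, s / c) for m, (c, s) in totals.items() if c >= min_count]
    # Best average rating first.
    return sorted(stats, key=lambda t: t[2], reverse=True)

top = movie_statistics(rows)
print(top)
```

The minimum-count filter keeps a movie with a single high rating from outranking widely rated ones. Rankings like this need no per-user history, which is why they remain usable for new users.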

Experiments and Analysis

The comparative experiments use the preprocessed open-source MovieLens dataset in three sizes: 200k, 1M, and 10M. The dataset contains three kinds of files: the Movies, Ratings, and Tags datasets.

The first set of comparative experiments fixes the data set size at 10M and runs the commonly used recommendation algorithms and the hybrid recommendation algorithm designed in this research on both the Spark platform and a stand-alone system, in order to verify the advantages of the proposed hybrid algorithm. Figure 1 compares the execution efficiency of the different algorithms:

Fig. 1. Execution efficiency of different algorithms

The second set of comparative experiments runs the same hybrid recommendation algorithm on data sets of different sizes, on a single-machine system and on the Spark distributed platform, and compares the execution efficiency to verify the advantages of building the system on the Spark platform. Figure 2 shows the corresponding comparison:

Fig. 2. Advantages of the Spark platform

Comparing the execution efficiency in the two sets of experiments, the first set verifies that the hybrid recommendation algorithm proposed in this research executes more efficiently than the traditional recommendation algorithms, and the second set verifies that a recommendation system built on the Spark distributed platform is more efficient than a single-machine recommendation system when dealing with large amounts of data.

Conclusion

A survey of recommendation systems shows that common systems have several problems: traditional recommendation algorithms suffer from data sparsity and cold start; common recommendation systems cannot adequately meet users' requirements for real-time updates of recommendation results; and, as data scale keeps growing, common stand-alone recommendation systems can no longer complete recommendation tasks well. This paper gives corresponding solutions to these problems.

References:

  1. Wang G, Liu H. Survey of personalized recommendation system [J]. Computer Engineering and Applications, 2012, 48(7): 66–76.
  2. Barjasteh I, Forsati R, Ross D, et al. Cold-Start Recommendation with Provable Guarantees: A Decoupled Approach [J]. IEEE Transactions on Knowledge & Data Engineering, 2016, 28(6): 1462–1474.
  3. Cozza V, Hoang V T, Petrocchi M, et al. Experimental Measures of News Personalization in Google News [J]. 2016: 21–53.
  4. Meng X, Liu S, Zhang Y, et al. Research on Social Recommendation System [J]. Journal of Software, 2015, 26(6): 1356–1372.