A Framework For Detecting Fraudulent Apps In Google Play Store
Existing system and motivation
Fraudulent app developers try to raise their apps’ positions in search results by posting fake reviews and by similar means, because a popular app brings its developer financial benefit. The main limitation of the existing approach is that evidence of fraudulent behavior is difficult to obtain for a particular time period, so a reputable app can be wrongly affected. The existing system therefore does not detect fraud accurately and offers no solution to the user’s problem. In the proposed approach, three kinds of evidence (rating, ranking and reviews) are aggregated so that the user can tell fraudulent apps from fraud-free apps in the Play Store.
Once the fraudulent apps have been identified, the user can safely download the best app recommended for a specific category. The fraud detection solution suggested here is intended to overcome the problem described above.
Proposed system
An app with a high rating, a high ranking and good reviews may attract more downloads and can be ranked higher on the leaderboard. However, ratings, rankings and reviews cannot always be trusted, because some fraudulent developers boost their apps dishonestly. Fraudulent apps in the Google Play Store are detected by aggregating three kinds of evidence: ranking-based, co-review-based and rating-based. By aggregating the entire activity of leading apps, the approach aims to classify standard datasets of fraudulent and legitimate apps accurately. This paper proposes a framework for detecting fraudulent apps in the Google Play Store.
An incremental learning approach is proposed. The rating, ranking and review evidence for each app is integrated by an unsupervised evidence-aggregation method for evaluating mobile apps. The incremental learning approach is used to characterize the large dataset effectively and to provide better aggregation.
Methodologies used in proposed system
Incremental learning algorithm
The aim of incremental learning is for the learning model to adapt to new data without forgetting its existing knowledge; the model is not retrained from scratch. Some incremental learners have a built-in parameter or assumption that controls the relevance of old data, while others, called stable incremental machine learning algorithms, learn representations of the training data that are not forgotten over time.
Porter stemmer algorithm
The Porter stemming algorithm is a process for removing the commoner morphological and inflexional endings from words in English. The five text-processing steps used with it here are normalization, case folding, lemmatization, morphological analysis and opinion-word analysis.
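The incremental idea described above, absorbing new data without reprocessing the old, can be illustrated with a running mean. This is only a minimal sketch: the class name `RunningMean` and the sample ratings are hypothetical, and the text does not say which incremental learner the system actually uses.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Keeps a mean that absorbs new values without storing the old ones."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Standard incremental mean: move the old mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


rating_mean = RunningMean()
for rating in [4.0, 5.0, 3.0]:  # hypothetical ratings arriving over time
    rating_mean.update(rating)
```

Because only the count and the current mean are stored, each new rating is handled in constant time, which is the property that matters for a large and growing dataset.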
Preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. The volume of data now being managed has surpassed the processing capacity of traditional systems, and this applies to data mining as well.
Real-world data is often incomplete, inconsistent or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Here, preprocessing transforms the large set of raw data into a table that lists the details of each app. This table supports the data analysis used for decision making, and its contents are used in the subsequent steps of analyzing fraud apps. Preprocessing is therefore the first step of the fraud detection approach.
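The transformation of raw records into a table, as described above, might look like the following sketch. The field names (`app`, `category`, `rating`, `review`, `date`) and the date format are assumptions made for illustration; the text does not specify the raw data layout.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional


class ReviewRow(NamedTuple):
    app: str
    category: str
    rating: float
    review: str
    review_date: date


def clean_record(raw: dict) -> Optional[ReviewRow]:
    """Return a normalized row, or None if a required field is missing or malformed."""
    try:
        return ReviewRow(
            app=raw["app"].strip(),
            category=raw["category"].strip().lower(),
            rating=float(raw["rating"]),
            review=raw.get("review", "").strip(),
            review_date=datetime.strptime(raw["date"], "%Y-%m-%d").date(),
        )
    except (KeyError, ValueError, TypeError, AttributeError):
        return None


def build_table(raw_records: list) -> list:
    """Keep only the records that could be normalized."""
    rows = (clean_record(r) for r in raw_records)
    return [row for row in rows if row is not None]
```

Dropping incomplete records is one possible policy; imputing missing values would be another, and the text does not say which is used.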
Mining foremost events
App fraud usually happens in foremost events, so identifying fraudulent mobile apps is in practice a matter of detecting fraud within the foremost events of those apps. Specifically, a simple yet effective algorithm is proposed to identify the foremost events of each app from its historical usage records, starting with the most frequently downloaded apps.
Frequently downloaded apps are an area where fraud is likely to be found. This is one part of the detection; integrating it later with the other two kinds of evidence separates out the fraud-free apps. Analysis of app ranking behavior then shows that fraudulent apps often have different usage patterns in each foremost event compared with normal apps.
There are two main steps for mining foremost events: first, discovering foremost events from the app’s historical usage records; second, merging adjacent events to construct foremost event records.
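The two steps above can be sketched as follows, assuming the historical records are a list of daily ranks (1 is the top position). The thresholds `top_k` and `max_gap` are hypothetical parameters introduced for illustration; the text does not give values or exact definitions for them.

```python
def find_events(daily_ranks, top_k):
    """Step 1: return (start, end) day indices of maximal runs with rank <= top_k."""
    events = []
    start = None
    for day, rank in enumerate(daily_ranks):
        in_top = rank is not None and rank <= top_k
        if in_top and start is None:
            start = day                      # an event begins
        elif not in_top and start is not None:
            events.append((start, day - 1))  # the event ended yesterday
            start = None
    if start is not None:
        events.append((start, len(daily_ranks) - 1))
    return events


def merge_events(events, max_gap):
    """Step 2: merge events separated by fewer than max_gap days."""
    merged = []
    for start, end in events:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

For example, with ranks `[50, 8, 5, 30, 6, 4, 90, 95, 99, 3]`, `top_k=10` and `max_gap=3`, the first two events are close enough to be merged while the last one stays separate.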
Usage facts
A foremost session is composed of several foremost events, so the basic characteristics of foremost (leading) events should be analyzed first in order to extract fraud evidence. Ranking fraud in the mobile app market refers to fraudulent or deceptive activities whose purpose is to bump apps up the popularity list. Analysis of an app’s historical usage records shows that its usage behavior in a foremost event always satisfies a specific ranking pattern consisting of three phases: a rising phase, a maintaining phase and a recession phase. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area.
This work provides a holistic view of ranking fraud and proposes a ranking fraud detection system for mobile apps. Specifically, it proposes to locate ranking fraud accurately by mining the active periods, namely leading sessions, of mobile apps. The data are numeric values: the differences between review dates. The mean of these values is computed, and the apps whose values lie above the mean are segregated as candidate fraud apps. This measure is the result of the usage-facts analysis.
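The mean-based segregation described above can be sketched as below. The text is not specific about exactly which date differences are averaged, so this sketch assumes one value per app (the average number of days between consecutive reviews) and splits apps around the overall mean. The function names are hypothetical.

```python
from datetime import date
from statistics import mean


def mean_review_gap(review_dates):
    """Average number of days between consecutive reviews of one app."""
    ordered = sorted(review_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else 0.0


def split_by_mean(values_by_app):
    """Return the overall mean and the set of apps whose value lies above it."""
    threshold = mean(list(values_by_app.values()))
    above = {app for app, value in values_by_app.items() if value > threshold}
    return threshold, above
```

The same `split_by_mean` helper could be reused for any per-app numeric measure, which keeps the thresholding rule in one place.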
Grade facts
Ranking-based evidence is useful for ranking fraud detection, but it is sometimes not sufficient on its own. After an app has been published, it can be rated by any user who has downloaded it, and user rating is one of the most important features in advertising an app. An app with a higher rating may attract more downloads and can also be ranked higher on the leaderboard. Rating manipulation is therefore another important aspect of fraud. Intuitively, if an app has ranking fraud in a leading session s, its ratings during that period may show anomalous patterns compared with its historical ratings, and these anomalies can be used to construct rating-based evidence.
The mean of the segregated data sets is computed and taken as the standard value. The data are then compared against this mean to help detect fraud apps, and the results are later incorporated into the final identification of fraud apps.
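Comparing the ratings in a leading session with the historical ratings, as described above, could be sketched like this. The `threshold` value is a hypothetical cut-off chosen only for illustration; the text does not state one.

```python
from statistics import mean


def rating_deviation(session_ratings, historical_ratings):
    """Difference between the session's mean rating and the historical mean."""
    return mean(session_ratings) - mean(historical_ratings)


def is_rating_suspicious(session_ratings, historical_ratings, threshold=1.0):
    # threshold is an assumed cut-off, not a value taken from the text
    return rating_deviation(session_ratings, historical_ratings) > threshold
```

A session full of five-star ratings for an app that historically averages three stars would be flagged, while a session that matches the historical mean would not.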
Evaluation facts
Besides ratings, most app stores also allow users to write textual comments as app reviews. Such reviews reflect the personal perceptions and usage experiences of existing users of a particular mobile app. Review manipulation is one of the most important aspects of app usage facts. Before downloading or purchasing a new mobile app, users often read its historical reviews first to ease their decision making, and a mobile app with more positive reviews may attract more downloads. Impostors therefore often post fake reviews in the foremost sessions of a specific app in order to inflate its downloads and thus propel its ranking position on the leaderboard. Although some previous work on review spam detection has been reported in recent years, the problems of detecting local anomalies of reviews in the leading sessions and capturing them as evidence for ranking fraud detection are still under-explored.
Scores approved by the admin are used to identify the exact review scores. Stop-word removal discards the very commonest words of a language, such as prepositions, as well as numbers. Splitting the text into words allows the individual words to be analyzed. The Porter stemmer algorithm provides the root word for the analysis, and opinion words are analyzed by their scores. Stemming is the process of reducing a word to its stem so that related forms fall into the same class. Case folding is the process of converting a word to lower case. Lemmatization finds the correct dictionary headword form, reducing inflections or variant forms to base forms. Morphology is the analysis of the small meaningful units that make up words. The two main kinds of unit are stems, which are the core meaning-bearing units, and affixes, which are the bits and pieces that adhere to stems. The term frequency of each word is then calculated.
The nature of the words is then analyzed and their similarity scores are calculated. Finally, the scores are converted to standard values for analyzing the fraud apps.
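The text pipeline above (case folding, splitting, stop-word removal, stemming, term frequency, similarity) can be sketched with the standard library alone. Note the assumptions: `crude_stem` is a deliberately rough stand-in and is not the Porter algorithm, the stop-word list is a tiny illustrative sample, and cosine similarity is one plausible choice for the similarity score, which the text does not name.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the system's list).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "of", "to", "and", "in", "on", "for"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(review):
    """Case-fold, split into alphabetic tokens, drop stop words, stem, count."""
    tokens = re.findall(r"[a-z]+", review.lower())  # digits are dropped here
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Near-identical reviews posted within the same session would score close to 1, which is the kind of local anomaly the review-based evidence is meant to capture.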
Facts aggregation
After the three types of fraud evidence have been extracted, the next challenge is how to combine them for ranking fraud detection. There are many ranking and evidence aggregation methods in the literature, such as permutation-based models and score-based models. However, some of these methods focus on learning a global ranking for all candidates. Dynamic scoring is a type of software integration that allows the scoring process to be invoked so that the scores can be used for analyzing the nature of the review. Other methods are based on supervised learning techniques, which depend on labeled training data and are hard to exploit. Instead, we use an unsupervised approach based on fraud similarity to combine the evidence. The combined evidence identifies both the best apps and the fraudulent apps.
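Combining the three kinds of evidence without labeled data could be sketched as follows. This is not the fraud-similarity method mentioned above, whose details the text does not give; it is a simpler stand-in that rescales each evidence score to the range 0 to 1 and averages them with equal weights, both of which are assumptions.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; constant inputs map to 0.0."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {app: (value - low) / span if span else 0.0 for app, value in scores.items()}


def aggregate(evidences):
    """Average the normalized evidence scores per app (equal weights assumed)."""
    normalized = [min_max(e) for e in evidences]
    apps = normalized[0].keys()
    return {app: sum(n[app] for n in normalized) / len(normalized) for app in apps}
```

Each element of `evidences` is expected to be a dict keyed by the same app names, for example one dict each for the usage, rating and review scores.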
App recommendation
The recommendation process helps the mobile user choose the best apps and avoid fraud apps before downloading. It compares the aggregated evidence with the better apps of the leading sessions. The best apps are analyzed and presented in a way that helps the user make decisions.
A web application supports the user by suggesting good apps in various categories such as gaming, sports and music.
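A category-based recommendation of the kind described above might be sketched like this. The record fields (`name`, `category`, `rating`, `fraud_score`) and the `fraud_threshold` default are hypothetical; the text does not describe the web application's data model.

```python
def recommend(apps, category, fraud_threshold=0.5, limit=3):
    """Return names of non-flagged apps in a category, best rated first."""
    candidates = [
        a for a in apps
        if a["category"] == category and a["fraud_score"] < fraud_threshold
    ]
    candidates.sort(key=lambda a: a["rating"], reverse=True)
    return [a["name"] for a in candidates[:limit]]
```

Apps whose aggregated fraud score reaches the threshold are excluded before ranking, so a highly rated but suspicious app never appears in the suggestions.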
Implementation of proposed system
Fraudulent apps in the Google Play Store are detected by aggregating three kinds of evidence: ranking-based, co-review-based and rating-based. The first step is data preprocessing, a data mining technique that transforms raw data into an understandable format. Here, preprocessing transforms the large set of data into a table that lists the details of each app. This supports the data analysis used for decision making, and the resulting information is used in the subsequent steps of analyzing fraud apps.
The second step identifies the foremost events of each app from its historical usage records. Analysis of app ranking behavior then shows that fraudulent apps often have different usage patterns in each foremost event compared with normal apps.
For each foremost event, the basic characteristics of leading events are analyzed first in order to extract fraud evidence. Analysis of an app’s historical usage records shows that its usage behavior in a foremost event always satisfies a specific ranking pattern consisting of three phases: a rising phase, a maintaining phase and a recession phase. Frequently downloaded apps are an area where fraud is likely to be found.
The results are kept for later analysis, after all three kinds of evidence have been integrated, to establish whether an app is fraudulent. There are two main steps for mining foremost events, the first of which is to discover foremost events from the app’s historical usage records. We provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile apps. We propose to locate ranking fraud accurately by mining the active periods, namely leading sessions, of mobile apps. The data are numeric values: the differences between review dates. The mean of these values is computed, and the apps whose values lie above the mean are segregated as candidate fraud apps. Ranking-based evidence alone is not sufficient, however: after an app has been published, it can be rated by any user who has downloaded it.
Rating manipulation is also an important aspect of fraud, where the rating is determined based on the downloads. If an app has ranking fraud in a leading session, its ratings during that period may show anomalous patterns compared with its historical ratings, and these anomalies can be used to construct rating-based evidence.
The mean of the segregated data sets is computed and taken as the standard value. The data are then compared against this mean to help detect fraud apps, and the results are later incorporated into the final identification of fraud apps.
Impostors often post fake reviews in the foremost sessions of a specific app in order to inflate its downloads and thus propel its ranking position on the leaderboard. Review manipulation is detected by finding local anomalies of reviews in the leading sessions and capturing them as evidence for ranking fraud detection. Scores approved by the admin are used to identify the exact review scores. Stop-word removal discards the very commonest words of a language, such as prepositions, as well as numbers. Splitting the text into words allows the individual words to be analyzed. The Porter stemmer algorithm provides the root word for the analysis, and opinion words are analyzed by their scores. Stemming is the process of reducing a word to its stem so that related forms fall into the same class. Case folding is the process of converting a word to lower case. Lemmatization finds the correct dictionary headword form, reducing inflections or variant forms to base forms. Morphology is the analysis of the small meaningful units that make up words. The two main kinds of unit are stems, which are the core meaning-bearing units, and affixes, which are the bits and pieces that adhere to stems. The term frequency of each word is calculated. The nature of the words is then analyzed and their similarity scores are calculated. Finally, the scores are converted to standard values for analyzing the fraud apps.
After the three types of fraud evidence have been extracted, they are combined to identify both the best apps and the fraudulent apps. The recommendation process follows, which helps the mobile user choose the best apps and avoid fraud apps before downloading. It compares the aggregated evidence with the better apps of the leading sessions. The best apps are analyzed and presented in a way that helps the user make decisions. A web application supports the user by suggesting good apps in various categories such as gaming, sports and music.