
Malicious Website Collection System Using Machine Learning


Malicious websites are sites that contain malicious content or files. They lure users into clicking, then either redirect them to an unrelated site or download malicious content onto the user's system without the user's knowledge.

These websites appear legitimate but are in fact malicious. They host various kinds of content, such as spam, phishing, drive-by downloads, viruses, and ransomware. Malicious sites can cause heavy losses to an organization or to an individual user. Typically, a blacklisting mechanism is used to detect malicious websites, but blacklisting does not identify all kinds of malicious sites efficiently and can be easily evaded by an attacker. To overcome these limitations, a machine learning approach is used to detect and handle malicious content in web pages; such an approach is harder for an attacker to evade. Both supervised and unsupervised machine learning are used: the supervised approach detects known attacks, whereas unsupervised learning detects previously unknown malicious websites. To classify websites we use a Hidden Markov Model (HMM), which is safe and reliable for operating on the internet and works efficiently to find inter-dependencies among resources. The system uses Spark's fast extraction to distinguish website attributes, and the Baum-Welch and Viterbi algorithms of the Markov model can quickly and accurately classify unknown domain names to achieve effective detection of malicious websites. Finally, the HMM was compared experimentally with the commonly used random forest model in terms of accuracy and recall. The results show that applying the HMM improves classifier performance and yields more accurate detection results.
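
The Viterbi step mentioned above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the essay: the two hidden states ("W" for word-like, "R" for random-looking), the four character classes, and all probabilities are hypothetical. In a real system the parameters would be estimated with Baum-Welch from labeled or unlabeled domain names rather than written by hand.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # Log-probability of the best path ending in each state after the first symbol.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    path = {s: [s] for s in states}
    for symbol in obs[1:]:
        new_best, new_path = {}, {}
        for s in states:
            # Choose the predecessor state that maximises the path probability.
            prev = max(states, key=lambda p: best[p] + math.log(trans_p[p][s]))
            new_best[s] = (best[prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][symbol]))
            new_path[s] = path[prev] + [s]
        best, path = new_best, new_path
    final = max(states, key=lambda s: best[s])
    return best[final], path[final]

def symbolize(domain):
    """Map each character to a coarse class: vowel, consonant, digit, other."""
    out = []
    for ch in domain.lower():
        if ch in "aeiou":
            out.append("v")
        elif ch.isalpha():
            out.append("c")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append("o")
    return out

# Hypothetical hand-written parameters (assumption, not from the essay).
STATES = ["W", "R"]
START = {"W": 0.6, "R": 0.4}
TRANS = {"W": {"W": 0.9, "R": 0.1}, "R": {"W": 0.1, "R": 0.9}}
EMIT = {
    "W": {"v": 0.40, "c": 0.50, "d": 0.05, "o": 0.05},
    "R": {"v": 0.10, "c": 0.40, "d": 0.40, "o": 0.10},
}

for name in ("google", "x7k29qz4"):
    log_p, state_path = viterbi(symbolize(name), STATES, START, TRANS, EMIT)
    print(name, "".join(state_path), round(log_p, 2))
```

A classifier built on this idea might, for example, flag a domain name when most of its decoded states are "R"; that decision rule is also only an assumption of this sketch.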

Introduction

The internet has become the medium of choice for the public to search for information, conduct business, and enjoy entertainment. At the same time, it has become the most important platform used by miscreants to attack users. The most common example is the drive-by download attack. In this attack, attackers embed exploits in the web pages to which malicious URLs point; once the victim clicks on a malicious URL, they are taken to that web page without warning. The attacker may then steal any of the victim's information saved on the host computer, which can lead to serious financial loss. Victims are more likely to click malicious URLs when they are sent by friends. In addition to drive-by download exploits, attackers also use social engineering to trick victims into installing or running untrusted software. As an example, consider a web page that asks users to install a fake video player that is supposedly necessary to show a video (when, in fact, it is a malware binary). Another example is fake anti-virus programs. These programs are spread by web pages that scare users into thinking that their machine is infected with malware, enticing them to download and execute an actual piece of malware as a remedy for the claimed infection. The web is a very large place and is growing rapidly, with new pages (both benign and malicious) added at a formidable pace.


Malicious software has gone through many changes and phases since it was first discovered in hosts and networks, starting with the virus, which is self-replicating but not self-transporting, moving on to the worm, which is both self-replicating and self-transporting, and continuing with further forms. The number of malware attacks is increasing sharply with the rapid growth in complexity and interconnection of emerging information systems. A user who clicks on a malicious URL is likely to become a target. The security industry has carried out a large amount of research to prevent users from visiting URLs that may be malicious or contain illegal content.

A major share of these losses was caused by one particularly infamous group, known as the Rock Phish gang, which uses toolkits to create a large number of unique phishing URLs, putting more pressure on the accuracy and precision of blacklist-based anti-phishing techniques. New, previously unseen malicious executables, polymorphic malicious executables that use encryption, and metamorphic malicious executables that adopt obfuscation techniques are more complex and more difficult to detect. At present, most commonly used malware detection software relies on signature-based and heuristic-based methods to identify threats.
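
To see why large numbers of unique URLs put pressure on blacklists, consider the following minimal sketch. The blacklist entry, the URL variants, and both helper functions are hypothetical and chosen only for illustration; real blacklist services are considerably more elaborate.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist with a single known phishing URL (assumption).
BLACKLIST = {"http://phish.example/login"}

def exact_match_blocked(url, blacklist=BLACKLIST):
    """Block only URLs that appear verbatim in the blacklist."""
    return url in blacklist

def host_blocked(url, blacklist=BLACKLIST):
    """Block any URL whose host name matches a blacklisted URL's host name."""
    hosts = {urlsplit(entry).hostname for entry in blacklist}
    return urlsplit(url).hostname in hosts

variants = [
    "http://phish.example/login",            # the listed URL itself
    "http://phish.example/login?id=8731",    # unique query string per victim
    "http://a1.phish.example/login",         # freshly generated subdomain
]

for url in variants:
    print(url, exact_match_blocked(url), host_blocked(url))
```

Exact matching catches only the first variant, and host matching still misses the freshly generated subdomain, which is the kind of gap a toolkit that mass-produces unique URLs can exploit.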

Signatures are short strings of bytes that are unique to particular programs. They are used to recognize specific threats in executable files, boot records, or memory. The disadvantage of the signature-based method is that, because of the way signatures are extracted and generated, it is not effective against modified and unknown malicious executables. The heuristic-based method is more complex than signature-based detection; its disadvantages are that it is time-consuming and still fails to detect new malicious executables. The traditional approach to malicious website detection is to detect malicious domain names through methods such as domain name blacklists, reverse technology, and data mining. However, as more and more new network technologies are applied, the generation and use of malicious domain names have become increasingly flexible, and traditional detection methods cannot detect these domain names effectively. In addition, with the continuous increase in the number of registrations, queries, and system deployments of the global domain name system, the scale and technical complexity of DDoS attacks against the domain name system have also increased significantly.
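
The weakness of signature matching against modified executables can be shown with a small sketch. The signature name, its byte pattern, and the sample buffers are made up for illustration; real scanners use far larger signature databases and more sophisticated matching.

```python
# Hypothetical signature database: name -> byte pattern (assumption).
SIGNATURES = {
    "Demo.Threat.A": bytes.fromhex("deadbeef4d5a"),
}

def scan(data: bytes, signatures=SIGNATURES):
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]

# A sample containing the pattern, and a variant with a single byte changed.
original = b"\x00\x01" + bytes.fromhex("deadbeef4d5a") + b"\x02"
modified = b"\x00\x01" + bytes.fromhex("deadbeee4d5a") + b"\x02"

print(scan(original))
print(scan(modified))
```

The scan reports the first buffer and returns an empty list for the second, even though the two differ by a single byte.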

Related Work

A Microsoft Research project known as Strider HoneyMonkey focuses on detecting web sites that exploit Internet Explorer. During this project, Yi-Min Wang et al. reported many unknown and zero-day vulnerabilities. They deployed Internet Explorer on Windows machines at different patch levels and analyzed the state change of each machine when it visited these sites.

In their paper “PythonHoneyMonkey: Detecting Malicious Web URLs on Client Side Honeypot Systems”, Rohit Shukla and Maninder Singh explained how Python can be used to detect malicious URLs, presenting a project similar to the Microsoft Research one. They used the Snort tool to blacklist malicious URLs, since Snort ships with predefined signatures. The Python-based utility Beautiful Soup is used as the web crawler on Windows, whereas Lynx, a command-line web crawler, is used on Linux-based operating systems. An IP blacklist file is created that stores the IP addresses identified with the Snort IDS tool. Snort runs in the background and logs all network activity on the system.
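
The crawl-and-check idea can be sketched roughly as follows. This is not the authors' code: to stay self-contained it swaps in Python's standard-library html.parser where the paper uses Beautiful Soup, it represents the IP blacklist as an in-memory set rather than a file produced by Snort, and the page content, class, and function names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

def flag_blacklisted(links, ip_blacklist):
    """Return links whose host is a literal IP address on the blacklist."""
    return [url for url in links if urlsplit(url).hostname in ip_blacklist]

# Hypothetical page and blacklist (203.0.113.0/24 is a documentation range).
PAGE = '<a href="/about">About</a> <a href="http://203.0.113.7/setup.exe">Player</a>'
links = extract_links(PAGE, "http://site.example/")
print(flag_blacklisted(links, {"203.0.113.7"}))
```

This sketch only matches links that use a literal IP address as host; resolving host names to IP addresses before the lookup would need DNS queries and is left out.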

In their paper “Malicious Web Page Detection: A Machine Learning Approach”, Abubakr Sirageldin, Baharum B. Baharudin, and Low Tang Jung describe a framework for detecting malicious web pages using artificial neural network learning techniques. The framework reduces the high false positive rate and uses a partial rendering method for URL feature collection. In “Detecting Malicious URLs using Machine Learning Techniques”, Frank Vanhoenshoven, Gonzalo Nápoles, Rafael Falcon, Koen Vanhoof, and Mario Köppen used a machine learning approach to overcome the problems of the blacklisting method, applying various machine learning algorithms such as random forest and multilayer perceptron. These algorithms require more computation and are time-consuming.
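
As a rough illustration of what URL feature collection can look like before a classifier such as a random forest is trained, the sketch below derives a few lexical features from a URL string. The essay does not list the features the cited papers use, so the feature set and the function name are assumptions chosen for illustration only.

```python
import re
from urllib.parse import urlsplit

def url_features(url):
    """Return a dictionary of simple lexical features for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # True when the host looks like a dotted IPv4 address.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_sign": "@" in url,
        "uses_https": parts.scheme == "https",
    }

print(url_features("http://192.168.0.1/a"))
print(url_features("https://www.example.com/index.html"))
```

Each dictionary could then be turned into a numeric vector and passed to whichever learning algorithm is being evaluated.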


Cite this Essay


Malicious Website Collection System Using Machine Learning. (2020, July 15). WritingBros. Retrieved July 5, 2025, from https://writingbros.com/essay-examples/malicious-website-collection-system-using-machine-learning/

