Proposal On Drop Out Prediction


In recent years, there has been increasing interest in the use of data mining to investigate scientific questions within educational research, an area of inquiry termed educational data mining. Educational Data Mining (EDM) is defined as the area of scientific inquiry centered on the development of methods for making discoveries within the unique kinds of data that come from educational settings, and on using those methods to better understand students. The expectation that all students should finish college has grown over the last decades and remains an important goal at all educational levels in this new century. Non-completion has been linked to a range of social, financial, and psychological problems. Many studies have attempted to devise a process for identifying students at risk of dropping out, using various research methodologies. The purpose of this study is to investigate college dropout by mining existing data sources with decision trees. Data mining techniques are applied to predict which students will drop out of college. This paper presents an architecture that uses educational data mining techniques to predict and identify students who face the risk of dropping out.


College education is an essential element of the knowledge-based economy: universities are nowadays the most important source of basic research and are therefore crucial for the development of new technologies. Recent years have seen growing interest and concern in several countries about the problem of college failure and the determination of its main contributing factors. A great deal of research has been done on identifying the factors that affect the low performance of students (school failure and dropout) at different educational levels. Identifying and extracting useful information hidden in massive databases is a difficult task. A very promising way to reach this goal is the use of knowledge discovery in databases techniques, or data mining in education, referred to as educational data mining (EDM). This new area of research focuses on the development of methods to better understand students and the settings in which they learn. In fact, there are good examples of how to apply EDM techniques to build models that predict dropout and student failure specifically. These works have shown promising results with respect to the sociological, economic, or educational characteristics that may be more relevant in the prediction of low academic performance.

Background And Related Work

The explanation and prediction of academic performance is a widely researched topic. In earlier studies, the model of Tinto was the predominant theoretical framework for considering factors in academic success. Tinto considers the process of student attrition as a socio-psychological interplay between the characteristics of the student entering college and the student's experience at the institution. This interaction between the student's past and the academic environment leads to a degree of integration of the student into the new environment. A higher degree of integration is directly related to a higher commitment to the educational institution and to the goal of study completion. Later studies tried to operationalize this model by identifying the factors that affect students' integration, such as peer group interactions, interactions with faculty, faculty concern for student development and teaching, academic and intellectual development, and institutional and goal commitments. These factors proved to have predictive capacity across different institutions and therefore showed potential as a tool for identifying students who might drop out. Many studies included a wide range of potential predictors, including personality factors, intelligence and aptitude tests, academic achievement, previous college achievements, and demographic data. Some of these factors seemed to be stronger than others, but there is no consistent agreement among the studies. One recent European study confirmed that sex (only in technical schools), age at enrollment, score on the pre-university examination, type of pre-university education, type of financial support, father's level of education, and whether or not the student lives in the university town may all have an impact on dropout. All studies show that academic success depends on many factors, with grades and achievements, personality and expectations, and sociological background all playing a role.

Problem Of College Dropouts

According to the National Center for Education Statistics, only 59% of first-time students who began seeking bachelor's degrees or the equivalent in 2005 graduated within six years. Only 34% of males graduated in four years, compared to 42% of females. Similarly, only 22.5% of American Indians and Alaska Natives graduated within four years. The study collected over twenty possible reasons, or 'shocks', for students dropping out. These include events occurring at the school, such as an assault, a conflict with a faculty member, or the departure of a close friend from the institution. Reasons for dropping out also include events occurring outside the college or university, such as marriage and other personal issues. According to the study, six critical events are the most common reasons for withdrawal: students are recruited by a job or another institution, receive an unanticipated bad grade, have conflicts with a roommate, lose financial aid, become clinically depressed, or face a substantial increase in tuition or living costs. Other events, such as developing a substance abuse problem, are very likely to cause a dropout, but they occur in low numbers and are therefore limited as major causes.

Rules For Dropout Prediction

Data mining and soft computing techniques are effective tools for learning and prediction. The extracted features are passed to a training phase, in which the model learns from data samples whose outcome is already known. Once training is complete, the model has learned the patterns in the data and can produce predictions. New data samples are then given to the trained model to test how well it predicts student performance.
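The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the 25% test fraction, and the helper names `train_test_split`, `fit_majority`, and `accuracy` are all hypothetical, and the "model" is a trivial majority-class baseline rather than any classifier used in the study.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority(train_rows):
    """'Train' the simplest possible model: remember the most common label."""
    labels = [label for _, label in train_rows]
    return Counter(labels).most_common(1)[0][0]

def accuracy(predict, test_rows):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_rows if predict(features) == label)
    return correct / len(test_rows)

# Hypothetical toy data: ((grade average, age), label), label 1 = dropped out.
data = [((g, a), 1 if g < 2.0 else 0)
        for g in (1.0, 1.5, 2.5, 3.0, 3.5, 3.8)
        for a in (18, 22)]

train, test = train_test_split(data)
majority = fit_majority(train)            # training phase
score = accuracy(lambda features: majority, test)  # testing phase on unseen rows
print(score)
```

Any of the classifiers discussed below could take the place of the baseline; the important point is that accuracy is measured on rows the model did not see during training.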

Data Mining Techniques

Nearest Neighbour classifier (k-NN)

The k-NN algorithm is one of the best-known classification methods. It is based on learning by comparing a given test tuple with training tuples that are similar to it. When a new instance is introduced, k-NN finds the k nearest neighbours of this instance and determines its label from these k instances. In this study, closeness is defined in terms of the Euclidean distance metric. Although our data set consists mostly of categorical variables, each category has a numerical counterpart, so the Euclidean distance could be used. To assign a class to an unclassified test sample, a majority vote is taken among its k nearest neighbours and the most common class is chosen. A good value for k, the number of neighbours, was determined experimentally. Starting with k=1, 10-fold cross-validation was used to estimate the error rate of the classifier. This process was repeated 10 times, incrementing k by one each time to allow for one more neighbour. The value k=3 gave the minimum error rate and was selected. The classification accuracy can be very satisfying in some cases; there are just two parameters to learn, and the classification is very robust to missing values. However, selecting the distance function d can be difficult, particularly for educational data sets.
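As a rough illustration of the procedure above, the following Python sketch implements majority-vote k-NN with Euclidean distance and picks k from 1 to 10 by 10-fold cross-validation. It is a minimal sketch, not the study's implementation: the toy data and the function names `knn_predict`, `cv_error`, and `choose_k` are assumptions made for this example, and the folds are formed by simple index striding rather than random assignment.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k):
    """train is a list of (features, label); majority vote among the k nearest."""
    neighbours = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def cv_error(rows, k, folds=10):
    """Estimate the error rate of k-NN with `folds`-fold cross-validation."""
    errors = 0
    for f in range(folds):
        test = rows[f::folds]  # every row whose index is f modulo folds
        train = [r for i, r in enumerate(rows) if i % folds != f]
        errors += sum(1 for x, y in test if knn_predict(train, x, k) != y)
    return errors / len(rows)

def choose_k(rows, candidates=range(1, 11)):
    """Return the candidate k with the lowest cross-validated error rate."""
    return min(candidates, key=lambda k: cv_error(rows, k))

# Hypothetical toy data: two well-separated groups of numerically coded records.
data = ([((float(i % 5), float(i // 5)), 0) for i in range(10)]
        + [((float(i % 5) + 10, float(i // 5) + 10), 1) for i in range(10)])

best_k = choose_k(data)
print(best_k, cv_error(data, best_k))
```

When several values of k tie on error rate, `min` returns the smallest one, which is a design choice of this sketch rather than something the text specifies.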


Decision Tree classifier (DT)

DT is a powerful classification and prediction technique. There are several popular decision tree algorithms, such as ID3, C4.5, and CART (classification and regression trees). A DT takes the form of a tree structure in which each node is either a leaf node (indicating the value of the target class of examples) or a decision node (specifying a test to be carried out on a single attribute value, with one branch and sub-tree for each possible outcome of the test). DTs have many advantages, such as very fast classification of unknown records, easy interpretation of small trees, robustness to the effects of outliers, and a clear indication of the most important fields for prediction. However, DTs are very sensitive to over-fitting, particularly on small data sets. In this study, the C4.5 algorithm, an extension of the earlier ID3 algorithm, was used to generate the decision tree. To construct the tree, an entropy measure was used to determine the nodes. Since attributes with higher entropy leave more uncertainty in the outcome, attributes were selected in order of increasing entropy.
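The entropy-based choice of a decision node can be illustrated with a short Python sketch. It computes the Shannon entropy of the class labels and the weighted entropy that remains after splitting on each attribute, then picks the attribute with the lowest remaining entropy. The attribute names and records are hypothetical, and this shows only the plain entropy criterion; the full C4.5 algorithm additionally normalises by split information (gain ratio) and prunes the tree, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def split_entropy(rows, attribute):
    """Weighted average entropy of the labels after splitting on `attribute`."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[features[attribute]].append(label)
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def best_attribute(rows, attributes):
    """Attribute whose split leaves the least uncertainty about the class."""
    return min(attributes, key=lambda a: split_entropy(rows, a))

# Hypothetical records: (attribute dictionary, outcome).
rows = [
    ({"aid": "yes", "sex": "f"}, "stay"),
    ({"aid": "yes", "sex": "m"}, "stay"),
    ({"aid": "no", "sex": "f"}, "drop"),
    ({"aid": "no", "sex": "m"}, "drop"),
]

root = best_attribute(rows, ["aid", "sex"])
print(root)
```

In this toy data the `aid` attribute separates the classes perfectly, so its split entropy is zero, while `sex` leaves the labels as uncertain as before the split.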

Naive Bayes classifier (NB)

A simple probabilistic classifier called the Naive Bayes classifier was also used for student dropout classification. The Naive Bayes algorithm, the simplest form of Bayesian network, is one of the easiest algorithms to apply and has very satisfactory accuracy and sensitivity rates. The classifier obtains the posterior probability of each class, C_i, using Bayes' rule. It makes the simplifying assumption that the attributes, A = (A_1, ..., A_n), are independent given the class, so the likelihood can be obtained as the product of the individual conditional probabilities of each attribute given the class. Thus, the posterior probability P(C_i | A_1, ..., A_n) is given by the following equation:

$$P(C_i \mid A_1, \dots, A_n) = \frac{P(C_i)\, P(A_1 \mid C_i) \cdots P(A_n \mid C_i)}{P(A)}$$

This assumption is usually called the Naive Bayes assumption, and a Bayesian classifier using it is called the Naive Bayesian classifier, often abbreviated to "Naive Bayes". Effectively, it means that we are ignoring interactions between attributes within individuals of the same class.
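To make the formula concrete, here is a minimal Python sketch that estimates P(C_i) and P(A_j | C_i) from counts and combines them into a normalised posterior. The records and the names `train_nb` and `posterior` are hypothetical. The sketch uses raw relative frequencies without smoothing, and it computes P(A) as the sum of the numerators over all classes, which is the normalising constant under the Naive Bayes assumption.

```python
from collections import Counter, defaultdict

def train_nb(rows):
    """rows: list of (tuple of attribute values, class label)."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    for attrs, label in rows:
        for j, value in enumerate(attrs):
            value_counts[(label, j)][value] += 1
    return class_counts, value_counts

def posterior(model, attrs):
    """Return {class: P(class | attrs)} under the Naive Bayes assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                                   # P(C_i)
        for j, value in enumerate(attrs):
            score *= value_counts[(label, j)][value] / n    # P(A_j | C_i)
        scores[label] = score
    evidence = sum(scores.values())                         # P(A)
    if evidence == 0:
        # No class has seen this combination; fall back to the class priors.
        return {label: n / total for label, n in class_counts.items()}
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical records: (grade band, receives financial aid) -> outcome.
rows = [
    (("low", "no"), "drop"),
    (("low", "no"), "drop"),
    (("low", "yes"), "stay"),
    (("high", "yes"), "stay"),
    (("high", "no"), "stay"),
    (("high", "yes"), "stay"),
]

model = train_nb(rows)
result = posterior(model, ("low", "no"))
print(result)
```

For the query ("low", "no"), the numerators are 1/3 for "drop" and 1/24 for "stay", so the normalised posterior for "drop" is 8/9.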

Neural Network classifier (NN)

Student dropout was also predicted with a feed-forward NN. This is another inductive learning method, grounded in computational models of neurons and their networks as found in the human central nervous system. An NN is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class of the input samples. In this study, the back-propagation algorithm was used for learning on a multilayer feed-forward neural network. The input layer of the network consisted of nine student variables. The hidden layer included 50 neurons and the output layer had one neuron, a configuration determined by our experimental studies.
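The following Python sketch shows what a network of this shape (nine inputs, 50 hidden neurons, one output) and one back-propagation update might look like. Everything beyond those three sizes is an assumption made for illustration: sigmoid activations, a squared-error loss, a learning rate of 0.1, the weight initialisation range, the number of epochs, and the toy labelling rule are not taken from the study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class FeedForwardNet:
    """One hidden layer of sigmoid units feeding a single sigmoid output."""

    def __init__(self, n_in=9, n_hidden=50, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return the hidden activations and the output for input vector x."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def train_step(self, x, target, lr=0.1):
        """One back-propagation update for the loss 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        delta_out = (output - target) * output * (1.0 - output)
        for j, h in enumerate(hidden):
            # Error signal for hidden unit j, computed before w2[j] changes.
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out
        return 0.5 * (output - target) ** 2

def mean_loss(net, samples):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

# Hypothetical toy data: nine binary inputs, label equal to the first input.
rng = random.Random(1)
samples = []
for _ in range(20):
    x = [float(rng.randint(0, 1)) for _ in range(9)]
    samples.append((x, x[0]))

net = FeedForwardNet()
before = mean_loss(net, samples)
for _ in range(100):
    for x, t in samples:
        net.train_step(x, t)
after = mean_loss(net, samples)
print(before, after)
```

The printed values are the mean loss before and after training; comparing them is a quick check that the weight updates move the network toward the target classes.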

Proposal On Drop Out Prediction. (2020, July 22). WritingBros. Retrieved June 18, 2024, from