How to Have a Continuous Tree Search
Decision tree using continuous variable
Solution 1
1) input variable : continuous / output variable : categorical
The C4.5 algorithm handles this situation.
In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
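To make that thresholding step concrete, here is a minimal Python sketch (this is not C4.5's actual implementation; the helper names entropy and best_threshold are invented for the illustration). It sorts the attribute's values, tries the midpoint between each pair of consecutive distinct values as a candidate threshold, and keeps the one with the highest information gain:

from collections import Counter
import math

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Return (threshold, information_gain) for one continuous attribute.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no decision boundary between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy example: the two classes separate cleanly around 2.95
print(best_threshold([1.4, 1.3, 4.7, 4.5, 5.1], ["a", "a", "b", "b", "b"]))

Real C4.5 uses the gain ratio rather than raw information gain, but the threshold search follows this shape.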
2) input variable : continuous / output variable : continuous
The CART (Classification and Regression Trees) algorithm handles this situation.
Case 2 is the regression problem. Enumerate each attribute j and each candidate split value s of that attribute, then split the records into those whose attribute value is above the threshold and those that are less than or equal to it. That gives two regions:

R_1(j, s) = { x | x_j <= s }  and  R_2(j, s) = { x | x_j > s }

Find the best attribute j and the best split value s, which solve

min_{j, s} [ min_{c_1} sum_{x_i in R_1(j, s)} (y_i - c_1)^2 + min_{c_2} sum_{x_i in R_2(j, s)} (y_i - c_2)^2 ]

where c_1 and c_2 can be solved in closed form as the mean target value of each region:

c_1 = ave(y_i | x_i in R_1(j, s)),  c_2 = ave(y_i | x_i in R_2(j, s))

Then, when doing regression, the fitted tree predicts

f(x) = sum_m c_m * I(x in R_m)

where I(.) is the indicator function and m ranges over the leaf regions, so a new point gets the mean target value of the region it falls into.
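As a rough illustration of that search, here is a small Python sketch, assuming the data is just a list of feature rows X and numeric targets y (the names best_split and squared_error are invented for this example and are not from any particular CART library):

def squared_error(targets):
    # Sum of squared deviations from the region mean; the mean is the optimal c for the region.
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_split(X, y):
    # Return (j, s, error): the attribute index and split value minimising the two-region squared error.
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= s]
            right = [y[i] for i, row in enumerate(X) if row[j] > s]
            err = squared_error(left) + squared_error(right)
            if right and err < best[2]:  # skip degenerate splits that leave one side empty
                best = (j, s, err)
    return best

# Toy example with one attribute: the target jumps after x = 3, so the best split is (0, 3.0, ...)
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.1, 0.9, 1.0, 2.1, 1.9]
print(best_split(X, y))

A full CART implementation would recurse on each resulting region until a stopping rule (minimum node size, maximum depth, or pruning) is met.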
Solution 2
I can explain the concept at a very high level.
The main goal of the algorithm is to find the attribute to use for the first split. We can use various metrics to evaluate the most significant attribute, such as Entropy/Information Gain or Gain Ratio. But if the decision variable is continuous, we usually use a different metric, 'standard deviation reduction'. Whatever metric you use, and depending on your algorithm (i.e. ID3, C4.5, etc.), you end up choosing the attribute that will be used for splitting.
When an attribute is continuous, things get a little tricky. You need to find the threshold value for that attribute that gives the best score on your chosen criterion (the largest Information Gain, Gain Ratio, or impurity reduction). Then you compare the best thresholds of all the attributes and choose an attribute accordingly, right?
Now, if the attribute is continuous and the decision variable is also continuous, you can simply combine the above two concepts and generate a regression tree.
That means, because the decision variable is continuous, you use a metric like variance reduction (or standard deviation reduction) and choose the attribute and threshold that give the highest value of that metric across all attributes.
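For example, here is a hedged sketch of the variance-reduction metric itself (the function name variance_reduction is made up here): it is the parent region's target variance minus the size-weighted variance of the two child regions, and the attribute/threshold pair with the largest reduction wins the split.

def variance(values):
    # Population variance of a list of numeric target values.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(xs, ys, threshold):
    # How much the target variance drops if we split one continuous attribute at this threshold.
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
    return variance(ys) - weighted

# Toy example: the target has two clear levels, so splitting at 5.0 removes almost all the variance
xs = [2.0, 4.0, 6.0, 8.0]
ys = [5.0, 5.2, 9.8, 10.0]
print(variance_reduction(xs, ys, 5.0))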
You can visualize such a regression tree using decision tree machine learning software such as the SpiceLogic Decision Tree Software: given a data table, it will generate the regression tree for you.
Comments
-
I have a question about Decision tree using continuous variable
I heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something like that, but I don't know how it works when the input variable is continuous.
-
input variable : continuous / output variable : categorical
-
input variable : continuous / output variable : continuous
For these two cases, how can we get a split criterion like the Gini index or information gain?
When I use rpart in R, it works well whatever the input and output variables are, but I don't know the algorithm in detail.
-
This is not a technical question: consider posting in the Cross Validated or Data Science communities.
-
I'm voting to close this question because it is not about programming as defined in the help center but about ML theory/methodology.
Source: https://9to5answer.com/decision-tree-using-continuous-variable