How to Have a Continuous Tree Search
Decision tree using continuous variable
Solution 1
1) input variable : continuous / output variable : categorical
The C4.5 algorithm handles this situation.
In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
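To make that thresholding step concrete, here is a minimal Python sketch (this is not C4.5's actual implementation; the helper names entropy and best_threshold are invented for the illustration). It sorts the attribute's values, tries the midpoint between each pair of consecutive distinct values as a candidate threshold, and keeps the one with the highest information gain:

from collections import Counter
import math

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Return (threshold, information_gain) for one continuous attribute.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no decision boundary between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy example: the two classes separate cleanly around 2.95
print(best_threshold([1.4, 1.3, 4.7, 4.5, 5.1], ["a", "a", "b", "b", "b"]))

Real C4.5 uses the gain ratio rather than raw information gain, but the threshold search follows this shape.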
2) input variable : continuous / output variable : continuous
The CART (Classification and Regression Trees) algorithm handles this situation.
Case 2 is the regression problem. Enumerate each attribute j and each candidate split value s of that attribute, then split the records into those whose attribute value is above the threshold and those that are less than or equal to it. That gives two regions:

R_1(j, s) = { x | x_j <= s }  and  R_2(j, s) = { x | x_j > s }

Find the best attribute j and the best split value s, which solve

min_{j, s} [ min_{c_1} sum_{x_i in R_1(j, s)} (y_i - c_1)^2 + min_{c_2} sum_{x_i in R_2(j, s)} (y_i - c_2)^2 ]

where c_1 and c_2 can be solved in closed form as the mean target value of each region:

c_1 = ave(y_i | x_i in R_1(j, s)),  c_2 = ave(y_i | x_i in R_2(j, s))

Then, when doing regression, the fitted tree predicts

f(x) = sum_m c_m * I(x in R_m)

where I(.) is the indicator function and m ranges over the leaf regions, so a new point gets the mean target value of the region it falls into.
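As a rough illustration of that search, here is a small Python sketch, assuming the data is just a list of feature rows X and numeric targets y (the names best_split and squared_error are invented for this example and are not from any particular CART library):

def squared_error(targets):
    # Sum of squared deviations from the region mean; the mean is the optimal c for the region.
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_split(X, y):
    # Return (j, s, error): the attribute index and split value minimising the two-region squared error.
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= s]
            right = [y[i] for i, row in enumerate(X) if row[j] > s]
            err = squared_error(left) + squared_error(right)
            if right and err < best[2]:  # skip degenerate splits that leave one side empty
                best = (j, s, err)
    return best

# Toy example with one attribute: the target jumps after x = 3, so the best split is (0, 3.0, ...)
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.1, 0.9, 1.0, 2.1, 1.9]
print(best_split(X, y))

A full CART implementation would recurse on each resulting region until a stopping rule (minimum node size, maximum depth, or pruning) is met.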
Solution 2
I can explain the concept at a very high level.
The main goal of the algorithm is to find the attribute to use for the first split. We can use various metrics to evaluate the most significant attribute, such as Entropy/Information Gain or Gain Ratio. But if the decision variable is continuous, we usually use a different metric, 'standard deviation reduction'. Whatever metric you use, and depending on your algorithm (i.e. ID3, C4.5, etc.), you end up choosing the attribute that will be used for splitting.
When an attribute is continuous, things get a little tricky. You need to find the threshold value for that attribute that gives the best score on your chosen criterion (the largest Information Gain, Gain Ratio, or impurity reduction). Then you compare the best thresholds of all the attributes and choose an attribute accordingly, right?
Now, if the attribute is continuous and the decision variable is also continuous, you can simply combine the above two concepts and generate a regression tree.
That means, because the decision variable is continuous, you use a metric like variance reduction (or standard deviation reduction) and choose the attribute and threshold that give the highest value of that metric across all attributes.
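For example, here is a hedged sketch of the variance-reduction metric itself (the function name variance_reduction is made up here): it is the parent region's target variance minus the size-weighted variance of the two child regions, and the attribute/threshold pair with the largest reduction wins the split.

def variance(values):
    # Population variance of a list of numeric target values.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(xs, ys, threshold):
    # How much the target variance drops if we split one continuous attribute at this threshold.
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
    return variance(ys) - weighted

# Toy example: the target has two clear levels, so splitting at 5.0 removes almost all the variance
xs = [2.0, 4.0, 6.0, 8.0]
ys = [5.0, 5.2, 9.8, 10.0]
print(variance_reduction(xs, ys, 5.0))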
You can visualize such a regression tree using decision tree machine learning software such as the SpiceLogic Decision Tree Software: given a data table, it will generate the regression tree for you.
Comments
-
I have a question about Decision tree using continuous variable
I heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something like that, but I don't know how it works when the input variable is continuous.
-
input variable : continuous / output variable : categorical
-
input variable : continuous / output variable : continuous
For these two cases, how can we get a split criterion like the Gini index or information gain?
When I use rpart in R, it works well whatever the input and output variables are, but I don't know the algorithm in detail.
-
This is not a technical question: consider posting in the Cross Validated or Data Science communities.
-
I'm voting to close this question because it is not about programming as defined in the help center but about ML theory/methodology.
Source: https://9to5answer.com/decision-tree-using-continuous-variable