Before getting to the topic of this post, the regression tree, let's understand the basics of a decision tree. As the name suggests, a decision tree is used for making a decision. The simplest decision could be when an alarm clock decides to ring: the clock keeps track of the time and, if the time matches the pre-set time, it starts ringing. Another famous example is tossing a coin and getting one of two results, heads or tails.

The goal of every decision tree is to predict the outcome based on the input of various variables. That’s why it is used in data analysis and modeling. It is also extensively used in computer programming and in algorithms where a computer needs to decide an option based on certain criteria.

## What a Decision Tree Looks Like

Decision trees have two main components: the problem statement (represented by the root of the tree) and a set of consequences or solutions (represented by the branches of the tree). The tree can extend to any depth, representing all the options stemming from the problem statement. A key difference between real trees and decision trees is that a decision tree is typically drawn inverted, with the root at the top.

Here is a decision tree depicting the possibilities of a coin toss:
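In code, the coin-toss tree above boils down to a single branch: one root decision with two leaf outcomes. Here is a minimal sketch (the function name `toss_coin` is just an illustration, not from any library):

```python
import random

def toss_coin():
    # Root node: the toss itself. Each branch ends in a leaf outcome.
    if random.random() < 0.5:
        return "heads"  # leaf node
    else:
        return "tails"  # leaf node

print(toss_coin())
```

Every call follows exactly one path from the root to a leaf, which is how any decision tree produces its prediction.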

## The Two Types of Decision Trees

**Classification Trees:** A classification tree has a categorical target variable. The tree above is an example of a classification tree because there are only two possible results, heads or tails.

**Regression Trees:** A regression tree has a continuous target variable. For example, a regression tree would be used to predict the price of a newly launched product, because price can take any value depending on various constraints.

Both types of decision trees fall under the Classification and Regression Tree (CART) designation.
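The practical difference between the two tree types shows up at the leaves: a classification leaf predicts the most common class among its samples, while a regression leaf predicts the mean of its samples' target values. A minimal sketch of that distinction, using only the standard library (the function names and the sample data are illustrative assumptions, not from the post):

```python
from statistics import mean, mode

def classification_leaf(labels):
    # Categorical target: predict the majority class in this leaf.
    return mode(labels)

def regression_leaf(values):
    # Continuous target: predict the mean value in this leaf.
    return mean(values)

# Classification: outcomes of a few coin tosses
print(classification_leaf(["heads", "tails", "heads"]))  # heads

# Regression: hypothetical product prices that reached the same leaf
print(regression_leaf([9.99, 12.50, 11.00]))
```

The splitting logic above the leaves works the same way in both cases; only the leaf prediction (and the error measure used to choose splits) changes.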

## Terminology of Regression Trees

**Root:** This is the beginning of the decision tree, and it represents the entire population or sample. For example, say you want to decide the best-performing employee in an organization based on various criteria, such as the employee's attendance, number of successful projects, number of employees he/she mentored, etc. Here, the entire population of employees is at the root of the decision tree.

**Leaf:** The terminal node is called the leaf node. In our example, the final best employee would be the leaf node or the terminal node.

**Decision Node:** An internal node where the data is split into further categories. In our example, each criterion (attendance, successful projects, and so on) would form a decision node.

**Child Node:** When a node is divided into subparts, the subparts are called child nodes, and the node that was divided is called the **parent node**.

Check out this example of a regression tree where the root node is divided into sub-nodes based on continuous values. It is further divided into other subparts before getting to the leaf node or the terminal node.

## Advantages of Regression Trees

- A user can visualize each step, which can help with making rational decisions.
- You can prioritize decision criteria. For example, in our employee example, you can put the attendance criterion at the top of the decision tree if it is the most important one.
- Making a decision with a regression tree is much easier than with most other methods. Since most of the undesired data is filtered out at each step, you have to work with less data as you go further down the tree.
- It is easy to prepare a regression tree, and a user can present it to higher authorities easily, since it can be represented on a simple chart or diagram.

In this blog post, you learned about regression trees. Stay tuned to Magoosh data science blogs for more!
