## From Machine Learning -Tom M. Mitchell

*Machine Learning is at the forefront of advancements in Artificial Intelligence. It’s moving fast with new research coming out each and every day. This post is in continuation of important concepts and notes right from the basics to advance, from the book **Machine Learning**, by **Tom M. Mitchell.*

For Machine Learning Notes 1, please click the link below.

### CHAPTER 2: CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING

**2.1 Concept Learning**

A problem of searching through a predefined space of potential hypothesis for the hypothesis that best fits the training example.

Inferring a boolean-valued function from training examples of its input and output.

### Inductive Learning Hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over the unobserved examples.

### 2.2 Find-S Algorithm

An algorithm to find the most specific hypothesis or best fit hypothesis out of all hypothesis.

The algorithm works only for Positive instances.

In general, any instance in positive features having more than one value should be replaced with a general term <?>, which is suitable for both the values.

Algorithm:

Most specific hypothesis, not allowing any new value to fit.

h ← <**ϕ,ϕ,ϕ,ϕ,ϕ>**

### Drawback: Defined for positive instances only.

**2.3 Candidate Elimination**

Provides us with the most specific & the most general hypothesis for a training example.

Takes consideration for both positive & negative instances.

Return the **General Hypothesis (G)** and the **Specific Hypothesis (S)**

### 2.4 Inductive Bias

Inductive bias talks about the unobserved instances, how does the hypothesis react to it.

The reason why a hypothesis might not work perfectly for the unobserved instances faced in testing data is caused because it never faced it.

To overcome that we ’ve: **UNBIASED LEARNER**

The solution to it is that the hypothesis should able to represent every possible subset of instance X, i.e, for all subset of X, the power set of X. Which makes the learning of hypothesis function V(h) ←h’

where h’ is the hypothesis state after the unobserved instance.

### CHAPTER 3: DECISION TREE LEARNING

### 3.1 Introduction

Decision tree learning is a method for approximating discrete- valued function that is robust to noisy data and capable of learning disjunctive expression it’s a practical method for inductive inference.

Decision tree classifies instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, each branch descending from the node corresponds to one of the possible values for the attributes.

### 3.2 Problem sets of decision tree learning

- Instances are represented by an attribute-value pair.
- Target function has a discrete output value.
- A disjunctive description may be required.
- Training data may contain error.
- Training data may contain missing values.

### 3.3 ID3

Information gain, the measure of how well a given attribute separates the train.

In order to define information gain precisely, we define entropy, that characterises the impurity of an arbitrary collection of examples.

More generalized

The measure of the effectiveness of an attribute is information gain.

ID3 are better in shorter trees as compared to larger trees.

**Issues: **Avoiding overfitting of data.

Can be avoided in two major ways:

- the approach that stops growing the tree earlier, to the point where it reaches perfection.
- the approach that allows to overfit the data, then post-prune the tree.
**(more successful)**

Some of the other approaches that can be taken.

- use a separate set of examples, distinct from the training examples.
- use all training data, but apply a statistical test to estimate pruning a particular node to produce an improvement beyond the training set.
- use an explicit measure of complexity.

**Pruning**: removing the subtree rooted at that node. performed(nodes remove) only if tree performs worse than the original.

*Remember to give this post some *👏 *if you liked it. Follow me for more content.*

Visit my github repo: