If you fail to plan, you plan to fail. Every project requires planning, and building a machine learning model is no different. In this article, you will learn how to plan your data mining activities and which steps to perform during Exploratory Data Analysis (EDA). This article is not a ‘how-to’ guide but a reference checklist for data analytics professionals, listing the considerations to keep in mind when building a machine learning model.

We have all heard about CRISP-DM: the Cross-Industry Standard Process for Data Mining. …

**Simple linear regression suffers from two major flaws:**

- It’s prone to overfitting when there are many input features.
- It cannot easily express non-linear/curvy relationships.

One way to tackle these issues is to increase the model complexity, for example by using decision trees or polynomial regression to represent non-linear relationships.

These algorithms are themselves prone to overfitting because of their added complexity. Therefore, to represent non-linear functions without overfitting, we use regularization techniques.

Regularization techniques calibrate linear and non-linear regression models by minimizing an adjusted loss function, that is, the usual loss plus a penalty on model complexity, in order to prevent overfitting.
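As a minimal sketch of what an adjusted loss function looks like, the snippet below fits a single slope with an L2 (ridge-style) penalty. The choice of ridge, the function name `ridge_slope`, the penalty strength `lam`, and the toy data are all assumptions made for illustration; they are not taken from the article.

```python
def ridge_slope(xs, ys, lam):
    """Slope w that minimizes sum((y - w*x)**2) + lam * w**2 (no intercept).

    Hypothetical helper: setting the derivative of the adjusted loss to zero
    gives w = sum(x*y) / (sum(x*x) + lam). Assumes the denominator is non-zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x (illustrative only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_slope(xs, ys, lam=0.0))   # 2.0: no penalty, ordinary least squares
print(ridge_slope(xs, ys, lam=14.0))  # 1.0: the penalty shrinks the slope
```

A larger `lam` pulls the coefficient toward zero, which is how the penalty trades a slightly worse fit on the training data for a simpler model.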

The two…

Advances in smart assistants like Alexa and Google have brought remarkable convenience into our day-to-day lives: you can ask for a quick weather report, translate languages, listen to world news, and now even send virtual hugs to your Alexa contacts. With recent Artificial Intelligence (AI) breakthroughs like AlphaGo, IBM Watson, self-driving cars, and many more, the concern that AI will take over our jobs is real.

Can you imagine the impact of these applications on humans as they advance? Eventually, everything would be done for you by an AI. Now, the question is, what value would you be adding…

In this article we will find answers to the following questions:

- What is a Z-score — Formula and definition.
- How to use the Z-score, with a toy example.

History: The letter **‘Z’** in z-score stands for **Zeta** (the 6th letter of the Greek alphabet), which comes from the Zeta Model originally developed by **Edward Altman** to estimate the chances of a public company going bankrupt. Altman’s Z-scores fall into zones that indicate the likelihood of a public company going bankrupt:

- z < 1.81 - Distress “Zone”
- 1.81 < z < 2.99 - Grey “Zone”
- z > 2.99 - Safe “Zone”
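The zones above can be sketched as a small lookup. The function name `altman_zone` is hypothetical, and because the list uses strict inequalities, placing the boundary values 1.81 and 2.99 in the grey zone is an assumption made here.

```python
def altman_zone(z: float) -> str:
    """Map an Altman Z-score to its zone, using the thresholds listed above.

    Assumption: scores exactly equal to 1.81 or 2.99 are treated as grey.
    """
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"


for score in (1.2, 2.5, 3.4):
    print(score, altman_zone(score))
```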

…

Normalization vs Standardization

In this article we will discover answers to the following questions:

- What is feature scaling and why it is required in Machine Learning (ML)?
- Normalization — pros and cons.
- Standardization — pros and cons.
- Normalization or Standardization: which one is better?
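To make the two terms concrete, here is a minimal sketch using the common definitions: min-max normalization rescales values to the range [0, 1], and standardization rescales them to zero mean and unit standard deviation. The helper names and the sample values are illustrative assumptions, and both helpers assume the input values are not all identical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to z-scores via (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


heights_cm = [10, 20, 30]  # made-up feature values
print(min_max_normalize(heights_cm))  # [0.0, 0.5, 1.0]
print(standardize(heights_cm))        # symmetric around 0
```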

First things first, let’s use an analogy to understand why we need feature scaling. Think of building an ML model as making a smoothie, and this time you are making a *strawberry-banana* smoothie. Now, you have to carefully mix strawberries and bananas to make the smoothie taste good. If you just mix *one*…

If you are reading this, I am assuming you already know what encoding means. Nevertheless, I’ll give a brief intro for those who are new to data science.

Note: Throughout this article, the terms features, columns, and variables are used interchangeably.

Data is classified as below:

1. What is Categorical Encoding and why do you need it?

For some Machine Learning algorithms, whenever you have categorical data, you have to convert it to numerical type. The reason you convert categorical columns to numerical is so that the machine learning algorithm is able to understand and process it. …
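As a rough illustration of turning a categorical column into numbers, the sketch below applies one-hot encoding, which is only one of several possible encodings and is chosen here as an assumption. The function name `one_hot_encode` and the colour values are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 indicator per category for each value."""
    categories = sorted(set(values))  # fixed column order
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows


colours = ["red", "green", "red"]  # made-up categorical column
categories, rows = one_hot_encode(colours)
print(categories)  # ['green', 'red']
print(rows)        # [[0, 1], [1, 0], [0, 1]]
```

Each original value becomes a row of indicators with exactly one 1, so an algorithm that only accepts numeric input can process the column.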

Machine Learning is the art of teaching computers, or letting them learn, patterns from data. This is often done because it helps in making informed decisions and, sometimes, accurate predictions. I promise that’s all the definition you’ll have in this article.

Always remember, if you can hit upon an analogy for what the unknown concept is like, you are halfway there. The rest is to practice explaining it to a 6-year-old. So let’s take an easy example to understand ML:

**Learning the pattern**

Let’s say a baby panda while playing in the forest sees a burning bright…

Data Scientist and Project Management Professional at Government of Canada. Visit https://swapnilklkar.github.io for more.