Maths and statistics are powerful tools in the world of data science. They are essential because these two fields form the basis of all machine learning algorithms. To succeed as a data scientist, you must know your basics.
Statistics is the use of maths to perform technical analysis on the data to gain meaningful insights. With statistics, we can operate on the data in an information-driven and targeted manner.
So, how is data science different from statistics? While the fields are closely related in the sense that both data scientists and statisticians aim to…
An activation function is a function inside a neuron that converts an input signal to an output signal.
Basically, a neuron calculates the weighted sum of its inputs, adds the bias, and passes the result to the activation function, which decides whether the neuron should fire an output or not. Activation functions give the neural network its non-linear properties. Without an activation function, the output of a neuron is just a linear function of its inputs and can range anywhere from -infinity to +infinity.
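The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; the function names and the example inputs, weights and bias are made up for the sketch.

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positive values through and clips negative values to 0.
    return max(0.0, z)

def identity(z):
    # No activation: the output is the raw weighted sum, unbounded in both directions.
    return z

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through an activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
inputs = [0.5, -1.2, 3.0]
weights = [0.8, 0.1, -0.4]
bias = 0.2

raw = neuron_output(inputs, weights, bias, identity)
squashed = neuron_output(inputs, weights, bias, sigmoid)
rectified = neuron_output(inputs, weights, bias, relu)
print(raw, squashed, rectified)
```

With the identity function the neuron simply returns the weighted sum, while sigmoid and ReLU bound or clip it, which is where the non-linearity comes from.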
We are all aware of feature scaling and why it’s done. Feature scaling is performed during data pre-processing and is done to normalize/standardize…
Polynomial Regression | Data Science | Machine Learning
In this article we will learn what the Bayesian Information Criterion (BIC) is and how it is used to choose the degree of a polynomial in polynomial regression.
Sometimes R2 values vary only slightly across two polynomial degrees, e.g. an R2 score of 88.3% versus 88.4%. And how do we know which is meaningfully better: R2 = 88% or R2 = 90%?
Let’s study this by creating some dummy data:
Let’s fit the model with Ordinary Least Squares (OLS). This package provides a detailed statistics summary, including AIC and BIC.
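As a rough sketch of the idea, the snippet below generates hypothetical dummy data from a quadratic curve, fits polynomials of degree 1 to 5 by least squares, and reports R2 and BIC for each. The excerpt does not name its OLS package, so this sketch assumes only NumPy and computes BIC by hand from the Gaussian log-likelihood (BIC = k ln(n) - 2 ln(L)). The data, seed and function name are illustrative assumptions, not the article's own.

```python
import numpy as np

# Hypothetical dummy data: a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=x.size)

def fit_stats(x, y, degree):
    """Least-squares polynomial fit; returns (R2, BIC) for the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals**2))
    tss = float(np.sum((y - y.mean()) ** 2))
    n = y.size
    k = degree + 1  # number of fitted coefficients, including the intercept
    # Maximized Gaussian log-likelihood of a least-squares fit.
    log_likelihood = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    bic = k * np.log(n) - 2 * log_likelihood
    r2 = 1 - rss / tss
    return r2, bic

results = {d: fit_stats(x, y, d) for d in range(1, 6)}
for d, (r2, bic) in results.items():
    print(f"degree={d}  R2={r2:.4f}  BIC={bic:.1f}")
```

R2 never decreases as the degree grows, so it cannot tell you when to stop adding terms. BIC adds a penalty of ln(n) per extra coefficient, and the degree with the lowest BIC is preferred.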
If you fail to plan, you plan to fail. Every project requires planning, and building a machine learning model is no different. In this article, we will learn how to plan your data mining activities and which steps you should perform during Exploratory Data Analysis (EDA). This article is not a ‘how-to’ guide but a reference checklist for data analytics professionals. It will provide you with a list of considerations when building a machine learning model.
We have all heard about CRISP-DM: Cross Industry Standard Process for Data Mining. …
Simple linear regression suffers from two major flaws:
One way to tackle these issues is by increasing the model complexity. Model complexity can be increased by using decision trees and polynomial regression to represent non-linear relationships.
However, these algorithms are prone to overfitting as their complexity increases. Therefore, to represent non-linear functions without overfitting, we make use of regularization techniques.
Regularization techniques calibrate linear and non-linear regression models by adding a penalty term to the loss function. Minimizing this adjusted loss discourages overly complex fits and helps prevent overfitting.
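To make the "adjusted loss function" concrete, here is a minimal sketch of one regularization technique, ridge (L2) regression, for a single feature. The excerpt does not say which technique the article covers, so ridge is an assumption. The function name, the data and the penalty values are made up for illustration. The loss being minimized is sum((y - intercept - slope * x)^2) + lam * slope^2.

```python
def ridge_fit_1d(xs, ys, lam):
    """Fit y = intercept + slope * x, penalizing slope**2 with weight lam.

    The intercept is left unpenalized. With lam = 0 this reduces to
    ordinary least squares.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # Closed-form minimizer: the penalty lam inflates the denominator,
    # which shrinks the slope toward zero.
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_intercept, ols_slope = ridge_fit_1d(xs, ys, lam=0.0)
ridge_intercept, ridge_slope = ridge_fit_1d(xs, ys, lam=10.0)
print(ols_slope, ridge_slope)
```

A larger `lam` trades some fit on the training data for a smaller coefficient. The same shrinkage, applied across many coefficients, is what keeps a high-degree polynomial from overfitting.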
Advances in smart assistants like Alexa and Google have brought remarkable convenience into our day-to-day lives: you can ask for a quick weather report, translate languages, listen to world news, and today you can even send virtual hugs to your Alexa contacts. With recent Artificial Intelligence (AI) breakthroughs like AlphaGo, IBM Watson, self-driving cars, and many more, the concern of AI taking over our jobs is real.
In this article we will find answers to the following questions:
History: The letter ‘Z’ in the Altman Z-score is said to stand for Zeta (the 6th letter of the Greek alphabet), from the Zeta Model developed by Edward Altman to estimate the chances of a public company going bankrupt. Altman Z-scores fall into zones of probability, which indicate the likelihood of a public company going bankrupt. This is a separate quantity from the statistical z-score used in standardization, which measures how many standard deviations a value lies from the mean.
Normalization vs Standardization
In this article we will discover answers to the following questions:
First things first, let’s use an analogy to understand why we need feature scaling. Think of building an ML model as making a smoothie, and this time you are making a strawberry-banana smoothie. You have to carefully mix strawberries and bananas to make the smoothie taste good. If you just mix one…
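The two scaling approaches named in the title can be sketched with the standard library alone. This is a minimal illustration that assumes min-max scaling for normalization and z-score scaling for standardization. The function names and the sample feature values are hypothetical.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is 0).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical (otherwise the std is 0).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Two hypothetical features on very different scales.
ages = [25, 40, 55]
incomes = [20000, 50000, 80000]

ages_normalized = min_max_normalize(ages)
incomes_normalized = min_max_normalize(incomes)
ages_standardized = standardize(ages)
incomes_standardized = standardize(incomes)
print(ages_normalized, incomes_normalized)
print(ages_standardized, incomes_standardized)
```

After either transformation the two features end up on comparable scales, so neither one dominates a distance-based or gradient-based model simply because of its units.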
If you are reading this, I am assuming you already know what encoding means. Nevertheless, I’ll give a brief intro for those who are new to data science.
Note: Throughout this article, the terms features, columns, and variables are used interchangeably.
Data is classified as below:
1. What is Categorical Encoding and why do you need it?
For some machine learning algorithms, categorical data has to be converted to a numerical type so that the algorithm is able to understand and process it. …
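As one illustration of such a conversion, the sketch below implements one-hot encoding in plain Python. The excerpt does not say which encoding schemes the article covers, so one-hot encoding is an assumption. The function name and the sample column are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 column per distinct category.

    Categories are sorted so the column order is deterministic.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column matching this value
        rows.append(row)
    return categories, rows

# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, so the encoding adds no artificial ordering between categories, unlike mapping them to the integers 0, 1, 2.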
Machine Learning is the art of teaching computers, or letting them learn, patterns from data. This is often done because it helps in making informed decisions and sometimes accurate predictions. I promise that’s all the definition you’ll have in this article.
Always remember, if you can hit upon an analogy of what the unknown concept is like, you are halfway there. The rest is to practice explaining it to a 6-year-old. So let’s take an easy example to understand ML:
Let’s say a baby panda while playing in the forest sees a burning bright…