# How Random Forest Works?

Random Forests: the name can be split into two parts, Random and Forests.

# What is validation?

We build models using different algorithms, and the purpose of a model is usually to learn from the data and predict the target variable. Here, a model is nothing but a mathematical equation that fits the data, or a tree that is built and pruned on the data.

How do we…

# What are Tree models?

Tree models are supervised models that split the population into subsets based on the best splitter (feature). Each split creates branches that are used to predict the target variable.
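To make the idea of a "best splitter" concrete, here is a minimal sketch that scans one numeric feature for the threshold giving the lowest weighted Gini impurity. Gini is only one common choice of split criterion and is an assumption here, as are the function names and the toy `ages`/`bought` data, which are purely illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split."""
    best_threshold, best_score = None, float("inf")
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: customer age and whether they bought a product.
ages = [22, 25, 30, 35, 40, 50]
bought = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(ages, bought)
print(f"split at age <= {threshold}, weighted Gini = {impurity:.3f}")
```

A real tree repeats this search over every feature, picks the feature and threshold with the best score, and then recurses into each branch.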

# How Tree models work?

Tree models like Random Forest and XGBoost take a subset of features and a subset of data to create many trees, and if it is Random…

# What is Pyspark?

As a data scientist, you sometimes need data engineering skills to create the required features, which may span millions of records. With limited VM resources, it becomes difficult to load the entire dataset in Python and work on it. …

# What is p-value in layman terms?

In statistical modelling or analysis, we come across the term p-value. It is important for data scientists and consultants to be able to explain to the business what exactly a p-value is.

Whenever we model, it is very important to explain how confident we are in our analysis. In these scenarios, the p-value helps us quantify the strength of the evidence.

The p-value is the probability of seeing a result at least as extreme as the one we observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true. If the p-value is greater than 0.05, the observed data are reasonably consistent with the null hypothesis, so we fail to reject it at the 5% significance level.
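As an illustration, the sketch below computes an exact two-sided p-value for a coin-flip experiment using only the standard library. The scenario (60 heads in 100 flips, null hypothesis of a fair coin) and the function name are hypothetical, and summing the outcomes that are no more likely than the observed one is just one common convention for a two-sided test.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials, p_null=0.5):
    """Probability, under the null, of outcomes no more likely than the observed one."""

    def pmf(k):
        # Binomial probability of exactly k successes if the null is true.
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


# Hypothetical experiment: 60 heads in 100 flips, null hypothesis "the coin is fair".
p_value = binomial_two_sided_p_value(60, 100)
decision = "reject" if p_value <= 0.05 else "fail to reject"
print(f"p-value = {p_value:.4f}: {decision} the null at the 5% level")
```

Here the p-value comes out slightly above 0.05, so at the 5% level we would fail to reject the hypothesis that the coin is fair, even though 60 heads may look suspicious.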

This is just a simple explanation of the p-value.

# MAPE vs sMAPE — When to choose what?

When dealing with regression or forecasting problems, the general metrics we use are RMSE, MAE, etc. RMSE and MAE work well when the target values fall in a small range and the variation is low. …
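For reference, here is a minimal sketch of the two percentage metrics named in the title. The definitions are assumptions: MAPE divides by the actual value, and sMAPE uses the variant that divides by the mean of the absolute actual and forecast values (bounded between 0 and 200); other sMAPE variants exist. Neither function handles zero denominators.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def smape(actual, forecast):
    """Symmetric MAPE, in percent (0 to 200). Assumes |actual| + |forecast| > 0."""
    errors = [2.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical values: the same absolute miss of 50, over and under the actual.
over = mape([100], [150]), smape([100], [150])
under = mape([150], [100]), smape([150], [100])
print(f"over-forecast:  MAPE = {over[0]:.1f}, sMAPE = {over[1]:.1f}")
print(f"under-forecast: MAPE = {under[0]:.1f}, sMAPE = {under[1]:.1f}")
```

With these numbers, MAPE changes when the actual and forecast values swap roles, while sMAPE stays the same, which is the asymmetry that usually motivates choosing between them.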

# PCA — Dimensionality Reduction (Simple numerical example)

Principal Component Analysis (PCA) is one of the widely used dimensionality reduction techniques. …

# How Decision Tree Works (Part -1)

One of the basic machine learning algorithms that every data scientist will use at least once is the Decision Tree. The usage might be direct or indirect (through later algorithms like Random Forest, xgb, etc.), but their internal working is based on this fundamental building block, the Decision Tree.

Before…

# WOE-IV Understanding and Uses

Mathematically speaking, all feature selection methods try to measure how well a feature explains the target. One such measure uses Weight of Evidence and Information Value.

Let’s break the terms weight of evidence and information value into smaller pieces to help us understand them better. In Weight of…

# Shapley Feature Importance and Understanding

New algorithms come to market frequently and are changing data science modelling in terms of accuracy, but on the other side, explaining these models to the business becomes very tricky.

If we are using linear models, then we can at bare minimum…

## Thiruthuvaraj Rajasekhar

Mining Data For Benefits