{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9eaa5a85",
   "metadata": {},
   "source": [
    "<h3>This tutorial was prepared by Dr. Sanasam Ranbir Singh</h3>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b0da4a3",
   "metadata": {},
   "source": [
    "# Classification using the scikit-learn Python library\n",
    "\n",
    "[Scikit-learn (or sklearn)](https://scikit-learn.org/stable) is a free, open-source machine learning library for the Python programming language. It supports a range of machine learning tasks, including [feature selection](https://scikit-learn.org/stable/modules/feature_selection.html), [classification](https://scikit-learn.org/stable/supervised_learning.html), [regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html), and [clustering](https://scikit-learn.org/stable/modules/clustering.html).\n",
    "\n",
    "In this lesson, we learn how to build several classification models using the <b>sklearn</b> library. Each example trains a classifier on the same small two-class dataset and predicts the label of one new sample.\n",
    "\n",
    "To install the <b>sklearn</b> library, follow the [installation instructions](https://scikit-learn.org/stable/install.html).<br>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32e6667d",
   "metadata": {},
   "source": [
    "# Naive Bayes Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "18464a89",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\n"
     ]
    }
    ],
   "source": [
    "from sklearn.naive_bayes import GaussianNB     # Naive Bayes classifier that models each feature with a per-class Gaussian distribution\n",
    "import numpy as np                             # NumPy provides the array type used for the dataset\n",
    "\n",
    "# Define the dataset\n",
    "X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])  # feature vectors, one sample per row\n",
    "Y = np.array([1, 1, 1, 2, 2, 2])   # class label of each sample\n",
    "\n",
    "# Define the classification model\n",
    "model = GaussianNB()\n",
    "\n",
    "# Fit the model to the dataset\n",
    "# If you are working with a large dataset, see also https://scikit-learn.org/0.15/modules/scaling_strategies.html\n",
    "model.fit(X, Y)\n",
    "\n",
    "# Predict the label of a new sample [-0.8, -1]\n",
    "print(model.predict([[-0.8, -1]]))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ed50608",
   "metadata": {},
   "source": [
    "# K Nearest Neighbors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ef341579",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\n"
     ]
    }
    ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "import numpy as np                             # NumPy provides the array type used for the dataset\n",
    "\n",
    "# Define the dataset\n",
    "X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])  # feature vectors, one sample per row\n",
    "Y = np.array([1, 1, 1, 2, 2, 2])   # class label of each sample\n",
    "\n",
    "# Define the classification model: majority vote among the 3 nearest training samples\n",
    "model = KNeighborsClassifier(n_neighbors=3)\n",
    "\n",
    "# Fit the model to the dataset\n",
    "model.fit(X, Y)\n",
    "\n",
    "# Predict the label of a new sample [-0.8, -1]\n",
    "print(model.predict([[-0.8, -1]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f61dd37",
   "metadata": {},
   "source": [
    "# Decision Tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "b0850028",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\n"
     ]
    }
    ],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "import numpy as np                             # NumPy provides the array type used for the dataset\n",
    "\n",
    "# Define the dataset\n",
    "X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])  # feature vectors, one sample per row\n",
    "Y = np.array([1, 1, 1, 2, 2, 2])   # class label of each sample\n",
    "\n",
    "# Define the classification model\n",
    "model = DecisionTreeClassifier()\n",
    "\n",
    "# Fit the model to the dataset\n",
    "model.fit(X, Y)\n",
    "\n",
    "# Predict the label of a new sample [-0.8, -1]\n",
    "print(model.predict([[-0.8, -1]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca7db816",
   "metadata": {},
   "source": [
    "# Support Vector Machine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3f4aaf1d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\n"
     ]
    }
    ],
   "source": [
    "from sklearn import svm\n",
    "import numpy as np                             # NumPy provides the array type used for the dataset\n",
    "\n",
    "# Define the dataset\n",
    "X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])  # feature vectors, one sample per row\n",
    "Y = np.array([1, 1, 1, 2, 2, 2])   # class label of each sample\n",
    "\n",
    "# Define the classification model: support vector classifier (the default kernel is RBF)\n",
    "model = svm.SVC()\n",
    "\n",
    "# Fit the model to the dataset\n",
    "model.fit(X, Y)\n",
    "\n",
    "# Predict the label of a new sample [-0.8, -1]\n",
    "print(model.predict([[-0.8, -1]]))"
   ]
  },
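  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f486f9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Alternative way to create a decision tree classifier: access it through the sklearn.tree module\n",
    "from sklearn import tree\n",
    "\n",
    "model = tree.DecisionTreeClassifier()"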
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}