Introduction to Machine Learning

Learn what Machine Learning is and how it works

What is Machine Learning?

Machine Learning (ML) is teaching computers to learn from data.

Instead of programming explicit rules, we show the computer examples and it figures out the patterns.

Traditional Programming vs ML

Traditional:

Rules + Data → Answer

Machine Learning:

Data + Answers → Rules (Model)
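
The contrast above can be sketched with a temperature conversion. This is only an illustration: the Celsius/Fahrenheit pairs are made-up sample data, and LinearRegression is one possible model choice. The hand-written function is the "rule" in traditional programming; the fitted model recovers roughly the same rule from data and answers alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: a human writes the rule.
def celsius_to_fahrenheit(celsius):
    return celsius * 1.8 + 32

# Machine learning: we supply data and known answers; the model infers the rule.
celsius = np.array([[-10.0], [0.0], [15.0], [25.0], [40.0]])  # data (illustrative)
fahrenheit = np.array([14.0, 32.0, 59.0, 77.0, 104.0])        # answers

model = LinearRegression()
model.fit(celsius, fahrenheit)

learned_slope = model.coef_[0]
learned_intercept = model.intercept_
print(f"Learned rule: F = {learned_slope:.2f} * C + {learned_intercept:.2f}")
```

The learned slope and intercept should come out very close to 1.8 and 32, the constants a programmer would otherwise have typed in by hand.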

Types of Machine Learning

1. Supervised Learning

Learn from labeled data (we know the answers):

  • Classification: Predict categories

    • Is this email spam or not?
    • Is this tumor benign or malignant?
  • Regression: Predict numbers

    • What will the house price be?
    • How many sales next month?

2. Unsupervised Learning

Find patterns in unlabeled data:

  • Clustering: Group similar items
    • Customer segments
    • Similar documents
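
As a rough sketch of clustering, the example below groups a handful of customers into two segments without being given any labels. The customer numbers are hypothetical, and KMeans with two clusters is an assumption made for illustration; other clustering algorithms would also fit here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [age, yearly spend in thousands]. No labels are given.
X = np.array([
    [22, 5], [25, 6], [27, 4],      # younger, low spend
    [48, 40], [52, 45], [55, 42],   # older, high spend
])

# Ask for two groups; n_init and random_state keep the result repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
segments = kmeans.fit_predict(X)

print(f"Segment per customer: {segments}")
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is which customers end up in the same group.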

3. Reinforcement Learning

Learn by trial and error:

  • Game playing AI
  • Self-driving cars
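
Trial and error can be illustrated with a very small toy problem. The sketch below is an epsilon-greedy "slot machine" agent, which is a simplified stand-in and not how game-playing AI or self-driving cars are actually built. The win probabilities, the number of steps, and the value of epsilon are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical slot machines: true win probabilities, hidden from the agent.
true_win_prob = np.array([0.2, 0.5, 0.8])
n_arms = len(true_win_prob)

estimates = np.zeros(n_arms)  # agent's running estimate of each machine's reward
pulls = np.zeros(n_arms)      # how many times each machine was tried
epsilon = 0.1                 # fraction of steps spent exploring at random

for step in range(5000):
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))    # explore: try a random machine
    else:
        arm = int(np.argmax(estimates))    # exploit: use the best machine so far
    reward = float(rng.random() < true_win_prob[arm])  # 1.0 on a win, else 0.0
    pulls[arm] += 1
    # Incremental mean: nudge the estimate toward the observed reward.
    estimates[arm] += (reward - estimates[arm]) / pulls[arm]

best_arm = int(np.argmax(estimates))
print(f"Estimated rewards: {estimates.round(2)}")
print(f"Best machine found: {best_arm}")
```

The agent is never told which machine is best; it discovers this from the rewards it collects while trying them.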

The ML Workflow

  1. Collect Data
  2. Prepare Data (clean, transform)
  3. Split Data (train/test)
  4. Choose Model
  5. Train Model
  6. Evaluate Model
  7. Improve & Repeat
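
The "Prepare Data" step of the workflow might look something like the sketch below. The raw values are invented, and filling gaps with the column mean followed by standard scaling is just one common choice, assumed here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: [age, income in thousands] with some missing values.
X_raw = np.array([
    [25.0, 40.0],
    [30.0, np.nan],
    [np.nan, 60.0],
    [40.0, 70.0],
])

# Clean: replace missing values with the column mean.
# Transform: rescale each column to zero mean and unit variance.
prepare = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
)
X_prepared = prepare.fit_transform(X_raw)

print(X_prepared.round(2))
```

In a full workflow the preparation steps would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.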

Scikit-Learn Basics

Scikit-learn is one of the most widely used ML libraries in Python:

code.py
# Install: pip install scikit-learn

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])  # Features
y = np.array([2, 4, 6, 8, 10])           # Target

# Split data (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
print(predictions)

Key ML Concepts

Features (X)

The input data used to make predictions:

  • Age, income, location (for loan approval)
  • Pixels (for image classification)
  • Words (for text classification)

Target (y)

What we want to predict:

  • Loan approved/rejected
  • Cat/dog
  • Spam/not spam
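
To make the X and y naming concrete, here is a small sketch that builds both from a list of records. The loan applications are hypothetical, and only the numeric fields are used; a text field such as location would need to be encoded as numbers first.

```python
import numpy as np

# Hypothetical loan applications (income in thousands; approved: 1 = yes, 0 = no).
applications = [
    {"age": 25, "income": 40, "approved": 0},
    {"age": 35, "income": 60, "approved": 1},
    {"age": 45, "income": 80, "approved": 1},
    {"age": 22, "income": 30, "approved": 0},
]

# Features (X): one row per example, one column per input.
X = np.array([[a["age"], a["income"]] for a in applications])

# Target (y): one value per example, the thing we want to predict.
y = np.array([a["approved"] for a in applications])

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
```

Scikit-learn expects X to be two-dimensional (rows by columns) and y to have one entry per row of X.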

Training

Showing the model examples so it learns patterns:

code.py
model.fit(X_train, y_train)

Prediction

Using the trained model on new data:

code.py
predictions = model.predict(X_new)
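
The snippet above leaves X_new undefined. A self-contained sketch, reusing the y = 2x sample data from the Scikit-Learn Basics section and two made-up new inputs, could look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train on the same simple data as before: y is always 2 * x.
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X_train, y_train)

# New, unseen inputs (illustrative values). Same column layout as X_train.
X_new = np.array([[6], [7]])
predictions = model.predict(X_new)

print(predictions.round(2))
```

Because the training data follows y = 2x exactly, the predictions for 6 and 7 should land at about 12 and 14.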

Simple Classification Example

code.py
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Simple dataset: predict whether a customer buys a product
# Features: age (years), income (in thousands)
X = np.array([
    [25, 40], [30, 50], [35, 60], [40, 70],
    [45, 80], [50, 90], [22, 30], [28, 35],
    [55, 95], [60, 100]
])

# Target: 1 = bought, 0 = didn't buy
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])

# Split (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
print(f"Predictions: {predictions}")
print(f"Actual: {y_test}")

# Accuracy
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.0%}")

Common ML Algorithms

Algorithm             Type             Use Case
Linear Regression     Regression       Price prediction
Logistic Regression   Classification   Yes/No decisions
Decision Tree         Both             Easy to interpret
Random Forest         Both             High accuracy
KNN                   Both             Simple, no training
SVM                   Both             Complex boundaries
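
Because scikit-learn models share the same fit and score methods, several of the classifiers in the table can be tried side by side. The sketch below is only a quick comparison on the built-in Iris dataset; the settings (max_iter=1000, n_neighbors=3, random_state=42) are assumptions picked for the demo, and scores on one small dataset say little about which algorithm is better in general.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Classification variants of the algorithms from the table.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(),
}

# Same two calls for every model: fit on training data, score on test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.0%}")
```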

Overfitting vs Underfitting

Overfitting

  • Model fits the training data too closely, including its noise
  • Memorizes instead of generalizing
  • Performs well on training data but poorly on new data

Underfitting

  • Model is too simple
  • Doesn't capture the patterns in the data
  • Performs poorly on both training data and new data

Goal: Find the right balance!
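
One way to see both problems is to compare training and test accuracy for trees of different depth. The sketch below uses a synthetic, deliberately noisy dataset from make_classification; the dataset settings and the depths tried are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels flipped at random (noise).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# max_depth=1 is a very simple model; max_depth=None lets the tree grow freely.
results = {}
for depth in (1, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train {train_acc:.0%}, test {test_acc:.0%}")
```

A large gap between training and test accuracy, as the unrestricted tree typically shows, is a sign of overfitting. Low accuracy on both, which a very shallow tree may show, points to underfitting.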

Complete Example

code.py
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load famous iris dataset
iris = load_iris()
X = iris.data
y = iris.target

print(f"Features: {iris.feature_names}")
print(f"Classes: {iris.target_names}")
print(f"Data shape: {X.shape}")

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# Evaluate
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"\nTraining accuracy: {train_acc:.0%}")
print(f"Test accuracy: {test_acc:.0%}")

Key Points

  • ML learns patterns from data
  • Supervised: Has labels (classification, regression)
  • Unsupervised: No labels (clustering)
  • Split data into train and test sets
  • Use scikit-learn for ML in Python
  • Watch out for overfitting

What's Next?

Learn how to properly split data for training and testing.