Machine Learning

Retail Anomaly Detection

"Identifying unusual transaction patterns in e-commerce data using ML and statistical methods"

~5%
Anomaly Rate
3
Detection Methods
High
Model Precision

░▒▓ Overview

E-commerce platforms process thousands of transactions daily. Identifying anomalous transactions is critical for fraud prevention, data quality, and business intelligence, whether the anomalies stem from fraud, data entry errors, or unusual customer behavior.

This project implements a multi-method anomaly detection system that combines machine learning (Isolation Forest) with statistical approaches (IQR, Z-Score) to flag transactions that warrant further investigation, using the Brazilian E-Commerce Public Dataset by Olist.

░▒▓ Methodology

1. Load: import the Olist data
2. Join: merge the tables
3. Features: engineer model variables
4. Detect: run the detection algorithms
5. Validate: cross-check flags across methods
6. Visualize: build the dashboard
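
The Load, Join, and Features steps might look roughly like the following in pandas. This is a minimal sketch, not the project's code. It assumes the column names of the public Olist CSVs (`order_id`, `customer_id`, `order_purchase_timestamp`, `payment_value`, `payment_installments`, `customer_state`) and uses tiny in-memory frames in place of the real files. The helper `build_features` and the derived columns are hypothetical. The project lists DuckDB/SQL for cleaning, but this sketch uses pandas merges instead.

```python
import pandas as pd


def build_features(orders: pd.DataFrame, payments: pd.DataFrame,
                   customers: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join Olist-style tables into one row per order."""
    # An order can have several payment rows, so collapse them to one row per order.
    pay = (payments.groupby("order_id")
           .agg(order_value=("payment_value", "sum"),
                installments=("payment_installments", "max"))
           .reset_index())
    df = (orders.merge(pay, on="order_id", how="inner")
                .merge(customers[["customer_id", "customer_state"]],
                       on="customer_id", how="left"))
    ts = pd.to_datetime(df["order_purchase_timestamp"])
    df["purchase_hour"] = ts.dt.hour                          # for time-of-day patterns
    df["purchase_month"] = ts.dt.to_period("M").astype(str)  # for monthly trends
    return df


# Tiny stand-ins for the Olist CSVs; with real data these would come from pd.read_csv.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c3"],
    "order_purchase_timestamp": ["2017-11-24 03:15:00", "2017-11-24 14:02:00",
                                 "2018-01-05 10:30:00"],
})
payments = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "payment_installments": [10, 1, 2, 1],
    "payment_value": [900.0, 50.0, 120.0, 80.0],
})
customers = pd.DataFrame({"customer_id": ["c1", "c2", "c3"],
                          "customer_state": ["SP", "RJ", "SP"]})

features = build_features(orders, payments, customers)
print(features[["order_id", "order_value", "installments", "purchase_hour"]])
```

Aggregating payments before the join keeps the grain at one row per order. Joining the raw payments table directly would duplicate orders that were split across several payment methods.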

░▒▓ Detection Methods

Isolation Forest (ML)

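An unsupervised algorithm that isolates observations by repeatedly picking a random feature and a random split value within that feature's range. Anomalies need fewer splits to isolate, so they have shorter average path lengths across the trees.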

contamination = 0.05
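
Here is a minimal Isolation Forest sketch using scikit-learn's `IsolationForest` with the `contamination = 0.05` setting. The feature columns, `random_state`, the helper name `isolation_forest_flags`, and the synthetic data are assumptions for illustration. Because `contamination` sets the score threshold during fitting, roughly 5% of the rows passed to `fit_predict` end up flagged.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def isolation_forest_flags(df: pd.DataFrame, feature_cols: list[str],
                           contamination: float = 0.05,
                           random_state: int = 42) -> pd.Series:
    """Hypothetical helper: True for rows Isolation Forest labels as anomalous."""
    model = IsolationForest(contamination=contamination, random_state=random_state)
    # fit_predict returns -1 for anomalies and 1 for inliers.
    labels = model.fit_predict(df[feature_cols])
    return pd.Series(labels == -1, index=df.index, name="iforest_flag")


# Synthetic data: 190 ordinary orders plus 10 very large, high-installment orders.
rng = np.random.default_rng(0)
normal = pd.DataFrame({"order_value": rng.normal(150, 30, 190),
                       "installments": rng.integers(1, 4, 190)})
extreme = pd.DataFrame({"order_value": rng.uniform(5000, 20000, 10),
                        "installments": rng.integers(10, 13, 10)})
data = pd.concat([normal, extreme], ignore_index=True)

flags = isolation_forest_flags(data, ["order_value", "installments"])
print(f"flagged share: {flags.mean():.3f}")  # close to the 0.05 contamination
```

Fixing `random_state` makes the random splits, and therefore the flags, reproducible between dashboard refreshes.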

IQR Method (Statistical)

Flags values outside the fences below, where IQR = Q3 - Q1. Quartiles are barely affected by extreme values, so this method is robust to the outliers that can skew mean-based methods.

[Q1 - 1.5×IQR, Q3 + 1.5×IQR]
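
The IQR fences translate directly into pandas. This sketch relies on pandas' default linear-interpolation quantiles. The function name `iqr_flags` and the sample values are illustrative, not taken from the project.

```python
import pandas as pd


def iqr_flags(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


order_values = pd.Series([100, 110, 120, 130, 140, 150, 160, 170, 180, 1000])
print(iqr_flags(order_values).tolist())
# [False, False, False, False, False, False, False, False, False, True]
```

Here Q1 = 122.5 and Q3 = 167.5, so the fences are [55, 235] and only the 1000 order falls outside them.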

Z-Score (Statistical)

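Measures how many standard deviations a value lies from the mean, z = (x - μ) / σ, and flags extreme deviations. The cutoff is best suited to roughly normal data. The mean and standard deviation are themselves inflated by outliers, so this method is less robust than IQR on skewed distributions.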

|z| > 3 standard deviations
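
A matching Z-Score sketch follows. It uses the population standard deviation (`ddof=0`, the same default as `scipy.stats.zscore`). pandas' `Series.std` defaults to the sample version (`ddof=1`), which gives slightly smaller z-scores. The helper name `zscore_flags` and the data are hypothetical.

```python
import pandas as pd


def zscore_flags(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |z| > threshold, with z = (x - mean) / std (population std)."""
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold


order_values = pd.Series([95, 100, 105] * 6 + [100, 1000])
flagged = order_values[zscore_flags(order_values)]
print(flagged.tolist())  # [1000]
```

With the population standard deviation, |z| can never exceed sqrt(n - 1). The |z| > 3 rule therefore cannot flag anything in a sample of 10 or fewer points.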

░▒▓ Key Findings

~5%
High-Value Outliers
Flagged transactions have a substantially higher average order value than normal orders. The ~5% share matches what the Isolation Forest contamination = 0.05 setting produces.
Geo
Regional Patterns
Certain states show higher anomaly rates, warranting state-specific investigation.
2-5 AM
Time-Based Anomalies
Orders placed during these early-morning hours differ from daytime orders and are flagged as anomalous more often.
10+
Payment Patterns
Orders paid in 10 or more installments are flagged more often, which may indicate elevated risk.
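
The state, time-of-day, and installment findings all come down to comparing anomaly rates across segments. Below is a minimal pandas sketch. It assumes a scored table with a boolean `is_anomaly` column. The column names, the segment boundaries, and the rows are hypothetical and do not reproduce the project's results.

```python
import pandas as pd


def anomaly_rate_by(df: pd.DataFrame, key: str, flag_col: str = "is_anomaly") -> pd.Series:
    """Share of flagged orders per group, sorted from highest to lowest rate."""
    return df.groupby(key)[flag_col].mean().sort_values(ascending=False)


# Hypothetical scored orders (made-up rows); in practice they come from the detection step.
scored = pd.DataFrame({
    "customer_state": ["SP", "SP", "SP", "SP", "RJ", "RJ", "AM", "AM"],
    "purchase_hour":  [3, 14, 10, 4, 2, 16, 20, 11],
    "installments":   [10, 1, 2, 12, 10, 3, 1, 1],
    "is_anomaly":     [True, False, False, True, True, False, False, False],
})

# Illustrative segments: hours 2, 3 and 4 cover 02:00-04:59; 10+ installments.
scored["night_2_to_5am"] = scored["purchase_hour"].between(2, 4)
scored["ten_plus_installments"] = scored["installments"] >= 10

print(anomaly_rate_by(scored, "customer_state"))
print(anomaly_rate_by(scored, "night_2_to_5am"))
print(anomaly_rate_by(scored, "ten_plus_installments"))
```

Rates rather than raw counts keep large states such as SP from dominating simply because they have more orders.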

░▒▓ Dashboard Features

KPI Cards

Total orders, anomaly count, and revenue at risk displayed prominently for quick executive insights.
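
The KPI values reduce to a few aggregates over the scored orders. This is a minimal sketch. It assumes that "revenue at risk" means the total value of flagged orders, a definition the page does not spell out. The helper name `kpis` and the columns are hypothetical. In a Streamlit app, each value would typically feed an `st.metric` card.

```python
import pandas as pd


def kpis(df: pd.DataFrame, value_col: str = "order_value",
         flag_col: str = "is_anomaly") -> dict:
    """Hypothetical KPI summary for the dashboard header."""
    flagged = df[df[flag_col]]
    return {
        "total_orders": len(df),
        "anomaly_count": len(flagged),
        "anomaly_rate": len(flagged) / len(df) if len(df) else 0.0,
        # Assumption: "revenue at risk" = summed value of flagged orders.
        "revenue_at_risk": float(flagged[value_col].sum()),
    }


orders = pd.DataFrame({"order_value": [120.0, 80.0, 950.0, 60.0],
                       "is_anomaly": [False, False, True, False]})
print(kpis(orders))
# {'total_orders': 4, 'anomaly_count': 1, 'anomaly_rate': 0.25, 'revenue_at_risk': 950.0}
```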

Distribution Analysis

Order amount histogram with anomaly overlay showing the distribution of normal vs flagged transactions.

Time Series

Monthly trends with dual-axis visualization showing both transaction volume and anomaly rate over time.
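
The Time Series view needs two numbers per month: order volume and anomaly rate. Below is a minimal aggregation sketch in pandas. It assumes a hypothetical boolean `is_anomaly` column alongside the Olist `order_purchase_timestamp` column. The helper name and the rows are made up.

```python
import pandas as pd


def monthly_trend(df: pd.DataFrame, ts_col: str = "order_purchase_timestamp",
                  flag_col: str = "is_anomaly") -> pd.DataFrame:
    """Hypothetical helper: orders per month and the share of them that were flagged."""
    month = pd.to_datetime(df[ts_col]).dt.to_period("M").rename("month")
    return df.groupby(month)[flag_col].agg(orders="count", anomaly_rate="mean")


scored = pd.DataFrame({
    "order_purchase_timestamp": ["2017-11-03", "2017-11-20", "2017-11-24",
                                 "2017-12-01", "2017-12-15"],
    "is_anomaly": [False, True, False, False, True],
})
print(monthly_trend(scored))
```

One way to draw the dual axis in Plotly is `make_subplots(specs=[[{"secondary_y": True}]])`. Volume goes on the primary axis, and the rate is added with `secondary_y=True`, because the two series have very different scales.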

Geographic View

State-level anomaly breakdown enabling regional pattern identification and targeted investigation.

Drill-Down Table

Detailed view of flagged transactions with filters by state, anomaly type, and amount range for investigation.
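
The Drill-Down Table filters can be expressed as one boolean mask. This sketch assumes a hypothetical `anomaly_type` column that records which method raised the flag. The helper name and rows are illustrative. In a Streamlit app, the filter arguments would typically come from widgets such as `st.multiselect` and `st.slider`.

```python
import pandas as pd


def filter_flagged(df: pd.DataFrame, states=None, anomaly_types=None,
                   min_value=None, max_value=None) -> pd.DataFrame:
    """Flagged rows matching every filter given; None (or empty) means no filter."""
    mask = df["is_anomaly"].copy()
    if states:
        mask &= df["customer_state"].isin(states)
    if anomaly_types:  # hypothetical column naming the method that raised the flag
        mask &= df["anomaly_type"].isin(anomaly_types)
    if min_value is not None:
        mask &= df["order_value"] >= min_value
    if max_value is not None:
        mask &= df["order_value"] <= max_value
    return df[mask]


table = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "customer_state": ["SP", "RJ", "SP", "MG"],
    "anomaly_type": ["iforest", "iqr", "zscore", "iforest"],
    "order_value": [950.0, 40.0, 2500.0, 700.0],
    "is_anomaly": [True, True, True, False],
})
print(filter_flagged(table, states=["SP"], min_value=500)["order_id"].tolist())  # ['o1', 'o3']
```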

░▒▓ Tech Stack

ML/Analysis

Python
Core programming language for analysis
scikit-learn
Isolation Forest and ML utilities
Pandas
Data manipulation and analysis

Data

DuckDB
In-process SQL analytics database
SQL
Data cleaning and aggregation queries

Visualization

Streamlit
Interactive dashboard framework
Plotly
Interactive charts and maps

░▒▓ My Role

Sole Developer

Designed and implemented the complete anomaly detection pipeline from data ingestion to visualization, including the Isolation Forest model and both statistical methods.

ML Model Selection

Evaluated multiple anomaly detection approaches and selected Isolation Forest for its effectiveness on high-dimensional transactional data without labeled examples.

Multi-Method Validation

Cross-checked the Isolation Forest flags against the statistical methods (IQR, Z-Score) to validate the ML results and increase confidence in the anomaly flags.
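
One simple way to cross-check the methods is to count how many flag each order and treat orders flagged by at least two of the three as confirmed. It also helps to measure how often Isolation Forest flags are backed by a statistical method. This is sketched here as an assumption, not the project's exact rule. The column names, helper names, and rows are hypothetical.

```python
import pandas as pd


def combine_flags(flags: pd.DataFrame, min_votes: int = 2) -> pd.DataFrame:
    """Add a vote count and a 'confirmed' column (flagged by >= min_votes methods)."""
    votes = flags.sum(axis=1)
    return flags.assign(votes=votes, confirmed=votes >= min_votes)


def ml_agreement(flags: pd.DataFrame) -> float:
    """Share of Isolation Forest flags that IQR or Z-Score also raised."""
    ml = flags["iforest"]
    stat = flags["iqr"] | flags["zscore"]
    return float((ml & stat).sum() / ml.sum()) if ml.any() else float("nan")


flags = pd.DataFrame({
    "iforest": [True, True, False, False, True],
    "iqr":     [True, False, True, False, True],
    "zscore":  [True, False, False, False, False],
})
print(combine_flags(flags)["confirmed"].tolist())  # [True, False, False, False, True]
print(round(ml_agreement(flags), 3))               # 0.667
```

A low agreement rate is not necessarily a failure. Isolation Forest can catch multivariate combinations, such as a modest order value paired with unusual installments, that single-column rules never see.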