A comprehensive visual guide to every major analytical method — from basic descriptive stats to advanced machine learning techniques.
Data analytics spans a vast spectrum — from simple counting to deep neural networks. Understanding which type of analysis to apply, and when, is one of the most valuable skills a data professional can have.
Charts (not reproduced here): how the 100 types are organized across 10 domains; each domain plotted by implementation difficulty and depth; the progression from "what happened?" to "what should we do?"; the most commonly used analysis types across industries; and a typical analytics pipeline.
The core analytical hierarchy every data professional must master — from understanding the past to recommending the future.
Summarizes historical data using measures like mean, median, mode, and standard deviation to understand what has already happened.
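As a minimal illustration, Python's standard `statistics` module covers these summary measures directly. The daily order counts below are hypothetical.

```python
import statistics

def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # sample standard deviation (divides by n - 1)
        "stdev": statistics.stdev(values),
    }

daily_orders = [12, 15, 15, 18, 20, 22, 15]  # hypothetical data
print(describe(daily_orders))
```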
An open-ended investigation using visual and statistical techniques to uncover patterns, spot anomalies, and test assumptions before formal modeling.
Drills into data to identify the root cause of a specific outcome. Answers: "Why did this happen?"
Uses statistical models and machine learning to forecast future outcomes based on historical patterns and trends.
The highest tier — recommends specific actions to achieve desired outcomes, often using optimization and simulation techniques.
Draws conclusions about a large population by analyzing a representative sample, using probability theory and statistical tests.
Establishes genuine cause-and-effect relationships — not just correlations — using controlled experiments or quasi-experimental methods.
Explains the precise biological, chemical, or physical mechanism behind an observed relationship — common in scientific research.
Rigorous mathematical methods for understanding data relationships, distributions, and significance — the backbone of evidence-based decision making.
Models the relationship between a dependent variable and one or more independent variables to predict outcomes and quantify relationships.
Measures the strength and direction of the association between two variables, using Pearson's r (linear relationships) or Spearman's rho (monotonic relationships, based on ranks).
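A from-scratch sketch of both coefficients, assuming plain Python lists of equal length. Spearman's rho is computed here as Pearson's r applied to the ranks (tied values share their average rank), which is why a monotonic but curved relationship such as y = x² scores 1.0 on rho but less than 1 on r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(pearson_r(x, y), spearman_rho(x, y))
```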
Formal statistical framework for testing claims about a population, using procedures such as t-tests, chi-square tests, and ANOVA and judging the evidence with p-values.
Compares actual results to planned or expected results, breaking down differences into price variance, volume variance, and efficiency variance.
Examines the shape, spread, and statistical properties of data, testing for normality and measuring skewness and kurtosis to inform the choice of analytical approach.
Divides data into equal-frequency groups (quartiles, deciles, percentiles) to understand distribution and identify threshold effects.
Counts occurrences of each value or category, producing frequency tables and histograms to understand data distribution across categories.
Creates contingency tables showing joint frequency of two or more categorical variables, often with chi-square independence tests.
Measures how a variable correlates with its own past values (lags), essential for identifying temporal patterns before time series modeling.
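A minimal sketch of the usual sample autocorrelation estimator at lag k, which normalizes the lagged cross-products by the total variation of the series. The alternating series is a made-up example where lag 1 is strongly negative and lag 2 strongly positive.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

alternating = [1, 3, 1, 3, 1, 3, 1, 3]  # hypothetical period-2 pattern
for k in (1, 2):
    print(k, autocorrelation(alternating, k))
```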
Detects when independent variables in a regression model are highly correlated, inflating standard errors and destabilizing coefficients.
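One common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below handles only the two-predictor case, where R_j^2 reduces to the squared Pearson correlation between the two predictors. The data is hypothetical; cutoffs of 5 or 10 are often cited as rules of thumb, not fixed standards.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfect collinearity
    return 1.0 / (1.0 - r_squared)

ad_spend = [1, 2, 3, 4, 5]    # hypothetical predictor 1
promo_days = [2, 4, 5, 4, 5]  # hypothetical predictor 2
print(vif_two_predictors(ad_spend, promo_days))  # approximately 2.5
```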
Algorithmic approaches that let computers learn patterns from data to make predictions, classifications, and generate insights at scale.
Unsupervised learning that groups data points into clusters based on similarity, with no predefined labels.
Supervised learning that assigns each data point to one of several predefined categories using trained models like decision trees, SVMs, or neural networks.
Identifies data points that deviate significantly from expected patterns — used for fraud detection, quality control, and system monitoring.
Dimensionality reduction that transforms correlated high-dimensional data into uncorrelated principal components retaining maximum variance.
Identifies underlying latent factors explaining observed correlations — widely used in psychology, social science, and survey research.
Techniques like t-SNE, UMAP, and autoencoders that compress high-dimensional data for visualization and modeling.
Analyzes time until an event occurs (death, churn, failure), handling censored data where the event hasn't yet happened.
Analyzes outcomes from agents that learn by trial-and-error, optimizing for cumulative reward — used in robotics, game AI, and dynamic pricing.
Combines multiple models (bagging, boosting, stacking) to achieve better accuracy and robustness than any single model; bagging mainly reduces variance (and thus overfitting), while boosting mainly reduces bias.
Interprets "black box" ML models to understand which features drive predictions — critical for regulated industries and trust-building.
Techniques focused on understanding how users, customers, and cohorts behave over time — the foundation of modern product and growth analytics.
Groups users by a shared characteristic or start date and tracks behavior over time, revealing how different acquisition periods perform.
Maps conversion rates across sequential steps in a user journey, identifying where and why users drop off at each stage.
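A minimal sketch of the core funnel arithmetic: step-to-step conversion (each stage divided by the one before it) and overall conversion (each stage divided by the first). The stage names and counts are hypothetical.

```python
def funnel_report(steps):
    """steps: (name, user_count) pairs in journey order.
    Returns (name, users, step_conversion, overall_conversion) per step."""
    first = steps[0][1]
    report = []
    for i, (name, users) in enumerate(steps):
        prev = steps[i - 1][1] if i > 0 else users
        report.append((name, users, users / prev, users / first))
    return report

funnel = [("visit", 10_000), ("signup", 2_500), ("activate", 1_000), ("purchase", 300)]  # hypothetical
rows = funnel_report(funnel)
for name, users, step, overall in rows:
    print(f"{name:<10}{users:>7}  step {step:6.1%}  overall {overall:6.1%}")

# the stage with the lowest step conversion is the biggest drop-off point
worst = min(rows[1:], key=lambda row: row[2])
print("biggest drop-off at:", worst[0])
```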
Identifies customers likely to stop using a product or service and models the key drivers of departure.
Segments customers by Recency, Frequency, and Monetary value to prioritize marketing efforts and personalize outreach.
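A minimal sketch of RFM scoring, assuming hypothetical customer records and three rank-based buckets per dimension (many teams use five quintile buckets instead). Recency is scored so that fewer days since the last purchase earns a higher score.

```python
from datetime import date

def bucket_scores(values, n_buckets=3, higher_is_better=True):
    """Assign scores 1..n_buckets by rank (ties broken by input order, a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * n_buckets // len(values) + 1
    return scores

def rfm_scores(customers, today, n_buckets=3):
    ids = list(customers)
    recency = [(today - customers[c]["last"]).days for c in ids]
    frequency = [customers[c]["orders"] for c in ids]
    monetary = [customers[c]["spend"] for c in ids]
    r = bucket_scores(recency, n_buckets, higher_is_better=False)  # recent = better
    f = bucket_scores(frequency, n_buckets)
    m = bucket_scores(monetary, n_buckets)
    return {c: f"{r[i]}{f[i]}{m[i]}" for i, c in enumerate(ids)}

customers = {  # hypothetical records
    "A": {"last": date(2024, 6, 28), "orders": 12, "spend": 1500.0},
    "B": {"last": date(2024, 3, 1), "orders": 2, "spend": 90.0},
    "C": {"last": date(2024, 6, 1), "orders": 5, "spend": 400.0},
}
print(rfm_scores(customers, today=date(2024, 6, 30)))
```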
Projects total net profit from a customer across their entire relationship with the business, informing acquisition and retention investment.
Studies patterns in how users interact with products — clicks, scrolls, session duration, feature usage — to identify friction and opportunity.
Analyzes the sequential path of clicks a user makes through a website or app, revealing navigation patterns and common journeys.
Maps the complete end-to-end experience across all touchpoints and channels from awareness to advocacy.
Segments data by demographic characteristics to understand how different groups behave differently.
Applies data analysis to HR data — hiring, performance, attrition, engagement — to optimize people decisions.
Techniques for extracting structured insight from unstructured text, audio, and language data — powering everything from chatbots to market research.
Uses NLP to determine emotional tone (positive, negative, neutral) of text at document, sentence, or aspect level.
Extracts meaningful patterns and structured information from large collections of unstructured text using statistical and linguistic methods.
Unsupervised technique (LDA, NMF) that discovers hidden thematic structure in a document corpus without predefined categories.
Assigns text documents to predefined categories using trained ML models, automating manual tagging at scale.
Identifies and classifies named entities (people, organizations, locations, dates) in text — enabling structured extraction from documents.
Systematic, replicable technique for categorizing and quantifying content in text, images, or media for research.
Mines social platforms for brand mentions, trending topics, influencer networks, and public opinion at scale.
Extracts insights from audio recordings through speech-to-text, speaker identification, emotion detection, and acoustic feature analysis.
Uses vector embeddings to find semantically similar content beyond keyword matching, enabling conceptual search across large corpora.
Automatically condenses long documents into key points using extractive or abstractive summarization techniques.
Methods for analyzing data that unfolds across time — identifying trends, seasonality, cycles, and structural changes in temporal datasets.
Analyzes sequentially ordered data by decomposing it into trend, seasonal, and residual components, enabling forecasting and anomaly detection.
Identifies consistent directional movement in data over time, distinguishing genuine trends from noise using techniques such as moving averages.
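A minimal sketch of a trailing simple moving average, one of the smoothing techniques mentioned above. The weekly values are hypothetical; note that the first `window - 1` points have no average.

```python
def moving_average(series, window):
    """Trailing simple moving average; returns None until a full window is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out

weekly_sales = [10, 12, 11, 13, 15, 14, 16]  # hypothetical, noisy but rising
print(moving_average(weekly_sales, 3))
```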
Predicts future demand for products or services using historical data, promotional calendars, economic indicators, and ML models.
Studies the same subjects across multiple time points to track change over long periods, controlling for individual differences.
Studies multiple subjects at a single point in time — a snapshot establishing current-state benchmarks and cross-group comparisons.
Combines longitudinal and cross-sectional dimensions, enabling fixed-effects and random-effects econometric models.
Measures the impact of a specific event or policy change on a time series, modeling the step-change effect of an intervention.
Studies whether changes in one variable consistently precede changes in another by a predictable time offset — identifying leading indicators.
Identifies and quantifies regular periodic patterns in time series data, separating seasonal effects from underlying trend.
Analyzes data patterns triggered by specific events, studying behavior before and after defined trigger points.
Quantitative frameworks for evaluating business performance, financial health, strategic decisions, and operational efficiency.
Evaluates financial health by computing liquidity, profitability, efficiency, and leverage ratios from financial statements.
Calculates the exact point where total revenue equals total costs, establishing the minimum sales volume needed for profitability.
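The standard single-product formula divides fixed costs by the contribution margin per unit: Q = F / (P - V), where P is price and V is variable cost per unit. A minimal sketch with hypothetical figures:

```python
import math

def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units needed for total revenue to equal total costs."""
    margin = price_per_unit - variable_cost_per_unit  # contribution margin per unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_costs / margin

units = break_even_units(50_000, 25.0, 15.0)  # hypothetical inputs
print(units, math.ceil(units))  # round up to whole units in practice
```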
Visualizes how sequential positive and negative values contribute to a final total, tracing the path from starting to ending value.
Evaluates a collection of assets, products, or business units to optimize allocation, balance risk, and maximize portfolio-level returns.
Identifies, quantifies, and prioritizes risks using probability-impact matrices, value-at-risk models, and stress testing.
Calculates the time required for an investment to recover its initial cost from net cash inflows. It is a simple, widely used capital budgeting tool.
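A minimal sketch of the simple (undiscounted) payback period, interpolating within the year in which cumulative inflows first cover the cost. It assumes the cash flows are yearly and ignores the time value of money; the figures are hypothetical.

```python
def payback_period(initial_cost, cash_flows):
    """Years to recover initial_cost from yearly net inflows; None if never recovered."""
    if initial_cost <= 0:
        return 0.0
    cumulative = 0.0
    for year, inflow in enumerate(cash_flows, start=1):
        if cumulative + inflow >= initial_cost:
            remaining = initial_cost - cumulative
            return (year - 1) + remaining / inflow  # fraction of the final year
        cumulative += inflow
    return None

print(payback_period(100_000, [30_000, 40_000, 50_000]))  # about 2.6 years
```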
Systematically compares current state to desired state, quantifying gaps in performance, capability, or market position.
Applies the 80/20 rule to identify the vital few causes responsible for the majority of effects, so improvement efforts can target them first.
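A minimal sketch that ranks causes by count and keeps adding them until their cumulative share reaches a threshold (80% by default). The defect counts are hypothetical.

```python
def vital_few(counts, threshold=0.8):
    """Smallest set of top causes whose cumulative share reaches `threshold`."""
    total = sum(counts.values())
    selected, cumulative = [], 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        cumulative += n
        if cumulative / total >= threshold:
            break
    return selected

defects = {"scratches": 120, "dents": 45, "misalignment": 20, "discoloration": 10, "other": 5}
print(vital_few(defects))  # ['scratches', 'dents'] cover 82.5% of defects
```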
Compares performance metrics against industry standards, best-in-class competitors, or internal best performers.
Optimizes procurement, inventory, logistics, and supplier relationships using data to reduce costs and increase reliability.
Techniques for analyzing data with geographic, spatial, or relational network structure — unlocking insights invisible in tabular data.
Analyzes data with geographic coordinates to reveal spatial patterns, proximity effects, and location-based insights using GIS tools.
Examines relationships between geographic entities — distance, adjacency, containment, and spatial autocorrelation (Moran's I).
Identifies statistically significant geographic concentrations of events using Kernel Density Estimation and Getis-Ord Gi* statistics.
Studies entities (nodes) and their relationships (edges) using centrality, clustering, path length, and community detection metrics.
Uses graph algorithms (Dijkstra, A*, VRP solvers) to find optimal paths through transportation networks under constraints.
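Dijkstra's algorithm is the simplest of these. A minimal sketch using a binary heap, assuming non-negative edge weights (for example, travel minutes) and a hypothetical road network stored as an adjacency dict. Full vehicle routing problems need dedicated solvers and are not shown.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> [(neighbor, weight >= 0)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

roads = {  # hypothetical directed road network, weights in minutes
    "depot": [("A", 4), ("B", 1)],
    "B": [("A", 2), ("C", 5)],
    "A": [("C", 1)],
}
print(dijkstra(roads, "depot"))
```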
Extracts structured information from images and video using deep learning — classification, detection, segmentation, and tracking.
Defines the geographic catchment area of a retail location or service, measuring trade area penetration and competitive overlap.
Measures distances between geographic features to understand spatial relationships — buffer zones, nearest neighbor, and service area coverage.
Data-driven frameworks for measuring marketing effectiveness, optimizing product experiences, and attributing value across customer touchpoints.
Compares two variants through controlled experiments with proper statistical power, significance testing, and effect size estimation.
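For conversion-rate tests, a common significance check is the pooled two-proportion z-test. A minimal sketch using only the standard library (the two-sided p-value equals erfc(|z| / sqrt(2))); the counts are hypothetical, and power or sample-size planning is not covered here.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - standard normal CDF)
    return z, p_value

z, p = two_proportion_ztest(200, 5_000, 250, 5_000)  # 4.0% vs 5.0% conversion
print(f"z = {z:.2f}, p = {p:.4f}")
```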
Tests multiple page elements simultaneously using full or fractional factorial designs to find the best-performing combination.
Assigns credit for conversions across marketing touchpoints using models from last-click to data-driven multi-touch attribution.
Uses association rule mining (Apriori, FP-Growth) to find products frequently purchased together, driving cross-sell strategies.
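Apriori and FP-Growth exist to prune the search over large itemsets. For intuition only, the sketch below counts every item pair by brute force (it is not Apriori itself) and reports support, confidence, and lift for each direction of a frequent pair. The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.2):
    """Rules X -> Y for item pairs, as (X, Y, support, confidence, lift)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(Y | X)
            lift = confidence / (item_counts[y] / n)  # vs. baseline P(Y)
            rules.append((x, y, support, confidence, lift))
    return rules

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
for rule in pair_rules(baskets, min_support=0.3):
    print(rule)
```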
Measures how demand changes in response to price changes, informing optimal pricing strategy and revenue maximization.
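One standard way to compute elasticity from two observations is the midpoint (arc) formula, which divides percentage changes by the averages of the two points so the result is the same in either direction. A minimal sketch with hypothetical prices and quantities:

```python
def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity of demand between two observations."""
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p

e = arc_elasticity(10.0, 1_000, 12.0, 800)
print(round(e, 2))  # -1.22
```

Because |E| > 1 here, demand is elastic: the price rise lowers revenue (from 10,000 to 9,600 in this example).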
Analyzes website traffic, user behavior, content performance, and conversion paths using tools like GA4, Mixpanel, and Amplitude.
Studies user interaction with software products — feature adoption, engagement depth, north star metrics, and activation funnels.
Estimates the incremental causal impact of a treatment on each customer, so that campaigns can target only the "persuadable" customers who respond because of it.
Reveals how customers value different product attributes by analyzing choices among product profiles and estimating willingness-to-pay per feature.
Sophisticated techniques from econometrics, operations research, Bayesian statistics, and simulation — for complex, high-stakes problems.
Updates probability estimates as new evidence arrives, combining prior beliefs with observed data — ideal for small samples and sequential learning.
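A classic small example is estimating a conversion rate with a Beta prior, which is conjugate to binomial data: the posterior is again a Beta whose parameters simply add the observed successes and failures. The prior below is a hypothetical belief centered near 20%.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

prior = (2, 8)  # hypothetical prior: mean 0.2
posterior = beta_binomial_update(*prior, successes=30, failures=70)
print(posterior, round(beta_mean(*posterior), 3))

# sequential learning: updating in two batches gives the same posterior
step1 = beta_binomial_update(*prior, successes=10, failures=20)
step2 = beta_binomial_update(*step1, successes=20, failures=50)
```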
Models the probability distribution of outcomes by running thousands of random simulations, producing ranges rather than point estimates.
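A minimal sketch of a profit simulation. The input distributions (normal demand, uniform price, triangular unit cost) and the fixed costs are hypothetical assumptions chosen for illustration; the output is a percentile range and a loss probability rather than a single number.

```python
import random
import statistics

def simulate_profit(n_runs=10_000, seed=42):
    rng = random.Random(seed)  # seeded for reproducibility
    outcomes = []
    for _ in range(n_runs):
        units = rng.gauss(10_000, 1_500)               # assumed demand
        price = rng.uniform(18.0, 22.0)                # assumed price range
        unit_cost = rng.triangular(11.0, 15.0, 12.0)   # low, high, mode
        outcomes.append(units * (price - unit_cost) - 40_000)  # assumed fixed costs
    return outcomes

profits = simulate_profit()
cuts = statistics.quantiles(profits, n=20)  # cut points in 5% steps
p5, p50, p95 = cuts[0], statistics.median(profits), cuts[-1]
p_loss = sum(p < 0 for p in profits) / len(profits)
print(f"5th pct {p5:,.0f}  median {p50:,.0f}  95th pct {p95:,.0f}  P(loss) {p_loss:.1%}")
```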
Tests how robust model outputs are to changes in input assumptions, identifying which variables most influence outcomes.
Evaluates multiple distinct future scenarios (bull/base/bear) under explicit assumptions, stress-testing strategies against plausible futures.
Applies statistical methods to economic data to establish causal relationships, test economic theories, and evaluate policy impacts.
Reduces selection bias in observational studies by matching treated and control units on their estimated probability of receiving treatment (the propensity score); it can only adjust for observed covariates.
Tests complex theoretical models involving multiple dependent variables, latent constructs, and mediated effects simultaneously.
Finds the best solution from a set of feasible alternatives using linear programming, integer programming, or meta-heuristics.
Structures complex decisions under uncertainty using decision trees, expected utility theory, and multi-criteria frameworks.
Monitors and improves process quality using Statistical Process Control tools such as control charts and capability indices (Cpk), often alongside risk tools like FMEA.
Models complex systems with interacting components using discrete-event, agent-based, or system dynamics simulation.
Assesses the internal consistency and reproducibility of measurement instruments — surveys, tests, and multi-item scales.
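Internal consistency is most often summarized with Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), for k items. A minimal sketch using sample variances throughout; the survey responses are hypothetical, and the often-quoted 0.7 cutoff is a rule of thumb.

```python
import statistics

def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

survey = [  # 3 hypothetical items rated by 5 respondents
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
print(round(cronbach_alpha(survey), 3))
```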
Uses ROC curves and precision-recall trade-offs to tune binary classifiers, balancing false positive vs. false negative costs.
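A minimal sketch of both ideas: ROC AUC computed as the probability that a random positive outscores a random negative, and a cutoff chosen to minimize total misclassification cost. The labels, scores, and the assumption that a false negative costs five times a false positive are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC: share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cheapest_threshold(labels, scores, cost_fp, cost_fn):
    """Among observed scores, the cutoff (predict positive if score >= cutoff) with lowest cost."""
    best = None
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
print(roc_auc(labels, scores), cheapest_threshold(labels, scores, cost_fp=1, cost_fn=5))
```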
Systematically identifies the deepest root cause of a problem using 5 Whys, fishbone diagrams, fault trees, and Pareto charts.
Assigns weights to multiple criteria to objectively rank and select among alternatives, bringing rigor to multi-factor decisions.
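A minimal sketch of a weighted scoring model: each option's criterion scores are multiplied by the criterion weights, summed, normalized by the total weight, and ranked. The criteria, weights, and 1 to 10 scores are hypothetical.

```python
def weighted_scores(options, weights):
    """Rank options by weighted average score, highest first."""
    total_w = sum(weights.values())
    scored = {
        name: sum(weights[c] * scores[c] for c in weights) / total_w
        for name, scores in options.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

weights = {"cost": 0.4, "quality": 0.35, "support": 0.25}
vendors = {
    "Vendor A": {"cost": 7, "quality": 9, "support": 6},
    "Vendor B": {"cost": 9, "quality": 6, "support": 8},
    "Vendor C": {"cost": 5, "quality": 8, "support": 9},
}
for name, score in weighted_scores(vendors, weights):
    print(f"{name}: {score:.2f}")
```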
A quick-reference guide to the most important analysis types — their complexity, typical tools, and best use cases.
| # | Analysis Type | Category | Complexity | Primary Question | Common Tools |
|---|---|---|---|---|---|
| 01 | Descriptive | Foundational | Low | What happened? | Excel, SQL, Tableau |
| 02 | EDA | Foundational | Low | What patterns exist? | Python, R, Jupyter |
| 03 | Diagnostic | Foundational | Medium | Why did it happen? | SQL, Python, BI Tools |
| 04 | Predictive | Foundational | High | What will happen? | Scikit-learn, XGBoost |
| 05 | Prescriptive | Foundational | High | What should we do? | OR-Tools, Gurobi |
| 09 | Regression | Statistical | Medium | How are variables related? | R, Python statsmodels |
| 11 | Hypothesis Testing | Statistical | Medium | Is this result significant? | R, SciPy, SPSS |
| 19 | Clustering | ML & AI | Medium | What groups exist? | Scikit-learn, H2O |
| 21 | Anomaly Detection | ML & AI | High | What's unusual? | Scikit-learn (IsolationForest), PyOD |
| 29 | Cohort Analysis | Behavioral | Low | How do groups change? | SQL, Mixpanel, Amplitude |
| 30 | Funnel Analysis | Behavioral | Low | Where do users drop off? | GA4, Mixpanel, SQL |
| 32 | RFM Analysis | Behavioral | Low | Who are our best customers? | SQL, Python, Excel |
| 39 | Sentiment Analysis | NLP | Medium | How do people feel? | VADER, BERT, HuggingFace |
| 49 | Time Series | Time-Based | High | What are the trends? | Prophet, ARIMA, statsmodels |
| 59 | Financial Ratio | Business | Low | Is the business healthy? | Excel, SQL, Bloomberg |
| 69 | Geospatial | Spatial | Medium | Where is it happening? | QGIS, GeoPandas, ArcGIS |
| 77 | A/B Testing | Marketing | Medium | Which variant is better? | Optimizely, VWO, Python |
| 86 | Bayesian Analysis | Advanced | High | How do beliefs update? | PyMC, Stan, JAGS |
| 87 | Monte Carlo | Advanced | High | What's the range of outcomes? | Python, @RISK, Crystal Ball |
| 100 | Weighted Scoring | Advanced | Low | Which option ranks best? | Excel, Python, Decision Tools |