What You'll Learn
- Identify and explain all six categories of data types Power BI supports in 2026
- Distinguish structured, semi-structured, and unstructured data with real-world examples
- Describe the roles of Data Lakes, Data Warehouses, and the Lakehouse hybrid
- Explain what Microsoft Fabric and OneLake are and why they matter for reporting
- Apply the right storage and connection strategy (Import vs. DirectLake) based on architecture
Before You Begin
- Basic familiarity with what Power BI is (you've seen a dashboard before)
- Understanding of what a spreadsheet / table looks like (rows & columns)
- No coding knowledge required: this is a conceptual overview
Table of Contents
- Part 1: The Power BI Data Universe (Sections 1–6)
- ① Structured Data: The Foundation (~3 min)
- ② Semi-Structured Data: Web & APIs (~3 min)
- ③ Unstructured Data: The AI Frontier (~3 min)
- ④ Spatial & Geographic Data (~2 min)
- ⑤ Vector Data: The LLM/AI Engine (~2 min)
- ⑥ Streaming & Real-Time Data (~2 min)
- Part 2: The Modern Data Ecosystem (Sections 7–10)
- ⑦ Data Lakes vs. Warehouses vs. Lakehouse (~3 min)
- ⑧ Microsoft Fabric: The All-in-One Platform (~3 min)
- ⑨ Domains, Workspaces & Data Mesh (~2 min)
- ⑩ Practice Quiz (10 questions)
Part 1: The Power BI Data Universe
In 2026, Power BI has evolved into a hub for Data Mesh architectures. It no longer handles only spreadsheets: it ingests everything from satellite imagery to AI-generated vectors. The sections below cover each data category it supports.
① Structured Data: The Foundation
Structured data is highly organised and lives in rows and columns. It is what most users work with the vast majority of the time, and it comes from sources like SQL Server, Excel, and Snowflake. The core data types are:
- Decimal Number: for values like $1.23 or 3.14
- Fixed Decimal (Currency): always 4 decimal places, ideal for financial data
- Whole Number: integers only, e.g. 100, 42
- Date: day/month/year only
- Time: hours, minutes, seconds
- DateTime: combined date and time
- DateTimeZone: datetime with timezone offset, essential for global teams
- Text (String): standard alphanumeric data like names, IDs, descriptions
- Boolean: True/False values used for logical conditions and filters (e.g. "Is Active?")
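Optional, for curious readers: the difference between Decimal Number and Fixed Decimal, and what a timezone offset adds, can be sketched in a few lines of Python. This is only an analogy using Python's standard library, not Power BI's internal implementation. The helper name `to_fixed_decimal` and the rounding mode are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP

# Decimal Number is comparable to a binary floating-point value:
# small representation errors can creep into sums.
float_total = 0.1 + 0.2  # not exactly 0.3


def to_fixed_decimal(value: str) -> Decimal:
    """Round a numeric string to exactly 4 decimal places (hypothetical helper).

    The rounding mode is an assumption chosen for this sketch.
    """
    return Decimal(value).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


# Fixed Decimal (Currency) is comparable to a value pinned to 4 decimal places.
price = to_fixed_decimal("1.23")
fixed_total = to_fixed_decimal("0.1") + to_fixed_decimal("0.2")

# DateTimeZone: the same instant expressed with two different offsets.
ist = timezone(timedelta(hours=5, minutes=30))  # UTC+05:30
meeting_delhi = datetime(2026, 1, 15, 14, 0, tzinfo=ist)
meeting_utc = meeting_delhi.astimezone(timezone.utc)

print(float_total == 0.3)   # False
print(price, fixed_total)   # 1.2300 0.3000
print(meeting_utc.isoformat())
```

The takeaway: pick Fixed Decimal when exact money arithmetic matters, and DateTimeZone when colleagues in different regions need to agree on the same instant.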
② Semi-Structured Data: Web & APIs
Semi-structured data doesn't fit a rigid table, but has tags or markers to separate elements. This is the language of the web and modern applications.
- JSON (JavaScript Object Notation): Power BI can "flatten" complex JSON hierarchies from APIs like GitHub or Salesforce into usable tables.
- XML (Extensible Markup Language): Common in legacy enterprise systems. Power BI offers the same flattening capability as for JSON.
- PDF Data: Power BI "scrapes" tables from PDFs by identifying headers and rows. Scanned (image-only) documents may first need an OCR step.
Example: {"user": {"name": "Raj", "city": "Delhi"}} becomes two flat columns, user.name and user.city, in the Power Query Editor.
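Optional: the flattening idea can be sketched in Python. This is not how Power BI implements it internally; the function name `flatten` and the dot separator are assumptions chosen to mirror the user.name / user.city column names in the example.

```python
from typing import Any


def flatten(record: dict[str, Any], parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Turn nested dictionaries into one flat row with dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the column prefix along.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


row = flatten({"user": {"name": "Raj", "city": "Delhi"}})
print(row)  # {'user.name': 'Raj', 'user.city': 'Delhi'}
```

Each nested level simply adds another dotted segment to the column name, which is why deeply nested API responses can produce very wide tables.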
③ Unstructured Data: The AI Frontier
Unstructured data has no pre-defined model. In 2026, Power BI handles this via Azure AI services (formerly Azure Cognitive Services) and Fabric AI.
You can store images directly in a Power BI dataset. Vision AI then automatically tags those images โ for example, classifying warehouse photos as "Truck" vs. "Shelf". OCR (Optical Character Recognition) can read text out of photos, converting them into searchable data.
For emails, support tickets, or customer reviews, Power BI uses Sentiment Analysis to convert a block of text into a numerical "Happiness Score" between 0 and 1.
| Customer Review | Sentiment Score |
|---|---|
| "Excellent service, very happy!" | 0.93 |
| "Delivery was late again" | 0.41 |
| "Worst experience ever, never again" | 0.04 |
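Optional: once each review has a score between 0 and 1, a report usually groups the scores into labels. The sketch below uses the three scores from the table. The cut-off values (0.6 and 0.4) and the function name `label_sentiment` are illustrative assumptions, not thresholds defined by Power BI.

```python
# Illustrative cut-offs only; choose thresholds that suit your own data.
POSITIVE_FROM = 0.6
NEGATIVE_BELOW = 0.4


def label_sentiment(score: float) -> str:
    """Map a 0-1 sentiment score to a coarse label (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    if score >= POSITIVE_FROM:
        return "Positive"
    if score < NEGATIVE_BELOW:
        return "Negative"
    return "Neutral"


reviews = {
    "Excellent service, very happy!": 0.93,
    "Delivery was late again": 0.41,
    "Worst experience ever, never again": 0.04,
}
labels = {text: label_sentiment(score) for text, score in reviews.items()}
for text, label in labels.items():
    print(f"{label:8} {text}")
```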
④ Spatial & Geographic Data
Power BI treats location data as a first-class citizen, offering multiple spatial formats.
Latitude/Longitude: e.g. 28.6139° N, 77.2090° E (New Delhi). Used for point-based maps such as store locations and delivery points.
⑤ Vector Data: The LLM/AI Engine
With Generative AI, Vector Data has become critical. Vectors, also called embeddings, are numerical representations of meaning.
- Keyword search: searching for "Profit" finds only rows containing the exact word "Profit".
- Vector search: asking "How is our financial health?" finds data about revenue, margins, costs, and losses.
Vectors live in a Vector Store inside Microsoft Fabric. Power BI queries this store to surface AI-powered insights based on meaning, not just keywords. External stores like Pinecone are also supported.
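Optional: searching "by meaning" usually comes down to comparing vectors, and cosine similarity is one common way to do that. The sketch below uses tiny made-up 3-number embeddings purely for illustration. Real embeddings come from an AI model and have hundreds or thousands of dimensions, and the document titles here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
documents = {
    "Quarterly revenue and margins": [0.9, 0.8, 0.1],
    "Operating costs and losses": [0.8, 0.9, 0.2],
    "Office party photos": [0.1, 0.0, 0.9],
}
query_vector = [0.85, 0.85, 0.1]  # stands in for "How is our financial health?"

# Keyword search: no title contains the literal word "profit".
keyword_hits = [title for title in documents if "profit" in title.lower()]

# Vector search: rank every document by closeness of meaning to the query.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_vector, documents[title]),
    reverse=True,
)
print(keyword_hits)  # []
print(ranked)
```

The keyword search comes back empty, while the vector ranking places the two finance documents ahead of the unrelated one.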
⑥ Streaming & Real-Time Data
Not all data is "at rest." Some data is "in flight", arriving continuously in real time.
- Push API: Data is pushed directly into Power BI in near real time. Well suited to stock prices, IoT sensors, or factory machines.
- PubNub: Power BI connects to live event streams. Dashboards update automatically, with no "Refresh" button needed.
- Kusto (ADX): Azure Data Explorer, optimised for high-volume log analytics and telemetry in real time.
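Optional: to make the Push API idea concrete, here is a sketch of the JSON body a sensor feed might send to the Power BI REST API's "post rows" endpoint for a push dataset. The dataset ID placeholder, the table name `SensorReadings`, and the column names are assumptions for illustration. The code only builds the request body and does not send anything; a real call would also need an access token.

```python
import json
from datetime import datetime, timezone

# Placeholders: substitute your own dataset ID and table name.
DATASET_ID = "<dataset-id>"
TABLE_NAME = "SensorReadings"
PUSH_URL = (
    "https://api.powerbi.com/v1.0/myorg/"
    f"datasets/{DATASET_ID}/tables/{TABLE_NAME}/rows"
)


def build_push_body(readings: list[tuple[str, float, datetime]]) -> str:
    """Wrap sensor readings in the {"rows": [...]} shape expected for pushed rows."""
    rows = [
        {"sensor": name, "temperature": temperature, "timestamp": taken_at.isoformat()}
        for name, temperature, taken_at in readings
    ]
    return json.dumps({"rows": rows})


body = build_push_body(
    [("machine-7", 71.5, datetime(2026, 1, 15, 8, 30, tzinfo=timezone.utc))]
)
print(PUSH_URL)
print(body)
```

Batching several readings into one request, rather than sending one row at a time, is usually the safer design because the service limits how many requests a dataset accepts.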
Part 1 Summary: Data Coverage
| Category | Data Types | Primary Source |
|---|---|---|
| Structured | Integer, Currency, Date, Text, Boolean | SQL, Excel, Snowflake |
| Semi-Structured | JSON, XML, PDF | APIs, Web Pages |
| Unstructured | Binary, Image, Long-form Text | Blob Storage, OneDrive |
| Spatial | Latitude, KML, Shapefile, GeoJSON | GIS Systems, Azure Maps |
| Vector / AI | Embeddings, Vector Stores | Microsoft Fabric, Pinecone |
| Streaming | Push API, Event Stream | PubNub, Kusto, IoT Hub |
Part 2: The Modern Data Ecosystem
To excel at data visualisation, you must understand where data lives, how it is organised, and how it flows into your reports. The terminology has shifted from "spreadsheets" to complex "architectures."
⑦ Data Lakes vs. Warehouses vs. Lakehouse
Every visualisation starts with storage. Here's how the three main storage paradigms compare:
| Storage Type | What It Holds | Structure | Best For |
|---|---|---|---|
| Data Lake | Everything raw: structured, semi-structured, unstructured | None (schema-on-read) | Storing data now, figuring out meaning later |
| Data Warehouse | Cleaned, processed, organised data | Rigid (schema-on-write) | Specific business reporting (e.g. Finance) |
| Data Lakehouse | Both raw and processed data | Flexible hybrid | Modern analytics at scale, combining the strengths of both |
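Optional: "schema-on-read" and "schema-on-write" from the table can be illustrated with a toy Python model. The classes `DataLake` and `Warehouse` and the `SALES_SCHEMA` columns are hypothetical names invented for this sketch; real platforms are far more sophisticated.

```python
import json

# Hypothetical schema: column name -> expected Python type.
SALES_SCHEMA = {"order_id": int, "amount": float}


class Warehouse:
    """Schema-on-write: rows are checked before they are stored."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict] = []

    def write(self, row: dict) -> None:
        for column, expected in self.schema.items():
            if not isinstance(row.get(column), expected):
                raise ValueError(f"column {column!r} must be {expected.__name__}")
        self.rows.append(row)


class DataLake:
    """Schema-on-read: anything is stored; structure is applied when reading."""

    def __init__(self) -> None:
        self.blobs: list[str] = []

    def write(self, raw: str) -> None:
        self.blobs.append(raw)  # no validation at all

    def read(self, schema: dict[str, type]) -> list[dict]:
        rows = []
        for raw in self.blobs:
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                continue  # not JSON, so it cannot match this schema
            if isinstance(record, dict) and all(
                isinstance(record.get(c), t) for c, t in schema.items()
            ):
                rows.append({c: record[c] for c in schema})
        return rows


lake = DataLake()
lake.write('{"order_id": 1, "amount": 9.5, "note": "gift"}')
lake.write("free-form text from a support email")
print(lake.read(SALES_SCHEMA))  # [{'order_id': 1, 'amount': 9.5}]

warehouse = Warehouse(SALES_SCHEMA)
warehouse.write({"order_id": 2, "amount": 20.0})
```

The lake happily keeps the support email even though no sales report can use it yet, while the warehouse would reject any row that does not match its schema.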
⑧ Microsoft Fabric: The All-in-One Platform
Microsoft Fabric is the biggest shift in the ecosystem. It's an end-to-end SaaS platform that unifies data engineering, data warehousing, data science, and visualisation under one roof.
- OneLake: Think "OneDrive for data." All your organisation's data, regardless of team, lives in a single unified lake. One copy, governed centrally.
- DirectLake Mode: Power BI reads tables directly from the lake without importing a copy, enabling fast reports even on billions of rows.
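Optional: choosing between Import and DirectLake can be written down as a simple rule of thumb. The sketch below is a deliberate simplification and an assumption of this example, not official guidance: it only checks whether the data already sits in OneLake, and the names `DataSource` and `suggest_storage_mode` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    stored_in_onelake: bool  # True if the data already lives as tables in OneLake


def suggest_storage_mode(source: DataSource) -> str:
    """Rule of thumb only: DirectLake if the data is already in OneLake, else Import."""
    return "DirectLake" if source.stored_in_onelake else "Import"


sales = DataSource("sales_lakehouse_table", stored_in_onelake=True)
budget = DataSource("budget.xlsx", stored_in_onelake=False)

for source in (sales, budget):
    print(f"{source.name}: {suggest_storage_mode(source)}")
```

In practice other factors, such as data volume, refresh needs, and capacity, also influence the choice.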
⑨ Domains, Workspaces & Data Mesh
As organisations grow, they need to divide data logically to avoid a "Data Swamp." Here's how the key concepts fit together:
| Concept | Simple Definition | Role in Visualisation |
|---|---|---|
| Fabric | The entire platform | The environment where you build everything |
| OneLake | Central storage | Where your raw materials are kept |
| Domain | Business department | Defines who owns and validates the data |
| Workspace | Project folder | Where you collaborate with teammates |
| Lakehouse | Structured data lake | Provides clean data for your charts |
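Optional: the relationship between domains, workspaces, and lakehouses in the table can be pictured as nested containers. The sketch below is a toy model with made-up names (Finance, Sales, and so on); it only illustrates the Data Mesh idea that every data product has a clear owning domain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workspace:
    name: str
    lakehouses: list[str] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    workspaces: list[Workspace] = field(default_factory=list)


def owning_domain(domains: list[Domain], lakehouse: str) -> Optional[str]:
    """Return the name of the domain whose workspace contains the lakehouse."""
    for domain in domains:
        for workspace in domain.workspaces:
            if lakehouse in workspace.lakehouses:
                return domain.name
    return None


# Hypothetical organisation layout.
tenant = [
    Domain("Finance", [Workspace("Month-End Reporting", ["finance_lakehouse"])]),
    Domain("Sales", [Workspace("Pipeline Dashboards", ["sales_lakehouse"])]),
]
print(owning_domain(tenant, "sales_lakehouse"))  # Sales
```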
⑩ Practice Quiz: Test Your Knowledge
10 questions covering both parts. At least 6 test practical application.
Key Takeaways
- Power BI handles 6 data categories: Structured, Semi-structured, Unstructured, Spatial, Vector, and Streaming.
- In Import mode, the VertiPaq engine compresses data into an in-memory columnar format for fast visualisation.
- A Data Lake stores everything raw; a Data Warehouse stores clean, structured data; a Lakehouse combines both.
- Microsoft Fabric with OneLake is the modern unified platform replacing fragmented data stacks.
- DirectLake Mode lets Power BI query billions of rows live, with no import and no stale data.
- Data Mesh is a strategy that gives domain teams ownership of their own data products.