Lesson 2: Prepare Data for Analysis with PowerBI
Introduction
For over two decades, Microsoft continues to make deep investments in enterprise business intelligence (BI). Azure Analysis Services (AAS) and SQL Server Analysis Services (SSAS) are based on mature BI data modeling technology used by countless enterprises. The same technology is also at the heart of Power BI data models.
Power BI offers you a choice when designing your model. You can use Power BI Desktop to develop your model, and you can develop it by using different frameworks. These frameworks help to deliver fast performance, near real-time results, or both.
This module introduces the frameworks, their benefits and limitations, and features to help optimize your models. Lastly, it provides you with guidance to help you choose the right framework and features for your project.
Learning objectives
By the end of this module, you’ll be able to:
- Describe Power BI model fundamentals.
- Determine when to develop an import model.
- Determine when to develop a DirectQuery model.
- Determine when to develop a composite model.
- Choose an appropriate Power BI model framework.
1.0 What is a Semantic Model in Power BI?
2.0 Schema Design
3.0 The Relationship Between Fact and Dimension Tables
4.0 Describe Power BI model fundamentals
Module Assessment
Chapter Quiz: Power BI Model Framework
Test your understanding of semantic models, schema design, and model frameworks
Section A: Semantic Models & Schema Design
Section B: Storage Modes & Model Frameworks
Section C: Analytic Queries & Performance
Section D: Model Framework Decision Making
Section E: Advanced Concepts
Section F: Practical Application
Answer Key & Scoring Guide
Section A Answers:
- B - The structure or blueprint behind data
- False - Single-table models are not optimal
- Star, fact, dimension
- B - Store quantitative data with foreign keys
- 1-B, 2-C, 3-A
Section B Answers:
- B - Import, DirectQuery, Dual
- True, True, True, False
- Sales: DirectQuery (real-time, huge data), Product: Dual/Import (small, stable)
Section C Answers:
- Filter, Group, Summarize
- B - 1 GB
- False - 48 times is for dedicated capacity, 8 times for shared
Section D Answers:
- B - Small to medium datasets
- C - Combine different storage modes
- DirectQuery, Import, Import, DirectQuery
Section E Answers:
- Sample: Star schema has flat dimension tables directly connected to fact tables. Snowflake schema normalizes dimensions into multiple related tables.
- B - No guaranteed "one" side between source groups
- False - It's the opposite
Section F Answer:
Sample recommendations:
- Daily Sales: DirectQuery (volume, real-time needs)
- Product Catalog: Dual (moderate size, shared across models)
- Store Info: Import (small, stable)
- Customer Data: Import/Dual (depends on size)
- Financial Targets: Import (small, quarterly updates)
📊 Scoring:
- 18-16 correct: Excellent - You have mastered Power BI model frameworks!
- 15-12 correct: Good - Solid understanding, review a few concepts
- 11-8 correct: Fair - Need to revisit key concepts
- Below 8: Review Required - Go through the chapter again
🎓 Congratulations on completing the Power BI Model Framework chapter!
📋 Detailed Answer Explanations
Section A: Semantic Models & Schema Design
Q1: What is a Semantic Model?
Answer: B - The structure or blueprint behind data that organizes it for meaningful visualizations
Explanation: A semantic model serves as the foundational structure that transforms raw data into a format optimized for analytics. It defines relationships, hierarchies, and business logic that make data understandable and queryable for report creation.
Q2: Single-table models are optimal
Answer: False
Explanation: While single-table models work for exploratory purposes, they lack the efficiency and flexibility of well-structured multi-table schemas. Optimal models follow star schema design with separate fact and dimension tables for better performance and maintenance.
Q3: Schema design terminology
Answer: Star schema, fact tables, dimension tables
Explanation: Star schema gets its name from the visual appearance - the central fact table resembles the center of a star, while dimension tables extend outward like star points, creating direct relationships without complex joins.
Q4: Fact Tables definition
Answer: B - Store quantitative data used for analysis with foreign keys to dimension tables
Explanation: Fact tables are the backbone of analytical models, containing measurable, numeric data (like sales amounts, quantities) and foreign keys that connect to dimension tables for context and filtering.
Q5: Fact Table Types
Answers: 1-B, 2-C, 3-A
Explanation:
- Transaction: Captures every individual event (each sale, each click)
- Periodic Snapshot: Takes a "picture" at regular intervals (month-end balances)
- Accumulating Snapshot: Tracks a process journey (order → ship → deliver)
Section B: Storage Modes & Model Frameworks
Q6: Three storage modes
Answer: B - Import, DirectQuery, Dual
Explanation: These three modes determine where and how data is stored and accessed:
- Import: Data copied into Power BI's memory
- DirectQuery: Data stays at source, queries sent live
- Dual: Power BI chooses optimal method per query
Q7: Storage mode characteristics
Answers: True, True, True, False
Explanation: The first three statements correctly describe each mode. Import mode does NOT provide automatic real-time updates - it requires scheduled refresh to get new data.
Q8: Scenario storage mode selection
Sample Answer:
- Sales Transactions: DirectQuery because 2 billion rows changing every minute requires real-time access without overwhelming memory
- Product Catalog: Import/Dual because it's small and stable, allowing fast query performance while supporting hybrid scenarios
Section C: Analytic Queries & Performance
Q9: Analytic query phases
Answer: Filter, Group, Summarize
Explanation: This sequence is logical - first narrow data (filter), then organize it (group), finally calculate results (summarize). This order ensures efficient processing and meaningful results.
Q10: Shared capacity dataset limit
Answer: B - 1 GB
Explanation: Power BI imposes this limit on shared capacity to ensure resource availability for all users. Premium capacity allows larger datasets (10GB+) with proper configuration.
Q11: Shared capacity refresh frequency
Answer: False
Explanation: Shared capacity allows up to 8 refreshes per day, while Premium/dedicated capacity supports up to 48 refreshes daily.
Section D: Model Framework Decision Making
Q12: When to choose Import Model
Answer: B - Small to medium datasets that don't need real-time updates
Explanation: Import models excel with manageable data volumes where blazing-fast performance matters more than real-time updates. They offer maximum flexibility and best query performance.
Q13: Composite Model benefits
Answer: C - Combine different storage modes for optimal performance
Explanation: Composite models provide the flexibility to use Import for small, stable dimensions and DirectQuery for large, changing fact tables, optimizing both performance and data freshness.
Q14: Scenario storage modes
Sample Answers:
- Sales Records: DirectQuery (massive volume, hourly updates)
- Employee Directory: Import (small, rarely changes)
- Budget Targets: Import (small, quarterly updates)
- Inventory Levels: DirectQuery (real-time requirements)
Section E: Advanced Concepts
Q15: Star vs Snowflake Schema
Sample Answer: Star schema uses flat dimension tables directly connected to the fact table, prioritizing query performance and simplicity. Snowflake schema normalizes dimensions into multiple related tables, reducing redundancy but increasing complexity.
Q16: Limited Relationship
Answer: B - A relationship with no guaranteed "one" side, often between different source groups
Explanation: Limited relationships occur in composite models where tables from different source groups connect, or when using many-to-many relationships. They can affect query evaluation and performance.
Q17: Storage mode conversion
Answer: False
Explanation: You can convert DirectQuery tables to Import, but not the reverse. This is because converting Import to DirectQuery would require rebuilding the entire model structure and data source connections.
Section F: Practical Application
Q18: Case Study Storage Mode Recommendations
Sample Solution:
Table | Storage Mode | Justification |
Daily Sales | DirectQuery | 10M daily transactions = massive volume requiring real-time access; Import would hit size limits and refresh challenges |
Product Catalog | Dual | 50K products is manageable size; weekly updates allow caching; Dual mode supports both fast queries and real-time when needed |
Store Info | Import | Only 500 stores, rarely changes; Perfect for Import to maximize query performance for filtering and grouping |
Customer Data | Import/Dual | 2M customers is borderline; Import if under size limits, Dual if approaching limits; Occasional address changes acceptable with scheduled refresh |
Financial Targets | Import | Small dataset, quarterly updates; No real-time requirements; Import provides best performance for target vs actual analysis |
🔧 Performance Optimization Tips
For DirectQuery Tables:
- Create appropriate indexes on source database
- Use materialized views for common aggregations
- Implement query folding optimizations
- Monitor source system performance impact
For Import Tables:
- Apply data reduction techniques (remove unnecessary columns/rows)
- Optimize data types (prefer numeric over text)
- Use incremental refresh for large tables
- Schedule refreshes during off-peak hours
For Composite Models:
- Set shared dimensions to Dual mode
- Use Import aggregation tables for performance
- Monitor cross-source-group relationship performance
- Implement proper security across all source groups