Power BI Data Loading - Detailed Guide
DATA LOADING - Detailed Explanation
Let me break down each step with practical details, examples, and best practices:
1. Identify all data sources needed for the project
What this means:
Before you start connecting to any data, you need to map out ALL the data sources your project requires. This is strategic planning before execution.
Why it matters:
- Prevents multiple trips back to stakeholders asking for more data
- Helps you understand data dependencies early
- Allows you to plan for different connection methods
- Identifies potential data quality issues before you start building
How to do it:
- Step 1: Understand the business requirements
- What questions need to be answered?
- What metrics need to be tracked?
- What time periods are needed?
- Who are the end users and what do they need to see?
- Step 2: List out all required data entities
Example for a Sales Report:
- Sales transactions (fact data)
- Products/inventory information
- Customer details
- Sales territories
- Dates (you’ll create this, but note it)
- Targets/budgets
- Employee/sales rep information
- Step 3: Map each entity to its source location
Example mapping:
Sales Transactions → SQL Server Database (Sales_DB → dbo.Transactions)
Products → Excel file (SharePoint → ProductMaster.xlsx)
Customers → CRM System (via API or export)
Territories → Excel file (Local → Territories.xlsx)
Targets → Excel file (SharePoint → Monthly_Targets.xlsx)
Employees → SQL Server (HR_DB → dbo.Employees)
- Step 4: Document access requirements
- Do you have access to each source?
- What credentials are needed?
- Are any sources behind firewalls?
- Do you need approval or special permissions?
Common Pitfall:
Starting to build before identifying all sources, then having to rebuild the model when new data sources are discovered.
2. Use appropriate connectors for each source
What this means:
Power BI has specific connectors optimized for different data sources. Using the right connector ensures better performance, query folding, and fewer errors.
Available Connector Types:
File-based connectors:
- Excel: For .xlsx, .xls files
- CSV/Text: For .csv, .txt files
- XML: For XML files
- JSON: For JSON files
- PDF: For extracting tables from PDFs
- Folder: For loading multiple files from a directory
Database connectors:
- SQL Server: For Microsoft SQL databases
- Oracle: For Oracle databases
- MySQL/PostgreSQL: For these database systems
- Azure SQL Database: For cloud-based SQL
- Access: For Microsoft Access databases
Cloud & Online Services:
- SharePoint Online List/Folder: For SharePoint data
- OneDrive: For files stored in OneDrive
- Dataverse: For Power Platform data
- Dynamics 365: For Dynamics data
- Salesforce: For CRM data
Web-based:
- Web: For web pages and APIs
- OData Feed: For OData services
- REST APIs: Using Web connector with authentication
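There is no dedicated generic REST connector; the usual pattern is the Web connector plus Json.Document in M. A minimal sketch, assuming a hypothetical endpoint that returns a JSON array of records:

let
    // hypothetical endpoint; swap in your API's URL and authentication header
    Source = Json.Document(
        Web.Contents("https://api.example.com/v2/sales",
            [Headers = [Accept = "application/json"]])),
    // works when the API returns a list of records
    AsTable = Table.FromRecords(Source)
in
    AsTable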
How to choose the right connector:
Example 1: Excel file stored on SharePoint
- Wrong approach: Download file locally and use Excel connector
- Right approach: Use SharePoint Folder connector
- Why: Direct connection means automatic updates, no manual downloads
Example 2: SQL Server data
- Wrong approach: Export to Excel then load Excel
- Right approach: Use SQL Server connector directly
- Why:
- Query folding (transformations happen in database, much faster)
- Always up-to-date
- Can handle much larger datasets
- Better performance
Example 3: Multiple Excel files in a folder with same structure
- Wrong approach: Load each file separately
- Right approach: Use Folder connector, then combine files
- Why: Automatic inclusion of new files, one transformation for all
How to access connectors:
Method 1: Home Ribbon
1. Open Power BI Desktop
2. Home tab → Get Data → More…
3. Search for your connector type
4. Click Connect

Method 2: Recent Sources
Home tab → Get Data → Recent Sources gives quick access to previously used sources.
Connector Best Practices:
Import vs DirectQuery:
- Import Mode (default): Data is loaded into Power BI's memory
  - Use when: dataset is under 1GB, data doesn't change frequently, need fast performance
- DirectQuery Mode: Queries go to the source each time a visual is refreshed
  - Use when: dataset is very large (multi-GB), need real-time data, data changes constantly
Connection Settings: When connecting, you’ll see options like:
SQL Server connection example:
Server: myserver.database.windows.net
Database: Sales_DB
Data Connectivity mode:
○ Import (recommended)
○ DirectQuery
Advanced options:
- SQL statement (optional)
- Command timeout
- Relationship columns

3. Load data to Power Query (not directly to model) for transformation
What this means:
When you connect to a data source, Power BI gives you two options:
1. Load: Brings data straight into the model
2. Transform Data: Opens Power Query Editor first
Always choose Transform Data (or at minimum, load to Power Query before loading to model).
Why this is critical:
Reason 1: Data is rarely clean. Raw data usually has:
- Wrong data types (dates stored as text)
- Extra rows (headers, footers, blank rows)
- Inconsistent formatting (spaces, mixed case)
- Unnecessary columns
- Null values that need handling
Reason 2: Performance optimization. Transforming in Power Query is more efficient than creating calculated columns in DAX later.
Reason 3: Documentation and maintenance. Power Query records every transformation step, making it easy to see what was done and to modify it later.
How to do it:
When first connecting:
1. Click "Get Data" → Select your source
2. Navigate to your file/table
3. You'll see a preview window
4. At the bottom, you have options:
- Load (DON'T click this)
- Transform Data (CLICK THIS)
- Load To... (for advanced scenarios)

If you accidentally clicked Load:
1. Right-click the query in the Queries pane (right side)
2. Select "Edit Query"
3. Power Query Editor will open

Understanding the Power Query Editor:
When you open Transform Data, you see:
- Left Pane: List of all queries
- Center: Data preview
- Right Pane: Query Settings
  - Properties (query name)
  - Applied Steps (all transformations)
Applied Steps Example:
Applied Steps:
1. Source (connected to data)
2. Navigation (selected specific sheet/table)
3. Promoted Headers (first row became column names)
4. Changed Type (auto-detected data types)
← These are automatic
Below this, you add your own steps:
5. Removed Columns
6. Filtered Rows
7. Replaced Values
... and so on
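Behind the scenes, those steps are a single M expression you can view via Home → Advanced Editor. A minimal sketch of what a generated query might look like (the file path, sheet name, and column names are illustrative):

let
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
    Navigation = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    // automatic steps
    #"Promoted Headers" = Table.PromoteHeaders(Navigation, [PromoteAllScalars = true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",
        {{"OrderDate", type date}, {"Amount", type number}}),
    // your own steps continue from here (remove columns, filter rows, ...)
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type", {"Notes"})
in
    #"Removed Columns"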
The Power Query Workflow:
Data Source → Power Query Editor → Apply Transformations → Close & Apply → Data Model

NOT this:
Data Source → Data Model (too late to transform efficiently)

4. Check data preview to understand structure and data types
What this means:
Before you start any transformations, you need to thoroughly examine your data to understand what you’re working with.
Why it matters:
- Helps you plan your transformations
- Identifies data quality issues early
- Prevents errors downstream
- Informs your data model design
What to look for:
1. Column Headers:
- Are headers in the first row, or do you need to promote them?
- Are header names clear and descriptive?
- Are there duplicate column names?
- Are there special characters or spaces in names?
Example issue:
Before promoting headers:
Column1 Column2 Column3
ProductID Product Price
12345 Widget A 50
After promoting headers:
ProductID Product Price
12345 Widget A 50

2. Data Types (CRITICAL): Look at the icon next to each column name:
- ABC = Text
- 123 = Whole Number
- 1.2 = Decimal Number
- 📅 = Date
- 🕐 = Date/Time
- ✓/✗ = True/False
- $ = Currency (if explicitly set)
Common data type issues:
- Dates stored as text (e.g., "01/15/2024" shows the ABC icon instead of the date icon)
- Numbers stored as text (often caused by currency symbols or thousands separators)
- ID columns detected as numbers (leading zeros get dropped; they should be Text)
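Fixing these is usually one explicit Changed Type step; auto-detection, which looks at only the first 200 rows, can misfire. A minimal sketch in M, using the ProductID/Product/Price columns from the example above:

ChangedType = Table.TransformColumnTypes(PromotedHeaders,
    {{"ProductID", type text},   // keep IDs as text so leading zeros survive
     {"Product", type text},
     {"Price", Currency.Type}})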
3. Data Quality Indicators:
Power BI shows three indicators for each column: - Column Quality: Valid, Error, Empty percentages - Column Distribution: Histogram showing value distribution - Column Profile: Detailed statistics
How to enable: - Go to View tab in Power Query - Check: Column Quality, Column Distribution, Column Profile
Example interpretation:
Product Name column:
Quality: 98% valid, 0% error, 2% empty
Distribution: Shows most products appear 50-100 times
Profile: 1,250 distinct products out of 50,000 rows
Action needed: Decide how to handle 2% empty values

4. Sample Data: Scroll through the preview to see:
- Are there blank rows that need removal?
- Are there subtotal/total rows mixed in?
- Is the data consistent, or are there format changes midway?
- Are there any obvious errors or outliers?
5. Row Count: Bottom left shows "Showing top 1000 rows"
- Power Query preview shows only the first 1000 rows by default
- Your transformations apply to ALL rows, not just the preview (see the profiling sketch after the checklist below)
Practical Checklist:
☐ Headers are in the correct row
☐ No blank columns
☐ No blank rows
☐ All data types are logical for each column
☐ Date columns are Date type (not text)
☐ Number columns are Number type (not text)
☐ ID columns are Text type (not number)
☐ No obvious errors or outliers
☐ Understand what nulls/blanks represent
☐ Know the grain of the data (what does each row represent?)
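Because the preview and quality bars are based on a sample, it can help to profile the full table from a throwaway query. A minimal sketch, assuming your query is named Sales:

// min, max, average, null count, and distinct count for every column,
// computed over all rows rather than the 1000-row preview
SalesProfile = Table.Profile(Sales)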
5. Apply folder/file filtering if loading from folders
What this means:
When you use the Folder connector to load multiple files, you often need to filter which files are actually loaded to avoid bringing in unwanted data.
When you use Folder connector:
Scenario 1: Multiple sales files
Folder structure:
📁 Sales Reports
├─ Sales_January_2024.xlsx
├─ Sales_February_2024.xlsx
├─ Sales_March_2024.xlsx
├─ Budget_2024.xlsx (don't want this)
└─ Archive_2023.xlsx (don't want this)

Scenario 2: Daily exports
📁 Daily Exports
├─ Export_2024-01-01.csv
├─ Export_2024-01-02.csv
├─ Export_2024-01-03.csv
... hundreds of files
└─ Export_2024-12-31.csv

How to connect to folder:
Step 1: Get Data → Folder
1. Click Get Data
2. Select "Folder"
3. Browse or paste folder path
4. Click OK

Step 2: You'll see a list of all files. The preview shows columns:
- Content (binary file)
- Name (filename)
- Extension (.xlsx, .csv, etc.)
- Date accessed
- Date modified
- Date created
- Attributes
- Folder Path
Step 3: Filter before combining
Filtering Techniques:
Method 1: Filter by extension
1. Click dropdown on "Extension" column
2. Uncheck extensions you don't want
3. Example: Keep only .xlsx, uncheck .pdf, .docx, etc.

Method 2: Filter by filename pattern
1. Click dropdown on "Name" column
2. Text Filters → Contains (or Begins With)
3. Example: Contains "Sales_" to keep only sales files
Method 3: Filter by date
1. Click dropdown on "Date modified" column
2. Date Filters → After
3. Select date (e.g., 01/01/2024 to get only current year files)
Useful for:
- Loading only recent files
- Excluding old versions
- Incremental loads

Method 4: Keep/Remove specific files
1. Click on Name column values
2. Right-click → Remove (to exclude specific files)
3. Or use "Keep Only Selected Rows"

After Filtering - Combine Files:
Step 4: Combine Files
1. After filtering, click "Combine Files" button at bottom
2. Power BI shows "Combine Files" dialog
3. Select the sheet/table to combine from
4. Click OK

Power BI will:
- Create a function to extract data from each file
- Apply same transformations to all files
- Combine all files into one table
- Add "Source.Name" column showing which file each row came from
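Under the hood this is roughly what happens. A minimal manual sketch for CSV files (the folder path is a placeholder; the column names follow the example result below):

let
    Source = Folder.Files("\\server\share\DailyExports\"),
    CsvOnly = Table.SelectRows(Source, each [Extension] = ".csv"),
    // parse each file and keep its name so every row stays traceable
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    Kept = Table.SelectColumns(Parsed, {"Name", "Data"}),
    Expanded = Table.ExpandTableColumn(Kept, "Data",
        {"Order ID", "Product", "Amount"})
in
    Expanded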
Example Result:
Order ID | Product | Amount | Source.Name
1001 | Widget | 100 | Sales_January_2024.xlsx
1002 | Gadget | 200 | Sales_January_2024.xlsx
1003 | Widget | 150 | Sales_February_2024.xlsx
1004 | Tool | 300 | Sales_February_2024.xlsx

Common Folder Loading Patterns:
Pattern 1: Same structure monthly files
File naming: Sales_YYYY_MM.xlsx
Filter: Contains "Sales_"
Result: All sales files combined automatically
When a new month's file is added, it's automatically included in the next refresh

Pattern 2: Daily transaction files
File naming: Transactions_YYYY-MM-DD.csv
Filter: Extension = .csv AND Date modified > last year
Result: Only recent CSV files loaded
Old files ignored automatically

Pattern 3: Multiple product category files
Files: Electronics.xlsx, Clothing.xlsx, Furniture.xlsx, Template.xlsx
Filter: Does not contain "Template"
Result: All product files except template
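In M, all three patterns are just row filters applied to the file list before combining. A minimal sketch (the folder path and cutoff date are placeholders):

let
    Source = Folder.Files("\\sharepoint\sites\Sales\Reports\"),
    // Pattern 1: keep only the monthly sales workbooks
    SalesOnly = Table.SelectRows(Source, each Text.Contains([Name], "Sales_")),
    // Pattern 2: keep only recently modified files
    RecentOnly = Table.SelectRows(SalesOnly, each [Date modified] >= #datetime(2024, 1, 1, 0, 0, 0)),
    // Pattern 3: exclude template files
    NoTemplates = Table.SelectRows(RecentOnly, each not Text.Contains([Name], "Template"))
in
    NoTemplates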
Best Practices:
1. Filter early. Don't combine all files then filter later; filter the file list BEFORE combining to improve performance.
2. Use consistent file naming
Good naming convention:
Sales_2024_01.xlsx
Sales_2024_02.xlsx
Sales_2024_03.xlsx
Bad naming convention:
January Sales.xlsx
Feb2024sales.xlsx
March_SalesData.xlsx

3. Keep file structure consistent. All files should have:
- Same columns
- Same column names
- Same data types
- Headers in same position
4. Test with a sample. Before loading hundreds of files, test your filter logic with just a few files first.
5. Document your filters. Add a note explaining why certain files are excluded (for future reference).
6. Document source connections
What this means:
Keep a clear record of WHERE your data comes from and HOW to access it. This is essential for maintenance, troubleshooting, and handover to other team members.
Why it matters:
- When refresh fails: You know exactly where to check
- When moving between environments: (Dev to Production) you can easily update connections
- When someone else takes over: They understand the data sources
- For audit purposes: Trace where data originates
- For troubleshooting: Quickly identify which source is causing issues
What to document:
For Each Data Source, Record:
1. Source Type - SQL Server - Excel file - SharePoint list - API endpoint - Web URL - etc.
2. Connection Details
For SQL Server:
Server Name: sql-prod-server.database.windows.net
Database Name: SalesDB
Authentication Type: Windows Authentication / SQL Server / Azure AD
Service Account: domain\serviceaccount (if using service account)
Tables/Views Used:
- dbo.FactSales
- dbo.DimProduct
- dbo.DimCustomer

For Excel/CSV Files:
File Location: \\sharepoint\sites\Sales\Documents\
File Name: ProductMaster.xlsx
Sheet/Table Name: Products
Last Updated: 2024-01-15
Owner: John Doe (john.doe@company.com)
Update Frequency: Monthly (1st of each month)

For SharePoint:
Site URL: https://company.sharepoint.com/sites/Sales
Library: Shared Documents
Folder Path: /Reports/Monthly/
File Pattern: Sales_*.xlsx
Authentication: Organizational Account

For APIs:
API Endpoint: https://api.company.com/v2/sales
Authentication Method: API Key / OAuth2
API Key Location: Stored in Azure Key Vault
Rate Limits: 100 requests/hour
Documentation: https://api.company.com/docs

3. Access Credentials
Document credential TYPE (not the actual passwords):
SQL Server: Uses SQL Authentication
- Username: svc_powerbi (stored in gateway)
- Password: Managed by IT Security team
SharePoint: Uses OAuth
- Account: powerbi@company.com
- MFA: Enabled
API: Uses Bearer Token
- Token refresh: Every 24 hours
- Token source: Azure AD

4. Dependencies and Requirements
On-Premises Gateway Required: Yes/No
Gateway Name: Production-Gateway-01
Firewall Rules: IP whitelist required
VPN Connection: Required for SQL Server access
File Permissions: Read-only access needed

5. Data Refresh Schedule
Source Data Updates: Daily at 6:00 AM
Power BI Refresh: Daily at 7:00 AM
Lag Time: 1 hour
Historical Data: 2 years rolling

Where to document:
Option 1: In Power BI Desktop (Recommended)
In Power Query:
1. Right-click query
2. Properties
3. Add detailed description
Example description:
"Source: SQL Server (sql-prod.company.com\SalesDB)
Table: dbo.FactSales
Updated: Daily at 6 AM
Owner: Data Team (data@company.com)
Contains: Sales transactions from Jan 2022 onwards
Refresh: Scheduled daily at 7 AM via Production Gateway"

Option 2: External Documentation
Create a document (Word, Excel, OneNote, SharePoint) with a connection inventory:
| Query Name | Source Type | Connection String | Update Frequency | Owner | Notes |
|------------|-------------|-------------------|------------------|-------|-------|
| Sales | SQL Server | sql-prod.company | Daily | IT | Prod |
| Products | SharePoint | /sites/Sales/... | Monthly | Sales | ... |

Option 3: README in project folder
If storing the .pbix file in OneDrive/SharePoint, include a README.md:
# Sales Dashboard - Data Sources

## Sales Data
- **Source:** SQL Server
- **Server:** sql-prod.company.com
- **Database:** SalesDB
- **Table:** dbo.FactSales
- **Refresh:** Daily at 7 AM
## Product Master
- **Source:** Excel file
- **Location:** SharePoint /sites/Sales/ProductMaster.xlsx
- **Owner:** sales.team@company.com
- **Update:** Monthly, 1st of month

Documentation Template:
For each source, capture at minimum: source type, connection details, credential type (never the actual passwords), dependencies (gateway, VPN, firewall rules), refresh schedule, and the owner's contact.
Best Practices:
1. Update documentation when sources change. Don't let documentation become stale.
2. Include contact information. Always note who to contact for each data source.
3. Note known issues
"Note: SQL Server goes offline for maintenance every Sunday 2-4 AM"
"File may be locked if Sales team is editing during business hours"4. Document workarounds
"If SharePoint connection fails, use local copy at C:\Backup\"
"If API rate limit exceeded, increase interval between refreshes"5. Version control for connection strings If moving between Dev/Test/Prod, document all versions:
DEV: sql-dev.company.com
TEST: sql-test.company.com
PROD: sql-prod.company.com

7. Set up parameters for dynamic file paths or server connections
What this means:
Instead of hardcoding connection details (like server names, file paths, database names) directly into your queries, you store them in parameters that can be easily changed without editing every query.
Why this is crucial:
Problem without parameters:
You build a report connecting to:
Server: sql-dev-server.database.windows.net
Database: SalesDB_Dev
When moving to production, you need to:
1. Find every query connected to the dev server
2. Edit each connection manually
3. Change to: sql-prod-server.database.windows.net
4. Change database to: SalesDB_Prod
5. Hope you didn't miss any!

Solution with parameters:
You create parameters:
ServerName = sql-dev-server.database.windows.net
DatabaseName = SalesDB_Dev
All queries reference these parameters
To move to production:
1. Change ServerName parameter to sql-prod-server
2. Change DatabaseName parameter to SalesDB_Prod
3. Done! All queries automatically use the new values

When to use parameters:
- Server names (Dev vs Production)
- Database names
- File paths (local vs SharePoint vs network drive)
- Date ranges (StartDate, EndDate for filtered queries)
- API endpoints (different environments)
- Folder locations (when using Folder connector)
How to create parameters:
Method 1: Create Parameter Manually
Step 1: Open Power Query Editor (Transform Data)
Step 2: Home tab → Manage Parameters → New Parameter
Step 3: Fill in parameter details

Parameter Configuration:
Name: ServerName
Description: SQL Server name for database connection
Required: Yes (must have a value)
Type: Text
Suggested Values: Any value
Current Value: sql-dev-server.database.windows.net
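Under the hood, a parameter is an ordinary query whose value carries metadata that Power BI reads. A minimal sketch of what the Advanced Editor shows for the ServerName parameter above:

"sql-dev-server.database.windows.net" meta [
    IsParameterQuery = true,
    Type = "Text",
    IsParameterQueryRequired = true
]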
Common Parameter Types:
1. Server/Connection Parameters:
Parameter: ServerName
Type: Text
Current Value: sql-prod-server.database.windows.net
Parameter: DatabaseName
Type: Text
Current Value: SalesDB
Parameter: Port
Type: Number
Current Value: 1433

2. File Path Parameters:
Parameter: FolderPath
Type: Text
Current Value: \\sharepoint\sites\Sales\Documents\
Parameter: FileName
Type: Text
Current Value: ProductMaster.xlsx

3. Date Range Parameters:
Parameter: StartDate
Type: Date
Current Value: 2024-01-01
Parameter: EndDate
Type: Date
Current Value: 2024-12-31

4. Environment Parameters:
Parameter: Environment
Type: Text
Suggested Values: List of values
- Development
- Testing
- Production
Current Value: Production

How to use parameters in queries:
Example 1: SQL Server Connection
Without Parameters (Hardcoded):
Source = Sql.Database("sql-prod-server.database.windows.net", "SalesDB")With Parameters:
Source = Sql.Database(ServerName, DatabaseName)

Step-by-step to replace with parameters:
1. In Power Query, find the Source step
2. Click the gear icon next to "Source" in Applied Steps
3. This opens the connection dialog
4. Instead of typing server name, click "fx" button
5. Select parameter from dropdown
6. Repeat for database name
7. Click OK

Example 2: File Path
Without Parameters:
Source = Excel.Workbook(File.Contents("C:\Users\YourName\Documents\Sales.xlsx"))With Parameters:
Source = Excel.Workbook(File.Contents(FolderPath & FileName))

Here, & concatenates the folder path and file name.
Example 3: Folder Connector
Without Parameters:
Source = Folder.Files("\\sharepoint\sites\Sales\Reports\2024\")With Parameters:
Source = Folder.Files(FolderPath)

Example 4: Date Filtering in SQL Query
Without Parameters:
Source = Sql.Database("server", "db", [Query="
SELECT *
FROM Sales
WHERE OrderDate >= '2024-01-01'
AND OrderDate <= '2024-12-31'
"])With Parameters:
Source = Sql.Database(ServerName, DatabaseName, [Query="
SELECT *
FROM Sales
WHERE OrderDate >= '" & Date.ToText(StartDate, "yyyy-MM-dd") & "'
AND OrderDate <= '" & Date.ToText(EndDate, "yyyy-MM-dd") & "'
"])Advanced Parameter Scenarios:
Scenario 1: Environment-Based Connection
Create a parameter that drives multiple connection strings:
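A minimal sketch in M, reusing the Dev/Test/Prod server names from this guide (adjust to your own):

// derive the server name from the single Environment parameter
ServerName =
    if Environment = "Production" then "sql-prod-server.database.windows.net"
    else if Environment = "Testing" then "sql-test-server.database.windows.net"
    else "sql-dev-server.database.windows.net",
Source = Sql.Database(ServerName, DatabaseName)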
Now you only change ONE parameter (Environment) and all connections update.
Scenario 2: Dynamic Date Parameters
Instead of manually setting dates, make them relative:
Parameter: MonthsBack
Type: Number
Current Value: 6
Then in query:
StartDate = Date.AddMonths(DateTime.LocalNow(), -MonthsBack)
EndDate = DateTime.LocalNow()

This automatically gives you the last 6 months without manual updates.
Scenario 3: Multiple File Patterns
Parameter: FilePattern
Type: Text
Current Value: Sales_
Then in Folder query:
Source = Folder.Files(FolderPath),
FilteredRows = Table.SelectRows(Source, each Text.Contains([Name], FilePattern))

Change FilePattern to load different file sets.
Parameter Best Practices:
1. Naming Convention
Good:
- ServerName (clear what it is)
- FolderPath (descriptive)
- StartDate (obvious purpose)
Bad:
- Param1 (not descriptive)
- Server (too generic)
- S (unclear)

2. Add Descriptions. Always fill in the Description field:
Name: ServerName
Description: SQL Server hostname for Sales database connection.
Use sql-dev for development, sql-prod for production.

3. Use Suggested Values for limited options
Parameter: Environment
Suggested Values: List of values
- Development
- Testing
- Production
This creates a dropdown instead of free text entry.

4. Document parameter dependencies
In query description:
"This query uses parameters: ServerName, DatabaseName, StartDate
Change parameters before moving to production environment"

5. Group parameters logically. Create a naming structure:
Environment parameters:
- ENV_ServerName
- ENV_DatabaseName
Date parameters:
- DATE_StartDate
- DATE_EndDate
File parameters:
- FILE_FolderPath
- FILE_Pattern

How Parameters Help in Deployment:
Development Environment:
ServerName = sql-dev-server.database.windows.net
DatabaseName = SalesDB_Dev
FolderPath = C:\Dev\Data\

Test Environment:
ServerName = sql-test-server.database.windows.net
DatabaseName = SalesDB_Test
FolderPath = \\test-share\Data\

Production Environment:
ServerName = sql-prod-server.database.windows.net
DatabaseName = SalesDB
FolderPath = \\sharepoint\sites\Production\Data\

Simply change the parameter values, refresh, and you're in a different environment - no query editing required!
Parameters in Power BI Service:
After publishing to Power BI Service:
1. Go to workspace
2. Find your dataset
3. Settings → Parameters
4. Change parameter values
5. Apply
6. Refresh dataset

This lets you change connections without re-publishing the report!
8. Avoid loading unnecessary columns at this stage
What this means:
When connecting to a data source, only bring in the columns you actually need for your analysis. Leave behind anything that won’t be used in your report.
Why this is critical:
Performance Impact:
Scenario: SQL table with 50 columns, 10 million rows
Loading all 50 columns:
- Model size: 800 MB
- Refresh time: 15 minutes
- Query performance: Slow
Loading only 10 needed columns:
- Model size: 200 MB
- Refresh time: 4 minutes
- Query performance: Fast

Memory Impact: In Import mode, Power BI loads the entire dataset into RAM. Unnecessary columns waste memory that could be used for actual analysis.
Maintainability: Fewer columns = cleaner model = easier to understand and maintain.
When to filter columns:
Best Time: During initial connection (BEFORE Power Query transformations)
When you connect to a source, you often see a “Navigator” or preview screen. This is your chance to select only needed columns.
How to select specific columns:
Method 1: During Initial Connection (SQL Server Example)
1. Get Data → SQL Server → pick your table in the Navigator
2. Click Transform Data (not Load)
3. Make "Choose Columns" your very first step, so only the needed columns are loaded
Method 2: Using SQL Query (Advanced)
Instead of loading entire table, write a SELECT statement:
Instead of loading table "Sales" with all columns:
Write custom SQL:
SELECT
OrderID,
OrderDate,
CustomerID,
ProductID,
Quantity,
UnitPrice,
TotalAmount
FROM dbo.Sales
WHERE OrderDate >= '2022-01-01'

This approach:
- Only loads specified columns
- Can also filter rows in the database (faster)
- Takes advantage of query folding
How to use SQL query:
When connecting to SQL Server:
1. Advanced options → SQL statement
2. Enter your SELECT query
3. Click OK

Method 3: In Power Query Editor
If columns are already loaded:
1. Home tab → Choose Columns
2. Untick the columns you don't need, then click OK
3. Or select the unwanted columns and use Remove Columns
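Either action becomes a single step in M. A minimal sketch (column names follow the Sales examples below):

// Choose Columns: keep only the listed columns
ChosenColumns = Table.SelectColumns(Source,
    {"OrderID", "OrderDate", "CustomerID", "ProductID", "Quantity", "UnitPrice"}),
// Remove Columns: drop specific unwanted columns instead
RemovedColumns = Table.RemoveColumns(Source, {"CreatedBy", "ModifiedBy", "RowVersion"})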
Which columns to remove:
Definitely Remove:
1. Internal system columns
- CreatedBy (system user who created record)
- ModifiedBy (system user who updated record)
- CreatedTimestamp (not needed if you have OrderDate)
- ModifiedTimestamp (rarely needed in reporting)
- RowVersion (for database concurrency)
- RecordID (internal tracking)

2. Redundant columns
If you have FirstName and LastName, you don't need:
- FullName (can create this in Power Query if needed)
If you have Quantity and UnitPrice, you don't need:
- TotalAmount (calculate with DAX measure instead)

3. Overly detailed columns you won't use
From Customer table:
- Remove: BillingAddress2, BillingAddress3, MiddleName, Suffix
- Keep: CustomerID, CustomerName, City, State, Country

4. Binary/blob columns
- Photos, images (unless specifically needed)
- Attachments
- Large text fields (notes, comments) if not used

5. Denormalized columns (that will come from dimension tables)
In Sales fact table:
- Remove: ProductName, ProductCategory, ProductColor
- Keep: ProductID only
- Why: You'll get product details from the Product dimension table

Columns to ALWAYS keep:
1. Keys for relationships
In Sales table:
- Keep: OrderID, CustomerID, ProductID, DateKey, StoreID
- These create relationships to dimension tables

2. Metrics/measures (numbers to analyze)
- Quantity
- UnitPrice
- DiscountAmount
- TaxAmount
- TotalAmount (unless it can be derived from Quantity and UnitPrice, as noted above)

3. Date/time fields
- OrderDate
- ShipDate
- DeliveryDate

4. Status/flag fields if used in filtering
- OrderStatus (Pending, Shipped, Delivered)
- IsCancelled
- PaymentStatus

Example Comparison:
Bad: Loading Everything
Sales Table (loaded all 50 columns):
The 8 columns below plus audit columns (CreatedBy, ModifiedBy, RowVersion),
redundant descriptive text from dimension tables, and dozens of other unused fields
Result:
- Much larger model and slower refresh (see the 50-column scenario above)
- Cluttered field list
Good: Loading Only What’s Needed
Sales Table (loaded 8 columns):
OrderID, OrderDate, CustomerID, ProductID, Quantity, UnitPrice,
OrderStatus, TotalAmount
Result:
- Model size: 120 MB
- Refresh: 3 minutes
- Clean field list
- Other details come from Customer and Product dimension tables

Special Case: Choose Columns vs Remove Columns
Use "Choose Columns" when:
- You know exactly which columns you need
- You want to keep only 5-10 columns from a table with 50 columns
- Starting fresh with a clear list

Use "Remove Columns" when:
- You want to keep most columns and remove just a few
- You're removing 3-5 specific columns from a 20-column table

Use "Remove Other Columns" when:
- You want to keep specific columns and don't care what else exists
- Working with unfamiliar tables where you've found the columns you need
Column Selection Patterns by Table Type:
Fact Tables (Transaction Tables):
Keep:
✓ All foreign keys (IDs for relationships)
✓ Date columns
✓ Numeric measures (quantities, amounts)
✓ Status flags (if used for filtering)
Remove:
✗ Descriptive text from related tables
✗ Calculated fields that can be DAX measures
✗ Audit columns (CreatedBy, ModifiedDate)
✗ Internal system columns

Dimension Tables (Lookup Tables):
Keep:
✓ Primary key (ID)
✓ Display name/description
✓ Category/group fields
✓ Sort order columns (if needed)
✓ Active/inactive status
Remove:
✗ Detailed descriptions (unless used in report)
✗ Internal codes (unless needed)
✗ Audit columns
✗ Redundant text columns

Query Folding Consideration:
Important: If you remove columns in Power Query, check if query folding still works.
Query folding = Power Query pushes transformations back to the data source (database does the work, not Power BI).
Check query folding:
1. Right-click on last step in Applied Steps
2. If "View Native Query" is available → Query folding works ✓
3. If greyed out → Query folding broken ✗Best practice: Remove columns using “Choose Columns” or “Remove Columns” maintains query folding. More complex transformations might break it.
The Selection Process:
Before loading data, ask yourself for each column:
1. Will this column be used in any visual? NO → Remove
2. Is this column needed for a relationship? NO → Remove
3. Is this column needed for filtering? NO → Remove
4. Will this column be used in any calculation? NO → Remove
5. Is this column for context/tooltip? NO → Remove
If the answer is NO to all → Remove the column!

Performance Testing:
After removing columns, check the improvement:
Before:
- Query: [Sales - All Columns]
- Refresh time: 12 minutes
- File size: 450 MB
After removing unnecessary columns:
- Query: [Sales - Optimized]
- Refresh time: 4 minutes
- File size: 180 MB
Improvement: 67% faster, 60% smaller

Summary: The Data Loading Checklist
Before you click "Close & Apply" in Power Query, verify:
☐ All required data sources identified (step 1)
☐ The appropriate connector used for each source (step 2)
☐ Data loaded through Power Query, not straight to the model (step 3)
☐ Data preview checked: structure, data types, quality indicators (step 4)
☐ Folder/file filters applied where loading from folders (step 5)
☐ Source connections documented (step 6)
☐ Parameters set up for server names and file paths (step 7)
☐ Unnecessary columns removed (step 8)
This detailed breakdown should give you a complete understanding of each data loading step. Each principle here helps you build a more efficient, maintainable, and performant Power BI solution from the very beginning.