https://greyskullanalytics.com/power-query-duplicate-v-reference-queries/
Study Notes: Duplicate vs. Reference Queries in Power Query
Key Concepts
- Duplicate Queries:
- Creates a full copy of the query, including all applied steps.
- Changes made to the duplicate query do not affect the original query (and vice versa).
- Independent queries.
- Reference Queries:
- Does not copy steps; instead, it references the end result of the original query.
- Changes to the original query cascade to the reference query.
- The two queries remain linked.
Query Efficiency Misconception
- Common Misunderstanding:
- Reference queries are believed to reduce source data calls or utilize cached results from the original query.
- Reality:
- For relational databases, this is not true. Both queries execute fully, and no automatic caching occurs during data loading.
- Reference queries work by referencing the code of the original query, not its data.
Use Cases
- When to Use Duplicate Queries:
- Testing changes or transformations without risking the original query.
- Useful in the Advanced Editor to avoid accidental mistakes while experimenting.
- Keeps a copy of the original query steps as a backup.
- When to Use Reference Queries:
- Dimensional Modeling:
- Derive multiple related tables (fact and dimension) from a single source table.
- Update the source in one place (the original query), and all reference queries reflect the change.
- Data Lineage Setup:
- Organize queries for clarity and maintainability using a folder structure:
- Extract Folder: Load raw source data (no transformations).
- Transform Folder: Reference and transform data (e.g., merge, append, aggregate).
- Load Folder: Reference the transformed data to create the final output (cleaned, renamed, etc.).
- Disable "Enable Load" for intermediate queries in Extract and Transform folders to improve performance.
Folder Structure Example
- Extract: Raw data (untouched).
- E.g., Source tables:
Products
,Orders
,Customers
. - Transform: Reference and shape data.
- Perform actions such as merges and aggregations.
- E.g., Combine
Products
andCategories
into a denormalizedProduct Dimension
table. - Load: Final query for reporting or dashboards.
- Rename columns, remove unused fields, or create summaries.
- Enable only this layer for loading.
Caching Notes
- Development Stage:
- Caching occurs during query preview in Power Query for faster editing.
- Load Stage:
- No caching occurs during dataset loads unless explicitly designed (e.g., using APIs).
Tips and Best Practices
- Duplicate Queries: Ideal for testing and creating backups.
- Reference Queries: Follow DRY Principle (Don’t Repeat Yourself) by reusing queries to simplify updates and maintain consistency.
- Use clear folder naming conventions for query organization:
- Traditional: Extract → Transform → Load
- Modern (Lakehouse): Bronze → Silver → Gold or Raw → Base → Curated.
- Enable Load: Only for queries required in the final dataset to improve efficiency.
Practical Application
- Testing Changes: Use duplicates for safe trial-and-error in query adjustments.
- Building Fact and Dimension Tables: Use references to create a clear lineage of transformations.
- Organizing Workflows: Set up Extract/Transform/Load (ETL) pipelines for structured and scalable data transformations.
Conclusion
- Reference queries do not inherently reduce data processing but provide a way to reuse and link query logic.
- Duplicate queries ensure safe experimentation and backup of query steps.
- Understanding these distinctions is crucial for efficient Power Query workflows.
Your Practice Task:
- Create a folder structure (Extract, Transform, Load) for a sample dataset.
- Experiment with both duplicate and reference queries to observe their behavior.
- Try enabling/disabling the "Enable Load" option to see its impact on performance.
4o
https://greyskullanalytics.com/power-query-duplicate-v-reference-queries/