https://greyskullanalytics.com/power-query-duplicate-v-reference-queries/
Study Notes: Duplicate vs. Reference Queries in Power Query
Key Concepts
- Duplicate Queries:
 - Creates a full copy of the query, including all applied steps.
 - Changes made to the duplicate query do not affect the original query (and vice versa).
 - Independent queries.
 - Reference Queries:
 - Does not copy steps; instead, it references the end result of the original query.
 - Changes to the original query cascade to the reference query.
 - The two queries remain linked.
 
Query Efficiency Misconception
- Common Misunderstanding:
 - Reference queries are believed to reduce source data calls or utilize cached results from the original query.
 - Reality:
 - For relational databases, this is not true. Both queries execute fully, and no automatic caching occurs during data loading.
 - Reference queries work by referencing the code of the original query, not its data.
 
Use Cases
- When to Use Duplicate Queries:
 - Testing changes or transformations without risking the original query.
 - Useful in the Advanced Editor to avoid accidental mistakes while experimenting.
 - Keeps a copy of the original query steps as a backup.
 - When to Use Reference Queries:
 - Dimensional Modeling:
 - Derive multiple related tables (fact and dimension) from a single source table.
 - Update the source in one place (the original query), and all reference queries reflect the change.
 - Data Lineage Setup:
 - Organize queries for clarity and maintainability using a folder structure:
 - Extract Folder: Load raw source data (no transformations).
 - Transform Folder: Reference and transform data (e.g., merge, append, aggregate).
 - Load Folder: Reference the transformed data to create the final output (cleaned, renamed, etc.).
 - Disable "Enable Load" for intermediate queries in Extract and Transform folders to improve performance.
 
Folder Structure Example
- Extract: Raw data (untouched).
 - E.g., Source tables: 
Products,Orders,Customers. - Transform: Reference and shape data.
 - Perform actions such as merges and aggregations.
 - E.g., Combine 
ProductsandCategoriesinto a denormalizedProduct Dimensiontable. - Load: Final query for reporting or dashboards.
 - Rename columns, remove unused fields, or create summaries.
 - Enable only this layer for loading.
 
Caching Notes
- Development Stage:
 - Caching occurs during query preview in Power Query for faster editing.
 - Load Stage:
 - No caching occurs during dataset loads unless explicitly designed (e.g., using APIs).
 
Tips and Best Practices
- Duplicate Queries: Ideal for testing and creating backups.
 - Reference Queries: Follow DRY Principle (Don’t Repeat Yourself) by reusing queries to simplify updates and maintain consistency.
 - Use clear folder naming conventions for query organization:
 - Traditional: Extract → Transform → Load
 - Modern (Lakehouse): Bronze → Silver → Gold or Raw → Base → Curated.
 - Enable Load: Only for queries required in the final dataset to improve efficiency.
 
Practical Application
- Testing Changes: Use duplicates for safe trial-and-error in query adjustments.
 - Building Fact and Dimension Tables: Use references to create a clear lineage of transformations.
 - Organizing Workflows: Set up Extract/Transform/Load (ETL) pipelines for structured and scalable data transformations.
 
Conclusion
- Reference queries do not inherently reduce data processing but provide a way to reuse and link query logic.
 - Duplicate queries ensure safe experimentation and backup of query steps.
 - Understanding these distinctions is crucial for efficient Power Query workflows.
 
Your Practice Task:
- Create a folder structure (Extract, Transform, Load) for a sample dataset.
 - Experiment with both duplicate and reference queries to observe their behavior.
 - Try enabling/disabling the "Enable Load" option to see its impact on performance.
 
4o
https://greyskullanalytics.com/power-query-duplicate-v-reference-queries/