Column Distribution:

  • Description: Column distribution provides a visual representation and summary of how data is distributed within a specific column. It includes the counts of distinct and unique values, giving you insights into the composition of the data.

Distinct Values:

  • Definition: Distinct values are all the different values present in a column, with duplicates collapsed and null (missing) counted as a value of its own. Each different value is counted once, regardless of how many times it appears.
  • Purpose: The count of distinct values helps you understand the total number of different entries in a column, offering a sense of the column's diversity.
  • Example: If a column contains the values [1, 2, 2, 3, null], the distinct values are [1, 2, 3, null]. The count of distinct values is 4.
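
The distinct count from the example can be checked with a minimal Python sketch. It assumes the column is a plain list and that None stands in for null; the names column, distinct_values, and distinct_count are illustrative only and do not come from any profiling tool.

```python
# Hypothetical sample column from the example; None stands in for null.
column = [1, 2, 2, 3, None]

# A set keeps one copy of each different value, so the repeated 2
# collapses to a single entry and None is kept as a value of its own.
distinct_values = set(column)
distinct_count = len(distinct_values)

print(distinct_count)  # 4
```

The set drops the second 2 but keeps None, which is why the count is 4 rather than 5 or 3.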

Unique Values:

  • Definition: Unique values are the values that appear exactly once in a column. Any value that is repeated is excluded, and so are null values.
  • Purpose: The count of unique values helps you identify how many individual entries are truly unique, providing insight into the uniqueness of the data.
  • Example: Using the same column values [1, 2, 2, 3, null], the unique values are [1, 3]. The count of unique values is 2 because '2' is a duplicate and 'null' is not considered in this count.
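
The unique count can be sketched the same way. This assumes the convention described here, where nulls are left out before counting; the names column, occurrences, unique_values, and unique_count are illustrative only.

```python
from collections import Counter

# Hypothetical sample column from the example; None stands in for null.
column = [1, 2, 2, 3, None]

# Count how often each non-null value occurs (nulls are skipped,
# following the convention described above).
occurrences = Counter(value for value in column if value is not None)

# Keep only the values that occur exactly once.
unique_values = [value for value, count in occurrences.items() if count == 1]
unique_count = len(unique_values)

print(unique_values)  # [1, 3]
print(unique_count)   # 2
```

The value 2 is dropped because it occurs twice, and None never enters the count, leaving 1 and 3.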

Summary:

  • Distinct Values: Reflects the total number of different values, counting each repeated value once and counting null as a value. It tells you how many distinct entries exist in the column.
  • Unique Values: Reflects the number of values that appear exactly once. Repeated values and nulls are left out. It tells you how many entries never repeat.

Understanding the difference between distinct and unique values in column distribution helps you better analyze and interpret your data. Distinct values show how much variety a column holds, while unique values show how many entries occur only once. Comparing the two counts reveals how much repetition the column contains.
