2. DataFrame

Created: Sep 1, 2025 10:14 PM
Status: In progress

1. Basic DataFrame Properties 🛠️

Understand your DataFrame’s structure at a glance.

df.index → Returns the row index labels of the DataFrame.
df.columns → Returns the column labels as an Index.
df.dtypes → Returns a Series with each column’s data type.
df.axes → Returns a list with the row axis and column axis.
df.shape → Returns a tuple (rows, columns) representing dimensions.
df.ndim → Returns the number of dimensions (always 2 for DataFrame).
df.size → Returns total number of elements (rows × columns).
df.memory_usage() → Returns memory usage in bytes for each column plus the index.
df.empty → Returns True if DataFrame has no elements.
df.attrs → Dictionary for storing custom user metadata.
df.info() → Prints concise summary: index dtype, columns, non-null counts.
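
A minimal sketch of the properties above, assuming a small made-up DataFrame (the column names, index labels, and the attrs key are hypothetical):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

shape = df.shape                 # (rows, columns)
ndim = df.ndim                   # 2 for any DataFrame
size = df.size                   # rows * columns
columns = list(df.columns)       # column labels
index_labels = list(df.index)    # row labels
is_empty = df.empty              # True only when there are no elements

# attrs is a plain dict for user metadata; the key here is made up.
df.attrs["source"] = "example"

print(shape, ndim, size, columns, index_labels, is_empty)
print(df.dtypes)
```

df.info() prints its summary directly instead of returning a value, so it is usually called on its own line.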

2. Data Type & Object Management 🛠️

Check, infer, or change your DataFrame’s column data types.

df.astype() → Convert the dtype of one or more columns to a specified type.
df.convert_dtypes() → Convert columns to the best possible dtypes that support pd.NA (nullable types).
df.infer_objects() → Infer better dtypes for object columns.
df.copy() → Create a copy (deep by default; pass deep=False for a shallow one) to avoid modifying original data.
df.set_flags() → Return a new DataFrame with updated flags (e.g. allows_duplicate_labels).
df.flags → Inspect the DataFrame’s flags (rarely used).
df.attrs → Same as above; user-defined metadata for extra context.
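
A short sketch of dtype conversion and copying, assuming made-up columns qty (numbers stored as text) and price:

```python
import pandas as pd

# Hypothetical data: "qty" holds numbers stored as strings.
df = pd.DataFrame({"qty": ["1", "2", "3"], "price": [1.5, 2.0, 2.5]})

# astype returns a new DataFrame; df itself is left unchanged.
converted = df.astype({"qty": "int64"})

# copy() is deep by default, so edits to the copy do not touch df.
backup = df.copy()
backup.loc[0, "price"] = 99.0

print(converted.dtypes)
print(df.loc[0, "price"])
```

Passing a dict to astype converts only the named columns and leaves the others as they are.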

3. Label-Based & Positional Access 🛠️

Access cells, rows, or slices by label or position.

df.at[row_label, col_label] → Fast label-based scalar access (single cell).
df.iat[row_index, col_index] → Fast position-based scalar access.
df.loc[] → Select rows/columns by label; supports slices, conditions, or lists.
df.iloc[] → Select rows/columns by integer position.

Key: at/iat → single cell (fast). loc/iloc → slices or multiple rows/columns.
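
The four accessors side by side, as a sketch on a hypothetical DataFrame with string row labels:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"age": [31, 25, 40], "city": ["Oslo", "Rome", "Lima"]},
    index=["ann", "bob", "cy"],
)

cell_by_label = df.at["bob", "age"]            # single cell, by labels
cell_by_pos = df.iat[1, 0]                     # same cell, by positions
rows_by_label = df.loc["ann":"bob", ["age"]]   # label slices include the end label
rows_by_pos = df.iloc[0:2, 0:1]                # position slices exclude the end
over_30 = df.loc[df["age"] > 30, "city"]       # boolean condition on rows
```

Note the slice difference: loc includes the end label, while iloc follows Python's usual half-open convention.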

4. Iteration & Basic Loops 🛠️

Loop through columns or rows.

df.__iter__ → Dunder method behind for col in df; yields column labels (not called directly).
df.items() → Iterate over (column_name, Series) pairs.
df.keys() → Alias for df.columns; returns column labels.
df.iterrows() → Iterate over (index, row Series) pairs; convenient but slower.
df.itertuples() → Iterate rows as namedtuples; faster than iterrows.
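
A sketch of the iteration methods on a tiny made-up DataFrame (columns x and y are hypothetical):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x": [1, 2], "y": [10, 20]})

# Iterating a DataFrame directly yields its column labels.
column_names = [name for name in df]

# items() yields (column_name, Series) pairs.
column_sums = {name: int(col.sum()) for name, col in df.items()}

# iterrows yields (index, Series); itertuples yields namedtuples.
row_totals_iterrows = [int(row["x"] + row["y"]) for _, row in df.iterrows()]
row_totals_itertuples = [t.x + t.y for t in df.itertuples()]
```

For a simple row total like this, df.sum(axis=1) avoids the Python-level loop entirely; the loops are shown only to illustrate the iteration API.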

5. Quick Inspection & Conversion 🛠️

Peek at data or convert to NumPy.

df.head(n) → Return first n rows (default 5).
df.tail(n) → Return last n rows.
df.values → Return DataFrame values as a NumPy array (legacy).
df.to_numpy() → Preferred way to convert DataFrame to NumPy array.
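
A quick sketch of peeking at rows and converting to NumPy, assuming a made-up ten-row DataFrame:

```python
import pandas as pd

# Hypothetical sample data: two integer columns with ten rows.
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows
arr = df.to_numpy()        # 2-D NumPy array, one row per DataFrame row

print(first_three)
print(arr.shape)
```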

6. Math, Binary Operations & Comparison 🛠️

Element-wise math, matrix dot products, and value-wise comparison.

Arithmetic

df.add() or df.__add__ → Add element-wise; supports fill_value.
df.sub() or df.__sub__ → Subtract element-wise.
df.mul() or df.__mul__ → Multiply element-wise.
df.div() or df.truediv() → Divide element-wise (true division).
df.floordiv() → Floor division.
df.mod() → Modulo.
df.pow() → Exponentiate.
df.dot() → Matrix multiplication.
df.radd(), df.rsub(), df.rmul(), df.rdiv(), df.rtruediv(), df.rfloordiv(), df.rmod(), df.rpow() → Reverse operations with the operands swapped (e.g. df.rsub(x) computes x - df).

Comparison

df.lt() → Element-wise less than.
df.gt() → Greater than.
df.le() → Less than or equal.
df.ge() → Greater than or equal.
df.eq() → Equal to.
df.ne() → Not equal to.

Combine

df.combine(other, func) → Combine two DataFrames column by column; func receives a pair of Series.
df.combine_first(other) → Fill missing values with the values at the same position in other.
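
A sketch of how the method forms differ from plain operators when values are missing, using two small hypothetical DataFrames named left and right:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one NaN on each side.
left = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
right = pd.DataFrame({"a": [10.0, 20.0], "b": [np.nan, 40.0]})

plain_sum = left + right                     # NaN wherever either side is NaN
filled_sum = left.add(right, fill_value=0)   # a missing side is treated as 0
is_greater = right.gt(left)                  # comparisons with NaN give False
patched = left.combine_first(right)          # left's NaNs filled from right
```

The operator and the method give the same result on complete data; the method is mainly useful for its extra arguments such as fill_value.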

7. Function Application

Apply functions row-wise, column-wise, element-wise, or via a clean pipe.

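df.apply(func, axis=0) → Apply function along an axis (axis=0: func receives each column; axis=1: func receives each row).
df.applymap(func) → Apply function element-wise (deprecated since pandas 2.1; use df.map(func) instead).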
df.agg() or df.aggregate() → Aggregate using one or more operations.
df.transform() → Transform rows/columns; shape is preserved.
df.pipe(func) → Pipe DataFrame through a custom function.
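
A sketch of apply, agg, transform, and pipe on a small made-up DataFrame; add_total is a hypothetical helper written for this example, not a pandas function:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series.
col_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

summary = df.agg(["min", "max"])                        # one row per aggregation
centered = df.transform(lambda col: col - col.mean())   # same shape as df


def add_total(frame, name="total"):
    """Hypothetical helper: return a copy with a row-sum column."""
    return frame.assign(**{name: frame.sum(axis=1)})


with_total = df.pipe(add_total, name="total")
```

pipe passes the DataFrame as the first argument of the function, which keeps long method chains readable.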

8. Aggregation & Descriptive Statistics

Describe or summarize your data.

df.sum() → Sum of values.
df.mean() → Mean value.
df.std() → Standard deviation.
df.var() → Variance.
df.count() → Count non-NA cells.
df.min() → Minimum value.
df.max() → Maximum value.
df.median() → Median value.
df.mode() → Mode(s).
df.prod() or df.product() → Product of values.
df.cumsum() → Cumulative sum.
df.cumprod() → Cumulative product.
df.cummax() → Cumulative max.
df.cummin() → Cumulative min.
df.rank() → Rank values.
df.quantile() → Return value at specified quantile.
df.pct_change() → Fractional change from the previous row (multiply by 100 for percent).
df.kurt() or df.kurtosis() → Kurtosis.
df.skew() → Skewness.
df.sem() → Standard error of mean.
df.describe() → Generate descriptive statistics summary.
df.corr() → Correlation matrix.
df.cov() → Covariance matrix.
df.corrwith(other) → Correlation with another DataFrame.
df.nunique() → Count distinct elements.
df.value_counts() → Count how often each distinct row (combination of column values) occurs.
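
A few of the statistics above on a made-up DataFrame where y is exactly twice x, so the correlation is easy to check by eye:

```python
import pandas as pd

# Hypothetical sample data: y == 2 * x in every row.
df = pd.DataFrame({"x": [1, 2, 2, 4], "y": [2.0, 4.0, 4.0, 8.0]})

means = df.mean()                   # one value per column
running = df.cumsum()               # same shape as df
distinct = df.nunique()             # distinct values per column
row_counts = df.value_counts()      # frequency of each distinct (x, y) row
corr_xy = df.corr().loc["x", "y"]   # Pearson correlation between x and y

print(df.describe())
```

Most of these reduce along axis=0 by default, giving one result per column; pass axis=1 for one result per row.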

9. Filtering & Conditional

Filter rows or replace values conditionally.

df.isin(values) → Check if each element is in values.
df.where(cond) → Keep values where condition is True; replace the rest (NaN by default).
df.mask(cond) → Keep values where condition is False; replace the rest (NaN by default).
df.query(expr) → Query DataFrame with string expression.
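
A sketch of the four methods on hypothetical city and temp columns. where and mask are mirror images, which is easy to mix up:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"], "temp": [5, 22, 19]})

# isin with a dict checks each named column against its own list of values.
city_match = df.isin({"city": ["Oslo", "Rome"]})
in_list = df[city_match["city"]]

warm = df.query("temp > 15")   # same rows as df[df["temp"] > 15]

temps = df[["temp"]]
kept = temps.where(temps > 15)       # entries failing the condition become NaN
zeroed = temps.mask(temps > 15, 0)   # entries meeting the condition become 0
```

Only the boolean-indexing and query lines drop rows; where and mask keep the original shape and replace values instead.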

10. Reshaping & Pivoting

Switch between wide and long forms.

df.melt() → Unpivot columns to rows (wide → long).
df.pivot() → Reshape long to wide; index/column pairs must be unique (no aggregation).
df.pivot_table() → Spreadsheet-style pivot with aggregation.
df.stack() → Pivot columns into index (wide → long).
df.unstack() → Pivot index levels into columns (long → wide).
df.explode() → Transform list-like values to separate rows.
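
A wide → long → wide round trip, sketched on a made-up sales table (the id, jan, feb, and tags names are hypothetical):

```python
import pandas as pd

# Hypothetical wide table: one column per month.
wide = pd.DataFrame({"id": [1, 2], "jan": [10, 20], "feb": [11, 21]})

# wide -> long: one row per (id, month) pair.
long = wide.melt(id_vars="id", var_name="month", value_name="sales")

# long -> wide: works because each (id, month) pair occurs once.
back_to_wide = long.pivot(index="id", columns="month", values="sales")

# pivot_table aggregates, so duplicate pairs would be allowed here.
totals = long.pivot_table(index="id", values="sales", aggfunc="sum")

# explode: each list element gets its own row; the index label repeats.
exploded = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]}).explode("tags")
```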

11. Missing Data & Cleaning

Detect, drop, or fill NaNs and duplicates.

df.isna() or df.isnull() → Detect missing values.
df.notna() or df.notnull() → Detect non-missing.
df.fillna(value) → Fill NaNs with a value.
df.dropna() → Drop rows/columns with NaNs.
df.ffill() → Forward-fill missing values (df.pad() is a deprecated alias).
df.bfill() → Backward-fill missing values (df.backfill() is a deprecated alias).
df.duplicated() → Mark duplicate rows.
df.drop_duplicates() → Drop duplicate rows.
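
A sketch of the cleaning methods on a made-up DataFrame with one NaN and one repeated row:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 has a NaN, row 3 repeats row 2.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 3.0], "b": [5.0, 6.0, 7.0, 7.0]})

missing_per_column = df.isna().sum()   # NaN count per column
filled = df.fillna(0)                  # NaNs replaced by 0
forward = df.ffill()                   # NaNs replaced by the previous row's value
complete_rows = df.dropna()            # rows containing any NaN are dropped
dup_flags = df.duplicated()            # True for repeats of an earlier row
unique_rows = df.drop_duplicates()     # first occurrence of each row is kept
```

All of these return new objects by default, so df itself still contains the NaN and the duplicate afterwards.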

12. Merge, Join & Combine

Combine multiple DataFrames.

df.merge() → SQL-style joins.
df.join() → Join columns using index or key.
df.update() → Update in place using non-NA values from another DataFrame.
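
A sketch of merge, join, and update, assuming hypothetical orders, customers, and prices tables:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: customer 3 has no matching customer record.
orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [50, 20, 70]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})

inner = orders.merge(customers, on="customer_id", how="inner")   # matches only
left = orders.merge(customers, on="customer_id", how="left")     # all orders kept

# join aligns on the index by default (a left join).
joined = orders.set_index("customer_id").join(customers.set_index("customer_id"))

# update modifies the caller in place and returns None; NaNs in the
# other DataFrame are ignored.
prices = pd.DataFrame({"price": [1.0, 2.0]})
prices.update(pd.DataFrame({"price": [np.nan, 5.0]}))
```

Because update returns None, writing prices = prices.update(...) would replace the DataFrame with None.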

13. Export & IO

Save DataFrame or convert to Python objects.

df.to_csv() → Save as CSV.
df.to_excel() → Save as Excel file.
df.to_json() → Save as JSON.
df.to_pickle() → Serialize as pickle.
df.to_sql() → Write to SQL database.
df.to_dict() → Convert to dictionary.
df.to_numpy() → Convert to NumPy array.
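
A sketch of the exporters that need no files or extra dependencies, using an in-memory round trip on made-up data:

```python
import io

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

csv_text = df.to_csv(index=False)          # returns a string when no path is given
records = df.to_dict(orient="records")     # list of one dict per row
json_text = df.to_json(orient="records")   # JSON array of row objects

# Reading the CSV text back recovers the same values.
round_trip = pd.read_csv(io.StringIO(csv_text))
```

to_excel and to_sql are left out here because they depend on an Excel engine or a database connection being available.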

Aliases & Dunder Reminders

agg = aggregate
kurt = kurtosis
prod = product
ffill = pad (pad is deprecated; use ffill)
bfill = backfill (backfill is deprecated; use bfill)
isna = isnull
notna = notnull
div = truediv
__add__ etc. = dunder methods behind the +, -, *, / operators; write the operator, or call add(), sub(), etc. when you need fill_value or axis.

Pandas DataFrame User Guide

Status	Name
Done	🧠 1. DataFrame: Basic DataFrame Properties
Done	🧠 2. DataFrame: Data Type & Object Management
Done	🧠 3. DataFrame: Label-Based & Positional Access
Done	🧠 4. DataFrame: Iteration & Basic Loops
Done	🧠 5. DataFrame: Quick Inspection & Conversion
Done	🧠 6. DataFrame: Math, Binary Operations & Comparison
Not started	🧠 7. DataFrame: Function Application
Not started	🧠 8. DataFrame: Aggregation & Descriptive Statistics
Not started	🧠 9. DataFrame: Filtering & Conditional
Not started	🧠 10. DataFrame: Reshaping & Pivoting
Not started	🧠 11. DataFrame: Missing Data & Cleaning
Not started	🧠 12. DataFrame: Merge, Join & Combine
Not started	🧠 13. DataFrame: Export & IO