Understand your DataFrame’s structure at a glance.
df.index → Returns the row index labels of the DataFrame.
df.columns → Returns the column labels as an Index.
df.dtypes → Returns a Series with each column’s data type.
df.axes → Returns a list with the row axis and column axis.
df.shape → Returns a tuple (rows, columns) representing dimensions.
df.ndim → Returns the number of dimensions (always 2 for a DataFrame).
df.size → Returns the total number of elements (rows × columns).
df.memory_usage() → Shows memory usage for each column plus the index.
df.empty → Returns True if the DataFrame has no elements.
df.attrs → Dictionary for storing custom user metadata.
df.info() → Prints a concise summary: index dtype, columns, non-null counts.
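A minimal sketch of these properties on a small, made-up frame (the data here is purely illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

print(df.shape)          # (3, 2)
print(df.ndim)           # 2
print(df.size)           # 6
print(df.empty)          # False
print(list(df.columns))  # ['a', 'b']
print(df.dtypes['a'])    # int64

df.attrs['source'] = 'demo'  # attach custom metadata
print(df.attrs)              # {'source': 'demo'}
```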
Professional Workflow Guide
Data Inspection Protocol
- Initial Check: verify structure, dtypes, and null counts before anything else.
- Memory Optimization: convert low-cardinality object columns to category.
- Type Validation: assert that critical columns have the expected dtypes.
- Production-Grade Checks: run a reusable validation function before downstream processing.

```python
# Initial check: verify structure and nulls
df.info()
print(df.shape)  # Confirm expected size

# Memory optimization: convert low-cardinality object columns to category
cat_cols = df.select_dtypes(include='object').nunique()
for col in cat_cols[cat_cols < 100].index:
    df[col] = df[col].astype('category')

# Type validation: assert expected dtypes
expected_dtypes = {
    'Date': 'datetime64[ns]',
    'Price': 'float64',
}
for col, dtype in expected_dtypes.items():
    assert df[col].dtype == dtype, f"{col} has wrong dtype"

# Production-grade checks
def validate_dataframe(df):
    assert not df.empty, "Empty DataFrame"
    assert df.index.is_unique, "Non-unique index"
    assert df.columns.is_unique, "Duplicate columns"
    return True
```
Performance Considerations
| Property | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| index | O(1) | O(n) | Cached after first access |
| columns | O(1) | O(k) | k = number of columns |
| dtypes | O(k) | O(k) | Must check each column |
| shape | O(1) | O(1) | Precomputed value |
| memory_usage | O(k) | O(k) | Deep scan for object dtypes |
Best Practice: For large DataFrames, avoid repeated calls to memory_usage(deep=True) in production code.
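One way to follow that advice is to compute the deep usage once and reuse the result; a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'text': ['alpha', 'beta'] * 1000, 'val': range(2000)})

# deep=True inspects every object element, so call it once and cache the Series
mem = df.memory_usage(deep=True)  # indexed by 'Index' plus each column name
total_bytes = mem.sum()
print(f"{total_bytes / 1024:.1f} KiB")

# Shallow usage is cheap (O(k)) but only counts pointers for object columns
shallow = df.memory_usage(deep=False).sum()
assert shallow <= total_bytes
```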