1. Series Structure
1. Basic Series Properties
- `df['series'].index` → Returns the index (row labels) of the Series.
- `df['series'].name` → Returns the name of the Series (if assigned).
- `df['series'].dtype` → Returns the data type of the Series elements.
- `df['series'].dtypes` → Alias for `dtype` (same output).
- `df['series'].shape` → Returns a tuple `(n_rows,)` representing dimensions.
- `df['series'].ndim` → Returns the number of dimensions (always 1 for a Series).
- `df['series'].size` → Returns the number of elements in the Series.
- `df['series'].values` → Returns the data as a NumPy array.
- `df['series'].array` → Returns the underlying `PandasArray` (extension array).
2. Boolean Checks
- `df['series'].empty` → Returns `True` if the Series is empty.
- `df['series'].hasnans` → Returns `True` if the Series contains any NaN values.
3. Advanced Attributes (Less Common)
- `df['series'].flags` → Returns a `Flags` object with memory/performance settings.
- `df['series'].set_flags()` → Returns a new Series with modified flags (e.g., `allows_duplicate_labels`).
- `df['series'].attrs` → Returns a dictionary of custom metadata attributes.
Key Notes:
- Most frequently used: `index`, `name`, `dtype`, `shape`, `size`, `values`.
- For missing values: `hasnans` is useful for quick NaN checks.
- Advanced use cases: `flags`, `set_flags`, and `attrs` are rarely needed in everyday workflows.
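Example Usage (a minimal sketch with toy data):

```python
import pandas as pd

s = pd.Series([10, 20, None], name='scores')

s.index    # RangeIndex(start=0, stop=3, step=1)
s.name     # 'scores'
s.dtype    # float64 (the None forces an upcast to float)
s.shape    # (3,)
s.size     # 3
s.hasnans  # True
s.empty    # False
```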
2. Access & Selection
1. Label-Based & Positional Access
- `df['series'].at` → Fast label-based scalar access (single value, like a dictionary).
- `df['series'].iat` → Fast integer-position-based scalar access (like `at` but for positions).
- `df['series'].loc` → Label-based indexing (slicing, single/multiple values).
- `df['series'].iloc` → Integer-position-based indexing (slicing, single/multiple values).
Key Difference:
- Use `at`/`iat` for single values (optimized for speed).
- Use `loc`/`iloc` for slices or multiple values.
2. Value Retrieval & Conversion
- `df['series'].get(key)` → Safe value access (returns `None`/default if the key is missing, like `dict.get`).
- `df['series'].item()` → Extract the single value from a 1-element Series (raises an error if size ≠ 1).
- `df['series'].to_numpy()` → Convert the Series to a NumPy array (preferred over `.values` in modern Pandas).
- `df['series'].array` → Returns the underlying Pandas extension array (e.g., `IntegerArray`, `StringArray`).
3. Data Removal & Iteration
- `df['series'].pop(key)` → Remove and return a value by label (modifies the Series in place).
- `df['series'].items()` → Iterate over (index, value) pairs (like `dict.items`).
- `df['series'].iteritems()` → Older alias for `.items()` (deprecated, removed in Pandas 2.0).
- `df['series'].keys()` → Alias for `.index` (returns index labels).
4. Advanced Selection
- `df['series'].xs(key)` → Cross-section (select value by label, similar to `.loc` but less common).
When to Use What?
- Need speed for single values? → `at` (label) / `iat` (position).
- Slicing or filtering? → `loc` (label) / `iloc` (position).
- Safe access with fallback? → `get()`.
- Convert to NumPy? → `to_numpy()` (modern) or `.values` (legacy).
- Iterate? → `items()`.
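Example Usage (a short sketch contrasting the accessors on a toy labeled Series):

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

s.at['b']        # 200 (fast scalar lookup by label)
s.iat[0]         # 100 (fast scalar lookup by position)
s.loc['a':'b']   # label slice (inclusive): a and b
s.iloc[1:]       # position slice: b and c
s.get('z', -1)   # -1 (safe fallback for a missing label)
s.to_numpy()     # array([100, 200, 300])
```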
3. Math Operations
1. Basic Arithmetic Operations
- `df['series'].add(other)` → Addition (`+` operator equivalent).
- `df['series'].sub(other)` → Subtraction (`-` operator equivalent).
- `df['series'].mul(other)` → Multiplication (`*` operator equivalent).
- `df['series'].div(other)` → Division (alias of `truediv`, `/` operator).
- `df['series'].truediv(other)` → True division (float division, `/` operator).
- `df['series'].floordiv(other)` → Floor division (integer division, `//` operator).
- `df['series'].mod(other)` → Modulo/remainder (`%` operator).
- `df['series'].pow(other)` → Exponentiation (`**` operator).
2. Reverse Arithmetic Operations (Reflected)
These reverse the operand order (e.g., `radd` computes `other + series` instead of `series + other`).

- `df['series'].radd(other)` → Reverse addition (`other + series`).
- `df['series'].rsub(other)` → Reverse subtraction (`other - series`).
- `df['series'].rmul(other)` → Reverse multiplication (`other * series`).
- `df['series'].rdiv(other)` → Reverse division (alias of `rtruediv`).
- `df['series'].rtruediv(other)` → Reverse true division (`other / series`).
- `df['series'].rfloordiv(other)` → Reverse floor division (`other // series`).
- `df['series'].rmod(other)` → Reverse modulo (`other % series`).
- `df['series'].rpow(other)` → Reverse exponentiation (`other ** series`).
3. Comparison Operations
- `df['series'].eq(other)` → Equal to (`==` operator).
- `df['series'].ne(other)` → Not equal to (`!=` operator).
- `df['series'].lt(other)` → Less than (`<` operator).
- `df['series'].le(other)` → Less than or equal to (`<=` operator).
- `df['series'].gt(other)` → Greater than (`>` operator).
- `df['series'].ge(other)` → Greater than or equal to (`>=` operator).
- `df['series'].between(left, right)` → Check if values are between `left` and `right` (inclusive by default).
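One practical reason to prefer the method forms over bare operators is the extra parameters they accept, such as `fill_value` for non-aligned labels. Example Usage (a minimal sketch with toy data):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['a', 'b'])

s1 + s2                   # 'c' has no match in s2, so it becomes NaN
s1.add(s2, fill_value=0)  # 'c' stays 3.0 (the missing side is treated as 0)
s1.rsub(s2)               # computes s2 - s1, label by label
s1.between(2, 3)          # [False, True, True]
```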
4. Statistical Summaries
1. Summary Statistics
- `df['series'].describe()` → Quick summary (count, mean, std, min, max, quartiles).
- `df['series'].count()` → Count of non-NA values.
- `df['series'].sum()` → Sum of values.
- `df['series'].mean()` → Arithmetic mean.
- `df['series'].median()` → Median (50th percentile).
- `df['series'].mode()` → Most frequent value(s).
- `df['series'].min()` → Minimum value.
- `df['series'].max()` → Maximum value.
- `df['series'].std()` → Standard deviation.
- `df['series'].var()` → Variance.
2. Unique Values & Counting
- `df['series'].nunique()` → Number of unique values.
- `df['series'].unique()` → Array of unique values.
- `df['series'].value_counts()` → Counts of each unique value.
3. Index Locations
- `df['series'].idxmax()` → Index label of the first maximum value.
- `df['series'].idxmin()` → Index label of the first minimum value.
Key Notes:
- Use `describe()` for an instant overview.
- `value_counts()` is ideal for categorical data.
- `idxmax`/`idxmin` return index labels, not the values themselves.
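Example Usage (a quick sketch with toy data):

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5], index=list('abcde'))

s.describe()      # count, mean, std, min, quartiles, max in one Series
s.nunique()       # 4 (the value 1 repeats)
s.value_counts()  # 1 appears twice, the rest once
s.idxmax()        # 'e' (the label of the maximum, not the value 5)
```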
5. Cumulative, Ranking & Rolling
1. Cumulative Calculations
- `df['series'].cumsum()` → Cumulative sum.
- `df['series'].cumprod()` → Cumulative product.
- `df['series'].cummax()` → Cumulative maximum.
- `df['series'].cummin()` → Cumulative minimum.
2. Ranking & Percent Changes
- `df['series'].rank()` → Rank values (with tie-breaking methods).
- `df['series'].pct_change()` → Percentage change between elements.
- `df['series'].quantile(q)` → Value at quantile `q` (0-1).
3. Extreme Value Selection
- `df['series'].nlargest(n)` → Top `n` largest values.
- `df['series'].nsmallest(n)` → Top `n` smallest values.
4. Window Calculations
- `df['series'].rolling(window)` → Rolling window calculations (mean, sum, etc.).
- `df['series'].expanding()` → Expanding window calculations.
- `df['series'].ewm(span)` → Exponentially weighted calculations (e.g., moving average).
Key Notes:
- Cumulative methods (`cum*`) are useful for running totals/products.
- `rolling()`/`expanding()`/`ewm()` return window objects for further calculations.
- `nlargest()`/`nsmallest()` preserve original indices by default.
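Example Usage (a minimal sketch; note that `rolling()`/`ewm()` return window objects, so you still call an aggregation like `.mean()` on them):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

s.cumsum()                  # [1, 3, 6, 10, 15]
s.pct_change()              # [NaN, 1.0, 0.5, 0.333..., 0.25]
s.rank()                    # [1.0, 2.0, 3.0, 4.0, 5.0]
s.rolling(window=3).mean()  # NaN until the window fills, then [2.0, 3.0, 4.0]
s.ewm(span=3).mean()        # exponentially weighted mean
s.nlargest(2)               # 5 and 4, keeping their original index labels
```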
6. Advanced Statistics
1. Relationship & Dispersion Metrics
- `df['series'].corr(other)` → Pearson correlation with another Series (-1 to 1).
- `df['series'].cov(other)` → Covariance with another Series.
- `df['series'].autocorr(lag=1)` → Autocorrelation at the specified lag (for time series).
2. Shape & Distribution Metrics
- `df['series'].skew()` → Skewness (measure of asymmetry).
- `df['series'].kurt()` → Kurtosis (tailedness; alias: `kurtosis()`).
3. Aggregation & Error Metrics
- `df['series'].prod()` → Product of all values.
- `df['series'].sem()` → Standard error of the mean (σ/√n).
Key Notes:
- `corr()`/`cov()` align the two Series on their index before computing.
- Positive `skew()` = longer right tail; negative = longer left tail.
- High `kurtosis()` = heavy tails (leptokurtic).
- `autocorr()` helps detect periodicity in time series.
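Example Usage (a short sketch with toy data):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 4, 5, 4, 6])

s1.corr(s2)      # Pearson correlation, here ≈ 0.85
s1.cov(s2)       # covariance, here 2.0
s1.autocorr(1)   # correlation between the series and itself shifted by 1
s1.skew()        # 0.0 (this sequence is symmetric)
s1.sem()         # standard error of the mean
```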
7. Missing Data Handling
1. Missing Value Detection
- `df['series'].isna()` → Boolean mask of missing values (alias: `isnull()`).
- `df['series'].notna()` → Boolean mask of non-missing values (alias: `notnull()`).
2. Missing Value Removal
- `df['series'].dropna()` → Remove missing values (returns a new Series).
3. Missing Value Filling
- `df['series'].fillna(value)` → Fill NA with a specified value or method.
- `df['series'].ffill()` → Forward fill (alias: `pad()`).
- `df['series'].bfill()` → Backward fill (alias: `backfill()`).
- `df['series'].interpolate()` → Fill NA via interpolation.
Key Notes:
- `isna`/`isnull` and `notna`/`notnull` are identical (use whichever you prefer).
- `ffill`/`pad` propagate the last valid observation forward.
- `bfill`/`backfill` propagate the next valid observation backward.
- `interpolate` offers multiple methods (linear, polynomial, etc.).
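Example Usage (a minimal sketch of the common NA workflows, toy data):

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0])

s.isna()         # [False, True, True, False]
s.dropna()       # [1.0, 4.0] (labels 0 and 3 survive)
s.fillna(0)      # [1.0, 0.0, 0.0, 4.0]
s.ffill()        # [1.0, 1.0, 1.0, 4.0]
s.bfill()        # [1.0, 4.0, 4.0, 4.0]
s.interpolate()  # [1.0, 2.0, 3.0, 4.0] (linear by default)
```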
8. Conditional Logic & Boolean Masking
1. Conditional Replacement
- `df['series'].where(cond, other)` → Keep values where `cond` is True, else replace with `other` (preserves the original where the condition is met).
- `df['series'].mask(cond, other)` → Replace values where `cond` is True with `other` (opposite of `where`: modifies where the condition is met).
2. Membership Testing
- `df['series'].isin(values)` → Check if values exist in a list/set (`values` can be any list-like).
3. Boolean Evaluation
- `df['series'].all()` → Return True if all elements are True/truthy.
- `df['series'].any()` → Return True if any element is True/truthy.
Key Notes:
- `where()` vs `mask()`: think "keep where" vs "replace where".
- `isin()` is ideal for filtering against multiple values.
- `all()`/`any()` ignore NA values by default (use `skipna=False` to include them).
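Example Usage (a quick sketch of the "keep where" vs "replace where" distinction, toy data):

```python
import pandas as pd

s = pd.Series([1, -2, 3, -4])

s.where(s > 0, 0)  # [1, 0, 3, 0]   (keep positives, replace the rest)
s.mask(s > 0, 0)   # [0, -2, 0, -4] (the opposite: replace positives)
s.isin([1, 3])     # [True, False, True, False]
(s > 0).any()      # True
(s > 0).all()      # False
```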
9. Type Conversion & Copying
1. Type Conversion
- `df['series'].astype(dtype)` → Force cast to a specified dtype (e.g., `'int'`, `'float'`, `'str'`, `'category'`).
- `df['series'].convert_dtypes()` → Convert to the best possible nullable dtype (Pandas 1.0+).
- `df['series'].infer_objects()` → Infer a better dtype for object columns (soft conversion).
2. Memory & Copy Management
- `df['series'].copy(deep=True)` → Create a copy (`deep=True` for a full memory copy).
- `df['series'].bool()` → Return the bool value (only for a single-element boolean Series).
Key Notes:
- `astype()` vs `convert_dtypes()`: `astype()` forces conversion (may lose info); `convert_dtypes()` intelligently chooses nullable dtypes (e.g., `Int64` instead of `float64` for integer data with missing values).
- Use `infer_objects()` when dealing with mixed-type object columns.
- `bool()` raises `ValueError` if the Series doesn't contain exactly one boolean element.
Best Practices:
- Prefer `convert_dtypes()` for modern nullable types.
- Use `astype()` when you need specific control.
- Always `copy()` before modifying if you need the original.
- `bool()` is mainly for scalar boolean extraction.
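Example Usage (a minimal sketch of the conversion paths, toy data):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, None])

s.astype('Int64')           # nullable integer dtype; None becomes <NA>
s.convert_dtypes()          # picks the best nullable dtype automatically
backup = s.copy(deep=True)  # independent copy; mutating s won't touch backup
```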
10. Function Mapping & Transformation
1. Element-wise Operations
- `df['series'].map(arg)` → Map values using a dict/Series/function (element-wise).
- `df['series'].apply(func)` → Apply a function along the Series (element-wise or aggregate).
2. Aggregation Methods
- `df['series'].agg(func)` → Aggregate using one or multiple operations (alias: `aggregate()`).
- Common funcs: `'sum'`, `'mean'`, `'min'`, `'max'`, or custom functions.
3. Transformation Methods
- `df['series'].transform(func)` → Return transformed values (same shape as input).
- Differs from `agg()`: maintains the original dimensions.
4. Series Combination
- `df['series'].combine(other, func)` → Combine with another Series using a function.
- `df['series'].combine_first(other)` → Fill nulls using another Series.
Key Differences:
| Method | Best For | Returns |
|---|---|---|
| `map()` | Simple value replacements | Same length |
| `apply()` | Complex element-wise operations | Same length |
| `agg()` | Summarizing data | Scalar/value |
| `transform()` | Group-aware transformations | Same length |
Example Usage:
```python
import pandas as pd

s1 = pd.Series([1, 2, None])
s2 = pd.Series([10, None, 30])

# Element-wise
s1.map({1: 'a', 2: 'b'})                           # → ['a', 'b', NaN]
s1.apply(lambda x: x * 2 if pd.notnull(x) else 0)  # → [2.0, 4.0, 0.0]

# Aggregation
s1.agg(['sum', 'mean'])  # → Series with both metrics

# Transformation
s1.transform(lambda x: x - x.mean())  # → center the data

# Combination (fill_value handles the NaNs on either side)
s1.combine(s2, max, fill_value=0)  # → [10.0, 2.0, 30.0]
s1.combine_first(s2)               # → [1.0, 2.0, 30.0] (fills nulls)
```
When to Use:
- Use `map()` for simple value lookups.
- Use `apply()` for custom complex operations.
- Use `agg()` for summaries, `transform()` for group-wise standardization.
- Use `combine()` for element-wise logic between Series.
11. Time Series Handling
1. DateTime Conversion
- `df['datetime_series'].to_period(freq)` → Convert to Period (e.g., `'M'` for monthly).
- `df['period_series'].to_timestamp()` → Convert a Period back to a Timestamp.
2. Date Component Accessors (via .dt)
- Basic Components:
  - `.dt.year` → Extract year.
  - `.dt.month` → Extract month (1-12).
  - `.dt.day` → Extract day (1-31).
  - `.dt.hour` → Extract hour (0-23).
  - `.dt.minute` → Extract minute (0-59).
  - `.dt.second` → Extract second (0-59).
- Advanced Components:
  - `.dt.dayofweek` → Day of week (0=Monday; alias: `.dt.weekday`).
  - `.dt.dayofyear` → Day of year (1-366).
  - `.dt.quarter` → Quarter (1-4).
  - `.dt.is_month_end` → Bool for month-end dates.
  - `.dt.is_quarter_end` → Bool for quarter-end dates.
  - `.dt.is_year_end` → Bool for year-end dates.
3. DateTime Formatting & Rounding
- `.dt.strftime(format)` → Format as string (e.g., `"%Y-%m-%d"`).
- `.dt.round(freq)` → Round to the nearest frequency (e.g., `'H'` for hour).
- `.dt.floor(freq)` → Round down to the frequency.
- `.dt.ceil(freq)` → Round up to the frequency.
- `.dt.normalize()` → Set time to midnight (00:00:00).
4. TimeDelta Operations (for timedelta Series)
- `.dt.total_seconds()` → Convert the entire duration to seconds.
Example Usage:
```python
import pandas as pd

# Create a datetime Series (month-end dates)
dates = pd.Series(pd.date_range('2023-01-01', periods=3, freq='M'))

# Access components
dates.dt.month           # → [1, 2, 3]
dates.dt.is_quarter_end  # → [False, False, True] (March is a quarter end)

# Formatting
dates.dt.strftime("%Y-%m")  # → ['2023-01', '2023-02', '2023-03']

# Rounding
pd.Series([pd.Timestamp('2023-01-01 14:30:45')]).dt.round('H')
# → 2023-01-01 15:00:00
```
Key Notes:
- The `.dt` accessor only works with datetime-like Series.
- For timezone handling, use `tz_localize()` to assign timezones and `tz_convert()` to convert between them.
- Frequency strings (for rounding/periods):
  - `'D'` = day, `'M'` = month, `'Q'` = quarter
  - `'H'` = hour, `'T'`/`'min'` = minute, `'S'` = second
Common Use Cases:
- Extracting month/year for grouping
- Flagging period-end dates
- Creating formatted date strings for reports
- Normalizing timestamps to compare dates without times
12. String Operations
1. Case Conversion
- `df['text_series'].str.lower()` → Convert to lowercase.
- `df['text_series'].str.upper()` → Convert to uppercase.
- `df['text_series'].str.title()` → Convert to title case (first letter of each word capitalized).
2. Cleaning & Replacement
- `df['text_series'].str.strip()` → Remove whitespace from both ends.
- `df['text_series'].str.replace(old, new)` → Replace substring patterns.
- `df['text_series'].str.repeat(n)` → Repeat each string `n` times.
3. Pattern Matching
- `df['text_series'].str.contains(pattern)` → Check for substring/regex matches.
- `df['text_series'].str.startswith(pattern)` → Check starting characters.
- `df['text_series'].str.endswith(pattern)` → Check ending characters.
- `df['text_series'].str.extract(regex)` → Extract regex groups into columns.
4. Splitting & Joining
- `df['text_series'].str.split(sep)` → Split strings by a delimiter.
- `df['text_series'].str.get(i)` → Get the element at position `i` after a split.
- `df['text_series'].str.join(sep)` → Join list elements with a separator.
5. String Properties & Encoding
- `df['text_series'].str.len()` → Get the length of each string.
- `df['text_series'].str.get_dummies(sep)` → Convert delimited strings to dummy variables.
Example Usage:
```python
import pandas as pd

names = pd.Series([' John Doe ', 'jane SMITH', 'alice cooper'])

# Case conversion
names.str.title()          # → [' John Doe ', 'Jane Smith', 'Alice Cooper']

# Cleaning
names.str.strip()          # → ['John Doe', 'jane SMITH', 'alice cooper']

# Pattern matching
names.str.contains('oh')   # → [True, False, False]

# Splitting
names.str.split().str.get(0)  # → ['John', 'jane', 'alice']
```
Key Notes:
- All methods return new Series (original remains unchanged)
- Most methods accept regex patterns (`contains`, `replace`, etc.).
- Use `na=False` in pattern matching to handle missing values.
- For complex string operations, chain multiple `.str` methods.
Common Use Cases:
- Standardizing text data (case, whitespace)
- Extracting parts of strings (e.g., first names)
- Creating features from text patterns
- Preparing text for machine learning (via `get_dummies`)
13. Categoricals
1. Core Categorical Properties (via .cat accessor)
- `df['cat_series'].cat.codes` → Returns integer codes for each category.
- `df['cat_series'].cat.categories` → Returns the index of categories.
- `df['cat_series'].cat.ordered` → Returns True if categories have a logical ordering.
2. Category Management
- `df['cat_series'].cat.rename_categories(new_names)` → Rename categories.
- `df['cat_series'].cat.reorder_categories(new_order)` → Reorder categories.
- `df['cat_series'].cat.add_categories(new_cats)` → Add new categories.
- `df['cat_series'].cat.remove_categories(to_remove)` → Remove specific categories.
- `df['cat_series'].cat.remove_unused_categories()` → Remove unused categories.
- `df['cat_series'].cat.set_categories(new_cats)` → Set new categories (removes others).
3. Order Control
- `df['cat_series'].cat.as_ordered()` → Mark categories as ordered.
- `df['cat_series'].cat.as_unordered()` → Remove the ordering.
Example Usage:
```python
import pandas as pd

# Create a categorical Series (string categories are sorted alphabetically)
colors = pd.Series(['red', 'blue', 'green'], dtype='category')

# Access properties
colors.cat.codes       # → [2, 0, 1] (codes index into the sorted categories)
colors.cat.categories  # → Index(['blue', 'green', 'red'], dtype='object')

# Modify categories (each call returns a new Series)
colors.cat.add_categories(['yellow'])
colors.cat.remove_categories(['green'])

# Change ordering
ordered_colors = colors.cat.as_ordered()
```
Key Notes:
- All modification methods return new Series (original remains unchanged)
- Categories are automatically sorted unless specified otherwise
- Operations preserve category dtype (unlike string operations)
Common Use Cases:
- Memory optimization for string variables with few unique values
- Maintaining fixed category sets (even when some values are missing)
- Statistical modeling with ordered factors
- Grouping operations with controlled category order
Best Practices:
- Use `remove_unused_categories()` after filtering.
- Set an ordering when categories have a logical sequence (e.g., sizes).
- Predefine categories when you need consistent levels across datasets.
14. Sparse Data Handling
1. Sparse Data Properties (via .sparse accessor)
- `df['sparse_series'].sparse` → Main accessor for sparse operations.
- `df['sparse_series'].sparse.npoints` → Count of non-fill-value points.
- `df['sparse_series'].sparse.density` → Ratio of non-fill values (0-1).
- `df['sparse_series'].sparse.fill_value` → Returns the fill value (default NaN).
- `df['sparse_series'].sparse.sp_values` → Returns the stored non-fill values as a NumPy array.
2. Sparse Conversion Methods
- `df['sparse_series'].sparse.to_coo()` → Convert to a `scipy.sparse.coo_matrix`.
- `pd.Series.sparse.from_coo(coo_matrix)` → Create from a `scipy.sparse.coo_matrix` (class method).
Example Usage:
```python
import pandas as pd

# Create a sparse Series (for int data the default fill value is 0)
s = pd.Series([0, 0, 1, 0, 3]).astype('Sparse[int]')

# Access properties
s.sparse.density     # → 0.4 (2 non-fill values out of 5)
s.sparse.npoints     # → 2
s.sparse.fill_value  # → 0

# to_coo()/from_coo() work on a Series with a MultiIndex (row, column levels)
mi = pd.MultiIndex.from_tuples([(0, 0), (1, 1), (2, 0)])
ms = pd.Series([1.0, 2.0, 3.0], index=mi).astype('Sparse[float]')
coo, rows, cols = ms.sparse.to_coo()         # scipy.sparse.coo_matrix plus labels
new_series = pd.Series.sparse.from_coo(coo)  # back to a sparse Series
```
Key Notes:
- Optimized for data with >90% fill values (typically zeros/NaNs)
- Automatically converts to dense form during most operations
- Compatible with scipy.sparse matrices for machine learning
Memory Comparison:
```python
import pandas as pd

dense = pd.Series([0] * 1_000_000 + [1])  # ~7.6MB
sparse = dense.astype('Sparse[int]')      # ~0.1MB (98% reduction)
```
Best Practices:
- Use for high-dimensional sparse data (e.g., transaction records).
- Check `.density` before operations to assess the sparsity benefit.
- Convert to dense format for small datasets (the overhead outweighs the benefits).
15. Export & Conversion
1. File Export Formats
- `df['series'].to_pickle(path)` → Save as a Python pickle file (preserves dtypes).
- `df['series'].to_csv(path)` → Export to CSV (with optional header/index).
- `df['series'].to_excel(path)` → Write to an Excel sheet (requires `openpyxl`/`xlsxwriter`).
- `df['series'].to_hdf(path, key)` → Store in an HDF5 file (for large datasets).
- `df['series'].to_sql(name, con)` → Write to a SQL database (requires SQLAlchemy).
2. Data Structure Conversion
- `df['series'].to_dict()` → Convert to a Python dictionary `{index: value}`.
- `df['series'].to_frame()` → Convert to a single-column DataFrame.
- `df['series'].to_xarray()` → Convert to an `xarray.DataArray` (for multidimensional data).
3. String/Text Formats
- `df['series'].to_json()` → Serialize to a JSON string/file.
- `df['series'].to_string()` → Render as a console-friendly string.
- `df['series'].to_latex()` → Generate LaTeX table format.
- `df['series'].to_markdown()` → Convert to a Markdown table.
4. Utility Outputs
- `df['series'].to_clipboard()` → Copy to the system clipboard (for pasting into Excel).
Key Comparison Table:
| Method | Best For | Preserves Dtype | Human Readable |
|---|---|---|---|
| `to_pickle()` | Python object serialization | ✓ | ✗ |
| `to_csv()` | Interoperability | ✗ | ✓ |
| `to_dict()` | Python integration | ✓ | ✓ |
| `to_json()` | Web APIs | Partial | ✓ |
| `to_hdf()` | Large datasets | ✓ | ✗ |
Example Usage:
```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# File export
s.to_csv('data.csv', header=False)
# Creates:
# a,1
# b,2
# c,3

# Structure conversion
s.to_dict()  # → {'a': 1, 'b': 2, 'c': 3}

# Specialized formats
s.to_markdown()
# Returns:
# |    |   0 |
# |:---|----:|
# | a  |   1 |
# | b  |   2 |
# | c  |   3 |
```
Best Practices:
- Use `pickle` for temporary Python-only storage.
- Prefer `csv`/`json` for cross-language compatibility.
- For large datasets (>1GB), consider `hdf` or `parquet` (via `pyarrow`).
- Use `to_clipboard()` for quick Excel transfers.
16. Advanced & Utility Methods