1. Series Structure
1. Basic Series Properties
- `df['series'].index` → Returns the index (row labels) of the Series.
- `df['series'].name` → Returns the name of the Series (if assigned).
- `df['series'].dtype` → Returns the data type of the Series elements.
- `df['series'].dtypes` → Alias for `dtype` (same output).
- `df['series'].shape` → Returns a tuple `(n_rows,)` representing dimensions.
- `df['series'].ndim` → Returns the number of dimensions (always 1 for a Series).
- `df['series'].size` → Returns the number of elements in the Series.
- `df['series'].values` → Returns the data as a NumPy array.
- `df['series'].array` → Returns the underlying `PandasArray` (extension array).
2. Boolean Checks
- `df['series'].empty` → Returns `True` if the Series is empty.
- `df['series'].hasnans` → Returns `True` if the Series contains any NaN values.
3. Advanced Attributes (Less Common)
- `df['series'].flags` → Returns a `Flags` object with memory/performance settings.
- `df['series'].set_flags()` → Returns a new Series with modified flags (e.g., `allows_duplicate_labels`).
- `df['series'].attrs` → Returns a dictionary of custom metadata attributes.
Key Notes:
- Most frequently used: `index`, `name`, `dtype`, `shape`, `size`, `values`.
- For missing values: `hasnans` is useful for quick NaN checks.
- Advanced use cases: `flags`, `set_flags`, and `attrs` are rarely needed in everyday workflows.
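Example Usage (a minimal sketch with toy data):

```python
import pandas as pd

s = pd.Series([10, 20, None], name='scores')

s.index    # RangeIndex(start=0, stop=3, step=1)
s.name     # 'scores'
s.dtype    # float64 (the None forces an upcast to float)
s.shape    # (3,)
s.size     # 3
s.hasnans  # True
s.empty    # False
```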
2. Access & Selection
1. Label-Based & Positional Access
- `df['series'].at` → Fast label-based scalar access (single value, like a dictionary).
- `df['series'].iat` → Fast integer-position-based scalar access (like `at` but for positions).
- `df['series'].loc` → Label-based indexing (slicing, single/multiple values).
- `df['series'].iloc` → Integer-position-based indexing (slicing, single/multiple values).
Key Difference:
- Use `at`/`iat` for single values (optimized for speed).
- Use `loc`/`iloc` for slices or multiple values.
2. Value Retrieval & Conversion
- `df['series'].get(key)` → Safe value access (returns `None`/default if the key is missing, like `dict.get`).
- `df['series'].item()` → Extract the single value from a 1-element Series (raises an error if size ≠ 1).
- `df['series'].to_numpy()` → Convert the Series to a NumPy array (preferred over `.values` in modern Pandas).
- `df['series'].array` → Returns the underlying Pandas extension array (e.g., `IntegerArray`, `StringArray`).
3. Data Removal & Iteration
- `df['series'].pop(key)` → Remove and return a value by label (modifies the Series in place).
- `df['series'].items()` → Iterate over (index, value) pairs (like `dict.items`).
- `df['series'].iteritems()` → Older alias for `.items()` (deprecated, removed in Pandas 2.0).
- `df['series'].keys()` → Alias for `.index` (returns index labels).
4. Advanced Selection
- `df['series'].xs(key)` → Cross-section (select value by label, similar to `.loc` but less common).
When to Use What?
- Need speed for single values? → `at` (label) / `iat` (position).
- Slicing or filtering? → `loc` (label) / `iloc` (position).
- Safe access with fallback? → `get()`.
- Convert to NumPy? → `to_numpy()` (modern) or `.values` (legacy).
- Iterate? → `items()`.
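Example Usage (a short sketch contrasting the accessors on a toy labeled Series):

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

s.at['b']        # 200 (fast scalar lookup by label)
s.iat[0]         # 100 (fast scalar lookup by position)
s.loc['a':'b']   # label slice (inclusive): a and b
s.iloc[1:]       # position slice: b and c
s.get('z', -1)   # -1 (safe fallback for a missing label)
s.to_numpy()     # array([100, 200, 300])
```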
3. Math Operations
1. Basic Arithmetic Operations
- `df['series'].add(other)` → Addition (`+` operator equivalent).
- `df['series'].sub(other)` → Subtraction (`-` operator equivalent).
- `df['series'].mul(other)` → Multiplication (`*` operator equivalent).
- `df['series'].div(other)` → Division (alias of `truediv`, `/` operator).
- `df['series'].truediv(other)` → True division (float division, `/` operator).
- `df['series'].floordiv(other)` → Floor division (integer division, `//` operator).
- `df['series'].mod(other)` → Modulo/remainder (`%` operator).
- `df['series'].pow(other)` → Exponentiation (`**` operator).
2. Reverse Arithmetic Operations (Reflected)
These reverse the operand order (e.g., `radd` computes `other + series` instead of `series + other`).

- `df['series'].radd(other)` → Reverse addition (`other + series`).
- `df['series'].rsub(other)` → Reverse subtraction (`other - series`).
- `df['series'].rmul(other)` → Reverse multiplication (`other * series`).
- `df['series'].rdiv(other)` → Reverse division (alias of `rtruediv`).
- `df['series'].rtruediv(other)` → Reverse true division (`other / series`).
- `df['series'].rfloordiv(other)` → Reverse floor division (`other // series`).
- `df['series'].rmod(other)` → Reverse modulo (`other % series`).
- `df['series'].rpow(other)` → Reverse exponentiation (`other ** series`).
3. Comparison Operations
- `df['series'].eq(other)` → Equal to (`==` operator).
- `df['series'].ne(other)` → Not equal to (`!=` operator).
- `df['series'].lt(other)` → Less than (`<` operator).
- `df['series'].le(other)` → Less than or equal to (`<=` operator).
- `df['series'].gt(other)` → Greater than (`>` operator).
- `df['series'].ge(other)` → Greater than or equal to (`>=` operator).
- `df['series'].between(left, right)` → Check if values are between `left` and `right` (inclusive by default).
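One practical reason to prefer the method forms over bare operators is the extra parameters they accept, such as `fill_value` for non-aligned labels. Example Usage (a minimal sketch with toy data):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['a', 'b'])

s1 + s2                   # 'c' has no match in s2, so it becomes NaN
s1.add(s2, fill_value=0)  # 'c' stays 3.0 (the missing side is treated as 0)
s1.rsub(s2)               # computes s2 - s1, label by label
s1.between(2, 3)          # [False, True, True]
```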
4. Statistical Summaries
1. Summary Statistics
- `df['series'].describe()` → Quick summary (count, mean, std, min, max, quartiles).
- `df['series'].count()` → Count of non-NA values.
- `df['series'].sum()` → Sum of values.
- `df['series'].mean()` → Arithmetic mean.
- `df['series'].median()` → Median (50th percentile).
- `df['series'].mode()` → Most frequent value(s).
- `df['series'].min()` → Minimum value.
- `df['series'].max()` → Maximum value.
- `df['series'].std()` → Standard deviation.
- `df['series'].var()` → Variance.
2. Unique Values & Counting
- `df['series'].nunique()` → Number of unique values.
- `df['series'].unique()` → Array of unique values.
- `df['series'].value_counts()` → Counts of each unique value.
3. Index Locations
- `df['series'].idxmax()` → Index label of the first maximum value.
- `df['series'].idxmin()` → Index label of the first minimum value.
Key Notes:
- Use `describe()` for an instant overview.
- `value_counts()` is ideal for categorical data.
- `idxmax`/`idxmin` return index labels, not the values themselves.
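Example Usage (a quick sketch with toy data):

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5], index=list('abcde'))

s.describe()      # count, mean, std, min, quartiles, max in one Series
s.nunique()       # 4 (the value 1 repeats)
s.value_counts()  # 1 appears twice, the rest once
s.idxmax()        # 'e' (the label of the maximum, not the value 5)
```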
5. Cumulative, Ranking & Rolling
1. Cumulative Calculations
- `df['series'].cumsum()` → Cumulative sum.
- `df['series'].cumprod()` → Cumulative product.
- `df['series'].cummax()` → Cumulative maximum.
- `df['series'].cummin()` → Cumulative minimum.
2. Ranking & Percent Changes
- `df['series'].rank()` → Rank values (with tie-breaking methods).
- `df['series'].pct_change()` → Percentage change between elements.
- `df['series'].quantile(q)` → Value at quantile `q` (0-1).
3. Extreme Value Selection
- `df['series'].nlargest(n)` → Top `n` largest values.
- `df['series'].nsmallest(n)` → Top `n` smallest values.
4. Window Calculations
- `df['series'].rolling(window)` → Rolling window calculations (mean, sum, etc.).
- `df['series'].expanding()` → Expanding window calculations.
- `df['series'].ewm(span)` → Exponentially weighted calculations (e.g., moving average).
Key Notes:
- Cumulative methods (`cum*`) are useful for running totals/products.
- `rolling()`/`expanding()`/`ewm()` return window objects for further calculations.
- `nlargest()`/`nsmallest()` preserve original indices by default.
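Example Usage (a minimal sketch; note that `rolling()`/`ewm()` return window objects, so you still call an aggregation like `.mean()` on them):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

s.cumsum()                  # [1, 3, 6, 10, 15]
s.pct_change()              # [NaN, 1.0, 0.5, 0.333..., 0.25]
s.rank()                    # [1.0, 2.0, 3.0, 4.0, 5.0]
s.rolling(window=3).mean()  # NaN until the window fills, then [2.0, 3.0, 4.0]
s.ewm(span=3).mean()        # exponentially weighted mean
s.nlargest(2)               # 5 and 4, keeping their original index labels
```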
6. Advanced Statistics
1. Relationship & Dispersion Metrics
- `df['series'].corr(other)` → Pearson correlation with another Series (-1 to 1).
- `df['series'].cov(other)` → Covariance with another Series.
- `df['series'].autocorr(lag=1)` → Autocorrelation at the specified lag (for time series).
2. Shape & Distribution Metrics
- `df['series'].skew()` → Skewness (measure of asymmetry).
- `df['series'].kurt()` → Kurtosis (tailedness; alias: `kurtosis()`).
3. Aggregation & Error Metrics
- `df['series'].prod()` → Product of all values.
- `df['series'].sem()` → Standard error of the mean (σ/√n).
Key Notes:
- `corr()`/`cov()` align the two Series on their index before computing.
- Positive `skew()` = longer right tail; negative = longer left tail.
- High `kurtosis()` = heavy tails (leptokurtic).
- `autocorr()` helps detect periodicity in time series.
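Example Usage (a short sketch with toy data):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([2, 4, 5, 4, 6])

s1.corr(s2)      # Pearson correlation, here ≈ 0.85
s1.cov(s2)       # covariance, here 2.0
s1.autocorr(1)   # correlation between the series and itself shifted by 1
s1.skew()        # 0.0 (this sequence is symmetric)
s1.sem()         # standard error of the mean
```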
7. Missing Data Handling
1. Missing Value Detection
- `df['series'].isna()` → Boolean mask of missing values (alias: `isnull()`).
- `df['series'].notna()` → Boolean mask of non-missing values (alias: `notnull()`).
2. Missing Value Removal
- `df['series'].dropna()` → Remove missing values (returns a new Series).
3. Missing Value Filling
- `df['series'].fillna(value)` → Fill NA with a specified value or method.
- `df['series'].ffill()` → Forward fill (alias: `pad()`).
- `df['series'].bfill()` → Backward fill (alias: `backfill()`).
- `df['series'].interpolate()` → Fill NA via interpolation.
Key Notes:
- `isna`/`isnull` and `notna`/`notnull` are identical (use whichever you prefer).
- `ffill`/`pad` propagate the last valid observation forward.
- `bfill`/`backfill` propagate the next valid observation backward.
- `interpolate` offers multiple methods (linear, polynomial, etc.).
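Example Usage (a minimal sketch of the common NA workflows, toy data):

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0])

s.isna()         # [False, True, True, False]
s.dropna()       # [1.0, 4.0] (labels 0 and 3 survive)
s.fillna(0)      # [1.0, 0.0, 0.0, 4.0]
s.ffill()        # [1.0, 1.0, 1.0, 4.0]
s.bfill()        # [1.0, 4.0, 4.0, 4.0]
s.interpolate()  # [1.0, 2.0, 3.0, 4.0] (linear by default)
```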
8. Conditional Logic & Boolean Masking
1. Conditional Replacement
- `df['series'].where(cond, other)` → Keep values where `cond` is True, else replace with `other` (preserves the original where the condition is met).
- `df['series'].mask(cond, other)` → Replace values where `cond` is True with `other` (opposite of `where`: modifies where the condition is met).
2. Membership Testing
- `df['series'].isin(values)` → Check if values exist in a list/set (`values` can be any list-like).
3. Boolean Evaluation
- `df['series'].all()` → Return True if all elements are True/truthy.
- `df['series'].any()` → Return True if any element is True/truthy.
Key Notes:
- `where()` vs `mask()`: think "keep where" vs "replace where".
- `isin()` is ideal for filtering against multiple values.
- `all()`/`any()` ignore NA values by default (use `skipna=False` to include them).
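Example Usage (a quick sketch of the "keep where" vs "replace where" distinction, toy data):

```python
import pandas as pd

s = pd.Series([1, -2, 3, -4])

s.where(s > 0, 0)  # [1, 0, 3, 0]   (keep positives, replace the rest)
s.mask(s > 0, 0)   # [0, -2, 0, -4] (the opposite: replace positives)
s.isin([1, 3])     # [True, False, True, False]
(s > 0).any()      # True
(s > 0).all()      # False
```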
9. Type Conversion & Copying
1. Type Conversion
- `df['series'].astype(dtype)` → Force cast to a specified dtype (e.g., `'int'`, `'float'`, `'str'`, `'category'`).
- `df['series'].convert_dtypes()` → Convert to the best possible nullable dtype (Pandas 1.0+).
- `df['series'].infer_objects()` → Infer a better dtype for object columns (soft conversion).
2. Memory & Copy Management
- `df['series'].copy(deep=True)` → Create a copy (`deep=True` for a full memory copy).
- `df['series'].bool()` → Return the bool value (only for a single-element boolean Series).
Key Notes:
- `astype()` vs `convert_dtypes()`: `astype()` forces conversion (may lose info); `convert_dtypes()` intelligently chooses nullable dtypes (e.g., `Int64` instead of `float64` for integer data with missing values).
- Use `infer_objects()` when dealing with mixed-type object columns.
- `bool()` raises `ValueError` if the Series doesn't contain exactly one boolean element.
Best Practices:
- Prefer `convert_dtypes()` for modern nullable types.
- Use `astype()` when you need specific control.
- Always `copy()` before modifying if you need the original.
- `bool()` is mainly for scalar boolean extraction.
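Example Usage (a minimal sketch of the conversion paths, toy data):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, None])

s.astype('Int64')           # nullable integer dtype; None becomes <NA>
s.convert_dtypes()          # picks the best nullable dtype automatically
backup = s.copy(deep=True)  # independent copy; mutating s won't touch backup
```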
10. Function Mapping & Transformation
1. Element-wise Operations
- `df['series'].map(arg)` → Map values using a dict/Series/function (element-wise).
- `df['series'].apply(func)` → Apply a function along the Series (element-wise or aggregate).
2. Aggregation Methods
- `df['series'].agg(func)` → Aggregate using one or multiple operations (alias: `aggregate()`).
- Common funcs: `'sum'`, `'mean'`, `'min'`, `'max'`, or custom functions.
3. Transformation Methods
- `df['series'].transform(func)` → Return transformed values (same shape as input).
- Differs from `agg()`: maintains the original dimensions.
4. Series Combination
- `df['series'].combine(other, func)` → Combine with another Series using a function.
- `df['series'].combine_first(other)` → Fill nulls using another Series.
Key Differences:
| Method | Best For | Returns |
|---|---|---|
| `map()` | Simple value replacements | Same length |
| `apply()` | Complex element-wise operations | Same length |
| `agg()` | Summarizing data | Scalar/value |
| `transform()` | Group-aware transformations | Same length |
Example Usage:
```python
import pandas as pd

s1 = pd.Series([1, 2, None])
s2 = pd.Series([10, None, 30])

# Element-wise
s1.map({1: 'a', 2: 'b'})                           # → ['a', 'b', NaN]
s1.apply(lambda x: x * 2 if pd.notnull(x) else 0)  # → [2.0, 4.0, 0.0]

# Aggregation
s1.agg(['sum', 'mean'])  # → Series with both metrics

# Transformation
s1.transform(lambda x: x - x.mean())  # → center the data

# Combination (fill_value handles the NaNs on either side)
s1.combine(s2, max, fill_value=0)  # → [10.0, 2.0, 30.0]
s1.combine_first(s2)               # → [1.0, 2.0, 30.0] (fills nulls)
```
When to Use:
- Use `map()` for simple value lookups.
- Use `apply()` for custom complex operations.
- Use `agg()` for summaries, `transform()` for group-wise standardization.
- Use `combine()` for element-wise logic between Series.
11. Time Series Handling
1. DateTime Conversion
- `df['datetime_series'].to_period(freq)` → Convert to Period (e.g., `'M'` for monthly).
- `df['period_series'].to_timestamp()` → Convert a Period back to a Timestamp.
2. Date Component Accessors (via .dt)
- Basic Components:
  - `.dt.year` → Extract year.
  - `.dt.month` → Extract month (1-12).
  - `.dt.day` → Extract day (1-31).
  - `.dt.hour` → Extract hour (0-23).
  - `.dt.minute` → Extract minute (0-59).
  - `.dt.second` → Extract second (0-59).
- Advanced Components:
  - `.dt.dayofweek` → Day of week (0=Monday; alias: `.dt.weekday`).
  - `.dt.dayofyear` → Day of year (1-366).
  - `.dt.quarter` → Quarter (1-4).
  - `.dt.is_month_end` → Bool for month-end dates.
  - `.dt.is_quarter_end` → Bool for quarter-end dates.
  - `.dt.is_year_end` → Bool for year-end dates.
3. DateTime Formatting & Rounding
- `.dt.strftime(format)` → Format as string (e.g., `"%Y-%m-%d"`).
- `.dt.round(freq)` → Round to the nearest frequency (e.g., `'H'` for hour).
- `.dt.floor(freq)` → Round down to the frequency.
- `.dt.ceil(freq)` → Round up to the frequency.
- `.dt.normalize()` → Set time to midnight (00:00:00).
4. TimeDelta Operations (for timedelta Series)
- `.dt.total_seconds()` → Convert the entire duration to seconds.
Example Usage:
```python
import pandas as pd

# Create a datetime Series (month-end dates)
dates = pd.Series(pd.date_range('2023-01-01', periods=3, freq='M'))

# Access components
dates.dt.month           # → [1, 2, 3]
dates.dt.is_quarter_end  # → [False, False, True] (March is a quarter end)

# Formatting
dates.dt.strftime("%Y-%m")  # → ['2023-01', '2023-02', '2023-03']

# Rounding
pd.Series([pd.Timestamp('2023-01-01 14:30:45')]).dt.round('H')
# → 2023-01-01 15:00:00
```
Key Notes:
- The `.dt` accessor only works with datetime-like Series.
- For timezone handling, use `tz_localize()` to assign timezones and `tz_convert()` to convert between them.
- Frequency strings (for rounding/periods):
  - `'D'` = day, `'M'` = month, `'Q'` = quarter
  - `'H'` = hour, `'T'`/`'min'` = minute, `'S'` = second
Common Use Cases:
- Extracting month/year for grouping
- Flagging period-end dates
- Creating formatted date strings for reports
- Normalizing timestamps to compare dates without times
12. String Operations
1. Case Conversion
- `df['text_series'].str.lower()` → Convert to lowercase.
- `df['text_series'].str.upper()` → Convert to uppercase.
- `df['text_series'].str.title()` → Convert to title case (first letter of each word capitalized).
2. Cleaning & Replacement
- `df['text_series'].str.strip()` → Remove whitespace from both ends.
- `df['text_series'].str.replace(old, new)` → Replace substring patterns.
- `df['text_series'].str.repeat(n)` → Repeat each string `n` times.
3. Pattern Matching
- `df['text_series'].str.contains(pattern)` → Check for substring/regex matches.
- `df['text_series'].str.startswith(pattern)` → Check starting characters.
- `df['text_series'].str.endswith(pattern)` → Check ending characters.
- `df['text_series'].str.extract(regex)` → Extract regex groups into columns.
4. Splitting & Joining
- `df['text_series'].str.split(sep)` → Split strings by a delimiter.
- `df['text_series'].str.get(i)` → Get the element at position `i` after a split.
- `df['text_series'].str.join(sep)` → Join list elements with a separator.
5. String Properties & Encoding
- `df['text_series'].str.len()` → Get the length of each string.
- `df['text_series'].str.get_dummies(sep)` → Convert delimited strings to dummy variables.
Example Usage:
```python
import pandas as pd

names = pd.Series([' John Doe ', 'jane SMITH', 'alice cooper'])

# Case conversion
names.str.title()          # → [' John Doe ', 'Jane Smith', 'Alice Cooper']

# Cleaning
names.str.strip()          # → ['John Doe', 'jane SMITH', 'alice cooper']

# Pattern matching
names.str.contains('oh')   # → [True, False, False]

# Splitting
names.str.split().str.get(0)  # → ['John', 'jane', 'alice']
```
Key Notes:
- All methods return new Series (original remains unchanged)
- Most methods accept regex patterns (`contains`, `replace`, etc.).
- Use `na=False` in pattern matching to handle missing values.
- For complex string operations, chain multiple `.str` methods.
Common Use Cases:
- Standardizing text data (case, whitespace)
- Extracting parts of strings (e.g., first names)
- Creating features from text patterns
- Preparing text for machine learning (via `get_dummies`)
13. Categoricals
1. Core Categorical Properties (via .cat accessor)
- `df['cat_series'].cat.codes` → Returns integer codes for each category.
- `df['cat_series'].cat.categories` → Returns the index of categories.
- `df['cat_series'].cat.ordered` → Returns True if categories have a logical ordering.
2. Category Management
- `df['cat_series'].cat.rename_categories(new_names)` → Rename categories.
- `df['cat_series'].cat.reorder_categories(new_order)` → Reorder categories.
- `df['cat_series'].cat.add_categories(new_cats)` → Add new categories.
- `df['cat_series'].cat.remove_categories(to_remove)` → Remove specific categories.
- `df['cat_series'].cat.remove_unused_categories()` → Remove unused categories.
- `df['cat_series'].cat.set_categories(new_cats)` → Set new categories (removes others).
3. Order Control
- `df['cat_series'].cat.as_ordered()` → Mark categories as ordered.
- `df['cat_series'].cat.as_unordered()` → Remove the ordering.
Example Usage:
```python
import pandas as pd

# Create a categorical Series (string categories are sorted alphabetically)
colors = pd.Series(['red', 'blue', 'green'], dtype='category')

# Access properties
colors.cat.codes       # → [2, 0, 1] (codes index into the sorted categories)
colors.cat.categories  # → Index(['blue', 'green', 'red'], dtype='object')

# Modify categories (each call returns a new Series)
colors.cat.add_categories(['yellow'])
colors.cat.remove_categories(['green'])

# Change ordering
ordered_colors = colors.cat.as_ordered()
```
Key Notes:
- All modification methods return new Series (original remains unchanged)
- Categories are automatically sorted unless specified otherwise
- Operations preserve category dtype (unlike string operations)
Common Use Cases:
- Memory optimization for string variables with few unique values
- Maintaining fixed category sets (even when some values are missing)
- Statistical modeling with ordered factors
- Grouping operations with controlled category order
Best Practices:
- Use `remove_unused_categories()` after filtering.
- Set an ordering when categories have a logical sequence (e.g., sizes).
- Predefine categories when you need consistent levels across datasets.
14. Sparse Data Handling
1. Sparse Data Properties (via .sparse accessor)
- `df['sparse_series'].sparse` → Main accessor for sparse operations.
- `df['sparse_series'].sparse.npoints` → Count of non-fill-value points.
- `df['sparse_series'].sparse.density` → Ratio of non-fill values (0-1).
- `df['sparse_series'].sparse.fill_value` → Returns the fill value (default NaN).
- `df['sparse_series'].sparse.sp_values` → Returns the stored non-fill values as a NumPy array.
2. Sparse Conversion Methods
- `df['sparse_series'].sparse.to_coo()` → Convert to a `scipy.sparse.coo_matrix`.
- `pd.Series.sparse.from_coo(coo_matrix)` → Create from a `scipy.sparse.coo_matrix` (class method).
Example Usage:
```python
import pandas as pd

# Create a sparse Series (for int data the default fill value is 0)
s = pd.Series([0, 0, 1, 0, 3]).astype('Sparse[int]')

# Access properties
s.sparse.density     # → 0.4 (2 non-fill values out of 5)
s.sparse.npoints     # → 2
s.sparse.fill_value  # → 0

# to_coo()/from_coo() work on a Series with a MultiIndex (row, column levels)
mi = pd.MultiIndex.from_tuples([(0, 0), (1, 1), (2, 0)])
ms = pd.Series([1.0, 2.0, 3.0], index=mi).astype('Sparse[float]')
coo, rows, cols = ms.sparse.to_coo()         # scipy.sparse.coo_matrix plus labels
new_series = pd.Series.sparse.from_coo(coo)  # back to a sparse Series
```
Key Notes:
- Optimized for data with >90% fill values (typically zeros/NaNs)
- Automatically converts to dense form during most operations
- Compatible with scipy.sparse matrices for machine learning
Memory Comparison:
```python
import pandas as pd

dense = pd.Series([0] * 1_000_000 + [1])  # ~7.6MB
sparse = dense.astype('Sparse[int]')      # ~0.1MB (98% reduction)
```
Best Practices:
- Use for high-dimensional sparse data (e.g., transaction records).
- Check `.density` before operations to assess the sparsity benefit.
- Convert to dense format for small datasets (the overhead outweighs the benefits).
15. Export & Conversion
1. File Export Formats
- `df['series'].to_pickle(path)` → Save as a Python pickle file (preserves dtypes).
- `df['series'].to_csv(path)` → Export to CSV (with optional header/index).
- `df['series'].to_excel(path)` → Write to an Excel sheet (requires `openpyxl`/`xlsxwriter`).
- `df['series'].to_hdf(path, key)` → Store in an HDF5 file (for large datasets).
- `df['series'].to_sql(name, con)` → Write to a SQL database (requires SQLAlchemy).
2. Data Structure Conversion
- `df['series'].to_dict()` → Convert to a Python dictionary `{index: value}`.
- `df['series'].to_frame()` → Convert to a single-column DataFrame.
- `df['series'].to_xarray()` → Convert to an `xarray.DataArray` (for multidimensional data).
3. String/Text Formats
- `df['series'].to_json()` → Serialize to a JSON string/file.
- `df['series'].to_string()` → Render as a console-friendly string.
- `df['series'].to_latex()` → Generate LaTeX table format.
- `df['series'].to_markdown()` → Convert to a Markdown table.
4. Utility Outputs
- `df['series'].to_clipboard()` → Copy to the system clipboard (for pasting into Excel).
Key Comparison Table:
| Method | Best For | Preserves Dtype | Human Readable |
|---|---|---|---|
| `to_pickle()` | Python object serialization | ✓ | ✗ |
| `to_csv()` | Interoperability | ✗ | ✓ |
| `to_dict()` | Python integration | ✓ | ✓ |
| `to_json()` | Web APIs | Partial | ✓ |
| `to_hdf()` | Large datasets | ✓ | ✗ |
Example Usage:
```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# File export
s.to_csv('data.csv', header=False)
# Creates:
# a,1
# b,2
# c,3

# Structure conversion
s.to_dict()  # → {'a': 1, 'b': 2, 'c': 3}

# Specialized formats
s.to_markdown()
# Returns:
# |    |   0 |
# |:---|----:|
# | a  |   1 |
# | b  |   2 |
# | c  |   3 |
```
Best Practices:
- Use `pickle` for temporary Python-only storage.
- Prefer `csv`/`json` for cross-language compatibility.
- For large datasets (>1GB), consider `hdf` or `parquet` (via `pyarrow`).
- Use `to_clipboard()` for quick Excel transfers.
16. Advanced & Utility Methods