TABLE OF CONTENTS
1. Case Conversion
df['text_series'].str.lower() → Convert to lowercase
df['text_series'].str.upper() → Convert to uppercase
df['text_series'].str.title() → Convert to title case (first letter of each word capitalized)
2. Cleaning & Replacement
df['text_series'].str.strip() → Remove whitespace from both ends
df['text_series'].str.replace(old, new) → Replace occurrences of a substring (pass regex=True to treat old as a regex pattern)
df['text_series'].str.repeat(n) → Repeat each string n times
3. Pattern Matching
df['text_series'].str.contains(pattern) → Check for substring/regex matches
df['text_series'].str.startswith(pattern) → Check starting characters (literal match, not regex)
df['text_series'].str.endswith(pattern) → Check ending characters (literal match, not regex)
df['text_series'].str.extract(regex) → Extract regex capture groups into columns
4. Splitting & Joining
df['text_series'].str.split(sep) → Split strings by delimiter
df['text_series'].str.get(i) → Get element at position i after split
df['text_series'].str.join(sep) → Join list elements with separator
5. String Properties & Encoding
df['text_series'].str.len() → Get length of each string
df['text_series'].str.get_dummies(sep) → Convert delimited strings to dummy variables
Example Usage:
import pandas as pd
names = pd.Series([' John Doe ', 'jane SMITH', 'alice cooper'])
# Case conversion
names.str.title() # → [' John Doe ', 'Jane Smith', 'Alice Cooper']
# Cleaning
names.str.strip() # → ['John Doe', 'jane SMITH', 'alice cooper']
# Pattern matching
names.str.contains('oh') # → [True, False, False]
# Splitting
names.str.split().str.get(0) # → ['John', 'jane', 'alice']
Key Notes:
- All methods return a new object (original remains unchanged); most return a Series, while extract and get_dummies return a DataFrame
- Several methods accept regex patterns (contains, extract, and replace with regex=True)
- Use na=False in pattern matching to handle missing values
- For complex string operations, chain multiple .str methods
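The notes on na=False, regex patterns, and chaining can be sketched together as follows. This is a minimal illustration, assuming a made-up Series of email-like strings; the variable names and sample data are hypothetical and not part of the original notes.

```python
import pandas as pd

# Hypothetical sample data containing one missing value
emails = pd.Series(["ann@example.com", None, "bob@test.org", "not-an-email"])

# na=False treats the missing entry as "no match", giving a clean boolean mask
mask = emails.str.contains("@", na=False)
# → [True, False, True, False]

# contains() interprets its pattern as a regex by default
is_org = emails.str.contains(r"\.org$", na=False)
# → [False, False, True, False]

# Chain several .str calls: keep the part before "@", then upper-case it
user = emails[mask].str.split("@").str.get(0).str.upper()
# → ['ANN', 'BOB']
```

Passing na=False explicitly keeps the mask usable for boolean indexing regardless of how missing values would otherwise be reported.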
Common Use Cases:
- Standardizing text data (case, whitespace)
- Extracting parts of strings (e.g., first names)
- Creating features from text patterns
- Preparing text for machine learning (via get_dummies)
1. Case Conversion Methods
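As a small sketch of why case conversion matters, the example below normalizes inconsistent capitalization so that variants of the same value collapse together. The city names are hypothetical sample data chosen for illustration.

```python
import pandas as pd

# Hypothetical data: the same city written with inconsistent capitalization
cities = pd.Series(["new york", "NEW YORK", "New york", "bOSTON"])

lower = cities.str.lower()   # → ['new york', 'new york', 'new york', 'boston']
upper = cities.str.upper()   # → ['NEW YORK', 'NEW YORK', 'NEW YORK', 'BOSTON']
title = cities.str.title()   # → ['New York', 'New York', 'New York', 'Boston']

# Before normalization every spelling counts as a distinct value
n_unique_raw = cities.nunique()     # → 4
# After normalization only the real categories remain
n_unique_clean = title.nunique()    # → 2
```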
2. Text Cleaning Methods
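The cleaning methods listed earlier (strip, replace, repeat) might be combined roughly like this. The sample strings are invented for illustration, and regex is passed explicitly so the behavior does not depend on the installed pandas version's default.

```python
import pandas as pd

# Hypothetical messy strings with stray whitespace
raw = pd.Series(["  apple pie ", "banana   split", "\tcherry tart\n"])

# strip() removes leading and trailing whitespace (spaces, tabs, newlines)
stripped = raw.str.strip()
# → ['apple pie', 'banana   split', 'cherry tart']

# Regex replacement: collapse runs of whitespace into a single space
collapsed = stripped.str.replace(r"\s+", " ", regex=True)
# → ['apple pie', 'banana split', 'cherry tart']

# Literal replacement: swap spaces for hyphens
dashed = collapsed.str.replace(" ", "-", regex=False)
# → ['apple-pie', 'banana-split', 'cherry-tart']

# repeat(n) repeats each string n times
marks = pd.Series(["ab", "-"]).str.repeat(3)
# → ['ababab', '---']
```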
3. Pattern Matching & Extraction
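A brief sketch of prefix/suffix checks and regex extraction, assuming a hypothetical Series of document codes in a KIND-YEAR-SEQUENCE layout. The code format and the group names (kind, year, seq) are illustrative assumptions.

```python
import pandas as pd

# Hypothetical document codes
codes = pd.Series(["INV-2023-001", "ORD-2024-017", "INV-2024-102"])

# startswith/endswith compare against a literal string, not a regex
is_invoice = codes.str.startswith("INV")   # → [True, False, True]
ends_017 = codes.str.endswith("017")       # → [False, True, False]

# extract() returns one DataFrame column per capture group;
# named groups become the column names
parts = codes.str.extract(r"^(?P<kind>[A-Z]+)-(?P<year>\d{4})-(?P<seq>\d+)$")
# parts columns: kind, year, seq (all values are strings)
```

The extracted values remain strings, so a column such as year would still need a separate conversion (for example with astype) before numeric use.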
4. Splitting & Combining Strings
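Splitting and joining can be illustrated with a hypothetical Series of comma-separated tags. This is only a sketch; the data and variable names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical comma-separated tags
tags = pd.Series(["red,green,blue", "yellow", "black,white"])

# split() turns each string into a list of pieces
as_lists = tags.str.split(",")

# get(i) picks one element from each list; negative indices count from the end
first = as_lists.str.get(0)    # → ['red', 'yellow', 'black']
last = as_lists.str.get(-1)    # → ['blue', 'yellow', 'white']

# join() glues each list back together with a separator
rejoined = as_lists.str.join(" | ")
# → ['red | green | blue', 'yellow', 'black | white']

# expand=True spreads the pieces across DataFrame columns instead of lists;
# shorter rows are padded with missing values
wide = tags.str.split(",", expand=True)
```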
5. String Properties & Encoding
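The two methods in this group, len and get_dummies, could be used along these lines. The pipe-delimited genre strings are hypothetical sample data.

```python
import pandas as pd

# Hypothetical pipe-delimited genre labels
genres = pd.Series(["action|comedy", "drama", "comedy|drama"])

# len() counts the characters in each string
lengths = genres.str.len()   # → [13, 5, 12]

# get_dummies() splits on the separator and builds one indicator
# column per distinct label (columns are sorted alphabetically)
dummies = genres.str.get_dummies(sep="|")
#    action  comedy  drama
# 0       1       1      0
# 1       0       0      1
# 2       0       1      1
```

Each indicator column can then be fed to a model as a numeric feature, which is the machine learning preparation step mentioned under the common use cases.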
Practical Applications
Performance Considerations