TABLE OF CONTENTS
1. Case Conversion
- df['text_series'].str.lower() → Convert to lowercase
- df['text_series'].str.upper() → Convert to uppercase
- df['text_series'].str.title() → Convert to title case (first letter of each word capitalized)
2. Cleaning & Replacement
- df['text_series'].str.strip() → Remove whitespace from both ends
- df['text_series'].str.replace(old, new) → Replace substring patterns
- df['text_series'].str.repeat(n) → Repeat each string n times
3. Pattern Matching
- df['text_series'].str.contains(pattern) → Check for substring/regex matches
- df['text_series'].str.startswith(pattern) → Check starting characters
- df['text_series'].str.endswith(pattern) → Check ending characters
- df['text_series'].str.extract(regex) → Extract regex groups into columns
4. Splitting & Joining
- df['text_series'].str.split(sep) → Split strings by delimiter
- df['text_series'].str.get(i) → Get element at position i after split
- df['text_series'].str.join(sep) → Join list elements with separator
5. String Properties & Encoding
- df['text_series'].str.len() → Get length of each string
- df['text_series'].str.get_dummies(sep) → Convert delimited strings to dummy variables
Example Usage:
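A minimal sketch of how these methods combine in practice. The sample names are hypothetical, and the column name 'text_series' simply follows the notes above.

```python
import pandas as pd

# Hypothetical sample data with inconsistent case and stray whitespace.
df = pd.DataFrame({"text_series": ["  Alice Smith ", "BOB JONES", "carol white  "]})

# Standardize whitespace and case in one chain.
cleaned = df["text_series"].str.strip().str.title()

# Split on the space and take the first piece as the first name.
first_names = cleaned.str.split(" ").str.get(0)

print(cleaned.tolist())      # ['Alice Smith', 'Bob Jones', 'Carol White']
print(first_names.tolist())  # ['Alice', 'Bob', 'Carol']
```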
Key Notes:
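- All methods return a new Series or DataFrame (the original remains unchanged)
- Several methods accept regex patterns (contains, replace, extract, etc.); since pandas 2.0, replace treats the pattern as a regex only when regex=True is passed, and startswith/endswith match literal strings only
- Use na=False in pattern matching to handle missing values
- For complex string operations, chain multiple .str methods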
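The notes above can be illustrated with a short sketch. The Series values are made up for illustration, and regex=True is passed explicitly on the assumption that pandas 2.0 or later is in use.

```python
import pandas as pd

# Hypothetical data containing one missing value.
s = pd.Series(["apple pie", None, "Banana split", "cherry-123"])

# na=False makes the missing entry count as "no match" instead of staying missing.
has_an = s.str.contains("an", na=False)

# Regex replacement: strip runs of digits. regex=True is explicit here.
no_digits = s.str.replace(r"\d+", "", regex=True)

# Chaining several .str methods; regex=False treats " " as a literal string.
shouted = s.str.strip().str.upper().str.replace(" ", "_", regex=False)

print(has_an.tolist())  # [False, False, True, False]
print(s.iloc[0])        # 'apple pie' (the original Series is unchanged)
```

The boolean result of contains can be used directly as a mask, for example s[has_an].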
Common Use Cases:
- Standardizing text data (case, whitespace)
- Extracting parts of strings (e.g., first names)
- Creating features from text patterns
- Preparing text for machine learning (via get_dummies)
1. Case Conversion Methods
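A small sketch of the three case methods listed in the table of contents, using hypothetical sample strings.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"text_series": ["hello world", "PANDAS strings", "mIxEd CaSe"]})

lower = df["text_series"].str.lower()
upper = df["text_series"].str.upper()
title = df["text_series"].str.title()  # capitalizes the first letter of every word

print(lower.tolist())  # ['hello world', 'pandas strings', 'mixed case']
print(upper.tolist())  # ['HELLO WORLD', 'PANDAS STRINGS', 'MIXED CASE']
print(title.tolist())  # ['Hello World', 'Pandas Strings', 'Mixed Case']
```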
2. Text Cleaning Methods
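A brief sketch of strip, replace, and repeat on made-up values. Passing regex=False is a choice made here so the hyphen is treated as a literal string.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["  data  ", "big-data", "ab"])

stripped = s.str.strip()                             # remove whitespace at both ends
spaced = stripped.str.replace("-", " ", regex=False) # literal substring replacement
doubled = stripped.str.repeat(2)                     # repeat each string twice

print(stripped.tolist())  # ['data', 'big-data', 'ab']
print(spaced.tolist())    # ['data', 'big data', 'ab']
print(doubled.tolist())   # ['datadata', 'big-databig-data', 'abab']
```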
3. Pattern Matching & Extraction
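A sketch of the matching and extraction methods on hypothetical file names. The named groups kind and number are illustrative choices; extract uses group names as column names.

```python
import pandas as pd

# Hypothetical file names.
s = pd.Series(["order_001.csv", "order_002.txt", "invoice_003.csv"])

is_order = s.str.contains("order")          # substring/regex match anywhere
starts_with_order = s.str.startswith("order")  # literal prefix check
is_csv = s.str.endswith(".csv")             # literal suffix check

# Each named group becomes a column in the resulting DataFrame.
parts = s.str.extract(r"(?P<kind>[a-z]+)_(?P<number>\d+)")

print(is_order.tolist())         # [True, True, False]
print(is_csv.tolist())           # [True, False, True]
print(parts["number"].tolist())  # ['001', '002', '003']
```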
4. Splitting & Combining Strings
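A sketch of split, get, and join on made-up comma-separated values. Note that get returns a missing value when a row has no element at the requested position.

```python
import pandas as pd

# Hypothetical comma-separated values.
s = pd.Series(["a,b,c", "x,y", "solo"])

parts = s.str.split(",")         # each row becomes a list of pieces
second = parts.str.get(1)        # element at position 1, missing if absent
rejoined = parts.str.join("-")   # join each list back into one string

print(parts.tolist())     # [['a', 'b', 'c'], ['x', 'y'], ['solo']]
print(rejoined.tolist())  # ['a-b-c', 'x-y', 'solo']
```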
5. String Properties & Encoding
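A sketch of len and get_dummies, assuming hypothetical pipe-delimited tags. get_dummies produces one indicator column per distinct tag.

```python
import pandas as pd

# Hypothetical pipe-delimited tags.
s = pd.Series(["red|green", "blue", "green|blue"])

lengths = s.str.len()                 # number of characters in each string
dummies = s.str.get_dummies(sep="|")  # one 0/1 column per distinct tag

print(lengths.tolist())       # [9, 4, 10]
print(list(dummies.columns))  # ['blue', 'green', 'red']
```

The resulting indicator columns can be joined back onto the original DataFrame as features.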
Practical Applications
Performance Considerations