Python text.split()

I. text.split() in Python
split() is a Python string method that divides a string into a list of substrings. It's one of the most commonly used tools for text processing.
1. Basic Usage
text = "Python is awesome"
result = text.split()
print(result)  # Output: ['Python', 'is', 'awesome']
By default, split() treats runs of whitespace characters as delimiters (spaces, newlines \n, tabs \t, etc.) and ignores leading and trailing whitespace.
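The default behavior described above can be checked with a short sketch. The sample strings are made up for illustration.

```python
# Mixed whitespace: leading spaces, a tab, a newline, trailing spaces
messy = "  Python\tis\nawesome  "

# With no argument, any run of whitespace counts as one delimiter
# and whitespace at the edges is dropped
words = messy.split()
print(words)  # ['Python', 'is', 'awesome']

# A string containing only whitespace yields an empty list
blank = "   \n\t ".split()
print(blank)  # []
```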
2. Parameter Details
1) split() vs split(' ') Difference
text = "Python   is   awesome"  # Multiple spaces between words

# Default split() - treats any run of whitespace as one delimiter
print(text.split())  # Output: ['Python', 'is', 'awesome']

# split(' ') - splits on every single space, so extra spaces produce empty strings
print(text.split(' '))  # Output: ['Python', '', '', 'is', '', '', 'awesome']
2) Specifying Separator sep
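One detail worth keeping in mind when passing sep: the separator is matched as a whole string, not as a set of characters, and an empty separator is rejected. A minimal sketch with made-up sample strings:

```python
# A multi-character separator must match in full
record = "2024::01::15"
date_parts = record.split("::")
print(date_parts)  # ['2024', '01', '15']

# ", " matches only a comma followed by a space, so "b,c" stays together
items = "a, b,c"
loose = items.split(", ")
print(loose)  # ['a', 'b,c']

# An empty separator raises ValueError
raised = False
try:
    "abc".split("")
except ValueError:
    raised = True
print(raised)  # True
```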
data = "apple,banana,orange"
print(data.split(','))  # Output: ['apple', 'banana', 'orange']

path = "user/local/bin"  # Components separated by forward slashes
print(path.split('/'))  # Output: ['user', 'local', 'bin']

sentence = "Hello-World-Python"
print(sentence.split('-'))  # Output: ['Hello', 'World', 'Python']
3) Limiting Splits with maxsplit
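maxsplit caps the number of splits performed, counting from the left, so the result holds at most maxsplit + 1 items. A minimal sketch of that rule, using a made-up sample string:

```python
text = "a b c d"

# maxsplit=-1 is the default and means "no limit"
no_limit = text.split(maxsplit=-1)
print(no_limit)  # ['a', 'b', 'c', 'd']

# One split gives two items; the remainder is left untouched
one_split = text.split(maxsplit=1)
print(one_split)  # ['a', 'b c d']

# Asking for more splits than are available is harmless
many = text.split(maxsplit=10)
print(many)  # ['a', 'b', 'c', 'd']

# Zero splits returns the whole string as a single item
zero = text.split(maxsplit=0)
print(zero)  # ['a b c d']
```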
text = "one two three four five"

# Split only the first 2 times
print(text.split(maxsplit=2))  # Output: ['one', 'two', 'three four five']

# Equivalent for this input: explicit separator with maxsplit passed positionally
print(text.split(' ', 2))  # Output: ['one', 'two', 'three four five']

# Practical example: parsing simple logs
log = "2024-01-15 10:30:45 ERROR Connection failed"
date, time, level, *message = log.split(maxsplit=3)
print(f"Date: {date}, Time: {time}, Level: {level}, Message: {message}")
# Output: Date: 2024-01-15, Time: 10:30:45, Level: ERROR, Message: ['Connection failed']

The *message syntax is Extended Unpacking.
date, time, level: these take the first 3 elements of the list, respectively.
*message: this collects all remaining elements into a list, equivalent to message = parts[3:] where parts is the list returned by split(). Because maxsplit=3 leaves the rest of the line as a single element, message is ['Connection failed']; without maxsplit it would be ['Connection', 'failed'].
3. Common Use Cases
1) Word Frequency Counting
import re
from collections import Counter

text = "Python is awesome. Python is powerful!"
words = text.lower().split()  # Convert to lowercase, then split
# Note: punctuation remains! words is ['python', 'is', 'awesome.', 'python', 'is', 'powerful!']

# Better approach: extract runs of word characters, which drops punctuation
words = re.findall(r'\w+', text.lower())
print(Counter(words))  # Output: Counter({'python': 2, 'is': 2, 'awesome': 1, 'powerful': 1})

| Part | Meaning | Explanation |
|---|---|---|
| re | regex module | Python's regular expression library |
| .findall() | find all matches | Returns all non-overlapping matches as a list |
| r'' | raw string | Backslashes are treated literally |
| \w | word character | Matches letters, digits, and underscore |
| + | one or more | Quantifier: matches 1 or more occurrences |
| text.lower() | lowercase | Converts everything to lowercase for case-insensitive counting |
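Because \w does not match an apostrophe, the \w+ pattern also breaks contractions apart. The sketch below contrasts it with one possible alternative (not from the original post): split on whitespace, then strip punctuation from the edges of each word with string.punctuation. The sample sentence is made up.

```python
import re
import string

text = "Don't stop. Python's great!"

# \w+ treats the apostrophe as a boundary, so contractions are cut in two
regex_words = re.findall(r"\w+", text.lower())
print(regex_words)  # ['don', 't', 'stop', 'python', 's', 'great']

# Alternative: split on whitespace, then strip punctuation from word edges only
stripped_words = [w.strip(string.punctuation) for w in text.lower().split()]
print(stripped_words)  # ["don't", 'stop', "python's", 'great']
```

Which behavior is preferable depends on whether "don't" should count as one word in the frequency table.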
2) Parsing CSV (Simple Cases)
csv_line = "John,25,Engineer,New York"
name, age, job, city = csv_line.split(',')
print(f"{name} is {age} years old, works as {job} in {city}")
# Output: John is 25 years old, works as Engineer in New York
3) Handling User Input
# Parsing commands
command = "save document.txt"
action, filename = command.split(maxsplit=1)
print(f"Action: {action}, File: {filename}")  # Output: Action: save, File: document.txt
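Unpacking into exactly two names raises ValueError when the user types a command with no argument (or nothing at all). A minimal sketch of a more forgiving parser follows; parse_command is a hypothetical helper introduced here for illustration, not part of the original example.

```python
def parse_command(command):
    """Hypothetical helper: return (action, argument), using None for missing parts."""
    parts = command.split(maxsplit=1)
    if not parts:
        # Empty or whitespace-only input
        return None, None
    action = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return action, argument


print(parse_command("save document.txt"))  # ('save', 'document.txt')
print(parse_command("save my file.txt"))   # ('save', 'my file.txt')
print(parse_command("quit"))               # ('quit', None)
print(parse_command("   "))                # (None, None)
```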
# Processing multiple inputs
user_input = "5 10 15"
numbers = [int(x) for x in user_input.split()]
print(sum(numbers))  # Output: 30
4. Important Notes
⚠️ Common Pitfalls:
- Empty string:
text = ""
print(text.split())  # Output: [] (empty list)
print(text.split(','))  # Output: [''] (list with one element)
- Separator not found:
text = "hello world"
print(text.split(','))  # Output: ['hello world']
- Consecutive separators:
text = "a,,b,c"
print(text.split(','))  # Output: ['a', '', 'b', 'c']
- Return value is always a list:
text = "python"
result = text.split()
print(type(result))  # Output: <class 'list'>
print(result)  # Output: ['python']
5. Method Comparison
| Method | Purpose | Example | Result |
|---|---|---|---|
| split() | Split by whitespace | "a b c".split() | ['a', 'b', 'c'] |
| split(' ') | Split by single space | "a  b".split(' ') | ['a', '', 'b'] |
| rsplit() | Split from right | "a-b-c".rsplit('-', 1) | ['a-b', 'c'] |
| splitlines() | Split by line breaks | "a\nb".splitlines() | ['a', 'b'] |
| partition() | Split into 3 parts | "a-b-c".partition('-') | ('a', '-', 'b-c') |
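A few differences between the methods in the table only show up on edge cases. The sketch below uses made-up inputs to illustrate three of them: a trailing newline, splitting from the right, and a missing separator.

```python
# Trailing newline: split('\n') leaves an empty string, splitlines() does not
text = "a\nb\n"
by_split = text.split("\n")
by_lines = text.splitlines()
print(by_split)  # ['a', 'b', '']
print(by_lines)  # ['a', 'b']

# rsplit with maxsplit=1 separates only the last component
name_and_ext = "archive.tar.gz".rsplit(".", 1)
print(name_and_ext)  # ['archive.tar', 'gz']

# partition always returns a 3-tuple, even when the separator is missing,
# so unpacking into three names never fails
found = "host=localhost".partition("=")
missing = "debug".partition("=")
print(found)    # ('host', '=', 'localhost')
print(missing)  # ('debug', '', '')
```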
6. Practical Example: Parsing Configuration Files
config = """
host=localhost
port=8080
debug=true
"""

settings = {}
for line in config.strip().split('\n'):
    if '=' in line:
        # Split on the first '=' only, so values may contain '='
        key, value = line.split('=', 1)
        settings[key] = value

print(settings)
# Output: {'host': 'localhost', 'port': '8080', 'debug': 'true'}
7. Advanced Techniques
1) Using split() with List Comprehension
# Extract numbers from a mixed string
data = "age:25,score:95,weight:70"
values = [item.split(':')[1] for item in data.split(',')]
print(values)  # Output: ['25', '95', '70']

# Convert to appropriate types
numeric_values = [int(item.split(':')[1]) for item in data.split(',')]
print(numeric_values)  # Output: [25, 95, 70]
2) Handling Multiple Delimiters
import re
text = "apple;banana,orange|grape"
# Split on ; or , only (the | is left alone)
fruits = re.split(r'[;,]', text)  # ['apple', 'banana', 'orange|grape']
# Split on ; , or | (inside a character class, | needs no escape)
fruits = re.split(r'[;,|]', text)
print(fruits)  # Output: ['apple', 'banana', 'orange', 'grape']
3) Preserving Delimiters
# Using re.split() with a capturing group keeps the delimiters
text = "hello-world-python"
parts = re.split('(-)', text)
print(parts)  # Output: ['hello', '-', 'world', '-', 'python']
💡 One-line Takeaway
text.split() splits strings into word lists using whitespace; text.split(sep) splits by a specified delimiter; and the maxsplit parameter controls how many splits to perform.
https://lxy-alexander.github.io/blog/posts/python/python-textsplit/