Python text.split()

I. text.split() in Python#

split() is a Python string method (字符串方法) that divides a string into a list of substrings. It's one of the most commonly used tools for text processing.

1. Basic Usage#

text = "Python is awesome"
result = text.split()
print(result) # Output: ['Python', 'is', 'awesome']
By default, split() treats any run of whitespace characters (空白字符) as a single delimiter, including spaces, newlines \n, and tabs \t, and it ignores leading and trailing whitespace.
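To make the default behavior concrete, here is a minimal sketch using a made-up string (`messy` is only an illustrative name) that mixes tabs, newlines, and padding:

```python
# Illustrative input: leading/trailing spaces plus a tab and a newline
messy = "  Python\tis\n awesome  "

# With no arguments, runs of whitespace act as one separator
# and the padding at both ends is dropped
words = messy.split()
print(words)  # ['Python', 'is', 'awesome']
```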

2. Parameter Details#

1) split() vs split(' ') Difference#

text = "Python   is   awesome" # Multiple spaces (three between each word)
# Default split() - handles any amount of whitespace
print(text.split()) # Output: ['Python', 'is', 'awesome']
# split(' ') - strictly splits on single space
print(text.split(' ')) # Output: ['Python', '', '', 'is', '', '', 'awesome']
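The same difference shows up at the edges of a string. The sketch below uses an illustrative padded string to show that split(' ') also produces empty strings for leading and trailing spaces, and one simple way to drop them:

```python
padded = " a b "  # illustrative input with one space on each side

default_parts = padded.split()     # whitespace-aware split
space_parts = padded.split(' ')    # literal split on every single space

print(default_parts)  # ['a', 'b']
print(space_parts)    # ['', 'a', 'b', '']

# Filtering out empty strings recovers the same words
non_empty = [part for part in space_parts if part]
print(non_empty)      # ['a', 'b']
```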

2) Specifying Separator sep#

data = "apple,banana,orange"
print(data.split(',')) # Output: ['apple', 'banana', 'orange']
path = "user/local/bin" # Forward slash as the separator
print(path.split('/')) # Output: ['user', 'local', 'bin']
sentence = "Hello-World-Python"
print(sentence.split('-')) # Output: ['Hello', 'World', 'Python']
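Two details about sep are worth a quick sketch: the separator may be longer than one character, and an empty separator is rejected. The `record` string here is just an illustrative example:

```python
record = "id::name::email"  # illustrative input with a two-character separator

# A multi-character separator is matched as a whole
fields = record.split("::")
print(fields)  # ['id', 'name', 'email']

# An empty separator is not allowed and raises ValueError
error_message = None
try:
    record.split("")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If you need individual characters, list(record) is the usual alternative to an empty separator.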

3) Limiting Splits with maxsplit#

text = "one two three four five"
# Split at most 2 times, which yields at most 3 parts
print(text.split(maxsplit=2)) # Output: ['one', 'two', 'three four five']
# Same result here with an explicit separator and a positional maxsplit
print(text.split(' ', 2)) # Output: ['one', 'two', 'three four five']
# Practical example: parsing simple logs
log = "2024-01-15 10:30:45 ERROR Connection failed"
date, time, level, *message = log.split(maxsplit=3)
print(f"Date: {date}, Time: {time}, Level: {level}, Message: {message}")
# Output: Date: 2024-01-15, Time: 10:30:45, Level: ERROR, Message: ['Connection failed']
The *message syntax is used for Extended Unpacking (扩展解包).

  • date, time, level: These take the first 3 elements of the list respectively.
  • *message: This collects all remaining elements into a List (列表). It is equivalent to log.split(maxsplit=3)[3:]. Because maxsplit=3 stops after three splits, the rest of the line stays in one piece, so message is ['Connection failed'].
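As a rule of thumb, maxsplit=N produces at most N + 1 parts. The sketch below checks that on the log line and also shows rsplit(), which counts splits from the right; the filename is a hypothetical example:

```python
log = "2024-01-15 10:30:45 ERROR Connection failed"

# Three splits give four parts; the last part keeps its inner space
parts = log.split(maxsplit=3)
print(len(parts))  # 4
print(parts[3])    # Connection failed

# rsplit() applies maxsplit from the right, which is handy for suffixes
filename = "archive.backup.tar.gz"  # hypothetical filename
stem, extension = filename.rsplit(".", maxsplit=1)
print(stem)        # archive.backup.tar
print(extension)   # gz
```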

3. Common Use Cases#

1) Word Frequency Counting#

from collections import Counter
text = "Python is awesome. Python is powerful!"
words = text.lower().split() # Convert to lowercase then split
# Note: Punctuation remains! Output: ['python', 'is', 'awesome.', 'python', 'is', 'powerful!']
# Better approach: clean punctuation
import re
words = re.findall(r'\w+', text.lower())
print(Counter(words)) # Output: Counter({'python': 2, 'is': 2, 'awesome': 1, 'powerful': 1})
| Part | Meaning | Explanation |
|---|---|---|
| re | regex module | Python’s regular expression library (正则表达式库) |
| .findall() | find all matches | Returns all non-overlapping matches as a list |
| r'' | raw string | Raw string (原始字符串) - backslashes are treated literally |
| \w | word character | Matches letters, digits, and underscore (字母、数字、下划线) |
| + | one or more | Quantifier (量词) - match 1 or more occurrences |
| text.lower() | lowercase | Converts everything to lowercase (小写) for case-insensitive counting |
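If you would rather stay with split(), one alternative (a sketch, not the only way) is to strip punctuation from the ends of each word with string.punctuation. Unlike \w+, this keeps contractions such as "don't" in one piece:

```python
import re
import string
from collections import Counter

text = "Python is awesome. Python is powerful!"

# Split on whitespace, then trim punctuation from both ends of each word
words = [word.strip(string.punctuation) for word in text.lower().split()]
counts = Counter(words)
print(counts["python"])  # 2
print(counts["awesome"])  # 1

# The two approaches differ on apostrophes inside a word
print("don't".strip(string.punctuation))  # don't
print(re.findall(r"\w+", "don't"))        # ['don', 't']
```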

2) Parsing CSV (Simple Cases)#

csv_line = "John,25,Engineer,New York"
name, age, job, city = csv_line.split(',')
print(f"{name} is {age} years old, works as {job} in {city}")
# Output: John is 25 years old, works as Engineer in New York
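The "simple cases" caveat matters once a field contains the delimiter itself. The sketch below uses an illustrative line with a quoted field and compares a plain split(',') with the standard library's csv.reader:

```python
import csv

# Illustrative line: the last field is quoted and contains a comma
csv_line = 'John,25,Engineer,"New York, NY"'

# A plain split breaks the quoted field in two
naive = csv_line.split(',')
print(naive)  # ['John', '25', 'Engineer', '"New York', ' NY"']

# csv.reader understands quoting; it accepts any iterable of lines
row = next(csv.reader([csv_line]))
print(row)    # ['John', '25', 'Engineer', 'New York, NY']
```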

3) Handling User Input#

# Parsing commands
command = "save document.txt"
action, filename = command.split(maxsplit=1)
print(f"Action: {action}, File: {filename}") # Output: Action: save, File: document.txt
# Processing multiple inputs
user_input = "5 10 15"
numbers = [int(x) for x in user_input.split()]
print(sum(numbers)) # Output: 30
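Unpacking into two names fails with a ValueError when the user types a command without an argument. One defensive sketch, using a hypothetical helper called `parse_command`, checks the number of parts before unpacking:

```python
def parse_command(command):
    """Return (action, argument); missing pieces come back as ''.

    Hypothetical helper for illustration only.
    """
    parts = command.strip().split(maxsplit=1)
    action = parts[0] if parts else ""
    argument = parts[1] if len(parts) > 1 else ""
    return action, argument


print(parse_command("save document.txt"))    # ('save', 'document.txt')
print(parse_command("quit"))                 # ('quit', '')
print(parse_command("  save   my file.txt "))  # ('save', 'my file.txt')
```

The strip() call comes first because, once maxsplit is reached, split() leaves trailing whitespace in the last part.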

4. Important Notes#

⚠️ Common Pitfalls:
  1. Empty string:
text = ""
print(text.split()) # Output: [] (empty list)
print(text.split(',')) # Output: [''] (list with one element)
  2. Separator not found:
text = "hello world"
print(text.split(',')) # Output: ['hello world']
  3. Consecutive separators:
text = "a,,b,c"
print(text.split(',')) # Output: ['a', '', 'b', 'c']
  4. Return value is always a list:
text = "python"
result = text.split()
print(type(result)) # Output: <class 'list'>
print(result) # Output: ['python']
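A common way to deal with the empty strings from these pitfalls is to strip each piece and then filter out the blanks. A minimal sketch with an illustrative input:

```python
raw = " a, ,b,, c "  # illustrative input with stray spaces and empty fields

# Strip whitespace around every piece first
items = [piece.strip() for piece in raw.split(",")]
print(items)    # ['a', '', 'b', '', 'c']

# Then keep only the non-empty pieces
cleaned = [item for item in items if item]
print(cleaned)  # ['a', 'b', 'c']
```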

5. Method Comparison#

| Method | Purpose | Example | Result |
|---|---|---|---|
| split() | Split by whitespace | "a b c".split() | ['a', 'b', 'c'] |
| split(' ') | Split by single space | "a  b".split(' ') | ['a', '', 'b'] |
| rsplit() | Split from right | "a-b-c".rsplit('-',1) | ['a-b', 'c'] |
| splitlines() | Split by line breaks | "a\nb".splitlines() | ['a', 'b'] |
| partition() | Split into 3 parts | "a-b-c".partition('-') | ('a', '-', 'b-c') |
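A few edge cases separate these methods in practice. The sketch below, with illustrative inputs, compares them on a trailing newline, a limited split from each side, and a missing separator:

```python
text = "a\nb\n"  # illustrative input ending with a newline

# split('\n') leaves an empty string after the final newline
print(text.split("\n"))        # ['a', 'b', '']
# splitlines() does not, and it also understands \r\n
print(text.splitlines())       # ['a', 'b']
print("a\r\nb".splitlines())   # ['a', 'b']

# split() and rsplit() differ only when maxsplit limits the splits
print("a-b-c".split("-", 1))   # ['a', 'b-c']
print("a-b-c".rsplit("-", 1))  # ['a-b', 'c']

# partition() always returns a 3-tuple, even if the separator is missing
print("abc".partition("-"))    # ('abc', '', '')
```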

6. Practical Example: Parsing Configuration Files#

config = """
host=localhost
port=8080
debug=true
"""
settings = {}
for line in config.strip().split('\n'):
    if '=' in line:
        key, value = line.split('=', 1)
        settings[key] = value
print(settings)
# Output: {'host': 'localhost', 'port': '8080', 'debug': 'true'}
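Real configuration files often have blank lines, comments, and spaces around the equals sign. The sketch below is one possible extension; the `parse_config` name and the rule that lines starting with # are comments are assumptions, not part of the example above:

```python
def parse_config(text):
    """Parse key=value lines into a dict (hypothetical helper).

    Assumes that lines starting with '#' are comments.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, so ignore it
        settings[key.strip()] = value.strip()
    return settings


sample = """
# server settings
host = localhost
port=8080
url=http://example.com/?a=1
"""
print(parse_config(sample))
# {'host': 'localhost', 'port': '8080', 'url': 'http://example.com/?a=1'}
```

partition('=') splits only at the first equals sign, so values that contain = survive intact, just like split('=', 1).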

7. Advanced Techniques#

1) Using split() with List Comprehension#

# Extract numbers from mixed string
data = "age:25,score:95,weight:70"
values = [item.split(':')[1] for item in data.split(',')]
print(values) # Output: ['25', '95', '70']
# Convert to appropriate types
numeric_values = [int(item.split(':')[1]) for item in data.split(',')]
print(numeric_values) # Output: [25, 95, 70]
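If the keys matter as well as the values, the same two-level split can feed a dict comprehension. A minimal sketch on the same data string:

```python
data = "age:25,score:95,weight:70"

# Split into items on ',', then split each item once on ':'
pairs = (item.split(":", 1) for item in data.split(","))
record = {key: int(value) for key, value in pairs}
print(record)  # {'age': 25, 'score': 95, 'weight': 70}
```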

2) Handling Multiple Delimiters#

import re
text = "apple;banana,orange|grape"
# Split on ; , or |
# Inside a character class, | is literal and needs no escape
fruits = re.split(r'[;,|]', text)
print(fruits) # Output: ['apple', 'banana', 'orange', 'grape']
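Delimiters in real input are often surrounded by spaces. One way to handle that, sketched here with illustrative strings, is to let the pattern absorb optional whitespace with \s*:

```python
import re

text = "apple ; banana, orange |grape"  # illustrative input with uneven spacing

# \s* on both sides swallows any whitespace around the delimiter
fruits = re.split(r"\s*[;,|]\s*", text)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']

# A character class with + treats mixed commas and spaces as one separator
tokens = re.split(r"[,\s]+", "a, b  c")
print(tokens)  # ['a', 'b', 'c']
```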

3) Preserving Delimiters#

import re
# Using re.split() with a capturing group keeps the delimiters
text = "hello-world-python"
parts = re.split(r'(-)', text)
print(parts) # Output: ['hello', '-', 'world', '-', 'python']
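Because the delimiters are kept, the result alternates between tokens and delimiters, and joining it rebuilds the original string. A short sketch, with an illustrative arithmetic expression as a second input:

```python
import re

text = "hello-world-python"
parts = re.split(r"(-)", text)

# Even indexes hold the tokens, odd indexes hold the delimiters
tokens = parts[0::2]
delimiters = parts[1::2]
print(tokens)      # ['hello', 'world', 'python']
print(delimiters)  # ['-', '-']

# Nothing is lost, so joining restores the input
print("".join(parts) == text)  # True

# The same idea works with several delimiters (illustrative expression)
print(re.split(r"([+*])", "2+3*4"))  # ['2', '+', '3', '*', '4']
```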
💡 One-line Takeaway
text.split() splits strings into word lists using whitespace; text.split(sep) splits by a specified delimiter; and the maxsplit parameter controls how many splits to perform.
Source: https://lxy-alexander.github.io/blog/posts/python/python-textsplit/
Author: Alexander Lee
Published at: 2026-03-09
License: CC BY-NC-SA 4.0