I. `text.split()` in Python

`split()` is a Python string method that divides a string into a list of substrings. It is one of the most commonly used tools for text processing.

1. Basic Usage

text = "Python is awesome"
result = text.split()
print(result)  # Output: ['Python', 'is', 'awesome']

By default, `split()` treats any run of consecutive whitespace characters (spaces, newlines `\n`, tabs `\t`, and so on) as a single delimiter, and it ignores leading and trailing whitespace.
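As a quick illustration of the default behavior, the sketch below splits a string that mixes several kinds of whitespace. The sample string `messy` is made up for this example.

```python
# Mixed whitespace: leading spaces, a tab, a newline, and trailing spaces
messy = "  Python\tis\n awesome  "

default_parts = messy.split()
print(default_parts)  # ['Python', 'is', 'awesome']

# A string that contains only whitespace yields an empty list
only_whitespace = "   \n\t ".split()
print(only_whitespace)  # []
```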

2. Parameter Details

1) `split()` vs `split(' ')` Difference

text = "Python   is   awesome"  # Multiple spaces

# Default split() - handles any amount of whitespace
print(text.split())   # Output: ['Python', 'is', 'awesome']

# split(' ') - strictly splits on single space
print(text.split(' '))  # Output: ['Python', '', '', 'is', '', '', 'awesome']
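The same difference shows up at the edges of a string and with tabs. The following sketch uses two invented sample strings, `padded` and `tabbed`, to show what `split(' ')` leaves behind.

```python
padded = "  a b  "   # two leading and two trailing spaces
tabbed = "a\tb c"    # a tab between a and b, a space before c

# split() trims the edges; split(' ') reports every gap as an empty string
print(padded.split())     # ['a', 'b']
print(padded.split(' '))  # ['', '', 'a', 'b', '', '']

# split(' ') only knows about the space character, so the tab survives
print(tabbed.split())     # ['a', 'b', 'c']
print(tabbed.split(' '))  # ['a\tb', 'c']
```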

2) Specifying Separator `sep`

data = "apple,banana,orange"
print(data.split(','))  # Output: ['apple', 'banana', 'orange']

path = "user/local/bin"  # Components separated by forward slashes
print(path.split('/'))  # Output: ['user', 'local', 'bin']

sentence = "Hello-World-Python"
print(sentence.split('-'))  # Output: ['Hello', 'World', 'Python']
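A separator does not have to be a single character. The sketch below, using made-up sample strings, shows a two-character separator and what happens with an empty one.

```python
# A separator can be longer than one character
date_parts = "2024::01::15".split('::')
print(date_parts)  # ['2024', '01', '15']

# The whole separator must match; a lone ':' is not a split point
mixed = "a::b:c".split('::')
print(mixed)  # ['a', 'b:c']

# An empty separator is rejected with a ValueError
try:
    "abc".split('')
    empty_sep_rejected = False
except ValueError:
    empty_sep_rejected = True

# To get individual characters, convert the string to a list instead
chars = list("abc")
print(chars)  # ['a', 'b', 'c']
```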

3) Limiting Splits with `maxsplit`

text = "one two three four five"

# Split only the first 2 times
print(text.split(maxsplit=2))  # Output: ['one', 'two', 'three four five']

# Same result here, because the words are separated by single spaces
print(text.split(' ', 2))  # Output: ['one', 'two', 'three four five']

# Practical example: parsing simple logs
log = "2024-01-15 10:30:45 ERROR Connection failed"
date, time, level, *message = log.split(maxsplit=3)
print(f"Date: {date}, Time: {time}, Level: {level}, Message: {message}")
# Output: Date: 2024-01-15, Time: 10:30:45, Level: ERROR, Message: ['Connection failed']

The *message syntax in the log example is Extended Unpacking.

date, time, level: These take the first 3 elements of the list respectively.
*message: This collects all remaining elements into a list, equivalent to message = parts[3:]. Because maxsplit=3 stops after three splits, the rest of the line stays together as a single element: ['Connection failed'].
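To make the interaction between `maxsplit` and the starred name concrete, here is a small sketch that parses the same log line two ways. The names `date2`, `time2`, `level2`, and `words` are introduced only for this illustration.

```python
log = "2024-01-15 10:30:45 ERROR Connection failed"

# With maxsplit=3 the split yields exactly four items, so no star is needed
# and message is a plain string
date, time, level, message = log.split(maxsplit=3)
print(message)  # Connection failed

# Without maxsplit, the starred name collects every remaining word
date2, time2, level2, *words = log.split()
print(words)            # ['Connection', 'failed']
print(" ".join(words))  # Connection failed
```

Note that the first form raises a ValueError if a line has fewer than four fields, while the starred form tolerates a missing message and leaves `words` empty.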

3. Common Use Cases

1) Word Frequency Counting

import re
from collections import Counter

text = "Python is awesome. Python is powerful!"
words = text.lower().split()  # Convert to lowercase, then split
# Note: Punctuation remains! Output: ['python', 'is', 'awesome.', 'python', 'is', 'powerful!']

# Better approach: extract word characters only, dropping punctuation
words = re.findall(r'\w+', text.lower())
print(Counter(words))  # Output: Counter({'python': 2, 'is': 2, 'awesome': 1, 'powerful': 1})

| Part | Meaning | Explanation |
| --- | --- | --- |
| `re` | regex module | Python's regular expression library |
| `.findall()` | find all matches | Returns all non-overlapping matches as a list |
| `r''` | raw string | Backslashes are treated literally |
| `\w` | word character | Matches letters, digits, and underscore |
| `+` | one or more | Quantifier: match 1 or more occurrences |
| `text.lower()` | lowercase | Converts everything to lowercase for case-insensitive counting |
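One consequence of `\w` not matching punctuation is that contractions get cut in two. The sketch below shows this on an invented sentence and tries one possible alternative pattern; the alternative is an illustration, not the only way to tokenize text.

```python
import re
from collections import Counter

sentence = "Don't stop. Don't panic!"

# \w+ does not match the apostrophe, so each contraction breaks in two
naive = re.findall(r"\w+", sentence.lower())
print(naive)  # ['don', 't', 'stop', 'don', 't', 'panic']

# Illustrative alternative: allow an apostrophe between word characters
tokens = re.findall(r"\w+(?:'\w+)*", sentence.lower())
counts = Counter(tokens)
print(counts["don't"])  # 2
```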

2) Parsing CSV (Simple Cases)

csv_line = "John,25,Engineer,New York"
name, age, job, city = csv_line.split(',')
print(f"{name} is {age} years old, works as {job} in {city}")
# Output: John is 25 years old, works as Engineer in New York
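The "simple cases" caveat matters as soon as a field contains a comma inside quotes. The sketch below contrasts a plain `split(',')` with the standard library's `csv.reader` on a made-up line where the city field is quoted.

```python
import csv

line = 'John,25,Engineer,"New York, NY"'

# A plain split cuts the quoted city into two fields
naive_fields = line.split(',')
print(naive_fields)  # ['John', '25', 'Engineer', '"New York', ' NY"']

# csv.reader accepts any iterable of lines and respects the quotes
row = next(csv.reader([line]))
print(row)  # ['John', '25', 'Engineer', 'New York, NY']
```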

3) Handling User Input

# Parsing commands
command = "save document.txt"
action, filename = command.split(maxsplit=1)
print(f"Action: {action}, File: {filename}")  # Output: Action: save, File: document.txt

# Processing multiple inputs
user_input = "5 10 15"
numbers = [int(x) for x in user_input.split()]
print(sum(numbers))  # Output: 30
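Real user input is often incomplete or malformed: unpacking `"quit".split(maxsplit=1)` into two names raises a ValueError, and `int("abc")` does too. The sketch below shows one defensive approach. The helpers `parse_command` and `parse_numbers` are hypothetical names invented for this example, and silently skipping bad tokens is an assumption about the desired behavior.

```python
def parse_command(command):
    """Hypothetical helper: return (action, argument); argument may be ''."""
    # partition always returns three parts, even when no space is present
    action, _, argument = command.strip().partition(' ')
    return action, argument.strip()


def parse_numbers(user_input):
    """Hypothetical helper: keep the tokens that parse as integers."""
    numbers = []
    for token in user_input.split():
        try:
            numbers.append(int(token))
        except ValueError:
            continue  # assumption: ignore tokens that are not integers
    return numbers


print(parse_command("save document.txt"))  # ('save', 'document.txt')
print(parse_command("quit"))               # ('quit', '')
print(parse_numbers("5 ten 15"))           # [5, 15]
```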

4. Important Notes

⚠️ Common Pitfalls:

Empty string:

text = ""
print(text.split())  # Output: [] (empty list)
print(text.split(','))  # Output: [''] (list with one element)

Separator not found:

text = "hello world"
print(text.split(','))  # Output: ['hello world']

Consecutive separators:

text = "a,,b,c"
print(text.split(','))  # Output: ['a', '', 'b', 'c']
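When the empty strings produced by consecutive separators are not wanted, one common approach is to filter them out after splitting. A minimal sketch, with a second made-up string `spaced` to show stripping as well:

```python
text = "a,,b,c"

# Drop the empty fields produced by adjacent separators
non_empty = [field for field in text.split(',') if field]
print(non_empty)  # ['a', 'b', 'c']

# Strip each field first so whitespace-only fields are dropped too
spaced = " a , , b ,c "
cleaned = [field.strip() for field in spaced.split(',') if field.strip()]
print(cleaned)  # ['a', 'b', 'c']
```

Filtering is only appropriate when an empty field carries no meaning; in positional data such as CSV columns, the empty string usually needs to stay.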

Return value is always a list:

text = "python"
result = text.split()
print(type(result))  # Output: <class 'list'>
print(result)  # Output: ['python']

5. Method Comparison

| Method | Purpose | Example | Result |
| --- | --- | --- | --- |
| `split()` | Split by whitespace | `"a b c".split()` | `['a', 'b', 'c']` |
| `split(' ')` | Split by single space | `"a  b".split(' ')` (two spaces between a and b) | `['a', '', 'b']` |
| `rsplit()` | Split from right | `"a-b-c".rsplit('-', 1)` | `['a-b', 'c']` |
| `splitlines()` | Split by line breaks | `"a\nb".splitlines()` | `['a', 'b']` |
| `partition()` | Split into 3 parts | `"a-b-c".partition('-')` | `('a', '-', 'b-c')` |
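The methods in the table fit different situations. The sketch below shows one typical use for each of `rsplit()`, `partition()`, and `splitlines()`; all sample strings are invented for illustration.

```python
# rsplit with maxsplit=1: separate the last extension from a file name
stem, extension = "archive.tar.gz".rsplit('.', 1)
print(stem, extension)  # archive.tar gz

# partition always returns a 3-tuple, even when the separator is missing
found = "key=value=more".partition('=')
missing = "novalue".partition('=')
print(found)    # ('key', '=', 'value=more')
print(missing)  # ('novalue', '', '')

# splitlines understands \r\n; split('\n') leaves the \r behind
windows_text = "a\r\nb\nc"
print(windows_text.splitlines())  # ['a', 'b', 'c']
print(windows_text.split('\n'))   # ['a\r', 'b', 'c']
```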

6. Practical Example: Parsing Configuration Files

config = """
host=localhost
port=8080
debug=true
"""

settings = {}
for line in config.strip().split('\n'):
    if '=' in line:
        key, value = line.split('=', 1)
        settings[key] = value

print(settings)
# Output: {'host': 'localhost', 'port': '8080', 'debug': 'true'}
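Configuration files in practice often contain blank lines, comments, and spaces around the `=`. The sketch below extends the same idea under two assumptions that are not part of the example above: lines starting with `#` are comments, and surrounding whitespace is not significant.

```python
config = """
# server settings (assumed comment syntax)
host = localhost
port = 8080

debug = true
"""

settings = {}
for line in config.splitlines():
    line = line.strip()
    if not line or line.startswith('#'):
        continue  # skip blank lines and comment lines
    key, sep, value = line.partition('=')
    if not sep:
        continue  # no '=' on this line, so nothing to store
    settings[key.strip()] = value.strip()

print(settings)
# {'host': 'localhost', 'port': '8080', 'debug': 'true'}
```

All values are still strings here; converting `'8080'` to an integer or `'true'` to a boolean is a separate step. For real INI-style files, the standard library's `configparser` module is usually a better fit.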

7. Advanced Techniques

1) Using `split()` with List Comprehension

# Extract numbers from mixed string
data = "age:25,score:95,weight:70"
values = [item.split(':')[1] for item in data.split(',')]
print(values)  # Output: ['25', '95', '70']

# Convert to appropriate types
numeric_values = [int(item.split(':')[1]) for item in data.split(',')]
print(numeric_values)  # Output: [25, 95, 70]
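The same two-level split can keep the keys as well as the values. As a sketch, a dict comprehension over the split pairs builds a lookup table; the name `record` is introduced only for this example.

```python
data = "age:25,score:95,weight:70"

# Split into "key:value" items, then split each item once on ':'
pairs = (item.split(':', 1) for item in data.split(','))
record = {key: int(value) for key, value in pairs}

print(record)           # {'age': 25, 'score': 95, 'weight': 70}
print(record['score'])  # 95
```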

2) Handling Multiple Delimiters

import re

text = "apple;banana,orange|grape"

# Split on ; or , only: the | is left untouched
print(re.split(r'[;,]', text))  # Output: ['apple', 'banana', 'orange|grape']

# Split on ; , or | (inside [...] the | needs no escape)
fruits = re.split(r'[;,|]', text)
print(fruits)  # Output: ['apple', 'banana', 'orange', 'grape']
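Delimiters in hand-written data are often surrounded by stray spaces. One way to absorb them, sketched below on a made-up string, is to include optional whitespace in the pattern itself.

```python
import re

text = " apple ; banana,  orange |grape "

# \s* on both sides swallows any whitespace around each delimiter;
# strip() removes the whitespace at the two ends of the whole string
fruits = re.split(r'\s*[;,|]\s*', text.strip())
print(fruits)  # ['apple', 'banana', 'orange', 'grape']
```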

3) Preserving Delimiters

import re

# Using re.split() with a capturing group keeps the delimiters
text = "hello-world-python"
parts = re.split(r'(-)', text)
print(parts)  # Output: ['hello', '-', 'world', '-', 'python']
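Because the captured delimiters alternate with the fields, slicing can separate them again, and joining the parts restores the original string. A small sketch with an invented timestamp string:

```python
import re

stamp = "2024-01-15T10:30"
parts = re.split(r'([-T:])', stamp)
print(parts)  # ['2024', '-', '01', '-', '15', 'T', '10', ':', '30']

fields = parts[::2]       # even positions hold the fields
delimiters = parts[1::2]  # odd positions hold the captured delimiters
print(fields)      # ['2024', '01', '15', '10', '30']
print(delimiters)  # ['-', '-', 'T', ':']

# Nothing is lost: joining the parts gives back the input
print("".join(parts) == stamp)  # True
```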

💡 One-line Takeaway
text.split() splits strings into word lists using whitespace; text.split(sep) splits by a specified delimiter; and the maxsplit parameter controls how many splits to perform.
