NumPy Statistical Analysis

IV. NumPy Statistical Analysis (统计分析)

NumPy's statistical functions let you summarize arrays with one line of code. Most functions accept an axis (轴) argument — without it, they operate on all elements; with it, they reduce along the specified dimension.

1. sum() / mean() — Total & Average (总和与均值)

import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6]])
np.sum(a) # 21 — sum of ALL elements
np.sum(a, axis=0) # [5 7 9] — column sums (按列求和)
np.sum(a, axis=1) # [6 15] — row sums (按行求和)
np.mean(a) # 3.5
np.mean(a, axis=0) # [2.5 3.5 4.5]
Note: axis=0 collapses rows (operates down each column); axis=1 collapses columns (operates across each row).
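
One way to keep the axis convention straight is to watch the shape of the result: the axis you name is the one that disappears. The sketch below reuses the 2×3 array from this section. The `keepdims=True` argument is not covered in this post; it is shown here only as an assumption-free NumPy option that keeps the collapsed axis with length 1, and the name `centered` is purely illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 rows, 3 columns

col_sums = np.sum(a, axis=0)       # axis 0 (the row axis) is collapsed
row_sums = np.sum(a, axis=1)       # axis 1 (the column axis) is collapsed

print(a.shape)                     # (2, 3)
print(col_sums.shape)              # (3,)
print(row_sums.shape)              # (2,)

# keepdims=True keeps the collapsed axis with length 1,
# so the result can broadcast against the original array.
row_means = np.mean(a, axis=1, keepdims=True)
print(row_means.shape)             # (2, 1)

centered = a - row_means           # subtract each row's mean from that row
print(centered)
```

Without `keepdims=True`, `row_means` would have shape `(2,)` and the subtraction against a `(2, 3)` array would fail to broadcast.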

2. min() / max() — Extreme Values (极值)

np.min(a) # 1
np.max(a) # 6
np.min(a, axis=1) # [1 4] — min of each row
np.max(a, axis=0) # [4 5 6] — max of each column
np.ptp(a) # 5 — peak-to-peak = max - min (极差)
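
Like the other reductions, `np.ptp` also accepts `axis`. As a small sanity check (a sketch using the same 2×3 array, with illustrative variable names), the per-row range should match `max - min` computed along the same axis:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

row_range = np.ptp(a, axis=1)                    # range within each row
manual = np.max(a, axis=1) - np.min(a, axis=1)   # the same thing, spelled out

print(row_range)   # [2 2]
print(manual)      # [2 2]
```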

3. argmin() / argmax() — Index of Extreme Values (极值索引)

Core idea: Returns the index (索引) of the minimum or maximum element, not the value itself.

b = np.array([3, 1, 4, 1, 5, 9, 2])
np.argmin(b) # 1 (index of first minimum value 1)
np.argmax(b) # 5 (index of maximum value 9)
# Along an axis (a is the 2x3 array from section 1)
np.argmax(a, axis=0) # [1 1 1] → row index of max in each column
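
One detail worth knowing: when `axis` is omitted on a multi-dimensional array, `argmax` and `argmin` return an index into the flattened array, not a (row, column) pair. The sketch below uses a hypothetical array named `grid` and `np.unravel_index` (a NumPy function not covered in this post) to convert that flat index back into coordinates.

```python
import numpy as np

# Hypothetical example array; the maximum (9) sits at row 0, column 1.
grid = np.array([[1, 9, 3],
                 [4, 5, 6]])

flat_idx = np.argmax(grid)                        # index into the flattened array
row, col = np.unravel_index(flat_idx, grid.shape) # convert to (row, column)

print(flat_idx)          # 1
print(row, col)          # 0 1
print(grid[row, col])    # 9
```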

4. std() / var() — Spread Measures (离散程度)

Core idea: Measure how spread out the data is. Standard deviation (标准差) is the square root of the variance (方差): $\text{std} = \sqrt{\text{var}}$.

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.std(a) # 2.0 — population std (总体标准差)
np.var(a) # 4.0 — population variance (总体方差)
# Sample std/var (样本标准差/方差): use ddof=1
np.std(a, ddof=1) # 2.138...
np.var(a, ddof=1) # 4.571...
Note: Default ddof=0 gives population statistics. Use ddof=1 for sample statistics (common in data analysis).
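
To make the role of `ddof` concrete, the following sketch recomputes both variances by hand for the same array. The divisor NumPy uses is `N - ddof`, so `ddof=0` divides by `N` and `ddof=1` divides by `N - 1`. The variable names are illustrative only.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])
n = a.size                                # 8 elements

squared_dev = (a - a.mean()) ** 2         # squared distance from the mean (5.0)
total = squared_dev.sum()                 # 32.0

pop_var = total / n                       # divide by N      (ddof=0)
sample_var = total / (n - 1)              # divide by N - 1  (ddof=1)

print(pop_var, np.var(a))                 # both 4.0
print(sample_var, np.var(a, ddof=1))      # both about 4.571
print(np.sqrt(pop_var), np.std(a))        # both 2.0
```

The gap between the two estimates shrinks as `N` grows, which is why the choice matters most for small samples.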

5. cumsum() / cumprod() — Cumulative Functions (累积函数)

Core idea: Returns running totals: each output element is the sum (or product) of all elements up to and including that position.

a = np.array([1, 2, 3, 4])
np.cumsum(a) # [1 3 6 10] — running sum (累积和)
np.cumprod(a) # [1 2 6 24] — running product (累积积)
# 2-D with axis
m = np.array([[1,2],[3,4]])
np.cumsum(m, axis=0) # [[1,2],[4,6]] — cumulative down columns
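
Two properties follow directly from the definition and can help when checking results: the last element of a cumulative sum equals the plain sum, and differencing a cumulative sum recovers the original values. The sketch below illustrates both, using `np.diff` with its `prepend` argument (not covered in this post) and the same arrays as above.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
running = np.cumsum(a)                    # [ 1  3  6 10]

# The final running total is the overall sum.
print(running[-1] == np.sum(a))           # True

# Differencing undoes the accumulation; prepend=0 restores the first element.
recovered = np.diff(running, prepend=0)
print(recovered)                          # [1 2 3 4]

# For a 2-D array, axis=1 accumulates across each row instead of down columns.
m = np.array([[1, 2],
              [3, 4]])
print(np.cumsum(m, axis=1))               # [[1 3]
                                          #  [3 7]]
```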

6. median() / percentile() — Percentile Stats (百分位数)

a = np.array([1, 2, 3, 4, 5])
np.median(a) # 3.0 — middle value (中位数)
np.percentile(a, 25) # 2.0 — 25th percentile (四分位数)
np.percentile(a, [25, 50, 75]) # [2. 3. 4.]
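
The five-element example happens to land exactly on data points. When the requested percentile falls between two values, `np.percentile` interpolates linearly by default. The sketch below uses a hypothetical four-element array `b` to show this, and also checks that the median matches the 50th percentile.

```python
import numpy as np

b = np.array([1, 2, 3, 4])                # hypothetical even-length example

# 25th percentile: position 0.25 * (4 - 1) = 0.75 between index 0 and index 1,
# so the result is 1 + 0.75 * (2 - 1) = 1.75.
q25 = np.percentile(b, 25)
print(q25)                                # 1.75

# With an even number of elements the median is the mean of the two middle
# values, and it agrees with the 50th percentile.
print(np.median(b))                       # 2.5
print(np.percentile(b, 50))               # 2.5
```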

7. Quick Comparison Table

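| Function (函数) | Returns | axis support? |
| --- | --- | --- |
| sum() | Total of elements | Yes |
| mean() | Average value | Yes |
| min() / max() | Smallest / largest value | Yes |
| argmin() / argmax() | Index of min / max | Yes |
| std() | Standard deviation | Yes |
| var() | Variance | Yes |
| cumsum() | Running sum array | Yes |
| median() | Middle value | Yes |
| percentile(a, q) | q-th percentile | Yes |
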
💡 One-line Takeaway
Always specify axis for multi-dimensional arrays, and use ddof=1 when computing sample (not population) statistics.
https://lxy-alexander.github.io/blog/posts/numpy/api/04numpy-statistical-analysis/
Author: Alexander Lee
Published at: 2026-03-12
License: CC BY-NC-SA 4.0