IV. NumPy Statistical Analysis (统计分析)

NumPy's statistical functions summarize an array in one line of code. Most of them accept an axis (轴) argument: without it, they operate on all elements of the array; with it, they reduce along the specified dimension.

1. `sum()` / `mean()`: Total & Average (总和与均值)

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

np.sum(a)           # 21: sum of ALL elements
np.sum(a, axis=0)   # [5 7 9]: column sums (按列求和)
np.sum(a, axis=1)   # [ 6 15]: row sums (按行求和)

np.mean(a)          # 3.5
np.mean(a, axis=0)  # [2.5 3.5 4.5]

Note: axis=0 collapses rows (operates down each column); axis=1 collapses columns (operates across each row).
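
One way to keep the axis rule straight is to watch the shapes: the axis you name is the one that disappears from the result. The sketch below reuses the array from this section; the `keepdims=True` argument is not covered above and is shown only as an optional extra that keeps the reduced axis with length 1.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 rows, 3 columns

col_sums = np.sum(a, axis=0)       # axis 0 (length 2) is removed -> shape (3,)
row_sums = np.sum(a, axis=1)       # axis 1 (length 3) is removed -> shape (2,)
print(a.shape, col_sums.shape, row_sums.shape)   # (2, 3) (3,) (2,)

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts against the original array without reshaping.
row_means = np.mean(a, axis=1, keepdims=True)    # shape (2, 1)
centered = a - row_means                         # each row now has mean 0
print(centered)
```

Subtracting `row_means` works here only because its shape is (2, 1); without `keepdims=True` the shape would be (2,), which does not broadcast against (2, 3).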

2. `min()` / `max()`: Extreme Values (极值)

np.min(a)           # 1
np.max(a)           # 6
np.min(a, axis=1)   # [1 4]: min of each row
np.max(a, axis=0)   # [4 5 6]: max of each column

np.ptp(a)           # 5: peak-to-peak = max - min (极差)
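
As a small check of the peak-to-peak definition, the sketch below compares `np.ptp` with an explicit max minus min along an axis. It redefines the same array so that it runs on its own.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_range = np.ptp(a, axis=0)                     # range of each column
manual = np.max(a, axis=0) - np.min(a, axis=0)    # the same thing by hand

print(col_range)                          # [3 3 3]
print(np.array_equal(col_range, manual))  # True
print(np.ptp(a, axis=1))                  # [2 2]: range of each row
```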

3. `argmin()` / `argmax()`: Index of Extreme Values (极值索引)

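Core idea: Return the index (索引) of the minimum or maximum element, not the value itself. If the extreme value occurs more than once, the index of its first occurrence is returned. Without an axis argument, the index refers to the flattened array.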

b = np.array([3, 1, 4, 1, 5, 9, 2])

np.argmin(b)   # 1  (index of the first occurrence of the minimum value 1)
np.argmax(b)   # 5  (index of the maximum value 9)

# Along an axis (a is the 2-D array defined in section 1)
np.argmax(a, axis=0)  # [1 1 1] → row index of the max in each column
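
When `argmax` is called on a 2-D array without an axis, the result is a position in the flattened array. A minimal sketch of turning that flat position back into a (row, column) pair, using `np.unravel_index` (a NumPy function not covered in the text above):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

flat_idx = np.argmax(a)                          # position in the flattened array
row, col = np.unravel_index(flat_idx, a.shape)   # convert to 2-D coordinates

print(int(flat_idx), (int(row), int(col)))       # 5 (1, 2)
print(a[row, col])                               # 6
```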

4. `std()` / `var()`: Spread Measures (离散程度)

Core idea: Measure how spread out the data is. Standard deviation (标准差) = $\sqrt{\text{variance (方差)}}$

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])

np.std(a)    # 2.0: population std (总体标准差)
np.var(a)    # 4.0: population variance (总体方差)

# Sample std/var (样本标准差/方差): use ddof=1
np.std(a, ddof=1)   # 2.138...
np.var(a, ddof=1)   # 4.571...

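Note: The divisor is N - ddof. The default ddof=0 divides by N and gives population statistics; ddof=1 divides by N - 1 and gives sample statistics, the usual choice when the data is a sample drawn from a larger population.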
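
To see what ddof changes, the sketch below recomputes both variances by hand from the squared deviations and compares them with NumPy's results. The variable names are illustrative only.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])
n = a.size

squared_dev = (a - a.mean()) ** 2         # squared distance from the mean
pop_var = squared_dev.sum() / n           # divisor N     (ddof=0)
samp_var = squared_dev.sum() / (n - 1)    # divisor N - 1 (ddof=1)

print(pop_var, np.var(a))                                 # 4.0 4.0
print(np.isclose(samp_var, np.var(a, ddof=1)))            # True
print(np.isclose(np.sqrt(samp_var), np.std(a, ddof=1)))   # True
```

The two divisors give noticeably different answers for small N and converge as N grows.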

5. `cumsum()` / `cumprod()`: Cumulative Functions (累积函数)

Core idea: Return running totals. Each output element is the sum (or product) of all elements up to and including that position, so the output has the same shape as the input.

a = np.array([1, 2, 3, 4])

np.cumsum(a)    # [ 1  3  6 10]: running sum (累积和)
np.cumprod(a)   # [ 1  2  6 24]: running product (累积积)

# 2-D with axis
m = np.array([[1, 2], [3, 4]])
np.cumsum(m, axis=0)  # [[1 2] [4 6]]: cumulative down columns
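
The running-total idea can be spelled out with an explicit loop. This is only an illustration of what `cumsum` computes, not how NumPy implements it.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

running = []
total = 0
for x in a:
    total += x                  # add the current element to the running total
    running.append(int(total))  # record the total so far

print(running)                        # [1, 3, 6, 10]
print(np.cumsum(a).tolist())          # [1, 3, 6, 10]
print(np.cumsum(a)[-1] == np.sum(a))  # True: the last running total is the full sum
```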

6. `median()` / `percentile()`: Percentile Stats (百分位数)

a = np.array([1, 2, 3, 4, 5])

np.median(a)                    # 3.0: middle value (中位数)
np.percentile(a, 25)            # 2.0: 25th percentile (四分位数)
np.percentile(a, [25, 50, 75])  # [2. 3. 4.]
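
The example above lands exactly on data points. When a percentile falls between two values, `np.percentile` interpolates linearly by default. A small sketch with an even-length array (the array `b` and the hand calculation are illustrative additions):

```python
import numpy as np

b = np.array([1, 2, 3, 4])   # even length: no single middle element

print(np.median(b))          # 2.5: average of the two middle values
print(np.percentile(b, 25))  # 1.75: interpolated between 1 and 2

# Reproduce the 25th percentile by hand with linear interpolation.
pos = (b.size - 1) * 25 / 100    # fractional index 0.75 into the sorted data
lo = int(np.floor(pos))          # index of the value just below that position
manual = b[lo] + (pos - lo) * (b[lo + 1] - b[lo])
print(manual)                    # 1.75
```

The hand calculation assumes the input is already sorted; `np.percentile` sorts internally, so it does not need that.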

7. Quick Comparison Table

| Function (函数) | Returns | axis support? |
| --- | --- | --- |
| `sum()` | Total of elements | ✅ |
| `mean()` | Average value | ✅ |
| `min()` / `max()` | Smallest / largest value | ✅ |
| `argmin()` / `argmax()` | Index of min / max | ✅ |
| `std()` | Standard deviation | ✅ |
| `var()` | Variance | ✅ |
| `cumsum()` | Running sum array | ✅ |
| `median()` | Middle value | ✅ |
| `percentile(a, q)` | q-th percentile | ✅ |

💡 One-line Takeaway
For multi-dimensional arrays, state axis explicitly whenever you want per-row or per-column results, and use ddof=1 when computing sample (not population) statistics.
