Getting Started with Pandas: apply() / agg() / transform()

Python | Updated: 2026-03-25 11:16:44 | Published: 1638 days ago | 百科书网 趣学号

1. apply()

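In Python a function is an object and can be passed as an argument to another function. apply() takes a function as one of its arguments, which makes it the most flexible of the three methods covered here.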

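It walks through a Series or DataFrame and runs the given function. Series.apply() calls the function on each element, while DataFrame.apply() calls it on each column (axis=0) or each row (axis=1), passing that column or row as a Series.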

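DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

Older signatures also list broadcast and reduce; both were removed in pandas 1.0, and result_type covers their use cases.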

1.1 Built-in functions

1.1.1 Using NumPy functions

With axis=0 (the default) the function is applied to each column, and the result is indexed by the column labels.

import numpy as np
import pandas as pd
df = pd.DataFrame({'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.apply(np.mean, axis=0))

one       4.000000
two      16.666667
three     6.666667
dtype: float64
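
For contrast, here is a minimal sketch of the same frame with axis=1, where the function receives each row instead of each column. The variable names `col_means` and `row_means` are illustrative, and a lambda stands in for np.mean.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]},
    index=['a', 'b', 'c'],
)

# axis=0 (default): the function receives each column as a Series,
# so the result is indexed by column label.
col_means = df.apply(lambda s: s.mean(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result is indexed by row label.
row_means = df.apply(lambda s: s.mean(), axis=1)

print(col_means)
print(row_means)
```

The only difference between the two calls is the axis argument; the function itself stays the same.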

1.1.2 Using pandas functions

df = pd.DataFrame({'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
# Each column is passed to the function as a Series, so the Series method is the matching one.
print(df.apply(pd.Series.mean))

The same call can be written with a lambda:

print(df.apply(lambda x: x.mean()))

one       4.000000
two      16.666667
three     6.666667
dtype: float64

1.1.3 Using Python built-in functions

Series.apply() runs the given function on each element.

df = pd.DataFrame({'one': ['ab', 'what are you ', 'o'], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df['one'].apply(len))

a     2
b    13
c     1
Name: one, dtype: int64

1.2 Custom functions
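
As an illustration of this case, here is a minimal sketch of passing user-defined functions to apply(). The helpers `value_range` and `add_offset` and the sample data are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]},
    index=['a', 'b', 'c'],
)

def value_range(column):
    """Hypothetical helper: spread between the largest and smallest value."""
    return column.max() - column.min()

def add_offset(value, offset):
    """Hypothetical helper: shift a single value by a fixed offset."""
    return value + offset

# DataFrame.apply passes each column to the function as a Series.
ranges = df.apply(value_range)

# Series.apply calls the function once per element; extra positional
# arguments are forwarded through args=.
shifted = df['one'].apply(add_offset, args=(10,))

print(ranges)
print(shifted)
```

The first call reduces every column to one number, while the second keeps the shape of the Series and changes each value.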

2. agg()

2.1 A single function

When a single reducing function is passed, agg() gives the same result as apply().

2.1.1 Without specifying columns

df = pd.DataFrame({'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.agg(pd.Series.mean))

one       4.000000
two      16.666667
three     6.666667
dtype: float64
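
The equivalence with apply() can be checked directly. The sketch below is illustrative only and uses a lambda as the single reducing function; the names `by_agg` and `by_apply` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]},
    index=['a', 'b', 'c'],
)

# The same reducing function, once through agg() and once through apply().
by_agg = df.agg(lambda col: col.mean())
by_apply = df.apply(lambda col: col.mean())

# Both return a Series indexed by column label with identical values.
print(by_agg.equals(by_apply))
```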

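The function can also be passed inside a list, but the output format changes: the result becomes a DataFrame with one row per function, as with the list aggregation described under multiple functions.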

print(df.agg([pd.Series.mean]))

      one        two     three
mean  4.0  16.666667  6.666667

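When the frame contains a column whose type cannot be aggregated, only the aggregatable columns are computed, and the function has to be passed in a list. This reflects older pandas behavior: from pandas 2.0 on, such columns are no longer dropped silently and the list form raises as well.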

df = pd.DataFrame({'one': [2, 6, 4], 'two': ['a', 'b', 'c'], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.agg(pd.Series.mean))  # raises an error: column 'two' holds strings
print(df.agg([pd.Series.mean]))  # works in older pandas, which drops column 'two'

      one     three
mean  4.0  6.666667
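
One way to aggregate a frame with mixed column types without relying on version-specific behavior is to select the numeric columns first. The sketch below assumes that select_dtypes() is an acceptable substitute for the implicit column dropping; the names `numeric` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 6, 4], 'two': ['a', 'b', 'c'], 'three': [4, 8, 8]},
    index=['a', 'b', 'c'],
)

# Keep only the numeric columns, then aggregate what is left.
numeric = df.select_dtypes(include='number')
result = numeric.agg(['mean'])

print(result)
```

Because the string column is removed explicitly, the call does not depend on whether agg() drops it or raises.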

2.1.2 Specifying columns

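# Back to the all-numeric frame, which is what the output below was produced from.
df = pd.DataFrame({'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.agg({'one': pd.Series.mean, 'two': pd.Series.sum}))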

one     4.0
two    50.0
dtype: float64

2.2 Multiple functions

Passing several functions at once gives a custom version of describe().

2.2.1 Without specifying columns

Several aggregation functions can be passed as a list. Each function appears as one row of the resulting DataFrame.

df = pd.DataFrame({'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.agg([pd.Series.mean, pd.Series.sum]))
print(df.agg([pd.Series.count, pd.Series.median, pd.Series.std]))

       one        two      three
mean   4.0  16.666667   6.666667
sum   12.0  50.000000  20.000000

        one        two     three
count   3.0   3.000000  3.000000
median  4.0  20.000000  8.000000
std     2.0   5.773503  2.309401

A lambda can be used as well, but its row is then labeled <lambda> instead of a function name.

print(df.agg([pd.Series.mean, lambda x: x.sum()]))

           one        two      three
mean       4.0  16.666667   6.666667
<lambda>  12.0  50.000000  20.000000
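
agg() labels each row with the name of the function, so a named function gives a readable label where a lambda does not. The following is a minimal sketch; `value_range` is a hypothetical helper, and mixing string names with a callable in one list is a choice made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 6, 4], 'two': [20, 10, 20], 'three': [4, 8, 8]},
    index=['a', 'b', 'c'],
)

def value_range(column):
    """Hypothetical helper: spread between the largest and smallest value."""
    return column.max() - column.min()

# Strings name built-in aggregations; the callable contributes its own name.
summary = df.agg(['mean', 'sum', value_range])

print(summary)
```

The result has the rows mean, sum, and value_range, which works as a small custom describe().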

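When the frame contains columns that cannot be aggregated, they are handled the same way as when a single function is passed.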

2.2.2 Specifying columns

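A dict specifies which aggregation functions apply to which columns. Where a function was not applied to a column, the corresponding cell in the output is NaN.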

print(df.agg({'one': [pd.Series.mean, pd.Series.sum], 'two': pd.Series.sum}))

       one   two
mean   4.0   NaN
sum   12.0  50.0

3. transform()

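transform() accepts NumPy functions, function names given as strings, and custom functions. The result must have the same length as the input, so reducing pandas functions such as pd.Series.mean are not supported.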
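
The same-length requirement can be seen in a short sketch. The two-column frame and the names `centered` and `reduced_ok` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10]}, index=['a', 'b', 'c'])

# A shape-preserving function: subtract each column's mean from its values.
centered = df.transform(lambda col: col - col.mean())
print(centered)

# A reducing function collapses each column to one value,
# which transform() rejects with a ValueError.
try:
    df.transform(lambda col: col.mean())
    reduced_ok = True
except ValueError as exc:
    reduced_ok = False
    print('rejected:', exc)
```

For a reduction, agg() is the method to reach for; transform() is meant for results that line up with the original index.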

3.1 A single function

3.1.1 Without specifying columns

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.transform(np.abs))

   one  two  three
a    2    5      4
b    6    7      8
c    4   10      8

3.1.2 Specifying columns

print(df.transform({'one': np.abs, 'two': lambda x: x+1}))

   one  two
a    2    6
b    6    8
c    4   11

3.2 Multiple functions

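When several functions are passed, the result is a DataFrame with multi-level columns. The first level holds the original column names; the second holds the names of the functions that were called.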

3.2.1 Without specifying columns

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df.transform([np.abs, lambda x: x+1]))

       one               two             three         
  absolute <lambda> absolute <lambda> absolute <lambda>
a        2        3        5        6        4        5
b        6        7        7        8        8        9
c        4        5       10       11        8        9

3.2.2 Specifying columns

print(df.transform({'one': [np.abs, lambda x: x+1], 'two': lambda x: x+1}))

       one               two
  absolute <lambda> <lambda>
a        2        3        6
b        6        7        8
c        4        5       11
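
The two column levels can be addressed separately once the result exists. The sketch below is illustrative: `add_one` is a hypothetical named helper used in place of a lambda so that the second-level label is readable, and the frame is cut down to two columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10]}, index=['a', 'b', 'c'])

def add_one(column):
    """Hypothetical helper: add 1 to every value."""
    return column + 1

result = df.transform([np.abs, add_one])

# Two column levels: original column name first, function name second.
print(result.columns.nlevels)

# Select every result computed from one original column...
print(result['one'])

# ...or a single (column, function) pair with a tuple key.
print(result[('one', 'add_one')])
```

Indexing with the first-level label returns a smaller DataFrame, while a tuple key returns a single Series.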
