How to compute the average of pandas pd.Timestamp objects

Problem

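If you have a list of pd.Timestamp objects, you cannot average them with sum() and len(), because Timestamp objects do not support the addition that sum() performs: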

average_timestamp_problem.py
import pandas as pd

# Build a list of five fixed pd.Timestamp objects
timestamps = [
    pd.Timestamp('2023-01-01 12:00:00'),
    pd.Timestamp('2023-01-02 12:00:00'),
    pd.Timestamp('2023-01-03 12:00:00'),
    pd.Timestamp('2023-01-04 12:00:00'),
    pd.Timestamp('2023-01-05 12:00:00')
]

# FAIL: This will raise a TypeError
average = sum(timestamps) / len(timestamps)

This raises a TypeError:

error.txt
TypeError                                 Traceback (most recent call last)
Cell In[1], line 13
      4 timestamps = [
      5     pd.Timestamp('2023-01-01 12:00:00'),
      6     pd.Timestamp('2023-01-02 12:00:00'),
   (...)
      9     pd.Timestamp('2023-01-05 12:00:00')
     10 ]
     12 # FAIL: This will raise a TypeError
---> 13 average = sum(timestamps) / len(timestamps)

File timestamps.pyx:483, in pandas._libs.tslibs.timestamps._Timestamp.__radd__()

File timestamps.pyx:465, in pandas._libs.tslibs.timestamps._Timestamp.__add__()

TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`
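
The traceback ends in `_Timestamp.__radd__`, which points at an integer being added to a Timestamp. A minimal sketch of why that happens, assuming a recent pandas version: the built-in sum() starts from the integer 0, so its first step is effectively `0 + ts`. The variable names below are illustrative only.

```python
import pandas as pd

ts = pd.Timestamp('2023-01-01 12:00:00')

# sum() starts from the integer 0, so its first step is effectively 0 + ts
try:
    0 + ts
    int_add_failed = False
except TypeError:
    int_add_failed = True

# Adding two Timestamps to each other is not defined either
try:
    ts + ts
    ts_add_failed = False
except TypeError:
    ts_add_failed = True

# Subtraction, by contrast, is defined and yields a Timedelta
delta = pd.Timestamp('2023-01-02 12:00:00') - ts

print(int_add_failed, ts_add_failed, delta)
```

Passing a different start value to sum() would not help, since Timestamp plus Timestamp fails as well.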

Solution

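Instead of summing the Timestamp objects themselves, sum their ts.value attributes (integer nanoseconds since the Unix epoch), divide by the number of timestamps, and convert the result back to a Timestamp: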

average_timestamp_solution.py
# Floor division keeps the nanosecond count an exact integer (no float rounding)
average = pd.Timestamp(sum(ts.value for ts in timestamps) // len(timestamps))

Full example:

average_timestamp_full_example.py
import pandas as pd

# Build a list of five fixed pd.Timestamp objects
timestamps = [
    pd.Timestamp('2023-01-01 12:00:00'),
    pd.Timestamp('2023-01-02 12:00:00'),
    pd.Timestamp('2023-01-03 12:00:00'),
    pd.Timestamp('2023-01-04 12:00:00'),
    pd.Timestamp('2023-01-05 12:00:00')
]

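# Average the integer nanosecond values, then convert back to a Timestamp.
# Floor division keeps the nanosecond count an exact integer (no float rounding).
# Result: Timestamp('2023-01-03 12:00:00')
average = pd.Timestamp(sum(ts.value for ts in timestamps) // len(timestamps))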
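
As an alternative sketch, pandas can compute the mean itself when the timestamps are held in a datetime64 Series. This is not the approach described above, and it assumes a pandas version in which Series.mean() accepts datetime64 data. The name `series_mean` is illustrative.

```python
import pandas as pd

timestamps = [
    pd.Timestamp('2023-01-01 12:00:00'),
    pd.Timestamp('2023-01-02 12:00:00'),
    pd.Timestamp('2023-01-03 12:00:00'),
    pd.Timestamp('2023-01-04 12:00:00'),
    pd.Timestamp('2023-01-05 12:00:00')
]

# Wrap the list in a Series (dtype datetime64[ns]) and let pandas average it
series_mean = pd.Series(timestamps).mean()

print(series_mean)  # 2023-01-03 12:00:00
```

This form is convenient when the timestamps already live in a Series or DataFrame column.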
