Transforming each element with map
In [1]: import pandas as pd
...: import datetime
...: from operator import methodcaller
In [2]: pd.options.display.max_rows = 10
In [3]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=5))
In [4]: s
Out[4]:
0 2019-03-01 14:44:40.030313
1 2019-03-02 14:44:40.030313
2 2019-03-03 14:44:40.030313
3 2019-03-04 14:44:40.030313
4 2019-03-05 14:44:40.030313
dtype: datetime64[ns]
In [5]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[5]:
0 01-03-2019
1 02-03-2019
2 03-03-2019
3 04-03-2019
4 05-03-2019
dtype: object
In [6]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[6]:
0 01-03-2019
1 02-03-2019
2 03-03-2019
3 04-03-2019
4 05-03-2019
dtype: object
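The two calls above can be checked against each other in a small standalone script. This is a minimal sketch, not part of the original session: the fixed start date is an assumption made so the result is reproducible (the session used pd.Timestamp('now')), and the names via_lambda and via_methodcaller are illustrative.

```python
from operator import methodcaller

import pandas as pd

# Assumption: a fixed start date instead of pd.Timestamp('now'),
# so the strings below do not depend on when the script is run.
s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=5))

# A lambda and a methodcaller both call strftime on every Timestamp.
via_lambda = s.map(lambda x: x.strftime("%d-%m-%Y"))
via_methodcaller = s.map(methodcaller("strftime", "%d-%m-%Y"))

# Both approaches yield the same formatted strings.
print(via_lambda.tolist() == via_methodcaller.tolist())
print(via_lambda.tolist())
```

methodcaller('strftime', '%d-%m-%Y') simply builds a callable that invokes x.strftime('%d-%m-%Y') on its argument, so the choice between the two is mostly a matter of taste.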
Call the date method on each Timestamp element of the Series to obtain raw datetime.date objects.
In [7]: s.map(methodcaller('date'))
Out[7]:
0 2019-03-01
1 2019-03-02
2 2019-03-03
3 2019-03-04
4 2019-03-05
dtype: object
In [8]: s.map(methodcaller('date')).values
Out[8]:
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
datetime.date(2019, 3, 5)], dtype=object)
An equivalent approach is to pass the unbound Timestamp.date method.
In [9]: s.map(pd.Timestamp.date)
Out[9]:
0 2019-03-01
1 2019-03-02
2 2019-03-03
3 2019-03-04
4 2019-03-05
dtype: object
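Why the unbound method works may be easier to see in isolation: map calls its argument with each element, and calling an unbound method with an instance as the first argument is the same as calling the method on that instance. The sketch below assumes a fixed timestamp for reproducibility; the variable names are illustrative.

```python
import datetime
from operator import methodcaller

import pandas as pd

ts = pd.Timestamp("2019-03-01 14:44:40")  # assumed fixed value

# Passing the instance explicitly is equivalent to ts.date().
same = pd.Timestamp.date(ts) == ts.date() == datetime.date(2019, 3, 1)
print(same)

s = pd.Series(pd.date_range(ts, periods=3))

# Three spellings of "call .date() on every element".
via_unbound = s.map(pd.Timestamp.date)
via_methodcaller = s.map(methodcaller("date"))
via_lambda = s.map(lambda x: x.date())

print(via_unbound.tolist() == via_methodcaller.tolist() == via_lambda.tolist())
```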
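The Timestamp.date method is efficient and readable. Timestamp is available at the top level of pandas, as pandas.Timestamp.
The date attribute of a DatetimeIndex does something similar: it returns a NumPy array with dtype=object.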
In [10]: idx = pd.DatetimeIndex(s)
In [11]: idx
Out[11]:
DatetimeIndex(['2019-03-01 14:44:40.030313', '2019-03-02 14:44:40.030313',
'2019-03-03 14:44:40.030313', '2019-03-04 14:44:40.030313',
'2019-03-05 14:44:40.030313'],
dtype='datetime64[ns]', freq=None)
In [12]: idx.date
Out[12]:
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
datetime.date(2019, 3, 5)], dtype=object)
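When the data is already in a Series, the .dt accessor exposes the same date attribute without building a DatetimeIndex first. The following is a small sketch under an assumed fixed start date; it is offered as an alternative to the session above rather than as something the session itself ran.

```python
import datetime

import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=3))  # assumed dates

# DatetimeIndex.date returns a NumPy object array of datetime.date values.
dates_from_index = pd.DatetimeIndex(s).date

# Series.dt.date returns the same values wrapped in a Series.
dates_from_accessor = s.dt.date

print(type(dates_from_index).__name__, dates_from_index.dtype)
print(list(dates_from_index) == dates_from_accessor.tolist())
```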
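On a large datetime64[ns] Series, Timestamp.date is said to perform better than operator.methodcaller, which is in turn slightly faster than a lambda. In the run below, however, the three timings (roughly 2.9 to 3.0 s each) differ by less than their reported standard deviations, so no clear winner emerges.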
In [13]: f1 = methodcaller('date')
...: f2 = lambda x: x.date()
...: f3 = pd.Timestamp.date
...: s2 = pd.Series(pd.date_range('20010101', periods=1000000, freq='T'))
...: s2
Out[13]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
...
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [14]: timeit s2.map(f1)
2.97 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: timeit s2.map(f2)
2.9 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [16]: timeit s2.map(f3)
2.98 s ± 177 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
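The comparison can also be rerun outside IPython with the standard timeit module. The sketch below makes a few assumptions that differ from the session: the Series is shortened to 100,000 rows so that it finishes quickly, the frequency is spelled 'min' (the 'T' alias is deprecated in recent pandas releases), and the best of three runs is reported. Absolute numbers depend on the machine and the pandas version.

```python
import timeit
from operator import methodcaller

import pandas as pd

# Assumption: 100,000 rows instead of 1,000,000 to keep the run short.
n = 100_000
s2 = pd.Series(pd.date_range("2001-01-01", periods=n, freq="min"))

candidates = {
    "methodcaller": methodcaller("date"),
    "lambda": lambda x: x.date(),
    "unbound": pd.Timestamp.date,
}

timings = {}
for name, func in candidates.items():
    # One call per measurement, three measurements, keep the fastest.
    timings[name] = min(timeit.repeat(lambda: s2.map(func), number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name:>12}: {seconds:.3f} s")
```

If the three numbers land close together, as they do in the session above, readability is a reasonable tie-breaker.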
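One of pandas' goals is to provide a layer on top of NumPy so that you do not have to deal with the low-level details of ndarray. Retrieving raw datetime.date objects is of limited use, because there is no corresponding NumPy dtype that pandas supports. In the pandas version used for this session, the only supported datetime dtype is datetime64[ns], which has nanosecond resolution (pandas 2.0 and later also accept second, millisecond, and microsecond resolutions).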
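The practical consequence can be sketched as follows: a Series of datetime.date objects falls back to the generic object dtype, whereas truncating the time part with Series.dt.normalize keeps a datetime64 dtype and therefore the .dt accessor. The fixed start date and the variable names are assumptions for illustration, and normalize is suggested here as one possible alternative, not something the original session used.

```python
import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=3))  # assumed dates

# Plain Python date objects: stored as generic objects.
as_objects = s.map(pd.Timestamp.date)

# Midnight-truncated timestamps: still a datetime64 Series.
as_midnight = s.dt.normalize()

print(as_objects.dtype)
print(as_midnight.dtype)

# Datetime-specific operations remain available on the datetime64 Series.
print(as_midnight.dt.day.tolist())
```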