1.导入模块
In [1]: import pandas as pd
2.导入表格数据
>>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
>>> titanic
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
.. ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q
[891 rows x 12 columns]
3.将姓名列中的大写名字全部修改为小写名字
>>> titanic["Name"].str.lower()
0 braund, mr. owen harris
1 cumings, mrs. john bradley (florence briggs thayer)
2 heikkinen, miss. laina
3 futrelle, mrs. jacques heath (lily may peel)
4 allen, mr. william henry
...
886 montvila, rev. juozas
887 graham, miss. margaret edith
888 johnston, miss. catherine helen "carrie"
889 behr, mr. karl howell
890 dooley, mr. patrick
Name: Name, Length: 891, dtype: object
4.按逗号分隔姓名,将行转化为列表
>>> titanic["Name"].str.split(",")
0 [Braund, Mr. Owen Harris]
1 [Cumings, Mrs. John Bradley (Florence Briggs Thayer)]
2 [Heikkinen, Miss. Laina]
3 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]
4 [Allen, Mr. William Henry]
...
886 [Montvila, Rev. Juozas]
887 [Graham, Miss. Margaret Edith]
888 [Johnston, Miss. Catherine Helen "Carrie"]
889 [Behr, Mr. Karl Howell]
890 [Dooley, Mr. Patrick]
Name: Name, Length: 891, dtype: object
5.提取名字列表中的第一个元素
>>> titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
>>> titanic["Surname"]
0 Braund
1 Cumings
2 Heikkinen
3 Futrelle
4 Allen
...
886 Montvila
887 Graham
888 Johnston
889 Behr
890 Dooley
Name: Surname, Length: 891, dtype: object
6.查找包含指定字符的用户信息
>>> titanic[titanic["Name"].str.contains("Countess")]
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Surname
759 760 1 1 Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards) female 33.0 0 0 110152 86.5 B77 S Rothes
7.将男性值替换为M,并将所有女性值替换为F
>>> titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
>>> titanic
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Surname Sex_short
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S Braund M
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38.0 1 0 PC 17599 71.2833 C85 C Cumings F
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S Heikkinen F
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S Futrelle F
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S Allen M
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S Montvila M
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S Graham F
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S Johnston F
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C Behr M
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q Dooley M
[891 rows x 14 columns]
8.总结
使用str方法可以使用字符串方法,replace方法是根据给定字典转换值的便捷方法