Labels: Numpy Arrays, DataFrames
Numpy Arrays [1]
# Import the numpy package as np
import numpy as np
- Element-wise calculations: fast and computationally efficient
- Subsetting: quickly select the elements that satisfy a condition
values = [1, 2, 3] # Avoid the name "list", which shadows the built-in type
np_list = np.array(values) # Construct a NumPy array from the list
sub_np_list = np_list[np_list>2] # Find all elements larger than 2
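The element-wise calculations mentioned above can be sketched as follows. This is a minimal illustration, not taken from the referenced tutorial: the height and weight values and the variable names are made up.

```python
import numpy as np

# Hypothetical sample data, for illustration only
heights_m = np.array([1.70, 1.80, 1.60])
weights_kg = np.array([65.0, 81.0, 51.2])

# Element-wise power and division: one BMI value per person, no explicit loop
bmi = weights_kg / heights_m ** 2

# A comparison is also element-wise and yields a boolean mask
mask = bmi > 22

# Indexing with the mask keeps only the elements where it is True
high_bmi = bmi[mask]

print(bmi)
print(high_bmi)
```

The same arithmetic on plain Python lists would raise a TypeError, which is why the values are converted to arrays first.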
Pandas DataFrames [2]
- Store and manipulate tabular data in rows of observations and columns of variables.
# Use a name other than "dict", which would shadow the built-in type
brics_data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
"capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
"area": [8.516, 17.10, 3.286, 9.597, 1.221],
"population": [200.4, 143.5, 1252, 1357, 52.98] }
import pandas as pd
brics = pd.DataFrame(brics_data)
print(brics)
- Assign row labels by setting the index.
# Set the index for brics
brics.index = ["BR", "RU", "IN", "CH", "SA"]
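As a small sketch of what setting the index changes (using a hypothetical two-row DataFrame named brics_small rather than the full table), the default integer labels are replaced and rows can then be looked up by the new labels:

```python
import pandas as pd

# Hypothetical two-row table, for illustration only
brics_small = pd.DataFrame({
    "country": ["Brazil", "Russia"],
    "capital": ["Brasilia", "Moscow"],
})

# Without an explicit index, rows are labelled 0, 1, ...
default_labels = brics_small.index.tolist()

# Replace the default labels with custom ones
brics_small.index = ["BR", "RU"]

# Rows can now be addressed by label
capital_of_russia = brics_small.loc["RU", "capital"]

print(default_labels)
print(capital_of_russia)
```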
- Another way to create a DataFrame is to import a CSV file with Pandas.
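# Import the cars.csv data: cars
# index_col=0 uses the first column as row labels (assumed here to hold
# country codes such as 'AUS'), which the label-based loc example relies on
cars = pd.read_csv('cars.csv', index_col=0)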
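A minimal sketch of how read_csv can take row labels from a file, assuming the first column holds country codes. The CSV contents below are hypothetical stand-ins (cars.csv itself is not reproduced here) and are read from an in-memory string so the example runs without the file:

```python
import io
import pandas as pd

# Hypothetical CSV contents, for illustration only
csv_text = """code,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""

# index_col=0 tells pandas to use the first column as the row labels
cars_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(cars_demo)
print(cars_demo.index.tolist())
```

Without index_col, the rows would be labelled 0, 1, 2 and the codes would appear as an ordinary column.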
- Indexing DataFrames
We can use square brackets to select one column of the DataFrame. With a slice (for example cars[0:3]), they can also be used to access observations (rows).
- Single brackets output a Pandas Series.
# Print out cars_per_cap column as Pandas Series
print(cars['cars_per_cap'])
- The double bracket will output a Pandas DataFrame.
# Print out cars_per_cap column as Pandas DataFrame
print(cars[['cars_per_cap']])
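The difference between the bracket forms can be checked directly. The sketch below uses a small hypothetical DataFrame (cars_demo, with made-up values) instead of the cars.csv file:

```python
import pandas as pd

# Hypothetical data, for illustration only
cars_demo = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
    },
    index=["US", "AUS", "JAP"],
)

col_series = cars_demo["cars_per_cap"]    # single brackets: Series
col_frame = cars_demo[["cars_per_cap"]]   # double brackets: DataFrame
first_two = cars_demo[0:2]                # slice: selects rows by position

print(type(col_series).__name__)
print(type(col_frame).__name__)
print(first_two)
```

Double brackets work because the inner pair is a Python list of column names, so several columns can be requested at once.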
- Data selection
loc is label-based: it selects rows and columns by their row and column labels.
iloc is integer-position-based: it selects rows and columns by their integer position.
# Print out observation for Japan (assumed to be the third row, integer position 2)
print(cars.iloc[2])
# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
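To make the loc and iloc distinction concrete, here is a minimal sketch on a hypothetical three-row DataFrame (cars_demo, with made-up values) in which the same rows and columns are selected once by label and once by integer position:

```python
import pandas as pd

# Hypothetical data, for illustration only
cars_demo = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
    },
    index=["US", "AUS", "JAP"],
)

# One row: by label, then by position (both return a Series)
row_by_label = cars_demo.loc["JAP"]
row_by_position = cars_demo.iloc[2]

# Several rows and one column: lists of labels vs. lists of positions
sub_by_label = cars_demo.loc[["AUS", "JAP"], ["country"]]
sub_by_position = cars_demo.iloc[[1, 2], [1]]

print(row_by_label.equals(row_by_position))
print(sub_by_label.equals(sub_by_position))
```

Selecting by label keeps working if the rows are reordered, whereas positions depend on the current row order.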
Reference
[1] Python Numpy Arrays https://www.learnpython.org/en/Numpy_Arrays
[2] Python Pandas DataFrames https://www.learnpython.org/en/Pandas_Basics