05_Visualizing Geographic Data

1. Geographic Data

From scientific fields like meteorology and climatology, through to the software on our smartphones like Google Maps and Facebook check-ins, geographic data is always present in our everyday lives. Raw geographic data like latitudes and longitudes are difficult to understand using the data charts and plots we've discussed so far. To explore this kind of data, you'll need to learn how to visualize the data on maps.

In this mission, we'll explore the fundamentals of geographic coordinate systems and how to work with the basemap library to plot geographic data points on maps. We'll be working with flight data from the openflights website. Here's a breakdown of the files we'll be working with and the most pertinent columns from each dataset:

airlines.csv - data on each airline.

  • country - where the airline is headquartered.
  • active - if the airline is still active.

airports.csv - data on each airport.

  • name - name of the airport.
  • city - city where the airport is located.
  • country - country where the airport is located.
  • code - unique airport code.
  • latitude - latitude value.
  • longitude - longitude value.

routes.csv - data on each flight route.

  • airline - airline for the route.
  • source - code of the starting airport for the route.
  • dest - code of the destination airport for the route.

We can explore a range of interesting questions and ideas using these datasets:

  • For each airport, which destination airport is the most common?
  • Which cities are the most important hubs for airports and airlines?

Read in the 3 CSV files into 3 separate dataframe objects - airlines, airports, and routes.

Use the DataFrame.iloc[] method to return the first row in each dataframe as a neat table.

Display the first rows for all dataframes using the print() function. Try to answer the following questions:

  • What's the best way to link the data from these 3 different datasets together?
  • What are the formats of the latitude and longitude values?

import pandas as pd
airlines = pd.read_csv('airlines.csv')
airports = pd.read_csv('airports.csv')
routes   = pd.read_csv('routes.csv')

print(airlines.iloc[0])
print(airports.iloc[0])
print(routes.iloc[0])

id 1
name Private flight
alias \N
iata -
icao NaN
callsign NaN
country NaN
active Y
Name: 0, dtype: object
id 1
name Goroka
city Goroka
country Papua New Guinea
code GKA
icao AYGA
latitude -6.08169
longitude 145.392
altitude 5282
offset 10
dst U
timezone Pacific/Port_Moresby
Name: 0, dtype: object
airline 2B
airline_id 410
source AER
source_id 2965
dest KZN
dest_id 2990
codeshare NaN
stops 0
equipment CR2
Name: 0, dtype: object
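
One possible answer to the linking question, sketched under assumptions: the source and dest values in routes (AER, KZN) look like the same airport codes stored in the code column of airports, so a merge on those columns can attach coordinates to each route. The two tiny dataframes below are stand-ins for the real files, and the AER and KZN coordinates are the ones this mission lists for that route in geo_routes.csv. The renamed column names (start_lat, start_lon, end_lat, end_lon) are chosen to mirror that file.

```python
import pandas as pd

# Stand-ins for routes.csv and airports.csv, reduced to the columns needed here.
routes = pd.DataFrame({
    "airline": ["2B"],
    "source": ["AER"],
    "dest": ["KZN"],
})
airports = pd.DataFrame({
    "code": ["AER", "KZN"],
    "latitude": [43.449928, 55.606186],
    "longitude": [39.956589, 49.278728],
})

# Two views of airports: one keyed like routes.source, one keyed like routes.dest.
start = airports.rename(
    columns={"code": "source", "latitude": "start_lat", "longitude": "start_lon"})
end = airports.rename(
    columns={"code": "dest", "latitude": "end_lat", "longitude": "end_lon"})

# Look up the source airport first, then the destination airport.
linked = routes.merge(start, on="source", how="left").merge(end, on="dest", how="left")
print(linked)
```

A left merge keeps every route even when an airport code has no match, which makes missing coordinates show up as NaN instead of silently dropping rows.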

2. Geographic Coordinate Systems

A geographic coordinate system allows us to locate any point on Earth using latitude and longitude coordinates.

Here are the coordinates of 2 well known points of interest:

  • White House, Washington, DC - latitude 38.898166, longitude -77.036441
  • Alcatraz Island, San Francisco, CA - latitude 37.827122, longitude -122.422934

In most cases, we want to visualize latitude and longitude points on two-dimensional maps. Two-dimensional maps are faster to render, easier to view on a computer and distribute, and closer to the experience of popular mapping software like Google Maps. Latitude and longitude values describe points on a sphere, which is three-dimensional. To plot the values on a two-dimensional plane, we need to convert the coordinates to the Cartesian coordinate system using a map projection.

A map projection transforms points on a sphere to a two-dimensional plane. When projecting down to the two-dimensional plane, some properties are distorted. Each map projection makes trade-offs in what properties to preserve and you can read about the different trade-offs here. We'll use the Mercator projection, because it is commonly used by popular mapping software.
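
To make the idea of a projection concrete, here is a minimal sketch of the textbook spherical Mercator formulas. It is not basemap's implementation: the Earth radius is an assumed mean value, the function name mercator is made up for this example, and the origin sits at longitude 0, latitude 0, so a plotting library may shift or scale its coordinates differently.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters (spherical model)

def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to spherical Mercator x/y in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

# The two points of interest listed above.
white_house = mercator(-77.036441, 38.898166)
alcatraz = mercator(-122.422934, 37.827122)
print(white_house)
print(alcatraz)

# Equal 20-degree steps in latitude take up more and more vertical space.
for lat in (0, 20, 40, 60, 80):
    print(lat, round(mercator(0, lat)[1] / 1000), "km")
```

The growing spacing between the printed y values is the distortion trade-off in action. Latitudes close to 90 degrees push y toward infinity, which is one reason a Mercator map is usually cut off before the poles, as with the 80 degree limits used later in this mission.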

3. Installing Basemap

Before we convert our flight data to Cartesian coordinates and plot it, let's learn more about the basemap toolkit. Basemap is an extension to Matplotlib that makes it easier to work with geographic data. The documentation for basemap provides a good high-level overview of what the library does:

"The matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. Basemap does not do any plotting on its own, but provides the facilities to transform coordinates to one of 25 different map projections."

Basemap makes it easy to convert from the spherical coordinate system (latitudes and longitudes) to the Mercator projection. While basemap uses Matplotlib to actually draw and control the map, the library provides many methods that enable us to work with maps quickly. Before we dive into how basemap works, let's get familiar with how to install it.

The easiest way to install basemap is through Anaconda. If you're new to Anaconda, we recommend checking out our Python and Pandas installation project:

conda install basemap

The basemap library has some external dependencies, and Anaconda installs them for us. To test the installation, run the following import code:

from mpl_toolkits.basemap import Basemap

If an error is returned, we recommend searching for similar errors on StackOverflow to help debug the issue. Because basemap uses matplotlib, you'll want to import matplotlib.pyplot into your environment when you use Basemap.

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

4. Workflow With Basemap

Here's what the general workflow will look like when working with two-dimensional maps:

  • Create a new basemap instance with the specific map projection we want to use and how much of the map we want included.
  • Convert spherical coordinates to Cartesian coordinates using the basemap instance.
  • Use the matplotlib and basemap methods to customize the map.
  • Display the map.

Let's focus on the first step and create a new basemap instance. To create a new instance of the Basemap class, we call the Basemap constructor and pass in values for the following parameters:

  • projection: the map projection.
  • llcrnrlat: latitude of lower left hand corner of the desired map domain
  • urcrnrlat: latitude of upper right hand corner of the desired map domain
  • llcrnrlon: longitude of lower left hand corner of the desired map domain
  • urcrnrlon: longitude of upper right hand corner of the desired map domain

Create a new basemap instance with the following parameters:

  • projection: "merc"
  • llcrnrlat: -80 degrees
  • urcrnrlat: 80 degrees
  • llcrnrlon: -180 degrees
  • urcrnrlon: 180 degrees

Assign the instance to the new variable m.

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
m = Basemap(projection='merc',
            llcrnrlat=-80,
            urcrnrlat=80,
            llcrnrlon=-180,
            urcrnrlon=180)

5. Converting From Spherical to Cartesian Coordinates

As we mentioned before, we need to convert latitude and longitude values to Cartesian coordinates to display them on a two-dimensional map. We can pass lists of longitude and latitude values to the basemap instance, and it will return the corresponding x and y coordinates under the projection we specified earlier. We'll pass plain Python lists here, so we use Series.tolist() to convert the longitude and latitude columns from the airports dataframe to lists. Then we call the basemap instance with the longitude values first, followed by the latitude values:

x, y = m(longitudes, latitudes)

The basemap object will return 2 list objects, which we assign to x and y. Finally, we display the first 5 elements of the original longitude values, original latitude values, the converted longitude values, and the converted latitude values.


Convert the longitude values from spherical to Cartesian and assign the resulting list to x.

Convert the latitude values from spherical to Cartesian and assign the resulting list to y.

m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)

longitudes = airports["longitude"].tolist()
latitudes = airports["latitude"].tolist()

x, y = m(longitudes, latitudes)

m.scatter(x, y, s=1)

plt.show()


7. Customizing The Plot Using Basemap

You'll notice that the outlines of the coasts for each continent are missing from the map above. We can display the coast lines using the basemap.drawcoastlines() method.


Use basemap.drawcoastlines() to enable the coast lines to be displayed.

Display the plot using plt.show().

m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
longitudes = airports["longitude"].tolist()
latitudes = airports["latitude"].tolist()
x, y = m(longitudes, latitudes)
m.scatter(x, y, s=1)
m.drawcoastlines()
plt.show()

8. Customizing The Plot Using Matplotlib

Because basemap uses matplotlib under the hood, we can interact with the matplotlib classes that basemap uses directly to customize the appearance of the map.

We can add code that:

  • uses pyplot.subplots() to specify the figsize parameter
  • assigns the returned Figure and Axes objects for a single subplot to fig and ax respectively
  • uses the Axes.set_title() method to set the map title

Before creating the basemap instance and generating the scatter plot, add code that:

  • creates a figure with a width of 15 inches and a height of 20 inches
  • sets the title of the scatter plot to "Scaled Up Earth With Coastlines"

fig, ax = plt.subplots(figsize=(15, 20))
ax.set_title("Scaled Up Earth With Coastlines")
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
longitudes = airports["longitude"].tolist()
latitudes = airports["latitude"].tolist()
x, y = m(longitudes, latitudes)
m.scatter(x, y, s=1)
m.drawcoastlines()
plt.show()
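
As a small aside on the matplotlib side of this step, the sketch below creates only the figure and title, without basemap, to show how figsize is interpreted: the tuple is ordered (width, height) in inches. The Agg backend line is an assumption made so the snippet runs without a display; it is not part of the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# figsize is given as (width, height) in inches.
fig, ax = plt.subplots(figsize=(15, 20))
ax.set_title("Scaled Up Earth With Coastlines")

width, height = fig.get_size_inches()
print(width, height)
print(ax.get_title())
```

Setting the title through ax rather than through pyplot keeps the title attached to a specific Axes object, which matters once a figure holds more than one subplot.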



9. Introduction to Great Circles

To better understand the flight routes, we can draw great circles to connect starting and ending locations on a map. A great circle route is the shortest path between 2 points on the surface of a sphere.
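
The length of that shortest path can be estimated with the haversine formula. The sketch below assumes a perfectly spherical Earth with an assumed mean radius, and the function name great_circle_km is made up for this example. It takes longitude before latitude to match the argument order used for maps in this mission.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def great_circle_km(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the square root slightly above 1.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

# White House to Alcatraz Island, using the coordinates listed earlier.
print(round(great_circle_km(-77.036441, 38.898166, -122.422934, 37.827122)), "km")
```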


On a two-dimensional map, the great circle appears as a line, usually a curved one, because it is projected from three dimensions down to two using the map projection. We can use these lines to visualize the flight routes from the routes dataframe. To plot great circles, we need the source longitude, source latitude, destination longitude, and destination latitude for each route. While the routes dataframe contains the source and destination airports for each route, the latitude and longitude values for each airport are in a separate dataframe (airports).

To make things easier, we've created a new CSV file called geo_routes.csv that contains the latitude and longitude values corresponding to the source and destination airports for each route. We've also removed some columns we won't be working with.


Read geo_routes.csv into a dataframe named geo_routes.

Use the DataFrame.info() method to look for columns containing any null values.

Display the first five rows in geo_routes.

geo_routes = pd.read_csv("geo_routes.csv")
print(geo_routes.info())
print(geo_routes.head(5))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67428 entries, 0 to 67427
Data columns (total 8 columns):
airline 67428 non-null object
source 67428 non-null object
dest 67428 non-null object
equipment 67410 non-null object
start_lon 67428 non-null float64
end_lon 67428 non-null float64
start_lat 67428 non-null float64
end_lat 67428 non-null float64
dtypes: float64(4), object(4)
memory usage: 4.1+ MB
None
airline source dest equipment start_lon end_lon start_lat end_lat
0 2B AER KZN CR2 39.956589 49.278728 43.449928 55.606186
1 2B ASF KZN CR2 48.006278 49.278728 46.283333 55.606186
2 2B ASF MRV CR2 48.006278 43.081889 46.283333 44.225072
3 2B CEK KZN CR2 61.503333 49.278728 55.305836 55.606186
4 2B CEK OVB CR2 61.503333 82.650656 55.305836 55.012622

10. Displaying Great Circles

We use the basemap.drawgreatcircle() method to display a great circle between 2 points. The basemap.drawgreatcircle() method requires four parameters in the following order:

  • lon1 - longitude of the starting point.
  • lat1 - latitude of the starting point.
  • lon2 - longitude of the ending point.
  • lat2 - latitude of the ending point.

The following code generates a great circle for the first three routes in the dataframe:

m.drawgreatcircle(39.956589, 43.449928, 49.278728, 55.606186)
m.drawgreatcircle(48.006278, 46.283333, 49.278728, 55.606186)
m.drawgreatcircle(48.006278, 46.283333, 43.081889, 44.225072)
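
To get a feel for what such a curve consists of, the sketch below computes a few points along the great circle between two coordinates on a sphere. This is a standard interpolation formula and not necessarily how basemap computes its paths; the function name intermediate_point is made up for this example, and the formula is not meaningful for two points on exactly opposite sides of the globe.

```python
import math

def intermediate_point(lon1, lat1, lon2, lat2, fraction):
    """Point a given fraction of the way along the great circle (degrees in, degrees out)."""
    lam1, phi1 = math.radians(lon1), math.radians(lat1)
    lam2, phi2 = math.radians(lon2), math.radians(lat2)
    # Angular distance between the endpoints (haversine formula).
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    delta = 2 * math.asin(min(1.0, math.sqrt(a)))
    if delta == 0:
        return lon1, lat1  # identical endpoints
    # Weights of the start and end points at this fraction of the path.
    w1 = math.sin((1 - fraction) * delta) / math.sin(delta)
    w2 = math.sin(fraction * delta) / math.sin(delta)
    x = w1 * math.cos(phi1) * math.cos(lam1) + w2 * math.cos(phi2) * math.cos(lam2)
    y = w1 * math.cos(phi1) * math.sin(lam1) + w2 * math.cos(phi2) * math.sin(lam2)
    z = w1 * math.sin(phi1) + w2 * math.sin(phi2)
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lon, lat

# Five evenly spaced points along the first route (AER to KZN).
path = [intermediate_point(39.956589, 43.449928, 49.278728, 55.606186, i / 4)
        for i in range(5)]
for lon, lat in path:
    print(round(lon, 3), round(lat, 3))
```

Projecting each of these points with the map projection and joining them gives the curved line seen on the map.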

Unfortunately, basemap struggles to create great circles for routes where the absolute difference between the start and end values is larger than 180 degrees for either the latitude or the longitude. This is because the basemap.drawgreatcircle() method isn't able to create great circles properly when they go outside of the map boundaries. This is mentioned briefly in the documentation for the method:

Note: Cannot handle situations in which the great circle intersects the edge of the map projection domain, and then re-enters the domain.
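
The 180 degree rule described above can be written as a small standalone check. This is only a sketch: the helper name is_drawable and its limit parameter are made up for this example, and the second route is hypothetical, chosen because it crosses the 180th meridian.

```python
def is_drawable(start_lon, start_lat, end_lon, end_lat, limit=180):
    """Return True when both coordinate differences are smaller than `limit` degrees."""
    return abs(end_lat - start_lat) < limit and abs(end_lon - start_lon) < limit

# First route in geo_routes.csv (AER to KZN): small differences, so it passes.
print(is_drawable(39.956589, 43.449928, 49.278728, 55.606186))  # True

# Hypothetical route across the 180th meridian: the longitudes differ by 340 degrees.
print(is_drawable(170.0, 10.0, -170.0, 10.0))  # False
```

Because latitudes lie between -90 and 90, the latitude test can only fail for a pole-to-pole pair, so in practice the longitude test does the filtering.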


Write a function, named create_great_circles() that draws a great circle for each route that has an absolute difference in the latitude and longitude values less than 180. This function should:

  • Accept a dataframe as the sole parameter
  • Iterate over the rows in the dataframe using DataFrame.iterrows()
  • For each row:
    • Draw a great circle using the four geographic coordinates only if:
      • The absolute difference between the latitude values (end_lat and start_lat) is less than 180.
      • The absolute difference between the longitude values (end_lon and start_lon) is less than 180.

Create a filtered dataframe containing just the routes that start at the DFW airport:

  • Select only the rows in geo_routes where the value for the source column equals "DFW".
  • Assign the resulting dataframe to dfw.

Pass dfw into create_great_circles() and display the plot using the pyplot.show() function.

fig, ax = plt.subplots(figsize=(15,20))
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
m.drawcoastlines()

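def create_great_circles(df):
    # Draw one great circle per route on the Basemap instance m created above.
    for index, row in df.iterrows():
        end_lat, start_lat = row['end_lat'], row['start_lat']
        end_lon, start_lon = row['end_lon'], row['start_lon']

        # Skip routes whose coordinates differ by 180 degrees or more,
        # since basemap cannot draw those great circles properly.
        if abs(end_lat - start_lat) < 180 and abs(end_lon - start_lon) < 180:
            m.drawgreatcircle(start_lon, start_lat, end_lon, end_lat)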

dfw = geo_routes[geo_routes['source'] == "DFW"]
create_great_circles(dfw)
plt.show()
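
As an alternative sketch, the same 180 degree filter can be applied to the whole dataframe at once with boolean masks, leaving only the drawing itself for the loop. The sample below reuses the first five rows of geo_routes.csv shown earlier and adds one hypothetical row (codes XXX and YYY are invented) whose longitudes differ by more than 180 degrees.

```python
import pandas as pd

# First five rows of geo_routes.csv as shown earlier, plus one hypothetical row.
geo_routes = pd.DataFrame({
    "source": ["AER", "ASF", "ASF", "CEK", "CEK", "XXX"],
    "dest": ["KZN", "KZN", "MRV", "KZN", "OVB", "YYY"],
    "start_lon": [39.956589, 48.006278, 48.006278, 61.503333, 61.503333, 170.0],
    "end_lon": [49.278728, 49.278728, 43.081889, 49.278728, 82.650656, -170.0],
    "start_lat": [43.449928, 46.283333, 46.283333, 55.305836, 55.305836, 10.0],
    "end_lat": [55.606186, 55.606186, 44.225072, 55.606186, 55.012622, 10.0],
})

# One boolean mask per condition, combined with & to keep rows passing both.
lat_ok = (geo_routes["end_lat"] - geo_routes["start_lat"]).abs() < 180
lon_ok = (geo_routes["end_lon"] - geo_routes["start_lon"]).abs() < 180
drawable = geo_routes[lat_ok & lon_ok]

print(len(geo_routes), "routes,", len(drawable), "drawable")
```

Filtering first also makes it easy to count how many routes were skipped before anything is drawn.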

In this mission, we learned how to visualize geographic data using basemap. This is the last mission in the Storytelling Through Data Visualization course. You should now have a solid foundation in data visualization for exploring data and communicating insights. We encourage you to keep exploring data visualization on your own. Here are some suggestions for what to do next:

Plotting tools:

  • Creating 3D plots using Plotly
  • Creating interactive visualizations using bokeh
  • Creating interactive map visualizations using folium

The art and science of data visualization:

  • Visual Display of Quantitative Information
  • Visual Explanations: Images and Quantities, Evidence and Narrative
