Data Science and ML

How to Make Proximity Maps with Python | by Lee Vaughan | Oct, 2024

Here’s the full code (written in JupyterLab). I’ll break down the code blocks in the following sections.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from geopy.distance import great_circle

# SEC schools with coordinates (coords by ChatGPT4):
data = {
'school': ['Alabama', 'LSU', 'Ole Miss', 'Miss State',
'Auburn', 'Arkansas', 'Missouri', 'Vanderbilt',
'Tennessee', 'Florida', 'Georgia', 'Kentucky',
'S. Carolina', 'TAMU', 'Texas', 'Oklahoma'],
'latitude': [33.209, 30.412, 34.365, 33.456,
32.603, 36.068, 38.951, 36.162,
35.960, 29.651, 33.950, 38.049,
34.000, 30.620, 30.284, 35.222],
'longitude': [-87.538, -91.177, -89.526, -88.811,
-85.484, -94.172, -92.328, -86.784,
-83.920, -82.324, -83.377, -84.500,
-81.034, -96.340, -97.740, -97.445]
}

df = pd.DataFrame(data)

# Pick a school to plot the distance from.
# Use the same name as in the previous data dict:
SCHOOL = 'Texas'

# Set the grid resolution.
# Larger = higher res and smoother contours:
RESOLUTION = 500

# Get coordinates for SCHOOL:
school_index = df[df['school'] == SCHOOL].index[0]
school_coords = df.loc[school_index, ['latitude', 'longitude']].to_numpy()

# Create grid of points for interpolation:
x_min, x_max = df['longitude'].min(), df['longitude'].max()
y_min, y_max = df['latitude'].min(), df['latitude'].max()
xx, yy = np.meshgrid(np.linspace(x_min, x_max, RESOLUTION),
np.linspace(y_min, y_max, RESOLUTION))

# Calculate distances from SCHOOL to every point in grid:
distances = np.zeros(xx.shape)
for i in range(xx.shape[0]):
for j in range(xx.shape[1]):
point_coords = (yy[i, j], xx[i, j])
distances[i, j] = great_circle(school_coords, point_coords).miles

# Create the color-filled contour map:
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
contour = ax.contourf(xx, yy, distances,
cmap='coolwarm',
alpha=0.9)
cbar = fig.colorbar(contour, ax=ax, shrink=0.7)
cbar.set_label(f'Distance from {SCHOOL} (miles)')
ax.scatter(df['longitude'], df['latitude'], s=2, color='black')

# Load state boundaries from US Census Bureau:
url = 'https://www2.census.gov/geo/tiger/GENZ2021/shp/cb_2021_us_state_20m.zip'
states = gpd.read_file(url)

# Filter states within the map limits:
states = states.cx[x_min:x_max, y_min:y_max]

# Plot the state boundaries:
states.boundary.plot(ax=ax, linewidth=1, edgecolor='black')

# Add labels for the schools:
for i, school in enumerate(df['school']):
ax.annotate(
school,
(df['longitude'][i], df['latitude'][i]),
textcoords="offset points",
xytext=(2, 1),
ha='left',
fontsize=8
)

ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title(f'Distance from {SCHOOL} to Other SEC Schools')

# fig.savefig('distance_map.png', dpi=600)
plt.show()

And here’s the output, showing the distance from the University of Texas in Austin to the other SEC schools:

A colorful map of distances from the University of Texas in Austin.
Distance from University of Texas to other SEC schools (by author)

Importing Libraries

This project requires NumPy, Matplotlib, pandas, geopandas, geopy, and scipy. You can find installation instructions in the links.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from geopy.distance import great_circle

Loading Data

For the input data, I made a list of the schools and then had ChatGPT produce the dictionary with the lat-lon coordinates. The dictionary was then converted into a pandas DataFrame named df.

# SEC schools with coordinates (coords by ChatGPT4):
data = {
'school': ['Alabama', 'LSU', 'Ole Miss', 'Miss State',
'Auburn', 'Arkansas', 'Missouri', 'Vanderbilt',
'Tennessee', 'Florida', 'Georgia', 'Kentucky',
'S. Carolina', 'TAMU', 'Texas', 'Oklahoma'],
'latitude': [33.209, 30.412, 34.365, 33.456,
32.603, 36.068, 38.951, 36.162,
35.960, 29.651, 33.950, 38.049,
34.000, 30.620, 30.284, 35.222],
'longitude': [-87.538, -91.177, -89.526, -88.811,
-85.484, -94.172, -92.328, -86.784,
-83.920, -82.324, -83.377, -84.500,
-81.034, -96.340, -97.740, -97.445]
}

df = pd.DataFrame(data)

Assigning Constants

The code will produce a distance map from one of the listed SEC schools. We’ll assign the school’s name (typed exactly as it appears in the dictionary) to a constant named SCHOOL.

# Pick a school to plot the distance from. 
# Use the same name as in the data dict:
SCHOOL = 'Texas'

To control the “smoothness” of the contours, we’ll use a constant named RESOLUTION. The larger the number, the finer the underlying grid and thus the smoother the contours. Values around 500–1,000 produce good results.

# Set the grid resolution.
# Larger = higher res and smoother contours:
RESOLUTION = 500

Getting the School Location

Now to get the specified school’s map coordinates. In this case, the school will be the University of Texas in Austin, Texas.

# Get coordinates for SCHOOL:
school_index = df[df['school'] == SCHOOL].index[0]
school_coords = df.loc[school_index, ['latitude', 'longitude']].to_numpy()

The first line identifies the DataFrame index of the school specified by the SCHOOL constant. This index is then used to get the school’s coordinates. Because index returns a list of indices where the condition is true, we use [0] to get the first (presumably only) item in this list.

Next, we extract latitude and longitude values from the DataFrame and convert them into a NumPy array with the to_numpy() method.

If you’re unfamiliar with NumPy arrays, check out this article:

Creating the Grid

Before we make a contour map, we must build a regular grid and populate the grid nodes (intersections) with distance values. The following code creates the grid.

# Create grid of points for interpolation:
x_min, x_max = df['longitude'].min(), df['longitude'].max()
y_min, y_max = df['latitude'].min(), df['latitude'].max()
xx, yy = np.meshgrid(np.linspace(x_min, x_max, RESOLUTION),
np.linspace(y_min, y_max, RESOLUTION))

The first step here is to get the min and max values (x_min, x_max and y_min, y_max) of the longitude and latitude from the DataFrame.

Next, we use NumPy’s meshgrid() method to create a grid of points within the bounds defined by the min and max latitudes and longitudes.

Here’s how the grid looks for a resolution of 100:

A visualization of the nodes (points) on a 2D mesh.
The grid nodes of a grid created with resolution = 100 (by author)

Each node will hold a value that can be contoured.

Calculating Distances

The following code calculates concentric distances from the specified school.

# Calculate distances from SCHOOL to every point in grid:
distances = np.zeros(xx.shape)
for i in range(xx.shape[0]):
for j in range(xx.shape[1]):
point_coords = (yy[i, j], xx[i, j])
distances[i, j] = great_circle(school_coords, point_coords).miles

The first order of business is to initialize a NumPy array called distances. It has the same shape as thexx grid and is filled with zeroes. We’ll use it to store the calculated distances from SCHOOL.

Next, we loop over the rows of the grid, then, in a nested loop, iterate over the columns of the grid. With each iteration we retrieve the coordinates of the point at position (i, j) in the grid, with yy and xx holding the grid coordinates.

The final line calculates the great-circle distance (the distance between two points on a sphere) from the school to the current point coordinates (point_coords). The ultimate result is an array of distances with units in miles.

Creating the Map

Now that we have x, y, and distance data, we can contour the distance values and make a display.

# Create the color-filled contour map:
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
contour = ax.contourf(xx, yy, distances,
cmap='coolwarm',
alpha=0.9)
cbar = fig.colorbar(contour, ax=ax, shrink=0.7)
cbar.set_label(f'Distance from {SCHOOL} (miles)')
ax.scatter(df['longitude'], df['latitude'], s=2, color='black')

We start by setting up a Matplotlib figure of size 10 x 8. If you’re not familiar with the fig, ax terminology, check out this terrific article for a quick introduction:

To draw the color-filled contours we use Matplotlib’s contourf() method. It uses the xx, yy, and distancesvalues, the coolwarm colormap, and a slight amount of transparency (alpha=0.9).

The default color bar for the display is lacking, in my opinion, so we customize it somewhat. The fig.colorbar() method adds a color bar to the plot to indicate the distance scale. The shrink argument keeps the height of the color bar from being disproportionate to the plot.

Finally, we use Matplotlib’s scatter() method to add the school locations to the map, with a marker size of 2. Later, we’ll label these points with the school names.

Adding the State Boundaries

The map currently has only the school locations to use as landmarks. To make the map more relatable, the following code adds state boundaries.

# Load state boundaries from US Census Bureau:
url = 'https://www2.census.gov/geo/tiger/GENZ2021/shp/cb_2021_us_state_20m.zip'
states = gpd.read_file(url)

# Filter states within the map limits:
states = states.cx[x_min:x_max, y_min:y_max]

# Plot the state boundaries:
states.boundary.plot(ax=ax, linewidth=1, edgecolor='black')

The third line uses geopandas’ cx indexer method for spatial slicing. It filters geometries in a GeoDataFrame based on a bounding box defined by the minimum and maximum x (longitude) and y (latitude) coordinates. Here, we filter out all the states outside the bounding box.

Adding Labels and a Title

The following code finishes the plot by tying up a few loose ends, such as adding the school names to their map markers, labeling the x and y axes, and setting an updateable title.

# Add labels for the schools:
for i, school in enumerate(df['school']):
ax.annotate(
school,
(df['longitude'][i], df['latitude'][i]),
textcoords="offset points",
xytext=(2, 1),
ha='left',
fontsize=8
)

ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title(f'Distance from {SCHOOL} to Other SEC Schools')
fig.savefig('distance_map.png', dpi=600)
plt.show()

To label the schools, we use a for loop and enumeration to choose the correct coordinates and names for each school and use Matplotlib’s annotate() method to post them on the map. We use annotate() rather than the text() method to access the xytext argument, which lets us shift the label to where we want it.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button