Problem 2: Geospatial location of COVID-19 confirmed cases

This problem is intended for educational purposes only; it is not a realistic analysis of confirmed COVID-19 cases.

Part a (10 points): Get the dataset and extract mainland US cases

  • Go to your host operating system (git is deliberately not installed in the CompPhys docker).
  • Find the directory you cloned the midterm into.
  • Clone the data repository by executing: git clone https://github.com/CSSEGISandData/COVID-19.git
  • If the clone succeeds, the next cell prints the first few lines of the CSV file.
  • Extract the data for the mainland US (latitude between 25 and 50 degrees, longitude between -130 and -70 degrees) into a numpy array with columns (latitude, longitude, number of confirmed cases) and one row per entry in the text file. Use the number of confirmed cases reported on 3/18/20 (18-March-2020).

Hint: Using genfromtxt will not work. You will have to extract this by hand. The problem is that there are quotation marks in various entries (either in the country or the state) that you will have to work around.
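To illustrate the quoting problem mentioned in the hint, here is a minimal sketch using Python's standard csv module, which is one way to handle quoted fields. The rows below are made up, and the header names are an assumption about the file layout; only the position of the first four columns and the date column matter.

```python
import csv
import io

# Made-up rows in an assumed layout (state, country, lat, lon, dates...); not real data.
sample = io.StringIO(
    'Province/State,Country/Region,Lat,Long,3/17/20,3/18/20\n'
    '"City A, ST",US,40.0,-100.0,3,5\n'
    ',"Country, B",10.0,20.0,7,9\n'
)

rows = list(csv.reader(sample))
header, records = rows[0], rows[1:]
col = header.index('3/18/20')          # column holding the chosen date

# Naive splitting on commas breaks the quoted field into two pieces...
naive = '"City A, ST",US,40.0,-100.0,3,5'.split(',')

# ...while csv.reader keeps each quoted field together and strips the quotes.
parsed = [(r[0], r[1], float(r[2]), float(r[3]), float(r[col])) for r in records]

print(len(naive), len(records[0]))
print(parsed)
```

The naive split yields one field too many, which is why a plain delimiter-based reader misaligns the columns.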

In [1]:
! head -n 4 COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv



In [2]:
import numpy as np
import math
import copy
import matplotlib.pyplot as plt

Read into numpy array

In [3]:
s = 'COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv'

Solution:

In [4]:
us_coords = [[25,-130],[50,-70]]
In [5]:
import csv

date_to_use = '3/18/20'

states = []
countries = []
lats = []
lons = []
nconfirmed = []
# csv.reader copes with the quoted fields (which may contain commas)
with open(s, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    header = next(reader)
    index_to_use = header.index(date_to_use)   # column for the chosen date
    for r in reader:
        state, country, lat, lon = r[0:4]
        states.append(state)
        countries.append(country)
        lats.append(float(lat))
        lons.append(float(lon))
        nconfirmed.append(float(r[index_to_use]))
data = np.array(list(zip(lats, lons, nconfirmed)))
states = np.array(states)
countries = np.array(countries)
usdata = data[countries == 'US']
# Keep only rows strictly inside the mainland-US bounding box
(lat_min, lon_min), (lat_max, lon_max) = us_coords
selected = ((usdata[:,0] > lat_min) & (usdata[:,0] < lat_max) &
            (usdata[:,1] > lon_min) & (usdata[:,1] < lon_max))
mainland_us_data = usdata[selected]
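The bounding-box selection can be checked in isolation on a few made-up rows. This is only a sketch: the helper name in_box and the toy coordinates are hypothetical, and the default ranges are the ones stated in Part a.

```python
import numpy as np

def in_box(data, lat_range=(25.0, 50.0), lon_range=(-130.0, -70.0)):
    """Boolean mask of rows whose (lat, lon) lie strictly inside the box."""
    lat, lon = data[:, 0], data[:, 1]
    # Each comparison needs its own parentheses: & binds tighter than < and >.
    return ((lat > lat_range[0]) & (lat < lat_range[1]) &
            (lon > lon_range[0]) & (lon < lon_range[1]))

# Made-up (lat, lon, count) rows; not real data.
toy = np.array([
    [40.0, -100.0, 5.0],   # inside the box
    [21.3, -157.8, 2.0],   # too far south and west
    [61.2, -149.9, 1.0],   # too far north and west
    [35.0,  -80.0, 7.0],   # inside the box
])

mask = in_box(toy)
selected_rows = toy[mask]
print(mask)
print(selected_rows)
```

Because the mask indexes whole rows, the case count in the third column stays attached to its coordinates.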

Plot the mainland US data

In [6]:
## Solution: 

plt.scatter(mainland_us_data[:,1], mainland_us_data[:,0])
Out[6]:
<matplotlib.collections.PathCollection at 0x7f9a60336290>

Part b (15 points): Make a Voronoi diagram

  • Make a Voronoi diagram of the data above.
  • Assume there are 10 Voronoi cells. Assign each a different color.
  • Initialize the centroids to 10 randomly chosen points from the data sample.
  • Compute the centroids that minimize the $k$-means objective, i.e. the sum of squared distances from each data point to its nearest centroid.
  • Plot the centroids as black circles.
  • Plot the individual data points, colored by the centroid they are assigned to.
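For reference, the quantity being minimized can be written out explicitly. This is the standard $k$-means formulation, written under the assumption (also made by the solution code) that latitude and longitude are treated as plain Euclidean coordinates. The symbols $x_n$ (data point), $\mu_k$ (centroid), $c_n$ (index of the centroid assigned to point $n$) and $S_k$ (set of points assigned to centroid $k$) are notation introduced here.

$$
J = \sum_{n=1}^{N} \lVert x_n - \mu_{c_n} \rVert^2,
\qquad
c_n = \arg\min_{k} \lVert x_n - \mu_k \rVert,
\qquad
\mu_k = \frac{1}{|S_k|} \sum_{n \in S_k} x_n,
\quad
S_k = \{\, n : c_n = k \,\}
$$

The iteration alternates the assignment step (recompute every $c_n$) and the update step (recompute every $\mu_k$) until the centroids stop moving. Neither step can increase $J$, and the cells $S_k$ are the Voronoi cells of the final centroids restricted to the data.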

Initialize centroids

In [7]:
ncentroids = 10
eps = 1e-3
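# Pick ncentroids distinct rows at random. Indexing, rather than shuffling the
# lat/lon columns in place, keeps each case count attached to its coordinates.
init = np.random.choice(mainland_us_data.shape[0], size=ncentroids, replace=False)
centroids = mainland_us_data[init, 0:2].copy()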
deltamax = 10000.
In [8]:
plt.scatter(mainland_us_data[:,1], mainland_us_data[:,0])
plt.scatter(centroids[:,1], centroids[:,0], marker='*')
plt.show()

Make an index grid pairing every data point with every centroid

In [9]:
points = mainland_us_data
ii = np.arange(points.shape[0])
jj = np.arange(ncentroids)
i,j = np.meshgrid(ii,jj)
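The shapes produced by this index grid are easy to lose track of, so here is a small sketch on toy sizes. The random values are stand-ins for the real data, and the broadcasting form at the end is an alternative shown for comparison, not the method used in the solution.

```python
import numpy as np

npoints, ncentroids = 5, 3                      # toy sizes
rng = np.random.default_rng(0)
points = rng.random((npoints, 3))               # stand-in for (lat, lon, count)
centroids = points[:ncentroids, 0:2].copy()     # first 3 points act as centroids

i, j = np.meshgrid(np.arange(npoints), np.arange(ncentroids))
# i and j both have shape (ncentroids, npoints):
# i[r, c] is the point index c, j[r, c] is the centroid index r.

diffs = points[i, 0:2] - centroids[j]           # shape (ncentroids, npoints, 2)
distances = np.linalg.norm(diffs, axis=2)       # shape (ncentroids, npoints)

# The same distance table via broadcasting, without explicit index arrays
distances_bc = np.linalg.norm(
    points[None, :, 0:2] - centroids[:, None, :], axis=2
)

print(i.shape, diffs.shape, distances.shape)
```

Taking argmin over axis 0 of this table then gives, for each point, the index of its nearest centroid.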

Run the k-means minimization

In [10]:
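while deltamax > eps:
    old_centroids = copy.copy(centroids)
    # Offset of every point from every centroid: shape (ncentroids, npoints, 2)
    deltavals = points[i,0:2] - centroids[j]
    distances = np.linalg.norm( deltavals, axis=2 )
    # Assign each point to its nearest centroid
    closest_centroid = np.argmin(distances, axis=0)
    # Move each centroid to the mean of the points assigned to it
    centroids = np.array([(points[closest_centroid==k,0:2]).mean(axis=0) for k in range(ncentroids)])
    # Largest absolute shift of any centroid coordinate
    # (without abs, large negative shifts would be ignored)
    deltamax = np.max( np.abs(old_centroids - centroids) )
    print(deltamax)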
2.591535294117648
3.0881029411764445
1.5586845588235292
0.30685931372548936
0.17923083778966387
0.1639445075757493
0.23169783549782608
0.15634453781512292
0.15918517786560926
0.5352110599078372
0.32759608294930587
0.098246640316205
0.07375714285714707
0.0
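The loop above can also be packaged as a reusable function. The following is a sketch rather than the course's reference solution: the name kmeans, the max_iter safety cap, and the guard that leaves an empty cluster's centroid where it is are all additions made here, and the two synthetic blobs are made-up test data.

```python
import numpy as np

def kmeans(xy, k, eps=1e-3, max_iter=100, seed=None):
    """Plain k-means on an (N, 2) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = xy[rng.choice(len(xy), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Distance table of shape (k, N), built by broadcasting
        dist = np.linalg.norm(xy[None, :, :] - centroids[:, None, :], axis=2)
        labels = np.argmin(dist, axis=0)
        new = centroids.copy()
        for c in range(k):
            members = xy[labels == c]
            if len(members) > 0:          # leave an empty cluster's centroid unchanged
                new[c] = members.mean(axis=0)
        shift = np.max(np.abs(new - centroids))
        centroids = new
        if shift < eps:                   # centroids have (almost) stopped moving
            break
    return centroids, labels

# Two well-separated synthetic blobs (made-up data)
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(30.0, -120.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(45.0, -80.0), scale=0.5, size=(50, 2))
xy = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(xy, k=2, seed=0)
print(centroids)
```

With the real data this would be called as kmeans(mainland_us_data[:, 0:2], 10). Like the loop above, the result depends on the random initialization, so different seeds can give different cells.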
In [11]:
colors = np.array(['r', 'g', 'b', 'y', 'c', 'm', 'darkviolet', 'brown', 'teal', 'sandybrown'])
plt.scatter( points[:,1], points[:,0], c = colors[closest_centroid])
plt.scatter( centroids[:,1], centroids[:,0], c = 'k', s=200, marker='o')
Out[11]:
<matplotlib.collections.PathCollection at 0x7f9a60321990>
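The scatter plot colors the data points by cell but does not draw the cells themselves. One possible way to visualize them, sketched here under assumptions, is to label every node of a regular latitude/longitude grid with its nearest centroid. The function name voronoi_labels, the grid resolution, and the two toy centroids are hypothetical; the default ranges are the bounding box from Part a.

```python
import numpy as np

def voronoi_labels(centroids, lat_range=(25.0, 50.0),
                   lon_range=(-130.0, -70.0), n=200):
    """Label each node of an n x n (lat, lon) grid with its nearest centroid."""
    lat = np.linspace(lat_range[0], lat_range[1], n)
    lon = np.linspace(lon_range[0], lon_range[1], n)
    lon_grid, lat_grid = np.meshgrid(lon, lat)          # both of shape (n, n)
    grid = np.stack([lat_grid, lon_grid], axis=-1)      # (n, n, 2), ordered (lat, lon)
    # Distance from every grid node to every centroid: shape (n, n, ncentroids)
    dist = np.linalg.norm(
        grid[:, :, None, :] - centroids[None, None, :, :], axis=3
    )
    return lat, lon, np.argmin(dist, axis=2)

# Toy centroids in (lat, lon) order; not the fitted ones
toy_centroids = np.array([[30.0, -120.0], [45.0, -80.0]])
lat, lon, cells = voronoi_labels(toy_centroids)
print(cells.shape, np.unique(cells))
```

The returned cells array has latitude along its rows and longitude along its columns, so it could be drawn behind the scatter plot with something like plt.pcolormesh(lon, lat, cells, shading='auto').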