The Battle of Neighborhoods — Open a Bubble Tea Shop in Manhattan
Coursera - IBM Applied Data Science Capstone Project
Introduction and Business Problem Statement
New York City is the most populous city in the United States, and Manhattan is its most densely populated borough. As a part of the Coursera Applied Data Science Capstone Project by IBM, we are going to examine the Manhattan drinks venues dataset and determine the optimal location to open a bubble tea shop.
Our business stakeholders are interested in opening a bubble tea shop in Manhattan but are also worried about the current pandemic. They have therefore asked us to research the market and suggest where in Manhattan to open. According to a Yelp report published in September, there were 32,109 closures as of August 31, with 19,590 restaurants across the nation permanently shuttered since March. Yet new restaurants are still opening their doors despite the pandemic. Many studies have found that restaurants that work well for delivery and takeout, including food trucks, bakeries and coffee shops, have been able to keep their closure rates lower than others. We think a bubble tea shop offering to-go and delivery service should operate well under COVID.
The aim of this project is to provide an optimal location to open a bubble tea shop in New York City under COVID. In this report, we will focus on all neighborhoods in Manhattan.
- NYC Boroughs/Neighborhood Geospatial Dataset
- Foursquare venue data through the Foursquare API
Let’s get a brief overview of the structure of New York City.
Methodology
1. Downloaded the NYC borough and neighborhood coordinates data as a JSON file.
2. Added all Manhattan neighborhoods to generate the map.
3. Retrieved venue data from the Foursquare API for the Tea Shop, Coffee Shop and Bubble Tea Shop categories.
4. Cleaned the venue data to include only the top five venue categories from the above selection.
5. Generated a map with the top five venues.
6. Performed one-hot encoding on the data.
7. Ran the k-means clustering algorithm to partition the neighborhoods into six clusters.
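The last two steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the `venues` table, its column names ("Neighborhood", "Venue Category"), and the helper `cluster_neighborhoods` are assumptions made for the example, and the toy data uses two clusters rather than the six used in this report.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample: one row per venue (NOT the real Foursquare results).
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"],
    "Venue Category": [
        "Coffee Shop", "Coffee Shop", "Tea Room",
        "Bubble Tea Shop", "Bubble Tea Shop",
        "Coffee Shop", "Café", "Coffee Shop",
        "Bubble Tea Shop", "Tea Room",
    ],
})


def cluster_neighborhoods(venues, n_clusters, random_state=0):
    """One-hot encode venue categories, average per neighborhood, run k-means."""
    # One column per venue category, 1.0 where the venue has that category.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot.insert(0, "Neighborhood", venues["Neighborhood"])

    # The mean of the one-hot rows gives each category's share per neighborhood.
    grouped = onehot.groupby("Neighborhood").mean()

    # Partition the neighborhoods by their category shares.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(grouped)

    result = grouped.copy()
    result.insert(0, "Cluster", labels)
    return result


clustered = cluster_neighborhoods(venues, n_clusters=2)
print(clustered)
```

Averaging the one-hot rows per neighborhood means k-means compares neighborhoods by the mix of venue types rather than by the raw number of venues, so a small neighborhood and a large one with the same mix land close together.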
Results and Discussion
Each cluster can now be examined and the discriminating venue categories that distinguish each cluster can be determined.
Cluster One
Cluster Two
Cluster Three
Cluster Four
Cluster Five
Cluster Six
After reviewing the data of each cluster, below are the findings:
In Clusters 1, 4 and 5, the top three most common venues do not include a bubble tea shop, so opening a bubble tea shop in these clusters is a possibility.
In Clusters 2, 3 and 6, bubble tea shops appear among the top three most common venues, so be careful if you intend to open one in these neighborhoods.
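One way to run this kind of check is sketched below. It assumes a table of per-neighborhood category frequencies with a cluster label, and it ranks categories by their mean frequency within each cluster; that ranking rule, the `freq` table, and the helper `top_categories_per_cluster` are illustrative assumptions, not necessarily how the figures in this report were produced.

```python
import pandas as pd

# Hypothetical per-neighborhood category frequencies with cluster labels
# (NOT the project's real results).
freq = pd.DataFrame(
    {
        "Cluster": [0, 0, 1, 1],
        "Bubble Tea Shop": [0.0, 0.1, 0.5, 0.4],
        "Coffee Shop": [0.6, 0.5, 0.3, 0.2],
        "Café": [0.3, 0.2, 0.1, 0.1],
        "Tea Room": [0.1, 0.2, 0.1, 0.3],
    },
    index=["N1", "N2", "N3", "N4"],
)


def top_categories_per_cluster(freq, top_n=3):
    """Return the top_n categories by mean frequency for each cluster."""
    means = freq.groupby("Cluster").mean()
    return {
        cluster: row.nlargest(top_n).index.tolist()
        for cluster, row in means.iterrows()
    }


top = top_categories_per_cluster(freq)

# Clusters whose top three categories do not include a bubble tea shop.
candidates = [c for c, cats in top.items() if "Bubble Tea Shop" not in cats]
print(top)
print(candidates)
```

A cluster that shows up in `candidates` has little visible bubble tea competition among its most common drink venues, which is the criterion applied in the findings above.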
In this Applied Data Science Capstone Project, we applied the data manipulation techniques learned in Data Analytics with Python using the pandas library. We also used the k-means clustering algorithm from scikit-learn, taught in Machine Learning with Python, to cluster neighborhoods based on the types of drink venues they contain.