The Battle of Neighborhoods — Open a Bubble Tea Shop in Manhattan

Claire Zhang
Nov 3, 2020

Coursera - IBM Applied Data Science Capstone Project

Introduction and Business Problem Statement

Manhattan is the most densely populated of New York City's five boroughs, and New York City is the most populous city in the United States. As part of the IBM Applied Data Science Capstone Project on Coursera, we examine the drink venues in Manhattan to determine the optimal location to open a bubble tea shop.

Our business stakeholders are interested in opening a bubble tea shop in Manhattan but are also worried about the current pandemic, so they have asked us to research the market and suggest where in Manhattan to open. According to Yelp's September report, there were 32,109 restaurant closures across the nation as of August 31, and 19,590 of those restaurants have permanently shut their doors since March. Yet new restaurants are still opening despite the pandemic. Many studies have found that businesses that work well for delivery and takeout, including food trucks, bakeries and coffee shops, have kept their closure rates lower than others. We therefore expect a bubble tea shop offering to-go and delivery service to operate well during COVID-19.

The aim of this project is to recommend an optimal location for a bubble tea shop in New York City during the COVID-19 pandemic. In this report, we focus on all neighborhoods in Manhattan, using two data sources:

  1. NYC Boroughs/Neighborhood Geospatial Dataset
  2. Foursquare venue data through the Foursquare API
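To make the first data source concrete, here is a minimal sketch that flattens a neighborhood GeoJSON file into a table and keeps only Manhattan. It assumes the file is a GeoJSON FeatureCollection of point features whose properties include `borough` and `name`, which is the layout of the NYC neighborhood dataset commonly used in this capstone. The function name `manhattan_neighborhoods` and the sample coordinates are hypothetical and only illustrative.

```python
import pandas as pd


def manhattan_neighborhoods(geojson: dict) -> pd.DataFrame:
    """Flatten a neighborhood FeatureCollection (assumed layout) into a
    table of Manhattan neighborhoods with their point coordinates."""
    rows = []
    for feature in geojson["features"]:
        props = feature["properties"]
        # GeoJSON stores point coordinates as [longitude, latitude]
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": props["borough"],
            "Neighborhood": props["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    df = pd.DataFrame(rows)
    # Keep only the borough this study focuses on
    return df[df["Borough"] == "Manhattan"].reset_index(drop=True)


# Hypothetical sample with rounded, illustrative coordinates
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"borough": "Manhattan", "name": "Marble Hill"},
         "geometry": {"type": "Point", "coordinates": [-73.91, 40.88]}},
        {"type": "Feature",
         "properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"type": "Point", "coordinates": [-73.85, 40.89]}},
        {"type": "Feature",
         "properties": {"borough": "Manhattan", "name": "Chinatown"},
         "geometry": {"type": "Point", "coordinates": [-73.99, 40.72]}},
    ],
}

print(manhattan_neighborhoods(sample))
```

With a real file, the dictionary would come from `json.load` on the downloaded JSON instead of the inline sample.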

Let’s get a brief overview of the structure of New York City.

Methodology

  1. Downloaded the NYC borough and neighborhood coordinates dataset as a JSON file.
  2. Plotted all Manhattan neighborhoods on a map.
  3. Retrieved venue data from the Foursquare API and kept the Tea Shop, Coffee Shop and Bubble Tea Shop categories.
  4. Cleaned the venue data to include only the top five venue categories among the venues retrieved above.
  5. Generated a map of the venues in those top five categories.
  6. Performed one-hot encoding on the venue category data.
  7. Ran the k-means clustering algorithm to partition the neighborhoods into six clusters.
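Steps 4, 6 and 7 can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the author's exact code: it assumes the Foursquare results have already been flattened into a table with `Neighborhood` and `Venue Category` columns, and the function name `cluster_neighborhoods` and the toy venue rows are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(venues: pd.DataFrame, k: int = 6, top_n: int = 5) -> pd.DataFrame:
    """Cluster neighborhoods by the mix of drink-venue categories they contain."""
    # Step 4: keep only venues in the top_n most frequent categories
    top_categories = venues["Venue Category"].value_counts().head(top_n).index
    filtered = venues[venues["Venue Category"].isin(top_categories)]

    # Step 6: one-hot encode the categories, then average per neighborhood so
    # each row holds the share of that neighborhood's venues in each category
    onehot = pd.get_dummies(filtered["Venue Category"], dtype=float)
    onehot.insert(0, "Neighborhood", filtered["Neighborhood"].to_numpy())
    grouped = onehot.groupby("Neighborhood").mean()

    # Step 7: partition the neighborhoods into k clusters with k-means
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = kmeans.fit_predict(grouped.to_numpy())
    return grouped.assign(Cluster=labels)


# Hypothetical venue rows standing in for the Foursquare results
rows = (
    [("Chinatown", "Bubble Tea Shop")] * 3 + [("Chinatown", "Café")]
    + [("Midtown", "Coffee Shop")] * 3 + [("Midtown", "Café")]
    + [("SoHo", "Café")] * 2 + [("SoHo", "Tea Room")]
    + [("Harlem", "Coffee Shop"), ("Harlem", "Juice Bar")]
    + [("Tribeca", "Tea Room")] * 2 + [("Tribeca", "Bakery")]
    + [("Chelsea", "Café"), ("Chelsea", "Coffee Shop")]
    + [("East Village", "Bubble Tea Shop"), ("East Village", "Tea Room")]
    + [("Inwood", "Juice Bar")] * 2
)
venues = pd.DataFrame(rows, columns=["Neighborhood", "Venue Category"])
result = cluster_neighborhoods(venues)
print(result)
```

Averaging the one-hot rows turns raw venue counts into per-neighborhood shares, so a neighborhood with many venues does not dominate the distance calculation. One side effect of filtering first is that a neighborhood whose venues all fall outside the top five categories drops out of the table entirely.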

Results and Discussion

Each cluster can now be examined and the discriminating venue categories that distinguish each cluster can be determined.
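One possible way to surface those distinguishing categories, sketched here as an assumption rather than the author's exact procedure, is to average each category's share over the neighborhoods in a cluster and rank the categories by that mean. The input layout (one row per neighborhood, one column per category share, plus a `Cluster` column), the function name `top_venues_per_cluster` and the numbers below are all hypothetical.

```python
import pandas as pd


def top_venues_per_cluster(grouped: pd.DataFrame, n: int = 3) -> dict:
    """Map each cluster label to its n highest-share venue categories."""
    # Mean share of each category across the neighborhoods in a cluster
    cluster_means = grouped.groupby("Cluster").mean()
    return {
        cluster: row.sort_values(ascending=False).index[:n].tolist()
        for cluster, row in cluster_means.iterrows()
    }


# Hypothetical category shares for four neighborhoods already assigned to clusters
grouped = pd.DataFrame(
    {
        "Café": [0.5, 0.7, 0.1, 0.0],
        "Coffee Shop": [0.5, 0.1, 0.2, 0.1],
        "Bubble Tea Shop": [0.0, 0.0, 0.6, 0.5],
        "Tea Room": [0.0, 0.2, 0.1, 0.4],
        "Cluster": [0, 0, 1, 1],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C", "Neighborhood D"],
)
print(top_venues_per_cluster(grouped))
```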

Cluster One

Café is the most common venue in cluster one

Cluster Two

Coffee Shop is the most common venue in cluster two

Cluster Three

Tea Room and Café are the most common venues in cluster three

Cluster Four

Coffee Shop is the most common venue in cluster four

Cluster Five

Coffee Shop and Café are the most common venues in cluster five

Cluster Six

Café is the most common venue in cluster six

After reviewing the data for each cluster, we found the following:

In clusters 1, 4 and 5, Bubble Tea Shop does not appear among the top three most common venues, so these clusters are candidate locations for a new bubble tea shop.

In clusters 2, 3 and 6, Bubble Tea Shop is among the top three most common venues, so stakeholders should be careful about opening a shop in these neighborhoods, where bubble tea shops are already common.
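The screening rule above is simple enough to express in a few lines. This sketch assumes each cluster's top three venue categories are already available as a list; the function name `screen_clusters` and the example lists are hypothetical and are not the project's actual results.

```python
def screen_clusters(top_venues: dict, target: str = "Bubble Tea Shop") -> tuple:
    """Split cluster labels by whether `target` is among their top venues."""
    # Clusters without the target in their top venues are candidate locations
    candidates = sorted(c for c, venues in top_venues.items() if target not in venues)
    # Clusters where the target already ranks highly call for caution
    crowded = sorted(c for c, venues in top_venues.items() if target in venues)
    return candidates, crowded


# Hypothetical top-three lists, not the project's actual results
example = {
    1: ["Café", "Coffee Shop", "Tea Room"],
    2: ["Coffee Shop", "Bubble Tea Shop", "Café"],
    3: ["Tea Room", "Café", "Bubble Tea Shop"],
}
candidates, crowded = screen_clusters(example)
print("Candidate clusters:", candidates)
print("Clusters needing caution:", crowded)
```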

In this Applied Data Science Capstone Project, we applied various data manipulation techniques learned in Data Analysis with Python using the pandas library. We also used the k-means clustering algorithm from scikit-learn, taught in Machine Learning with Python, to cluster the Manhattan neighborhoods based on their drink venues.

Written by Claire Zhang

I’m a dataviz enthusiast with a curiosity for solving puzzles with data and a passion for design.
