Airbnb Listings in Boston and Seattle: How do they differ?
Unfortunately, I know the U.S. mostly from television. I had the opportunity to do a semester abroad in Southwest Florida, but I never found the time to visit cities outside the Sunshine State. The East and West Coasts are said to differ culturally, and I am determined to make another trip to the U.S. and visit both. To prepare for this trip, the Airbnb records for Boston and Seattle come in handy.
My first question was an obvious one: How do the two cities differ in price?
It would be straightforward to simply compare the averages of the two price columns. However, the mean is not always a useful measure; for example, it can be misleading when the data contain many outliers. I therefore merged the two data sets and displayed the price as a box plot, which shows the median, the spread, and the outliers in one compact figure.
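A minimal sketch of this step with pandas and matplotlib, assuming each city's listings are loaded as a DataFrame whose `price` column is stored as text such as `"$1,250.00"` (the helper names `to_float_price`, `merge_cities`, and `price_boxplot` are hypothetical). Stacking the two tables with a `city` label makes it easy to draw one box per city:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def to_float_price(prices: pd.Series) -> pd.Series:
    """Convert strings such as "$1,250.00" to floats (assumed raw format)."""
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)


def merge_cities(boston: pd.DataFrame, seattle: pd.DataFrame) -> pd.DataFrame:
    """Stack both listing tables and tag every row with its city."""
    merged = pd.concat(
        [boston.assign(city="Boston"), seattle.assign(city="Seattle")],
        ignore_index=True,
    )
    merged["price"] = to_float_price(merged["price"])
    return merged


def price_boxplot(merged: pd.DataFrame):
    """Draw one box per city; points beyond the whiskers are plotted as fliers."""
    cities = ["Boston", "Seattle"]
    data = [merged.loc[merged["city"] == c, "price"].to_numpy() for c in cities]
    fig, ax = plt.subplots()
    parts = ax.boxplot(data)  # dict of artists: boxes, whiskers, medians, fliers, ...
    ax.set_xticks([1, 2])
    ax.set_xticklabels(cities)
    ax.set_ylabel("Price per night (USD)")
    return fig, parts
```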
The graph shows that the data sets do indeed have many outliers. After removing these, it is more apparent that Airbnb prices in Boston are notably higher than in Seattle. This applies to the mean¹ (172.06 vs. 127.9) as well as to the median² (150 vs. 100).
¹The mean is the average value. It can be distorted by outliers, which is why it also makes sense to look at the median.
²The median is the value that lies exactly "in the middle" when the measured values are sorted by size. In general, the median divides a data set, sample, or distribution into two equal halves: the values in one half are no greater than the median, and the values in the other half are no smaller.
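The article does not say how the outliers were removed. One common choice, sketched here under that assumption, is the 1.5 × IQR rule that matplotlib also uses for its default whisker length: a price counts as an outlier if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile of its city. The function names and toy prices are made up for illustration:

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, col: str = "price",
                      by: str = "city", k: float = 1.5) -> pd.DataFrame:
    """Keep rows inside [Q1 - k*IQR, Q3 + k*IQR], computed separately per city."""
    grouped = df.groupby(by)[col]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median price per city."""
    return df.groupby("city")["price"].agg(["mean", "median"])
```

On toy Boston prices of 100, 110, 120, 130, and 1,000, the single extreme listing lifts the mean to 292 while the median stays at 120; after filtering, both are 115.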
While going through the columns, the ZIP code in particular caught my eye, and I wanted to compare the two cities by their average price per ZIP code.
The results also indicate that Boston is the more expensive city: among the 20 ZIP codes with the highest average price, only two are in Seattle.
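Averaging per ZIP code comes down to a single `groupby`. A sketch, assuming a merged DataFrame with `city`, `zipcode`, and numeric `price` columns (`top_zipcodes` is a hypothetical helper name):

```python
import pandas as pd


def top_zipcodes(merged: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Mean price per (city, ZIP code), keeping the n most expensive ZIP codes."""
    return (
        merged.dropna(subset=["zipcode"])
        .groupby(["city", "zipcode"], as_index=False)["price"]
        .mean()
        .nlargest(n, "price")
    )
```

Calling `top_zipcodes(merged)["city"].value_counts()` then gives the number of top-20 ZIP codes per city.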
After comparing the prices of the two cities, I wanted to answer the question: Which factors had a particularly strong influence on the price?
After cleaning the data, I manually selected the columns that might be useful as predictors of the price. For example, there were several review-related columns (e.g., reviews_per_month, review_scores_rating, review_scores_cleanliness), and it stood to reason that these might have an impact on the price. Other interesting columns were amenities and zipcode.
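The article does not describe the exact preprocessing, so the following is only one plausible way to turn the chosen columns into numeric features. It assumes that missing numeric values are filled with the column mean, that `amenities` is stored as a string such as `{TV,Wifi,"Air Conditioning"}` and reduced to a simple count, and that `zipcode` is one-hot encoded. The column list and helper names are hypothetical:

```python
import pandas as pd

# Hypothetical selection; the article names some of these columns, not the full list.
NUMERIC_COLS = ["reviews_per_month", "review_scores_rating",
                "review_scores_cleanliness", "bedrooms", "beds", "guests_included"]


def count_amenities(raw: pd.Series) -> pd.Series:
    """Count entries in strings like '{TV,Wifi,"Air Conditioning"}' (assumed format)."""
    items = raw.fillna("").str.strip("{}").str.split(",")
    return items.apply(lambda parts: sum(1 for p in parts if p.strip()))


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Numeric columns with mean imputation, an amenity count, and ZIP dummies."""
    X = df[NUMERIC_COLS].astype(float)
    X = X.fillna(X.mean())
    X["n_amenities"] = count_amenities(df["amenities"])
    zips = pd.get_dummies(df["zipcode"], prefix="zip", dtype=int)
    return pd.concat([X, zips], axis=1)
```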
I then ran a linear regression to see how well a model using my chosen features could predict the price. I chose the R2-Score³ and the root mean squared error (RMSE)⁴ as measures. The first calculation showed the following results:
- Boston: R2-Score = 0.62, RMSE = 65.14
- Seattle: R2-Score = 0.57, RMSE = 44.15
³The R2-Score indicates how much of the variance ("scatter") in the price a linear regression model can "explain". A value of 1 means a perfect fit, so the closer the value is to 1, the better.
⁴The RMSE is the square root of the mean of the squared errors, i.e., of the squared differences between predicted and actual prices. It is a very common general-purpose error metric for numerical predictions and has the same unit as the target, here US dollars. The lower the value, the better.
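Both metrics are short enough to write out by hand. The sketch below fits scikit-learn's `LinearRegression` on a train/test split and scores it with hand-written R² and RMSE functions; the 30 % test share, the random seed, and the helper names are assumptions, not details from the article. R² is computed as 1 - SS_res / SS_tot, the model's squared error divided by the squared error of always predicting the mean, subtracted from one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred) -> float:
    """Square root of the mean of the squared errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: share of the variance around the mean that is explained."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


def scores_match_sklearn(y_true, y_pred) -> bool:
    """Sanity check: the hand-written metrics agree with scikit-learn's."""
    return bool(np.isclose(r2(y_true, y_pred), r2_score(y_true, y_pred))
                and np.isclose(rmse(y_true, y_pred),
                               np.sqrt(mean_squared_error(y_true, y_pred))))


def fit_and_score(X, y, test_size=0.3, seed=42):
    """Fit on the training split and report (R2, RMSE) on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    return r2(y_test, pred), rmse(y_test, pred)
```

Scoring on the held-out split rather than the training data keeps the numbers from being inflated by overfitting.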
For data sets this small (about 3,500 to 3,800 listings per city) and a plain multiple linear regression, these values are acceptable. Still, I wanted to know whether further feature engineering could improve them, so I ran a correlation analysis between each feature and the price.
For each city, I selected the features whose correlation coefficient⁵ with the price was at least 0.1 in absolute value (positive or negative) and plotted the top 10 in each direction (for Seattle, only 7 features fell into the "negative correlation" category).
⁵The correlation coefficient takes values between -1 and 1. A value of 1 means a perfect positive relationship between two variables, here a feature and the price: when one increases, the other increases as well. A value of -1 means a perfect negative relationship, and a value near 0 means there is hardly any (linear) relationship.
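A sketch of the correlation filter, assuming a numeric feature DataFrame `X` and a `price` Series sharing the same index. `DataFrame.corrwith` computes the Pearson coefficient by default; that the article used Pearson is an assumption, and `price_correlations` is a hypothetical name:

```python
import pandas as pd


def price_correlations(X: pd.DataFrame, price: pd.Series,
                       threshold: float = 0.1, top: int = 10):
    """Correlate every feature with the price and split the result by sign."""
    corr = X.corrwith(price)  # Pearson by default; constant columns give NaN
    positive = corr[corr >= threshold].sort_values(ascending=False).head(top)
    negative = corr[corr <= -threshold].sort_values().head(top)
    selected = corr[corr.abs() >= threshold].index.tolist()
    return positive, negative, selected
```

`positive` and `negative` are the series to plot, and `selected` is the list of columns to keep for a refit.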
For both cities, features such as beds, bedrooms, and guests included appear to be positively correlated with the price. Reviews per month, on the other hand, correlate negatively with the price in both cities. This could be because people are more inclined to write a review when something does not meet their expectations.
I then kept only the features with an absolute correlation coefficient of at least 0.1, removed all others from the data set, and recalculated the R2-Score and RMSE. These were the results:
- Boston: R2-Score = 0.59, RMSE = 67.57
- Seattle: R2-Score = 0.57, RMSE = 45.81
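As can be seen, both measures got slightly worse. This may be because a feature that barely correlates with the price on its own can still improve the model in combination with other features, and a pairwise correlation analysis against the price cannot detect such relationships between features.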
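The following toy experiment on made-up synthetic data illustrates this effect. The target `y` depends on the difference `x1 - x2`; because both columns share a large common component, each correlates only weakly with `y` on its own, so the 0.1 filter drops both and the filtered model loses most of its explanatory power. For simplicity, R² is computed on the training data here, and all names are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def make_synthetic(n: int = 5000, seed: int = 0):
    """Hypothetical data: y depends on x1 - x2, which neither column reveals alone."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0, 30, n)  # large component shared by x1 and x2
    u = rng.normal(0, 1, n)   # small component that actually drives y
    w = rng.normal(0, 1, n)   # an ordinary, directly correlated feature
    X = pd.DataFrame({"x1": z + u, "x2": z, "x3": w})
    y = pd.Series(u + 0.5 * w + rng.normal(0, 0.2, n), name="y")
    return X, y


def compare(threshold: float = 0.1):
    """Return the kept columns and the in-sample R2 of the full and filtered models."""
    X, y = make_synthetic()
    corr = X.corrwith(y)
    selected = corr[corr.abs() >= threshold].index.tolist()
    r2_full = LinearRegression().fit(X, y).score(X, y)
    r2_filtered = LinearRegression().fit(X[selected], y).score(X[selected], y)
    return selected, r2_full, r2_filtered
```

Here the filter keeps only `x3`, and the model's R² drops sharply even though `x1` and `x2` together carry most of the signal.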
However, I also noticed several ZIP codes among the features that correlate with the price. This brings me to my last question: Are there certain areas in the cities that drive Airbnb prices up?
To answer this, I first plotted the prices per listing on the respective city maps. However, I could not identify any areas for either city that stood out. Therefore, I added another column to my data set, indicating whether each listing was within a ZIP code area that correlated negatively, positively, or not at all with the price. I then plotted the listings on the map again, but this time colored by correlation. Below are the results.
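A sketch of the categorization and the map, assuming a DataFrame with `zipcode`, numeric `price`, `latitude`, and `longitude` columns. Each ZIP code's one-hot dummy is correlated with the price; the 0.1 cutoff is borrowed from the feature selection as an assumption, since the article does not state the threshold used here. Function names and colors are hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical color scheme for the three categories.
COLORS = {"positive": "tab:red", "negative": "tab:blue", "none": "lightgrey"}


def zip_correlation_category(df: pd.DataFrame, threshold: float = 0.1) -> pd.Series:
    """Label each listing by how its ZIP code's one-hot dummy correlates with price."""
    dummies = pd.get_dummies(df["zipcode"], dtype=float)
    corr = dummies.corrwith(df["price"])  # one coefficient per ZIP code
    label = pd.Series("none", index=corr.index)
    label[corr >= threshold] = "positive"
    label[corr <= -threshold] = "negative"
    return df["zipcode"].map(label).fillna("none")


def plot_listings(df: pd.DataFrame, ax=None):
    """Scatter listings by longitude/latitude, colored by the zip_corr column."""
    if ax is None:
        _, ax = plt.subplots()
    for category, color in COLORS.items():
        part = df[df["zip_corr"] == category]
        ax.scatter(part["longitude"], part["latitude"], s=4, c=color, label=category)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend(title="ZIP code vs. price")
    return ax
```

Usage: `df["zip_corr"] = zip_correlation_category(df)`, then `plot_listings(df)` for each city.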
In Boston, the picture was a bit clearer. The price map alone showed a large supply of listings in the city center. In particular, locations in the West End and South End neighborhoods seem to lead to higher prices. Airbnb listings in the Aberdeen, Brighton, Rochester, and Allston areas, on the other hand, seem to be more affordable.
In Seattle, I could not identify any ZIP code area with a negative correlation. However, listings in the center tend to have relatively high prices. This makes sense, since many tourist attractions such as the Seattle Aquarium and the Space Needle are located there.
That’s it! Thank you for reading my article. I had a lot of fun working with geo data and visualizing my results.
You can find all the code on my GitHub.
Skyline of Seattle © Daniel Schwen