In the field of data analytics, the ability to extract meaningful insights from datasets is an invaluable skill. As I delve deeper into SQL, I appreciate the elegance of joins and their profound impact on data analysis.
Let’s take a simple yet intriguing example: identifying cities with the most reviews in a business dataset. The aim is to sort cities in descending order by their number of reviews. The SQL INNER JOIN comes into play here, creating a bridge between the business and review tables based on the business_id.
Here’s a peek at the SQL magic:
SELECT b.city
,COUNT(r.id) AS review_count
FROM business r
JOIN review r
ON b.id = r.business_id
GROUP BY b.city
ORDER BY review_count DESC;
+-----------------+-----------------+
| bu_city | bu_review_count |
+-----------------+-----------------+
| Las Vegas | 193 |
| Phoenix | 65 |
| Toronto | 51 |
| Scottsdale | 37 |
| Henderson | 30 |
| Tempe | 28 |
| Pittsburgh | 23 |
| Chandler | 22 |
| Charlotte | 21 |
| Montréal | 18 |
| Madison | 16 |
| Gilbert | 13 |
| Mesa | 13 |
| Cleveland | 12 |
| North Las Vegas | 6 |
| Edinburgh | 5 |
| Glendale | 5 |
| Lakewood | 5 |
| Cave Creek | 4 |
| Champaign | 4 |
| Markham | 4 |
| North York | 4 |
| Mississauga | 3 |
| Surprise | 3 |
| Avondale | 2 |
+-----------------+-----------------+
(Output limit exceeded, 25 of 67 total rows shown)
This code is crisp and efficient – it pairs each review with the corresponding business and tallies the reviews per city. The result? A clear-cut list of cities ordered by their bustling review activity.
But what if our curiosity extends to quieter cities yet to be reviewed?
That’s where LEFT JOIN steps in, ensuring that every city gets its moment, regardless of whether it has been reviewed:
SELECT b.city
,COUNT(r.id) AS review_count
FROM business b
LEFT JOIN review r
ON b.id = r.business_id
GROUP BY b.city
ORDER BY review_count ASC
+--------------+--------------+
| city | review_count |
+--------------+--------------+
| Aberdour | 0 |
| Ahwahtukee | 0 |
| Ambridge | 0 |
| Amherst | 0 |
| Anjou | 0 |
| Anthem | 0 |
| Aspinwall | 0 |
| Avon | 0 |
| Avon Lake | 0 |
| Bay Village | 0 |
| Bedford | 0 |
| Beeton | 0 |
| Belleville | 0 |
| Belmont | 0 |
| Berry | 0 |
| Bethel Park | 0 |
| Black Earth | 0 |
| Blainville | 0 |
| Boisbriand | 0 |
| Bolton | 0 |
| Boucherville | 0 |
| Braddock | 0 |
| Bradford | 0 |
| Brecksville | 0 |
| Bridgeville | 0 |
+--------------+--------------+
(Output limit exceeded, 25 of 362 total rows shown)
When we use LEFT JOIN, cities that have no reviews are not excluded from our analysis. Instead, they are included with a review count of zero. This approach can be constructive for businesses that want to understand their presence in different locations or identify untapped markets.
The power of SQL is not just in the data retrieved but in the narrative it helps to weave. It can guide business decisions, highlight growth areas, or reveal customer behaviour patterns.
It’s been an exciting journey of translating business questions into SQL queries and uncovering stories hidden in the data.
As I continue to explore and share these SQL encounters, I encourage fellow data enthusiasts to dive into their datasets with these techniques. A wealth of insights is waiting to be discovered, and SQL is the key to unlocking them.
Leave a Reply