Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In this lesson we'll cover:
Let's start by running an INNER JOIN on the Crunchbase dataset and taking a look at the results. We'll just look at company-permalink in each table, as well as a couple other fields, to get a sense of what's actually being joined.
SELECT companies.permalink AS companies_permalink,
companies.name AS companies_name,
acquisitions.company_permalink AS acquisitions_permalink,
acquisitions.acquired_at AS acquired_date
FROM tutorial.crunchbase_companies companies
JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalinkYou may notice that "280 North" appears twice in this list. That is because it has two entries in the tutorial.crunchbase_acquisitions table, both of which are being joined onto the tutorial.crunchbase_companies table.
Now try running that query as a LEFT JOIN:
SELECT companies.permalink AS companies_permalink,
companies.name AS companies_name,
acquisitions.company_permalink AS acquisitions_permalink,
acquisitions.acquired_at AS acquired_date
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalinkYou can see that the first two companies from the previous result set, #waywire and 1000memories, are pushed down the page by a number of results that contain null values in the acquisitions_permalink and acquired_date fields.
This is because the LEFT JOIN command tells the database to return all rows in the table in the FROM lause, regardless of whether or not they have matches in the table in the LEFT JOIN clause.
You can explore the differences between a LEFT JOIN and a JOIN by solving these practice problems:
Write a query that performs an inner join between the table and the tutorial.crunchbase_acquisitions table, but instead of listing individual rows, count the number of non-null rows in each table.tutorial.crunchbase_companies
Modify the query above to be a LEFT JOIN. Note the difference in results.
Now that you've got a sense of how left joins work, try this harder aggregation problem:
Count the number of unique companies (don't double-count companies) and unique acquired companies by state. Do not include results for which there is no state data, and order by the number of acquired companies from highest to lowest.