Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.
We can use the GROUP BY statement to group the rows by the email field, and then use the HAVING clause to keep only the email addresses that appear more than once.
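import pandas as pd


def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # duplicated() flags each row whose email already appeared in an
    # earlier row, so only the second and later occurrences are selected.
    results = person.loc[person.duplicated(subset=["email"]), ["email"]]
    # An email seen three or more times is selected more than once;
    # keep a single row per duplicate email.
    return results.drop_duplicates()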
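To make the `duplicated()` / `drop_duplicates()` combination concrete without pandas, here is a minimal plain-Python sketch. It assumes the rows arrive as `(id, email)` tuples in table order, and the function name `duplicate_emails_plain` is hypothetical (Python 3.9+ for the built-in generic type hints). With its default `keep="first"`, `duplicated()` marks every occurrence after the first one, which the `seen` set reproduces here.

```python
def duplicate_emails_plain(rows: list[tuple[int, str]]) -> list[str]:
    """Hypothetical stdlib-only counterpart of the pandas solution above."""
    seen: set[str] = set()      # emails encountered in earlier rows
    reported: set[str] = set()  # emails already placed in the result
    result: list[str] = []
    for _id, email in rows:
        # Like duplicated(): a row is a duplicate only if its email
        # appeared in an earlier row.
        # Like drop_duplicates(): each duplicate email is reported once.
        if email in seen and email not in reported:
            reported.add(email)
            result.append(email)
        seen.add(email)
    return result


print(duplicate_emails_plain([(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]))
# ['a@b.com']
```

The `reported` set plays the role of `drop_duplicates()`: an email that appears three times is flagged twice by `duplicated()` but should appear only once in the output.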
# Write your MySQL query statement below
SELECT email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;
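The grouping query can be tried locally on the Example 1 data. This sketch assumes SQLite, through Python's standard `sqlite3` module, as a stand-in for MySQL; the query only uses standard `GROUP BY` / `HAVING`, so both engines should accept it. The schema and rows are copied from the example, and `DUPLICATES_SQL` is a hypothetical name.

```python
import sqlite3

# SQLite stands in for MySQL; the schema and rows mirror Example 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Same logic as GROUP BY 1 / COUNT(1): one group per email,
# keeping only groups that contain more than one row.
DUPLICATES_SQL = """
SELECT email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;
"""

print(conn.execute(DUPLICATES_SQL).fetchall())
# [('a@b.com',)]
```

`HAVING` is evaluated after the rows are grouped, which is why the count condition cannot be written in a `WHERE` clause.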
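We can use a self-join to join the Person table with itself, and then keep only the pairs of records whose id is different but whose email is the same. Because each such pair matches in both directions, DISTINCT is needed so that every duplicate email is returned only once.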
SELECT DISTINCT p1.email
FROM
    Person AS p1,
    Person AS p2
WHERE p1.id != p2.id AND p1.email = p2.email;
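To see why the self-join needs DISTINCT, the sketch below runs the join on the Example 1 rows with and without it. As before, it assumes SQLite via Python's `sqlite3` module in place of MySQL, and the names `PAIRS_SQL` and `DISTINCT_SQL` are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the schema and rows mirror Example 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Without DISTINCT: the pair (1, 3) also matches as (3, 1).
PAIRS_SQL = """
SELECT p1.email, p1.id, p2.id
FROM Person AS p1, Person AS p2
WHERE p1.id != p2.id AND p1.email = p2.email
ORDER BY p1.id;
"""

# With DISTINCT on the email column, as in the self-join solution.
DISTINCT_SQL = """
SELECT DISTINCT p1.email
FROM Person AS p1, Person AS p2
WHERE p1.id != p2.id AND p1.email = p2.email;
"""

print(conn.execute(PAIRS_SQL).fetchall())
# [('a@b.com', 1, 3), ('a@b.com', 3, 1)]
print(conn.execute(DISTINCT_SQL).fetchall())
# [('a@b.com',)]
```

An email that appears k times produces k(k-1) joined rows before DISTINCT collapses them, so the GROUP BY version scales better when some emails repeat many times.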