Find Duplicate Customers by Normalized Email
Problem
**[Classic — data cleaning]**
Customers sometimes appear with email case variants (`Alice@example.com`,
`alice@example.com`, `ALICE@EXAMPLE.COM`) — they should be treated as one.
Find emails that have **2 or more** occurrences after lowercasing.
Return columns: `normalized` (lowercase email), `duplicate_count`.
Order by count descending then normalized.
Input data
Example rows — the live problem includes the full dataset.
customers
| customer_id | |
|---|---|
| 1 | Alice@example.com |
| 2 | alice@example.com |
| 3 | ALICE@EXAMPLE.COM |
| 4 | bob@example.com |
| 5 | BOB@example.com |
Expected output
Your answer should return 2 rows with the columns normalized, duplicate_count.
Starter code (Pandas (Python))
import pandas as pd
def find_email_dups(customers: pd.DataFrame) -> pd.DataFrame:
# Your code here
return customersSolve this Pandas question free
Write Pandas (Python) and run it instantly in your browser — even on your phone. No signup needed to try.
Solution & explanation
Create a free account to unlock the optimal solution, a step-by-step explanation, and the hidden test cases that grade your answer.
Sign up free to unlock