AnalystPath

Find Duplicate Customers by Normalized Email

PandasEasyJunior level~15 min

Problem

**[Classic — data cleaning]**

Customers sometimes appear with email case variants (`Alice@example.com`,
`alice@example.com`, `ALICE@EXAMPLE.COM`) — they should be treated as one.

Find emails that have **2 or more** occurrences after lowercasing.

Return columns: `normalized` (lowercase email), `duplicate_count`.
Order by count descending then normalized.

Input data

Example rows — the live problem includes the full dataset.

customers
customer_idemail
1Alice@example.com
2alice@example.com
3ALICE@EXAMPLE.COM
4bob@example.com
5BOB@example.com

Expected output

Your answer should return 2 rows with the columns normalized, duplicate_count.

Starter code (Pandas (Python))

import pandas as pd

def find_email_dups(customers: pd.DataFrame) -> pd.DataFrame:
    # Your code here
    return customers

Solve this Pandas question free

Write Pandas (Python) and run it instantly in your browser — even on your phone. No signup needed to try.

Solution & explanation

Create a free account to unlock the optimal solution, a step-by-step explanation, and the hidden test cases that grade your answer.

Sign up free to unlock

Related Pandas questions