Top Percentile Return Rates by Region
Problem
The `returns` DataFrame holds the return rate of each product line within a sales region: `line_id`, `region`, and `return_rate` (a decimal between 0 and 1, higher is worse).
The quality team wants the worst offenders within each region. For every region, find the product lines whose return rate sits in the top 5 percentile of that region, where 'top 5 percentile' means a percent-rank of at most 0.05 when the lines are ranked from highest return rate to lowest within the region. Percent-rank here is the SQL `PERCENT_RANK` definition: (rank - 1) / (count - 1), with ties sharing the lowest rank, and a single-row region having percent-rank 0.
Return `line_id`, `region`, and `return_rate`, ordered by `region` ascending, then `return_rate` descending, then `line_id` ascending.
Input data
Example rows — the live problem includes the full dataset.
| line_id | region | return_rate |
|---|---|---|
| 1 | Andes | 0.92 |
| 2 | Andes | 0.68 |
| 3 | Andes | 0.17 |
| 4 | Baltic | 0.94 |
| 5 | Baltic | 0.81 |
Expected output
Your answer should return 4 rows with the columns line_id, region, return_rate.
Starter code (Pandas (Python))
import pandas as pd
def top_percentile_returns(returns) -> pd.DataFrame:
# Your code here
return returnsSolve this Pandas question free
Write Pandas (Python) and run it instantly in your browser — even on your phone. No signup needed to try.
Solution & explanation
Create a free account to unlock the optimal solution, a step-by-step explanation, and the hidden test cases that grade your answer.
Sign up free to unlock