Render Node Utilisation

Pandas · Hard · Senior level · ~10 min

Problem

A render farm logs every job in a DataFrame `render_jobs` with columns `job_id`, `node_id`, `begin_at`, and `finish_at` (the last two are datetime strings). Jobs on the same node can overlap in time.

For each node, report two figures:
- `busy_hours`: the total number of whole hours the node was busy with at least one job. Overlapping jobs are merged so shared time is counted only once, and the merged total is then floored to whole hours.
- `peak_parallel_jobs`: the largest number of jobs that ran simultaneously at any instant.

Return columns `node_id`, `busy_hours`, `peak_parallel_jobs`, ordered by `node_id`.
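
Both figures can be obtained from a single sweep over start and finish events. The block below is a minimal sketch of that idea for one node's jobs in plain Python, not necessarily the reference solution. The helper name `busy_and_peak` is hypothetical, and the sketch assumes that every job has `finish_at` later than `begin_at` and that a job finishing at the exact instant another begins does not count as running in parallel with it (the problem statement does not say either way).

```python
from datetime import datetime

def busy_and_peak(intervals):
    """intervals: list of (begin, finish) datetime pairs for ONE node.

    Returns (busy_hours, peak_parallel_jobs).
    """
    # One +1 event per start and one -1 event per finish. Sorting the tuples
    # places -1 before +1 at equal timestamps (the assumption stated above).
    events = []
    for begin, finish in intervals:
        events.append((begin, 1))
        events.append((finish, -1))
    events.sort()

    running = 0          # jobs active just after the current event
    peak = 0
    busy_seconds = 0.0
    prev_time = None
    for time, delta in events:
        # The span since the previous event counts once if any job was active.
        if running > 0:
            busy_seconds += (time - prev_time).total_seconds()
        running += delta
        peak = max(peak, running)
        prev_time = time

    # Floor the merged total to whole hours.
    return int(busy_seconds // 3600), peak


fmt = "%Y-%m-%d %H:%M:%S"
node_501 = [
    (datetime.strptime(b, fmt), datetime.strptime(f, fmt))
    for b, f in [
        ("2024-03-10 08:00:00", "2024-03-10 09:00:00"),
        ("2024-03-10 08:30:00", "2024-03-10 10:30:00"),
        ("2024-03-10 11:00:00", "2024-03-10 12:00:00"),
        ("2024-03-10 13:00:00", "2024-03-10 15:30:00"),
    ]
]
result_501 = busy_and_peak(node_501)
print(result_501)  # → (6, 2)
```

For the node 501 example rows, the merged busy spans are 08:00 to 10:30, 11:00 to 12:00, and 13:00 to 15:30, which total exactly 6 hours; two jobs overlap between 08:30 and 09:00, giving a peak of 2.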

Input data

Example rows only; the full dataset contains more rows than are shown here.

render_jobs

job_id	node_id	begin_at	finish_at
1	501	2024-03-10 08:00:00	2024-03-10 09:00:00
2	501	2024-03-10 08:30:00	2024-03-10 10:30:00
3	501	2024-03-10 11:00:00	2024-03-10 12:00:00
7	501	2024-03-10 13:00:00	2024-03-10 15:30:00
4	502	2024-03-10 09:00:00	2024-03-10 10:00:00

Expected output

For the full dataset, your answer should return 3 rows (one per node) with the columns node_id, busy_hours, peak_parallel_jobs.

Starter code (Pandas (Python))

import pandas as pd

def render_node_utilisation(render_jobs) -> pd.DataFrame:
    # Your code here
    return render_jobs
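
One possible way to fill in the starter function is sketched below. It is an illustrative approach rather than the graded reference solution: it turns each job into a +1 start event and a -1 finish event, takes a per-node running count, and sums the gaps during which that count is positive. The intermediate column names (`ts`, `delta`, `running`, `busy_seconds`) are made up for this sketch, and it assumes a finish and a start at the same instant on the same node do not overlap.

```python
import pandas as pd

def render_node_utilisation(render_jobs: pd.DataFrame) -> pd.DataFrame:
    jobs = render_jobs.copy()
    jobs["begin_at"] = pd.to_datetime(jobs["begin_at"])
    jobs["finish_at"] = pd.to_datetime(jobs["finish_at"])

    # One +1 event per job start and one -1 event per job finish.
    starts = jobs[["node_id", "begin_at"]].rename(columns={"begin_at": "ts"}).assign(delta=1)
    ends = jobs[["node_id", "finish_at"]].rename(columns={"finish_at": "ts"}).assign(delta=-1)
    events = (
        pd.concat([starts, ends], ignore_index=True)
        # Sorting on delta last puts finishes before starts at equal timestamps.
        .sort_values(["node_id", "ts", "delta"])
        .reset_index(drop=True)
    )

    # Number of jobs running on the node immediately after each event.
    events["running"] = events.groupby("node_id")["delta"].cumsum()

    # Seconds until the node's next event (0 for the node's last event),
    # counted only while at least one job is running.
    next_ts = events.groupby("node_id")["ts"].shift(-1)
    gap_seconds = (next_ts - events["ts"]).dt.total_seconds().fillna(0.0)
    events["busy_seconds"] = gap_seconds.where(events["running"] > 0, 0.0)

    out = (
        events.groupby("node_id")
        .agg(busy_seconds=("busy_seconds", "sum"), peak_parallel_jobs=("running", "max"))
        .reset_index()
    )
    # Floor the merged busy time to whole hours.
    out["busy_hours"] = (out["busy_seconds"] // 3600).astype(int)
    return (
        out[["node_id", "busy_hours", "peak_parallel_jobs"]]
        .sort_values("node_id")
        .reset_index(drop=True)
    )


# The five example rows from the problem statement (not the full dataset).
sample = pd.DataFrame(
    {
        "job_id": [1, 2, 3, 7, 4],
        "node_id": [501, 501, 501, 501, 502],
        "begin_at": ["2024-03-10 08:00:00", "2024-03-10 08:30:00", "2024-03-10 11:00:00",
                     "2024-03-10 13:00:00", "2024-03-10 09:00:00"],
        "finish_at": ["2024-03-10 09:00:00", "2024-03-10 10:30:00", "2024-03-10 12:00:00",
                      "2024-03-10 15:30:00", "2024-03-10 10:00:00"],
    }
)
result = render_node_utilisation(sample)
print(result)
```

On the example rows this yields 6 busy hours with a peak of 2 for node 501, and 1 busy hour with a peak of 1 for node 502. Working from events avoids an explicit interval-merge loop, at the cost of doubling the row count.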

