Sweep Line (Silver-Gold) Part 1
What's Covered
Basic Setup/Idea of Sweep Line
1D Sweep Line
Prerequisites
Required
Sorting
Basic geometry
Basic time complexity analysis
Before we take on sweep line in its full glory (sweeping a 2D plane), we must understand the basic idea of sweeping. For this, we start much simpler: in a single dimension (1D).
We define some terms. Every sweep line problem, 1D and 2D alike, has four key components:
The scene is the space/setting in which a problem takes place, or "where we are sweeping" and has some dimensions. In particular, the scene may be 1D (a number line) or 2D (a plane) -- as you can infer, "1D sweep line" refers to algorithms where the scene is 1D, and "2D sweep line" refers to algorithms where the scene is 2D.
The critical points are fixed, designated positions in the scene that mark events or points in the problem we care about -- these will be swept by our imaginary sweepline
The sweepline (our key tool in any sweep line algorithm) is an imaginary line that we drag or "sweep" across the scene, processing each critical point (in a certain order, i.e. left to right or top to bottom) based on what is asked in the problem
The active points (or active set) are the critical points being currently processed (other points -- such as those already processed or not yet processed -- are "inactive")
Of these four components, the scene and the critical points are given in the problem itself. Meanwhile, the sweepline and the active points are just elements we use to understand processing in the problem. But from these terms alone, the key takeaway is that the premise behind sweep line is "order the critical points from left to right, and then sweep."
In the 1D case, points are very simple to represent -- a single coordinate suffices, since all points have the same height -- so any good intro to sweep line should start with exposition in 1D. We start basic, with an easy bronze problem:
Fence Painting (2015) asks us, in concise terms, "What is the union of two 1D intervals (or segments) of paint on a fence?" Indeed, sweep line is not needed here; the editorial mentions two valid other solutions:
Use an array to represent the number of coats of paint for each unit of the fence. Then, use two for loops to iterate over the bounds of the painted intervals, incrementally updating the coats of paint for each unit of the fence. Finally, with a single linear scan, count the units with at least one coat of paint.
Do casework in constant time.
Mathematically check a ton of conditions with if statements.
Come up with individual formulas for each case, drawn by hand.
In all, the second solution here is unnecessary: it is casework heavy and does not generalize well to multiple segments. The first solution is more useful and still holds for multiple segments, but does not generalize well for complexity: our coordinates could be extremely big, or we might have so many segments that processing every unit along them would time out.
(As we reason below, however, it can be argued that the first solution is sweep line; still, even with this concession made, it is suboptimal enough to discard.)
Instead, we can frame this in sweep line terms. The problem gives us a 1D scene -- the fence being painted. For critical points, if we simply now view every single unit of the fence as its own critical point, we reach a sweep line solution equivalent to the first one given above (processing along the fence with for loops).
But we can do better! There are really only four (not necessarily distinct) critical points -- the two endpoints for each of two intervals (or, if we have k
intervals, at most 2k
critical points). This gives us the following sweep line idea:
Read in the critical points, labelling each one as the start of an interval or the end of an interval. Then, following the spirit of sweep line, sort these critical points from left to right by coordinate (in increasing order). Now, if we make the observation that we always see the start points of intervals before we see their end points (since intervals always have to start before they end), we can frame our sweep as follows:
Start the sweepline at the first critical point, and maintain a running tally of the active points (an active count) that tells us, at any given point in our sweeping process, how many intervals have not yet ended/are still actively being painted (framed another way, how many more starting points have we encountered so far than ending points, since every pair of start and end perfectly activates and deactivates an interval). As such, every time we encounter a starting point, we can add 1 to this count (a +1 update), and every time we encounter an ending point, we can subtract 1 from this count (a -1 update).
It is easy to see that this count will always be nonnegative -- the number of starting points swept so far is always at least the number of ending points swept so far. Now, if we store the count as cnt
, we have two cases (and maintain two additional coordinates l
and r
):
If the count is positive, we are still painting. When this first happens after a 0 count, we must record the coordinate
l
that it takes place at. To keep track of if the last count was 0, we can explicitly store the last count asprevcnt
and make this check.If the count is 0, all intervals so far have ended and we are no longer painting. When this first happens after a positive count, then we record the coordinate
r
at which to stop painting and addr - l
to the answer.
Below is a demonstration of this approach on the sample case:
The total sweep line implementation is below (I use an array to store the points to avoid having to deal with an extra variable for each point):
Submitting our code, we see that this captures every test case. Of course, you can see that the sweep line technique was overkill for such a simple problem, but if we have a more complex problem with numerous segments, the exact same code works with little change!
As with most sweep line applications, the bottleneck of our solution is sorting the critical points. In the case of k
intervals and 2k
critical points, the optimal sweep line solution thus gives O(2k log(2k)) = O(k log k)
time complexity due to sorting. It can be proven that this is actually the best possible time complexity given an arbitrary number of intervals and range of coordinates.
Restaurant Customers (CSES) asks, "Given all arrival and leaving times of restaurant customers, what was the maximum number of customers present at the same time?"
Again, we have a sweep line formulation: the scene is the timeline of the restaurant (1D, involving the single dimension of time), and the critical points are arrival and departure times. Just as before, it suffices to sort these form left to right and sweep through them with our sweepline, maintaining an active count.
Very similar to before, since the active count represents the number of people currently in the restaurant, we can
+1 update the active count when a customer arrives
-1 update the active count when one leaves
The only difference is, this time, the problem asks us to determine the maximum active count. To do this, we can just max update the answer at every critical point (if possible).
My code:
Last updated