This commentary is meant to give a brief outline of what is required to solve each problem and what challenges may be involved. It can be used to guide anyone interested in trying to solve these problems, but is not intended to describe all the details of a correct solution. We encourage everyone to try and implement these details on their own using their programming language of choice.
Attempts to solve past problems can be submitted on the CCC Online Grader. Also, after a contest is complete, official test data is available by clicking the corresponding “Download” button on our Past Contests webpage.
To help Bocchi determine how much tape she needs, we need to do some careful counting.
For the first subtask, we can count \(B\), the number of black triangles, which is the number of times 1 appears in the input. The correct answer will be \(3\times B\). Note that we can ignore the second line of input which is also true for the second subtask. However, for this second subtask, there could be adjacent black triangles and Bocchi does not need to place warning tape on the common side in these cases. Notice that each of these common sides contributes two to the value of \(3 \times B\). This means we obtain the correct final answer by subtracting two from \(3 \times B\) for each pair of adjacent black triangles.
The same principle applies to the third subtask but we also have to look for black triangles that are adjacent but in different rows. These can only occur at even-numbered positions (assuming we start counting at zero).
It is possible to solve this problem without making the observation that you can subtract from \(3 \times B\). This approach involves looking for all the places where warning tape needs to be placed. An algorithm is needed which includes sides of triangles on the perimeter of the pathway as well as sides between adjacent triangles which are not the same colour.
Regardless of which approach is taken, many solutions to the first three subtasks will be quite efficient. However, the fourth subtask is meant to ensure this by disallowing overly slow solutions. It specifically requires that not too many passes are made over either row. More formally, only a constant number of passes of each row is allowed. Solutions that miss this requirement are probably too slow because they use a built-in function within a loop which itself loops through a row of triangles.
To solve this problem, we will find the asymmetric value for each crop and compute the minimum asymmetric value for each crop’s corresponding length. For every subtask, we find different ways to compute the asymmetric value of each crop.
Considering iterating over all crops. A crop is defined by its left endpoint and right endpoint. Hence, we can use a nested for loop to first fix the left endpoint and then fix the right endpoint. There are other ways to loop through crops, such as looping through its left endpoint and its length and vice versa. To find the asymmetric value of a crop, we can have an accumulator and sum up the absolute difference for every such pair by looping over \(i\) defined in the problem statement. There are \(N\cdot(N-1)/2= N^2/2 - N/2\) such crops, and we need about \(N\) calculations on average to to compute the asymmetric value of each crop; hence, it runs in time that is roughly cubic in the number of mountains.
For this subtask, we note that if we were to find the asymmetric value of each crop, then we have to do it faster than roughly \(N\) time. Now consider a crop with a left endpoint of \(l\) and a right endpoint of \(r\). We will take a closer look at the formula to compute the asymmetric value: \[|h_l - h_r| + |h_{l+1} - h_{r-1}| + |h_{l+2} - h_{r-2}| + \cdots + |h_{l + t} - h_{r - t}|\] where \(t = \lfloor \frac{r-l}{2} \rfloor\). Note that because the array is non-decreasing in this subtask, then it follows that for mountains with indices \(i \le j\), we have that \(|h_i - h_j| = h_j - h_i\). Hence the above formula is equivalent to \[(h_r - h_l) + (h_{r-1} - h_{l+1}) + (h_{r-2} - h_{l+2}) + \cdots + (h_{r - t} - h_{l + t}).\] Let \(m = \frac{l+r}{2}\), which represents the midpoint of the mountains (it may not be an integer, but that does not matter). Notice that for every term whose index is greater than \(m\), it is being added, and for every term whose index is less than \(m\) it is being subtracted. We can then rearrange the formula to become \[h_r + h_{r-1} + h_{r-2} + \cdots + h_{r-t} - (h_l + h_{l+1} + h_{l+2} + \cdots + h_{l + t}).\] Note that we have simplified this problem into static range sum queries. We can compute them quickly by building a prefix sum array of the mountains’ heights (i.e., an array where the \(k\)th element is the sum of the first \(k\) elements of \(h\)) before iterating through all the crops. Note that to calculate the asymmetric crop, in this case, it will be constant time (i.e., independent of \(N\)), so this runs in quadratic time in the number of mountains.
We will now let \([l, r]\) represent a crop where \(l \le r\) are the left and right endpoints, respectively. Now for this subtask, let us take a closer look at the formula for the crop \([l, r]\): \[|h_l - h_r| + (|h_{l+1} - h_{r-1}| + |h_{l+2} - h_{r-2}| + \cdots + |h_{l + t} - h_{r - t}|).\] Notice that the terms in brackets are the asymmetric value of the crop \([l+1, r-1]\). If we have a way to iterate through the crops so that crop \([l+1, r-1]\) comes right before \([l, r]\), we can add \(|h_l - h_r|\) directly to the accumulator to get the asymmetric value of crop \([l, r]\). This motivates us to cleverly iterate over the crops by fixing the midpoint and expanding outwards while maintaining an accumulator for each midpoint. Note that the midpoint will not be an actual mountain for crops with even length. This runs in time which is the square of the number of mountains. As we move from crop \([l+1, r-1]\) to \([l, r]\), we only add a term to the accumulator. Note that there exist dynamic programming solutions, but these were not necessary to receive full marks.
For this subtask, since \(R = 1\)
and \(C = 1\), we can try and make only
the first row and column a palindrome. One way to do this is to fill the
first row and column with the letter a
and fill the rest of
the grid with the letter b
.
Because there are so few cases, it suffices to solve this subtask on paper and hardcode the answers. To reduce the amount of casework, notice that if we have a solution for \((R, C)\), we can get a solution for \((C, R)\) by flipping the grid.
Another possible approach is to write a brute force since there are only four important possibilities for each cell.
This subtask is intended to aid students in thinking about the problem.
The solutions to subtasks 1 and 2 should give some intuition here. From subtask 1, we propose the following general solution:
Fill the first \(R\) rows and the
first \(C\) columns with the letter
a
and fill the rest of the grid with the letter
b
.
We notice that this fails when \(R = 0\), \(C = 0\), \(R = N\) or \(C = M\). We can handle these in pairs since we can flip the grid.
If \(R = 0\), we can fill the first
\(M - 1\) columns with the letter
a
, and the last column with the letter b
, and
then increment the last \(M - C\)
characters of the last row. For instance, for the input
4 4 0 2
we can output:
aaab
aaab
aaab
aabc
If \(R = N\), every row must be a
palindrome. Because of this, starting from a full grid of a
characters, we can easily reduce the number of palindromic columns by
two at a time by making matching columns of the first row b
characters. For instance, for the input 4 4 4 2
, we can
output:
baab
aaaa
aaaa
aaaa
However, this fails if \(C\) is not the same parity as \(M\), that is, if we cannot get \(C\) by subtracting \(2\) an integer number of times from \(M\).
In this case, it depends: if \(M\)
is odd, we can use the middle column. For instance, for the input
5 5 5 2
:
babab
aaaaa
aaaaa
aaaaa
aaaaa
However, the input 4 4 4 1
is impossible since there is
no middle column.
We can handle \(C = 0\) and \(C = M\) symmetrically by flipping the inputs, processing, and then flipping the output.
This subtask can be solved by running an MST (Minimum Spanning Tree) algorithm that only considers the cost of each edge. Note that we may have multiple components, so our final answer will be the sum of costs of the MST of each component. Luckily, an algorithm like Kruskal’s handles this problem without any additional complexity.
The constraints of this subtask allow us to focus on an invariant that is very useful in the long run. Consider a single edge \(e = (a, b, l, c)\). We get the following cases:
If \(e\) is the unique shortest path from \(a\) to \(b\), we definitely want to take \(e\);
If there exists a path from \(a\) to \(b\) with length \(l'<l\), we definitely don’t want to take \(e\);
If there exists another path from \(a\) to \(b\) with length \(l'=l\) (but no paths that are shorter), we still don’t want to take \(e\). However, this time the proof is much more complex:
Suppose \(p\) is a path from \(a\) to \(b\) consisting of edges \(\{p_1, p_2, \ldots, p_k\}\) such that \[\sum_{i=1}^{k-1} \text{length}(p_i) = l\] Now, suppose that \(\forall i, p_i\) is the unique shortest path between its endpoints \(a(p_i)\) and \(b(p_i)\). If this is not the case for some \(i\), then we can find another path from \(a(p_i)\) to \(b(p_i)\) with length exactly \(\text{length}(p_i)\) and replace \(p_i\) with that path (if that path had length \(< \text{length}(p_i)\) then we would have a path from \(a\) to \(b\) with length \(< l\), which is a contradiction). Observe that this process can only be repeated a finite number of times.
In the end, we have a path from \(a\) to \(b\) that has length \(l\), does not take the edge \(e\), and every edge in that path is the unique shortest path between its endpoints. We’ll call the edges on this path \(q_1, \ldots, q_m\).
Now consider some optimal answer that contains the edge \(e\). Clearly, \(q_1, \ldots, q_m\) must be in the answer. However, this means that we can remove \(e\) as any time we need to traverse from \(a\) to \(b\), we can take the edges \(q_1, \ldots, q_m\) instead. This violates the optimality of the answer and is thus a contradiction.
This results in a very simple solution for this subtask: we loop over every edge \(e = (a, b, l, c)\) and check whether it’s the unique shortest path between \(a\) and \(b\). One way to do this is to first run Dijkstra’s algorithm \(n\) times to find the shortest path between every pair of nodes (and store it in a 2-D array dist). Then, for each edge we check if \[l < \min_{1 \le i \le n, i \neq a, i \neq b}(\text{dist}[a][i] + \text{dist}[i][b])\]
Remark: Notice that we don’t even care about costs at all! In fact, the original problem didn’t have costs (\(l_i = c_i\)), but we thought it was an interesting extension :)
Unfortunately, we lose the guarantees that allow our solution from subtask 2 to work. Fortunately, we can restore these guarantees with some clever reductions.
First, let’s look at the graph formed by all of the \(0\)-length edges. If any pair of nodes \((a, b)\) are connected in this graph, then our answer must contain a path of length \(0\) between \(a\) and \(b\). Thus, we must choose a subset of the \(0\)-length edges that has the same connectivity properties as the original graph (i.e. if \((a, b)\) are connected by \(0\)-length edges in the original graph, then they must be connected by \(0\)-length edges in our answer). Trying to do so while also minimizing cost is analogous to finding the minimum spanning forest of the graph, which is the same as our solution in subtask 1.
Unfortunately, the \(0\)-length edges are still part of our graph. In order to remove them for good, we must make the following observation: we can treat each component in our spanning forest as a single node as we’re able to travel between any two nodes in that component with a path of length \(0\). This motivates us to compress our initial graph by these components to remove all \(0\)-length edges.
Note: Be careful, the compressed graph may also contain self-edges so make sure to ignore them.
Next, to remove multiedges notice that we only ever want at most one edge between any pair of nodes. Thus, we take the one with lowest length, breaking ties by cost.
Finally, we run our solution from subtask 2 and sum the answer we obtained from that solution with the one from computing the minimum spanning forest.
Alternative Solution
There is also an alternative solution much closer to Kruskal’s algorithm for computing MST: we loop over the edges in order of length, breaking ties by cost and adding it to the graph if it improves the shortest path between its endpoints. This solution also has the added benefit of not worrying about \(0\)-length edges or multi-edges separately.
The answers for \(N = 3\) are \(0, 1, 2, 3\).
Suppose the answers for \(N = 3^k\) are \(a_1, a_2, \dots, a_M\). The answers for the first third of \(N = 3^{k+1}\) are \[a_1, a_2, \dots, a_M\]
The Cantor set has copies of the same structure. In particular, the first third of the Cantor set is identical to the last third. The remaining answers for \(N = 3^{k+1}\) are \[2 \times 3^k + a_1, 2 \times 3^k + a_2, \dots, 2 \times 3^k + a_M\]
These observations are enough to generate the answer for any power of \(3\).
Note that \(\frac{34676}{51169}\) is covered by the 49th filter, but not any earlier filter. This is the maximum record for \(N \le 10^5\). A naive approach is expected to fail on \(N = 51169\).
This subtask requires deeper observations about Alice’s filters.
Property 1 (Filters are symmetric). If \(r\) is covered by the \(k\)-th filter, then \(1-r\) is covered by the \(k\)-th filter. Likewise, if the \(k\)-th filter does not cover \(r\), then \(1-r\) is not covered by the \(k\)-th filter.
Property 2 (The \(k\)-th filter and \((k+1)\)-th filter are related). Let \(k\) be a positive integer, and let \(0 \le r \le \frac 1 3\). If \(r\) is covered by the \((k+1)\)-th filter, then \(3r\) is covered by the \(k\)-th filter. Likewise, if \(r\) is not covered by the \((k+1)\)-th filter, then \(3r\) is not covered by the \(k\)-th filter.
The proof of these 2 properties is left as an exercise. From these properties, it becomes natural to define the function \(f(r) = 3 \times \min(r, 1-r)\). Here are a few theorems about \(f\).
Theorem 1. If \(r\)
is not covered by any filter, then \(f(r)\) is not covered by any filter.
Proof. Apply Property 1 and Property 2.
Theorem 2. Suppose \(r\) is covered by the \(k\)-th filter. Then either \(k = 1\), or \(f(r)\) is covered by the \((k-1)\)-th filter.
Proof. Apply Property 1 and Property 2.
Here is the approach to solving subtask 2:
Let \(r = \frac x N\). Construct this sequence: \(r, f(r), f(f(r)), f(f(f(r))), \dots\)
If \(r\) is in the Cantor set, then by Theorem 1, every term in the sequence is in the Cantor set. Since every term is a multiple of \(\frac 1 N\), the sequence will enter a cycle.
If \(r\) is not in the Cantor set, then \(r\) is covered by a filter. By Theorem 2, a term in the sequence will be covered by the first filter.
There are 2 outcomes. Either the sequence will enter a cycle, or a term will be covered by the first filter. The algorithm needs to handle both outcomes. The intended time complexity is \(O(N)\) or slightly slower.
Alternatively, it is enough to ignore cycle detection and construct the first \(49\) terms. In the worst case (i.e. \(r = \frac{34676}{51169}\)), the \(49\)-th term is covered by the first filter.
Note that \(\frac{222534799}{867579680}\) is covered by the 130th filter, but not any earlier filter. The author believes this is the maximum record for \(N \le 10^9\).
Firstly, use the first 18 filters to remove most of the numbers. After this step, less than \(10^6\) numbers remain.
Next, apply the algorithm from subtask 2 on each of the remaining numbers. It may be a good idea to use memoization to improve time complexity. There are other tricks to improve performance.
The intended time complexity is \(O(N^{2/3} \log N)\) because there are \(O(N^{2/3})\) numbers to consider, and each number takes roughly \(O(\log N)\) to consider.