Prefix Sum with HashMap: Time Complexity Optimization

A very cool way of using Prefix Sum with Hash-map is shown in this post. I am going to break down this LeetCode 560 problem to reveal the idea of lowering down the time complexity by adopting Hash-map (or dictionary in Python).

Prefix Sum with HashMap: Time Complexity Optimization
Prefix Sum and HashMap

LeetCode 560

Subarray Sum Equals K - LeetCode
Can you solve this real interview question? Subarray Sum Equals K - Given an array of integers nums and an integer k, return the total number of subarrays whose sum equals to k. A subarray is a contiguous non-empty sequence of elements within an array. Example 1: Input: nums = [1,1,1], k = 2 Output: 2 Example 2: Input: nums = [1,2,3], k = 3 Output: 2 Constraints: * 1 <= nums.length <= 2 * 104 * -1000 <= nums[i] <= 1000 * -107 <= k <= 107

LeetCode 560

Reviewing question 560, an array nums and a target k is given to determine the number of sub-arrays res which having sum of interval equals k. Such that, giving array nums = [1, 1, 1, 1, 1, 1] and k = 4. The result res will be 3, target sub-array set contains nums[0:3]nums[1:4] and nums[2:5].

Adopting Prefix Sum

Since the time complexity of calculating sum of interval for a specific interval is $O(n)$:

If we want to calculate interval of sum more than one time, such as the problem 560, we should utilize Prefix Sum, which, $O(n)$ for constructing pref array and $O(1)$ for querying a specific interval, say [L, R]:

Back to the Problem

In the problem, we need to check sum of intervals of each specific pair of L and R, since there are negative elements in nums.

The brute force code can be easily written using two for loop, which incurs a square time complexity $O(n^2)$.

From observation, we notice that:

When we considering each R, to find sum of interval starts with each L ranging from 0 to R, a linear time is needed to deduct the pref[R] by each pref[L].

Since scanning each pref[L] linearly is a little bit waste of time, so we can consider utilizing some spaces to trade time. We can adopt a dictionary (HashMap).

Consider that:

  • In the inner for loop of L, we want to know how many L's can satisfy pref[R] - pref[L] == k, equivalent to pref[L] == pref[R] - k.
  • We adopt a dictionary d to maintain number of L's having Prefix Sum as the key of d, hence d[i] means number of L satisfying pref[L] == i.
  • Then we break the inner loop by res += d[pref[R] - k], that's wise, isn't it?
  • Finally we update the dictionary d by the loop of R (refer to the following code).
  • Note that we should initalize d by d = {0: 1}, think about why we should do so. (That is similar to we initialize pref using pref=[0]. Because there can be a sum of interval starting from the first element)

We have cancelled out pref[L]. We can also discard pref[R]. The reason of it is R is in the outer loop, and it can totally combined with the previous loop for calculating pref. We also discard the pref array to make it calculate-on-use.

Conclusion

We use a dictionary (HashMap) to replace the inner for loop for variable L, and we modify the code to eliminate the need for the 'pref' array. As a result, the time complexity is reduced from quadratic time to linear time. This technique is quite useful and can be applied to more LeetCode problems.