Leetcode M347 Top K Frequent Elements Optimization
M347. Top K Frequent Elements description: Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order. Constraints: $1 <= nums.length <= 10^5$ $-10^4 <= nums[i] <= 10^4$ $k$ is in the range [1, the number of unique elements in the array]. It is guaranteed that the answer is unique.
A very easy problem for a medium. But I did a few modifications to the typical approach. The code below is the “fastest” 0ms submission, which takes 12~20ms currently.
struct myComp {
constexpr bool operator()(
pair<int, int> const& a,
pair<int, int> const& b)
const noexcept
{
return a.second < b.second;
}
};
class Solution {
public:
vector<int> topKFrequent(vector<int>& nums, int k) {
unordered_map<int,int> mp;
for(int i:nums) mp[i]++;
priority_queue<pair<int,int>, vector<pair<int,int>>, myComp> pq;
for(auto i:mp) pq.push(i);
vector<int> ans;
for(int i = 0; i < k; i++) {
auto top = pq.top();
ans.push_back(top.first);
pq.pop();
}
return ans;
}
};
First we count the frequencies, then we find the top $k$ ones with the highest frequency. The code above used a priority queue (pq) to record the order, but adding a number to a pq is $O(\log n)$, and adding $n$ numbers is $O(n\log n - n)$, which is not much faster than using a vector and sorting it. We can do much better with the following code:
class Solution {
public:
vector<int> topKFrequent(vector<int>& nums, int k) {
unordered_map<int,int> mp;
for(int i:nums) mp[i]--;
priority_queue<array<int,2>> pq;
int i=0;
auto it=mp.begin();
for(;i<k;++i,++it) pq.emplace(array<int,2>{it->second,it->first});
for(;it!=mp.end();++it){
if(it->second>pq.top()[0]) continue;
pq.emplace(array<int,2>{it->second,it->first});
pq.pop();
}
vector<int> ans;
while(!pq.empty()){
ans.emplace_back(pq.top()[1]);
pq.pop();
}
reverse(ans.begin(),ans.end());
return ans;
}
};
The trick is, since we only need the top $k$ ones, we can just store those in the pq. And each time we find a value that is larger than the smallest one in the pq, we remove the smallest one and add the new one. So we actually need a pq that implememts a minimum heap. We can do that by writing a custom comparator, but that’s unnecessary - if we negate the numbers, the result of comparison would be inverted. So in the first step, instead of adding the frequency count, we subtract it.
Then we add the first $k$ elements, then each time if we encounter an element that has a higher frequency then the minimum frequency that we have, we remove the minimum frequency one and add the new one. The operation is only $O(\log k)$. In the end, we extract the corresponding values, but they will be in the order of small to large, so we need to reverse the order, which is $O(k)$. It’s a much better algorithm when $n»k$.