Leetcode M1318 Minimum Flips to Make a OR b Equal to c, E1351 Count Negative Numbers in a Sorted Matrix
M1318 description: Given 3 positive numbers a, b and c, return the minimum number of flips required in some bits of a and b to make ( a OR b == c ) (bitwise OR operation). A flip operation consists of changing any single bit from 1 to 0, or from 0 to 1, in their binary representation. Constraints: $1 \le a \le 10^9$, $1 \le b \le 10^9$, $1 \le c \le 10^9$.
This problem is very easy. A typical algorithm is to go through each bit position of the three numbers: if the bit of $c$ at that position is 1 but the bit of $a|b$ is 0, we add one flip; if the bit of $c$ is 0, we add one flip if the bit of $a$ is 1 and another one if the bit of $b$ is 1. (Since the numbers are positive, we don’t need to check the 32nd bit.)
Putting these together, let’s make a table:

bit of c is 1, bit of a|b is 0 : add 1
bit of c is 0, bit of a&b is 1 : add 2
bit of c is 0, bit of a|b is 1 (and bit of a&b is 0) : add 1
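For example, with $a=2$ (binary 010), $b=6$ (110) and $c=5$ (101): at bit 0, c is 1 but a|b is 0, add 1; at bit 1, c is 0 and a&b is 1, add 2; bit 2 already matches. The answer is 3.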
We can apply the bitwise operations to the whole numbers beforehand, instead of checking each bit of the three numbers separately inside the loop. So here’s the first version that I submitted:
class Solution {
public:
    int minFlips(int a, int b, int c) {
        int r=0;
        int t=1;                  // mask for the current bit position
        int abc=(a|b)^c,aAb=a&b;  // abc: positions where a|b differs from c
        for(int i=0;i<31;++i){
            if(abc&t){            // mismatch at this position
                r++;              // at least one flip is needed
                if(aAb&t) r++;    // c is 0 but both a and b are 1: two flips
            }
            t<<=1;
        }
        return r;
    }
};
Then I noticed that I can also fold in the ‘&’ operation to get rid of the branching. The version without branching reads,
class Solution {
public:
    int minFlips(int a, int b, int c) {
        int r=0;
        int abc=(a|b)^c,aAb=a&b&abc;  // aAb: mismatched positions where both a and b are 1
        for(int i=0;i<31;++i) r+=(abc>>i&1)+(aAb>>i&1);
        return r;
    }
};
At this point, it’s obvious that we are simply counting the number of ‘1’s in the binary representations of the two numbers, which is known as the Hamming weight or population count. The GCC compiler has a built-in function for that, “__builtin_popcount(int)”, so we can write it like this:
class Solution {
public:
    int minFlips(int a, int b, int c) {
        unsigned int abc=(a|b)^c,aAb=a&b&abc;
#ifdef __GNUG__
        return __builtin_popcount(abc)+__builtin_popcount(aAb);
#else
        // fallback: count the bits manually
        int r=0;
        for(int i=0;i<31;++i) r+=(abc>>i&1)+(aAb>>i&1);
        return r;
#endif
    }
};
I did some research and noticed that LLVM also supports the function “__builtin_popcount”, and MSVC has the intrinsics “__popcnt16, __popcnt, __popcnt64” in <intrin.h>. We can dispatch to the right function according to the compiler.
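For instance, a portable wrapper might look like the following sketch (popcount32 is just an illustrative name; note also that since C++20 one can simply use std::popcount from <bit>):

#if defined(__GNUC__) || defined(__clang__)
inline int popcount32(unsigned int v){ return __builtin_popcount(v); }
#elif defined(_MSC_VER)
#include <intrin.h>
inline int popcount32(unsigned int v){ return (int)__popcnt(v); }
#else
inline int popcount32(unsigned int v){
    int r=0;
    for(;v;v&=v-1) ++r;  // Kernighan's trick: each step clears the lowest set bit
    return r;
}
#endif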
Then I found this webpage, which lists the best currently known bit hacks for many computation tasks. I’m fascinated by the esoteric beauty of the algorithms and magic numbers. Using the algorithm from the list, we can write the solution as
#ifndef __GNUG__
int popC_32(unsigned int v){
    v = v - ((v >> 1) & 0x55555555);                 // count the bits within each 2-bit chunk
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);  // sum adjacent chunks into 4-bit chunks
    return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;  // sum all bytes into the top byte
}
#endif
class Solution {
public:
    int minFlips(int a, int b, int c) {
        unsigned int abc=(a|b)^c,aAb=a&b&abc;
#ifdef __GNUG__
        return __builtin_popcount(abc)+__builtin_popcount(aAb);
#else
        return popC_32(abc)+popC_32(aAb);
#endif
    }
};
But since many CPUs support the POPCNT instruction, it’s probably better to use the built-in functions when they are available.
E1351 description: Given an m x n matrix grid which is sorted in non-increasing order both row-wise and column-wise, return the number of negative numbers in grid. Constraints: m == grid.length, n == grid[i].length, 1 <= m, n <= 100, -100 <= grid[i][j] <= 100.
An O(m+n) algorithm is easy to find: walk a column pointer from right to left past the negatives in each row; since the columns are also non-increasing, the pointer never needs to move back to the right.
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        int k=n-1,r=0;  // k: rightmost column that may still be non-negative
        for(int i=0;i<m;++i){
            while(k>=0&&grid[i][k]<0) k--;  // step past the negatives in this row
            if(k<0){r+=(m-i)*n;return r;}   // the remaining rows are entirely negative
            r+=n-1-k;                       // negatives in this row
        }
        return r;
    }
};
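For instance, on the sample grid [[4,3,2,-1],[3,2,1,-1],[1,1,-1,-2],[-1,-1,-2,-3]]: row 0 moves k from 3 to 2 and adds 1; row 1 adds 1; row 2 moves k to 1 and adds 2; row 3 drives k below 0, so the last row contributes all 4 of its entries, for a total of 8.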
We may try to use binary search to increase speed,
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        int k=n,r=0;  // k: index of the first negative in the previous row
        for(int i=0;i<m;++i){
            // each row is partitioned into non-negatives followed by negatives,
            // so upper_bound with greater() finds the first value below 0
            k=upper_bound(grid[i].begin(),grid[i].begin()+k,0,greater())-grid[i].begin();
            if(k==0){r+=(m-i)*n;return r;}
            r+=n-k;
        }
        return r;
    }
};
But now it takes $O(m\log(n))$, which may be slower than $O(m+n)$. Also, if $m>n$, it would be better to do the binary search down the columns, in which case we must write our own upper-bound function. And because we would then access elements in different vectors, the overhead may be larger.
Let’s consider the complexity. For $m\log(n)$ to be less than $m+n$, we need $n>m(\log(n)-1)$. Since $n\leq 100$, we have $\log_2(n)-1<6$, so the condition is satisfied whenever $n>6m$ (for example, $m=10$, $n=80$: $m\log_2 n\approx 63<90=m+n$). But also considering the overhead and the constant coefficient, we may want a larger threshold, something like this:
int upperB(vector<vector<int>>& grid,int i,int k){
    // binary search down column i for the index of the first negative row, within [0,k)
    if(grid[0][i]<0) return 0;
    int l=0,r=k,mid;
    while(r-l>1){
        mid=(l+r)/2;
        if(grid[mid][i]>=0) l=mid;
        else r=mid;
    }
    return r;
}
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        if(n>8*m){        // wide matrix: binary search within each row
            int k=n,r=0;
            for(int i=0;i<m;++i){
                k=upper_bound(grid[i].begin(),grid[i].begin()+k,0,greater())-grid[i].begin();
                if(k==0){r+=(m-i)*n;return r;}
                r+=n-k;
            }
            return r;
        }
        else if(m>64*n){  // tall matrix: binary search down each column
            int k=m,r=0;
            for(int i=0;i<n;++i){
                k=upperB(grid,i,k);
                if(k==0){r+=(n-i)*m;return r;}
                r+=m-k;
            }
            return r;
        }
        else{             // otherwise, the O(m+n) staircase walk
            int k=n-1,r=0;
            for(int i=0;i<m;++i){
                while(k>=0&&grid[i][k]<0) k--;
                if(k<0){r+=(m-i)*n;return r;}
                r+=n-1-k;
            }
            return r;
        }
    }
};
But considering that the size of the input is quite small, it won’t show much difference in this case, unless there are a lot of test cases where one dimension of the input is much larger than the other.
Then I thought: instead of doing binary search vertically or horizontally, what if we do it… diagonally? The good thing about it is that we are guaranteed to get rid of at least half of the input after each step, because if we choose any point on the diagonal and cut the rectangle into four smaller ones, the total area of the top-left and the bottom-right parts must be at least half of the area of the original rectangle.
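To see why, suppose the chosen point sits a fraction $t$ of the way along the diagonal of an $m\times n$ rectangle. The top-left and bottom-right parts then have total area

$$tm\cdot tn+(1-t)m\cdot(1-t)n=mn\left(t^2+(1-t)^2\right)\geq\frac{mn}{2},$$

with the minimum attained at $t=1/2$, i.e. when the point is the center.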
The following code implements this algorithm:
// count negatives in the sub-rectangle with rows [tl_r,br_r) and columns [tl_c,br_c)
int negCount(int tl_r,int tl_c,int br_r,int br_c,vector<vector<int>>& grid){
    if(tl_r==br_r||tl_c==br_c) return 0;  // empty rectangle
    if(br_r-tl_r==1){  // single row: 1D binary search for the first negative
        if(grid[tl_r][tl_c]<0) return br_c-tl_c;  // the whole row is negative
        int c1=tl_c,c2=br_c,cm;  // invariant: grid[tl_r][c1]>=0, grid[tl_r][c2]<0 (or c2==br_c)
        while(c2-c1>1){
            cm=(c1+c2)/2;
            if(grid[tl_r][cm]>=0)c1=cm;
            else c2=cm;
        }
        return br_c-c2;
    }
    if(br_c-tl_c==1){  // single column: symmetric
        if(grid[tl_r][tl_c]<0) return br_r-tl_r;
        int r1=tl_r,r2=br_r,rm;
        while(r2-r1>1){
            rm=(r1+r2)/2;
            if(grid[rm][tl_c]>=0)r1=rm;
            else r2=rm;
        }
        return br_r-r2;
    }
    // binary search along the diagonal for the sign change;
    // invariant: grid[r1][c1]>=0 and grid[r2][c2]<0 (the boundary counts as negative)
    int r1=tl_r,r2=br_r,c1=tl_c,c2=br_c,rm,cm;
    if(grid[r1][c1]<0){r2=r1;c2=c1;}  // the whole rectangle is negative
    else{
        while(r2-r1>1&&c2-c1>1){
            rm=(r1+r2)/2;
            cm=(c1+c2)/2;
            if(grid[rm][cm]>=0){r1=rm;c1=cm;}
            else{r2=rm;c2=cm;}
        }
        // one dimension has converged; finish the search along the other.
        // (cm/rm hold the last midpoint, which equals either c1 or c2 / r1 or r2;
        // either column/row is correctly partitioned, so the search stays valid.)
        while(r2-r1>1){
            rm=(r1+r2)/2;
            if(grid[rm][cm]>=0)r1=rm;
            else r2=rm;
        }
        while(c2-c1>1){
            cm=(c1+c2)/2;
            if(grid[rm][cm]>=0)c1=cm;
            else c2=cm;
        }
    }
    // the block at and below-right of (r2,c2) is all negative, the block above-left
    // of it is all non-negative; recurse on the top-right and bottom-left rectangles
    return (br_r-r2)*(br_c-c2)+negCount(tl_r,c2,r2,br_c,grid)+negCount(r2,tl_c,br_r,c2,grid);
}
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        return negCount(0,0,m,n,grid);
    }
};
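As a quick sanity check, here is a hypothetical standalone driver (not part of the LeetCode submission) run on the sample grid from the problem statement:

#include <bits/stdc++.h>
using namespace std;

// ... negCount and the Solution class exactly as above ...

int main(){
    vector<vector<int>> grid{{4,3,2,-1},{3,2,1,-1},{1,1,-1,-2},{-1,-1,-2,-3}};
    cout << Solution().countNegatives(grid) << endl;  // prints 8
}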
What’s the complexity of this algorithm? Let’s take a look. Without loss of generality, assume $m>n$. The first call first takes $\log(n)$ steps until c1 and c2 meet, then $\log(m/n)$ more to let r1 and r2 meet, so it takes $O(\log(m))$ in total.

Next, the complication starts. The time depends on the shapes of the two rectangles produced by the first call, so it’s hard to tell in general. If the point on the diagonal where the positive and negative values meet has index $(m_1,n_1)$, then the cost of the next two calls is $\log(\max(m-m_1,n_1))+\log(\max(m_1,n-n_1))$. The maximum this can reach is $\log(m)+\log(n)$, which happens when the point is near the lower-right corner - but that also means only two thin 1D strips are left, and we just need two more 1D upper-bound searches. Considering this, the worst case is probably when the point is at the center. Let’s work out the complexity in that scenario.
Each time, the rectangle is cut into 4 equal smaller ones, and two of them become the input of the next calls. The complexity is $\log(m)+2\log(m/2)+4\log(m/4)+8\log(m/8)+\dots$, but the sum doesn’t go all the way down to $2^{\log(m)}\log(m/m)$; it stops at $2^{\log(n)}\log(m/n)$, because at that point we only have 1D arrays left to search.
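Writing the sum out with $K=\log_2 n$ (so $2^K=n$) and using the identity $\sum_{k=0}^{K}k\,2^k=(K-1)2^{K+1}+2$, a sketch of the algebra:

$$\sum_{k=0}^{K}2^k\log_2\frac{m}{2^k}=\left(2^{K+1}-1\right)\log_2 m-\sum_{k=0}^{K}k\,2^k=2n\log_2\frac{m}{n}+2n-\log_2 m-2.$$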
Summing up the terms, I get that the complexity is O(f(m,n)), where $m\geq n$ and $f(m,n)=2n\log(m/n)+2n-\log m-2$. When $m\gg n$, this is approximately $O(n\log(m/n))$, and when $m=n$, it becomes $O(m+n)$. I think it’s a pretty good algorithm, although it’s hard to tell with such small input sizes.
A variation of this algorithm: instead of finding the exact cell that separates the positive and negative values, we can stop once the smaller dimension has converged, so the search takes $O(\log(\min(m,n)))$ instead of $O(\log(\max(m,n)))$. Then the two rectangles fed into the next call will overlap in the larger dimension by a segment of size around $m/n$. The complexity analysis of this variant would be even more complicated. Anyway, I wonder how the performance would be at a larger scale.