mirror of
https://github.com/scemama/Bit-Twiddling-Hacks-By-Sean-Eron-Anderson.git
synced 2024-11-09 07:33:43 +01:00
Update README.md
This commit is contained in:
parent
77303a8e9f
commit
c615cebfd4
182
README.md
@ -1618,20 +1618,14 @@ for (int i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer
|
Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so x and y are combined into a single number that can be compared easily and has the property that a number is usually close to another if their x and y values are close.
|
||||||
coordinates, so x and y are combined into a single number that can be
|
|
||||||
compared easily and has the property that a number is usually close to
|
|
||||||
another if their x and y values are close.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="InterleaveTableLookup">
|
<a name="InterleaveTableLookup">
|
||||||
### Interleave bits by table lookup
|
### Interleave bits by table lookup
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
static const unsigned short MortonTable256[256] =
|
static const unsigned short MortonTable256[256] =
|
||||||
{
|
{
|
||||||
@ -1679,25 +1673,16 @@ MortonTable256[x >> 8] << 16 |
|
|||||||
MortonTable256[x & 0xFF];
|
MortonTable256[x & 0xFF];
|
||||||
|
|
||||||
```
|
```
|
||||||
For more speed, use an additional table with values that are
|
For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left. This second table could then be used for the y lookups, thus reducing the operations by two, but almost doubling the memory required.
|
||||||
MortonTable256 pre-shifted one bit to the left. This second table
|
Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.
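
As a rough illustration of the table-lookup approach, the sketch below builds a 256-entry table at run time instead of spelling it out, then combines four lookups. The names `morton_table`, `morton_table_init`, and `interleave16_table` are hypothetical and do not appear in the original text. The casts to `uint32_t` are an added precaution so that the shifts operate on an unsigned 32-bit value.

```c
#include <stdint.h>

/* Hypothetical helper table: entry i holds the bits of i spread into the
   even bit positions (bit k of i moves to bit 2k). */
static uint16_t morton_table[256];
static int morton_table_ready = 0;

static void morton_table_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint16_t spread = 0;
        for (unsigned k = 0; k < 8; k++)
            spread |= (uint16_t)(((i >> k) & 1u) << (2 * k));
        morton_table[i] = spread;
    }
    morton_table_ready = 1;
}

/* Interleave two 16-bit values: x goes to the even bits, y to the odd bits. */
static uint32_t interleave16_table(uint16_t x, uint16_t y)
{
    if (!morton_table_ready)
        morton_table_init();   /* lazy initialization, for the sketch only */
    return (uint32_t)morton_table[y >> 8]   << 17 |
           (uint32_t)morton_table[x >> 8]   << 16 |
           (uint32_t)morton_table[y & 0xFF] <<  1 |
           (uint32_t)morton_table[x & 0xFF];
}
```

A pre-shifted second table, as described above, would remove the `<< 1` and `<< 17` shifts at the cost of another 512 bytes.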
|
||||||
could then be used for the y lookups, thus reducing the
|
|
||||||
operations by two, but almost doubling the memory required.
|
|
||||||
Extending this same idea, four tables could be used, with two of them
|
|
||||||
pre-shifted by 16 to the left of the previous two, so that we would
|
|
||||||
only need 11 operations total.
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="Interleave64bitOps">
|
<a name="Interleave64bitOps">
|
||||||
### Interleave bits with 64-bit multiply</a>
|
### Interleave bits with 64-bit multiply
|
||||||
|
</a>
|
||||||
|
|
||||||
|
In 11 operations, this version interleaves bits of two bytes (rather than shorts, as in the other versions), but many of the operations are 64-bit multiplies so it isn't appropriate for all machines. The input parameters, x and y, should be less than 256.
|
||||||
In 11 operations, this version interleaves bits of two bytes
|
|
||||||
(rather than shorts, as in the other versions),
|
|
||||||
but many of the operations are 64-bit multiplies
|
|
||||||
so it isn't appropriate for all machines. The input parameters, x and y,
|
|
||||||
should be less than 256.
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
unsigned char x; // Interleave bits of (8-bit) x and y, so that all of the
|
unsigned char x; // Interleave bits of (8-bit) x and y, so that all of the
|
||||||
@ -1710,13 +1695,10 @@ z = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) *
|
|||||||
0x0102040810204081ULL >> 48) & 0xAAAA;
|
0x0102040810204081ULL >> 48) & 0xAAAA;
|
||||||
```
|
```
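
The multiply-based interleave can be wrapped in a function roughly as follows. This is a sketch under assumptions: the helper names `spread8_mul` and `interleave8_mul` are hypothetical, and the `>> 49` shift with the `0x5555` mask for the x half is chosen to mirror the `>> 48` and `0xAAAA` used for the y half in the snippet above.

```c
#include <stdint.h>

/* Multiply-and-mask so that bit k of b ends up at bit 49 + 2k of the result
   (other bit positions hold leftovers that the caller masks away). */
static uint64_t spread8_mul(uint8_t b)
{
    uint64_t replicated = b * 0x0101010101010101ULL;       /* 8 copies of b      */
    uint64_t picked = replicated & 0x8040201008040201ULL;  /* bit k from copy k  */
    return picked * 0x0102040810204081ULL;
}

/* Interleave two bytes: x goes to the even bits, y to the odd bits. */
static uint16_t interleave8_mul(uint8_t x, uint8_t y)
{
    return (uint16_t)(((spread8_mul(x) >> 49) & 0x5555) |
                      ((spread8_mul(y) >> 48) & 0xAAAA));
}
```

Taking `uint8_t` parameters enforces the requirement that both inputs be less than 256.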
|
||||||
|
|
||||||
Holger Bettag was inspired to suggest this technique on
|
Holger Bettag was inspired to suggest this technique on October 10, 2004 after reading the multiply-based bit reversals here.
|
||||||
October 10, 2004 after reading the multiply-based bit reversals here.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="InterleaveBMN">
|
<a name="InterleaveBMN">
|
||||||
### Interleave bits by Binary Magic Numbers
|
### Interleave bits by Binary Magic Numbers
|
||||||
</a>
|
</a>
|
||||||
@ -1744,25 +1726,19 @@ y = (y | (y << S[0])) & B[0];
|
|||||||
z = x | (y << 1);
|
z = x | (y << 1);
|
||||||
```
|
```
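
For reference, the magic-number steps can be sketched as a pair of functions. The names `spread16` and `interleave16_bmn` are hypothetical, and the mask constants are assumed to be the usual binary magic numbers for the shift amounts 8, 4, 2, and 1; the hunk above does not show the `B` and `S` arrays.

```c
#include <stdint.h>

/* Spread the low 16 bits of v into the even bit positions.
   Assumes v < 65536. */
static uint32_t spread16(uint32_t v)
{
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Interleave two 16-bit values: x goes to the even bits, y to the odd bits. */
static uint32_t interleave16_bmn(uint16_t x, uint16_t y)
{
    return spread16(x) | (spread16(y) << 1);
}
```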
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="ZeroInWord">
|
<a name="ZeroInWord">
|
||||||
### Determine if a word has a zero byte
|
### Determine if a word has a zero byte
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
// Fewer operations:
|
// Fewer operations:
|
||||||
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
|
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
|
||||||
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
|
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
|
||||||
```
|
```
|
||||||
|
|
||||||
The code above may be useful when doing a fast string copy in which a word
|
The code above may be useful when doing a fast string copy in which a word is copied at a time; it uses 5 operations. On the other hand, the obvious ways of testing for a null byte (which follow) take at least 7 operations (when counted in the most sparing way), and at most 12.
|
||||||
is copied at a time; it uses 5 operations.
|
|
||||||
On the other hand, testing for a null byte in the obvious ways (which follow)
|
|
||||||
have at least 7 operations (when counted in the most sparing way), and at most 12.
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
// More operations:
|
// More operations:
|
||||||
@ -1772,24 +1748,7 @@ unsigned char * p = (unsigned char *) &v;
|
|||||||
bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);
|
bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);
|
||||||
```
|
```
|
||||||
|
|
||||||
The code at the beginning of this section (labeled "Fewer operations")
|
The code at the beginning of this section (labeled "Fewer operations") works by first zeroing the high bits of the 4 bytes in the word. Subsequently, it adds a number that will result in an overflow to the high bit of a byte if any of the low bits were initially set. Next, the high bits of the original word are ORed with these values; thus, the high bit of a byte is set iff any bit in the byte was set. Finally, we determine if any of these high bits are zero by ORing with ones everywhere except the high bits and inverting the result. Extending to 64 bits is trivial; simply widen the constants to `0x7F7F7F7F7F7F7F7F`. For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine if the word <em>may</em> have a zero byte. The test also returns true if the high byte is `0x80`, so there are occasional false positives, but the slower and more reliable version above may then be used on candidates for an overall increase in speed with correct output.
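
The four steps of the "Fewer operations" version can be written out one per line. This is only a sketch; the function name `zero_byte_mask` and the intermediate variable names are hypothetical.

```c
#include <stdint.h>

/* Returns a word with 0x80 in every byte position where v has a zero byte,
   and 0 elsewhere. */
static uint32_t zero_byte_mask(uint32_t v)
{
    uint32_t low7    = v & 0x7F7F7F7Fu;      /* clear the high bit of every byte       */
    uint32_t carried = low7 + 0x7F7F7F7Fu;   /* high bit set iff any low bit was set   */
    uint32_t any_bit = carried | v;          /* high bit set iff the byte is nonzero   */
    return ~(any_bit | 0x7F7F7F7Fu);         /* keep only high bits, then invert       */
}
```

Because each byte of `low7` is at most `0x7F`, the addition never carries from one byte into the next, which is why every byte can be judged independently.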
|
||||||
works by first zeroing the high bits of the 4 bytes in the word.
|
|
||||||
Subsequently, it adds a number that will result in an overflow to
|
|
||||||
the high bit of a byte if any of the low bits were initially set.
|
|
||||||
Next the high bits of the original word are ORed with these values;
|
|
||||||
thus, the high bit of a byte is set iff any bit in the byte was set.
|
|
||||||
Finally, we determine if any of these high bits are zero by ORing with
|
|
||||||
ones everywhere except the high bits and inverting the result.
|
|
||||||
Extending to 64 bits is trivial; simply increase the constants to be
|
|
||||||
0x7F7F7F7F7F7F7F7F.
|
|
||||||
|
|
||||||
For an additional improvement, a fast pretest that requires only 4 operations
|
|
||||||
may be performed to determine if the word <em>may</em> have a zero byte.
|
|
||||||
The test also returns true if the high byte is 0x80, so there are
|
|
||||||
occasional false positives, but the slower and more reliable version
|
|
||||||
above may then be used on candidates for an overall increase in speed with
|
|
||||||
correct output.
|
|
||||||
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
|
bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
|
||||||
@ -1800,71 +1759,37 @@ if (hasZeroByte) // or may just have 0x80 in the high byte
|
|||||||
```
|
```
|
||||||
|
|
||||||
There is yet a faster method —
|
There is yet a faster method —
|
||||||
use <a href="http://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord">`hasless`</a>(v, 1),
|
use [hasless](http://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord)(v, 1), which is defined below; it works in 4 operations and requires no subsequent verification. It simplifies to
|
||||||
which is defined below; it
|
|
||||||
works in 4 operations and requires no subsequent verification. It simplifies
|
|
||||||
to
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
|
#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
|
||||||
```
|
```
|
||||||
|
|
||||||
The subexpression (v - 0x01010101UL), evaluates to a high bit set in any
|
The subexpression `(v - 0x01010101UL)` evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than `0x80`. The subexpression `~v & 0x80808080UL` evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than `0x80`). Finally, ANDing these two subexpressions leaves the high bits set where the bytes in v were zero, since the high bits set due to a value greater than `0x80` in the first subexpression are masked off by the second. Paul Messmer suggested the fast pretest improvement on October 2, 2004. Juha Järvi later suggested `hasless(v, 1)` on April 6, 2005, which he found on [Paul Hsieh's Assembly Lab](http://www.azillionmonkeys.com/qed/asmexample.html); previously it was written in a newsgroup post on April 27, 1987 by Alan Mycroft.
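
A fixed-width version of `haszero` might look like the sketch below. The function names `has_zero32` and `has_zero64` are hypothetical; using `uint32_t` and `uint64_t` is an assumption made here to avoid depending on the width of `unsigned long`.

```c
#include <stdint.h>

/* Nonzero iff some byte of the 32-bit word v is zero. */
static uint32_t has_zero32(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Same test on a 64-bit word, with the constants widened. */
static uint64_t has_zero64(uint64_t v)
{
    return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}
```

Only the truth value of the result should be relied on: a borrow out of a zero byte can also set the flag of the byte above it, so the result is not an exact per-byte mask.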
|
||||||
byte whenever the corresponding byte in v is zero or greater than 0x80.
|
|
||||||
The sub-expression ~v & 0x80808080UL
|
|
||||||
evaluates to high bits set in bytes where the byte of v doesn't have its high
|
|
||||||
bit set (so the byte was less than 0x80). Finally, by ANDing these two
|
|
||||||
sub-expressions the result is the high bits set where the bytes in v
|
|
||||||
were zero, since the high bits set due to a value greater than 0x80
|
|
||||||
in the first sub-expression are masked off by the second.
|
|
||||||
|
|
||||||
Paul Messmer suggested the fast pretest improvement on October 2, 2004.
|
|
||||||
Juha Järvi later suggested `hasless(v, 1)`
|
|
||||||
on April 6, 2005, which
|
|
||||||
he found on <a href="http://www.azillionmonkeys.com/qed/asmexample.html">Paul
|
|
||||||
Hsieh's Assembly Lab</a>; previously it was written in a newsgroup post
|
|
||||||
on April 27, 1987 by Alan Mycroft.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="#ValueInWord">
|
<a name="#ValueInWord">
|
||||||
### Determine if a word has a byte equal to n
|
### Determine if a word has a byte equal to n
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
We may want to know if any byte in a word has a specific value. To do so, we can XOR the value to test with a word that has been filled with the byte values in which we're interested. Because XORing a value with itself results in a zero byte and nonzero otherwise, we can pass the result to `haszero`.
|
||||||
We may want to know if any byte in a word has a specific value. To do so,
|
|
||||||
we can XOR the value to test with a word that has been filled with the
|
|
||||||
byte values in which we're interested. Because XORing a value with itself
|
|
||||||
results in a zero byte and nonzero otherwise, we can pass the result to
|
|
||||||
`haszero`.
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#define hasvalue(x,n) \
|
#define hasvalue(x,n) \
|
||||||
(haszero((x) ^ (~0UL/255 * (n))))
|
(haszero((x) ^ (~0UL/255 * (n))))
|
||||||
```
|
```
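
The same idea, written as a function on a 32-bit word, could be sketched as follows. The name `has_byte_equal` is hypothetical, and `0x01010101u * n` plays the role of `~0UL/255 * (n)` for a 32-bit word.

```c
#include <stdint.h>

/* Nonzero iff some byte of x equals n. */
static uint32_t has_byte_equal(uint32_t x, uint8_t n)
{
    uint32_t pattern = 0x01010101u * n;   /* n copied into all four bytes        */
    uint32_t diff = x ^ pattern;          /* zero byte wherever x's byte equals n */
    return (diff - 0x01010101u) & ~diff & 0x80808080u;   /* haszero(diff) */
}
```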
|
||||||
|
|
||||||
Stephen M Bennet suggested this on December 13, 2009 after reading the entry
|
Stephen M Bennet suggested this on December 13, 2009 after reading the entry for `haszero`.
|
||||||
for `haszero`.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="HasLessInWord">
|
<a name="HasLessInWord">
|
||||||
### Determine if a word has a byte less than n
|
### Determine if a word has a byte less than n
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
Test if a word x contains an unsigned byte with value < n. Specifically for n=1, it can be used to find a 0-byte by examining one long at a time, or any byte by XORing x with a mask first. Uses 4 arithmetic/logical operations when n is constant.
|
||||||
|
Requirements: `x>=0; 0<=n<=128`
|
||||||
Test if a word x contains an unsigned byte with value < n.
|
|
||||||
Specifically for n=1, it can be used to find a 0-byte by examining one
|
|
||||||
long at a time, or any byte by XORing x with a mask first.
|
|
||||||
Uses 4 arithmetic/logical operations when n is constant.
|
|
||||||
|
|
||||||
Requirements: x>=0; 0<=n<=128
|
|
||||||
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
|
#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
|
||||||
@ -1875,24 +1800,16 @@ To count the number of bytes in x that are less than n in 7 operations, use
|
|||||||
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
|
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
|
||||||
```
|
```
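
As a sketch of how `hasless` and `countless` behave on a 32-bit word, the functions below expand the macros with explicit parentheses. The names `ONES32`, `has_less32`, and `count_less32` are hypothetical; `ONES32` stands in for `~0UL/255` on a 32-bit word.

```c
#include <stdint.h>

#define ONES32 0x01010101u   /* 0x01 in every byte */

/* Nonzero iff some byte of x is less than n. Requires 0 <= n <= 128. */
static uint32_t has_less32(uint32_t x, uint32_t n)
{
    return (x - ONES32 * n) & ~x & (ONES32 * 128u);
}

/* Number of bytes of x that are less than n. Requires 0 <= n <= 128. */
static uint32_t count_less32(uint32_t x, uint32_t n)
{
    uint32_t flags = (ONES32 * (127u + n) - (x & ONES32 * 127u))
                     & ~x & (ONES32 * 128u);   /* 0x80 per matching byte */
    return flags / 128u % 255u;                /* sum the per-byte flags */
}
```

For example, with `x = 0x10203040`, two bytes (`0x10` and `0x20`) are less than `0x25`.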
|
||||||
|
|
||||||
|
Juha Järvi sent this clever technique to me on April 6, 2005. The `countless` macro was added by Sean Anderson on April 10, 2005, inspired by Juha's `countmore`, below.
|
||||||
Juha Järvi sent this clever technique to me on April 6, 2005. The
|
|
||||||
`countless` macro was added by Sean Anderson on
|
|
||||||
April 10, 2005, inspired by Juha's `countmore`, below.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="HasMoreInWord">
|
<a name="HasMoreInWord">
|
||||||
### Determine if a word has a byte greater than n
|
### Determine if a word has a byte greater than n
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
Test if a word x contains an unsigned byte with `value > n`. Uses 3 arithmetic/logical operations when n is constant.
|
||||||
Test if a word x contains an unsigned byte with value > n.
|
Requirements: `x>=0; 0<=n<=127`
|
||||||
Uses 3 arithmetic/logical operations when n is constant.
|
|
||||||
|
|
||||||
Requirements: x>=0; 0<=n<=127
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)
|
#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)
|
||||||
@ -1903,67 +1820,41 @@ To count the number of bytes in x that are more than n in 6 operations, use:
|
|||||||
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
|
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
|
||||||
```
|
```
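
A 32-bit sketch of `hasmore` and `countmore`, with the operator precedence made explicit, is shown below. The names `ONES32_MORE`, `has_more32`, and `count_more32` are hypothetical.

```c
#include <stdint.h>

#define ONES32_MORE 0x01010101u   /* 0x01 in every byte */

/* Nonzero iff some byte of x is greater than n. Requires 0 <= n <= 127. */
static uint32_t has_more32(uint32_t x, uint32_t n)
{
    return ((x + ONES32_MORE * (127u - n)) | x) & (ONES32_MORE * 128u);
}

/* Number of bytes of x that are greater than n. Requires 0 <= n <= 127. */
static uint32_t count_more32(uint32_t x, uint32_t n)
{
    uint32_t flags = (((x & ONES32_MORE * 127u) + ONES32_MORE * (127u - n)) | x)
                     & (ONES32_MORE * 128u);   /* 0x80 per matching byte */
    return flags / 128u % 255u;                /* sum the per-byte flags */
}
```

The counting variant masks `x` with `0x7F` per byte before adding, so no carry can spill into a neighboring byte and each flag is exact.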
|
||||||
|
|
||||||
The macro `hasmore` was suggested by Juha Järvi on
|
The macro `hasmore` was suggested by Juha Järvi on April 6, 2005, and he added `countmore` on April 8, 2005.
|
||||||
April 6, 2005, and he added `countmore` on April 8, 2005.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="HasBetweenInWord">
|
<a name="HasBetweenInWord">
|
||||||
### Determine if a word has a byte between m and n
|
### Determine if a word has a byte between m and n
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
When `m < n`, this technique tests if a word x contains an unsigned byte value, such that `m < value < n`. It uses 7 arithmetic/logical operations when n and m are constant. Note: Bytes that equal n can be reported by `likelyhasbetween`
|
||||||
When m < n, this technique tests if a word x contains an
|
as false positives, so this should be checked by character if a certain result is needed.
|
||||||
unsigned byte value, such that m < value < n.
|
Requirements: `x>=0; 0<=m<=127; 0<=n<=128`
|
||||||
<!--When m > n, it tests for byte values
|
|
||||||
outside the range; that is value < n and m <= value.-->
|
|
||||||
It uses 7 arithmetic/logical operations when n and m are constant.
|
|
||||||
|
|
||||||
Note: Bytes that equal n can be reported by `likelyhasbetween`
|
|
||||||
as false positives,
|
|
||||||
so this should be checked by character if a certain result is needed.
|
|
||||||
|
|
||||||
Requirements: x>=0; 0<=m<=127; 0<=n<=128
|
|
||||||
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
#define likelyhasbetween(x,m,n) \
|
#define likelyhasbetween(x,m,n) \
|
||||||
((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
|
((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
|
||||||
```
|
```
|
||||||
This technique would be suitable for a fast pretest. A variation that
|
This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for constant m and n) but provides the exact answer is:
|
||||||
takes one more operation (8 total for constant m and n)
|
|
||||||
but provides the exact answer is:
|
|
||||||
```c
|
```c
|
||||||
#define hasbetween(x,m,n) \
|
#define hasbetween(x,m,n) \
|
||||||
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
|
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
|
||||||
```
|
```
|
||||||
To count the number of bytes in x that are between m and n (exclusive)
|
To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
|
||||||
in 10 operations, use:
|
|
||||||
```c
|
```c
|
||||||
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)
|
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)
|
||||||
```
|
```
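
The exact range test and the count can be sketched for a 32-bit word as follows. The names `ONES32_BETWEEN`, `has_between32`, and `count_between32` are hypothetical; the parentheses spell out the grouping that the macro gets from operator precedence.

```c
#include <stdint.h>

#define ONES32_BETWEEN 0x01010101u   /* 0x01 in every byte */

/* 0x80 in each byte position where m < byte < n, 0 elsewhere.
   Requires 0 <= m <= 127 and 0 <= n <= 128. */
static uint32_t has_between32(uint32_t x, uint32_t m, uint32_t n)
{
    uint32_t low7  = x & (ONES32_BETWEEN * 127u);
    uint32_t below = ONES32_BETWEEN * (127u + n) - low7;   /* high bit set if low7 < n */
    uint32_t above = low7 + ONES32_BETWEEN * (127u - m);   /* high bit set if low7 > m */
    return below & ~x & above & (ONES32_BETWEEN * 128u);
}

/* Number of bytes of x strictly between m and n. */
static uint32_t count_between32(uint32_t x, uint32_t m, uint32_t n)
{
    return has_between32(x, m, n) / 128u % 255u;
}
```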
|
||||||
|
|
||||||
Juha Järvi suggested `likelyhasbetween` on April 6, 2005.
|
Juha Järvi suggested `likelyhasbetween` on April 6, 2005. From there, Sean Anderson created `hasbetween` and `countbetween` on April 10, 2005.
|
||||||
From there,
|
|
||||||
Sean Anderson created `hasbetween` and
|
|
||||||
`countbetween` on April 10, 2005.
|
|
||||||
|
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
|
|
||||||
|
|
||||||
<a name="NextBitPermutation">
|
<a name="NextBitPermutation">
|
||||||
### Compute the lexicographically next bit permutation
|
### Compute the lexicographically next bit permutation
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a lexicographical sense. For example, if N is 3 and the bit pattern is `00010011`, the next patterns would be `00010101`, `00010110`, `00011001`, `00011010`, `00011100`, `00100011`, and so forth. The following is a fast way to compute the next permutation.
|
||||||
Suppose we have a pattern of N bits set to 1 in an integer and we want the
|
|
||||||
next permutation of N 1 bits in a lexicographical sense.
|
|
||||||
For example, if N is 3 and the bit pattern is 00010011, the next patterns
|
|
||||||
would be 00010101, 00010110, 00011001,00011010, 00011100, 00100011,
|
|
||||||
and so forth. The following is a fast way to compute the next permutation.
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
unsigned int v; // current permutation of bits
|
unsigned int v; // current permutation of bits
|
||||||
@ -1975,25 +1866,14 @@ unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
|
|||||||
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
|
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
|
||||||
```
|
```
|
||||||
|
|
||||||
The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the
|
The `__builtin_ctz(v)` GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for x86, the intrinsic is `_BitScanForward`. These both emit a `bsf` instruction, but equivalents may be available for other architectures. If not, then consider using one of the methods for counting the consecutive zero bits
|
||||||
number of trailing zeros. If you are using Microsoft compilers for x86,
|
mentioned earlier. Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros.
|
||||||
the intrinsic is _BitScanForward. These both emit a bsf instruction, but
|
|
||||||
equivalents may be available for other architectures. If not, then
|
|
||||||
consider using one of the methods for counting the consecutive zero bits
|
|
||||||
mentioned earlier.
|
|
||||||
|
|
||||||
Here is another version that tends to be slower because of its
|
|
||||||
division operator, but it does not require counting the trailing zeros.
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
unsigned int t = (v | (v - 1)) + 1;
|
unsigned int t = (v | (v - 1)) + 1;
|
||||||
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
|
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
|
||||||
```
|
```
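
Wrapped in a function, the division-based version can be used to step through the patterns from the example above. The name `next_bit_permutation` is hypothetical, and the sketch assumes `v` is nonzero (otherwise it divides by zero) and that the next permutation still fits in an `unsigned int`.

```c
/* Returns the lexicographically next value with the same number of 1 bits.
   Assumes v != 0 and that the result does not overflow unsigned int. */
static unsigned int next_bit_permutation(unsigned int v)
{
    unsigned int t = (v | (v - 1)) + 1;               /* clear the lowest run of 1s, set the bit above it */
    return t | ((((t & -t) / (v & -v)) >> 1) - 1);    /* put the remaining 1s back at the bottom          */
}
```

Starting from `00010011` (0x13), repeated calls yield 0x15, 0x16, 0x19, and so on, matching the sequence listed earlier in this section.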
|
||||||
|
|
||||||
Thanks to Dario Sneidermanis of Argentina, who provided this on
|
Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.
|
||||||
November 28, 2009.
|
|
||||||
|
|
||||||
|
[A Belorussian translation](http://webhostingrating.com/libs/bithacks-be) (provided by [Webhostingrating](http://webhostingrating.com/)) is available.
|
||||||
<a href="http://webhostingrating.com/libs/bithacks-be">
|
|
||||||
A Belorussian translation</a> (provided by <a href="http://webhostingrating.com/">Webhostingrating</a>)
|
|
||||||
is available.
|
|
||||||
|