diff --git a/README.md b/README.md index 402224f..80080d8 100644 --- a/README.md +++ b/README.md @@ -85,36 +85,12 @@ Individually, the code snippets here are in the public domain (unless otherwise * [Determine if a word has a byte between m and n](#HasBetweenInWord) * [Compute the lexicographically next bit permutation](#NextBitPermutation) -
- -
+ Alternatively, if you prefer the result be either -1 or +1, then use: -
+ ```c sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then -1, else +1 ``` -+ On the other hand, if you prefer the result be either -1, 0, or +1, then use: -
+ ```c sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); // Or, for more speed but less portability: @@ -165,15 +141,15 @@ and throughout rather than assuming bytes were 8 bits long. Angus recommended the more portable versions above, involving casting on March 4, 2006. Rohit Garg suggested the version for non-negative integers on September 12, 2009. -- -
-
+
On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C
specification leaves the result of signed right-shift implementation-defined,
so on some systems this hack might not work. I've read that ANSI C does not
@@ -219,7 +195,7 @@ that still use one's complement).
On March 14, 2004, Keith H. Duggar sent me the patented variation above; it is
superior to the one I initially came up with,
-r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v
,
+`r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v`,
because a multiply is not used.
Unfortunately, this method has been
patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and
@@ -255,15 +231,15 @@ the negative value of v to an unsigned by adding 2**N,
yielding a 2s complement representation of v's value that I'll call U.
Then, U is negated, giving the desired result,
-U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v = abs(v).
-
-
+ To find the maximum, use: -
+ ```c r = x ^ ((x ^ y) & -(x < y)); // max(x, y) ``` @@ -310,7 +286,7 @@ and y should be unsigned or cast to unsigned for the subtractions to avoid unnecessarily throwing an exception, however the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there. -+ On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E. Bryant alerted me to the need for the @@ -324,15 +300,15 @@ the potential for overflow exceptions with subtractions in r = y + ((x - y) & -(x < y)), which was the previous version. Timothy B. Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows on June 2, 2009. -
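As a rough illustration of the xor-based selection discussed here, the sketch below wraps the idiom in two functions. The names `branchless_min` and `branchless_max` are hypothetical and not part of the README; the min form is assumed to mirror the max form, with `y` in place of `x` as the outer operand.

```c
/* Hypothetical helper names; a sketch of the xor-select idiom, not the
   author's exact code. */

/* -(x < y) is all one bits when x < y and zero otherwise, so the mask
   either keeps (x ^ y) or clears it before the outer xor. */
int branchless_min(int x, int y)
{
    return y ^ ((x ^ y) & -(x < y));   /* y ^ x ^ y == x when x < y, else y */
}

int branchless_max(int x, int y)
{
    return x ^ ((x ^ y) & -(x < y));   /* x ^ x ^ y == y when x < y, else x */
}
```

Because no subtraction is involved, neither function can overflow, which is the property the xor variant was suggested for.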
-
+ John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed; otherwise, the sign is undefined. -
+ A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is: -
+ ```c int const m = CHAR_BIT * sizeof(x) - b; r = (x << m) >> m; ``` -
+
Sean A. Irvine suggested that I
add sign extension methods to this page on June 13, 2004, and he provided
m = (1 << (b - 1)) - 1; r = -(x & ~m) | x;
@@ -442,15 +418,15 @@ other than the b bits we wanted to sign-extend on Oct. 15, 2008.
On December 31, 2009 Chris Pirazzi suggested I add the faster version,
which requires two operations for constant bit-widths and three
for variable widths.
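For comparison with the shift-based version, here is a minimal sketch of sign extension from a variable bit-width that avoids shifting signed values. The function name `sign_extend` is hypothetical, and the xor-then-subtract formulation is one common portable variant, not necessarily the exact code the credits above refer to.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name. Interprets the low b bits of x as a two's complement
   number and returns its value as an int. Assumes 1 <= b < width of int. */
int sign_extend(unsigned int x, unsigned int b)
{
    assert(b >= 1 && b < sizeof(int) * CHAR_BIT);
    unsigned int m = 1U << (b - 1);  /* sign bit of the b-bit field */
    x &= (m << 1) - 1;               /* clear any bits above position b */
    return (int)(x ^ m) - (int)m;    /* flip the sign bit, then subtract it */
}
```

For example, `sign_extend(0xF, 4)` treats `1111` as a 4-bit value and yields -1, while `sign_extend(0x7, 4)` yields 7.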
-
-
-
-
- -
+ Ron Jeffery sent this to me on February 9, 2006. -
- -
-
+ On July 14, 2009 Hallvard Furuseth suggested the macro compacted table. -
-
+ Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. @@ -682,15 +658,15 @@ On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)" -
-
+ Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section of @@ -725,21 +701,21 @@ devised by Sean Anderson. Randal E. Bryant offered a couple bug fixes on May 3, 2005. Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits using the same number of operations on February 1, 2007. -
+ -
-
+ The best method for counting bits in a 32-bit integer v is the following: -
+ ```c v = v - ((v >> 1) & 0x55555555); // reuse input as temporary v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count ``` -+ The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but avoids the memory and potential cache misses of a table. @@ -786,10 +762,10 @@ with 64-bit instructions), though it doesn't use 64-bit instructions. The counts of bits set in the bytes is done in parallel, and the sum total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits. -
+ A generalization of the best bit counting method to integers of bit-widths up to 128 (parameterized by type T) is this: -
+ ```c v = v - ((v >> 1) & (T)~(T)0/3); // temp v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3); // temp @@ -797,7 +773,7 @@ v = (v + (v >> 4)) & (T)~(T)0/255*15; // temp c = (T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * CHAR_BIT; // count ``` -+ See Ian Ashdown's nice newsgroup post for more information on counting the number of bits set (also known as sideways addition). @@ -813,15 +789,15 @@ Eric Cole spotted on January 8, 2006. Eric later suggested the arbitrary bit-width generalization to the best method on November 17, 2006. On April 5, 2007, Al Williams observed that I had a line of dead code at the top of the first method. -
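To make the parallel counting steps easier to follow, the sketch below packages the 32-bit method as a function and pairs it with a simple loop that clears the lowest set bit on each pass, which serves as a cross-check. Both function names are hypothetical, and the sketch assumes `unsigned int` is 32 bits wide so that `uint32_t` arithmetic is not promoted to a signed type.

```c
#include <stdint.h>

/* Hypothetical names; 32-bit only. */
unsigned int popcount_parallel(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* sums of bit pairs */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* sums of nibbles */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* sums of bytes */
    return (uint32_t)(v * 0x01010101u) >> 24;         /* add the four bytes */
}

/* Reference implementation: one iteration per set bit. */
unsigned int popcount_loop(uint32_t v)
{
    unsigned int c = 0;
    for (; v; v &= v - 1) {  /* clear the least significant set bit */
        c++;
    }
    return c;
}
```

The multiply by `0x01010101` accumulates all four byte counts into the top byte, which the final shift extracts.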
-
-
+ Juha Järvi sent this to me on November 21, 2009. -
-
-
-
-
+ Andrew Shapira came up with this and sent it to me on Sept. 2, 2007. -
+ Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That optimization shaves two operations off using only shifting and XORing to find the parity. -
- -
+ Sanjeev Sivasankaran suggested I add this on June 12, 2007. Vincent Lefèvre pointed out the potential for overflow exceptions on July 9, 2008 -
+ On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider @@ -1106,15 +1082,15 @@ defining the macro as On July 14, 2009, Hallvard Furuseth suggested that on some machines, (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b))) might be faster, since the (a) ^ (b) expression is reused. -
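The aliasing problem described here can be sketched with a small function. This is a variation rather than the macro from the README: it compares addresses instead of values, and the name `xor_swap` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Swaps *a and *b with xor, skipping the case where
   both pointers refer to the same object (which would zero it). */
void xor_swap(unsigned int *a, unsigned int *b)
{
    assert(a != NULL && b != NULL);
    if (a != b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}
```

Without the guard, calling the function with the same address twice would compute `x ^ x` and leave zero in place of the original value.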
-
+ This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences overlap. -
+ On July 14, 2009 Hallvard Furuseth suggested that I change the 1 << n to 1U << n because the value was being assigned to an unsigned and to avoid shifting into a sign bit. -
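The following sketch shows the bit-range swap described in this section as a function, with the non-overlap requirement turned into an assertion. The function and parameter names are hypothetical; `i` and `j` are the starting positions of the two sequences and `n` is their length.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name. Swaps the n-bit sequence starting at bit i with the
   n-bit sequence starting at bit j in b. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i,
                             unsigned int j, unsigned int n)
{
    const unsigned int width = sizeof(unsigned int) * CHAR_BIT;
    assert(n >= 1 && n < width && i + n <= width && j + n <= width);
    assert(i + n <= j || j + n <= i);  /* the sequences must not overlap */

    /* x holds the xor of the two sequences; 1U avoids shifting into a
       sign bit. */
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1U);
    return b ^ ((x << i) | (x << j));  /* xor each sequence with x */
}
```

For instance, with `b = 0x2F` (binary 00101111), swapping the 3 bits at position 1 with the 3 bits at position 5 gives `0xE3` (binary 11100011).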
-
+ On October 15, 2004, Michael Hoisie pointed out a bug in the original version. Randal E. Bryant suggested removing an extra operation on May 3, 2005. @@ -1173,14 +1149,14 @@ Behdad Esfabod suggested a slight change that eliminated one iteration of the loop on May 18, 2005. Then, on February 6, 2007, Liyong Zhou suggested a better version that loops while v is not 0, so rather than iterating over all bits it stops early. -
-
+ On July 14, 2009 Hallvard Furuseth suggested the macro compacted table. -
-
+ This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. -
-
a, b, c, d, e, f, g,
and h
, which
+`a, b, c, d, e, f, g,` and `h`, which
comprise an 8-bit byte. Notice how the first multiply fans out the
bit pattern to multiple copies, while the last multiply combines them
in the fifth byte from the right.
@@ -1303,18 +1279,18 @@ Note that the last two steps can be combined on some processors because
the registers can be accessed as bytes;
just multiply so that a register stores the upper 32 bits of the result
and then take the low byte. Thus, it may take only 6 operations.
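As a way to experiment with the multiply-based reversal, the sketch below uses the commonly cited 64-bit constants for this technique and compares the result with a straightforward loop. The function names are hypothetical, and the constants are reproduced from the widely circulated form of the hack rather than from this diff, so treat them as an assumption.

```c
#include <stdint.h>

/* Hypothetical name. The first multiply fans the byte out into several
   copies, the mask keeps one bit from each copy, and the second multiply
   gathers those bits into the fifth byte from the right. */
uint8_t reverse_byte_mul(uint8_t b)
{
    return (uint8_t)((((uint64_t)b * 0x80200802ULL) & 0x0884422110ULL)
                     * 0x0101010101ULL >> 32);
}

/* Reference implementation: move one bit per iteration. */
uint8_t reverse_byte_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int k = 0; k < 8; k++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}
```

The cast to `uint8_t` plays the role of taking the low byte after the shift.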
-+ Devised by Sean Anderson, July 13, 2001. -
- -
-
+ See Dr. Dobb's Journal 1983, Edwin Freed's article on Binary Magic Numbers for more information. The second variation was suggested @@ -1374,15 +1350,15 @@ by Ken Raeburn on September 13, 2005. Veldmeijer mentioned that the first version could do without ANDS in the last line on March 19, 2006. -
-
-
-
+
Devised by Sean Anderson, August 15, 2001. Before Sean A. Irvine corrected me
on June 17, 2004, I mistakenly commented that we could alternatively assign
-m = ((m + 1) & d) - 1;
at the end. Michael Miller spotted a
+`m = ((m + 1) & d) - 1;` at the end. Michael Miller spotted a
typo in the code April 25, 2005.
-
-
+ It finds the result by summing the values in base (1 << s) in parallel. First every other base (1 << s) value is added to the previous one. Imagine that the result is written on a piece of paper. Cut the paper @@ -1544,29 +1520,29 @@ cuts, we cut no more; just continue to add the values and put the result onto a new piece of paper as before, while there are at least two s-bit values. -
+
Devised by Sean Anderson, August 20, 2001. A typo was spotted by
Randy E. Bryant on May 3, 2005 (after pasting the code, I had later
added "unsinged" to a variable declaration). As in the previous hack,
I mistakenly commented that we could alternatively assign
-m = ((m + 1) & d) - 1;
at the end, and Don Knuth corrected
+`m = ((m + 1) & d) - 1;` at the end, and Don Knuth corrected
me on April 19, 2006 and suggested
-m = m & -((signed)(m - d) >> s)
.
+`m = m & -((signed)(m - d) >> s)`.
On June 18, 2009 Sean Irvine proposed a change that used
-((n >> s) & M[s])
instead of
-((n & ~M[s]) >> s)
,
+`((n >> s) & M[s])` instead of
+`((n & ~M[s]) >> s)`,
which typically requires fewer operations because the M[s] constant is already
loaded.
-
-
-
+ Eric Cole sent me this on January 15, 2006. Evan Felix pointed out a typo on April 4, 2006. Vincent Lefèvre told me on July 9, 2008 to change the endian check to use the float's endian, which could differ from the integer's endian. -
-
+ The code above is tuned to uniformly distributed output values. If your inputs are evenly distributed across all 32-bit values, then consider using the following: -
+ ```c if (tt = v >> 24) { @@ -1697,11 +1673,11 @@ distributed input values was suggested by David A. Butterfield on September -1 to indicate an error, so I changed the first entry in the table to that.+ The second version was sent to me by Eric Cole on January 7, 2006. Andrew Shapira subsequently trimmed a few operations @@ -1767,14 +1743,14 @@ using smaller numbers for b[], which load faster on some architectures may be needed). These values work for the general version, but not for the special-case version below it, where v is a power of 2; Glenn Slayden brought this oversight to my attention on December 12, 2003. -
-
+ If you know that v is a power of 2, then you only need the following: -
+ ```c static const int MultiplyDeBruijnBitPosition2[32] = { @@ -1811,7 +1787,7 @@ static const int MultiplyDeBruijnBitPosition2[32] = r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27]; ``` -+ Eric Cole devised this January 8, 2006 after reading about the entry below to round up to a power of 2 and the method below for @@ -1820,15 +1796,15 @@ with a multiply and lookup using a DeBruijn sequence. On December 10, 2009, Mark Dickinson shaved off a couple operations by requiring v be rounded up to one less than the next power of 2 rather than the power of 2. -
-
+ This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for t (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of v). -
+ Eric Cole suggested I add a version of this on January 7, 2006. -
-
+ On April 18, 2007, Emanuel Hoogeveen suggested a variation on this where the conditions used divisions, which were not as fast as simple comparisons. -
-
+ On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 left the type punning idiom *(int *)& undefined, and he suggested using memcpy. -
-
+ Jim Cole suggested I add a linear-time method for counting the trailing zeros on August 15, 2007. On October 22, 2007, Jason Cunningham pointed out that I had neglected to paste the unsigned modifier for v. -
+ Bill Burdick suggested an optimization, reducing the time from 4 * lg(N) on February 4, 2011. -
-
+ Matt Whitlock suggested this on January 25, 2006. Andrew Shapira shaved a couple operations off on Sept. 5, 2007 (by setting c=1 and unconditionally subtracting at the end). -
-
-
-
+ On October 8, 2005 Andrew Shapira suggested I add this. Dustin Spicuzza asked me on April 14, 2009 to cast the result of the multiply to a 32-bit type so it would work when compiled with 64-bit ints. -
-
+ Quick and dirty version, for domain of 1 < v < (1<<25): -
+ ```c float f = (float)(v - 1); r = 1U << ((*(unsigned int*)(&f) >> 23) - 126); @@ -2221,20 +2197,20 @@ it is roughly three times slower than the (which involves 12 operations) when benchmarked on an Athlon™ XP 2100+ CPU. Some CPUs will fare better with it, though. -+ On September 27, 2005 Andi Smithers suggested I include a technique for casting to floats to find the lg of a number for rounding up to a power of 2. Similar to the quick and dirty version here, his version worked with values less than (1<<25), due to mantissa rounding, but it used one more operation. -
-
+ You might alternatively compute the next higher power of 2 in only 8 or 9 operations using a lookup table for floor(lg(v)) and then evaluating 1<<(1+floor(lg(v))); Atul Divekar suggested I mention this on September 5, 2010. -
+ Devised by Sean Anderson, September 14, 2001. Pete Hart pointed me to a couple newsgroup posts by him and William Lewis in February of 1997, where they arrive at the same algorithm. -
-
-
-
+ For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine if the word may have a zero byte. The test also returns true if the high byte is 0x80, so there are occasional false positives, but the slower and more reliable version above may then be used on candidates for an overall increase in speed with correct output. -
-
+ + ```c bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100; if (hasZeroByte) // or may just have 0x80 in the high byte @@ -2473,13 +2449,13 @@ if (hasZeroByte) // or may just have 0x80 in the high byte hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F); } ``` -
+
There is yet a faster method —
-use hasless
(v, 1),
+use `hasless`(v, 1),
which is defined below; it
works in 4 operations and requires no subsequent verification. It simplifies
to
-
+
Paul Messmer suggested the fast pretest improvement on October 2, 2004.
-Juha Järvi later suggested hasless(v, 1)
+Juha Järvi later suggested `hasless(v, 1)`
on April 6, 2005, which
he found on Paul
Hsieh's Assembly Lab; previously it was written in a newsgroup post
on April 27, 1987 by Alan Mycroft.
-
-
haszero
.
+`haszero`.
```c
#define hasvalue(x,n) \
(haszero((x) ^ (~0UL/255 * (n))))
```
-
+
Stephen M Bennet suggested this on December 13, 2009 after reading the entry
-for haszero
.
-
+for `haszero`. -
+ Requirements: x>=0; 0<=n<=128 -
+ ```c #define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128) ``` @@ -2546,24 +2522,24 @@ To count the number of bytes in x that are less than n in 7 operations, use (((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255) ``` -
+
Juha Järvi sent this clever technique to me on April 6, 2005. The
-countless
macro was added by Sean Anderson on
-April 10, 2005, inspired by Juha's countmore
, below.
-
+`countless` macro was added by Sean Anderson on +April 10, 2005, inspired by Juha's `countmore`, below. -
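The `hasless` macro can be read more easily once its constants are named. The sketch below is a 32-bit function version under the same requirement on `n`; the name `has_byte_less_than` is hypothetical, and it assumes `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. Returns nonzero if any byte of x is less than n,
   for 0 <= n <= 128. */
int has_byte_less_than(uint32_t x, uint32_t n)
{
    assert(n <= 128);
    const uint32_t ones = UINT32_MAX / 255;  /* 0x01010101 */
    const uint32_t high = ones * 128;        /* 0x80808080 */

    /* Subtracting n from every byte borrows into the high bit of a byte
       that was smaller than n; & ~x discards bytes whose high bit was
       already set in the input. */
    return ((x - ones * n) & ~x & high) != 0;
}
```

With `n` set to 1 this reduces to a test for a zero byte, for example when scanning four characters of a string at a time.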
+ Requirements: x>=0; 0<=n<=127 -
+ ```c #define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128) ``` @@ -2572,31 +2548,31 @@ To count the number of bytes in x that are more than n in 6 operations, use: #define countmore(x,n) \ (((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255) ``` -
-The macro hasmore
was suggested by Juha Järvi on
-April 6, 2005, and he added countmore
on April 8, 2005.
-
-
-Note: Bytes that equal n can be reported by likelyhasbetween
+
+Note: Bytes that equal n can be reported by `likelyhasbetween`
as false positives,
so this should be checked by character if a certain result is needed.
-
+ Requirements: x>=0; 0<=m<=127; 0<=n<=128 -
-
+ + ```c #define likelyhasbetween(x,m,n) \ ((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128) @@ -2613,19 +2589,19 @@ in 10 operations, use: ```c #define countbetween(x,m,n) (hasbetween(x,m,n)/128%255) ``` -
-Juha Järvi suggested likelyhasbetween
on April 6, 2005.
+
+Juha Järvi suggested `likelyhasbetween` on April 6, 2005.
From there,
-Sean Anderson created hasbetween
and
-countbetween
on April 10, 2005.
-
+Sean Anderson created `hasbetween` and +`countbetween` on April 10, 2005. -
+ Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros. -
+ ```c unsigned int t = (v | (v - 1)) + 1; w = t | ((((t & -t) / (v & -v)) >> 1) - 1); ``` -+ Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009. -
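A small sketch of the division-based version, wrapped in a function so that successive permutations can be generated by repeated calls. The name `next_bit_permutation` is hypothetical. The input must be nonzero because `v & -v` is used as a divisor, and the result is only meaningful while a larger permutation still fits in an `unsigned int`.

```c
#include <assert.h>

/* Hypothetical name. Returns the next larger integer with the same number
   of set bits as v. */
unsigned int next_bit_permutation(unsigned int v)
{
    assert(v != 0);                      /* v & -v is the divisor below */
    unsigned int t = (v | (v - 1)) + 1;  /* ripple the lowest run of ones */
    return t | ((((t & -t) / (v & -v)) >> 1) - 1);
}
```

Starting from binary 00010011, repeated calls produce 00010101, 00010110, 00011001, and so on.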
+ A Belorussian translation (provided by Webhostingrating) is available.