Update README.md

周伯威 Po-Wei Chou 2016-05-05 17:31:14 -04:00
parent de38c8c162
commit e9818518be

README.md

---
<a name="OperationCounting">
### About the operation counting methodology
</a>
When totaling the number of operations for algorithms here, any C operator is counted as one operation. Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation counting approach only serves as an approximation of the actual number of machine instructions and CPU time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have been heading increasingly in this direction over time. There are many nuances that determine how fast a system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the end, benchmarking is the best way to determine whether one method is really faster than another, so consider the techniques below as possibilities to test on your target architecture.
<a name="CopyIntegerSign">
### Compute the sign of an integer
</a>
```c
int v; // we want to find the sign of v
int sign; // the result goes here
sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1
```
Caveat: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. For greater portability, Toby Speight suggested on September 28, 2005 that CHAR_BIT be used here and throughout rather than assuming bytes were 8 bits long. On March 4, 2006, Angus recommended the more portable versions above, which involve casting. [Rohit Garg](http://rpg-314.blogspot.com/) suggested the version for non-negative integers on September 12, 2009.
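As a self-contained sketch of the cast-based approach (not necessarily the exact code of this entry), the following moves the sign bit down with an unsigned shift, so it does not depend on how the implementation right-shifts negative signed values. The function names `sign_mask` and `is_non_negative` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns -1 if v < 0, else 0. The sign bit is shifted down as an
   unsigned value (a logical shift), then negated to 0 or -1. */
static int sign_mask(int v)
{
    return -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}

/* Returns 1 if v >= 0, else 0: flip the extracted sign bit. */
static int is_non_negative(int v)
{
    return 1 ^ (int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}
```

For example, `sign_mask(-5)` is -1 and `is_non_negative(0)` is 1.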
<hr>
<a name="DetectOppositeSigns">
### Detect if two integers have opposite signs
</a>
```c
int x, y; // input values to compare signs
bool f = ((x ^ y) < 0); // true iff x and y have opposite signs
```
Manfred Weis suggested I add this entry on November 26, 2009.
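Why this works: `x ^ y` has its sign bit set exactly when the sign bits of x and y differ, so the xor is negative iff the signs are opposite (zero counts as non-negative). A minimal sketch, assuming two's complement `int` (required since C23, universal in practice); `opposite_signs` is a hypothetical name. Before C23, `bool` needs `<stdbool.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* True iff exactly one of x and y is negative. */
static bool opposite_signs(int x, int y)
{
    return (x ^ y) < 0;
}
```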
<hr>
<a name="IntegerAbs">
### Compute the integer absolute value (abs) without branching
</a>
```c
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;
```
Patented variation:
```c
r = (v ^ mask) - mask;
```
Some CPUs don't have an integer absolute value instruction (or the compiler fails to use them). On machines where branching is expensive, the above expression can be faster than the obvious approach, `r = (v < 0) ? -(unsigned)v : v`, even though the number of operations is the same.
On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a diminishingly small number of old machines that still use one's complement).
On March 14, 2004, Keith H. Duggar sent me the patented variation above; it is superior to the one I initially came up with, `r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v`, because a multiply is not used. Unfortunately, this method has been [patented](http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/search-adv.htm&r=1&f=G&l=50&d=ptxt&S1=6073150&OS=6073150&RS=6073150) in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to [Sun Microsystems](http://www.sun.com/).
On August 13, 2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well before the patent was even filed, such as in [How to Optimize for the Pentium Processor](http://www.goof.com/pcg/doc/pentopt.txt) by Agner Fog, dated November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which Vladimir could have read. Moreover, the Internet Archive has an old [link](http://web.archive.org/web/19961201174141/www.x86.org/ftp/articles/pentopt/PENTOPT.TXT) to it.
On January 30, 2007, Peter Kankowski shared with me an [abs version](http://smallcode.weblogs.us/2007/01/31/microsoft-probably-uses-the-abs-function-patented-by-sun) he discovered that was inspired by Microsoft's Visual C++ compiler output. It is featured here as the primary solution.
On December 6, 2007, Hai Jin complained that the result was signed, so when computing the abs of the most negative value, it was still negative. On April 15, 2008, Andrew Shapira pointed out that the obvious approach could overflow, as it lacked an (unsigned) cast then; for maximum portability he suggested `(v < 0) ? (1 + ((unsigned)(-1-v))) : (unsigned)v`.
But citing the ISO C99 spec on July 9, 2008, Vincent Lefèvre convinced me to remove it because even on non-two's-complement machines `-(unsigned)v` will do the right thing. The evaluation of `-(unsigned)v` first converts the negative value of v to an unsigned by adding `2**N` (where N is the number of bits in an `unsigned int`), yielding a two's complement representation of v's value that I'll call U. Then, U is negated, giving the desired result, `-U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v = abs(v)`.
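A runnable sketch of the primary branch-free solution next to the obvious one, assuming two's complement and an arithmetic right shift of negative `int` values (implementation-defined, as noted above). As a deliberate deviation from the one-liner, the addition is done in `unsigned int` so that `v == INT_MIN` cannot overflow a signed `int`. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Branch-free abs: mask is all ones when v < 0 and zero otherwise.
   (v + mask) ^ mask is then either v, or (v - 1) with all bits flipped,
   which equals -v in two's complement. */
static unsigned int abs_branchless(int v)
{
    int const mask = v >> (sizeof(int) * CHAR_BIT - 1);
    return ((unsigned int)v + (unsigned int)mask) ^ (unsigned int)mask;
}

/* The obvious approach; negating after the unsigned cast is well defined. */
static unsigned int abs_obvious(int v)
{
    return (v < 0) ? -(unsigned int)v : (unsigned int)v;
}
```

With a 32-bit `int`, both functions return 2147483648 for `INT_MIN`, which is why the result type is unsigned.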
<hr>
<a name="IntegerMinOrMax">
### Compute the minimum (min) or maximum (max) of two integers without branching
</a>
```c
int x; // we want to find the minimum of x and y
int y;
int r;  // the result goes here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
```
On some rare machines where branching is very expensive and no conditional move instructions exist, the above expression might be faster than the obvious approach, `r = (x < y) ? x : y`, even though it involves two more instructions. (Typically, the obvious approach is best, though.)
It works because if `x < y`, then `-(x < y)` will be all ones, so `r = y ^ ((x ^ y) & ~0) = y ^ x ^ y = x`. Otherwise, if `x >= y`, then `-(x < y)` will be all zeros, so `r = y ^ ((x ^ y) & 0) = y`. On some machines, evaluating `(x < y)` as 0 or 1 requires a branch instruction, so there may be no advantage.
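The xor trick can be checked with a small sketch covering both the min expression above and the max variant given below. The helper names are hypothetical, and the code relies on `-(x < y)` being 0 or -1 (all ones in two's complement).

```c
#include <assert.h>
#include <limits.h>

/* min(x, y): xor (x ^ y) back into y only when x < y, leaving x. */
static int min_xor(int x, int y)
{
    return y ^ ((x ^ y) & -(x < y));
}

/* max(x, y): xor (x ^ y) back into x only when x < y, leaving y. */
static int max_xor(int x, int y)
{
    return x ^ ((x ^ y) & -(x < y));
}
```

No subtraction is involved, so extreme inputs such as `INT_MIN` and `INT_MAX` cannot overflow.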
To find the maximum, use:
```c
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
```
#### Quick and dirty versions:
If you know that `INT_MIN <= x - y <= INT_MAX`, then you can use the following, which are faster because `(x - y)` only needs to be evaluated once.
```c
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
```
Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to unsigned for the subtractions to avoid unnecessarily throwing an exception. However, the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there.
On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E. Bryant alerted me to the need for the precondition, `INT_MIN <= x - y <= INT_MAX`, and suggested the non-quick and dirty version as a fix. Both of these issues concern only the quick and dirty version.
Nigel Horspoon observed on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution because of how it evaluates `(x < y)`. On July 9, 2008 Vincent Lefèvre pointed out the potential for overflow exceptions with subtractions in `r = y + ((x - y) & -(x < y))`, which was the previous version. On June 2, 2009, Timothy B. Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows.
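A sketch of the quick and dirty versions with the difference computed once. It assumes, as stated above, that `x - y` does not overflow and that `>>` on a negative `int` is an arithmetic shift; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* d >> (width - 1) is all ones when d < 0 and zero otherwise,
   so d & that mask is d when x < y and 0 otherwise. */
static int min_quick(int x, int y)
{
    int const d = x - y;                                  /* evaluated once */
    return y + (d & (d >> (sizeof(int) * CHAR_BIT - 1))); /* x < y ? x : y */
}

static int max_quick(int x, int y)
{
    int const d = x - y;
    return x - (d & (d >> (sizeof(int) * CHAR_BIT - 1))); /* x < y ? y : x */
}
```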
<hr>
<a name="DetermineIfPowerOf2">
### Determining if an integer is a power of 2
</a>
```c
unsigned int v; // we want to see if v is a power of 2
bool f; // the result goes here
f = (v & (v - 1)) == 0;
```
Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:
```c
f = v && !(v & (v - 1));
```
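A worked example: for v = 96 (binary `1100000`), v - 1 = 95 (`1011111`), and `v & (v - 1)` = 64 (`1000000`), which is non-zero, so 96 is not a power of 2. In general `v & (v - 1)` clears the lowest set bit, so it is zero exactly when v has at most one bit set. The sketch below wraps the corrected test in a function; `is_power_of_2` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* v && ... excludes 0; !(v & (v - 1)) checks that at most one bit is set. */
static bool is_power_of_2(unsigned int v)
{
    return v && !(v & (v - 1));
}
```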
<hr>
<a name="FixedSignExtend">
### Sign extending from a constant bit-width
</a>
Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int, which has more than b bits. A simple copy will work if x is positive, but if negative, the sign must be extended. For example, if we have only 4 bits to store a number, then -3 is represented as `1101` in binary. If we have 8 bits, then -3 is `11111101`. The most-significant bit of the 4-bit representation is replicated sinistrally (leftward) to fill in the destination when we convert to a representation with more bits; this is sign extending. In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions. For example, to convert from 5 bits to a full integer:
```c
int x; // convert this from using 5 bits to a full int
int r; // resulting sign-extended number goes here
struct {signed int x:5;} s;
r = s.x = x;
```
The following is a C++ template function that uses the same language feature to convert from B bits in one operation (though the compiler is generating more, of course).
```cpp
template <typename T, unsigned B>
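inline T signextend(const T x)
{
  struct {T x:B;} s;  // B-bit bit-field of type T
  return s.x = x;     // storing truncates to B bits; reading sign-extends
}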
int r = signextend<signed int,5>(x); // sign extend 5 bit number x to r
```
John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed; otherwise, the sign is undefined.
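A C sketch of the 5-bit bit-field conversion packaged as a function (the name `sign_extend_5` is hypothetical). Storing a value that does not fit in the signed 5-bit field is implementation-defined in C; the sketch assumes the common behavior of keeping the low 5 bits. For example, -3 in 5 bits is `11101`, i.e. 29.

```c
#include <assert.h>

/* Sign-extend the low 5 bits of x. The field is explicitly `signed int`
   because a plain `int` bit-field may be unsigned on some implementations. */
static int sign_extend_5(int x)
{
    struct { signed int x : 5; } s;
    return s.x = x;  /* value of the field after the store */
}
```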
<hr>
<a name="VariableSignExtend">
### Sign extending from a variable bit-width
</a>
Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)
```c
unsigned b; // number of bits representing the number in x
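int x;      // sign extend this b-bit number to r
int r;      // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed

x = x & ((1U << b) - 1); // (Skip this if bits in x above position b are already zero.)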
r = (x ^ m) - m;
```
The code above requires four operations, but when the bitwidth is a constant rather than variable, it requires only two fast operations, assuming the upper bits are already zeroes.
A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:
```c
int const m = CHAR_BIT * sizeof(x) - b;
r = (x << m) >> m;
```
Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided `m = (1 << (b - 1)) - 1; r = -(x & ~m) | x;` as a starting point from which I optimized to get `m = 1U << (b - 1); r = -(x & m) | x`. But then on May 11, 2007, Shay Green suggested the version above, which requires one less operation than mine. On Oct. 15, 2008, Vipin Sharma suggested I add a step to deal with situations where x might have ones in bits other than the b bits we wanted to sign-extend. On December 31, 2009 Chris Pirazzi suggested I add the faster version, which requires two operations for constant bit-widths and three for variable widths.
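A self-contained sketch of the `(x ^ m) - m` method for a run-time bit-width b, with m = `1U << (b - 1)` as the sign bit of the b-bit value. If that bit is 0, the xor sets it and the subtraction removes it again; if it is 1, the xor clears it and subtracting m yields the negative value. Unlike the snippet above, this sketch does the arithmetic in `unsigned int` and converts to `int` at the end (implementation-defined for out-of-range values, two's complement wraparound on mainstream compilers), and it assumes 1 <= b < the width of `unsigned int`. The name `sign_extend_bits` is hypothetical.

```c
#include <assert.h>

/* Sign-extend the low b bits of x; bits above position b are ignored. */
static int sign_extend_bits(unsigned int x, unsigned int b)
{
    unsigned int const m = 1U << (b - 1); /* sign bit of the b-bit field */
    x &= (1U << b) - 1;                   /* clear bits above position b */
    return (int)((x ^ m) - m);
}
```

For example, `sign_extend_bits(13u, 4)` treats `1101` as a 4-bit value and returns -3.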
<hr>
<a name="VariableSignExtendRisky">
### Sign extending from a variable bit-width in 3 operations
</a>
The following may be slow on some machines, due to the effort required for multiplication and division. This version requires 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this type of sign extension in 3 operations by using `r = (x * multipliers[b]) / multipliers[b]`, which requires only one array lookup.
```c
unsigned b; // number of bits representing the number in x
r = (x * multipliers[b]) / divisors[b];
```
The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast.
```c
const int s = -b; // OR: sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;
```
Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used `multipliers[]` for `divisors[]`), where it failed on the case of `x=1` and `b=1`.
<hr>
<a name="ConditionalSetOrClearBitsWithoutBranching">
### Conditionally set or clear bits without branching
</a>
```c
w ^= (-f ^ w) & m;

// OR, for superscalar CPUs:
w = (w & ~m) | (-f & m);
```
On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster. An Intel Core 2 Duo ran the superscalar version about 16% faster than the first. Glenn Slayden informed me of the first expression on December 11, 2003. Marco Yu shared the superscalar version with me on April 3, 2007 and alerted me to a typo 2 days later.
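Both expressions can be written as functions, as in this sketch with hypothetical names. With a `bool` f, `-f` is 0 or -1, and converting -1 to `unsigned int` gives all ones, so `-f` acts as an all-zeros or all-ones mask.

```c
#include <assert.h>
#include <stdbool.h>

/* if (f) w |= m; else w &= ~m;  without a branch. */
static unsigned int set_or_clear(bool f, unsigned int m, unsigned int w)
{
    return w ^ ((-f ^ w) & m);
}

/* Superscalar variant: (w & ~m) and (-f & m) are independent. */
static unsigned int set_or_clear_superscalar(bool f, unsigned int m, unsigned int w)
{
    return (w & ~m) | (-f & m);
}
```

The second form does more operations but has two independent halves, which a superscalar CPU can evaluate in parallel.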
<hr>
### Conditionally negate a value without branching
</a>
If you need to negate only when a flag is false, then use the following to avoid branching:
```c
bool fDontNegate; // Flag indicating we should not negate v.
r = (v ^ -fNegate) + fNegate;
```
Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I came up with the second version on June 8, 2009. Alfonso De Gregorio pointed out that some parens were missing on November 26, 2009, and received a bug bounty.
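A sketch of the multiply-free version, plus a hypothetical wrapper for the "negate only when a flag is false" case that simply inverts the flag (it is not the multiply-based first version). XOR with all ones gives `~v`, and `~v + 1 == -v` in two's complement. As with `-v`, negating `INT_MIN` overflows, so that input is excluded.

```c
#include <assert.h>
#include <stdbool.h>

/* result = fNegate ? -v : v; -fNegate is 0 or -1 (all ones). */
static int conditional_negate(bool fNegate, int v)
{
    return (v ^ -fNegate) + fNegate;
}

/* Hypothetical helper: result = fDontNegate ? v : -v. */
static int negate_unless(bool fDontNegate, int v)
{
    return conditional_negate(!fDontNegate, v);
}
```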
<hr>
<a name="MaskedMerge">
### Merge bits from two values according to a mask
</a>
```c
unsigned int r; // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask);
```
This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage. Ron Jeffery sent this to me on February 9, 2006.
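A sketch comparing the xor form with the obvious form (four operations, `~`, `&`, `&`, `|`, versus three, `^`, `&`, `^`); `merge_bits` is a hypothetical name. `a ^ b` marks the bits where a and b differ, the mask keeps only those that should come from b, and xoring them into a flips exactly those bits.

```c
#include <assert.h>

/* Take bits from b where mask is 1 and from a where mask is 0. */
static unsigned int merge_bits(unsigned int a, unsigned int b, unsigned int mask)
{
    return a ^ ((a ^ b) & mask);
}
```

For example, `merge_bits(0xAAAAu, 0x5555u, 0x00FFu)` keeps the high byte of a and the low byte of b, giving `0xAA55`.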
<hr>
<a name="CountBitsSetNaive">
### Counting bits set (naive way)
</a>