Thoughts on IEEE P754

Mike Cowlishaw MFC at uk.ibm.com
Fri Aug 22 01:28:42 PDT 2008


Waldemar wrote:
 
> I just had some time to spend reading the proposed draft of the new 
> floating point spec.  It quickly became apparent that ES3.1 is far 
> out of conformance with that draft and it would be a significant 
> undertaking to update it to conform to the new standard.  In 
> addition, doing so would introduce breaking incompatibilities into 
> the language.

Not necessarily -- see below.  IEEE 754-2008 does not mandate how a 
language should conform, and in particular it does not, in general, 
mandate that the 754 behavior be the default.

The discussion of whether ES should attempt to become a conforming 
implementation is a good one, however!

> Here is a smattering of major issues.  The chapter and section 
> numbers refer to the draft IEEE P754:
> 
> - Many, if not most, of the mandatory functions in chapter 5 are not
> implemented in ES3.1.  There are a lot that we'd also need to 
> implement on binary floating-point numbers.

This is true.
 
> - IEEE P754 5.12.1 mandates that (decimal or binary) -0 print as 
> "-0".  This is not going to happen.

It only requires that there be conversions to and from a string which 
preserve (among other things) the sign of a zero.  (For example, there 
could be a 'toIEEEstring' function, just as Java has an IEEEremainder 
static method in the Math class.)
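
As a minimal sketch (in Python rather than ES, and with the name 
to_ieee_string purely hypothetical), such a function can preserve the 
sign of a zero without changing how the language prints numbers by 
default:

  import math

  def to_ieee_string(x):
      # Write the sign explicitly, so that -0.0 round-trips even though
      # default printing of a zero may drop it.
      return ("-" if math.copysign(1.0, x) < 0 else "") + repr(abs(x))

  s = to_ieee_string(-0.0)                  # '-0.0'
  assert math.copysign(1.0, float(s)) < 0   # sign of the zero preserved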
 
> - IEEE P754 5.12 mandates that binary floating-point numbers convert
> exactly to a string format with a hexadecimal mantissa and back.  An
> implementation must provide these conversions.

This is just one of the mandatory functions in Clause 5 already 
mentioned.  It is trivial to provide, because it is the hexadecimal 
string format defined by C99, so C compilers should already supply the 
conversion.
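
For what it's worth, Python exposes the same form directly: float.hex 
and float.fromhex use the C99-style hexadecimal significand format, and 
give the exact, lossless round trip the clause asks for:

  x = 0.1
  s = x.hex()                    # '0x1.999999999999ap-4'
  assert float.fromhex(s) == x   # exact round trip, nothing lost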

> A few other tidbits about IEEE P754:
> 
> - There are many different ways of representing the same decimal 
> number, which the standard calls a cohort.  Some numbers, such as 
> zeroes, have thousands of different variants within the cohort. 
> Fortunately, printing a decimal number is not required to indicate 
> which member of a cohort is chosen.  This avoids problems for us 
> such as having the numbers x == y but a[x] and a[y] silently 
> referring to different array elements (which is what would happen if
> we printed 5.00m differently from 5.m).

As with -0, one is not required to preserve the trailing zeros by 
default, but I would strongly recommend that preserving them be the 
default behavior (it is only required that there be some function that 
does preserve them).  IEEE 754-2008 requires that it be possible to 
convert a decimal number to a string and back again and end up with the 
same sign, significand, and exponent as the original number.

For why this is important, see: 
http://speleotrove.com/decimal/decifaq1.html#tzeros

(The default convert-decimal-to-string could/should preserve the sign of 
zeros, too, as this won't break anything.)
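
To illustrate the round trip (using Python's decimal module here, which 
follows the same arithmetic model): the exponent, and hence the trailing 
zeros, survive conversion to a string and back, so 5.00 and 5 remain 
distinguishable members of the same cohort even though they compare 
equal:

  from decimal import Decimal

  a = Decimal("5.00")
  b = Decimal("5")
  assert a == b                   # same value (same cohort) ...
  assert str(a) == "5.00"         # ... but the exponent is preserved
  assert str(Decimal(str(a))) == "5.00"   # and survives the round trip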

> - The standard now supports signed NaN's which are distinguishable 
> from each other in various ways (total order predicate, isSignMinus,
> etc.).  There has been some talk about deferring to the IEEE spec 
> instead of listing the permissible binary floating-point values 
> explicitly as was done in ES3, but we must maintain something like 
> the existing language to avoid getting into this quagmire.

This is an interesting one.  IEEE 754-1985 really had its head in the sand 
on the sign of NaNs; anything that could look at the encoding of a binary 
floating-point number could detect the sign of a NaN (even though it had 
no meaning other than for diagnostics).  However, the copy operations such 
as abs and negate (optional and ill-defined in 754-1985 but now required) 
define sign manipulations, and it was agreed that these manipulations 
should be predictable whether the value is a NaN or not.  Hence 754-2008 
had to tighten up the definition of the sign of a NaN in several ways (but 
it is still undefined after most operations -- for example, hardware 
might, and often does, XOR the signs during a multiply, regardless of the 
remaining bits of the encoding).
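
The copy operations are easy to see in Python's decimal module, which 
follows the same specification: copy_abs, copy_negate and copy_sign act 
on the sign predictably even when the operand is a NaN.

  from decimal import Decimal

  n = Decimal("-NaN")
  print(n.copy_abs())                              # NaN  (sign cleared)
  print(n.copy_negate())                           # NaN  (sign flipped)
  print(Decimal("NaN").copy_sign(Decimal("-1")))   # -NaN (sign copied)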
 
> - Implementors of the standard could not agree on the binary 
> representation of decimal numbers.  There are two mutually 
> incompatible ways of representing them -- the mantissa can be either
> a binary integer or an array of 10-bit groups, each representing 
> three decimal digits.  I can't tell whether they behave differently 
> in some scenarios -- if the format is wide enough, the integral 
> representation can represent more values, and their handling of 
> noncanonicals differs.

They both describe an identical set of values, so an implementation 
could use either.  We use the decimal encoding because it is faster in 
software (and almost certainly in hardware, too): decimal rounding is 
much simpler with a decimal encoding -- see 
http://speleotrove.com/decimal/decperf.html.  All the hardware 
implementations use the decimal encoding.
 
> - Some parts of IEEE P754 are ambiguous.  For example, the notion of
> a subnormal is unclear when applied to decimal numbers.  It appears 
> that you can have two equal values x and y of the same decimal 
> format where x == y, x is subnormal, but y is not subnormal.

Not so:

  2.1.51 subnormal number: In a particular format, a non-zero 
  floating-point number with magnitude less than the magnitude 
  of that format's smallest normal number

and:

  2.1.38 normal number: For a particular format, a finite non-zero 
  floating-point number with magnitude greater than or equal to a 
  minimum b^emin value, where b is the radix. 

(where ^ indicates superscript).  emin is 1-emax, and emax is part of 
the definition of a format, so it is a constant for any given format.

(Also, underflow is more strictly defined for decimal than for binary.  We 
were unable to clear up the messiness there for binary floating-point.)
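
A quick check with Python's decimal module (the decimal32-like Context 
here is just for illustration) shows that subnormality depends only on 
the magnitude relative to the format's smallest normal number, not on 
which member of a cohort is used:

  from decimal import Decimal, Context

  ctx = Context(prec=7, Emin=-95, Emax=96)   # decimal32-like parameters
  x = Decimal("1E-96")       # magnitude below the smallest normal, 1E-95
  y = Decimal("0.10E-95")    # equal value, different exponent/coefficient
  assert x == y
  assert ctx.is_subnormal(x) and ctx.is_subnormal(y)   # both subnormal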

> Some tidbits about our embedding of decimal:
> 
> - Contagion should be towards decimal if decimal and binary are 
> mixed as operands.  5.3m + 1 should be 6.3m, not 6.3.  If we use 
> 128-bit decimal, this also makes the behavior of heterogeneous 
> comparisons (binary compared to decimal) sensible.

Yes, this could work.  To answer Sam's later question:

  What should 5.3m + 1.0000000000000001 produce?

this would convert the binary constant (which will have been rounded to 1 
on conversion from the string to double) to decimal, so the result would 
be 6.3m.
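
A quick check of that claim (in Python, which has a ready-made decimal 
type following the same model):

  from decimal import Decimal

  assert float("1.0000000000000001") == 1.0   # the literal rounds to 1.0
  print(Decimal("5.3") + Decimal(1.0))        # 6.3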

It doesn't really help with comparisons, however; for example, 0.1 
compared with 0.1m would still come out unequal.  (The double 
representation of 0.1 is exactly 
0.1000000000000000055511151231257827021181583404541015625, which when 
converted to a decimal128 is 0.1000000000000000055511151231257827.)
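
Both figures are easy to verify, again with Python's decimal module:

  from decimal import Decimal, Context

  exact = Decimal(0.1)   # the exact value of the double nearest to 0.1
  print(exact)
  # 0.1000000000000000055511151231257827021181583404541015625
  print(Context(prec=34).plus(exact))   # round to decimal128's 34 digits
  # 0.1000000000000000055511151231257827
  assert exact != Decimal("0.1")        # hence 0.1 == 0.1m compares false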
 
> - We should not print the trailing zeroes.  IEEE P754 does not 
> require it, and it breaks things such as array lookup.  There is 
> precedent for this in ECMAScript:  -0 and +0 both print as "0".  If
> someone really wants to distinguish among equal numbers, he can do 
> it with a function call to one of the mandated functions.

(See comment above.)

Mike













