std/num/ddouble▲toc

Operations on double double 128-bit floating point numbers.

The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers as a pair of IEEE doublestd/core/double: V values. This extends the precision to 31 decimal digits (versus 15 for doublestd/core/double: V), but keeps the same range as a doublestd/core/double: V with a maximum value of about 1.8·10308 (dd-maxstd/num/ddouble/dd-max: ddouble). Because doublestd/core/double: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Precision

If you add two regular 64-bit doublestd/core/double: V values you can quickly notice the imprecision due to decimal fractions that cannot be represented precisely. For example:

> 0.1 + 0.2
0.30000000000000004
> 0.1 + 0.2
0.30000000000000004

This happens because the decimal 0.1 and 0.2 cannot be represented exactly by a doublestd/core/double: V which is encoded in base 2. For example, if we show 0.1 to 20 digits, we get:

> 0.1.show(20)
"0.10000000000000000555"
> 0.1.show(20)
"0.10000000000000000555"

When we convert the constant 0.1 to a ddoublestd/num/ddouble/ddouble: V we can see with even more precision how it is approximated:

> 0.1.ddouble
0.1000000000000000055511151231258
> 0.1.ddouble
0.1000000000000000055511151231258

However, if we convert the number 0.1 directly to a ddoublestd/num/ddouble/ddouble: V (instead of going through a doublestd/core/double: V constant), we can represent 0.1 (almost) exactly:

> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1
> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1

This is possible because a ddoublestd/num/ddouble/ddouble: V uses two doublestd/core/double: Vs to represent numbers; in this case the first (hi) double is (the inexact) 0.1 while the second one (lo) is the correction to that number. The show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ?int) -> string (or show-hex) function shows this internal pair of numbers:

> ddouble("0.1").show-sum(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3
> ddouble("0.1").show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ?int) -> string(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3

Generally, a ddoublestd/num/ddouble/ddouble: V d is represented by a pair of doublestd/core/double: Vs, hi and lo, such that d equals hi+lo, where |lo| ≤ 0.5·ulp(hi). Note that despite the extra precision, underflow/overflow and machine precision are not as well-defined as with a regular doublestd/core/double: V [1, 6, 8]. Nevertheless, using a ddoublestd/num/ddouble/ddouble: V often prevents instability in practice over doublestd/core/double: V computations due to rounding errors or combining very large and small numbers.

Take the “thin triangle” problem for example [2, 7]. The challenge is to compute the area of a very thin triangle with sides $a$, $b$, and $c$. The short sides $b$ and $c$ are just 3 units in the last place (ulp) shorter than the longest side $a$. Using the value $s = (a + b + c)/2$, the area of a triangle is $A = \sqrt{s(s-a)(s-b)(s-c)}$. This is troublesome if $s$ is close to $a$ since $s-a$ magnifies any rounding error. Let's take $a=7$, and $b = c = 7/2 + 3\cdot2^{-111}$; according to [3], “If the units are in light-years, then the short sides are only longer than half the long side by $1/200$th the diameter of a proton. Yet that pops the triangle up to about the width of a doorway at the thickest point”. Using various 128-bit numbers we get:

exact          : 3.14784204874900425235885265494550774498...e-16
128-bit ddouble: 3.14784204874900425235885265494547e-16
128-bit posit  : 3.14784204874900425235885265494550774439e-16
128-bit ieee   : 3.63481490842332134725920516158057682878e-16

For this kind of example, a ddoublestd/num/ddouble/ddouble: V has better precision than a regular 128-bit IEEE float since it can combine very large and small values. (Kahan [7] shows how to rewrite the equations to avoid magnifying rounding errors – in that case the result for IEEE 128-bit floats becomes:

128-bit ieee x : 3.14784204874900425235885265494550792210e-16

)

The implementation is based closely on the QD C++ library [4, 5], and assumes proper 64-bit IEEE doublestd/core/double: Vs with correct rounding. Integers can be represented precisely up to 30 decimal digits (and a bit more… up to 2106 - 2).

References

[1]T. Dekker. A Floating-Point Technique for Extending the Available Precision. Numerische Mathematik, vol. 18 (3), June 1971, 224–242. pdf🔎
[2]David Goldberg. What Every Computer Scientist Should Know About Floating-point Arithmetic. ACM Computing Survey. vol. 23 (1), March 1991. doi: 10.1145/103162.103163. pdf🔎
[3]John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit Arithmetic. 2017. pdf🔎
[4]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Quad-Double Arithmetic: Algorithms, Implementation, and Application. Lawrence Berkeley National Laboratory Technical Report LBNL-46996. 2000. pdf🔎
[5]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Library for double-double and quad-double arithmetic. (2007). pdf🔎
[6]Seppo Linnainmaa. Software for Doubled-Precision Floating-Point Computations. ACM Transactions on Mathematical Software (TOMS), vol. 7 (3), Sept. 1981, 272–283. 🔎
[7]William Kahan. Miscalculating Area and Angles of a Needle-like Triangle. Lecture notes, 2004, pdf🔎
[8]Jonathan Richard Shewchuk. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete & Computational Geometry, vol. 18, 305–363, 1997. pdf 🔎

.

The doublestd/core/double: V type implements double double 128-bit floating point numbers as a pair of IEEE doublestd/core/double: V values. This extends the precision to 31 decimal digits (versus 15 for doublestd/core/double: V), but keeps the same range as a doublestd/core/double: V with a maximum value of about 1.8·10308. Because doublestd/core/double: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Internally a ddoublestd/num/ddouble/ddouble: V d is represented as a pair of doublestd/core/double: Vs, hi and lo, such that the number represented by d is hi+lo, where |lo| ≤ 0.5·ulp(hi).

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to an epsilon of 10*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble. See also nearly-eq which takes an explicit epsilon.

Return the absolute value.

The arc-cosine of x. Returns the angle in radians.

The area hyperbolic cosine of x.

The arc-sine of x. Returns the angle in radians.

The area hyperbolic sine of x.

The arc-tangent of x. Returns the angle in radians.

The arc-tangent of a point (x,y). Returns the angle with respect to the x-axis in radians between -π and π.

The area hyperbolic tangent of x.

The cosine function of a given angle in radians.

The hyperbolic cosine of x.

The e constant.

The ‘machine epsilon’: this is not well-defined for a ddoublestd/num/ddouble/ddouble: V in general since the difference between 1.0 and the next representable ddoublestd/num/ddouble/ddouble: V value is dd-true-minstd/num/ddouble/dd-true-min: ddouble. Instead, we take the square of dbl-epsilonstd/num/double/dbl-epsilon: double, i.e. 2-104.

The natural logarithm of 10.

The base-10 logarithm of e.

The natural logarithm of 2.

The base-2 logarithm of e.

The smallest positive ddoublestd/num/ddouble/ddouble: V that is still normalized.

Negative infinity.

Positive infinity.

The square-root of 2.

The smallest positive ddoublestd/num/ddouble/ddouble: V (which is subnormal).

Create a ddoublestd/num/ddouble/ddouble: V from an intstd/core/int: V. A ddoublestd/num/ddouble/ddouble: V can represent integers precisely up to 30 digits. If an integer is passed that is out of range an infinity is returned.

Parse a floating point number with up to 31 digits precision. Return dd-nanstd/num/ddouble/dd-nan: ddouble if the string is an invalid number.

Convert a ddoublestd/num/ddouble/ddouble: V to a decimalstd/num/decimal/decimal: V up to a given precision prec (= -1). A negative precision converts precisely. Returns 0 for non-finite ddoublestd/num/ddouble/ddouble: V's.

Return 10 to the power of exp.

Return 2 to the power of exp.

Return exp(x - 1.0). Avoids rounding errors for values of x very close to 1.0.

The floored fraction of x. This is always positive, such that x.floor + x.ffraction == x.

Is this a finite ddoublestd/num/ddouble/ddouble: V? (i.e. not nan? or inf?).

The fraction of x such that x.truncate + x.fraction == x.

The hypotenuse of x and y: sqrt(x*x + y*y). Prevents overflow for large numbers.

The square root of the sum of the squares of three doubles. Prevents overflow for large numbers.

The square root of the sum of squares of a list of doubles. Prevents overflow for large numbers and uses Kahan-Babuškan-Neumaier summation for precision.

Is this an infinite value.

Convert a ddoublestd/num/ddouble/ddouble: V to the nearest integer (rounding to the nearest even number in case of a tie).

‘Load exponent’: returns x·2exp.

The natural logarithm (in base e) of x.

The logarithm in base 10 of x.

Return log(1.0 + x). Avoids potential imprecision for small x where adding 1.0 explicitly may lead to rounding errors.

The logarithm in base 2 of x.

Returns log(exp(x) + exp(y)). Avoids overlow/underflow errors.

Returns log2( exp2(x) + exp2(y) ). Avoids overlow/underflow errors.

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to some epsilon (=8*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble). The epsilon is the nearest difference for numbers around 1.0. The routine automatically scales the epsilon for larger and smaller numbers, and for numbers close to zero.

Does x equal negative infinity?

The n-th root of a ddoublestd/num/ddouble/ddouble: V number x. n must be positive, and if n is even, then x must not be negative.

Does x equal positive infinity?

Round a ddoublestd/num/ddouble/ddouble: V to the nearest integer, rounding to the nearest even number in case of a tie.

Round a ddoublestd/num/ddouble/ddouble: V to a specified precision. Uses round if the precision is smaller or equal to zero.

Show a ddoublestd/num/ddouble/ddouble: V x with a given precision prec (=-31). The precision specifies the number of digits after the dot (in either scientific of fixed-point notation). If the precision is negative, at most prec digits are displayed, while for a positive precision, exactly prec digits behind the dot are displayed. This uses show-fixed when the exponent of x in scientific notation is larger than -5 and smaller than the precision (or 15 in case of a negative precision), otherwise it uses show-exp.

Show a ddouble x with a given precision prec (=-31) in scientific notation. The precision specifies the number of digits after the dot, i.e. the number of significant digits is prec+1. If the precision is negative, at most prec digits are displayed, and if it is positive exactly prec digits are used.

> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"
> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"

.

Show a ddouble x with a given precision prec (=-31) in fixed-point notation. The precision specifies the number of digits after the dot. If the precision is negative, at most prec digits after the dot are displayed, while for a positive precision, exactly prec digits are used.

> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"
> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"

.

Show a ddoublestd/num/ddouble/ddouble: V x precisely as the sum of two doublestd/core/double: Vs in hexadecimal notation. Use this if you need to guarantee that you can parse back ddoublestd/num/ddouble/ddouble: Vs exactly, i.e. x == x.show-hex.ddouble.

> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pi.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-max.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"
> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pistd/num/ddouble/dd-pi: ddouble.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-maxstd/num/ddouble/dd-max: ddouble.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"

.

Show a ddoublestd/num/ddouble/ddouble: V as the sum of doublestd/core/double: Vs with an optional precision. Note: use show-hex for reliable round-trip parsing.

The sine function of a given angle in radians.

Calculate sine and cosine on an angle in radians.

Return the sum of a list of doubles. Uses Kahan-Babuškan-Neumaier summation to minimize rounding errors. This is more precise as Kahan summation and about as fast.
[1.0e3,1.0e97,1.0e3,-1.0e97].sum == 2000.0
A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen. Math. Mechanik, 54:39–51, 1974.

The tangent of a given angle in radians.

The hyperbolic tangent of x.

Ten (10.ddouble).

maximal precision in decimal digits of a ddoublestd/num/ddouble/ddouble: V.

Parse a floating point number with up to 31 digits precision. Returns Nothingstd/core/Nothing: forall<a> maybe<a> on invalid input.