128-bit double-double floating point numbers.

The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers as a pair of IEEE float64std/core/types/float64: V values. This extends the precision to 31 decimal digits (versus 15 for float64std/core/types/float64: V), but keeps the same range as a float64std/core/types/float64: V with a maximum value of about 1.8·10³⁰⁸ (dd-maxstd/num/ddouble/dd-max: ddouble). Because float64std/core/types/float64: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Precision

If you add two regular 64-bit float64std/core/types/float64: V values you can quickly notice the imprecision due to decimal fractions that cannot be represented precisely. For example:

> 0.1 + 0.2
0.30000000000000004
> 0.1 + 0.2
0.30000000000000004

This happens because the decimal 0.1 and 0.2 cannot be represented exactly by a float64std/core/types/float64: V which is encoded in base 2. For example, if we show 0.1 to 20 digits, we get:

> 0.1.show(20)
"0.10000000000000000555"
> 0.1.show(20)
"0.10000000000000000555"

When we convert the constant 0.1 to a ddoublestd/num/ddouble/ddouble: V we can see with even more precision how it is approximated:

> 0.1.ddouble
0.1000000000000000055511151231258
> 0.1.ddouble
0.1000000000000000055511151231258

However, if we convert the number 0.1 directly to a ddoublestd/num/ddouble/ddouble: V (instead of going through a float64std/core/types/float64: V constant), we can represent 0.1 more precisely:

> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1
> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1

This is possible because a ddoublestd/num/ddouble/ddouble: V uses two float64std/core/types/float64: Vs to represent numbers; in this case the first (hi) float64 is (the inexact) 0.1 while the second one (lo) is the correction to that number. The show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string (or show-hex) function shows this internal pair of numbers:

> ddouble("0.1").show-sum(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3
> ddouble("0.1").show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3

Generally, a ddoublestd/num/ddouble/ddouble: V d is represented by a pair of float64std/core/types/float64: Vs, hi and lo, such that d equals hi+lo, where |lo| ≤ 0.5·ulp(hi). Note that despite the extra precision, underflow/overflow and machine precision are not as well-defined as with a regular float64std/core/types/float64: V [1, 6, 8]. Nevertheless, using a ddoublestd/num/ddouble/ddouble: V often prevents instability in practice over float64std/core/types/float64: V computations due to rounding errors or combining very large and small numbers.

Take the “thin triangle” problem for example [2, 7]. The challenge is to compute the area of a very thin triangle with sides , , and . The short sides and are just 3 units in the last place (ulp) shorter than the longest side . Using the value , the area of a triangle is . This is troublesome if is close to since magnifies any rounding error. Let's take , and ; according to [3], “If the units are in light-years, then the short sides are only longer than half the long side by th the diameter of a proton. Yet that pops the triangle up to about the width of a doorway at the thickest point”. Using various 128-bit numbers we get:

exact          : 3.14784204874900425235885265494550774498...e-16
128-bit ddouble: 3.14784204874900425235885265494547e-16
128-bit posit  : 3.14784204874900425235885265494550774439e-16
128-bit ieee   : 3.63481490842332134725920516158057682878e-16

For this kind of example, a ddoublestd/num/ddouble/ddouble: V has better precision than a regular 128-bit IEEE float since it can combine very large and small values. Note that Kahan [7] shows how to rewrite the area equation to avoid magnifying rounding errors – in that case the result for IEEE 128-bit floats becomes:

128-bit ieee x : 3.14784204874900425235885265494550792210e-16

The implementation is based closely on the excellent QD C++ library [4, 5], and assumes proper 64-bit IEEE float64std/core/types/float64: Vs with correct rounding. Integers can be represented precisely up to 30 decimal digits (and a bit more… up to 2¹⁰⁶ - 2).

References

[1]T. Dekker. A Floating-Point Technique for Extending the Available Precision. Numerische Mathematik, vol. 18 (3), June 1971, 224–242. pdf. 🔎

[2]David Goldberg. What Every Computer Scientist Should Know About Floating-point Arithmetic. ACM Computing Survey. vol. 23 (1), March 1991. doi: 10.1145/103162.103163. pdf. 🔎

[3]John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit Arithmetic. 2017. pdf. 🔎

[4]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Quad-Double Arithmetic: Algorithms, Implementation, and Application. Lawrence Berkeley National Laboratory Technical Report LBNL-46996. 2000. pdf. 🔎

[5]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Library for double-double and quad-double arithmetic. (2007). pdf. 🔎

[6]Seppo Linnainmaa. Software for Doubled-Precision Floating-Point Computations. ACM Transactions on Mathematical Software (TOMS), vol. 7 (3), Sept. 1981, 272–283. 🔎

[7]William Kahan. Miscalculating Area and Angles of a Needle-like Triangle. Lecture notes, 2004, pdf. 🔎

[8]Jonathan Richard Shewchuk. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete & Computational Geometry, vol. 18, 305–363, 1997. pdf 🔎

The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers as a pair of IEEE float64std/core/types/float64: V values. This extends the precision to 31 decimal digits (versus 15 for float64std/core/types/float64: V), but keeps the same range as a float64std/core/types/float64: V with a maximum value of about 1.8·10³⁰⁸. Because float64std/core/types/float64: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Internally a ddoublestd/num/ddouble/ddouble: V d is represented as a pair of float64std/core/types/float64: Vs, hi and lo, such that the number represented by d is hi+lo, where |lo| ≤ 0.5·ulp(hi).

Create a ddoublestd/num/ddouble/ddouble: V from a float64std/core/types/float64: V.

Create a ddoublestd/num/ddouble/ddouble: V x such that x equals d·10^e.

Create a ddoublestd/num/ddouble/ddouble: V from an intstd/core/types/int: V. A ddoublestd/num/ddouble/ddouble: V can represent integers precisely up to 30 digits. If an integer is passed that is out of range an infinity is returned.

Create a ddoublestd/num/ddouble/ddouble: V x such that x equals i·10^e.

The square root of the sum of squares of a list of doubles. Prevents overflow for large numbers and uses Kahan-Babuškan-Neumaier summation for precision.

Return the sum of a list of doubles. Uses Kahan-Babuškan-Neumaier summation to minimize rounding errors. This is more precise as Kahan summation and about as fast.
[1.0e3,1.0e97,1.0e3,-1.0e97].sum == 2000.0
A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen. Math. Mechanik, 54:39–51, 1974.

Parse a floating point number with up to 31 digits precision. Return dd-nanstd/num/ddouble/dd-nan: ddouble if the string is an invalid number.

The square root of the sum of the squares of three doubles. Prevents overflow for large numbers.

Remainder of two ddoublestd/num/ddouble/ddouble: Vs.

Multiply two ddoublestd/num/ddouble/ddouble: Vs.

Add two ddoublestd/num/ddouble/ddouble: Vs.

Subtract two values.

Divide two ddoublestd/num/ddouble/ddouble: Vs.

Negate a ddoublestd/num/ddouble/ddouble: V.

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to an epsilon of 10*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble. See also nearly-eq which takes an explicit epsilon.

Return the absolute value.

The arc-cosine of x. Returns the angle in radians.

The area hyperbolic cosine of x.

The arc-sine of x. Returns the angle in radians.

The area hyperbolic sine of x.

The arc-tangent of x. Returns the angle in radians.

The arc-tangent of a point (x,y). Returns the angle with respect to the x-axis in radians between -π and π.

The area hyperbolic tangent of x.

Round to positive infinity.

Compare two ddoublestd/num/ddouble/ddouble: V values.

The cosine function of a given angle in radians.

The hyperbolic cosine of x.

The e constant.

The ‘machine epsilon’: this is not well-defined for a ddoublestd/num/ddouble/ddouble: V in general since the difference between 1.0 and the next representable ddoublestd/num/ddouble/ddouble: V value is dd-true-minstd/num/ddouble/dd-true-min: ddouble. Instead, we take the square of flt-epsilonstd/num/float64/flt-epsilon: float64, i.e. 2^-104.

Euler's constant.

The natural logarithm of 10.

The natural logarithm of 2.

The base-10 logarithm of e.

The base-2 logarithm of e.

The maximum representable ddoublestd/num/ddouble/ddouble: V.

The smallest positive ddoublestd/num/ddouble/ddouble: V that is still normalized.

Not-A-Number.

Negative infinity.

π.

π/2.

3π/4.

π/4.

Positive infinity.

1.0 / sqrt(2.0).

The square-root of 2.

The smallest positive ddoublestd/num/ddouble/ddouble: V (which is subnormal).

2π.

Create a ddoublestd/num/ddouble/ddouble: V as the sum of two float64std/core/types/float64: V's.

Decrement by one.

Convert a ddoublestd/num/ddouble/ddouble: V to a decimalstd/num/decimal/decimal: V up to a given precision prec (= -1). A negative precision converts precisely. Returns 0 for non-finite ddoublestd/num/ddouble/ddouble: V's.

Decode a ddoublestd/num/ddouble/ddouble: V d into two doubles (hi,lo) such that d equals hi+lo, where lo ≤ 0.5·ulp(hi).

Division and remainder of two ddoublestd/num/ddouble/ddouble: Vs.

Encode a ddoublestd/num/ddouble/ddouble: V d from two doubles (hi,lo) such that d equals hi+lo.

Return e (dd-estd/num/ddouble/dd-e: ddouble) to the power of x.

Return 10 to the power of exp.

Return 2 to the power of exp.

Return exp(x - 1.0). Avoids rounding errors for values of x very close to 1.0.

The floored fraction of x. This is always positive, such that x.floor + x.ffraction == x.

Convert a ddoublestd/num/ddouble/ddouble: V to a float64std/core/types/float64: V (losing precision).

Round to negative infinity.

The fraction of x such that x.truncate + x.fraction == x.

The hypotenuse of x and y: sqrt(x*x + y*y). Prevents overflow for large numbers.

Increment by one.

Convert a ddoublestd/num/ddouble/ddouble: V to the nearest integer (rounding to the nearest even number in case of a tie).

Is this a finite ddoublestd/num/ddouble/ddouble: V? (i.e. not is-nan or is-inf).

Is this an infinite value.

Is this ddoublestd/num/ddouble/ddouble: V not-a-number?

Is this ddoublestd/num/ddouble/ddouble: V negative?

Does x equal negative infinity?

Is this ddoublestd/num/ddouble/ddouble: V positive?

Does x equal positive infinity?

Return the sign of the ddoublestd/num/ddouble/ddouble: V.

Is this ddoublestd/num/ddouble/ddouble: V equal to is-zero.

‘Load exponent’: returns x·2^exp.

The natural logarithm (in base e) of x.

Return ln(1.0 + x). Avoids potential imprecision for small x where adding 1.0 explicitly may lead to rounding errors.

Returns ln(exp(x) + exp(y)). Avoids overlow/underflow errors.

Return the logarithm in some base b of a ddoublestd/num/ddouble/ddouble: V x.

The logarithm in base 10 of x.

The logarithm in base 2 of x.

Returns log2( exp2(x) + exp2(y) ). Avoids overlow/underflow errors.

The maximum of x and y.

The minimum of x and y.

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to some epsilon (=8*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble). The epsilon is the nearest difference for numbers around 1.0. The routine automatically scales the epsilon for larger and smaller numbers, and for numbers close to zero.

The n-th root of a ddoublestd/num/ddouble/ddouble: V number x. n must be positive, and if n is even, then x must not be negative.

One.

x to the power of y both as ddoublestd/num/ddouble/ddouble: V.

Round a ddoublestd/num/ddouble/ddouble: V to the nearest integer, rounding to the nearest even number in case of a tie.

Round a ddoublestd/num/ddouble/ddouble: V to a specified precision. Uses round if the precision is smaller or equal to zero.

Show a ddoublestd/num/ddouble/ddouble: V x with a given precision prec (=-31). The precision specifies the number of digits after the dot (in either scientific of fixed-point notation). If the precision is negative, at most prec digits are displayed, while for a positive precision, exactly prec digits behind the dot are displayed. This uses show-fixed when the exponent of x in scientific notation is larger than -5 and smaller than the precision (or 15 in case of a negative precision), otherwise it uses show-exp.

Show a ddouble x with a given precision prec (=-31) in scientific notation. The precision specifies the number of digits after the dot, i.e. the number of significant digits is prec+1. If the precision is negative, at most prec digits are displayed, and if it is positive exactly prec digits are used.

> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"
> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"

Show a ddouble x with a given precision prec (=-31) in fixed-point notation. The precision specifies the number of digits after the dot. If the precision is negative, at most prec digits after the dot are displayed, while for a positive precision, exactly prec digits are used.

> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"
> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"

Show a ddoublestd/num/ddouble/ddouble: V x precisely as the sum of two float64std/core/types/float64: Vs in hexadecimal notation. Use this if you need to guarantee that you can parse back ddoublestd/num/ddouble/ddouble: Vs exactly, i.e. x == x.show-hex.ddouble.

> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pi.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-max.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"
> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pistd/num/ddouble/dd-pi: ddouble.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-maxstd/num/ddouble/dd-max: ddouble.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"

Show a ddoublestd/num/ddouble/ddouble: V as the sum of float64std/core/types/float64: Vs with an optional precision. Note: use show-hex for reliable round-trip parsing.

The sine function of a given angle in radians.

Calculate sine and cosine on an angle in radians.

The hyperbolic sine of x.

Multiply x with itself.

The square root of a non-negative ddoublestd/num/ddouble/ddouble: V x. For negative x, dd-nanstd/num/ddouble/dd-nan: ddouble is returned.

The tangent of a given angle in radians.

The hyperbolic tangent of x.

Ten (10.ddouble).

Round towards zero.

Return x with the sign of y.

Zero constant.

std/num/ddouble▲toc

128-bit double-double floating point numbers.

Precision

References

std/num/ddouble ▲toc