std/num/ddouble▲toc

128-bit double-double floating point numbers.

The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers as a pair of IEEE float64std/core/types/float64: V values. This extends the precision to 31 decimal digits (versus 15 for float64std/core/types/float64: V), but keeps the same range as a float64std/core/types/float64: V with a maximum value of about 1.8·10308 (dd-maxstd/num/ddouble/dd-max: ddouble). Because float64std/core/types/float64: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Precision

If you add two regular 64-bit float64std/core/types/float64: V values you can quickly notice the imprecision due to decimal fractions that cannot be represented precisely. For example:

> 0.1 + 0.2
0.30000000000000004
> 0.1 + 0.2
0.30000000000000004

This happens because the decimal 0.1 and 0.2 cannot be represented exactly by a float64std/core/types/float64: V which is encoded in base 2. For example, if we show 0.1 to 20 digits, we get:

> 0.1.show(20)
"0.10000000000000000555"
> 0.1.show(20)
"0.10000000000000000555"

When we convert the constant 0.1 to a ddoublestd/num/ddouble/ddouble: V we can see with even more precision how it is approximated:

> 0.1.ddouble
0.1000000000000000055511151231258
> 0.1.ddouble
0.1000000000000000055511151231258

However, if we convert the number 0.1 directly to a ddoublestd/num/ddouble/ddouble: V (instead of going through a float64std/core/types/float64: V constant), we can represent 0.1 more precisely:

> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1
> ddouble("0.1")   // for convenience; using `ddouble-exp(1,-1)` is more efficient
0.1

This is possible because a ddoublestd/num/ddouble/ddouble: V uses two float64std/core/types/float64: Vs to represent numbers; in this case the first (hi) float64 is (the inexact) 0.1 while the second one (lo) is the correction to that number. The show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string (or show-hex) function shows this internal pair of numbers:

> ddouble("0.1").show-sum(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3
> ddouble("0.1").show-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string(20)
"0.10000000000000000555 + -5.55111512312578301027e-18"

> ddouble("0.1") + ddouble("0.2")
0.3

Generally, a ddoublestd/num/ddouble/ddouble: V d is represented by a pair of float64std/core/types/float64: Vs, hi and lo, such that d equals hi+lo, where |lo| ≤ 0.5·ulp(hi). Note that despite the extra precision, underflow/overflow and machine precision are not as well-defined as with a regular float64std/core/types/float64: V [1, 6, 8]. Nevertheless, using a ddoublestd/num/ddouble/ddouble: V often prevents instability in practice over float64std/core/types/float64: V computations due to rounding errors or combining very large and small numbers.

Take the “thin triangle” problem for example [2, 7]. The challenge is to compute the area of a very thin triangle with sides $a$, $b$, and $c$. The short sides $b$ and $c$ are just 3 units in the last place (ulp) shorter than the longest side $a$. Using the value $s = (a + b + c)/2$, the area of a triangle is $A = \sqrt{s(s-a)(s-b)(s-c)}$. This is troublesome if $s$ is close to $a$ since $s-a$ magnifies any rounding error. Let's take $a=7$, and $b = c = 7/2 + 3\cdot2^{-111}$; according to [3], “If the units are in light-years, then the short sides are only longer than half the long side by $1/200$th the diameter of a proton. Yet that pops the triangle up to about the width of a doorway at the thickest point”. Using various 128-bit numbers we get:

exact          : 3.14784204874900425235885265494550774498...e-16
128-bit ddouble: 3.14784204874900425235885265494547e-16
128-bit posit  : 3.14784204874900425235885265494550774439e-16
128-bit ieee   : 3.63481490842332134725920516158057682878e-16

For this kind of example, a ddoublestd/num/ddouble/ddouble: V has better precision than a regular 128-bit IEEE float since it can combine very large and small values. Note that Kahan [7] shows how to rewrite the area equation to avoid magnifying rounding errors – in that case the result for IEEE 128-bit floats becomes:

128-bit ieee x : 3.14784204874900425235885265494550792210e-16

The implementation is based closely on the excellent QD C++ library [4, 5], and assumes proper 64-bit IEEE float64std/core/types/float64: Vs with correct rounding. Integers can be represented precisely up to 30 decimal digits (and a bit more… up to 2106 - 2).

References

[1]T. Dekker. A Floating-Point Technique for Extending the Available Precision. Numerische Mathematik, vol. 18 (3), June 1971, 224–242. pdf🔎
[2]David Goldberg. What Every Computer Scientist Should Know About Floating-point Arithmetic. ACM Computing Survey. vol. 23 (1), March 1991. doi: 10.1145/103162.103163. pdf🔎
[3]John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit Arithmetic. 2017. pdf🔎
[4]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Quad-Double Arithmetic: Algorithms, Implementation, and Application. Lawrence Berkeley National Laboratory Technical Report LBNL-46996. 2000. pdf🔎
[5]Yozo Hida, Xiaoye S. Li, and David H. Bailey. Library for double-double and quad-double arithmetic. (2007). pdf🔎
[6]Seppo Linnainmaa. Software for Doubled-Precision Floating-Point Computations. ACM Transactions on Mathematical Software (TOMS), vol. 7 (3), Sept. 1981, 272–283. 🔎
[7]William Kahan. Miscalculating Area and Angles of a Needle-like Triangle. Lecture notes, 2004, pdf🔎
[8]Jonathan Richard Shewchuk. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete & Computational Geometry, vol. 18, 305–363, 1997. pdf 🔎

.

type ddoublestd/num/ddouble/ddouble: V

The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers as a pair of IEEE float64std/core/types/float64: V values. This extends the precision to 31 decimal digits (versus 15 for float64std/core/types/float64: V), but keeps the same range as a float64std/core/types/float64: V with a maximum value of about 1.8·10308. Because float64std/core/types/float64: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster than arbitrary precision floating point numbers.

Internally a ddoublestd/num/ddouble/ddouble: V d is represented as a pair of float64std/core/types/float64: Vs, hi and lo, such that the number represented by d is hi+lo, where |lo| ≤ 0.5·ulp(hi).

Create a ddoublestd/num/ddouble/ddouble: V from an intstd/core/types/int: V. A ddoublestd/num/ddouble/ddouble: V can represent integers precisely up to 30 digits. If an integer is passed that is out of range an infinity is returned.

The square root of the sum of squares of a list of doubles. Prevents overflow for large numbers and uses Kahan-Babuškan-Neumaier summation for precision.

Return the sum of a list of doubles. Uses Kahan-Babuškan-Neumaier summation to minimize rounding errors. This is more precise as Kahan summation and about as fast.
[1.0e3,1.0e97,1.0e3,-1.0e97].sum == 2000.0
A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen. Math. Mechanik, 54:39–51, 1974.

Parse a floating point number with up to 31 digits precision. Return dd-nanstd/num/ddouble/dd-nan: ddouble if the string is an invalid number.

The square root of the sum of the squares of three doubles. Prevents overflow for large numbers.

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to an epsilon of 10*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble. See also nearly-eq which takes an explicit epsilon.

Return the absolute value.

The arc-cosine of x. Returns the angle in radians.

The area hyperbolic cosine of x.

The arc-sine of x. Returns the angle in radians.

The area hyperbolic sine of x.

The arc-tangent of x. Returns the angle in radians.

The arc-tangent of a point (x,y). Returns the angle with respect to the x-axis in radians between -π and π.

The area hyperbolic tangent of x.

The cosine function of a given angle in radians.

The hyperbolic cosine of x.

The e constant.

The ‘machine epsilon’: this is not well-defined for a ddoublestd/num/ddouble/ddouble: V in general since the difference between 1.0 and the next representable ddoublestd/num/ddouble/ddouble: V value is dd-true-minstd/num/ddouble/dd-true-min: ddouble. Instead, we take the square of flt-epsilonstd/num/float64/flt-epsilon: float64, i.e. 2-104.

The natural logarithm of 10.

The natural logarithm of 2.

The base-10 logarithm of e.

The base-2 logarithm of e.

The smallest positive ddoublestd/num/ddouble/ddouble: V that is still normalized.

Negative infinity.

Positive infinity.

The square-root of 2.

The smallest positive ddoublestd/num/ddouble/ddouble: V (which is subnormal).

Convert a ddoublestd/num/ddouble/ddouble: V to a decimalstd/num/decimal/decimal: V up to a given precision prec (= -1). A negative precision converts precisely. Returns 0 for non-finite ddoublestd/num/ddouble/ddouble: V's.

Decode a ddoublestd/num/ddouble/ddouble: V d into two doubles (hi,lo) such that d equals hi+lo, where lo ≤ 0.5·ulp(hi).

Encode a ddoublestd/num/ddouble/ddouble: V d from two doubles (hi,lo) such that d equals hi+lo.

Return 10 to the power of exp.

Return 2 to the power of exp.

Return exp(x - 1.0). Avoids rounding errors for values of x very close to 1.0.

The floored fraction of x. This is always positive, such that x.floor + x.ffraction == x.

The fraction of x such that x.truncate + x.fraction == x.

The hypotenuse of x and y: sqrt(x*x + y*y). Prevents overflow for large numbers.

Convert a ddoublestd/num/ddouble/ddouble: V to the nearest integer (rounding to the nearest even number in case of a tie).

Is this an infinite value.

Does x equal negative infinity?

Does x equal positive infinity?

‘Load exponent’: returns x·2exp.

The natural logarithm (in base e) of x.

Return ln(1.0 + x). Avoids potential imprecision for small x where adding 1.0 explicitly may lead to rounding errors.

Returns ln(exp(x) + exp(y)). Avoids overlow/underflow errors.

The logarithm in base 10 of x.

The logarithm in base 2 of x.

Returns log2( exp2(x) + exp2(y) ). Avoids overlow/underflow errors.

Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to some epsilon (=8*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble). The epsilon is the nearest difference for numbers around 1.0. The routine automatically scales the epsilon for larger and smaller numbers, and for numbers close to zero.

The n-th root of a ddoublestd/num/ddouble/ddouble: V number x. n must be positive, and if n is even, then x must not be negative.

Round a ddoublestd/num/ddouble/ddouble: V to the nearest integer, rounding to the nearest even number in case of a tie.

Round a ddoublestd/num/ddouble/ddouble: V to a specified precision. Uses round if the precision is smaller or equal to zero.

Show a ddoublestd/num/ddouble/ddouble: V x with a given precision prec (=-31). The precision specifies the number of digits after the dot (in either scientific of fixed-point notation). If the precision is negative, at most prec digits are displayed, while for a positive precision, exactly prec digits behind the dot are displayed. This uses show-fixed when the exponent of x in scientific notation is larger than -5 and smaller than the precision (or 15 in case of a negative precision), otherwise it uses show-exp.

Show a ddouble x with a given precision prec (=-31) in scientific notation. The precision specifies the number of digits after the dot, i.e. the number of significant digits is prec+1. If the precision is negative, at most prec digits are displayed, and if it is positive exactly prec digits are used.

> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"
> 1.1.ddouble.show-exp
"1.1000000000000000888178419700125"
> 1.1.ddouble.show-exp(-100)
"1.100000000000000088817841970012523233890533447265625"
> 1.1.ddouble.show-exp(5)
"1.10000"
> 1.1.ddouble.show-exp(-5)
"1.1"

.

Show a ddouble x with a given precision prec (=-31) in fixed-point notation. The precision specifies the number of digits after the dot. If the precision is negative, at most prec digits after the dot are displayed, while for a positive precision, exactly prec digits are used.

> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"
> 0.1.ddouble.show-fixed
"0.1000000000000000055511151231258"
> 0.1.ddouble.show-fixed(-100)
"0.1000000000000000055511151231257827021181583404541015625"
> 0.1.ddouble.show-fixed(5)
"0.10000"
> 0.1.ddouble.show-fixed(-5)
"0.1"

.

Show a ddoublestd/num/ddouble/ddouble: V x precisely as the sum of two float64std/core/types/float64: Vs in hexadecimal notation. Use this if you need to guarantee that you can parse back ddoublestd/num/ddouble/ddouble: Vs exactly, i.e. x == x.show-hex.ddouble.

> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pi.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-max.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"
> 0.1.ddouble.show-hex
"0x1.999999999999Ap-4 + 0x0.0p+0"
> "0.1".ddouble.show-hex
"0x1.999999999999Ap-4 + -0x1.999999999999Ap-58"
> dd-pistd/num/ddouble/dd-pi: ddouble.show-hex
"0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53"
> dd-maxstd/num/ddouble/dd-max: ddouble.show-hex
"0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"

.

Show a ddoublestd/num/ddouble/ddouble: V as the sum of float64std/core/types/float64: Vs with an optional precision. Note: use show-hex for reliable round-trip parsing.

The sine function of a given angle in radians.

Calculate sine and cosine on an angle in radians.

The tangent of a given angle in radians.

The hyperbolic tangent of x.

Ten (10.ddouble).

val dd-max-prec: intstd/core/types/int: V

Maximal precision in decimal digits of a ddoublestd/num/ddouble/ddouble: V.

fun parse-ddouble( s : stringstd/core/types/string: V ) : maybestd/core/types/maybe: V -> V<ddoublestd/num/ddouble/ddouble: V>
private import std/core/typesstd/core/types, std/core/hndstd/core/hnd, std/core/exnstd/core/exn, std/core/boolstd/core/bool, std/core/orderstd/core/order, std/core/charstd/core/char, std/core/intstd/core/int, std/core/vectorstd/core/vector, std/core/stringstd/core/string, std/core/sslicestd/core/sslice, std/core/liststd/core/list, std/core/maybestd/core/maybe, std/core/eitherstd/core/either, std/core/tuplestd/core/tuple, std/core/showstd/core/show, std/core/debugstd/core/debug, std/core/delayedstd/core/delayed, std/core/consolestd/core/console, std/corestd/core, std/core/undivstd/core/undiv, std/num/float64std/num/float64, std/num/decimalstd/num/decimal, std/text/parsestd/text/parse