The ddoublestd/num/ddouble/ddouble: V
type implements double double 128-bit floating point numbers
as a pair of IEEE float64std/core/types/float64: V
values. This extends the precision to 31 decimal digits
(versus 15 for float64std/core/types/float64: V
), but keeps the same range as
a float64std/core/types/float64: V
with a maximum value of about 1.8·10308. Because
float64std/core/types/float64: V
s usually have hardware support, a ddoublestd/num/ddouble/ddouble: V
is usually much faster
than arbitrary precision floating point numbers.
Internally a ddoublestd/num/ddouble/ddouble: V
d is represented as a pair of float64std/core/types/float64: V
s, hi and lo,
such that the number represented by d is hi+lo, where |lo| ≤ 0.5·ulp(hi).
Create a ddoublestd/num/ddouble/ddouble: V
from a float64std/core/types/float64: V
.
Create a ddoublestd/num/ddouble/ddouble: V
x
such that x
equals d
·10e
.
Create a ddoublestd/num/ddouble/ddouble: V
from an intstd/core/types/int: V
.
A ddoublestd/num/ddouble/ddouble: V
can represent integers precisely up to 30 digits.
If an integer is passed that is out of range an infinity is returned.
Create a ddoublestd/num/ddouble/ddouble: V
x
such that x
equals i
·10e
.
The square root of the sum of squares of a list of doubles. Prevents overflow for large numbers and uses Kahan-Babuškan-Neumaier summation for precision.
Return the sum of a list of doubles.
Uses Kahan-Babuškan-Neumaier summation
to minimize rounding errors. This
is more precise as Kahan summation and about as fast.
[1.0e3,1.0e97,1.0e3,-1.0e97].sum == 2000.0
A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen.
Math. Mechanik, 54:39–51, 1974.
Parse a floating point number with up to 31 digits precision.
Return dd-nanstd/num/ddouble/dd-nan: ddouble
if the string is an invalid number.
The square root of the sum of the squares of three doubles. Prevents overflow for large numbers.
Remainder of two ddoublestd/num/ddouble/ddouble: V
s.
Multiply two ddoublestd/num/ddouble/ddouble: V
s.
Add two ddoublestd/num/ddouble/ddouble: V
s.
Subtract two values.
Divide two ddoublestd/num/ddouble/ddouble: V
s.
Negate a ddoublestd/num/ddouble/ddouble: V
.
Return if two ddoublestd/num/ddouble/ddouble: V
s are nearly equal with respect to an epsilon
of 10*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble
.
See also nearly-eq
which takes an explicit epsilon
.
Return the absolute value.
The arc-cosine of x
. Returns the angle in radians.
The area hyperbolic cosine of x
.
The arc-sine of x
. Returns the angle in radians.
The area hyperbolic sine of x
.
The arc-tangent of x
. Returns the angle in radians.
The arc-tangent of a point (x
,y
). Returns the angle with respect to the x-axis in radians between -π and π.
The area hyperbolic tangent of x
.
Round to positive infinity.
Compare two ddoublestd/num/ddouble/ddouble: V
values.
The cosine function of a given angle in radians.
The hyperbolic cosine of x
.
The e constant.
The ‘machine epsilon’: this is not well-defined for a ddoublestd/num/ddouble/ddouble: V
in general since
the difference between 1.0 and the next representable ddoublestd/num/ddouble/ddouble: V
value is dd-true-minstd/num/ddouble/dd-true-min: ddouble
.
Instead, we take the square of flt-epsilonstd/num/float64/flt-epsilon: float64
, i.e. 2-104.
The natural logarithm of 10.
The natural logarithm of 2.
The base-10 logarithm of e.
The base-2 logarithm of e.
The maximum representable ddoublestd/num/ddouble/ddouble: V
.
The smallest positive ddoublestd/num/ddouble/ddouble: V
that is still normalized.
Not-A-Number.
Negative infinity.
π/2.
3π/4.
π/4.
Positive infinity.
1.0 / sqrt(2.0)
.
The square-root of 2.
The smallest positive ddoublestd/num/ddouble/ddouble: V
(which is subnormal).
Create a ddoublestd/num/ddouble/ddouble: V
as the sum of two float64std/core/types/float64: V
's.
Decrement by one.
Convert a ddoublestd/num/ddouble/ddouble: V
to a decimalstd/num/decimal/decimal: V
up to a given precision prec
(= -1
).
A negative precision converts precisely. Returns 0 for non-finite ddoublestd/num/ddouble/ddouble: V
's.
Decode a ddoublestd/num/ddouble/ddouble: V
d
into two doubles (hi,lo)
such that d
equals hi
+lo
,
where lo
≤ 0.5·ulp(hi
).
Division and remainder of two ddoublestd/num/ddouble/ddouble: V
s.
Encode a ddoublestd/num/ddouble/ddouble: V
d
from two doubles (hi,lo)
such that d
equals hi
+lo
.
Return e (dd-estd/num/ddouble/dd-e: ddouble
) to the power of x
.
Return 10 to the power of exp
.
Return 2 to the power of exp
.
Return exp(x - 1.0)
.
Avoids rounding errors for values of x
very close to 1.0
.
The floored fraction of x
. This is always positive, such that x.floor + x.ffraction == x
.
Convert a ddoublestd/num/ddouble/ddouble: V
to a float64std/core/types/float64: V
(losing precision).
Round to negative infinity.
The fraction of x
such that x.truncate + x.fraction == x
.
The hypotenuse of x
and y
: sqrt(x*x + y*y)
.
Prevents overflow for large numbers.
Increment by one.
Convert a ddoublestd/num/ddouble/ddouble: V
to the nearest integer (rounding to the nearest even number in case of a tie).
Is this a finite ddoublestd/num/ddouble/ddouble: V
? (i.e. not is-nan
or is-inf
).
Is this an infinite value.
Is this ddoublestd/num/ddouble/ddouble: V
not-a-number?
Is this ddoublestd/num/ddouble/ddouble: V
negative?
Does x
equal negative infinity?
Is this ddoublestd/num/ddouble/ddouble: V
positive?
Does x
equal positive infinity?
Return the sign of the ddoublestd/num/ddouble/ddouble: V
.
Is this ddoublestd/num/ddouble/ddouble: V
equal to is-zero.
‘Load exponent’: returns x
·2exp
.
The natural logarithm (in base e) of x
.
Return ln(1.0 + x)
.
Avoids potential imprecision for small x
where adding 1.0
explicitly
may lead to rounding errors.
Returns ln(exp(x) + exp(y))
.
Avoids overlow/underflow errors.
Return the logarithm in some base b
of a ddoublestd/num/ddouble/ddouble: V
x
.
The logarithm in base 10 of x
.
The logarithm in base 2 of x
.
Returns log2( exp2(x) + exp2(y) )
.
Avoids overlow/underflow errors.
The maximum of x
and y
.
The minimum of x
and y
.
Return if two ddoublestd/num/ddouble/ddouble: V
s are nearly equal with respect to some epsilon
(=8*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble
).
The epsilon is the nearest difference for numbers around 1.0. The routine automatically
scales the epsilon for larger and smaller numbers, and for numbers close to zero.
The n
-th root of a ddoublestd/num/ddouble/ddouble: V
number x
.
n
must be positive, and if n
is even, then
x
must not be negative.
One.
x
to the power of y
both as ddoublestd/num/ddouble/ddouble: V
.
Round a ddoublestd/num/ddouble/ddouble: V
to the nearest integer, rounding to the nearest even number in case of a tie.
Round a ddoublestd/num/ddouble/ddouble: V
to a specified precision.
Uses round
if the precision is smaller or equal to zero.
Show a ddoublestd/num/ddouble/ddouble: V
x
with a given precision prec
(=-31
).
The precision specifies the number of digits after the dot (in either scientific of fixed-point notation).
If the precision is negative, at most prec
digits are displayed, while for a positive
precision, exactly prec
digits behind the dot are displayed.
This uses show-fixed
when the exponent of x
in scientific notation is larger than -5
and smaller than the precision (or 15 in case of a negative precision), otherwise it uses show-exp
.
Show a ddouble x
with a given precision prec
(=-31
) in scientific notation.
The precision specifies the number of digits after the dot, i.e.
the number of significant digits is prec+1
.
If the precision is negative, at most prec
digits are displayed, and if
it is positive exactly prec
digits are used.
> 1.1.ddouble.show-exp "1.1000000000000000888178419700125" > 1.1.ddouble.show-exp(-100) "1.100000000000000088817841970012523233890533447265625" > 1.1.ddouble.show-exp(5) "1.10000" > 1.1.ddouble.show-exp(-5) "1.1" > 1.1.ddouble.show-exp "1.1000000000000000888178419700125" > 1.1.ddouble.show-exp(-100) "1.100000000000000088817841970012523233890533447265625" > 1.1.ddouble.show-exp(5) "1.10000" > 1.1.ddouble.show-exp(-5) "1.1"
.
Show a ddouble x
with a given precision prec
(=-31
) in fixed-point notation.
The precision specifies the number of digits after the dot.
If the precision is negative, at most prec
digits after the dot are displayed,
while for a positive precision, exactly prec
digits are used.
> 0.1.ddouble.show-fixed "0.1000000000000000055511151231258" > 0.1.ddouble.show-fixed(-100) "0.1000000000000000055511151231257827021181583404541015625" > 0.1.ddouble.show-fixed(5) "0.10000" > 0.1.ddouble.show-fixed(-5) "0.1" > 0.1.ddouble.show-fixed "0.1000000000000000055511151231258" > 0.1.ddouble.show-fixed(-100) "0.1000000000000000055511151231257827021181583404541015625" > 0.1.ddouble.show-fixed(5) "0.10000" > 0.1.ddouble.show-fixed(-5) "0.1"
.
Show a ddoublestd/num/ddouble/ddouble: V
x
precisely as the sum of two float64std/core/types/float64: V
s
in hexadecimal notation.
Use this if you need to guarantee that you can parse back ddoublestd/num/ddouble/ddouble: V
s exactly,
i.e. x == x.show-hex.ddouble
.
> 0.1.ddouble.show-hex "0x1.999999999999Ap-4 + 0x0.0p+0" > "0.1".ddouble.show-hex "0x1.999999999999Ap-4 + -0x1.999999999999Ap-58" > dd-pi.show-hex "0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53" > dd-max.show-hex "0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969" > 0.1.ddouble.show-hex "0x1.999999999999Ap-4 + 0x0.0p+0" > "0.1".ddouble.show-hex "0x1.999999999999Ap-4 + -0x1.999999999999Ap-58" > dd-pistd/num/ddouble/dd-pi: ddouble.show-hex "0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53" > dd-maxstd/num/ddouble/dd-max: ddouble.show-hex "0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"
.
Show a ddoublestd/num/ddouble/ddouble: V
as the sum of float64std/core/types/float64: V
s with an optional precision.
Note: use show-hex
for reliable round-trip parsing.
The sine function of a given angle in radians.
Calculate sine and cosine on an angle in radians.
The hyperbolic sine of x
.
Multiply x
with itself.
The square root of a non-negative ddoublestd/num/ddouble/ddouble: V
x
.
For negative x
, dd-nanstd/num/ddouble/dd-nan: ddouble
is returned.
The tangent of a given angle in radians.
The hyperbolic tangent of x
.
Ten (10.ddouble
).
Round towards zero.
Return x
with the sign of y
.
Zero constant.
Maximal precision in decimal digits of a ddoublestd/num/ddouble/ddouble: V
.
128-bit double-double floating point numbers.
The
ddoublestd/num/ddouble/ddouble: V
type implements double double 128-bit floating point numbers as a pair of IEEEfloat64std/core/types/float64: V
values. This extends the precision to 31 decimal digits (versus 15 forfloat64std/core/types/float64: V
), but keeps the same range as afloat64std/core/types/float64: V
with a maximum value of about 1.8·10308 (dd-maxstd/num/ddouble/dd-max: ddouble
). Becausefloat64std/core/types/float64: V
s usually have hardware support, addoublestd/num/ddouble/ddouble: V
is usually much faster than arbitrary precision floating point numbers.Precision
If you add two regular 64-bit
float64std/core/types/float64: V
values you can quickly notice the imprecision due to decimal fractions that cannot be represented precisely. For example:This happens because the decimal
0.1
and0.2
cannot be represented exactly by afloat64std/core/types/float64: V
which is encoded in base 2. For example, if we show0.1
to 20 digits, we get:When we convert the constant
0.1
to addoublestd/num/ddouble/ddouble: V
we can see with even more precision how it is approximated:However, if we convert the number
0.1
directly to addoublestd/num/ddouble/ddouble: V
(instead of going through afloat64std/core/types/float64: V
constant), we can represent0.1
more precisely:This is possible because a
ddoublestd/num/ddouble/ddouble: V
uses twofloat64std/core/types/float64: V
s to represent numbers; in this case the first (hi) float64 is (the inexact)0.1
while the second one (lo) is the correction to that number. Theshow-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string
(orshow-hex
) function shows this internal pair of numbers:Generally, a
ddoublestd/num/ddouble/ddouble: V
d is represented by a pair offloat64std/core/types/float64: V
s, hi and lo, such that d equals hi+lo, where |lo| ≤ 0.5·ulp(hi). Note that despite the extra precision, underflow/overflow and machine precision are not as well-defined as with a regularfloat64std/core/types/float64: V
[1, 6, 8]. Nevertheless, using addoublestd/num/ddouble/ddouble: V
often prevents instability in practice overfloat64std/core/types/float64: V
computations due to rounding errors or combining very large and small numbers.Take the “thin triangle” problem for example [2, 7]. The challenge is to compute the area of a very thin triangle with sides , , and . The short sides and are just 3 units in the last place (ulp) shorter than the longest side . Using the value , the area of a triangle is . This is troublesome if is close to since magnifies any rounding error. Let's take , and ; according to [3], “If the units are in light-years, then the short sides are only longer than half the long side by th the diameter of a proton. Yet that pops the triangle up to about the width of a doorway at the thickest point”. Using various 128-bit numbers we get:
For this kind of example, a
ddoublestd/num/ddouble/ddouble: V
has better precision than a regular 128-bit IEEE float since it can combine very large and small values. Note that Kahan [7] shows how to rewrite the area equation to avoid magnifying rounding errors – in that case the result for IEEE 128-bit floats becomes:The implementation is based closely on the excellent QD C++ library [4, 5], and assumes proper 64-bit IEEE
float64std/core/types/float64: V
s with correct rounding. Integers can be represented precisely up to 30 decimal digits (and a bit more… up to 2106 - 2).References
.