The ddoublestd/num/ddouble/ddouble: V type implements double double 128-bit floating point numbers
as a pair of IEEE float64std/core/types/float64: V values. This extends the precision to 31 decimal digits
(versus 15 for float64std/core/types/float64: V), but keeps the same range as
a float64std/core/types/float64: V with a maximum value of about 1.8·10308. Because
float64std/core/types/float64: Vs usually have hardware support, a ddoublestd/num/ddouble/ddouble: V is usually much faster
than arbitrary precision floating point numbers.
Internally a ddoublestd/num/ddouble/ddouble: V d is represented as a pair of float64std/core/types/float64: Vs, hi and lo,
such that the number represented by d is hi+lo, where |lo| ≤ 0.5·ulp(hi).
Create a ddoublestd/num/ddouble/ddouble: V from a float64std/core/types/float64: V.
Create a ddoublestd/num/ddouble/ddouble: V x such that x equals d·10e.
Create a ddoublestd/num/ddouble/ddouble: V from an intstd/core/types/int: V.
A ddoublestd/num/ddouble/ddouble: V can represent integers precisely up to 30 digits.
If an integer is passed that is out of range an infinity is returned.
Create a ddoublestd/num/ddouble/ddouble: V x such that x equals i·10e.
The square root of the sum of squares of a list of doubles. Prevents overflow for large numbers and uses Kahan-Babuškan-Neumaier summation for precision.
Return the sum of a list of doubles.
Uses Kahan-Babuškan-Neumaier summation
to minimize rounding errors. This
is more precise as Kahan summation and about as fast.
[1.0e3,1.0e97,1.0e3,-1.0e97].sum == 2000.0
A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen.
Math. Mechanik, 54:39–51, 1974.
Parse a floating point number with up to 31 digits precision.
Return dd-nanstd/num/ddouble/dd-nan: ddouble if the string is an invalid number.
The square root of the sum of the squares of three doubles. Prevents overflow for large numbers.
Remainder of two ddoublestd/num/ddouble/ddouble: Vs.
Multiply two ddoublestd/num/ddouble/ddouble: Vs.
Add two ddoublestd/num/ddouble/ddouble: Vs.
Subtract two values.
Divide two ddoublestd/num/ddouble/ddouble: Vs.
Negate a ddoublestd/num/ddouble/ddouble: V.
Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to an epsilon of 10*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble.
See also nearly-eq which takes an explicit epsilon.
Return the absolute value.
The arc-cosine of x. Returns the angle in radians.
The area hyperbolic cosine of x.
The arc-sine of x. Returns the angle in radians.
The area hyperbolic sine of x.
The arc-tangent of x. Returns the angle in radians.
The arc-tangent of a point (x,y). Returns the angle with respect to the x-axis in radians between -π and π.
The area hyperbolic tangent of x.
Round to positive infinity.
Compare two ddoublestd/num/ddouble/ddouble: V values.
The cosine function of a given angle in radians.
The hyperbolic cosine of x.
The e constant.
The ‘machine epsilon’: this is not well-defined for a ddoublestd/num/ddouble/ddouble: V in general since
the difference between 1.0 and the next representable ddoublestd/num/ddouble/ddouble: V value is dd-true-minstd/num/ddouble/dd-true-min: ddouble.
Instead, we take the square of flt-epsilonstd/num/float64/flt-epsilon: float64, i.e. 2-104.
The natural logarithm of 10.
The natural logarithm of 2.
The base-10 logarithm of e.
The base-2 logarithm of e.
The maximum representable ddoublestd/num/ddouble/ddouble: V.
The smallest positive ddoublestd/num/ddouble/ddouble: V that is still normalized.
Not-A-Number.
Negative infinity.
π/2.
3π/4.
π/4.
Positive infinity.
1.0 / sqrt(2.0).
The square-root of 2.
The smallest positive ddoublestd/num/ddouble/ddouble: V (which is subnormal).
Create a ddoublestd/num/ddouble/ddouble: V as the sum of two float64std/core/types/float64: V's.
Decrement by one.
Convert a ddoublestd/num/ddouble/ddouble: V to a decimalstd/num/decimal/decimal: V up to a given precision prec (= -1).
A negative precision converts precisely. Returns 0 for non-finite ddoublestd/num/ddouble/ddouble: V's.
Decode a ddoublestd/num/ddouble/ddouble: V d into two doubles (hi,lo) such that d equals hi+lo,
where lo ≤ 0.5·ulp(hi).
Division and remainder of two ddoublestd/num/ddouble/ddouble: Vs.
Encode a ddoublestd/num/ddouble/ddouble: V d from two doubles (hi,lo) such that d equals hi+lo.
Return e (dd-estd/num/ddouble/dd-e: ddouble) to the power of x.
Return 10 to the power of exp.
Return 2 to the power of exp.
Return exp(x - 1.0).
Avoids rounding errors for values of x very close to 1.0.
The floored fraction of x. This is always positive, such that x.floor + x.ffraction == x.
Convert a ddoublestd/num/ddouble/ddouble: V to a float64std/core/types/float64: V (losing precision).
Round to negative infinity.
The fraction of x such that x.truncate + x.fraction == x.
The hypotenuse of x and y: sqrt(x*x + y*y).
Prevents overflow for large numbers.
Increment by one.
Convert a ddoublestd/num/ddouble/ddouble: V to the nearest integer (rounding to the nearest even number in case of a tie).
Is this a finite ddoublestd/num/ddouble/ddouble: V? (i.e. not is-nan or is-inf).
Is this an infinite value.
Is this ddoublestd/num/ddouble/ddouble: V not-a-number?
Is this ddoublestd/num/ddouble/ddouble: V negative?
Does x equal negative infinity?
Is this ddoublestd/num/ddouble/ddouble: V positive?
Does x equal positive infinity?
Return the sign of the ddoublestd/num/ddouble/ddouble: V.
Is this ddoublestd/num/ddouble/ddouble: V equal to is-zero.
‘Load exponent’: returns x·2exp.
The natural logarithm (in base e) of x.
Return ln(1.0 + x).
Avoids potential imprecision for small x where adding 1.0 explicitly
may lead to rounding errors.
Returns ln(exp(x) + exp(y)).
Avoids overlow/underflow errors.
Return the logarithm in some base b of a ddoublestd/num/ddouble/ddouble: V x.
The logarithm in base 10 of x.
The logarithm in base 2 of x.
Returns log2( exp2(x) + exp2(y) ).
Avoids overlow/underflow errors.
The maximum of x and y.
The minimum of x and y.
Return if two ddoublestd/num/ddouble/ddouble: Vs are nearly equal with respect to some epsilon (=8*dd-epsilonstd/num/ddouble/dd-epsilon: ddouble).
The epsilon is the nearest difference for numbers around 1.0. The routine automatically
scales the epsilon for larger and smaller numbers, and for numbers close to zero.
The n-th root of a ddoublestd/num/ddouble/ddouble: V number x.
n must be positive, and if n is even, then
x must not be negative.
One.
x to the power of y both as ddoublestd/num/ddouble/ddouble: V.
Round a ddoublestd/num/ddouble/ddouble: V to the nearest integer, rounding to the nearest even number in case of a tie.
Round a ddoublestd/num/ddouble/ddouble: V to a specified precision.
Uses round if the precision is smaller or equal to zero.
Show a ddoublestd/num/ddouble/ddouble: V x with a given precision prec (=-31).
The precision specifies the number of digits after the dot (in either scientific of fixed-point notation).
If the precision is negative, at most prec digits are displayed, while for a positive
precision, exactly prec digits behind the dot are displayed.
This uses show-fixed when the exponent of x in scientific notation is larger than -5
and smaller than the precision (or 15 in case of a negative precision), otherwise it uses show-exp.
Show a ddouble x with a given precision prec (=-31) in scientific notation.
The precision specifies the number of digits after the dot, i.e.
the number of significant digits is prec+1.
If the precision is negative, at most prec digits are displayed, and if
it is positive exactly prec digits are used.
> 1.1.ddouble.show-exp "1.1000000000000000888178419700125" > 1.1.ddouble.show-exp(-100) "1.100000000000000088817841970012523233890533447265625" > 1.1.ddouble.show-exp(5) "1.10000" > 1.1.ddouble.show-exp(-5) "1.1" > 1.1.ddouble.show-exp "1.1000000000000000888178419700125" > 1.1.ddouble.show-exp(-100) "1.100000000000000088817841970012523233890533447265625" > 1.1.ddouble.show-exp(5) "1.10000" > 1.1.ddouble.show-exp(-5) "1.1"
.
Show a ddouble x with a given precision prec (=-31) in fixed-point notation.
The precision specifies the number of digits after the dot.
If the precision is negative, at most prec digits after the dot are displayed,
while for a positive precision, exactly prec digits are used.
> 0.1.ddouble.show-fixed "0.1000000000000000055511151231258" > 0.1.ddouble.show-fixed(-100) "0.1000000000000000055511151231257827021181583404541015625" > 0.1.ddouble.show-fixed(5) "0.10000" > 0.1.ddouble.show-fixed(-5) "0.1" > 0.1.ddouble.show-fixed "0.1000000000000000055511151231258" > 0.1.ddouble.show-fixed(-100) "0.1000000000000000055511151231257827021181583404541015625" > 0.1.ddouble.show-fixed(5) "0.10000" > 0.1.ddouble.show-fixed(-5) "0.1"
.
Show a ddoublestd/num/ddouble/ddouble: V x precisely as the sum of two float64std/core/types/float64: Vs
in hexadecimal notation.
Use this if you need to guarantee that you can parse back ddoublestd/num/ddouble/ddouble: Vs exactly,
i.e. x == x.show-hex.ddouble.
> 0.1.ddouble.show-hex "0x1.999999999999Ap-4 + 0x0.0p+0" > "0.1".ddouble.show-hex "0x1.999999999999Ap-4 + -0x1.999999999999Ap-58" > dd-pi.show-hex "0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53" > dd-max.show-hex "0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969" > 0.1.ddouble.show-hex "0x1.999999999999Ap-4 + 0x0.0p+0" > "0.1".ddouble.show-hex "0x1.999999999999Ap-4 + -0x1.999999999999Ap-58" > dd-pistd/num/ddouble/dd-pi: ddouble.show-hex "0x1.921FB54442D18p+1 + 0x1.1A62633145C07p-53" > dd-maxstd/num/ddouble/dd-max: ddouble.show-hex "0x1.FFFFFFFFFFFFFp+1023 + 0x1.FFFFFFFFFFFFFp+969"
.
Show a ddoublestd/num/ddouble/ddouble: V as the sum of float64std/core/types/float64: Vs with an optional precision.
Note: use show-hex for reliable round-trip parsing.
The sine function of a given angle in radians.
Calculate sine and cosine on an angle in radians.
The hyperbolic sine of x.
Multiply x with itself.
The square root of a non-negative ddoublestd/num/ddouble/ddouble: V x.
For negative x, dd-nanstd/num/ddouble/dd-nan: ddouble is returned.
The tangent of a given angle in radians.
The hyperbolic tangent of x.
Ten (10.ddouble).
Round towards zero.
Return x with the sign of y.
Zero constant.
Maximal precision in decimal digits of a ddoublestd/num/ddouble/ddouble: V.
128-bit double-double floating point numbers.
The
ddoublestd/num/ddouble/ddouble: Vtype implements double double 128-bit floating point numbers as a pair of IEEEfloat64std/core/types/float64: Vvalues. This extends the precision to 31 decimal digits (versus 15 forfloat64std/core/types/float64: V), but keeps the same range as afloat64std/core/types/float64: Vwith a maximum value of about 1.8·10308 (dd-maxstd/num/ddouble/dd-max: ddouble). Becausefloat64std/core/types/float64: Vs usually have hardware support, addoublestd/num/ddouble/ddouble: Vis usually much faster than arbitrary precision floating point numbers.Precision
If you add two regular 64-bit
float64std/core/types/float64: Vvalues you can quickly notice the imprecision due to decimal fractions that cannot be represented precisely. For example:This happens because the decimal
0.1and0.2cannot be represented exactly by afloat64std/core/types/float64: Vwhich is encoded in base 2. For example, if we show0.1to 20 digits, we get:When we convert the constant
0.1to addoublestd/num/ddouble/ddouble: Vwe can see with even more precision how it is approximated:However, if we convert the number
0.1directly to addoublestd/num/ddouble/ddouble: V(instead of going through afloat64std/core/types/float64: Vconstant), we can represent0.1more precisely:This is possible because a
ddoublestd/num/ddouble/ddouble: Vuses twofloat64std/core/types/float64: Vs to represent numbers; in this case the first (hi) float64 is (the inexact)0.1while the second one (lo) is the correction to that number. Theshow-sumstd/num/ddouble/show-sum: (x : ddouble, prec : ? int) -> string(orshow-hex) function shows this internal pair of numbers:Generally, a
ddoublestd/num/ddouble/ddouble: Vd is represented by a pair offloat64std/core/types/float64: Vs, hi and lo, such that d equals hi+lo, where |lo| ≤ 0.5·ulp(hi). Note that despite the extra precision, underflow/overflow and machine precision are not as well-defined as with a regularfloat64std/core/types/float64: V[1, 6, 8]. Nevertheless, using addoublestd/num/ddouble/ddouble: Voften prevents instability in practice overfloat64std/core/types/float64: Vcomputations due to rounding errors or combining very large and small numbers.Take the “thin triangle” problem for example [2, 7]. The challenge is to compute the area of a very thin triangle with sides , , and . The short sides and are just 3 units in the last place (ulp) shorter than the longest side . Using the value , the area of a triangle is . This is troublesome if is close to since magnifies any rounding error. Let's take , and ; according to [3], “If the units are in light-years, then the short sides are only longer than half the long side by th the diameter of a proton. Yet that pops the triangle up to about the width of a doorway at the thickest point”. Using various 128-bit numbers we get:
For this kind of example, a
ddoublestd/num/ddouble/ddouble: Vhas better precision than a regular 128-bit IEEE float since it can combine very large and small values. Note that Kahan [7] shows how to rewrite the area equation to avoid magnifying rounding errors – in that case the result for IEEE 128-bit floats becomes:The implementation is based closely on the excellent QD C++ library [4, 5], and assumes proper 64-bit IEEE
float64std/core/types/float64: Vs with correct rounding. Integers can be represented precisely up to 30 decimal digits (and a bit more… up to 2106 - 2).References
.