log package:logfloat

Since the normal log throws an error on zero, we have to redefine it in order for things to work right. Arguing from limits we can see that log 0 == negativeInfinity. Newer versions of GHC have this behavior already, but older versions and Hugs do not. This function will raise an error when taking the log of negative numbers, rather than returning notANumber as the newer GHC implementation does. The reason being that typically this is a logical error, and notANumber allows the error to propagate silently. In order to improve portability, the Transfinite class is required to indicate that the Floating type does in fact have a representation for negative infinity. Both native floating types (Double and Float) are supported. If you define your own instance of Transfinite, verify the above equation holds for your 0 and negativeInfinity. If it doesn't, then you should avoid importing our log and will probably want converters to handle the discrepancy. For GHC, this version of log has rules for fusion with exp. These can give different behavior by preventing overflow to infinity and preventing errors for taking the logarithm of negative values. For Double and Float they can also give different answers due to eliminating floating point fuzz. The rules strictly improve mathematical accuracy, however they should be noted in case your code depends on the implementation details.
Log-domain floating point numbers This module presents a type for storing numbers in the log-domain. The main reason for doing this is to prevent underflow when multiplying many probabilities as is done in Hidden Markov Models. It is also helpful for preventing overflow.
Compute log (1 + x) without losing precision. Standard C libraries provide a special definition for this function, which is more accurate than doing the naive thing, especially for very small arguments. For example, the naive version underflows around 2 ** -53, whereas the specialized version underflows around 2 ** -1074. N.B. The statistics:Statistics.Math module provides a pure Haskell implementation of log1p for those who are interested. We do not copy it here because it relies on the vector package which is non-portable. If there is sufficient interest, a portable variant of that implementation could be made. Contact the maintainer if the FFI and naive implementations are insufficient for your needs. This installation was compiled to use the FFI version.
Constructor which does semantic conversion from normal-domain to log-domain. Throws errors on negative and NaN inputs. If p is non-negative, then following equivalence holds:
logFloat p == logToLogFloat (log p)
If p is NaN or negative, then the two sides differ only in which error is thrown.
Return the log-domain value itself without conversion.
Constructor which assumes the argument is already in the log-domain. Throws errors on notANumber inputs.
Compute log (1 - exp x) without losing precision.
Compute log (1 + exp x) without losing precision. Algebraically this is 0 ⊔ x, which is the log-domain's analogue of 1 + x.
O(n). Log-domain softmax, aka: (fmap log . softmax). N.B., this requires three passes over the data: two for the logSumExp, and a third for the normalization itself. Thus, it is not amenable to list fusion, and hence will use a lot of memory when summing long lists.
O(n). Log-domain summation, aka: (log . sum . fmap exp). Algebraically this is ⨆ xs, which is the log-domain equivalent of ∑ xs. N.B., this function requires two passes over the input. Thus, it is not amenable to list fusion, and hence will use a lot of memory when summing long lists.
The quantile function; aka, the inverse of sigmoid. > logit x = log (x / (1 - x)) > logit x = 2 * atanh (2*x - 1)
A variant of logit for when the argument is already in the log-domain; hence, logitExp = logit . exp
This module presents a type for storing numbers in the log-domain. The main reason for doing this is to prevent underflow when multiplying many small probabilities as is done in Hidden Markov Models and other statistical models often used for natural language processing. The log-domain also helps prevent overflow when multiplying many large numbers. In rare cases it can speed up numerical computation (since addition is faster than multiplication, though logarithms are exceptionally slow), but the primary goal is to improve accuracy of results. A secondary goal has been to maximize efficiency since these computations are frequently done within a O(n^3) loop. The LogFloat of this module is restricted to non-negative numbers for efficiency's sake, see Data.Number.LogFloat.Signed for doing signed log-domain calculations.
A LogFloat is just a Double with a special interpretation. The logFloat function is presented instead of the constructor, in order to ensure semantic conversion. At present the Show instance will convert back to the normal-domain, and hence will underflow at that point. This behavior may change in the future. At present, the Read instance parses things in the normal-domain and then converts them to the log-domain. Again, this behavior may change in the future. Because logFloat performs the semantic conversion, we can use operators which say what we mean rather than saying what we're actually doing to the underlying representation. That is, equivalences like the following are true[1] thanks to type-class overloading:
logFloat (p + q) == logFloat p + logFloat q
logFloat (p * q) == logFloat p * logFloat q
(Do note, however, that subtraction can and negation will throw errors: since LogFloat can only represent the positive half of Double. Num is the wrong abstraction to put at the bottom of the numeric type-class hierarchy; but alas, we're stuck with it.) Performing operations in the log-domain is cheap, prevents underflow, and is otherwise very nice for dealing with miniscule probabilities. However, crossing into and out of the log-domain is expensive and should be avoided as much as possible. In particular, if you're doing a series of multiplications as in lp * logFloat q * logFloat r it's faster to do lp * logFloat (q * r) if you're reasonably sure the normal-domain multiplication won't underflow; because that way you enter the log-domain only once, instead of twice. Also note that, for precision, if you're doing more than a few multiplications in the log-domain, you should use product rather than using (*) repeatedly. Even more particularly, you should avoid addition whenever possible. Addition is provided because sometimes we need it, and the proper implementation is not immediately apparent. However, between two LogFloats addition requires crossing the exp/log boundary twice; with a LogFloat and a Double it's three times, since the regular number needs to enter the log-domain first. This makes addition incredibly slow. Again, if you can parenthesize to do normal-domain operations first, do it!
  • 1 That is, true up-to underflow and floating point fuzziness. Which is, of course, the whole point of this module.
Semantically convert our log-domain value back into the normal-domain. Beware of overflow/underflow. The following equivalence holds (without qualification):
fromLogFloat == exp . logFromLogFloat