High-precision arithmetic for calculating N-gram probabilities
wren ng thornton
wren at freegeek.org
Mon Oct 17 20:55:44 BST 2011
On 10/16/11 8:26 AM, Dan Maftei wrote:
> Further, do we often sum probabilities in NLP?
Depends who "we" are I suppose. To give an example, consider the
approach of using HMMs for tagging.
If we're only ever interested in the most likely tag sequence (or the
probability thereof) then we'll use the Viterbi algorithm which is using
the tropical semiring ---i.e., the (max,+) semiring for log-probs; or
the exponential of the tropical semiring, (max,*), for normal probabilities.
However, if we're interested in the probability of all possible tag
sequences (which is useful for getting the n-best sequences) then we'll
use the forward--backward algorithm which is using the (+,*) semiring
for normal probabilities and (logSum,+) for log-probs.
This is pretty common in all domains of statistical NLP: some problems
use (max,*)/(max,+) whereas other problems use (+,*)/(logSum,+). It just
depends on whether you want only the best widget or whether you want
information about all the widgets. So, how often addition is used will
depend on the specific problems you choose to tackle and your approach
to tackling them.
> 2 I'm unclear what log (_ + 1) and (exp _) - 1 refer to,
I mean the functions (\x -> log (1+x)) and (\x -> exp x - 1). They're
inaccurate for very small x because of details about how Float/Double
are represented. The details aren't too important, but just think about
what happens when we add a really small number to 1. There are many
values where 1+x == 1 because we can't represent 1+x with the available
precision and 1 is the closest thing we can represent.
Suffice it to say that (\x -> log (1+x)) will underflow around x=2**-53
for Double, whereas we could implement a denotationally equivalent
function (\x -> log1p x) which only underflows around x=2**-1074 for
Double. The same idea goes for the naive (\x -> exp x - 1) vs a
denotationally equivalent implementation (\x -> expm1 x).
N.B., (log1p . expm1) == id, and (expm1 . log1p) == id
> Same goes for what a and b represent in logSum (== a + log(1+ exp(b-a)).
logSum :: Double -> Double -> Double
logSum a b = a + log (1 + exp (b-a))
that's the basic idea anyways. As I said, we actually want to check
whether a or b is bigger; also we'll want to check to make sure it does
the right thing when a or b is infinite; blah blah blah.
> If I want to multiply
> probabilities, I use exp $ foldl' (+) 0 (map log probs), which is equivalent
> to foldl' (*) 1 probs. Is there something inaccurate or incorrect about
Well, when you do the exp at the end you'll underflow just the same as
if you did (foldl' (*) 1 probs). In general, once you put things into
the log-domain you never want to take them back out, because you can
lose information (underflow) when taking them out.
But the real question is: how do I translate (foldl' (+) 0 probs) into
the log domain? Getting (log 0) isn't too hard, though there are
portability issues to beware of since some Haskell implementations will
throw an error rather than give you negative infinity back. And we also
have to figure out a function (log (+))--- i.e., figure out what
normal-domain addition becomes when we pass it through the logarithmic
homomorphism. The answer is logSum.
More generally, the logarithmic and exponential homomorphisms preserve
the ordering structure, the monoid structures, and the semiring
structures (and probably other structures too); all according to the
-- On types:
log NonnegativeReal = Real
log ZeroToOneReal = NonpositiveReal
-- On values:
log 0 = negativeInfinity
log 1 = 0
log positiveInfinity = positiveInfinity
-- On functions:
log (*) = (+)
log (+) = logSum
log max = max
log min = min
-- On types:
exp Real = NonnegativeReal
exp NonpositiveReal = ZeroToOneReal
-- On values:
exp negativeInfinity = 0
exp 0 = 1
exp positiveInfinity = positiveInfinity
-- On functions:
exp (+) = (*)
exp logSum = (+)
exp max = max
exp min = min
We can extend this homomorphism to types other than the reals. We just
need for the type to support multiple monoids or semirings in a way that
matches the above. E.g., any type with a semiring will have
logarithmic/exponential monoid-homomorphisms between the additive monoid
and the multiplicative monoid of that semiring.
More information about the NLP