High-precision arithmetic for calculating N-gram probabilities

wren ng thornton wren at freegeek.org
Sun Oct 16 01:40:42 BST 2011

On 10/13/11 2:40 PM, Dan Maftei wrote:
> Yeah, log sums are actually a necessity when calculating perplexity. :)
> If I ever get around to profiling Rationals vs. Doubles I'll let people know
> what I found. But upon reflection, I see that they shouldn't make a
> significant difference in performance.

The big performance problem with Rationals is that they're stored in
normal form (lowest terms). That means the implementation has to run gcd
on nearly every arithmetic operation. If you're really interested in
performance, then you'd want to try working with unnormalized ratios
rather than using the Rational type. Of course, then you have to be
careful about not letting the numerators and denominators get too large;
otherwise the cost of arbitrary-precision Integer arithmetic will be the
new performance sinkhole.
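
To make the trade-off concrete, here is a minimal sketch of what an
unnormalized ratio might look like. The names URatio, mulU, addU,
normalize, toStdRational and productU are all made up for illustration
and don't come from any library. The idea is to skip gcd on each
operation and reduce only occasionally (every k multiplications here),
so the Integers stay bounded.

```haskell
module Main where

import Data.Ratio ((%))

-- | A fraction that is NOT kept in lowest terms. Hypothetical type for
-- illustration only; the denominator is assumed to be positive.
data URatio = URatio !Integer !Integer  -- numerator, denominator
  deriving Show

-- | Multiply without reducing: no gcd is computed.
mulU :: URatio -> URatio -> URatio
mulU (URatio a b) (URatio c d) = URatio (a * c) (b * d)

-- | Add without reducing: cross-multiply, no gcd.
addU :: URatio -> URatio -> URatio
addU (URatio a b) (URatio c d) = URatio (a * d + c * b) (b * d)

-- | Reduce to lowest terms explicitly, paying for one gcd.
normalize :: URatio -> URatio
normalize (URatio a b) = URatio (a `div` g) (b `div` g)
  where g = gcd a b

-- | Convert to the standard Rational; (%) normalizes on construction.
toStdRational :: URatio -> Rational
toStdRational (URatio a b) = a % b

-- | Product of a list of ratios, reducing only after every k
-- multiplications (and once at the end) to keep the Integers small.
productU :: Int -> [URatio] -> URatio
productU k = normalize . go 0 (URatio 1 1)
  where
    go :: Int -> URatio -> [URatio] -> URatio
    go _ acc []     = acc
    go i acc (x:xs) =
      let acc' = mulU acc x
      in acc' `seq`
           if i + 1 >= k
             then go 0 (normalize acc') xs
             else go (i + 1) acc' xs
```

For example, productU 2 [URatio 2 4, URatio 3 6, URatio 1 3] reduces
once after the second multiplication and once at the end. Whether this
actually beats Rational, and what value of k works best, depends on the
data, so it's worth profiling rather than assuming.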

Live well,
