This repository has been archived by the owner on Nov 25, 2019. It is now read-only.
(Originally reported by itkach on Feb 11, 2009 at BitBucket)
LZMA promises a better compression ratio than gzip and bzip2, and faster decompression than bzip2. Using LZMA in the aard format to compress articles may result in smaller .aar files and better word lookup performance.
(Commented by itkach on Feb 17, 2009 at BitBucket)
Initial evaluation didn't indicate any substantial improvement from using
LZMA compression. Compiled with LZMA, the Simple English wiki 20081126 dump is
55 MB instead of 56 MB, and the first volume of English Wikipedia is 2337 MB
instead of 2384 MB; in both cases the size is reduced by only ~2%. This is
with pylzma 0.3. Decompression is also only marginally faster than bz2: ~5% on
medium-size articles (~15 KB).
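A comparison of this kind can be sketched roughly as follows. This is only an illustration, not the script used for the numbers above: it relies on the `lzma` module that entered the Python standard library in 3.3 rather than pylzma 0.3, and the helper names and the synthetic sample articles are hypothetical. Each article is compressed on its own, as in the aard format, and the per-codec totals are summed.

```python
import bz2
import lzma
import zlib


def compressed_sizes(article: bytes) -> dict:
    """Return the size in bytes of one article, raw and under each codec."""
    return {
        "raw": len(article),
        "zlib": len(zlib.compress(article, 9)),
        "bz2": len(bz2.compress(article, 9)),
        # Standard-library lzma (xz container); pylzma 0.3 may differ.
        "lzma": len(lzma.compress(article, preset=9)),
    }


def total_sizes(articles) -> dict:
    """Sum per-article sizes, since every article is compressed individually."""
    totals = {}
    for article in articles:
        for name, size in compressed_sizes(article).items():
            totals[name] = totals.get(name, 0) + size
    return totals


# Hypothetical stand-in for real dump articles of roughly 13 KB each.
sample = [
    ("Article %d. " % i + "The quick brown fox jumps over the lazy dog. " * 300).encode("utf-8")
    for i in range(5)
]
totals = total_sizes(sample)
for name, size in sorted(totals.items()):
    print(name, size)
```

Real articles are far less repetitive than this sample, so the ratios it prints say nothing about the ~2% figure; only the measurement procedure carries over.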
(Commented by anonymous on Mar 12, 2009 at BitBucket)
Aha, I'm disappointed. Are you using the default compression or -9, i.e.
maximum compression? And do you also compress using Python, or is Python
used just to decompress in the reader?
Also, I have found another Python implementation, pyliblzma, which seems to
also support the new xz format.
(Commented by itkach on Mar 16, 2009 at BitBucket)
pylzma was used both for compression and decompression, with default
compression parameters. I tried some variations, but the defaults seemed to
yield the best results.
I'll see if pyliblzma can do better. I wouldn't hold my breath though: each
article is compressed individually, so neither bzip2 nor lzma demonstrates the
same data compression ratios as with gigantic files. In fact, a significant
number of articles are just too short to benefit from any compression:
compressed text plus compression format headers is bigger than the original
uncompressed text. LZMA compression not being part of the Python standard
library is also a significant obstacle: adopting it would mean compiling and
packaging it for Windows and Maemo and possibly other platforms where it's not
easy for users to get or build binaries.
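The short-article effect is easy to reproduce. The sketch below is a hypothetical illustration, not necessarily how the aard tools store articles: it compresses a text with each codec and keeps whichever representation is smallest, falling back to the raw bytes when the format headers outweigh the savings. It uses the standard-library `lzma` module (Python 3.3+), and `store_article` is a made-up name.

```python
import bz2
import lzma


def store_article(text: bytes):
    """Pick the smallest representation of one article.

    Returns a (codec_name, payload) pair. "raw" is listed first, so it wins
    ties and is chosen whenever compression does not actually save space.
    """
    candidates = {
        "raw": text,
        "bz2": bz2.compress(text),
        "lzma": lzma.compress(text),
    }
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]


short_codec, short_payload = store_article(b"A short stub.")
long_codec, long_payload = store_article(b"lorem ipsum dolor sit amet " * 1000)
print(short_codec, len(short_payload))
print(long_codec, len(long_payload))
```

A reader using such a scheme would need the chosen codec recorded alongside each article so it knows how to decode the payload.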