[−][src]Module compress::bwt
BWT (Burrows-Wheeler Transform) forward and backward transformation. Requires bwt
feature, enabled by default
This module contains a bruteforce implementation of BWT encoding in Rust as well as standard decoding.
These are exposed as a standard Reader
and Writer
interfaces wrapping an underlying stream.
BWT output stream places together symbols with similar leading contexts. This reshaping of the entropy allows further stages to deal with repeated sequences of symbols for better compression.
Typical compression schemes are: BWT + RLE (+ EC) RLE + BWT + MTF + RLE + EC : bzip2 BWT + DC + EC : ybs
Where the stage families are: BWT: BWT (Burrows-Wheeler Transform), ST (Shindler transform) RLE: RLE (Run-Length Encoding) MTF: MTF (Move-To-Front), WFC (Weighted Frequency Coding) DC: DC (Distance Coding), IF (Inverse Frequencies) EC (Entropy Coder): Huffman, Arithmetic, RC (Range Coder)
Example
use std::io::{BufWriter, BufReader, Read, Write}; use compress::bwt; // Encode some text let text = "some text"; let mut e = bwt::Encoder::new(BufWriter::new(Vec::new()), 4 << 20); e.write(text.as_bytes()).unwrap(); let (encoded, _) = e.finish(); let inner = encoded.into_inner().unwrap(); // Decode the encoded text let mut d = bwt::Decoder::new(BufReader::new(&inner[..]), true); let mut decoded = Vec::new(); d.read_to_end(&mut decoded).unwrap(); assert_eq!(&decoded[..], text.as_bytes());
Credit
This is an original (mostly trivial) implementation.
Modules
dc | DC (Distance Coding) forward and backward transformation. Designed to be used on BWT block output for compression. |
mtf | MTF (Move To Front) encoder/decoder Produces a rank for each input character based on when it was seen last time. Useful for BWT output encoding, which produces a lot of zeroes and low ranks. |
Structs
Decoder | This structure is used to decode a stream of BWT blocks. This wraps an internal reader which is read from when this decoder's read method is called. |
Encoder | This structure is used to compress a stream of bytes using the BWT. This is a wrapper around an internal writer which bytes will be written to. |
InverseIterator | An iterator over inverse BWT Run time: O(N), memory: N words (table) |
Radix | Radix sorting primitive |
TransformIterator | An iterator over BWT output |
Constants
ALPHABET_SIZE |
Functions
compute_inversion_table | Compute an inversion jump table, needed for BWT decoding |
compute_suffixes | Compute a suffix array from a given input string Resulting suffixes are guaranteed to be alphabetically sorted Run time: O(N^3), memory: N words (suf_array) + ALPHABET_SIZE words (Radix) |
decode | Decode a BWT block, given it's origin, and using 'table' temporarily |
decode_simple | A simplified BWT decode function, which allocates a temporary suffix array |
encode | Encode BWT of a given input, using the 'suf_array' |
encode_simple | Transform an input block into the output slice, all-inclusive version. Returns the index of the original string in the output matrix. |
Type Definitions
Symbol | A base element for the transformation |