Abstract Binary Block Order Rouen Transform

Jacqueline W. Daykin¹,², Richard Groult³,⁴, Yannick Guesnet⁴, Thierry Lecroq⁴,
Arnaud Lefebvre⁴, Martine Léonard⁴, Élise Prieur-Gaston⁴
¹ Department of Computer Science, Aberystwyth University (Mauritius Branch Campus), Quartier
Militaire, Mauritius
² Department of Computer Science, Royal Holloway, University of London, UK
³ Modélisation, Information et Systèmes (MIS), Université de Picardie Jules Verne, Amiens, France
⁴ Normandie Univ., UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, 76000 Rouen, France
Abstract
We introduce bijective Burrows-Wheeler type transforms for binary strings [1].
These twin transforms originated in problem solving sessions of the Rouen 2012
StringMasters workshop, hence the name Rouen Transform. The original method
by Burrows and Wheeler [2] is based on lexicographic order for general alphabets,
and the transform is defined to be the last column of the ordered BWT matrix.
The new approach applies binary block order, B-order, which yields not one, but
twin transforms: one based on Lyndon words, the other on a repetition of Lyndon
words. These binary B-BWT transforms are constructed here for B-words, structures
analogous to Lyndon words. A key computation in the transforms is the application
of a linear-time suffix-sorting technique, such as [3], to sort the cyclic rotations of
a binary input string into their B-order. Moreover, as for the original lexicographic
transform, we show that the B-BWT inverses can also be computed in linear
time using straightforward combinatorial arguments.
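For orientation, the classical lexicographic transform of [2] and its inverse can be sketched as follows. This is a minimal illustration only, not the B-order construction of [1]: the rotations are sorted lexicographically with a naive comparison sort standing in for a linear-time suffix-sorting technique such as [3], and the function names `bwt` and `inverse_bwt` are hypothetical. The sketch assumes a non-empty input and returns the row index of the original string alongside the last column so that the transform can be inverted.

```python
def bwt(s):
    """Classical BWT of a non-empty string s: last column of the sorted
    matrix of cyclic rotations, plus the row index of s itself.
    Naive sorting is used here for clarity; it is not linear time."""
    n = len(s)
    # Sort rotation start positions by the rotation each one denotes.
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1].
    last = "".join(s[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last, row):
    """Invert the classical BWT with the last-to-first mapping.
    Linear in len(last) for a fixed alphabet such as {0, 1}."""
    n = len(last)
    # Count characters, then compute where each character's block
    # starts in the (sorted) first column.
    counts = {}
    for c in last:
        counts[c] = counts.get(c, 0) + 1
    start, total = {}, 0
    for c in sorted(counts):
        start[c] = total
        total += counts[c]
    # lf[i]: row whose rotation begins with the occurrence of last[i].
    lf, seen = [0] * n, {}
    for i, c in enumerate(last):
        lf[i] = start[c] + seen.get(c, 0)
        seen[c] = seen.get(c, 0) + 1
    # Starting from the row of the original string, read it backwards.
    out, i = [], row
    for _ in range(n):
        out.append(last[i])
        i = lf[i]
    return "".join(reversed(out))


if __name__ == "__main__":
    last, row = bwt("0010110")
    assert inverse_bwt(last, row) == "0010110"
```

The inverse never rebuilds the matrix: one counting pass and one walk over the last-to-first mapping suffice, which is the kind of combinatorial argument that gives linear-time inversion in the lexicographic case.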
Preliminary experimental results suggest that it may be worthwhile in practice
to use the Rouen Transform as a preprocessing step for compression.
An obvious direction for future research is to devise a fully bijective linear transform
for binary block order over arbitrary inputs. If the given string is not a B-word, then
it should be factored into these patterned words.
It also remains to be seen whether pattern matching can be performed efficiently using
this kind of transform, as is the case with the usual Burrows-Wheeler transform.
References
[1] J. W. Daykin, R. Groult, Y. Guesnet, T. Lecroq, A. Lefebvre, M. Léonard, and
É. Prieur-Gaston. Binary block order Rouen transform. Theoret. Comput. Sci.,
2016. DOI: 10.1016/j.tcs.2016.05.028.
[2] M. Burrows and D. J. Wheeler. A block sorting lossless data compression
algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
[3] P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In
Proc. 14th Annual Symposium on Combinatorial Pattern Matching (CPM), pages
200–210, 2003.