Abstract Computations and Applications of Minimal Absent Words
Carl Barton¹, Alice Heliou²˒³*, Laurent Mouchard⁴, Solon P. Pissis⁵

¹ The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, UK
² Laboratoire d’Informatique de l’École Polytechnique (LIX), CNRS UMR 7161, France
³ Inria Saclay-Île de France, AMIB, Bâtiment Alan Turing, France
⁴ University of Rouen, LITIS EA 4108, TIBS, Rouen, France
⁵ Department of Informatics, King’s College London, London, UK
* Corresponding author: [email protected]

Abstract

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison.

An O(n)-time and O(n)-space algorithm, based on the construction of the suffix array, computes all minimal absent words over a fixed-size alphabet [1]. Two further algorithms have since been designed: one that can be executed in parallel [2], whose implementation achieves near-optimal speed-ups, and another that can run with as little as 1 GB of RAM (Heliou et al., in preparation). As a result, minimal absent words can be computed for the whole human genome on a desktop computer in a few hours. Here, we briefly present these algorithms and explain the different applications of minimal absent words.

References

[1] C. Barton, A. Heliou, L. Mouchard, and S. P. Pissis. Linear-time computation of minimal absent words using suffix array. BMC Bioinformatics, 15:388, 2014.
[2] C. Barton, A. Heliou, L. Mouchard, and S. P. Pissis. Parallelising the computation of minimal absent words. In 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), 2015.
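Illustrative sketch

The definition in the abstract can be made concrete with a brute-force sketch. This is not the linear-time suffix-array algorithm of [1], nor the parallel or low-memory variants; it simply enumerates all factors of y and is only practical for very short words. The function name `minimal_absent_words` and its `alphabet` parameter are hypothetical, and the sketch assumes that only words of length at least 2 are reported. It relies on the observation that a word aub (with a and b single letters) has all its proper factors in y exactly when its two longest proper factors, au and ub, occur in y.

```python
def minimal_absent_words(y, alphabet):
    """Brute-force enumeration of the minimal absent words of y.

    Illustration only: builds every factor of y explicitly, so time and
    memory grow far faster than the O(n) bounds quoted in the abstract.
    Assumption: only words of length >= 2 are reported.
    """
    n = len(y)
    # Every factor (substring) of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}

    maws = set()
    for au in factors:
        if not au:
            continue  # need a first letter a, so au must be non-empty
        u = au[1:]  # drop the first letter a
        for b in alphabet:
            # a+u occurs (it is in factors); require u+b to occur
            # and a+u+b to be absent from y.
            if u + b in factors and au + b not in factors:
                maws.add(au + b)
    return sorted(maws)


if __name__ == "__main__":
    print(minimal_absent_words("abaab", "ab"))
```

For the word abaab over the alphabet {a, b}, this reports aaa, aaba, bab and bb: each is absent from abaab, yet removing its first or last letter yields a factor that does occur.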