SeqAn3 – a modern C++ library for efficient sequence analysis.

Location: 
HS07
Organizers: 

Rene Rahn⁽¹⁾, Hannes Hauswedell⁽¹⁾, Svenja Mehringer⁽¹⁾ and Knut Reinert⁽¹⁾
⁽¹⁾ Freie Universität Berlin, Germany

Description of the Tutorial:

Technological advances in sequencing and computer science have made it possible to generate enormous volumes of data in ever shorter time, which demands highly efficient algorithms and supporting data structures for their analysis. However, implementing and maintaining such algorithms and data structures efficiently and robustly is difficult, and can therefore become a critical bottleneck for the cost-effectiveness of many research projects.

To resolve this bottleneck, we developed SeqAn, a general-purpose, generic C++ software library for sequence analysis. SeqAn contains a wide range of accelerated and efficient algorithms, the most important data structures, and efficient routines for file I/O in various formats. We recently started a major redesign of the library under the name SeqAn3, which builds on modern features and improvements of C++17 and C++20. The goal of this endeavour is to simplify the programming interface and thereby provide much easier access to complex and system-dependent algorithms and data structures.
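To give a flavour of what "generic" means here: an algorithm is written once as a template and then works for any suitable container type. The sketch below is plain standard C++ and deliberately does not use SeqAn3; `complement` and `reverse_complement` are hypothetical names chosen for illustration, not SeqAn3's API. SeqAn3 itself also abstracts over the alphabet (DNA, RNA, protein, ...) instead of using plain `char` as this sketch does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not SeqAn3): complement of a single DNA character.
// Any character outside A, C, G, T (e.g. 'N') is mapped to 'N'.
inline char complement(char c)
{
    switch (c)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Generic reverse complement, written once for any container of char that
// provides reverse iterators and can be built from an iterator range,
// e.g. std::string, std::vector<char> or std::deque<char>.
template <typename Sequence>
Sequence reverse_complement(Sequence const & seq)
{
    Sequence result(seq.rbegin(), seq.rend());  // reversed copy of the input
    std::transform(result.begin(), result.end(), result.begin(), complement);
    return result;
}
```

For example, `reverse_complement(std::string("ACGTN"))` yields `"NACGT"`, and the same template accepts a `std::vector<char>` without any change to the algorithm.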

In this de.NBI/ELIXIR hands-on tutorial, we will demonstrate the superiority of SeqAn3 over other “bio” packages and programming languages, and convince you of both the simplicity of our new API and its gains in performance. As a showcase, we will implement a read mapper using SeqAn3 and show how our software simplifies application development.
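The tutorial's read mapper is built with SeqAn3. As a rough, library-independent illustration of the seed-and-verify idea that many read mappers follow, here is a plain standard C++ sketch (not SeqAn3 code). It assumes exact k-mer seeds stored in a hash map and verifies candidates by counting mismatches only; `KmerIndex`, `build_kmer_index` and `map_read` are hypothetical names. Production mappers typically use more compact indices (e.g. FM indices), several seeds per read, and alignment that also allows insertions and deletions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical k-mer index: maps every length-k substring of the reference
// to the list of positions where it starts (positions stored in increasing order).
using KmerIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

KmerIndex build_kmer_index(std::string const & reference, std::size_t k)
{
    KmerIndex index;
    if (k == 0 || reference.size() < k)
        return index;
    for (std::size_t i = 0; i + k <= reference.size(); ++i)
        index[reference.substr(i, k)].push_back(i);
    return index;
}

// Seed-and-verify: use the read's first k-mer as an exact seed, then check each
// candidate position by counting mismatches over the whole read (Hamming
// distance, no indels). Returns start positions with at most max_errors mismatches.
std::vector<std::size_t> map_read(std::string const & reference,
                                  KmerIndex const & index,
                                  std::string const & read,
                                  std::size_t k,
                                  std::size_t max_errors)
{
    std::vector<std::size_t> hits;
    if (k == 0 || read.size() < k)
        return hits;
    auto it = index.find(read.substr(0, k));
    if (it == index.end())
        return hits;                        // seed not in reference: no candidates
    for (std::size_t pos : it->second)
    {
        if (pos + read.size() > reference.size())
            continue;                       // read would run past the reference end
        std::size_t errors = 0;
        // Stop counting early once the error budget is exceeded.
        for (std::size_t j = 0; j < read.size() && errors <= max_errors; ++j)
            if (reference[pos + j] != read[j])
                ++errors;
        if (errors <= max_errors)
            hits.push_back(pos);
    }
    return hits;
}
```

With the reference `"ACGTACGTTT"` and `k = 4`, the read `"ACGTA"` is found at position 0 when no mismatches are allowed, and at positions 0 and 4 when one mismatch is allowed.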

Software/Data Requirements: 

This tutorial is mostly suited for computational biologists and bioinformaticians whose research focuses on sequence analysis (e.g., genomics, metagenomics, assembly, read alignment, variant detection). Attendees should have intermediate programming knowledge; basic C++ knowledge is strongly recommended. Attendees must bring their own laptop. The software for the tutorial can be installed beforehand, but we will also dedicate some time to installing it during the tutorial. The following systems and software are required:

• macOS, Linux or BSD, each with g++-7 or higher
• Git
• CMake 3.0 or higher
• [optional] VirtualBox (We will provide a fully integrated Ubuntu VM with all necessary software preinstalled)

References: 

Rahn, R., Budach, S., Costanza, P., Ehrhardt, M., Hancox, J. and Reinert, K., 2018. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading. Bioinformatics, bty380.

Reinert, K., Dadi, T.H., Ehrhardt, M., Hauswedell, H., Mehringer, S., Rahn, R., Kim, J., Pockrandt, C., Winkler, J., Siragusa, E. and Urgese, G., 2017. The SeqAn C++ template library for efficient sequence analysis: a resource for programmers. Journal of Biotechnology, 261, pp.157-168.

Rahn, R., Weese, D. and Reinert, K., 2014. Journaled string tree—a scalable data structure for analyzing thousands of similar genomes on your laptop. Bioinformatics, 30(24), pp.3499-3505.