With the advent of 2nd generation sequencing technologies, exhaustive multi-genome comparisons of hundreds or thousands of human genomes will soon become a reality. This will have profound benefits for personalised disease diagnostics and therapeutics.
However, the informatics challenges presented by these technologies are already creating bottlenecks in sequence mapping, assembly and analysis. Because the reads generated by these low-cost, ultra-high-throughput sequencing solutions are typically only 25 bp to 200 bp long, with error rates as high as 5%, genome assembly and analysis become considerably more complex.
Synamatix has addressed many of these issues by using SynaBASE™, which is a scalable, high-throughput database solution that leverages sequence complexity and exhaustive word-based searching to yield optimum results. SynaBASE exhaustively identifies all k-mers within biological sequences, storing the data as k-mers structured on the basis of their inter-relationships. By using a search application built on top of SynaBASE, 1.68 million 120mer reads were mapped back to the human genome in 5 hours.
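The idea of exhaustively indexing every k-mer and then using that index to map reads can be illustrated with a minimal sketch. This is not SynaBASE's data structure, whose internals are not described here; it assumes a plain in-memory hash table from each k-mer to its start positions, and the function names `build_kmer_index` and `map_read_exact` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-mer in `sequence` to the list of its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def map_read_exact(read, index, k, reference):
    """Return reference positions where `read` matches exactly.

    The first k-mer of the read is used as a seed; each seed hit is then
    verified against the full read. (Assumption: exact matching only.)
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Toy reference and read, for illustration only.
reference = "ACGTACGTTAGCACGTAC"
kmer_index = build_kmer_index(reference, k=4)
print(kmer_index["ACGT"])
print(map_read_exact("ACGTTAGC", kmer_index, 4, reference))
```

A flat hash table like this grows with the number of distinct k-mers, so it would not scale to a human genome as written; it only shows why a precomputed word index turns read mapping into a lookup followed by verification.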
In a similar experiment with 25mer reads, a non-heuristic search strategy employing a scoring matrix for sequence quality was used. Reads were mapped back to a SynaBASE of the human genome at an average rate of over 1,000 reads/sec. The sensitivity and performance improvements of this approach, several orders of magnitude over conventional tools such as MegaBLAST, validate the potential technology fit between 2nd generation sequencers and SynaBASE.
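One way a quality-aware, non-heuristic search could work is sketched below. The actual scoring matrix used in the experiment is not given, so this sketch assumes Phred-scaled base qualities and a simple rule: each position contributes plus or minus the probability that the base call is correct, so a mismatch at a low-quality base costs less than one at a high-quality base. Every reference offset is scored, which is what makes the scan exhaustive. The function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def quality_weighted_score(read, quals, ref_window):
    """Score a read against an equal-length reference window.

    Assumed scheme: a match adds, and a mismatch subtracts, the probability
    that the base call is correct.
    """
    score = 0.0
    for base, q, ref_base in zip(read, quals, ref_window):
        confidence = 1.0 - phred_to_error_prob(q)
        score += confidence if base == ref_base else -confidence
    return score


def best_alignment(read, quals, reference):
    """Exhaustively score every ungapped offset and return (position, score)."""
    best_pos, best_score = -1, float("-inf")
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        score = quality_weighted_score(read, quals, window)
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


# Toy data: the read differs from the reference at one low-quality base.
reference = "TTGACCGTAAGG"
read = "ACCGAA"
quals = [40, 40, 40, 40, 2, 40]
print(best_alignment(read, quals, reference))
```

Scanning every offset is quadratic in practice and is shown only to make the scoring explicit; a word index over the reference would be used to enumerate candidate positions instead.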