'Google for DNA' Brings Order to Biology's Big Data

"They set a new standard for analysing raw biological data, including DNA, RNA and protein sequences, from databases that can contain millions of billions of DNA letters, amounting to 'petabases' of information, more entries than all the webpages in Google's vast index. Although MetaGraph is tagged as 'Google for DNA', Chikhi likens the tool to a search engine for YouTube, because the tasks are more computationally demanding."
"The motivation behind MetaGraph was to address an accessibility problem in sequencing data sets. The size of these repositories has risen at a blistering pace in the past few decades, but this growth has presented challenges for the scientists using the data they contain. Raw sequencing reads are fragmented, noisy and too numerous to search directly. 'The volume of the data, paradoxically, is the main inhibitor of us actually using the data,' says Artem Babaian, a computational biologist at the University of Toronto in Canada."
MetaGraph is a search engine designed to index and query raw biological sequencing data, including DNA, RNA and protein sequences. The tool can handle databases containing millions of billions of DNA letters, amounting to petabases of information and more entries than all the webpages in Google's index. By searching noisy, fragmented raw reads across massive repositories, MetaGraph can retrieve genetic patterns that have not been explicitly annotated. The task is computationally demanding, but the system sets a new standard for analysing raw sequencing data. Such a tool is needed because the explosive growth of public sequencing repositories has created an accessibility problem: raw reads are too numerous and noisy to search directly.
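The article does not describe MetaGraph's internal data structures. As a rough, hypothetical illustration of how a sequence search engine can index raw reads without first assembling them, the Python sketch below maps every fixed-length substring (a k-mer) of each read to the samples it came from, then scores a query by the fraction of its k-mers found in each sample. The function names, the sample data, the k-mer length of 5 and the 0.8 threshold are all assumptions chosen for illustration, not MetaGraph's actual method; an index at petabase scale would need far more compact structures than a Python dictionary.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (a k-mer) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Hypothetical toy index: map each k-mer to the sample IDs whose reads contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for sample_id, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(sample_id)
    return index


def query(index: dict[str, set[str]], seq: str, k: int,
          min_fraction: float = 0.8) -> dict[str, float]:
    """Return samples containing at least min_fraction of the query's distinct k-mers.

    Scoring by k-mer overlap (an assumption here) tolerates noisy reads: a single
    mismatch only breaks the k-mers that overlap it, so most k-mers still match.
    """
    query_kmers = set(kmers(seq, k))
    if not query_kmers:
        return {}  # query shorter than k: nothing to look up
    hits: dict[str, int] = defaultdict(int)
    for kmer in query_kmers:
        for sample_id in index.get(kmer, ()):
            hits[sample_id] += 1
    total = len(query_kmers)
    return {s: n / total for s, n in hits.items() if n / total >= min_fraction}


# Hypothetical usage with two tiny "samples" of raw reads.
samples = {"sample_A": ["ACGTACGTGA", "TTGACCA"], "sample_B": ["GGGCCCAAAT"]}
index = build_index(samples, k=5)
print(query(index, "ACGTACGTG", k=5))  # {'sample_A': 1.0}
print(query(index, "ACGTACGTT", k=5))  # {'sample_A': 0.8}, despite the final mismatch
```

The second query differs from the read in sample_A at its last base, yet four of its five k-mers still match, which is why overlap-based scoring suits noisy, fragmented reads better than exact string matching.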
Read at www.nature.com