Better way to search DNA databases

A Chinese scientist has devised a new empirical rule for indexing DNA sequences. N-gram indexing is a well-established technique in computer science, and Chinese text, for example, is commonly indexed using n-grams. However, Wang Liang, a computer scientist at one of China's search engine companies, found that the ideal n-gram length for indexing DNA "words" is 12 nucleotides (combinations of the characters A, T, C and G).
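The article does not describe how the index is built, so the following is only a minimal sketch of generic overlapping n-gram indexing applied to DNA, assuming an inverted index that maps each 12-character substring to the places it occurs. The function names (`ngrams`, `build_index`, `search`) are hypothetical, the default length of 12 is taken from the article, and the final exact-substring check is an illustrative choice to filter out false positives.

```python
from collections import defaultdict

N = 12  # n-gram length reported in the article; kept as a tunable parameter here


def ngrams(seq: str, n: int = N):
    """Yield (offset, n-gram) for every overlapping window of length n."""
    for i in range(len(seq) - n + 1):
        yield i, seq[i:i + n]


def build_index(sequences: dict[str, str], n: int = N) -> dict[str, set[tuple[str, int]]]:
    """Map each n-gram to the set of (sequence id, offset) pairs where it occurs."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for offset, gram in ngrams(seq.upper(), n):
            index[gram].add((seq_id, offset))
    return index


def search(index, sequences: dict[str, str], query: str, n: int = N) -> set[str]:
    """Return ids of sequences that contain `query` (which must be at least n long)."""
    query = query.upper()
    if len(query) < n:
        raise ValueError(f"query must be at least {n} nucleotides long")
    candidates = None
    # Keep only sequences that contain every n-gram of the query.
    for _, gram in ngrams(query, n):
        hits = {seq_id for seq_id, _ in index.get(gram, ())}
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            return set()
    # Sharing all n-grams does not guarantee a contiguous match, so verify exactly.
    return {sid for sid in candidates if query in sequences[sid].upper()}


seqs = {
    "s1": "ATCGATCGATCGGGCCTTAA",
    "s2": "TTTTATCGATCGATCGAAAA",
    "s3": "GGGGGGGGGGGGGGGG",
}
idx = build_index(seqs)
print(sorted(search(idx, seqs, "ATCGATCGATCG")))  # ['s1', 's2']
```

A longer n makes each index entry more selective (fewer random matches across four letters) but requires longer queries; the article's claim is that 12 is the empirical sweet spot for DNA.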

