Nanopore sequencing is emerging as a powerful tool for storing digital information in DNA molecules. This technique offers several advantages over traditional methods, making it an attractive area of research. In this work, we focus on a simplified model of the nanopore sequencing process, represented as a channel.
This channel takes a DNA sequence as input and scans it with a sliding window of length ℓ. After each read, the window shifts by δ characters and the process repeats. The channel’s output, called the read vector, is the sequence of values produced this way, where each value is the sum of the bases (such as A, C, G, T) within the corresponding window.
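As a rough illustration of the channel described above, the following Python sketch computes a read vector from a sequence of numeric symbols. Several details are assumptions made only for this example and are not taken from the model itself: the function name `read_vector`, the numeric values assigned to the bases in `BASE_VALUE`, and the choice to read only windows that lie entirely inside the sequence.

```python
def read_vector(symbols, window, shift):
    """Sum each length-`window` window, advancing `shift` positions per step.

    Assumption: only windows that fit entirely inside `symbols` are read.
    """
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    return [
        sum(symbols[start:start + window])
        for start in range(0, len(symbols) - window + 1, shift)
    ]


# Hypothetical numeric encoding of the bases, chosen only for illustration.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}

sequence = [BASE_VALUE[base] for base in "ACGTTGCA"]
reads = read_vector(sequence, window=3, shift=2)  # here ℓ = 3 and δ = 2
print(reads)
```

With ℓ = 3 and δ = 2, consecutive windows overlap in one position, which is the regime δ < ℓ < 2δ.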
The channel’s capacity is the maximum rate at which information can be transmitted through it. Prior research has established capacity values for specific combinations of ℓ and δ. In this study, we show that when δ < ℓ < 2δ, the channel’s capacity is (1/δ) log((1/2)(ℓ + 1 + √((ℓ + 1)² − 4(ℓ − δ)(ℓ − δ + 1)))). Furthermore, we establish an upper bound on the capacity when 2δ < ℓ. Finally, to make the model more realistic, we extend it to a two-dimensional setting and present several results on its capacity. This extended model brings us closer to capturing the real intricacies of nanopore sequencing and its potential for DNA storage.
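The closed-form capacity for δ < ℓ < 2δ can be evaluated numerically. The sketch below is a minimal example under stated assumptions: the function name `capacity` is hypothetical, and the logarithm is taken to base 2 (capacity in bits), which the text does not specify.

```python
import math


def capacity(window, shift):
    """Evaluate the closed-form capacity, valid only for shift < window < 2 * shift.

    Assumption: log is base 2, so the result is in bits per input symbol.
    """
    if not (shift < window < 2 * shift):
        raise ValueError("formula applies only when shift < window < 2 * shift")
    overlap = window - shift  # ℓ − δ
    discriminant = (window + 1) ** 2 - 4 * overlap * (overlap + 1)
    return math.log2(0.5 * (window + 1 + math.sqrt(discriminant))) / shift


print(capacity(3, 2))  # ℓ = 3, δ = 2
```

The range check matters because the expression is stated only for δ < ℓ < 2δ; outside that range the text gives an upper bound rather than this formula.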