When designing your NGS experiment, you will need to consider several factors that depend on your experimental objectives, including the run type (single read or paired-end), read length, number of reads and required depth of coverage.
Run Type
Runs can be either single read (SR) or paired-end (PE). Single-read runs sequence only one end of each fragment, which is the simplest, quickest and most economical way to generate NGS data. Paired-end runs sequence both ends of each fragment, producing high-quality data that can be aligned more precisely. Because the two reads of a pair come from the same fragment and are separated by a roughly known distance, they provide additional positional information that is valuable for de novo genome assembly and for detecting structural rearrangements such as insertions, deletions and inversions.
Read Length
Read length is determined by the number of sequencing cycles performed during the run, with one base sequenced per cycle. The HiSeq 2500 offers flexibility in the number of cycles carried out, typically 50, 100, 125 or 150 cycles per read. Shorter reads are generally sufficient for mapping to a reference genome, RNA-seq expression profiling and other counting experiments. Longer reads generate more output data per read and can be placed more unambiguously in the genome, which is useful for genome and transcriptome studies. However, longer reads can also lower the overall quality of the data, because the quality score (Q-score) of base calls drops in the later cycles of a read. The library insert size (the fragment length excluding adapters) must be larger than the selected read length; otherwise the run reads through into the adapter, and the sequenced adapter bases cause errors in subsequent alignment.
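The Q-score is a Phred score, Q = -10 log10(P), where P is the probability that a base call is wrong; Q30 therefore corresponds to a 1 in 1,000 chance of an incorrect call. The minimal sketch below converts Q-scores to error probabilities, decodes a FASTQ quality string (assuming the Phred+33 encoding used by Illumina 1.8+ pipelines), and counts how many cycles read into the adapter when the insert is shorter than the read. The function names and the example quality string are hypothetical, for illustration only.

```python
# Assumption: Illumina 1.8+ FASTQ files store each base's quality as one ASCII
# character with an offset of 33 ("Phred+33").
PHRED_OFFSET = 33


def error_probability(q: float) -> float:
    """Probability that a base call is wrong, from its Phred score.

    Q = -10 * log10(P)  =>  P = 10 ** (-Q / 10)
    """
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string into one Phred score per cycle."""
    return [ord(ch) - offset for ch in quality_string]


def adapter_bases_read(read_length: int, insert_size: int) -> int:
    """Number of cycles that run past the insert into the adapter.

    Zero when the insert is at least as long as the read.
    """
    return max(0, read_length - insert_size)


if __name__ == "__main__":
    # Hypothetical quality string whose scores fall towards the end of the read.
    quals = decode_qualities("IIIIIHHGF@<5")
    print(quals)
    print([round(error_probability(q), 5) for q in quals])
    print(adapter_bases_read(read_length=150, insert_size=120))  # 30
    print(adapter_bases_read(read_length=100, insert_size=300))  # 0
```

In the first `adapter_bases_read` call, a 120-base insert read for 150 cycles leaves the last 30 cycles sequencing adapter rather than sample DNA.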
Number of Reads
During a sequencing run, the sequencer generates 'reads', the base sequences determined from individual library fragments. The number of reads required to achieve your research goals determines the number of lanes needed to complete your experiment. The HiSeq 2500 System can generate up to 200-250 million single reads or 400-500 million paired-end reads per lane in High Output mode, and up to 150 million single reads or 300 million paired-end reads per lane in Rapid Run mode. The number of reads generated per run depends on several experimental factors, including sample quality, cluster density and the run parameters selected. We aim to generate as many reads as possible during each run; however, beyond a certain threshold (for example, when clusters are loaded too densely) read quality is compromised.
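Turning a read target into a lane count is a ceiling division of total reads by per-lane yield. The sketch below assumes the lower end of each quoted yield range as a conservative planning figure, and assumes that paired-end targets are counted the same way as the quoted paired-end yields (each read of a pair counted separately). The experiment size and the `lanes_needed` helper are hypothetical.

```python
# Planning figures from the per-lane yields quoted above. The lower end of each
# range is used to leave headroom, since real yield depends on sample quality
# and cluster density. Paired-end yields count each read of a pair, so
# reads_per_sample must be counted the same way.
READS_PER_LANE = {
    ("high_output", "single"): 200_000_000,
    ("high_output", "paired"): 400_000_000,
    ("rapid_run", "single"): 150_000_000,
    ("rapid_run", "paired"): 300_000_000,
}


def lanes_needed(num_samples: int, reads_per_sample: int, reads_per_lane: int) -> int:
    """Smallest whole number of lanes whose combined yield covers all samples."""
    total_reads = num_samples * reads_per_sample
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_reads // reads_per_lane)


if __name__ == "__main__":
    # Hypothetical experiment: 12 libraries, 30 million single reads each.
    for mode in ("high_output", "rapid_run"):
        lanes = lanes_needed(12, 30_000_000, READS_PER_LANE[(mode, "single")])
        print(mode, lanes)
```

Under these assumptions the 360 million reads fit in 2 High Output lanes or 3 Rapid Run lanes.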
Depth of Coverage
Reads sample the genome randomly, so they are not distributed evenly across it: some bases are covered by fewer reads than average and others by more. Coverage refers to the average number of times each base is read during a sequencing run. For example, at 30x coverage, each base is sequenced 30 times on average.
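The Lander/Waterman model treats read start positions as uniformly random, which makes the depth at any single base approximately Poisson-distributed with mean equal to the average coverage. The sketch below uses that idealized assumption to estimate how unevenly bases are covered, for example the fraction of bases expected to receive no reads at all. Real libraries usually show additional unevenness (for instance from GC bias or repetitive regions), so treat its output as optimistic. The helper names are hypothetical.

```python
import math


def depth_probability(k: int, mean_coverage: float) -> float:
    """Probability that a base is covered by exactly k reads (Poisson assumption)."""
    return math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)


def fraction_below(min_depth: int, mean_coverage: float) -> float:
    """Expected fraction of bases covered by fewer than min_depth reads."""
    return sum(depth_probability(k, mean_coverage) for k in range(min_depth))


if __name__ == "__main__":
    for c in (1, 5, 30):
        # Fraction of the genome expected to receive no reads at all.
        print(f"{c}x: uncovered fraction = {fraction_below(1, c):.2e}")
```

At 1x average coverage, roughly 37% of bases (e^-1) are expected to receive no reads, which is why mean coverage alone understates how much sequencing a study needs.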
Coverage can be calculated using the Lander/Waterman equation: C=LN/G, where C stands for coverage, G is the haploid genome length, L is the read length and N is the number of reads.
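The equation can be used in both directions: to find the coverage a given number of reads provides, or, rearranged as N = CG/L, to find how many reads a target coverage requires. The sketch below assumes that for paired-end runs N counts both reads of each pair and L is the length of one read, and it ignores reads lost to duplication or failed alignment, so a real requirement will be somewhat higher. The genome size is a hypothetical input of roughly human haploid size.

```python
import math


def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """Lander/Waterman coverage: C = L * N / G."""
    return read_length * num_reads / genome_length


def reads_for_coverage(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Rearranged equation N = C * G / L, rounded up to a whole number of reads."""
    return math.ceil(target_coverage * genome_length / read_length)


if __name__ == "__main__":
    # Hypothetical inputs: an approximately 3.1 Gb haploid genome and 100-base reads.
    genome_length = 3_100_000_000
    n = reads_for_coverage(30, 100, genome_length)
    print(n)  # 930000000
    print(coverage(100, n, genome_length))  # 30.0
```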
Coverage requirements vary by study type, but the standards are ultimately set by journals and by your scientific field. The 'Recent Publications' section on Illumina's website provides a searchable resource of publications and is recommended as a starting point for determining the target depth of coverage for Whole Genome Resequencing, De Novo Sequencing, Targeted Re-sequencing, Transcriptomics and many other fields. For RNA sequencing, the ENCODE project is a useful resource.
Multiplexing of Libraries
Libraries may be multiplexed by adding sample-specific index sequences, or 'barcodes'. This allows multiple libraries to be sequenced in the same lane and then identified and sorted bioinformatically during data analysis. It is extremely important to select correct index combinations when planning your experiment. The HiSeq 2500 uses a green laser to sequence G/T and a red laser to sequence A/C. At each cycle of the index read, the indexes pooled in a lane must include at least one G or T and at least one A or C at that position to ensure proper registration. If this colour balance is not maintained at every position of the index read, index read sequencing can fail due to registration failure. You must follow the pooling guidelines for the sample prep kit you are using.
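The colour-balance rule can be checked before pooling. The sketch below (function name hypothetical) flags each index-read position where the pooled indexes lack either a green-channel base (G/T) or a red-channel base (A/C). It checks only the minimum rule stated here, assuming equal-length indexes read with the green/red laser scheme described above; it does not replace the kit-specific pooling guidelines, which may impose further constraints.

```python
# Laser channels as described above: green for G/T, red for A/C.
GREEN_BASES = frozenset("GT")
RED_BASES = frozenset("AC")


def unbalanced_positions(indexes: list[str]) -> list[int]:
    """Return 0-based index positions that lack a G/T or lack an A/C across the pool."""
    if not indexes:
        raise ValueError("at least one index sequence is required")
    length = len(indexes[0])
    if any(len(seq) != length for seq in indexes):
        raise ValueError("all index sequences must have the same length")

    failing = []
    for pos in range(length):
        bases_at_pos = {seq[pos].upper() for seq in indexes}
        has_green = bool(bases_at_pos & GREEN_BASES)
        has_red = bool(bases_at_pos & RED_BASES)
        if not (has_green and has_red):
            failing.append(pos)
    return failing


if __name__ == "__main__":
    # Hypothetical 4-base indexes, for illustration only.
    print(unbalanced_positions(["ACGT", "GTCA"]))  # []
    print(unbalanced_positions(["ACGT", "ACGA"]))  # [0, 1, 2]
```

A single index on its own always fails this check, since one base per position can only ever light one channel; this is why low-plex pools need carefully chosen combinations.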