Department of Biosciences

Troubleshooting Guide

This page is currently under construction and information is continually being added. Please continue to check back for more troubleshooting advice or contact a member of our team for help.

You will have been provided with a text file (.seq) and a chromatogram file (.ab1). In order to properly evaluate your data it is important to view the information contained in the chromatogram file. The text file is automatically generated by a software program, which can and does make mistakes!

Evaluating your chromatogram data

A good quality sequence should contain evenly spaced peaks of only one colour per peak. Peak heights will vary, but peaks should be distinct from the baseline and any background noise should be minimal. The raw data trace should show an even signal intensity throughout, with no sudden or significant drops. Signal intensity should ideally be between 2000 and 5000 rfu, although reliable data can be obtained at lower or higher intensities. Basecalling errors are common at the beginning of the trace, because mass differences between the bases have a proportionally greater effect on these short fragments. It is therefore important to check and correct any inaccurate basecalls and, where necessary, to truncate the sequence at the point where errors have become too frequent.

Automated DNA sequencing is usually a robust technique that generates good data, but various problems can and do arise. Fortunately, most common sequencing problems have a limited number of causes and can be easily identified and rectified. The following troubleshooting guide covers the most frequently occurring problems, with advice on how to resolve them. Occasionally the cause of a poor result is difficult to identify, particularly if a problem has numerous underlying causes or results from multiple interacting factors. In such cases the only way to determine the root cause is often a process of elimination.
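Deciding where to truncate a read can also be guided by the per-base quality scores that accompany the basecalls. The sketch below is a minimal example of the modified Mott trimming algorithm: each base scores `cutoff - P(error)`, and the highest-scoring stretch of bases is kept. It assumes the Phred quality scores have already been extracted from the .ab1 file into a Python list; the function name `mott_trim` and the default cutoff of 0.05 are illustrative assumptions, not a facility setting.

```python
def mott_trim(quals: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Return (start, end) of the best-scoring region; end is exclusive.

    Each base contributes (cutoff - error probability), where the error
    probability is derived from its Phred score: P = 10 ** (-Q / 10).
    Hypothetical helper for illustration; cutoff is an assumed default.
    """
    best_start = best_end = 0
    best_score = 0.0
    running = 0.0
    seg_start = 0
    for i, q in enumerate(quals):
        running += cutoff - 10 ** (-q / 10)
        if running < 0:
            # Poor-quality stretch: restart the candidate region after this base.
            running = 0.0
            seg_start = i + 1
        elif running > best_score:
            # New best region ends at this base.
            best_score = running
            best_start, best_end = seg_start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    quals = [5, 8, 10, 30, 35, 40, 38, 32, 12, 6]  # simulated scores
    seq = "nnnACGTAnn"
    start, end = mott_trim(quals)
    print(start, end, seq[start:end])  # 3 8 ACGTA
```

Bases before `start` and from `end` onwards would be discarded. A lower `cutoff` demands higher-quality bases and therefore trims more aggressively.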

Signal too low

The chromatogram shows interference from background noise, and the raw data shows low signal intensities, typically under 500 rfu. The processing software may scale up the peaks to fill the space available in each panel, but any background noise is amplified along with them and appears as a chaotic pattern underlying any genuine peaks.
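As a rough first check, a representative signal intensity read from the raw data can be compared against the figures given in this guide (ideally 2000 to 5000 rfu, with under 500 rfu typically indicating low signal). The function below is a hypothetical helper sketching that comparison; how the representative value is obtained from the analysis software is left as an assumption, and because rfu scales may differ between instruments, the thresholds should be treated as guides rather than hard limits.

```python
# Thresholds from this guide; treat them as rough guides, not hard limits.
LOW_SIGNAL_RFU = 500
IDEAL_MIN_RFU = 2000
IDEAL_MAX_RFU = 5000


def classify_signal(mean_rfu: float) -> str:
    """Hypothetical helper: label a representative raw-signal intensity (rfu)."""
    if mean_rfu < LOW_SIGNAL_RFU:
        return "too low"
    if IDEAL_MIN_RFU <= mean_rfu <= IDEAL_MAX_RFU:
        return "ideal"
    # Between 500 and 2000 rfu, or above 5000 rfu: data may still be reliable,
    # so inspect the chromatogram before drawing conclusions.
    return "outside ideal range"


if __name__ == "__main__":
    for value in (300, 1200, 3500, 8000):
        print(value, classify_signal(value))
```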

The primary cause of low signal is a low concentration of template DNA. When the amount of template is too low, few extension products are formed and the signal is too weak to be distinguished accurately from the background noise.

DNA concentration should be accurately quantified and if necessary increased to be within the appropriate range. BigDye concentrations may need to be adjusted and it may be beneficial to increase the number of cycles during thermal cycling.

Signal too high

Peaks appear to be truncated, and the signal intensity is offscale. There may be 'pull-ups' beneath the peaks - contaminating peaks of different colours - caused by spectral overlap of the dyes.

Too much template is usually the cause and, in extreme cases, can inhibit the sequencing reaction (see below).

DNA concentration should be accurately quantified and if necessary be reduced to be within the appropriate range. It may also be necessary to reduce the concentration of BigDye used in the reaction and/or use fewer cycles during thermal cycling.

Inhibition of sequencing reaction

Both the chromatogram and the raw data show peaks which start off high and decline rapidly down to baseline (the 'ski-slope' effect).
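The ski-slope pattern can be screened for by comparing peak heights at the start and end of the read. The sketch below assumes you have a list of called peak heights in read order (how these are exported is not covered here); the quarter-length windows, the function names and the 0.1 ratio threshold are hypothetical choices for illustration, not a validated criterion.

```python
from statistics import mean


def late_to_early_ratio(peak_heights: list[float]) -> float:
    """Mean height of the last quarter of peaks divided by the first quarter."""
    if len(peak_heights) < 8:
        raise ValueError("need at least 8 peaks")
    quarter = len(peak_heights) // 4
    early = mean(peak_heights[:quarter])
    late = mean(peak_heights[-quarter:])
    if early <= 0:
        raise ValueError("no signal at the start of the read")
    return late / early


def looks_like_ski_slope(peak_heights: list[float], threshold: float = 0.1) -> bool:
    """Hypothetical screen: flag reads whose late signal is a small fraction of early signal."""
    return late_to_early_ratio(peak_heights) < threshold


if __name__ == "__main__":
    inhibited = [4000 * 0.9 ** i for i in range(100)]  # simulated rapid decline
    steady = [3000 - 10 * i for i in range(100)]       # simulated gentle decline
    print(looks_like_ski_slope(inhibited), looks_like_ski_slope(steady))  # True False
```

A flagged read would still need the checks described in this section (quantification, then cleanup) to tell overloading apart from contamination.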

The main cause of inhibition is the presence of contaminants such as salt or ethanol in the template DNA. Contaminants can affect the progress of the polymerase, leading to a predominance of short, prematurely terminated fragments. Overloading the reaction with DNA can also cause inhibition, as the nucleotides in the reaction are distributed over too many growing chains, resulting in an over-abundance of short fragments.

DNA concentration should be accurately quantified to ensure that the reaction has not been overloaded. Where overloading is not found to be the cause of the inhibition, samples should be carefully cleaned using cleanup columns to ensure complete removal of contaminants. Highly concentrated samples tend to overload cleanup columns, resulting in incomplete removal of contaminants, so it may be necessary to dilute them before performing a cleanup.

Secondary Structure

The sequence proceeds normally, until there is an abrupt drop in signal strength where the secondary structure occurs. The drop may be small and unobtrusive, but in extreme cases can lead to a dead stop.
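Unlike the gradual decline of an inhibited reaction, the drop caused by secondary structure is abrupt. A minimal sketch, assuming a list of peak heights in read order: slide two adjacent windows along the read and report the first position where the mean of the following window falls below a fraction of the preceding one. The function name, the window of 10 peaks and the 0.3 fraction are assumed values chosen for illustration.

```python
from statistics import mean
from typing import Optional


def find_abrupt_drop(peak_heights: list[float], window: int = 10,
                     drop_fraction: float = 0.3) -> Optional[int]:
    """Hypothetical helper: return the first index where the following
    window's mean height falls below drop_fraction times the preceding
    window's mean, or None if no such drop is found."""
    for i in range(window, len(peak_heights) - window + 1):
        before = mean(peak_heights[i - window:i])
        after = mean(peak_heights[i:i + window])
        if before > 0 and after < drop_fraction * before:
            return i
    return None


if __name__ == "__main__":
    step = [3000] * 40 + [500] * 40                 # simulated abrupt drop at 40
    gradual = [4000 * 0.9 ** i for i in range(80)]  # simulated ski-slope
    print(find_abrupt_drop(step), find_abrupt_drop(gradual))  # 39 None
```

Because the windows straddle the drop, the reported index is approximate (here 39 for a drop that starts at 40). A steady decline, where each window is a similar fraction of the last, is not reported.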

Regions of secondary structure are detrimental to PCR-based sequencing because of their high melting temperature. The sequencing polymerase may be unable to progress through the region normally. A proportion of the polymerases may stop completely, causing the drop in signal. Some may progress through the secondary structure, while fewer still may skip the region altogether, as in the case of hairpin loops. This can lead to the appearance of mixed data following the signal drop. Some vectors, such as Gateway vectors, are more prone to secondary structure problems because they contain palindromic sequences.
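Inverted repeats, which can fold back on themselves to form hairpins, can be located in a known sequence with a simple string search: a stem followed, after a short loop, by its own reverse complement. The sketch below is a hypothetical helper; the stem length and loop-size limits are assumptions, and it ignores mismatches and the thermodynamic stability of the structure, which dedicated folding tools take into account.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse; bases other than ACGT pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3,
                  max_loop: int = 20) -> list[tuple[int, int, int]]:
    """Hypothetical helper: return (stem_start, partner_start, loop_length)
    for each stem followed, after a loop, by its exact reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        partner = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # partner would run off the end of the sequence
            if seq[j:j + stem] == partner:
                hits.append((i, j, loop))
    return hits


if __name__ == "__main__":
    # Simulated sequence: stem ACGGTC, loop AAAA, then its reverse complement.
    print(find_hairpins("TTACGGTCAAAAGACCGTTT"))  # [(2, 12, 4)]
```

Hits close to a primer binding site could suggest trying one of the alternatives described in the next paragraph.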

Possible solutions include sequencing from another primer at a different position, sequencing the opposite strand or, in extreme cases, subcloning smaller sections of the DNA. Alternative cycling conditions are sometimes helpful and can be requested from the facility; we can advise where we think this method may be beneficial.

Dye Blobs

A portion of the sequence is obscured by a large peak, usually red or blue, within the first 100bp of data. The peak shape is clearly abnormal. The sequence beneath the peak may have been miscalled by the software.

The peak is a result of excess unincorporated dye in the reaction. This is usually removed at the cleanup stage, but may persist in some cases. It is often possible to read the bases beneath the peak manually, as in the example above. However, in extreme cases the cause of the dye blob needs to be addressed.

If our sample cleanup has not gone well, excess dye may remain. In this instance we automatically re-run the reaction to remove the dye blob. The example below shows data containing a dye blob, along with the same sample after re-running to remove it.

If the dye blob should remain after re-running the reaction, this would indicate that the sample itself may contain a contaminant which binds unincorporated dyes. In this case the DNA should be cleaned by running through a cleanup column.

If signals are particularly low for any reason, dye blobs are more likely to be visible. Because the reaction has worked inefficiently, a larger proportion of the dye remains unincorporated after cycle sequencing and appears as large dye blob peaks. Furthermore, the sequence itself may have a lower signal intensity than the unincorporated dye in the background. Re-running these reactions is not usually effective; the cause of the low signal must be addressed to resolve the problem.

Data Spikes

Sharp spikes are visible in the sequence, normally higher than the surrounding peaks but narrower, being less than 10 data points in width. All four colours may be visible within the spike.
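Because a spike is narrow and contains all four colours at once, it can be distinguished in the raw data from a genuine peak, where a single colour dominates. The sketch below assumes the four raw-data channels are available as equal-length lists of intensities; it flags runs where every channel exceeds a threshold for fewer than 10 data points (the width given above). The function name and the threshold value are hypothetical.

```python
def find_spikes(channels: list[list[float]], threshold: float,
                max_width: int = 10) -> list[tuple[int, int]]:
    """Return (start, end) runs, end exclusive, where all four channels are high.

    channels: four equal-length lists of raw intensities, one per dye.
    threshold: hypothetical intensity above which a channel counts as 'high'.
    Only runs narrower than max_width data points are reported as spikes.
    """
    if len(channels) != 4 or len({len(ch) for ch in channels}) != 1:
        raise ValueError("expected four channels of equal length")
    n = len(channels[0])
    spikes = []
    start = None
    for i in range(n):
        all_high = all(ch[i] >= threshold for ch in channels)
        if all_high and start is None:
            start = i  # a run of all-channel signal begins
        elif not all_high and start is not None:
            if i - start < max_width:  # narrow enough to be a spike
                spikes.append((start, i))
            start = None
    if start is not None and n - start < max_width:
        spikes.append((start, n))  # run extends to the end of the trace
    return spikes


if __name__ == "__main__":
    length = 50
    channels = [[100.0] * length for _ in range(4)]
    for ch in channels:  # simulated spike in all four dyes
        for i in range(20, 23):
            ch[i] = 3000.0
    for i in range(30, 40):  # simulated genuine peak: one dye only
        channels[0][i] = 3000.0
    print(find_spikes(channels, threshold=1000.0))  # [(20, 23)]
```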

The cause of data spikes is not fully understood, but they are thought to be a result of either crystals within the polymer, or small gas bubbles forming in the capillaries. These refract the laser light back into the camera as they pass by the detector zone. The laser light contains the emission spectra of all four dyes and for this reason the spikes consist of all four colours.

If a data spike is detected we will automatically re-run the reaction. As this artifact is not present in the sample itself, re-running the reaction eliminates the problem.