Resources | Sequencing | Retrieving & Understanding the Data

The information below provides step-by-step guidance on interpreting the data files you receive and on analysing them further. If your results are not what you anticipated, please consider reviewing our troubleshooting page.

It is most helpful to follow each step in order, since only then can you be confident that:

a)    you have performed a full analysis of the data, and

b)    your sequence is accurate.

Retrieving Data

The information below provides detailed support on data retrieval issues. If your results are not what you anticipated, please consider reviewing our troubleshooting page.

Should you not find the answer to your question/problem here, please contact us and we will do our best to help.

Email Notification

Once your sequencing results are ready, MRC PPU DNA Sequencing and Services will send you an email notification. This will include a link to access your results. Upon logging into the DNA Sequencing and Services system, your data files will be within the ‘Results’ section of the User Menu. Files can also be accessed from My Account.

 

Downloading Files

Each sequencing reaction will result in 2 files for customers: 

1.     A ".ab1" file for each result

2.     A ".seq" file for each result

Files may be downloaded from the ‘Results’ section of the User Menu either individually or as a group. If downloading a number of files, please follow the steps for downloading and unzipping your results.
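Once a group of results has been downloaded as an archive, it can be useful to confirm that every reaction has both of its files. The following is a minimal Python sketch of that check, not part of our system. It assumes the download is a standard .zip archive and that the ".ab1" and ".seq" files for one reaction share the same base name; the function names and the archive name in the usage note are hypothetical.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def unzip_and_pair(archive_path, dest_dir):
    """Extract a results archive and group the files by base name."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)

    # Map each base name (e.g. "sample1") to the result files found for it.
    # Assumption: sample1.ab1 and sample1.seq belong to the same reaction.
    pairs = defaultdict(dict)
    for path in sorted(dest.rglob("*")):
        suffix = path.suffix.lower()
        if suffix in {".ab1", ".seq"}:
            pairs[path.stem][suffix] = path
    return dict(pairs)


def missing_files(pairs):
    """Return the base names that lack either the .ab1 or the .seq file."""
    return sorted(
        stem for stem, files in pairs.items()
        if set(files) != {".ab1", ".seq"}
    )
```

For example, `missing_files(unzip_and_pair("results.zip", "results"))` would list any reaction for which only one of the two files was found.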

Interpreting Results | Overview

To interpret your sequencing files, you may wish to examine your data beyond the nucleotide level. Evaluating the raw .ab1 files may give you more information about, and more confidence in, your results. A full list of chromatogram viewers that will allow you to view these files is available on our site.

We do not expect our customers to be experts in interpreting chromatogram data or in troubleshooting failed or poor results, but customers will gain significant benefit from understanding the basics of the data they receive. We will always do our best to help customers obtain the best possible results and to assist them when they have problems. However, we would like our customers to make the best use of all the information available to them, so that they can analyse their data and use that analysis either to satisfy themselves that the data is accurate or to help identify (with our help if necessary) why the data may not be accurate.

 

Interpreting Results | Reviewing the Sequence Peaks

As explained above, each sequence result comprises two files:

  • ".ab1" file: Chromatogram file containing sequence peak data
  • ".seq" file: A simple text file containing the sequence

It is important when reviewing the data to look at both of these files. Viewing the sequence itself is easy, since any text viewer (Notepad, Wordpad, TextEdit, etc) will allow this.

However, it is the ".ab1" file that gives access to the all-important information that allows you to ascertain how accurate the sequence is.

To view this file, you will need a chromatogram viewer (please see above).

When you open up a chromatogram file, you will expect to see coloured peaks of sequence data together with the actual sequence of the bases. That is what we would like you to see as well. However, that may not always be the case.

Below are (simplified) descriptions of four types of chromatogram result you might see:

  1. Clean Peaks: This represents a good reaction within the ideal intensity range that is generating high quality sequence and exhibits the following characteristics:
    • Clean, well defined and evenly spaced peaks of fluorescence with corresponding sequence.
    • Low / no background.
    • Peaks continue like this for up to 1000 bases (assuming your template is that long).
  2. Poorly Defined Peaks: Some peaks of fluorescence together with sequence.
    • Peaks may be of poor definition, sequence may be short and there may be a high "background" signal as evidenced by additional peaks below and/or between the main peaks.
    • This usually means that the signal intensity of the reaction was low and so the sequence quality will be poor. This can be confirmed by looking at the signal strengths (see below).
    • Things to check to establish whether the reaction should have worked appropriately:
      • You should check that the template-primer combination is correct.
      • Double check that the template appears to be the right one.
      • Also check the amount of template sent compared with our requirements.
        • If the amount looks right, it might be that a contaminant is present that is inhibiting the reaction.
  3. Flat Tops
    • Initially, peaks with flat tops and/or additional peaks between certain main peaks.
      • Often these extra peaks will be "read", thereby generating errors, followed by clean peaks that generate high quality sequence due to the natural reduction in intensity as the sequence progresses.
        • This usually indicates that the signal was too high and has resulted in "peak pull-up" near the start of the read.
    • Potential Causes
      • Usually this is caused by there being too much template in the reaction
        • We suggest customers check the concentration of template and primer used to ensure that it matches what is needed by our facility
  4. "NNNNN": This means that there was nothing for the program to analyse
    • This is the worst case scenario that usually indicates a "dead" reaction that failed to produce any fluorescent products capable of being detected.
    • Things to check to establish whether the reaction should have worked appropriately:
      • You should check that the template-primer combination is correct.
      • Double check that the template appears to be the right one.
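As a rough first check before opening a chromatogram viewer, the proportion of "N" calls in the ".seq" text can be computed. The sketch below is illustrative only: it assumes the file holds plain sequence text, optionally preceded by a FASTA-style header line starting with ">", and the function name is hypothetical. A value near 1.0 corresponds to the "NNNNN" case described above; it does not replace inspection of the peaks.

```python
def n_fraction(sequence_text):
    """Return the fraction of base calls that are 'N' in .seq file text."""
    lines = [line.strip() for line in sequence_text.splitlines()]
    # Drop blank lines and any FASTA-style header (an assumption about
    # the file layout; the file may contain only the sequence).
    lines = [line for line in lines if line and not line.startswith(">")]
    bases = "".join(lines).upper()
    if not bases:
        # No bases were called at all, so treat the read as fully failed.
        return 1.0
    return bases.count("N") / len(bases)
```

For example, `n_fraction("ACGT\nACGN")` gives 0.125, since one of the eight base calls is "N".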

When looking at the peak information, it is very helpful to have an idea of what the signal intensities are. Please review the section below for information on that aspect of data interpretation.

Interpreting Results | Reviewing the Peak Intensities

The above are simplified descriptions of the many variations of result that can be seen. Additional examination of average individual nucleotide signal intensities can help customers distinguish between certain types of issue. This can be achieved by viewing the "information" panel of the chromatogram.

How one does this varies according to the software being used. Examples are:

  • 4Peaks: Click on the little "i" symbol in the bottom right corner of the window.
  • BioEdit: Click on "File -> Info" in the top menu bar.
  • FinchTV: Click on the "i" symbol at the top of the window and then click the "General" tab.
  • Chromas Lite: Does not allow this information to be displayed.

The average intensities are found on the line that looks like this:

SIGN=A=573,C=684,G=670,T=636

Values are in RFUs (Relative Fluorescence Units), which is an arbitrary scale.
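If you copy the line out of the information panel, the four values can be pulled apart and averaged with a few lines of Python. This is a small sketch that assumes the line has exactly the layout shown above; the function names are hypothetical.

```python
def parse_sign_line(line):
    """Parse 'SIGN=A=573,C=684,G=670,T=636' into {'A': 573, 'C': 684, ...}."""
    prefix = "SIGN="
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError(f"not a SIGN line: {line!r}")
    intensities = {}
    # Each comma-separated field looks like "A=573".
    for field in text[len(prefix):].split(","):
        base, _, value = field.partition("=")
        intensities[base.strip()] = int(value)
    return intensities


def mean_intensity(intensities):
    """Average the per-nucleotide intensities (in RFU)."""
    return sum(intensities.values()) / len(intensities)
```

For the example line above, `mean_intensity(parse_sign_line("SIGN=A=573,C=684,G=670,T=636"))` gives 640.75 RFU.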

Values can be roughly interpreted as follows:

  • <50: A complete failure (often associated with "NNNNN"; see above)
  • 50-100: Very weak and will give poor data
  • 100-500: Weak-good and will give reasonable-good data
  • 500-1000: Ideal signal strength. Will give good-very good data
  • 1000-5000: Somewhat too strong but usually manageable
  • >5000: Too strong and likely to see peak pull-up (see "Flat Tops" above).

The above figures give an indication of how much fluorescent signal was generated in the sequencing reaction and so provide an indication of how good a result will be.

However, it should be remembered that there are many other issues that can affect the overall sequence quality. 
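The rough guide values above can be written as a simple lookup. The sketch below is an illustration, not a rule we apply: the listed ranges share their end points (for example 100 appears in two ranges), so the code assumes each boundary value belongs to the higher band, except that exactly 5000 is kept in the "somewhat too strong" band because the last range is given as ">5000". The names are hypothetical.

```python
# (upper bound in RFU, description) pairs taken from the guide values above.
BANDS = [
    (50, "complete failure"),
    (100, "very weak, poor data"),
    (500, "weak to good, reasonable to good data"),
    (1000, "ideal, good to very good data"),
]


def classify_intensity(rfu):
    """Return the rough interpretation for an average intensity in RFU."""
    for upper, label in BANDS:
        if rfu < upper:
            return label
    # Assumption: exactly 5000 RFU still counts as "somewhat too strong".
    if rfu <= 5000:
        return "somewhat too strong, usually manageable"
    return "too strong, peak pull-up likely"
```

For example, an average of 640.75 RFU falls in the "ideal" band, while 10,000 RFU is reported as too strong.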

To illustrate how the figures can be useful, consider these three examples:

 

  1. Two identical reads
    • A customer sequences a template with a "forward" and a "reverse" primer.
      • However, both results are "forward" reads when compared with the reference sequence, which could be due to human error.
    • Reviewing the average signal intensities shows that the "forward" read is a bit strong (averages around 5000 RFU), but the "reverse" read only has intensities of about 100 RFU.
      • This would be unlikely if we had genuinely set up two identical reactions.
      • What is actually most likely is that the "reverse" reaction failed for some reason (maybe the primer cannot bind, the primer was mis-diluted, we failed to add it, etc) and what is being seen is "bleed-through" from the adjacent capillary (the forward read).
    • Our sequencers are very sensitive and can detect very low levels of fluorescence. Normally "bleed-through" is completely swamped by the genuine sequence that is present.
    • However, if there is no genuine sequence present, then small amounts of fluorescence from adjacent capillaries can be picked up.
  2. Multiple Peaks
    • A sequence is obtained that contains various errors relative to the reference sequence and inspection of the chromatogram shows that multiple peaks are present.
      • Is this because there is more than one sequence, or because of genuine "background"?
    • An inspection of the signal intensities reveals that they are all around 50 RFU, which is a very weak signal level. At this intensity, significant peaks due to background fluorescence will show up.
    • One cannot discount the possibility that there is an additional sequence present, but it is more likely that the peaks are due to the background fluorescence.
    • One thing is certain, though: the sequence cannot be trusted, and troubleshooting will be required to determine what has caused the reaction to work so poorly.
  3. Extra Bases Inserted
    • A sequence is obtained where extra bases are seemingly inserted at random in the initial 200 bases of sequence.
      • After that, the sequence is clean and further inspection of the chromatogram shows that extra peaks are present. What is going on?
    • Looking at the average signal intensities shows that these are all around 10,000 RFU.
      • This is well above the acceptable intensity range and so what is happening is that the detector/software is being swamped with too much signal and is not able to cope.
    • The extra bases are therefore probably artefacts, but the sample should be diluted so that accurate data is obtained.

The above three scenarios illustrate some ways in which careful analysis of all the information available within the chromatogram file can help to determine what is going on.

The intensity values also allow the customer to ascertain whether they are using "too little", "too much" or "about the right" amount of template and so provide a way for customers to establish whether their quantitation methods are accurate.
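The reasoning in the first scenario (two "identical" reads) can be expressed as a simple comparison of average intensities. This is only a sketch of the idea: the cut-off of 100 RFU and the ten-fold ratio are hypothetical values chosen to match the example figures above, not criteria used by our facility, and a flag is a prompt to look at the chromatogram rather than a diagnosis.

```python
def possible_bleed_through(read_rfu, neighbour_rfu, weak_cutoff=100, min_ratio=10):
    """Flag a very weak read that sits next to a much stronger one.

    weak_cutoff and min_ratio are hypothetical illustration values.
    """
    weak = read_rfu <= weak_cutoff
    # max(..., 1) avoids treating two zero-signal reads as a strong/weak pair.
    much_stronger = neighbour_rfu >= min_ratio * max(read_rfu, 1)
    return weak and much_stronger
```

With the figures from the scenario, `possible_bleed_through(100, 5000)` returns `True`, whereas two reads of similar strength, such as `possible_bleed_through(600, 650)`, return `False`.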

If after reviewing the information above, you still have questions, please contact us to discuss this further so we can determine how best to help.

Performing Database Searches

After you have ascertained that your sequence is of good quality, we recommend that you perform BLAST searches or carry out whatever other analysis or manipulation of the sequence you need.

This is recommended as an integral part of the analysis. By spending a small amount of time on a thorough analysis, you can determine the quality of the results so that (should the need arise) you have all the information to hand when contacting us about them.

Do you need any help? Please get in touch and we’ll be happy to lend a hand.