The Power of Computers in Biology

Worksheet

Task A: Identify the mysterious “Nucleotide Sequence R”

We provide a DNA sequence, Sequence R, with no information about its function.

1. Open a new tab in your Web browser.

2. Go to Sequence R.

Copy the whole sequence.

Keep this browser tab or window open.

To discover the role of this mysterious sequence, we will search for proteins in a database that show high similarity to a translation of Sequence R.

To do this we will use a sequence search engine, BLAST, to search sequences in the database at the National Center for Biotechnology Information (NCBI).

1. Open BLAST.

2. Click “blastx”.

3. In the box labelled “Enter Query sequence”, paste Sequence R.

4. Click “BLAST”.


BLASTX uses the genetic code to translate Sequence R in all six reading frames (three on each DNA strand), then compares the translations with the proteins in the sequence database.
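To illustrate what "translate using the genetic code" means, here is a minimal Python sketch of six-frame translation with the standard genetic code. It is not how BLASTX is implemented internally. The function names and the short example sequence are hypothetical, chosen for illustration, and are not part of Sequence R. Stop codons appear as "*", and any codon containing a character other than A, C, G or T becomes "X".

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the opposite DNA strand, read 5' to 3'."""
    return dna.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate(dna):
    """Translate codon by codon from the first base; leftover bases are ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate all three reading frames of both strands."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    example = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical example, not Sequence R
    for frame, protein in six_frame_translations(example).items():
        print(frame, protein)
```

Translating all six frames is why BLASTX can find a protein match even though we do not know which strand of Sequence R codes for the protein, or where its reading frame starts.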

The BLASTX search may take a few minutes. While it runs, the “Status” on the Web page shows “Searching”.

Results will be shown in a long Web page. Scroll down to see a table of results. The best-matching sequence from the database is listed first.


Question 1

Sequence R has an excellent match to a known protein, indicated by its low E-value and high percentage identity. We assume Sequence R codes for this protein.

What is the name of this protein?


Hint: To find out more about this protein, click the link under “Accession”. Then you will see the protein name near the top of the page, in bold.
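The percentage identity in the BLAST results is, roughly, the share of aligned positions at which the two sequences carry the same residue. The minimal Python sketch below computes it from two aligned rows, under stated assumptions: the rows are equal-length strings with "-" for gaps, and gap columns count toward the alignment length. The function name and the example rows are hypothetical. E-values cannot be computed this simply, because they depend on the scoring system and on the size of the database searched.

```python
def percent_identity(query_row, subject_row):
    """Percentage of alignment columns where both rows show the same residue.

    Assumes two aligned rows of equal length, with "-" marking a gap. Gap
    columns count toward the alignment length but never as identities.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    if not query_row:
        raise ValueError("the alignment is empty")
    identities = sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != "-"
    )
    return 100 * identities / len(query_row)


if __name__ == "__main__":
    # Hypothetical protein alignment rows that differ at one position.
    print(round(percent_identity("MAIVMGR", "MAIVLGR"), 1))  # prints 85.7
```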


Question 2

In which organism is this protein found?

Hint: Look for the line beginning “SOURCE”.


Question 3

What is the biological role of the protein?

Hint: Do a Web search for the name of the protein (from your answer to Question 1).







Task B: Search for a match to Sequence R in the human genome


We will now perform another BLAST search to see if Sequence R has a match in the human genome.

This time, we will use BLASTN, which compares a DNA query with a DNA sequence database.

1. Copy Sequence R again from our Web page.

2. Go back to the main NCBI BLAST Web page. (You can find it by doing a Web search for “NCBI BLAST”.)

3. Under “BLAST Genomes”, click “Human”.


4. Click “blastn” at the top of the page, and paste in Sequence R.


5. Under “Program selection” > “Optimize for”, select “Somewhat similar sequences (blastn)”.


6. Click “BLAST”.

Question 4

On which human chromosome has the best match been found?

Click on the first result in the “Description” column of the table.

This will display fragments of Sequence R (the query sequence) aligned against human genomic DNA found by the BLASTN search (the subject sequences).

Look at the BLASTN results carefully.


In your results, you will see a good match between Sequence R and the human genomic DNA. However, the mouse sequence (Sequence R) and the human sequence are not identical: they differ because of mutations.

Figure 1 shows evidence of substitution and frameshift mutations in BLASTN results.


[Figure 1: BLASTN alignment with the frameshift mutation point marked]
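Spotting mutations in an alignment comes down to comparing the two rows one column at a time: two different bases in a column suggest a substitution, and a "-" opposite a base marks an insertion or deletion. The minimal Python sketch below does this check, assuming (hypothetically) that the query and subject rows have been copied out as equal-length strings with "-" for gaps. The function name and the example rows are invented for illustration and are not taken from real BLASTN output.

```python
def find_differences(query_row, subject_row):
    """List substitution and gap columns in a pairwise DNA alignment.

    Columns are numbered from 1 along the alignment. A "-" in one row opposite
    a base in the other marks an insertion or deletion (an indel).
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    differences = []
    for column, (q, s) in enumerate(zip(query_row, subject_row), start=1):
        if q == s:
            continue  # identical column: no mutation visible here
        if q == "-" or s == "-":
            differences.append((column, "indel", q, s))
        else:
            differences.append((column, "substitution", q, s))
    return differences


if __name__ == "__main__":
    query = "ATGGCCATTGTAATGGGC"    # hypothetical query row
    subject = "ATGGTCATTG-AATGGGC"  # hypothetical subject row
    for difference in find_differences(query, subject):
        print(difference)
```

An alignment alone cannot tell whether a gap is an insertion in one sequence or a deletion in the other, which is why both are reported here as "indel".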

Question 5

In the alignment between Sequence R and the human genomic DNA, can you see evidence of a substitution mutation?

If so, sketch the region that includes the substitution.

Question 6

Can you see evidence of an insertion or deletion mutation?

If so, sketch the region that includes the insertion or deletion.

A frameshift mutation is an insertion or deletion of a number of bases that is not a multiple of three, so it disrupts the reading frame of a protein-coding sequence. From that point on, any protein sequence translated from the DNA would be scrambled.

A frameshift mutation is strong evidence that the DNA no longer codes for a functional protein.
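The difference between a substitution and a frameshift can be seen by translating a short coding sequence before and after each kind of mutation. In this minimal Python sketch, the sequence, the mutated position and the helper names are hypothetical. A single substitution changes at most one amino acid, while deleting one base shifts every downstream codon, changing the amino acids that follow and, in this example, creating a premature stop codon ("*").

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon from the first base; leftover bases are ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def substitute_base(dna, position, new_base):
    """Replace the base at a 0-based position (a substitution)."""
    return dna[:position] + new_base + dna[position + 1:]


def delete_base(dna, position):
    """Remove the base at a 0-based position (a one-base deletion)."""
    return dna[:position] + dna[position + 1:]


if __name__ == "__main__":
    original = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical coding sequence
    print(translate(original))                           # MAIVMGR*
    print(translate(substitute_base(original, 4, "A")))  # MDIVMGR*
    print(translate(delete_base(original, 4)))           # MAL*WAA
```

Try deleting three adjacent bases instead: one codon is lost, but the codons after it are still read in the original frame.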

Question 7

Do you think the human genome includes a functional version of Sequence R?

Explain your answer.

Question 8

Sequence R comes from the house mouse and codes for L-gulonolactone oxidase, the enzyme that catalyzes the final step in vitamin C synthesis.

Vitamin C is vital for both humans and mice.

Does your answer to Question 7 tell us anything about how the diet of humans might differ from the diet of mice?

Daniel Barker, Heleen Plaisier, Laura CE Campbell, Stevie A Bain, Richard Fitzpatrick and Chenxi Zhang

4273pi Bioinformatics Education Project

Copyright and related rights waived via CC0 1.0 Public Domain Dedication.

Version 3.1 (short)