TAGSEARCH - search for SAGE tags in genomic sequences

To identify tags in genomic sequences, we developed a computer program that searches the genomic sequence at the 3’ end of a specified gene for each 14-bp tag represented in SAGE libraries. Each 14-bp tag includes the NlaIII recognition sequence (CATG). After the program identifies a tag, it searches for the polyadenylation signal sequences AATAAA and AATTAA. This software, written by John Keene of D. H. Keene Associates, runs on a Microsoft Windows platform and is available upon request.

 

INPUT files (both *.doc files):

- A list of SAGE tags (e.g. those represented in the library of interest)

File structure: the 14 bases of tag#1; n; the 14 bases of tag#2; n; and so on, so that every tag is followed by the letter n.

For example (tag#1 is the first 14 bases): catgatgtcgatgancatgacctggctagn ...
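The tag list layout described above can be read with a few lines of Python. This is only a minimal sketch and not the TAGSEARCH code itself: it assumes the text has already been extracted from the *.doc file into a plain string, that tags are separated by the letter n as in the example, and that whitespace can be ignored. The name parse_tag_list and the validation rules (14 bases, starting with catg, only a/c/g/t) are illustrative assumptions.

```python
import re

TAG_LENGTH = 14   # tag length stated in the description
ANCHOR = "catg"   # NlaIII recognition sequence that starts each tag


def parse_tag_list(text):
    """Split an 'n'-separated tag list into (tag_number, tag) pairs.

    Hypothetical helper: assumes plain text, not a binary *.doc file.
    """
    cleaned = re.sub(r"\s+", "", text.lower())   # drop spaces and line breaks
    tags = []
    for chunk in cleaned.split("n"):
        if not chunk:
            continue   # empty piece after the final 'n'
        well_formed = (
            len(chunk) == TAG_LENGTH
            and chunk.startswith(ANCHOR)
            and not set(chunk) - set("acgt")
        )
        if not well_formed:
            raise ValueError(f"unexpected tag: {chunk!r}")
        tags.append(chunk)
    # Tag numbers follow the order of appearance in the input, starting at 1.
    return list(enumerate(tags, start=1))


if __name__ == "__main__":
    for number, tag in parse_tag_list("catgatgtcgatga ncatgacctggctagn"):
        print(number, tag)
```

Numbering the tags by their order in the file mirrors the "tag number" reported in the output table.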

- A genomic sequence (that usually contains the gene of interest).

In our example, tag#1 and the polyA signal are each enclosed in square brackets (the brackets only mark them here and are not part of the file):

atggtctgcgatcg[catgatgtcgatga]atcgagatagctagcg[aataaa]cgcatcgata ...

 

TAGSEARCH then:

- Scans the genomic sequence for the presence of each of the tags.

- Scans X bases (default: 1000 bases) upstream and downstream of each tag for the two most common polyA signals (AATAAA, AATTAA).

- Generates a table (Access file) listing all the tags that have been found (tag sequence and tag number in the input file), the location of the tag in the sequence file, and the distance between the tag and a polyA signal, if one is found.
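The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original Windows program: the function names (find_all, nearest_signal_distance, tag_search) are hypothetical, the distance is assumed to be the number of bases lying between the tag and the signal, and a signal that is not found within the window is reported as None.

```python
POLYA_SIGNALS = ("aataaa", "aattaa")   # the two signals named in the text


def find_all(sequence, pattern):
    """Return every 0-based start of pattern in sequence, overlaps included."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


def nearest_signal_distance(sequence, tag_start, tag_len, signal, window):
    """Smallest gap (in bases) between the tag and a signal, or None.

    Assumption: the gap counts the bases between the two motifs, and a
    signal qualifies when that gap is at most `window` bases on either side.
    """
    tag_end = tag_start + tag_len
    best = None
    for pos in find_all(sequence, signal):
        if pos >= tag_end:
            gap = pos - tag_end                     # signal downstream of tag
        elif pos + len(signal) <= tag_start:
            gap = tag_start - (pos + len(signal))   # signal upstream of tag
        else:
            continue                                # signal overlaps the tag
        if gap <= window and (best is None or gap < best):
            best = gap
    return best


def tag_search(sequence, tags, window=1000):
    """Return one row per tag occurrence, with distances to each signal."""
    sequence = "".join(sequence.lower().split())    # ignore case and whitespace
    rows = []
    for number, tag in enumerate(tags, start=1):
        for start in find_all(sequence, tag):
            row = {"tag": tag, "number": number, "position": start + 1}
            for signal in POLYA_SIGNALS:
                row[signal.upper()] = nearest_signal_distance(
                    sequence, start, len(tag), signal, window
                )
            rows.append(row)
    return rows


if __name__ == "__main__":
    genome = "atggtctgcgatcgcatgatgtcgatgaatcgagatagctagcgaataaacgcatcgata"
    for row in tag_search(genome, ["catgatgtcgatga", "catgacctggctag"]):
        print(row)
```

Run on the example sequence, this sketch finds tag#1 at base 15 with AATAAA 16 bases downstream, which agrees with the value 16 in the example output. Tags that are absent from the sequence, such as tag#2 here, produce no row.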

 

OUTPUT file:

tag sequence      tag number    AATAAA    AATTAA

catgatgtcgatga    1             16        0

 

 

To receive a free copy of the TAGSEARCH software, please contact:

Dror Sharon

Email: drorsharon@md.huji.ac.il

Phone: (617) 573-4347

FAX: (617) 573-3168