TAGSEARCH: search for SAGE tags in genomic sequences
To identify tags in genomic sequences, we developed a computer program that searches the genomic sequence at the 3’ end of a specified gene for each 14-bp tag represented in the SAGE libraries, including its NlaIII recognition sequence (CATG). After the program identifies a tag, it searches the flanking sequence for the polyadenylation signals AATAAA and AATTAA. The software, written by John Keene of D. H. Keene Associates, runs on Microsoft Windows and is available upon request.
INPUT files (both *.doc files):
- A list of SAGE tags (e.g. those represented in the library of interest)
File structure: the 14 bases of tag #1, then the letter n, then the 14 bases of tag #2, then n, and so on.
For example: catgatgtcgatga ncatgacctggctagn ...
- A genomic sequence (usually one that contains the gene of interest).
In our example, the sequence contains tag #1 (catgatgtcgatga) followed by the polyA signal aataaa:
atggtctgcgatcgcatgatgtcgatga atcgagatagctagcgaataaa cgcatcgata ...
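Both input formats are simple enough to read in a few lines. The Python sketch below is an illustration only, not part of TAGSEARCH: it assumes the inputs have already been saved as plain text (the program itself reads *.doc files), and the function names parse_tag_list and clean_sequence are hypothetical. The check that every tag is 14 bases long and begins with CATG is also an assumption of this sketch, based on the example tags.

```python
import re


def parse_tag_list(text: str) -> list[str]:
    """Split a tag list of the form 'tag1 n tag2 n ...' into 14-base tags."""
    # Drop whitespace and any '...' placeholder, then normalize case.
    cleaned = re.sub(r"[\s.]", "", text).lower()
    # Tags contain only a, c, g and t, so the letter n works as a separator.
    tags = [chunk for chunk in cleaned.split("n") if chunk]
    for tag in tags:
        # Sanity check assumed by this sketch: 14 bases starting with CATG.
        if len(tag) != 14 or not tag.startswith("catg") or set(tag) - set("acgt"):
            raise ValueError(f"not a 14-base CATG tag: {tag!r}")
    return tags


def clean_sequence(text: str) -> str:
    """Remove spaces and line breaks from a genomic sequence and lower-case it."""
    return re.sub(r"\s", "", text).lower()


print(parse_tag_list("catgatgtcgatga ncatgacctggctagn ..."))
# ['catgatgtcgatga', 'catgacctggctag']
```

Applied to the example genomic sequence, clean_sequence joins the three space-separated pieces, and tag #1 then starts at base 15 (1-based).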
TAGSEARCH then:
- Scans the genomic sequence for each of the tags.
- Scans X bases (default: 1000) upstream and downstream of each tag
for the two most common polyA signals (AATAAA and AATTAA).
- Generates a table (Microsoft Access file) listing each tag that was found
(its sequence and its number in the input file), the location of the
tag in the sequence file, and the distance between the tag and a polyA
signal, if one is found.
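The search steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the TAGSEARCH source: only the forward strand is searched; a signal counts only if it lies entirely within the X-base window; the reported distance is the number of bases between the tag and the nearest copy of each signal, signed positive downstream and negative upstream (a convention of this sketch); and the names Hit, nearest_signal and tagsearch are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

POLYA_SIGNALS = ("aataaa", "aattaa")


@dataclass
class Hit:
    tag: str
    tag_number: int                      # 1-based index of the tag in the tag list
    position: int                        # 1-based start of the tag in the sequence
    distances: Dict[str, Optional[int]]  # signal -> gap in bases, None if absent


def nearest_signal(seq: str, start: int, end: int, signal: str, window: int) -> Optional[int]:
    """Bases between the tag seq[start:end] and the closest copy of `signal`
    lying entirely within `window` bases of it (downstream > 0, upstream <= 0)."""
    candidates = []
    j = seq.find(signal, end, end + window)               # nearest copy downstream
    if j != -1:
        candidates.append(j - end)
    k = seq.rfind(signal, max(0, start - window), start)  # nearest copy upstream
    if k != -1:
        candidates.append((k + len(signal)) - start)      # zero or negative
    return min(candidates, key=abs) if candidates else None


def tagsearch(genomic: str, tags: List[str], window: int = 1000) -> List[Hit]:
    seq = re.sub(r"\s", "", genomic).lower()
    hits = []
    for number, tag in enumerate(tags, start=1):
        tag = tag.lower()
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={re.escape(tag)})", seq):
            start, end = m.start(), m.start() + len(tag)
            distances = {s: nearest_signal(seq, start, end, s, window)
                         for s in POLYA_SIGNALS}
            hits.append(Hit(tag, number, start + 1, distances))
    return hits


example = "atggtctgcgatcgcatgatgtcgatga atcgagatagctagcgaataaa cgcatcgata"
for hit in tagsearch(example, ["catgatgtcgatga", "catgacctggctag"]):
    print(hit)
```

On the example sequence this reports tag #1 at position 15, 16 bases upstream of AATAAA, with no AATTAA found; tag #2 does not occur. Picking the nearest copy of each signal is a design choice of this sketch; the original program may report occurrences differently.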
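OUTPUT file:
tag sequence   | tag number | AATAAA | AATTAA
catgatgtcgatga | 1          | 16     | 0
The AATAAA and AATTAA columns give the distance in bases between the tag and each signal; in this example, 0 indicates that no AATTAA signal was found.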
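The table layout can be reproduced as tab-separated text, which spreadsheet or database programs can import. The Python sketch below is an assumption-laden stand-in, not the program's Access export: the function name write_table and the row format (tag, tag number, signal-to-distance mapping) are hypothetical, and writing a missing signal as 0 follows the example output above.

```python
import csv
import io

SIGNALS = ("aataaa", "aattaa")


def write_table(rows, out) -> None:
    """Write result rows as tab-separated text in the layout of the example.

    Each row is (tag_sequence, tag_number, {signal: distance or None});
    following the example output, a signal that was not found is written as 0.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["tag sequence", "tag number", "AATAAA", "AATTAA"])
    for tag, number, distances in rows:
        values = [0 if distances.get(s) is None else distances[s] for s in SIGNALS]
        writer.writerow([tag, number] + values)


buf = io.StringIO()
write_table([("catgatgtcgatga", 1, {"aataaa": 16, "aattaa": None})], buf)
print(buf.getvalue(), end="")
```

Note that writing 0 for "not found" cannot distinguish a missing signal from one directly adjacent to the tag; keeping an empty cell instead would avoid that ambiguity.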
To receive a free copy of the TAGSEARCH software, please contact:
Dror Sharon
Email: drorsharon@md.huji.ac.il
Phone: (617) 573-4347
FAX: (617) 573-3168