Summer Research Fellowship Programme of India's Science Academies

Bioinformatic Analysis of CAG Repeats in Human Genome

Akshatha Nayak

IIIrd year B.Tech Computer Science, Rajiv Gandhi Institute of Technology, Kottayam

Dr. K. Muruga Poopathi Raja

Department of Physical Chemistry, School of Chemistry, Madurai Kamaraj University


Trinucleotide repeats are stretches of a particular codon in DNA that result in runs of the same amino acid, called homorepeats, in a protein sequence. The expansion of triplet repeats is linked to several hereditary and age-related disorders. Of these, CAG repeats are the most common and are of particular significance, as the abnormal expansion of glutamine tracts (via CAG/CAA trinucleotide repeats) is associated with at least nine inherited neurodegenerative diseases, including Huntington’s disease, spinobulbar muscular atrophy, dentatorubral-pallidoluysian atrophy, and six forms of spinocerebellar ataxia (SCA types 1, 2, 3, 6, 7, and 17). This study analyses the CAG and mixed CAG/CAA repeats in the human genome, which result in polyglutamine tracts in the protein sequence, in an attempt to find patterns that could give insight into the cause of repeat expansions. Perfect and imperfect CAG/CAA repeats were studied by finding the occurrences of CAG/CAA repeat sequences in the human genome, extracting the information associated with each repeat sequence, and analysing that information through data visualization. The distribution of pure CAG and mixed CAG/CAA repeats in perfect and imperfect repeat sequences was analysed, as were the occurrences of CAA codons within the mixed CAG/CAA repeats, based on the percentage composition of CAA and the positions of CAA codons within the repeat sequence, to determine codon bias for different repeat lengths and numbers of imperfections. Flanking codons were analysed for perfect and imperfect repeats to find the most commonly occurring codons in the flanks, based on the length of the sequence and the number of imperfections in the repeats. The codons occurring as imperfections within each imperfect sequence were also analysed, based on their frequency of occurrence, their positions within the repeat sequence, and the presence of single base-pair mutations of CAG or CAA as imperfections in 1-imperfect and 2-imperfect repeat sequences.

Keywords: Computational analysis, Codon usage, Poly-Q, Homorepeats, Trinucleotide repeat expansion, Polyglutamine diseases
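
The repeat-finding step described in the abstract can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name find_polyq_repeats, the min_codons threshold of 4, and the toy sequence are all assumptions made for illustration. The sketch handles only perfect (uninterrupted) CAG/CAA runs in an in-frame coding sequence and records, for each run, its length, the percentage and positions of CAA codons, and the two flanking codons. Imperfect repeats are not handled.

```python
GLN_CODONS = ("CAG", "CAA")  # both codons encode glutamine


def find_polyq_repeats(cds, min_codons=4):
    """Return perfect CAG/CAA runs in an in-frame coding sequence.

    min_codons is a hypothetical threshold, not a value taken from the study.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    repeats = []
    i = 0
    while i < len(codons):
        if codons[i] not in GLN_CODONS:
            i += 1
            continue
        # Extend the run while codons are still CAG or CAA.
        j = i
        while j < len(codons) and codons[j] in GLN_CODONS:
            j += 1
        if j - i >= min_codons:
            run = codons[i:j]
            caa_positions = [k for k, c in enumerate(run) if c == "CAA"]
            if not caa_positions:
                kind = "pure CAG"
            elif len(caa_positions) == len(run):
                kind = "pure CAA"
            else:
                kind = "mixed CAG/CAA"
            repeats.append({
                "start_codon": i,                # 0-based codon index
                "length": len(run),              # repeat length in codons
                "kind": kind,
                "caa_percent": 100.0 * len(caa_positions) / len(run),
                "caa_positions": caa_positions,  # 0-based, within the run
                "left_flank": codons[i - 1] if i > 0 else None,
                "right_flank": codons[j] if j < len(codons) else None,
            })
        i = j
    return repeats


if __name__ == "__main__":
    # Toy sequence (made up): ATG, then CAG x3, CAA, CAG x2, then CCG, TAA.
    toy = "ATG" + "CAG" * 3 + "CAA" + "CAG" * 2 + "CCG" + "TAA"
    for repeat in find_polyq_repeats(toy):
        print(repeat)
```

Collecting these records across all coding sequences would give a table from which distributions of repeat length, CAA composition, and flanking codons could be plotted. Extending the scan to tolerate one or two non-glutamine codons inside a run would be needed to cover the 1-imperfect and 2-imperfect cases.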
