High-throughput DNA sequence analysis for the investigation of genetic diseases by the candidate gene approach

James Joseph Earley, Thomas Jefferson University


A protocol for high-throughput DNA sequence analysis using conventional equipment was developed and then used to investigate the genetic basis of inherited diseases by the candidate gene approach. The dideoxy sequencing reaction was optimized (1) to be performed in microplates with manual and automated multichannel pipettors and (2) to sequence 25 fmol of single- and double-stranded template DNA. This reduced the cost to about $1.00 per reaction and allowed read lengths of >650 nt while facilitating the automated analysis of sequencing gels. The protocol was then applied to three projects: (1) the analysis of the type III procollagen gene for mutations, (2) the analysis of 300 expressed sequence tags from a human aortic cDNA library, and (3) the sequence analysis of a 14 kb genomic fragment of the procollagen α2(I) gene.

Several lines of evidence suggested that mutations in the type III procollagen gene were a common cause of aneurysms. Five asymmetrically amplified PCR products, covering >3200 nt of the type III procollagen mRNA, were sequenced directly from 112 individuals diagnosed with aortic or intracranial aneurysms. Five nucleotide substitutions that altered the amino acid sequence in the triple-helical region of type III collagen were identified. Three were rare polymorphisms with substitutions at prolines in the Y-position of the Gly-X-Y triple-helical repeat. The other two, Gly 1021 to Glu and Gly 136 to Arg, were found in patients with Ehlers-Danlos syndrome type IV and with aortic aneurysms, respectively. Because substitutions for glycine distort the conformation of the triple helix, these two glycine substitutions were likely to have caused the aneurysms in the two patients.
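The glycine argument can be made concrete by classifying a residue according to its place in the Gly-X-Y repeat. Glycine is required at every third position because its side chain, a single hydrogen atom, is the only one small enough to fit at the crowded center of the triple helix. The sketch below assumes the conventional triple-helical numbering in which residue 1 is the first glycine, so glycines fall at positions 1, 4, 7, and so on; both reported substitutions, Gly 1021 and Gly 136, fit that pattern. The function names are hypothetical and are not taken from any tool used in the dissertation.

```python
"""Classify a triple-helical residue by its position in the Gly-X-Y repeat.

Assumption (not stated in the abstract): residues are numbered within the
triple-helical domain so that residue 1 is the first glycine, i.e. the
repeat reads Gly1-X2-Y3-Gly4-X5-Y6-...
"""


def repeat_position(residue_number: int) -> str:
    """Return 'Gly', 'X' or 'Y' for a 1-based triple-helical residue number."""
    if residue_number < 1:
        raise ValueError("residue numbers are 1-based")
    # Positions 1, 4, 7, ... -> Gly; 2, 5, 8, ... -> X; 3, 6, 9, ... -> Y
    return ("Gly", "X", "Y")[(residue_number - 1) % 3]


def replaces_obligate_glycine(residue_number: int, new_residue: str) -> bool:
    """True if a substitution removes the obligatory glycine of a triplet."""
    return repeat_position(residue_number) == "Gly" and new_residue != "Gly"


if __name__ == "__main__":
    # The two glycine substitutions reported in the abstract
    for position, new_residue in [(1021, "Glu"), (136, "Arg")]:
        print(position, repeat_position(position),
              replaces_obligate_glycine(position, new_residue))
```

By contrast, a proline-to-other substitution at a Y-position returns `False` from `replaces_obligate_glycine`, which mirrors why the three Y-position changes were interpreted as polymorphisms rather than as helix-disrupting mutations.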
The results confirm previous observations that mutations in the type III procollagen gene are a cause of aortic aneurysms in patients with Ehlers-Danlos syndrome, while indicating that mutations in the triple-helical region are rare in patients with aortic or intracranial aneurysms.

Nearly 300 expressed sequence tags (ESTs) from a newborn human aortic cDNA library were analyzed. A total of 61 transcripts encoding nuclear proteins were identified. Nearly one-half of the ESTs represented housekeeping genes, while the remainder were associated primarily with signal transduction pathways involved in growth and development. Twenty-seven ESTs had no significant match to known proteins or ESTs and may represent new candidate genes specific to the growth and development of the human aorta.

A 14 kb region of the COL1A2 gene was sequenced, completing the sequence of the 5′ portion of the gene through intron 21. A shotgun sequencing strategy was used, and coverage was obtained from 220 sequencing reactions, a number comparable to similar efforts using automated DNA sequencers. An analysis of the first 20 introns of the COL1A2 gene revealed an exceptionally high number and density of consensus binding sequences for transcription factors in the second intron.
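What coverage from 220 reactions over a 14 kb region amounts to can be estimated with the standard Lander-Waterman idealization: for N reads of length L over a target of length G, mean fold-coverage is c = NL/G, and the expected fraction of bases hit by no read is e^(-c). The abstract does not say how coverage was assessed, so this sketch is an assumption throughout: it optimistically takes every reaction to yield about 650 usable nucleotides and assumes fragments were randomly distributed over the target. Under those assumptions the 220 reactions give roughly 10-fold coverage. The 450 nt case is a hypothetical, less optimistic read length added for comparison.

```python
import math


def mean_coverage(n_reads: int, read_length_nt: float, target_length_nt: float) -> float:
    """Mean fold-coverage c = N * L / G (Lander-Waterman idealization)."""
    return n_reads * read_length_nt / target_length_nt


def expected_unsampled_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of target bases covered by no read: exp(-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Assumed values: 220 reactions over a 14 kb target
    target_nt = 14_000
    # 650 nt is taken from the reported read length; 450 nt is hypothetical
    for read_length in (650, 450):
        c = mean_coverage(220, read_length, target_nt)
        gap = expected_unsampled_fraction(c)
        print(f"{read_length} nt reads: {c:.1f}x coverage, "
              f"~{gap * target_nt:.2f} nt expected unsampled")
```

The Poisson model treats reads as independent and uniformly placed, so it understates real gaps: failed reactions, short reads, and non-random cloning can leave regions uncovered that additional directed reactions would have to close.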

Subject Area

Genetics|Molecular biology

Recommended Citation

Earley, James Joseph, "High-throughput DNA sequence analysis for the investigation of genetic diseases by the candidate gene approach" (1996). ProQuest ETD Collection - Thomas Jefferson University. AAI9709076.