Research Overview
Pneumocystis is a non-filamentous, yeast-like fungal organism that inhabits mammalian lungs. It causes a lethal pneumonia when the host immune system is compromised by human immunodeficiency virus (HIV) infection, malnutrition, chemotherapeutic agents, or other underlying diseases. Pneumocystis pneumonia (PCP) does not respond to standard antifungal drugs. The sulfa-based combination trimethoprim-sulfamethoxazole is the primary treatment. However, mutations in the dihydropteroate synthase gene, which encodes the target of the sulfa component of this regimen, have caused emerging drug resistance (Huang L, Crothers K, Atzori C, et al., 2004). Atovaquone, a second-line alternative, has become less effective because of emerging resistance mutations in the cytochrome b gene (Kazanjian P, Armstrong W, Hossler PA, et al., 2001). Alternatives to these limited treatment options are therefore increasingly important.
Experimental study of Pneumocystis has been limited by the lack of an in vitro culture system. Research has relied on animal models of infection as a source of organisms for biochemical testing, drug evaluation, and microscopic analysis of the life cycle. Modern molecular biology techniques, augmented by computational methods, have opened an exciting window onto this fungal parasite at the molecular and biochemical level.
My research contributes to this effort by sequencing, assembling, and analyzing the P. carinii transcriptome and genome. We are building a computational and biological platform for discovery.
cDNA Analysis of the P. carinii transcriptome
We aligned 7,531 P. carinii (Pc) cDNA transcripts against 4,272 non-telomeric Pc genome contigs (~6.2 million bp). Applying strict cDNA-to-genome alignment criteria, we identified 1,781 introns (86,694 bp) and 3,593 exons (705,555 bp). Of the 1,781 introns, 1,710 (~96%) have canonical 5’-GU...AG-3’ splice sites. Intron lengths were tightly distributed: over 75% (1,344 of 1,781) were 40 to 50 bp long, and the average length was 48.7 bp. The shortest intron was 36 bp. A 5-bp branch-point signal was identified 8 to 17 bp upstream of the 3’ splice site. The identified branch-site pattern has a dominant polypyrimidine tract between the 5’ splice site and the branch points of the introns.
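The intron statistics above can be recomputed directly from the aligned intron sequences. The sketch below is a minimal Python illustration, assuming introns are supplied as genomic DNA strings (so the canonical GU...AG signal appears as GT...AG); the function name `summarize_introns` and its `lo`/`hi` length bounds are hypothetical and not part of the original pipeline. As a consistency check on the reported figures, 86,694 bp / 1,781 introns ≈ 48.7 bp, which matches the stated average.

```python
def summarize_introns(introns, lo=40, hi=50):
    """Summarize intron sequences taken from cDNA-to-genome alignments.

    Assumption: sequences are genomic DNA, so a canonical GU...AG intron
    reads GT...AG. `lo`/`hi` bound the length window being reported.
    """
    if not introns:
        raise ValueError("no introns supplied")
    seqs = [s.upper() for s in introns]
    lengths = [len(s) for s in seqs]
    # Canonical splice sites: GT at the 5' end and AG at the 3' end.
    canonical = sum(1 for s in seqs if s.startswith("GT") and s.endswith("AG"))
    in_range = sum(1 for n in lengths if lo <= n <= hi)
    return {
        "count": len(seqs),
        "total_bp": sum(lengths),
        "canonical": canonical,
        "canonical_frac": canonical / len(seqs),
        "mean_length": sum(lengths) / len(lengths),
        "frac_in_range": in_range / len(seqs),
        "shortest": min(lengths),
    }
```

Applied to the full set of 1,781 introns, `canonical_frac` and `mean_length` would correspond to the ~96% and 48.7 bp figures reported above.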
We have annotated the P. carinii cDNA transcripts by sequence homology and populated many KEGG/KAAS biochemical pathway maps for this fungus.
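One way to tally homology-based KEGG assignments by pathway is sketched below. It assumes a two-column, tab-separated list of gene ID and KO identifier, with unassigned genes listed alone, similar to the KO list that KAAS returns. The KO-to-pathway table, the gene and pathway IDs in the usage example, and the function names are hypothetical placeholders; in practice the table would come from KEGG's own pathway link data.

```python
from collections import defaultdict


def genes_per_pathway(ko_lines, ko_to_pathways):
    """Map each pathway ID to the set of genes whose KO falls in it.

    ko_lines: 'gene_id<TAB>KO' lines (assumed format; genes without a KO
    may appear alone). ko_to_pathways: hypothetical KO -> pathway-ID table.
    """
    pathways = defaultdict(set)
    for line in ko_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip():
            continue  # gene received no KO assignment
        gene, ko = fields[0].strip(), fields[1].strip()
        for pathway in ko_to_pathways.get(ko, ()):
            pathways[pathway].add(gene)
    return dict(pathways)


def placed_genes(pathways):
    """Genes assigned to at least one pathway, each counted once."""
    return set().union(*pathways.values())


lines = ["gene1\tKO_A", "gene2", "gene3\tKO_B", "gene4\tKO_A"]
table = {"KO_A": ["path_1", "path_2"], "KO_B": ["path_1"]}
by_pathway = genes_per_pathway(lines, table)
print(sorted(placed_genes(by_pathway)))  # ['gene1', 'gene3', 'gene4']
```

Counting the union in `placed_genes`, rather than summing pathway sizes, avoids double-counting genes that appear in several pathways, which matters when reporting a figure such as the number of genes placed into pathways.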
Assembly of the P. carinii genome
Challenges specific to the Pc genome limited the effectiveness of the heuristic Arachne, Phrap, and CAP3 assembly programs. We therefore implemented an alternative, iterative assembly strategy that combined all three programs. This process produced a draft Pc genome of ~6.2 million bp contained in 4,272 contigs. Nearly half of the contigs (2,010 of 4,272) were merged into 878 directionally oriented supercontigs using sister-read names and cDNA alignments. Approximately 3,400 genes have been identified using BLASTX/BLASTN, Blast2GO, and the KEGG/KAAS annotation server. Homology-based annotation with the KEGG/KAAS server placed 820 genes into biochemical pathways.
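The supercontig step can be illustrated as a graph problem: contigs are nodes, and sister reads that fall in two different contigs are links between them. The sketch below is a simplified, hypothetical illustration, not the assembly pipeline described above. It assumes sister reads share a template name before the final '.', requires at least `min_links` independent links before joining two contigs, and groups contigs with union-find. It ignores read orientation and cDNA evidence, both of which are needed to produce directionally oriented supercontigs.

```python
from collections import Counter


def template_of(read_name):
    """Template (clone) name shared by sister reads.

    Assumed convention: sister reads differ only in the suffix after the
    final '.', e.g. 't1.b1' and 't1.g1' (hypothetical names).
    """
    return read_name.rsplit(".", 1)[0]


def group_contigs(read_to_contig, min_links=2):
    """Group contigs joined by at least `min_links` sister-read pairs."""
    # Collect the contigs hit by each template's reads.
    by_template = {}
    for read, contig in read_to_contig.items():
        by_template.setdefault(template_of(read), []).append(contig)

    # Count sister pairs whose two reads fall in different contigs.
    links = Counter()
    for contigs in by_template.values():
        if len(contigs) == 2 and contigs[0] != contigs[1]:
            links[tuple(sorted(contigs))] += 1

    # Union-find (with path halving) over sufficiently supported links.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for contig in set(read_to_contig.values()):
        groups.setdefault(find(contig), set()).add(contig)
    # Only multi-contig groups are candidate supercontigs.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Requiring more than one supporting pair is a common guard against chimeric clones or misplaced reads; with `min_links=1`, a single suspect read pair would be enough to merge two contigs.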
Ab Initio Gene Prediction Research
We expect that a Pneumocystis-specific gene prediction system, one that incorporates the organism-specific features and biases observed in P. carinii coding regions and splice sites, will improve gene prediction. Computational and statistical methods, evaluated with 10-fold cross-validation, are being used to build these observed organism biases into the P. carinii ab initio prediction system.
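As an example of how one organism-specific bias could be learned and checked with 10-fold cross-validation, the sketch below trains a position weight matrix (PWM) on 5’ splice-site (donor) sequences from nine folds and scores the held-out fold. The PWM model, the uniform background, the pseudocount, and all function names are assumptions made for illustration. The actual P. carinii prediction system may model coding regions and splice sites quite differently.

```python
import math
import random

BASES = "ACGT"


def k_folds(items, k=10, seed=0):
    """Shuffle items reproducibly and deal them into k roughly equal folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def train_pwm(sites, pseudocount=1.0):
    """Per-position log2-odds against a uniform background (0.25 per base).

    Assumes equal-length sites containing only A, C, G, T.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def score(pwm, site):
    """Sum of per-position log-odds for one site."""
    return sum(col[base] for col, base in zip(pwm, site))


def cross_validate(sites, k=10):
    """Mean held-out PWM score for each of k folds."""
    if len(sites) < k:
        raise ValueError("need at least k sites so no fold is empty")
    folds = k_folds(sites, k)
    fold_means = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        pwm = train_pwm(train)
        fold_means.append(sum(score(pwm, s) for s in test) / len(test))
    return fold_means
```

Because each site is scored only by a PWM that never saw it, consistently positive held-out scores suggest the learned donor bias generalizes rather than memorizing the training set. A fuller evaluation would also score decoy sequences and compare the two distributions.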