J48. Fernando Racimo, David Gokhman, Matteo Fumagalli, Amy Ko, Torben Hansen, Ida Moltke, Anders Albrechtsen, Liran Carmel, Emilia Huerta-Sánchez and Rasmus Nielsen (2017)
Archaic adaptive introgression in TBX15/WARS2
Molecular Biology and Evolution, 34:509-524. web pdf
A recent study conducted the first genome-wide scan for selection in Inuit from Greenland using SNP chip data. Here, we report that selection in the region with the second most extreme signal of positive selection in Greenlandic Inuit favored a deeply divergent haplotype that is closely related to the sequence in the Denisovan genome, and was likely introgressed from an archaic population. The region contains two genes, WARS2 and TBX15, and has previously been associated with body-fat distribution in humans. We show that the adaptively introgressed allele has been under selection in a much larger geographic region than just Greenland. Furthermore, it is associated with changes in expression of WARS2 and TBX15 in multiple tissues including the adrenal gland and subcutaneous adipose tissue.

J47. Topaz Halperin, Liran Carmel and Dror Hawlena (2017)
Movement correlates of lizards' dorsal pigmentation patterns
Functional Ecology 31:370-376. web pdf
Understanding the ecological function of an animal's pigmentation pattern is an intriguing research challenge. We used quantitative information on lizard foraging behavior to search for movement correlates of patterns across taxa. We hypothesized that noticeable longitudinal stripes that enhance escape by motion-dazzle are advantageous for mobile foragers that are highly detectable against the stationary background. Cryptic pigmentation patterns are beneficial for less-mobile foragers that rely on camouflage to reduce predation. Using an extensive literature survey and phylogenetically-controlled analyses, we found that striped lizards were substantially more mobile than lizards with cryptic patterns. The percent of time spent moving was the major behavioral index responsible for this difference. We provide empirical support for the hypothesized association between lizard dorsal pigmentation patterns and foraging behavior. Our simple yet comprehensive explanation may be relevant to many other taxa that present variation in body pigmentation patterns.

J46. Fouad Zahdeh and Liran Carmel (2016)
The role of nucleotide composition in premature termination codon recognition
BMC Bioinformatics 17:519. web pdf
Background. It is not fully understood how a termination codon is recognized as premature (PTC) by the nonsense-mediated decay (NMD) machinery. This is particularly true for transcripts lacking an exon junction complex (EJC) along their 3' untranslated region (3'UTR), and thus degrade through the EJC-independent NMD pathway.
Results. Here, we analyzed data of transcript stability change following NMD repression and identified over 200 EJC-independent NMD-targets. We examined many features characterizing these transcripts, and compared them to NMD-insensitive transcripts, as well as to a group of transcripts that are destabilized following NMD repression (destabilized transcripts).
Conclusions. We found that none of the known NMD-triggering features, such as the presence of upstream open reading frames, significantly characterizes EJC-independent NMD-targets. Instead, we saw that NMD-targets are strongly enriched with G nucleotides upstream of the termination codon, and even more so along their 3'UTR. We suggest that high G content around the termination codon impedes translation termination as a result of mRNA folding, thus triggering NMD. We also suggest that high G content in the 3'UTR helps to activate NMD by allowing for the accumulation of UPF1, or other NMD-promoting proteins, along the 3'UTR.
Keywords. Nonsense-mediated decay (NMD), EJC-independent NMD, NMD-triggering features, Stop codon GC content, Stop codon nucleotide composition, RNA secondary structure, Exon junction complex (EJC), Transcription termination

J45. Ruxandra Covacu, Hagit Philip, Merja Jaronen, Jorge Almeida, Jessica Kenison, Samuel Darko, Chun-Cheih Chao, Gur Yaari, Yoram Louzoun, Liran Carmel, Daniel C. Douek, Sol Efroni and Francisco J. Quintana (2016)
System-wide analysis of the T-cell response
Cell Reports 14:2733-2744. web pdf
The T cell receptor (TCR) controls the cellular adaptive immune response to antigens, but our understanding of TCR repertoire diversity and response to challenge is still incomplete. For example, TCR clones shared by different individuals with minimal alteration to germline gene sequences (public clones) are detectable in all vertebrates, but their significance is unknown. Although small in size, the zebrafish TCR repertoire is controlled by processes similar to those operating in mammals. Thus, we studied the zebrafish TCR repertoire and its response to stimulation with self and foreign antigens. We found that cross-reactive public TCRs dominate the T cell response, endowing a limited TCR repertoire with the ability to cope with diverse antigenic challenges. These features of vertebrate public TCRs might provide a mechanism for the rapid generation of protective T cell immunity, allowing a short temporal window for the development of more specific private T cell responses.

J44. Ranit Jaron, Nuphar Rosenfeld, Fouad Zahdeh, Shai Carmi, Liana Beni-Adani, Reeval Segel, Sharon Zeligson, Liran Carmel, Paul Renbaum and Ephrat Levy-Lahad (2016)
Expanding the phenotype of CRB2 mutations - A new ciliopathy syndrome?
Clinical Genetics 90:540-544. web
Recessive CRB2 mutations were recently reported to cause both steroid resistant nephrotic syndrome and prenatal onset ventriculomegaly with kidney disease. We report two Ashkenazi Jewish siblings clinically diagnosed with ciliopathy. Both presented with severe congenital hydrocephalus and mild urinary tract anomalies. One affected sibling also has lung hypoplasia and heart defects. Exome sequencing and further CRB2 analysis revealed that both siblings are compound heterozygotes for CRB2 mutations p.N800K and p.Gly1036Alafs*43, and heterozygous for a deleterious splice variant in the ciliopathy gene TTC21B. CRB2 is a polarity protein which plays a role in ciliogenesis and ciliary function. Biallelic CRB2 mutations in animal models result in phenotypes consistent with ciliopathy. This report expands the phenotype of CRB2 mutations to include lung hypoplasia and uretero-pelvic renal anomalies, and confirms cardiac malformation as a feature. We suggest that CRB2-associated disease is a new ciliopathy syndrome with possible digenic/triallelic inheritance, as observed in other ciliopathies. Clinically, CRB2 should be assessed when ciliopathy is suspected, especially in Ashkenazi Jews, where we found that p.N800K carrier frequency is 1/64. Patients harboring CRB2 mutations should be tested for the full range of ciliopathy manifestations.

J43. David Gokhman, Eran Meshorer and Liran Carmel (2016)
Epigenetics: it's getting old. Past meets future in paleoepigenetics
Trends in Ecology and Evolution 31:290-300. web pdf
Recent years have witnessed the rise of ancient DNA (aDNA) technology, allowing comparative genomics to be carried out at unprecedented time resolution. While it is relatively straightforward to use aDNA to identify recent genomic changes, it is much less clear how to utilize it to study changes in epigenetic regulation. Here we review recent works demonstrating that highly degraded aDNA still contains sufficient information to allow reconstruction of epigenetic signals, including DNA methylation and nucleosome positioning maps. We discuss challenges arising from the tissue specificity of epigenetics, and show how some of them might in fact turn into advantages. Finally, we introduce a method to infer methylation states in tissues that do not tend to be preserved over time.

J42. Michal Chorev, Lotem Guy and Liran Carmel (2016)
JuncDB: an exon-exon junction database
Nucleic Acids Research 44(D1) (Database issue):D101-D109. web pdf
Intron positions upon the mRNA transcript are sometimes remarkably conserved even across distantly related eukaryotic species. This has made the comparison of intron-exon architectures across orthologous transcripts a very useful tool for studying various evolutionary processes. Moreover, the wide range of functions associated with introns may confer biological meaning to evolutionary changes in gene architectures. Yet, there is currently no database that offers such comparative information. Here, we present JuncDB, an exon-exon junction database dedicated to the comparison of architectures between orthologous transcripts. It covers nearly 40,000 sets of orthologous transcripts spanning 88 eukaryotic species. JuncDB offers a user-friendly interface, access to detailed information, instructive graphical displays of the comparative data and easy ways to download data to a local computer. In addition, JuncDB allows the analysis to be carried out either on specific genes, or at a genome-wide level for any selected group of species.

J41. Liron Levin, Dan Bar-Yaacov, Amos Bouskila, Michal Chorev, Liran Carmel and Dan Mishmar (2015)
LEMONS - A tool for the identification of splice junctions in transcriptomes of organisms lacking reference genomes
PLoS ONE 10:e0143329. web pdf
RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, as DNA codes both exons and introns. With this in mind, we designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice junction recognition signals to produce high throughput splice-junction predictions in the absence of a reference genome. When tested on multiple annotated vertebrate mRNA data, LEMONS accurately identified 87% (average) of the splice-junctions. LEMONS was then applied to our updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. We experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected in 19 of 20 of the positions predicted by LEMONS. To the best of our knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.

J40. Ariella Weinberg-Shukron, Abdulsalam Abu-Libdeh, Fouad Zahdeh, Liran Carmel, Aviram Kogot-Levin, Lara Kamal, Moien Kanaan, Sharon Zeligson, Paul Renbaum, Ephrat Levy-Lahad and David Zangen (2015)
Combined mineralocorticoid and glucocorticoid deficiency is caused by a novel founder nicotinamide nucleotide transhydrogenase mutation that alters mitochondrial morphology and increases oxidative stress
Journal of Medical Genetics 52:636-641. web pdf
Background: Familial glucocorticoid deficiency (FGD) reflects specific failure of adrenocortical glucocorticoid production in response to adrenocorticotropic hormone (ACTH). Most cases are caused by mutations encoding ACTH-receptor components (MC2R, MRAP) or the general steroidogenesis protein (StAR). Recently, nicotinamide nucleotide transhydrogenase (NNT) mutations were found to cause FGD through a postulated mechanism resulting from decreased detoxification of reactive oxygen species (ROS) in adrenocortical cells.
Methods and Results: In a consanguineous Palestinian family with combined mineralocorticoid and glucocorticoid deficiency, whole-exome sequencing revealed a novel homozygous NNT_c.598 G>A, p.G200S, mutation. Another affected, unrelated Palestinian child was also homozygous for NNT_p.G200S. Haplotype analysis showed this mutation is ancestral; carrier frequency in ethnically matched controls is 1/200. Assessment of patient fibroblasts for ROS production, ATP content and mitochondrial morphology showed that biallelic NNT mutations result in increased levels of ROS, lower ATP content and morphological mitochondrial defects.
Conclusions: This report of a novel NNT mutation, p.G200S, expands the phenotype of NNT mutations to include mineralocorticoid deficiency. We provide the first patient-based evidence that NNT mutations can cause oxidative stress and both phenotypic and functional mitochondrial defects. These results directly demonstrate the importance of NNT to mitochondrial function in the setting of adrenocortical insufficiency.

J39. David Gokhman, Eitan Lavi, Kay Prüfer, Mario F. Fraga, José A. Riancho, Janet Kelso, Svante Pääbo, Eran Meshorer and Liran Carmel (2014)
Reconstructing the DNA methylation maps of the Neandertal and the Denisovan
Science 344:523-527. web pdf
Ancient DNA sequencing has recently provided high-coverage archaic human genomes. However, the evolution of epigenetic regulation along the human lineage remains largely unexplored. We reconstructed the full DNA methylation maps of the Neandertal and the Denisovan by harnessing the natural degradation processes of methylated and unmethylated cytosines. Comparing these ancient methylation maps to those of present-day humans, we identified ~2000 differentially methylated regions (DMRs). Particularly, we found substantial methylation changes in the HOXD cluster that may explain anatomical differences between archaic and present-day humans. Additionally, we found that DMRs are significantly more likely to be associated with diseases. This study provides insight into the epigenetic landscape of our closest evolutionary relatives, and opens a window to explore the epigenomes of extinct species.
Coverage to this paper can be found here.
Free links to the abstract and the full paper.

J38. Michal Chorev and Liran Carmel (2013)
Computational identification of functional introns: high positional conservation of introns that harbor RNA genes
Nucleic Acids Research 41:5604-5613. web pdf
An appreciable fraction of introns is thought to have some function, but there is no obvious way to predict which specific intron is likely to be functional. We hypothesize that functional introns experience a different selection regime than non-functional ones and will therefore show distinct evolutionary histories. In particular, we expect functional introns to be more resistant to loss, and that this would be reflected in high conservation of their position with respect to the coding sequence. To test this hypothesis, we focused on introns whose function comes about from microRNAs and snoRNAs that are embedded within their sequence. We built a data set of orthologous genes across 28 eukaryotic species, reconstructed the evolutionary histories of their introns and compared functional introns with the rest of the introns. We found that, indeed, the position of microRNA- and snoRNA-bearing introns is significantly more conserved. In addition, we found that both families of RNA genes settled within introns early during metazoan evolution. We identified several easily computable intronic properties that can be used to detect functional introns in general, thereby suggesting a new strategy to pinpoint non-coding cellular functions.
Coverage to this paper can be found here.

J37. Liran Carmel, Eugene V. Koonin and Stella Dracheva (2012)
Dependencies among Editing Sites in Serotonin 2C Receptor mRNA
PLoS Computational Biology 8:e1002663. web pdf
The serotonin 2C receptor (5-HT2CR) - a key regulator of diverse neurological processes - exhibits functional variability derived from editing of its pre-mRNA by site-specific adenosine deamination (A-to-I pre-mRNA editing) in five distinct sites. Here we describe a statistical technique that was developed for analysis of the dependencies among the editing states of the five sites. The statistical significance of the observed correlations was estimated by comparing editing patterns in multiple individuals. For both human and rat 5-HT2CR, the editing states of the physically proximal sites A and B were found to be strongly dependent. In contrast, the editing states of sites C and D, which are also physically close, seem not to be directly dependent but instead are linked through the dependencies on sites A and B, respectively. We observed pronounced differences between the editing patterns in humans and rats: in humans site A is the key determinant of the editing state of the other sites, whereas in rats this role belongs to site B. The structure of the dependencies among the editing sites is notably simpler in rats than it is in humans, implying more complex regulation of 5-HT2CR editing and, by inference, function in the human brain. Thus, exhaustive statistical analysis of the 5-HT2CR editing patterns indicates that the editing state of sites A and B is the primary determinant of the editing states of the other three sites, and hence the overall editing pattern. Taken together, these findings allow us to propose a mechanistic model of concerted action of ADAR1 and ADAR2 in 5-HT2CR editing. The statistical approach developed here can be applied to other cases of interdependencies among modification sites in RNA and proteins.

J36. Inbal Avraham-Davidi, Y. Ely, V.N. Pham, D. Castranova, M. Grunspan, G. Malkinson, L. Gibbs-Bar, O. Mayseless, G. Allmog, B. Lo, C.M. Warren, T.T. Chen, J. Ungos, K. Kidd, K. Shaw, I. Rogachev, W. Wan, P.M. Murphy, S.A. Farber, Liran Carmel, G.S. Shelness, M.L. Iruela-Arispe, Brant M. Weinstein and Karina Yaniv (2012)
ApoB-containing lipoproteins regulate angiogenesis by modulating expression of VEGF receptor 1
Nature Medicine 18:967-973. web pdf
Despite the clear major contribution of hyperlipidemia to the prevalence of cardiovascular disease in the developed world, the direct effects of lipoproteins on endothelial cells have remained obscure and are under debate. Here we report a previously uncharacterized mechanism of vessel growth modulation by lipoprotein availability. Using a genetic screen for vascular defects in zebrafish, we initially identified a mutation, stalactite (stl), in the gene encoding microsomal triglyceride transfer protein (mtp), which is involved in the biosynthesis of apolipoprotein B (ApoB)-containing lipoproteins. By manipulating lipoprotein concentrations in zebrafish, we found that ApoB negatively regulates angiogenesis and that it is the ApoB protein particle, rather than lipid moieties within ApoB-containing lipoproteins, that is primarily responsible for this effect. Mechanistically, we identified downregulation of vascular endothelial growth factor receptor 1 (VEGFR1), which acts as a decoy receptor for VEGF, as a key mediator of the endothelial response to lipoproteins, and we observed VEGFR1 downregulation in hyperlipidemic mice. These findings may open new avenues for the treatment of lipoprotein-related vascular disorders.

J35. Igor B. Rogozin, Liran Carmel, Miklos Csuros and Eugene V. Koonin (2012)
Origin and evolution of spliceosomal introns
Biology Direct 7:11. web pdf
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded 'introns first' held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes.

J34. Michal Chorev and Liran Carmel (2012)
The function of introns
Frontiers in Genetics 3:55. web pdf
The intron-exon architecture of many eukaryotic genes raises the intriguing question of whether this unique organization serves any function, or is simply a result of the spread of functionless introns in eukaryotic genomes. In this review, we show that introns in contemporary species fulfill a broad spectrum of functions, and are involved in virtually every step of mRNA processing. We propose that this great diversity of intronic functions supports the notion that introns were indeed selfish elements in early eukaryotes, but then independently gained numerous functions in different eukaryotic lineages. We suggest a novel criterion of evolutionary conservation, dubbed intron positional conservation, which can identify functional introns.

J33. Noa E. Cohen, Roy Shen and Liran Carmel (2012)
The role of reverse-transcriptase in intron gain and loss mechanisms
Molecular Biology and Evolution 29:179-186. web pdf
Intron density is highly variable across eukaryotic species. It seems that different lineages have experienced considerably different levels of intron gain and loss events, but the reasons for this are not well known. A large number of mechanisms for intron loss and gain have been suggested, and most of them have at least some level of indirect support. We therefore reasoned that the variability in intron density can be a reflection of the fact that different mechanisms are active in different lineages. Quite a number of these putative mechanisms, both for intron loss and for intron gain, postulate that the enzyme reverse-transcriptase has a key role in the process. In this paper we lay out three predictions whose confirmation or falsification would indicate whether reverse-transcriptase is involved in intron gain and loss processes. Testing these predictions requires data on the intron gain and loss rates of individual genes along different branches of the eukaryotic phylogenetic tree. So far, such rates could not be computed, and hence these predictions could not be rigorously evaluated. Here, we use a maximum likelihood algorithm that we have devised in the past, EREM, which allows the estimation of such rates. Using this algorithm, we computed the intron loss and gain rates of more than 300 genes, in each branch of the phylogenetic tree of 19 eukaryotic species. Based on that, we found only little support for reverse-transcriptase activity in intron gain. In contrast, we suggest that reverse-transcriptase-mediated intron loss is a mechanism that is very efficient in removing introns, and thus its levels of activity may be a major determinant of intron number. Moreover, we found that intron gain and loss rates are negatively correlated in intron-poor species, but are positively correlated for intron-rich species. One explanation for this is that intron gain and loss mechanisms in intron-rich species (like metazoans) share a common mechanistic component, albeit not a reverse-transcriptase.

J32. David Zangen, Yotam Kaufman, Sharon Zeligson, Shira Perlberg, Hila Fridman, Moein Kanaan, Maha Abdulhadi-Atwan, Abdulsalam Abu Libdeh, Ayal Gussow, Irit Kisslov, Liran Carmel, Paul Renbaum and Ephrat Levy-Lahad (2011)
XX Ovarian Dysgenesis Is Caused by a PSMC3IP/HOP2 Mutation that Abolishes Coactivation of Estrogen-Driven Transcription
The American Journal of Human Genetics 89:572-579. web pdf
XX female gonadal dysgenesis (XX-GD) is a rare, genetically heterogeneous disorder characterized by lack of spontaneous pubertal development, primary amenorrhea, uterine hypoplasia, and hypergonadotropic hypogonadism as a result of streak gonads. Most cases are unexplained but thought to be autosomal recessive. We elucidated the genetic basis of XX-GD in a highly consanguineous Palestinian family by using homozygosity mapping and candidate-gene and whole-exome sequencing. Affected females were homozygous for a 3 bp deletion (NM_016556.2, c.600_602del) in the PSMC3IP gene, leading to deletion of a glutamic acid residue (p.Glu201del) in the highly conserved C-terminal acidic domain. Proteasome 26S subunit, ATPase, 3-Interacting Protein (PSMC3IP)/Tat Binding Protein Interacting Protein (TBPIP) is a nuclear, tissue-specific protein with multiple functions. It is critical for meiotic recombination as indicated by the known role of its yeast ortholog, Hop2. Through the C terminus (not present in yeast), PSMC3IP also coactivates ligand-driven transcription mediated by estrogen, androgen, glucocorticoid, progesterone, and thyroid nuclear receptors. In cell lines, the p.Glu201del mutation abolished PSMC3IP activation of estrogen-driven transcription. Impaired estrogenic signaling can lead to ovarian dysgenesis both by affecting the size of the follicular pool created during fetal development and by failing to counteract follicular atresia during puberty. PSMC3IP joins previous genes known to be mutated in XX-GD, the FSH receptor, and BMP15, highlighting the importance of hormonal signaling in ovarian development and maintenance and suggesting a common pathway perturbed in isolated XX-GD. By analogy to other XX-GD genes, PSMC3IP is also a candidate gene for premature ovarian failure, and its role in folliculogenesis should be further investigated.

J31. John K. Colbourne, ... , Liran Carmel, ... and Jeffrey L. Boore (2011)
The ecoresponsive genome of Daphnia pulex
Science 331:555-561. web pdf
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.

J30. Liran Carmel, Yuri I. Wolf, Igor B. Rogozin and Eugene V. Koonin (2010)
EREM: Parameter estimation and ancestral reconstruction by expectation-maximization algorithm for a probabilistic model of genomic binary characters evolution
Advances in Bioinformatics 2010:Article ID 167408. web pdf
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
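As a rough illustration of what a probabilistic model of a binary character on a tree involves, the sketch below uses a generic two-state continuous-time Markov chain with a constant gain rate and loss rate. This is an assumption made for illustration only and is not EREM's actual parameterization or code; the function names `transition_matrix` and `cherry_likelihood` are hypothetical.

```python
import math


def transition_matrix(gain_rate, loss_rate, branch_length):
    """Return the 2x2 matrix P[start][end] for states 0 (absent) and 1 (present).

    Assumes a generic two-state Markov chain with constant rates; this is an
    illustrative stand-in, not the model implemented in EREM.
    """
    total = gain_rate + loss_rate
    if total == 0.0:
        # No change is possible when both rates are zero.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * branch_length)
    p_gain = gain_rate / total * (1.0 - decay)   # 0 -> 1 along the branch
    p_loss = loss_rate / total * (1.0 - decay)   # 1 -> 0 along the branch
    return [[1.0 - p_gain, p_gain], [p_loss, 1.0 - p_loss]]


def cherry_likelihood(leaf_a, leaf_b, gain_rate, loss_rate,
                      length_a, length_b, root_prior_present):
    """Likelihood of two observed leaf states under an unobserved root state."""
    prior = [1.0 - root_prior_present, root_prior_present]
    p_a = transition_matrix(gain_rate, loss_rate, length_a)
    p_b = transition_matrix(gain_rate, loss_rate, length_b)
    # Sum over the two possible root states (absent, present).
    return sum(prior[s] * p_a[s][leaf_a] * p_b[s][leaf_b] for s in (0, 1))


if __name__ == "__main__":
    # Character present in one species and absent in its sister species.
    print(cherry_likelihood(1, 0, gain_rate=0.2, loss_rate=0.8,
                            length_a=0.5, length_b=0.3,
                            root_prior_present=0.4))
```

Summing over the hidden ancestral state is the step that parsimony skips; a maximum likelihood method would then search for the rates that maximize this quantity over many characters.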

J29. Liran Carmel and Eugene V. Koonin (2009)
A universal nonmonotonic relationship between gene compactness and expression level in multicellular eukaryotes
Genome Biology and Evolution 2009:382-390. web pdf
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genomic design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing, and for increased fidelity of transcription, splicing and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least in animals, coding and noncoding parts of genes show similar architectonic trends.
Keywords: eukaryotic gene structure, eukaryotic gene architecture, selection on gene compactness, genomic design, intron functionality, intron density.

J28. Nandor Nagy, Olive Mwizerwa, Karina Yaniv, Liran Carmel, Rafael Pieretti-Vanmarcke, Brant M. Weinstein and Allan M. Goldstein (2009)
Endothelial cells promote migration and proliferation of enteric neural crest cells via β1 integrin signaling
Developmental Biology 330:263-272. web pdf
Enteric neural crest-derived cells (ENCCs) migrate along the intestine to form a highly organized network of ganglia that comprises the enteric nervous system (ENS). The signals driving the migration and patterning of these cells are largely unknown. Examining the spatiotemporal development of the intestinal neurovasculature in avian embryos, we find endothelial cells (ECs) present in the gut prior to the arrival of migrating ENCCs. These ECs are patterned in concentric rings that are predictive of the positioning of later arriving crest-derived cells, leading us to hypothesize that blood vessels may serve as a substrate to guide ENCC migration. Immunohistochemistry at multiple stages during ENS development reveals that ENCCs are positioned adjacent to vessels as they colonize the gut. A similar close anatomic relationship between vessels and enteric neurons was observed in zebrafish larvae. When EC development is inhibited in cultured avian intestine, ENCC migration is arrested and distal aganglionosis results, suggesting that ENCCs require the presence of vessels to colonize the gut. Neural tube and avian midgut were explanted onto a variety of substrates, including components of the extracellular matrix and various cell types, such as fibroblasts, smooth muscle cells, and endothelial cells. We find that crest-derived cells from both the neural tube and the midgut migrate avidly onto cultured endothelial cells. This EC-induced migration is inhibited by the presence of CSAT antibody, which blocks binding to β1 integrins expressed on the surface of crest-derived cells. These results demonstrate that ECs provide a substrate for the migration of ENCCs via an interaction between β1 integrins on the ENCC surface and extracellular matrix proteins expressed by the intestinal vasculature. These interactions may play an important role in guiding migration and patterning in the developing ENS.
Keywords: Enteric nervous system, Endothelial cells, Blood vessels, Hirschsprung's disease, Integrins, Avian, Zebrafish.

J27. Rea Ravin, Dan J. Hoeppner, D. M. Munno, Liran Carmel, J. Sullivan, D. L. Levitt, J. L. Miller, C. Athaide, D. M. Panchision and Ron D. McKay (2008)
Potency and fate specification in CNS stem cell populations in vitro
Cell Stem Cell 3:670-680. web pdf
To realize the promise of stem cell biology, it is important to identify the precise time in the history of the cell when developmental potential is restricted. To achieve this goal, we developed a real-time imaging system that captures the transitions in fate, generating neurons, astrocytes, and oligodendrocytes from single CNS stem cells in vitro. In the presence of bFGF, tripotent cells normally produce specified progenitors through a bipotent intermediate cell type. Surprisingly, the tripotent state is reset at each passage. The cytokine CNTF is thought to instruct multipotent cells to an astrocytic fate. We demonstrate that CNTF both directs astrogliogenesis from tripotent cells, bypassing two of the three normal bipotent intermediates, and later promotes the expansion of specified astrocytic progenitors. These results show how discrete cell types emerge from a multipotent cell and provide a strong basis for future studies to determine the molecular basis of fate specification.

J26. Sol Efroni, Liran Carmel, Carl G. Schaefer and Ken H. Buetow (2008)
Superposition of transcriptional behaviors determines gene state
PLoS One 3:e2901. web pdf
We introduce a novel technique to determine the expression state of a gene from quantitative information measuring its expression. Adopting a productive abstraction from current thinking in molecular biology, we consider two expression states for a gene - Up or Down. We determine this state by using a statistical model that assumes the data behaves as a combination of two biological distributions. Given a cohort of hybridizations, our algorithm predicts, for the single reading, the probability of each gene's being in an Up or a Down state in each hybridization. Using a series of publicly available gene expression data sets, we demonstrate that our algorithm outperforms the prevalent algorithm. We also show that our algorithm can be used in conjunction with expression adjustment techniques to produce a more biologically sound gene-state call. The technique we present here enables a routine update, where the continuously evolving expression level adjustments feed into gene-state calculations. The technique can be applied in almost any multi-sample gene expression experiment, and holds equal promise for protein abundance experiments.
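As a rough illustration of the two-state idea described above, the sketch below fits a two-component mixture to one gene's readings across hybridizations and reports the posterior probability of the Up state for a single reading. The abstract does not say what form the two distributions take; the Gaussian components, the EM fitting loop, and the names fit_two_state_mixture and prob_up are assumptions made here for illustration, not the published algorithm.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_up(x, model):
    """Posterior probability that reading x comes from the Up component."""
    w_up, (mu_down, sd_down), (mu_up, sd_up) = model
    p_down = (1.0 - w_up) * normal_pdf(x, mu_down, sd_down)
    p_up = w_up * normal_pdf(x, mu_up, sd_up)
    total = p_down + p_up
    # If both densities underflow, fall back to an uninformative 0.5.
    return p_up / total if total > 0.0 else 0.5


def fit_two_state_mixture(values, n_iter=100, min_sd=1e-3):
    """Fit a two-component Gaussian mixture (Down, Up) by EM.

    Assumption: Gaussian components; the source only says "two distributions".
    Needs at least two readings.
    Returns (weight_up, (mean_down, sd_down), (mean_up, sd_up)).
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise Down/Up from the lower and upper halves of the data.
    spread = max((xs[-1] - xs[0]) / 4.0, min_sd)
    model = (0.5,
             (sum(xs[:half]) / half, spread),
             (sum(xs[half:]) / (len(xs) - half), spread))
    for _ in range(n_iter):
        # E-step: responsibility of the Up component for each reading.
        resp = [prob_up(x, model) for x in values]
        n_up = sum(resp)
        n_down = len(values) - n_up
        if n_up < 1e-9 or n_down < 1e-9:
            break  # one component vanished; keep the last valid model
        # M-step: weighted means and standard deviations.
        mu_up = sum(r * x for r, x in zip(resp, values)) / n_up
        mu_down = sum((1 - r) * x for r, x in zip(resp, values)) / n_down
        var_up = sum(r * (x - mu_up) ** 2 for r, x in zip(resp, values)) / n_up
        var_down = sum((1 - r) * (x - mu_down) ** 2
                       for r, x in zip(resp, values)) / n_down
        model = (n_up / len(values),
                 (mu_down, max(math.sqrt(var_down), min_sd)),
                 (mu_up, max(math.sqrt(var_up), min_sd)))
    return model
```

Calling fit_two_state_mixture on the readings of one gene and then prob_up on a single reading gives a probabilistic Up/Down call rather than a hard threshold.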

J25. Igor B. Rogozin, Karen Thomson, Miklos Csuros, Liran Carmel and Eugene V. Koonin (2008)
Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series
Biology Direct 3:7. web pdf
Background: Rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are becoming an increasingly important class of markers in genome-wide phylogenetic studies. Recently, we proposed a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions) that were inferred using genome-wide identification of amino acid replacements that were: i) located in unambiguously aligned regions of orthologous genes, ii) shared by two or more taxa in positions that contain a different, conserved amino acid in a much broader range of taxa, and iii) require two or three nucleotide substitutions. When applied to animal phylogeny, the RGC_CAM approach supported the coelomate clade that unites deuterostomes with arthropods as opposed to the ecdysozoan (molting animals) clade. However, a non-negligible level of homoplasy was detected.
Results: We provide a direct estimate of the level of homoplasy caused by parallel changes and reversals among the RGC_CAMs using 462 alignments of orthologous genes from 19 eukaryotic species. It is shown that the impact of parallel changes and reversals on the results of phylogenetic inference using RGC_CAMs cannot explain the observed support for the Coelomata clade. In contrast, the evidence in support of the Ecdysozoa clade, in large part, can be attributed to parallel changes. It is demonstrated that parallel changes are significantly more common in internal branches of different subtrees that are separated from the respective common ancestor by relatively short times than in terminal branches separated by longer time intervals. A similar but much weaker trend was detected for reversals. The observed evolutionary trend of parallel changes is explained in terms of the covarion model of molecular evolution. As the overlap between the covarion sets in orthologous genes from different lineages decreases with time after divergence, the likelihood of parallel changes decreases as well.
Conclusions: The level of homoplasy observed here appears to be low enough to justify the utility of RGC_CAMs and other types of RGCs for resolution of hard problems in phylogeny. Parallel changes, one of the major classes of events leading to homoplasy, occur much more often in relatively recently diverged lineages than in those separated from their last common ancestor by longer time intervals. This pattern seems to provide the molecular-evolutionary underpinning of Vavilov's law of homologous series and is readily interpreted within the framework of the covarion model of molecular evolution.
Reviewers: This article was reviewed by Alex Kondrashov, Nicolas Galtier, and Maximilian Telford and Robert Lanfear (nominated by Laurence Hurst).

J24. Malay K. Basu, Liran Carmel, Igor B. Rogozin and Eugene V. Koonin (2008)
Evolution of protein domain promiscuity in eukaryotes
Genome Research 18:449-461. web pdf
Numerous eukaryotic proteins contain multiple domains. Certain domains show a tendency to occur in diverse domain architectures and can be considered "promiscuous". These promiscuous domains are, typically, involved in protein-protein interactions and play crucial roles in interaction networks, particularly, those that contribute to signal transduction. A systematic comparative-genomic analysis of promiscuous domains in eukaryotes is described. Two quantitative measures of domain promiscuity are introduced and applied to the analysis of 28 genomes of diverse eukaryotes. Altogether, 215 domains are identified as strongly promiscuous. The fraction of promiscuous domains in animals is shown to be significantly greater than that in fungi or plants. Evolutionary reconstructions indicate that domain promiscuity is a volatile, relatively fast-changing feature of eukaryotic proteins, with few domains remaining promiscuous throughout the evolution of eukaryotes. Some domains appear to have attained promiscuity independently in different lineages, e.g., animals and plants. It is proposed that promiscuous domains persist within a relatively small pool of evolutionarily stable domain combinations from which numerous rare architectures emerge during evolution. Domain promiscuity positively correlates with the number of experimentally detected domain interactions and with the strength of purifying selection affecting a domain. Thus, evolution of promiscuous domains seems to be constrained by the diversity of their interaction partners. The set of promiscuous domains is enriched for domains mediating protein-protein interactions that are involved in various forms of signal transduction, especially, in the ubiquitin system and in the chromatin. Thus, a limited repertoire of promiscuous domains makes a major contribution to the diversity and evolvability of eukaryotic proteomes and signaling networks.

J23. Rafi Haddad, Liran Carmel, Noam Sobel and David Harel (2008)
Predicting the receptive range of olfactory receptors
PLoS Computational Biology 4:e18. web pdf
Although the family of genes encoding olfactory receptors was identified more than 15 years ago, the difficulty of functionally expressing these receptors in a heterologous system has, with only some exceptions, rendered the receptive range of given olfactory receptors largely unknown. Furthermore, even when successfully expressed, the task of probing such a receptor with thousands of odors/ligands remains daunting. Here we provide proof of concept for a solution to this problem. Using computational methods we tune an electronic nose to the receptive range of an olfactory receptor. We then use this electronic nose to predict the receptors' response to other odorants. Our method can be used to identify the receptive range of olfactory receptors, and can also be applied to other questions involving receptor-ligand interactions in non-olfactory settings.

J22. Igor B. Rogozin, Yuri I. Wolf, Liran Carmel and Eugene V. Koonin (2007)
Analysis of rare amino acid replacements supports the Coelomata clade
Molecular Biology and Evolution 24:2594-2597. web pdf
The recent analysis of a novel class of rare genomic changes, RGC_CAMs (after Conserved Amino acids-Multiple substitutions), supported the Coelomata clade of animals as opposed to the Ecdysozoa clade (Rogozin et al. 2007). A subsequent re-analysis, with the sequences from the sea anemone Nematostella vectensis included in the set of outgroup species, suggested that this result was an artefact caused by reverse amino acid replacements and claimed support for Ecdysozoa (Irimia et al. 2007). We show that the internal branch connecting the sea anemone to the bilaterian animals is extremely short, resulting in weak statistical support for the Coelomata clade. Direct estimation of the level of homoplasy, combined with taxon sampling with different sets of outgroup species, reinforces the support for Coelomata whereas the effect of reversals is shown to be relatively minor.
Keywords: Phylogenetic analysis, cladistics, rare genomic changes, coelomata, ecdysozoa.

J21. Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin (2007)
Patterns of intron gain and conservation in eukaryotic genes
BMC Evolutionary Biology 7:192. web pdf
Background: The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
Results: We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
Conclusions: We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.

J20. Liran Carmel and David Harel (2007)
Mix-to-mimic odor synthesis for electronic noses
Sensors and Actuators B: Chemical 125:635-643. web pdf
Arrays of chemical sensors, known as electronic noses, yield a unique pattern for a given mixture of odors. Recently, there has been increasing interest in mixing odors so as to generate a desired response in the electronic nose. So far, this intriguing problem has been tackled only experimentally with the aid of specific apparatus. Here, we present an algorithmic solution to the problem. We demonstrate the algorithm on data that includes mixtures of up to five ingredients.
Keywords: odor communication, sniffer, whiffer, within-sniffer mix-to-mimic algorithm, electronic nose.

J19. Alissa M. Resch, Liran Carmel, Leonardo Mariño-Ramírez, Aleksey Y. Ogurtsov, Svetlana A. Shabalina, Igor B. Rogozin and Eugene V. Koonin (2007)
Widespread positive selection in synonymous sites of mammalian genes
Molecular Biology and Evolution 24:1821-1831. web pdf
Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is not nearly as well understood, with the extent and types of selection remaining, largely, unclear. A statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes was developed. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in 28% of the 1562 analyzed orthologous genes from mouse and rat, and positive selection in 12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at non-synonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution in non-synonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed significant anticorrelation between Ks and expression level and breadth indicating that highly expressed genes evolve slowly. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, a significantly lower mRNA stability is predicted compared to the genes with negative selection. 
Thus, mRNA destabilization could be an important factor driving positive selection in non-synonymous sites, probably, through regulation of expression at the level of mRNA degradation and, possibly, also translation rate. So, unexpectedly, we found that positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences. Positive selection at synonymous sites migh.
Keywords: synonymous sites, non-synonymous sites, positive selection, purifying selection, introns.

J18. Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin (2007)
Evolutionarily conserved genes preferentially accumulate introns
Genome Research 17:1045-1050. web pdf
Introns that interrupt eukaryotic protein-coding sequences are generally thought to be nonfunctional. However, for reasons still poorly understood, positions of many introns are highly conserved in evolution. Previous reconstructions of intron gain and loss events during eukaryotic evolution used a variety of simplified evolutionary models that yielded contradicting conclusions and are not suited to reveal some of the key underlying processes. We combine a comprehensive probabilistic model and an extended data set, including 391 conserved genes from 19 eukaryotes, to uncover previously unnoticed aspects of intron evolution - in particular, to assign intron gain and loss rates to individual genes. The rates of intron gain and loss in a gene show moderate positive correlation. A gene's intron gain rate shows a highly significant negative correlation with the coding-sequence evolution rate; intron loss rate also significantly, but positively, correlates with the sequence evolution rate. Correlations of the opposite signs, albeit less significant ones, are observed between intron gain and loss rates and gene expression level. It is proposed that intron evolution includes a neutral component, which is manifest in the positive correlation between the gain and loss rates and a selection-driven component as reflected in the links between intron gain and loss and sequence evolution. The increased intron gain and decreased intron loss in evolutionarily conserved genes indicate that intron insertion often might be adaptive, whereas some of the intron losses might be deleterious. This apparent functional importance of introns is likely to be due, at least in part, to their multiple effects on gene expression.

J17. Liran Carmel, Yuri I. Wolf, Igor B. Rogozin and Eugene V. Koonin (2007)
Three distinct modes of intron dynamics in the evolution of eukaryotes
Genome Research 17:1034-1044. web pdf
Several contrasting scenarios have been proposed for the origin and evolution of spliceosomal introns, a hallmark of eukaryotic genes. A comprehensive probabilistic model to obtain a definitive reconstruction of intron evolution was developed and applied to 391 sets of conserved genes from 19 eukaryotic species. It is inferred that a relatively high intron density was reached early, i.e., the last common ancestor of eukaryotes contained >2.15 introns/kilobase, and the last common ancestor of multicellular life forms harbored 3.4 introns/kilobase, a greater intron density than in most of the extant fungi and in some animals. The rates of intron gain and intron loss appear to have been dropping during the last 1.3 billion years, with the decline in the gain rate being much steeper. Eukaryotic lineages exhibit three distinct modes of evolution of the intron-exon structure. The primary, balanced mode, apparently, operates in all lineages. In this mode, intron gain and loss are strongly and positively correlated, in contrast to previous reports on inverse correlation between these processes. The second mode involves an elevated rate of intron loss and is prevalent in several lineages, such as fungi and insects. The third mode, characterized by elevated rate of intron gain, is seen only in deep branches of the tree, indicating that bursts of intron invasion occurred at key points in eukaryotic evolution, such as the origin of animals. Intron dynamics could depend on multiple mechanisms, and in the balanced mode, gain and loss of introns might share common mechanistic features.

J16. Igor B. Rogozin, Yuri I. Wolf, Liran Carmel and Eugene V. Koonin (2007)
Ecdysozoan clade rejected by genome-wide analysis of rare amino acid replacements
Molecular Biology and Evolution 24:1080-1090. web pdf
As the number of sequenced genomes from diverse walks of life rapidly increases, phylogenetic analysis is entering a new era: reconstruction of the evolutionary history of organisms on the basis of full-scale comparison of their genomes. In addition to brute force, genome-wide analysis of alignments, rare genomic characters (RGCs) that are thought to comprise derived shared characters of individual clades are increasingly used in genome-wide phylogenetic studies. We propose a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions), which are inferred using a genome-scale analysis of protein and underlying nucleotide sequence alignments. The RGC_CAM approach utilizes amino acid residues conserved in major eukaryotic lineages, with the exception of a few species comprising a putative clade, and selects for phylogenetic inference only those amino acid replacements that require 2 or 3 nucleotide substitutions, in order to reduce homoplasy. The RGC_CAM analysis was combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses. The RGC_CAM method is shown to be robust to branch length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach strongly supports the coelomate clade that unites chordates with arthropods as opposed to the ecdysozoan (molting animals) clade. This conclusion runs against the view of animal evolution that is currently prevailing in the evo-devo community. The final solution to the coelomate-ecdysozoa controversy will require a much larger set of complete genome sequences representing diverse animal taxa. It is expected that RGC_CAM and other RGC-based methods will be crucial for these future, definitive phylogenetic studies.
Keywords: Phylogenetic analysis, cladistics, rare genomic changes, coelomata, ecdysozoa, microsporidia.
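The filter on replacements that require 2 or 3 nucleotide substitutions can be made concrete with a small sketch. It assumes the standard genetic code and takes the minimum number of base changes over all codon pairs of the two amino acids; the function names are hypothetical, and the published pipeline works on the underlying nucleotide alignments, which this sketch does not attempt.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code, '*' for stop) to its codons.
CODONS_BY_AA = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_BY_AA.setdefault(aa, []).append("".join(codon))


def min_nucleotide_substitutions(aa_from, aa_to):
    """Smallest number of base changes turning a codon of aa_from into one of aa_to."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in CODONS_BY_AA[aa_from]
        for c2 in CODONS_BY_AA[aa_to]
    )


def is_multiple_substitution(aa_from, aa_to):
    """True if the replacement needs at least two nucleotide substitutions."""
    return min_nucleotide_substitutions(aa_from, aa_to) >= 2
```

For example, Phe to Leu can be reached with a single base change and would be excluded, whereas Met to Trp needs two changes and Met to Tyr needs three, so both would pass the filter.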

J15. Rafi Haddad, Liran Carmel and David Harel (2007)
A feature extraction algorithm for multi-peak signals in electronic noses
Sensors and Actuators B: Chemical 120:467-472. web pdf
The Lorentzian model is a powerful feature extraction technique for electronic noses. In a previous work, it was applied to single-peak transient signals and was shown to achieve lower classification error rate than other feature extraction techniques. Here, we generalize the Lorentzian model by showing how to apply it to transient signals that are comprised of more than a single peak. The model is based on a fast and robust fitting of the measured signals to a physically meaningful analytic curve. We show that this model fits equally well to sensors of different technologies and embeddings, suggesting its applicability to a diverse repertoire of sensors and analytic devices.
Keywords: feature extraction, electronic nose, signal processing, multiple peaks.

J14. Ekaterina Kuznetsova, Michael Proudfoot, Claudio F. Gonzalez, Greg Brown, Marina V. Omelchenko, Ivan Borozan, Liran Carmel, Yuri I. Wolf, Hirotada Mori, Alexei V. Savchenko, Cheryl H. Arrowsmith, Eugene V. Koonin, Aled M. Edwards and Alexander F. Yakunin (2006)
Genome-wide analysis of substrate specificities of the Escherichia coli haloacid dehalogenase-like phosphatase family
Journal of Biological Chemistry 281:36149-36161. web pdf
Members of enzyme families catalyze similar reactions, but have evolved specific biological functions. Comprehensive determination of the substrate specificities of enzymes is the essential first step toward elucidation of these functions and contributions of individual enzymes to the metabolome. Haloacid dehalogenase (HAD)-like hydrolases are a vast superfamily of largely uncharacterized enzymes, with a few members shown to possess phosphatase, β-phosphoglucomutase, phosphonatase, and dehalogenase activities. Using a representative set of 80 phosphorylated substrates, we characterized the substrate specificities of 23 soluble HADs encoded in the Escherichia coli genome. We identified small molecule phosphatase activity in 21 HADs and β-phosphoglucomutase activity in one protein. The E. coli HAD phosphatases show high catalytic efficiency and affinity to a wide range of phosphorylated metabolites (sugars, nucleotides, organic acids, cofactors) that are intermediates of various metabolic reactions (glycolysis, pentose phosphate pathway, gluconeogenesis, intermediary sugar and nucleotide metabolism). Rather than following the classical "one enzyme - one substrate" model, most of the E. coli HADs show remarkably broad and overlapping substrate spectra. At least 12 reactions catalyzed by HADs currently have no EC numbers assigned in the Enzyme Nomenclature. Surprisingly, most HADs hydrolyzed small phosphodonors (acetyl phosphate, carbamoyl phosphate, phosphoramidate), which also serve as substrates for autophosphorylation of the receiver domains of the two-component signal transduction systems. The physiological relevance of the phosphatase activity with the preferred substrate was validated in vivo for one of the HADs, YniC. Many of the secondary activities of HADs might have no immediate physiological function, but could comprise a reservoir for evolution of novel phosphatases.

J13. Yuri I. Wolf, Liran Carmel and Eugene V. Koonin (2006)
Unifying measures of gene function and evolution
Proceedings of the Royal Society B 273:1507-1515. web pdf
Recent genome analyses revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level (EL), connectivity of genetic and protein-protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate (ER) and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g. products of highly expressed genes also have a propensity to be involved in many protein-protein interactions, whereas variables between classes are negatively correlated, e.g. highly expressed genes, on average, evolve slower than weakly expressed genes. Here, we describe principal component (PC) analysis of seven genome-related variables and propose biological interpretations for the first three PCs. The first PC reflects a gene's 'importance', or the 'status' of a gene in the genomic community, with positive contributions from knockout lethality, EL, number of protein-protein interaction partners and the number of paralogues, and negative contributions from sequence ER and gene loss propensity. The next two PCs define a plane that seems to reflect the functional and evolutionary plasticity of a gene. Specifically, PC2 can be interpreted as a gene's 'adaptability' whereby genes with high adaptability readily duplicate, have many genetic interaction partners and tend to be non-essential. PC3 also might reflect the role of a gene in organismal adaptation albeit with a negative rather than a positive contribution of genetic interactions; we provisionally designate this PC 'reactivity'. The interpretation of PC2 and PC3 as measures of a gene's plasticity is compatible with the observation that genes with high values of these PCs tend to be expressed in a condition- or tissue-specific manner. 
Functional classes of genes substantially vary in status, adaptability and reactivity, with the highest status characteristic of the translation system and cytoskeletal proteins, highest adaptability seen in cellular processes and signalling genes, and top reactivity characteristic of metabolic enzymes.
Keywords: gene expression, gene dispensability, protein-protein interaction, sequence evolution rate, gene loss, principal component analysis.

J12. Liran Carmel, Sol Efroni, Peter D. White, Eric Aslakson, Ute Vollmer-Conna and Mangalathu S. Rajeevan (2006)
Gene expression profile of empirically delineated classes of unexplained chronic fatigue
Pharmacogenomics 7:375-386. web pdf
Objectives: To identify the underlying gene expression profiles of unexplained chronic fatigue subjects classified into five or six class solutions by principal component (PCA) and latent class analyses (LCA).
Methods: Microarray expression data were available for 15,315 genes and 111 female subjects enrolled from a population-based study on chronic fatigue syndrome. Algorithms were developed to assign gene scores and threshold values that signified the contribution of each gene to discriminate the multiclasses in each LCA solution. Unsupervised dimensionality reduction was first used to remove noise or otherwise uninformative gene combinations, followed by supervised dimensionality reduction to isolate gene combinations that best separate the classes.
Results: The authors' gene score and threshold algorithms identified 32 and 26 genes capable of discriminating the five and six multiclass solutions, respectively. Pair-wise comparisons suggested that some genes (zinc finger protein 350 [ZNF350], solute carrier family 1, member 6 [SLC1A6], F-box protein 7 [FBXO7] and vacuole 14 protein homolog [VAC14]) distinguished most classes of fatigued subjects from healthy subjects, whereas others (patched homolog 2 [PTCH2] and T-cell leukemia/lymphoma [TCL1A]) differentiated specific fatigue classes.
Conclusion: A computational approach was developed for general use to identify discriminatory genes in any multiclass problem. Using this approach, differences in gene expression were found to discriminate some classes of unexplained chronic fatigue, particularly one termed interoception.
Keywords: chronic fatigue syndrome, Fisher quotient and discriminatory genes, gene expression and gene scores, interoception, latent class analysis, principal component analysis.

J11. Liran Carmel, Noa Sever and David Harel (2005)
On predicting responses to mixtures in quartz microbalance sensors
Sensors and Actuators B: Chemical (special issue, selected papers from ISOEN 2003 - the 10th International Symposium on Olfaction and Electronic Noses; edited by J. Kleperis, L. Grinberga, A. D'Amico & M. Koudelka-Hep) 106:128-135. web pdf
A fundamental question in studying odor patterns in electronic noses is how to estimate the response to a mixture, given the response curves of the pure chemicals. We study this question by proposing two mixture-predicting models, and verify them against real data collected using quartz microbalance sensors. We find that a simple additive law explains fairly well the measured response patterns of binary mixtures, but that a slightly more complicated mixing model is required in order to produce good estimations of the response patterns of mixtures that are comprised of more than two compounds.
Keywords: electronic noses, mixtures, response prediction, mixing model, law of mixing, quartz microbalance sensors.
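A minimal sketch of what a simple additive law could look like: the predicted response curve of a mixture is taken as the pointwise sum of the response curves of its pure components, all sampled on the same time grid. The abstract does not give the exact form of the law (for instance, whether concentrations enter as weights), so the unweighted sum and the function name are assumptions.

```python
def additive_mixture_response(pure_responses):
    """Predict a mixture's response curve as the pointwise sum of pure-chemical curves."""
    lengths = {len(curve) for curve in pure_responses}
    if len(lengths) != 1:
        raise ValueError("all response curves must share the same time grid")
    # zip(*...) walks the curves sample by sample along the shared grid.
    return [sum(samples) for samples in zip(*pure_responses)]
```

Comparing such a prediction with the measured mixture curve is one way to see where the additive law holds (binary mixtures, per the abstract) and where a richer mixing model is needed.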

J10. Liran Carmel (2005)
Electronic nose signal restoration - beyond the dynamic range limit
Sensors and Actuators B: Chemical (special issue, selected papers from ISOEN 2003 - the 10th International Symposium on Olfaction and Electronic Noses; edited by J. Kleperis, L. Grinberga, A. D'Amico & M. Koudelka-Hep) 106:95-100. web pdf
When measuring over-concentrated stimuli, chemical sensors tend to exhibit corrupted time signals, which are normally categorized as missing data. Such a failure of one or more sensors occurs frequently in applications where an eNose is exposed to a diverse repertoire of chemicals. As a rule, missing data are removed from the dataset, leaving a potentially large portion of the original dataset unutilized. Here we propose an algorithm to handle such missing data by utilizing intact regions of corrupted signals to restore the damaged regions. We do so by fitting a parametric model of the sensor response over time to the intact regions, and using the resulting model for the restoration. We show that the restoration is both accurate and consistent, thus allowing for the restored signals to take part in any subsequent data analysis process.
Keywords: electronic nose, signal restoration, missing data, signal corruption, signal failure.
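The restoration idea (fit a parametric model to the intact samples, then use it to fill the damaged ones) can be sketched as follows. The abstract does not name the parametric model, so the saturating exponential rise used here, the grid search over the time constant, and all function names are placeholders chosen for illustration only.

```python
import math


def rise_model(t, amplitude, tau):
    """Hypothetical response model: saturating exponential rise."""
    return amplitude * (1.0 - math.exp(-t / tau))


def fit_rise_model(times, values, intact, taus):
    """Least-squares fit using only samples flagged as intact.

    For each candidate tau the best amplitude has a closed form; the
    (amplitude, tau) pair with the smallest squared error is returned.
    """
    best = None
    for tau in taus:
        shape = [1.0 - math.exp(-t / tau) for t in times]
        num = sum(s * v for s, v, ok in zip(shape, values, intact) if ok)
        den = sum(s * s for s, ok in zip(shape, intact) if ok)
        if den == 0.0:
            continue  # intact samples carry no information for this tau
        amplitude = num / den
        sse = sum((amplitude * s - v) ** 2
                  for s, v, ok in zip(shape, values, intact) if ok)
        if best is None or sse < best[0]:
            best = (sse, amplitude, tau)
    if best is None:
        raise ValueError("no intact samples carry information about the model")
    return best[1], best[2]


def restore_signal(times, values, intact, taus):
    """Keep intact samples; replace damaged ones with the fitted model."""
    amplitude, tau = fit_rise_model(times, values, intact, taus)
    return [v if ok else rise_model(t, amplitude, tau)
            for t, v, ok in zip(times, values, intact)]
```

Here intact is a list of booleans marking which samples are trusted, for example those below a sensor's saturation level; how corrupted regions are actually detected is outside the scope of this sketch.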

J9. Oded Shaham, Liran Carmel and David Harel (2005)
On mapping between electronic noses
Sensors and Actuators B: Chemical (special issue, selected papers from ISOEN 2003 - the 10th International Symposium on Olfaction and Electronic Noses; edited by J. Kleperis, L. Grinberga, A. D'Amico & M. Koudelka-Hep) 106:76-82. web pdf
We consider the task of finding a mapping between two eNoses that employ two different sensor technologies, quartz microbalance and conducting polymers. Such a mapping is a model that predicts the response of one eNose based on the response of the other. eNose mappings are important for odor communication and synthesis, as well as for eNose data integration. We investigated a number of methods for performing this task, including principal components regression, partial least squares, neural networks and tessellation-based linear interpolation. Our measure of success is the percentage of predictions that are correctly classifiable. Using two different techniques for splitting our data set, we achieved success rates of 67% and 100%.

J8. Yehuda Koren and Liran Carmel (2004)
Robust linear dimensionality reduction
IEEE Trans. Visualization and Computer Graphics 10:459-470. web pdf
We present a novel family of data-driven linear transformations, aimed at finding low dimensional embeddings of multivariate data, in a way that optimally preserves the structure of the data. The well-studied PCA and Fisher's LDA are shown to be special members in this family of transformations, and we demonstrate how to generalize these two methods so as to enhance their performance. Furthermore, our technique is the only one, to the best of our knowledge, that reflects in the resulting embedding both the data coordinates and pairwise similarities and/or dissimilarities between the data elements. Even more so, when information on the clustering (labeling) decomposition of the data is known, this information can also be integrated in the linear transformation, resulting in embeddings that clearly show the separation between the clusters, as well as their internal structure. All this makes our technique very flexible and powerful, and lets us cope with kinds of data that other techniques fail to describe properly.
Index terms: dimensionality reduction, visualization, classification, feature extraction, projection, linear transformation, principal component analysis, Fisher's linear discriminant analysis.

J7. Liran Carmel, David Harel and Yehuda Koren (2004)
Combining hierarchy and energy for drawing directed graphs
IEEE Trans. Visualization and Computer Graphics 10:46-57. web pdf
We present an algorithm for drawing directed graphs, which is based on rapidly solving a unique one-dimensional optimization problem for each of the axes. The algorithm results in a clear description of the hierarchy structure of the graph. Nodes are not restricted to lie on fixed horizontal layers, resulting in layouts that convey the symmetries of the graph very naturally. The algorithm can be applied without change to cyclic or acyclic digraphs, and even to graphs containing both directed and undirected edges. We also derive a hierarchy index from the input digraph, which quantitatively measures its amount of hierarchy.
Keywords: Directed graph drawing, force directed layout, hierarchy energy, Fiedler vector, minimum linear arrangement.

J6. Yehuda Koren, Liran Carmel and David Harel (2003)
Drawing huge graphs by algebraic multigrid optimization
Multiscale Modeling and Simulation 1:645-673. web pdf
We present an extremely fast graph drawing algorithm for very large graphs, which we term ACE (for Algebraic multigrid Computation of Eigenvectors). ACE exhibits a vast improvement over the fastest algorithms we are currently aware of; using a serial PC, it draws graphs of millions of nodes in less than a minute. ACE finds an optimal drawing by minimizing a quadratic energy function. The minimization problem is expressed as a generalized eigenvalue problem, which is solved rapidly using a novel algebraic multigrid technique. The same generalized eigenvalue problem seems to come up also in other fields, hence ACE appears to be applicable outside graph drawing too.
Keywords: algebraic multigrid, multiscale/multilevel optimization, graph drawing, generalized eigenvalue problem, Fiedler vector, force directed layout, the Hall energy.

J5. Liran Carmel, Noa Sever, Doron Lancet and David Harel (2003)
An e-Nose algorithm for identifying chemicals and determining their concentration
Sensors and Actuators B: Chemical (special issue, Proceedings of the 9th international Meeting on Chemical Sensors, 2002; edited by J. Stetter & S. Yao) 93:77-83. web pdf
We propose an algorithm for use with multisensor systems that is capable of the following: a) identifying an analyte independently of its concentration; b) estimating the concentration of the analyte, even if the system was not previously exposed to this concentration; c) telling when an analyte is of a chemical type not previously presented to the system. The algorithm, based upon recent work of Hopfield, uses the multiplicity of sensors explicitly, and is intuitive and easy to implement. We have tested it against real data, and it exhibits high quality performance.
Keywords: electronic noses, classification, identification, concentration estimation, reject option, Hopfield algorithm.

J4. Liran Carmel, Shlomo Levy, Doron Lancet and David Harel (2003)
A feature extraction method for chemical sensors in electronic noses
Sensors and Actuators B: Chemical (special issue, Proceedings of the 9th international Meeting on Chemical Sensors, 2002; edited by J. Stetter & S. Yao) 93:67-76. web pdf
We propose a new feature extraction method for use with chemical sensors. It is based on fitting a parametric analytic model of the sensor's response over time to the measured signal, and taking the set of best-fitting parameters as the features. The process of finding the features is fast and robust, and the resulting set of features is shown to significantly enhance the performance of subsequent classification algorithms. Moreover, the model that we have developed fits equally well to sensors of different technologies and embeddings, suggesting its applicability to a diverse repertoire of sensors and analytic devices.
Keywords: feature extraction, electronic nose, curve fitting, quartz-microbalance sensors, metal-oxide sensors.

J3. David Harel, Liran Carmel and Doron Lancet (2003)
Towards an odor communication system
Computational Biology and Chemistry 27:121-133. web pdf
We propose a setup for an odor communication system. Its different parts are described, and ways to realize them are outlined. Our scheme enables an output device, the whiffer, to release an imitation of an odorant read in by an input device, the sniffer, upon command. The heart of the system is the novel algorithmic scheme that makes the scheme feasible. We are currently at work researching and developing some of the components that constitute the algorithm, and we hope that the description of the overall scheme in this paper will help to get other groups to join in this effort.
Keywords: odor communication system, palette odorants, odor space, odorant mixing, sniffer, whiffer.

J2. Liran Carmel, David Harel and Doron Lancet (2001)
Estimating the size of the olfactory repertoire
Bulletin of Mathematical Biology 63:1063-1078. web pdf
The concept of shape space, which has been successfully implemented in immunology, is used here to construct a model for the discrimination power of the olfactory system. Using reasonable assumptions on the behavior of the biological system, we are able to estimate the number of distinct olfactory receptor types. Our estimated value of around 1000 receptor types is in good agreement with experimental data.

J1. Liran Carmel and Ady Mann (2000)
Geometrical approach to two-level Hamiltonians
Physical Review A 61:052113. web pdf
Two-level systems were shown to be fully described by a single function, known sometimes as the Stueckelberg parameter. Using concepts from differential geometry, we give geometrical meaning to the Stueckelberg parameter and to other related quantities. As a result, a generalization of the Stueckelberg parameter is introduced, and a relation obtained between two-level systems and spatial one-dimensional curves in three-dimensional space. Previous authors used this Stueckelberg parameter to solve analytically several two-level models. We further develop this idea, and solve analytically three fundamental models, from which many other known models emerge as special cases. We present the detailed analysis of these models.
PACS: 03.65.Db, 34.10.+x, 31.15.-p, 42.50.-p.

Proceedings (full papers):
P5. Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin (2005)
An expectation-maximization algorithm for analysis of evolution of exon-intron structure of eukaryotic genes
Lecture Notes in Bioinformatics (A. McLysaght and D. H. Huson, Eds.): Proc. RECOMB 2005 Comparative Genomics International Workshop (RCG 2005) 3678:35-46. web pdf
We propose a detailed model of evolution of exon-intron structure of eukaryotic genes that takes into account gene-specific intron gain and loss rates, branch-specific gain and loss coefficients, invariant sites incapable of intron gain, and rate variability of both gain and loss which is gamma-distributed across sites. We develop an expectation-maximization algorithm to estimate the parameters of this model, and study its performance using simulated data.

P4. Yehuda Koren and Liran Carmel (2003)
Visualization of labeled data using linear transformations
IEEE Symposium on Information Visualization 2003 (InfoVis 2003), 121-128. web pdf
We present a novel family of data-driven linear transformations, aimed at visualizing multivariate data in a low-dimensional space in a way that optimally preserves the structure of the data. The well-studied PCA and Fisher's LDA are shown to be special members in this family of transformations, and we demonstrate how to generalize these two methods so as to enhance their performance. Furthermore, our technique is the only one, to the best of our knowledge, that reflects in the resulting embedding both the data coordinates and pairwise similarities and/or dissimilarities between the data elements. Even more so, when information on the clustering (labeling) decomposition of the data is known, this information can be integrated in the linear transformation, resulting in embeddings that clearly show the separation between the clusters, as well as their intra-structure. All this makes our technique very flexible and powerful, and lets us cope with kinds of data that other techniques fail to describe properly.
Keywords: visualization, dimensionality-reduction, projection, principal component analysis, linear discriminant analysis, eigen-projection, classification.

P3. Liran Carmel, Yehuda Koren and David Harel (2003)
Visualizing and classifying odors using a similarity matrix
Proc. 9th International Symposium on Olfaction and Electronic Nose (ISOEN'02), 141-146. web pdf
The Lorentzian model is an analytic expression that describes the time response of electronic nose sensors. We show how this model can be utilized to calculate a normalized similarity index between any two measurements. The set of similarity indices is then used for two purposes: visualization of the data, and classification of new samples. The visualization is carried out using graph drawing tools, and the results are shown to bear some desired properties. The classification is done using a majority-decision type algorithm, and is demonstrated to have very low error rate.
Keywords: electronic noses, similarity index, feature extraction, Lorentzian model, graph drawing, visualization, classification.

P2. Liran Carmel, David Harel and Yehuda Koren (2002)
Drawing directed graphs using one-dimensional optimization
Lecture Notes in Computer Science (M. T. Goodrich and S. G. Kobourov, Eds.): Proc. Graph Drawing 2002 (GD 2002) 2528:193-206. web pdf
We present an algorithm for drawing directed graphs, which is based on rapidly solving a unique one-dimensional optimization problem for each of the axes. The algorithm results in a clear description of the hierarchy structure of the graph. Nodes are not restricted to lie on fixed horizontal layers, resulting in layouts that convey the symmetries of the graph very naturally. The algorithm can be applied without change to cyclic or acyclic digraphs, and even to graphs containing both directed and undirected edges. We also derive a hierarchy index from the input digraph, which quantitatively measures its amount of hierarchy.

P1. Yehuda Koren, Liran Carmel and David Harel (2002)
ACE: A fast multiscale eigenvectors computation for drawing huge graphs
Proc. IEEE Symposium on Information Visualization 2002 (InfoVis 2002), 137-144. web pdf
We present an extremely fast graph drawing algorithm for very large graphs, which we term ACE (for Algebraic multigrid Computation of Eigenvectors). ACE exhibits an improvement of roughly two orders of magnitude over the fastest algorithms we are aware of; it draws graphs of millions of nodes in less than a minute. ACE finds an optimal drawing by minimizing a quadratic energy function. The minimization problem is expressed as a generalized eigenvalue problem, which is rapidly solved using a novel algebraic multigrid technique. The same generalized eigenvalue problem seems to come up also in other fields, hence ACE appears to be applicable outside of graph drawing too.
Keywords: algebraic multigrid, multiscale/multilevel optimization, graph drawing, generalized eigenvalue problem, Fiedler vector, force directed layout, the Hall energy.

Book Chapters:
C2. Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin (2009)
A maximum likelihood method for reconstruction of the evolution of eukaryotic gene structure
in Computational Systems Biology (J. McDermott, R. Samudrala, R. Bumgarner, K. Montgomery, and R. Ireton, eds.); the Methods in Molecular Biology series, Vol. 541, pp. 357-372, Humana Press. web pdf
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence-absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
Keywords: maximum likelihood, expectation-maximization, intron evolution, profile likelihood, ancestral reconstruction, eukaryotic gene structure.

C1. Yuri I. Wolf, Liran Carmel and Eugene V. Koonin (2005)
Correlations between quantitative measures of genome evolution, expression and function
in Discovering Biomolecular Mechanisms with Computational Biology (F. Eisenhaber, ed.), Landes Bioscience, web pdf
In addition to multiple, complete genome sequences, genome-wide data on biological properties of genes, such as knockout effect, expression levels, protein-protein interactions, and others, are rapidly accumulating. Numerous attempts were made by many groups to examine connections between these properties and quantitative measures of gene evolution. The questions addressed pertain to the most fundamental aspects of biology: what determines the effect of the knockout of a given gene on the phenotype (in particular, is it essential or not) and the rate of a gene's evolution and how are the phenotypic properties and evolution connected? Many significant correlations were detected, e.g., positive correlation between the tendency of a gene to be lost during evolution and sequence evolution rate, and negative correlations between each of the above measures of evolutionary variability and expression level or the phenotypic effect of gene knockout. However, most of these correlations are relatively weak and explain a small fraction of the variation present in the data. We propose that the majority of the relationships between the phenotypic ("input") and evolutionary ("output") variables can be described with a single, composite variable, the gene's "social status in the genomic community", which reflects the biological role of the gene and its mode of evolution. "High-status" genes, involved in house-keeping processes, are more likely to be more highly and broadly expressed, to have more interaction partners, and to produce lethal or severely impaired knockout mutants. These genes also tend to evolve slower and are less prone to gene loss across various taxonomic groups. "Low-status" genes are expected to be weakly expressed, have fewer interaction partners, and exhibit narrower (and less coherent) phyletic distribution. On average, these genes evolve faster and are more often lost during evolution than high-status genes.
The "gene status" notion may serve as a generator of null hypotheses regarding the connections between phenotypic and evolutionary parameters associated with genes. Any deviation from the expected pattern calls for attention - to the quality of the data, the nature of the analyzed relationship, or both.