Gene-set Cohesion Analysis Tool (GCAT)
GCAT uses Latent Semantic Indexing (LSI) of Medline abstracts to determine the functional coherence of gene sets. LSI has been shown to be robust in identifying both explicit and implicit gene relationships (Homayouni et al. 2005). Here, an LSI model was built from over 1 million Medline abstracts covering more than 20,000 mouse and human genes annotated in Entrez Gene as of November 2010. Based on the gene-to-gene similarities derived from this model, a literature p-value (LPv) is estimated with Fisher's exact test, which compares the cohesion of the given gene set with that of a random gene set. The LPv calculated by GCAT therefore represents the significance of the gene set's functional cohesion as supported by the biomedical literature.
Because GCAT uses LSI, it can automatically determine both explicit and implicit (conceptual) relationships between genes from the information in Medline abstracts. The ability to determine implied relationships is especially useful for interpreting genomic studies that identify new gene associations.
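How LSI can surface an implicit relationship is easiest to see on a toy term-by-gene matrix. This is a minimal sketch, not GCAT's model: the input matrix, its term weighting, and the function name `lsi_gene_similarity` are hypothetical. A truncated singular value decomposition (SVD) projects each gene into a k-dimensional concept space, and gene-to-gene similarity is the cosine between those projections.

```python
import numpy as np


def lsi_gene_similarity(term_gene: np.ndarray, k: int) -> np.ndarray:
    """Gene-by-gene cosine similarity in a rank-k LSI space (illustrative sketch).

    term_gene: hypothetical terms x genes matrix, e.g. term weights aggregated
    over the abstracts linked to each gene.
    """
    # Thin SVD: term_gene = U S V^T
    u, s, vt = np.linalg.svd(term_gene, full_matrices=False)
    # Gene coordinates in the reduced concept space are the rows of V_k S_k
    genes = vt[:k].T * s[:k]
    # Normalize each gene vector to unit length (guard against all-zero genes)
    norms = np.linalg.norm(genes, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = genes / norms
    # Pairwise cosine similarities
    return unit @ unit.T


# Gene A mentions only term 0, gene C only term 1, gene B mentions both.
toy = np.array([[1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0]])
sim_reduced = lsi_gene_similarity(toy, k=1)
sim_full = lsi_gene_similarity(toy, k=2)
```

A and C share no terms, so their raw cosine is 0, and keeping full rank (k=2) reproduces that value. With k=1 they become highly similar because both co-occur with terms of gene B, which is the kind of implicit (conceptual) link described above.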
GCAT determines the inter-relationships among all genes in an experimental gene list. Unlike pathway-oriented gene set enrichment approaches, GCAT can assess global cohesion, taking into account cross-talk between genes in multiple pathways.
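The all-pairs LPv idea can be sketched as a one-sided Fisher's exact test on a 2x2 table of gene pairs. This is a minimal sketch under stated assumptions, not GCAT's implementation: a pair is assumed to count as "associated" when its LSI similarity is at or above a hypothetical threshold (0.6 here), and pairs inside the gene set are compared with all remaining pairs in a background universe. The names `literature_pvalue`, `fisher_greater`, and `threshold` are illustrative.

```python
from itertools import combinations
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null: the chance of seeing at
    least `a` associated pairs inside the gene set if associations were
    spread at random over all pairs.
    """
    row1 = a + b       # pairs inside the gene set
    col1 = a + c       # associated pairs overall
    n = a + b + c + d  # all pairs in the universe
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return tail / comb(n, row1)


def literature_pvalue(gene_set, universe, similarity, threshold=0.6):
    """Hypothetical LPv: are pairs within gene_set associated more often than
    other pairs in the universe? similarity(g1, g2) returns an LSI cosine."""
    members = set(gene_set)
    a = b = c = d = 0
    # Examine every gene pair, not just pairs within one pathway
    for g1, g2 in combinations(sorted(universe), 2):
        associated = similarity(g1, g2) >= threshold
        inside = g1 in members and g2 in members
        if inside and associated:
            a += 1
        elif inside:
            b += 1
        elif associated:
            c += 1
        else:
            d += 1
    return fisher_greater(a, b, c, d)
```

For example, in a six-gene universe where only the three pairs among genes A, B, and C are associated, `literature_pvalue(["A", "B", "C"], ...)` gives 1/455 (about 0.0022). Enumerating every pair in a 20,000-gene universe (roughly 200 million pairs) would be slow in pure Python, so a practical version would likely precompute the background counts once.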
1. Prepare your input gene list, one gene per line, and paste it into the text box labeled "Please enter your gene symbols or Entrez Gene IDs below". To load an example gene list into the box, click any of the four buttons above the text box. The example gene lists are taken from the following publication: