baseCode.bio.geneset
Class GeneSetMapTools

java.lang.Object
  extended by baseCode.bio.geneset.GeneSetMapTools

public class GeneSetMapTools
extends java.lang.Object

Methods to 'clean' a set of gene sets, for example by removing redundant sets.

Version:
$Id: GeneSetMapTools.java,v 1.5 2004/10/13 21:58:46 pavlidis Exp $
Author:
Paul Pavlidis

Constructor Summary
GeneSetMapTools()
           
 
Method Summary
static void collapseGeneSets(GeneAnnotations geneData, StatusViewer messenger)
          Identify classes which are absolutely identical to others.
static hep.aida.IHistogram1D geneSetSizeDistribution(GeneAnnotations ga, int numBins, int minSize, int maxSize)
           
static java.util.ArrayList getRedundancies(java.lang.String classId, java.util.Map classesToRedundantMap)
           
 java.lang.String getRedundanciesString(java.lang.String classId, java.util.Map classesToRedundantMap)
           
static java.util.ArrayList getSimilarities(java.lang.String classId, java.util.Map classesToSimilarMap)
           
static void ignoreSimilar(double fractionSameThreshold, GeneAnnotations ga, StatusViewer messenger, int maxClassSize, int minClassSize, double bigClassPenalty)
           Remove classes which are too similar to some other class.
static double meanGeneSetSize(GeneAnnotations ga, boolean countEmpty)
           
static double meanSetsPerGene(GeneAnnotations ga, boolean countEmpty)
           
static void removeAspect(GeneAnnotations ga, GONames gon, StatusViewer messenger, java.lang.String aspect)
           
static void removeBySize(GeneAnnotations ga, StatusViewer messenger, int minClassSize, int maxClassSize)
          Remove gene sets whose size falls outside the range set by minClassSize and maxClassSize.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GeneSetMapTools

public GeneSetMapTools()
Method Detail

meanGeneSetSize

public static double meanGeneSetSize(GeneAnnotations ga,
                                     boolean countEmpty)
Parameters:
ga -
countEmpty - if false, gene sets that have no members are not counted in the total.
Returns:
The average size of the gene sets.
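The averaging behind meanGeneSetSize can be sketched in plain Java. This is a minimal sketch, not the baseCode implementation: GeneAnnotations is replaced by a hypothetical Map from gene set ID to the set of member probe IDs, and returning 0.0 when no sets are counted is an assumption.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the Map stands in for GeneAnnotations. */
final class MeanGeneSetSizeSketch {

    static double meanGeneSetSize(Map<String, Set<String>> geneSets, boolean countEmpty) {
        long totalMembers = 0;
        int setsCounted = 0;
        for (Set<String> members : geneSets.values()) {
            if (members.isEmpty() && !countEmpty) {
                continue; // empty gene sets are left out of the average
            }
            totalMembers += members.size();
            setsCounted++;
        }
        // Assumption: 0.0 when nothing is counted (the real method may differ).
        return setsCounted == 0 ? 0.0 : (double) totalMembers / setsCounted;
    }
}
```

With sets of sizes 3, 1 and 0, the result is 2.0 when countEmpty is false and 4/3 when it is true.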

meanSetsPerGene

public static double meanSetsPerGene(GeneAnnotations ga,
                                     boolean countEmpty)
Parameters:
ga -
countEmpty - if false, genes that have no gene sets assigned to them are not counted in the total.
Returns:
The average number of gene sets per gene (strictly, per probe). This is a measure of gene set overlap: a value of 1 means that each gene is, on average, in only one set, and larger values indicate more overlap between gene sets.
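A minimal sketch of how a sets-per-probe average can be computed by inverting the gene set membership. It is not the baseCode code: the Map from gene set ID to member probes and the explicit allProbes set are hypothetical stand-ins for GeneAnnotations, which is needed because probes with no gene sets never appear in the membership map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; geneSets and allProbes stand in for GeneAnnotations. */
final class MeanSetsPerGeneSketch {

    static double meanSetsPerGene(Map<String, Set<String>> geneSets, Set<String> allProbes,
                                  boolean countEmpty) {
        // Start every known probe at zero memberships.
        Map<String, Integer> setsPerProbe = new HashMap<>();
        for (String probe : allProbes) {
            setsPerProbe.put(probe, 0);
        }
        // Count how many gene sets each probe belongs to.
        for (Set<String> members : geneSets.values()) {
            for (String probe : members) {
                setsPerProbe.merge(probe, 1, Integer::sum);
            }
        }
        long total = 0;
        int probesCounted = 0;
        for (int n : setsPerProbe.values()) {
            if (n == 0 && !countEmpty) {
                continue; // probes without gene sets are skipped
            }
            total += n;
            probesCounted++;
        }
        return probesCounted == 0 ? 0.0 : (double) total / probesCounted;
    }
}
```

If p1 is in two sets, p2 in one and p3 in none, the mean is 1.5 with countEmpty false and 1.0 with countEmpty true.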

geneSetSizeDistribution

public static hep.aida.IHistogram1D geneSetSizeDistribution(GeneAnnotations ga,
                                                            int numBins,
                                                            int minSize,
                                                            int maxSize)

removeAspect

public static void removeAspect(GeneAnnotations ga,
                                GONames gon,
                                StatusViewer messenger,
                                java.lang.String aspect)
Parameters:
ga -
gon -
messenger -
aspect -

removeBySize

public static void removeBySize(GeneAnnotations ga,
                                StatusViewer messenger,
                                int minClassSize,
                                int maxClassSize)
Remove gene sets whose size falls outside the range set by minClassSize and maxClassSize.

Parameters:
ga -
messenger -
minClassSize -
maxClassSize -
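Size-based filtering can be sketched in a few lines. This sketch assumes both bounds are inclusive (the Javadoc does not say), drops the messenger argument, uses a hypothetical Map from gene set ID to members in place of GeneAnnotations, and returns the number of removed sets, which the real void method does not.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; assumes inclusive bounds and a mutable Map. */
final class RemoveBySizeSketch {

    static int removeBySize(Map<String, Set<String>> geneSets, int minClassSize, int maxClassSize) {
        int before = geneSets.size();
        // Removing through the values() view also removes the map entries.
        geneSets.values().removeIf(
                members -> members.size() < minClassSize || members.size() > maxClassSize);
        return before - geneSets.size();
    }
}
```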

ignoreSimilar

public static void ignoreSimilar(double fractionSameThreshold,
                                 GeneAnnotations ga,
                                 StatusViewer messenger,
                                 int maxClassSize,
                                 int minClassSize,
                                 double bigClassPenalty)

Remove classes which are too similar to some other class. In addition, the user can select a penalty for large gene sets. Thus when two gene sets are found to be similar, the decision of which one to keep can be tuned based on the size penalty. We find it useful to penalize large gene sets so we tend to keep smaller ones (but not too small). Useful values of the penalty are above 1 (a value of 1 will result in the larger class always being retained).

The amount of similarity to be tolerated is set by the parameter fractionSameThreshold, representing the fraction of genes in the smaller class which are also found in the larger class. Thus, setting this threshold to 0.0 means that no overlap is tolerated. Setting it to 1 means that classes will never be discarded.

Parameters:
fractionSameThreshold - A value between 0 and 1, indicating how similar a class must be before it gets ditched.
ga -
messenger - For updating a log.
maxClassSize - Largest class considered. Larger classes are left out of the comparison, but that doesn't mean they are removed.
minClassSize - Smallest class considered. Smaller classes are left out of the comparison, but that doesn't mean they are removed.
bigClassPenalty - A value greater than or equal to one, indicating the cost of retaining a larger class in favor of a smaller one. The penalty is scaled with the difference in sizes of the two classes being considered, so very large classes are more heavily penalized.
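The pairwise comparison described above can be sketched as follows. The overlap measure (fraction of the smaller set found in the larger) and the threshold behaviour (a set is discarded only when the fraction exceeds fractionSameThreshold, so 1.0 never discards) follow the description. Everything else is an assumption: gene sets are a hypothetical Map from ID to members, messenger is dropped, pairs are visited in sorted ID order, and keepSmaller() is an invented tie-break rule that only matches the documented case that a penalty of 1 always keeps the larger class. The actual penalty formula is not given on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the ignoreSimilar idea; not the baseCode implementation. */
final class IgnoreSimilarSketch {

    /** Fraction of the smaller set's members that also appear in the larger set. */
    static double fractionSame(Set<String> a, Set<String> b) {
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        if (smaller.isEmpty()) {
            return 0.0;
        }
        int shared = 0;
        for (String gene : smaller) {
            if (larger.contains(gene)) {
                shared++;
            }
        }
        return (double) shared / smaller.size();
    }

    /** ASSUMED rule: penalty 1 keeps the larger set; above 1, the size gap scales the penalty. */
    static boolean keepSmaller(int smallerSize, int largerSize, double bigClassPenalty) {
        return (bigClassPenalty - 1.0) * (largerSize - smallerSize) >= 1.0;
    }

    /** Removes, in place, one member of each pair whose overlap exceeds the threshold. */
    static void ignoreSimilar(Map<String, Set<String>> geneSets, double fractionSameThreshold,
                              int maxClassSize, int minClassSize, double bigClassPenalty) {
        List<String> ids = new ArrayList<>(geneSets.keySet());
        Collections.sort(ids); // deterministic comparison order
        Set<String> removed = new HashSet<>();
        for (int i = 0; i < ids.size(); i++) {
            for (int j = i + 1; j < ids.size(); j++) {
                String a = ids.get(i);
                String b = ids.get(j);
                if (removed.contains(a) || removed.contains(b)) {
                    continue;
                }
                Set<String> sa = geneSets.get(a);
                Set<String> sb = geneSets.get(b);
                // Sets outside the size range are skipped, not removed.
                if (!inRange(sa, minClassSize, maxClassSize) || !inRange(sb, minClassSize, maxClassSize)) {
                    continue;
                }
                if (fractionSame(sa, sb) <= fractionSameThreshold) {
                    continue; // overlap is within the tolerated fraction
                }
                boolean aIsSmaller = sa.size() <= sb.size();
                String smaller = aIsSmaller ? a : b;
                String larger = aIsSmaller ? b : a;
                boolean smallerWins = keepSmaller(
                        geneSets.get(smaller).size(), geneSets.get(larger).size(), bigClassPenalty);
                removed.add(smallerWins ? larger : smaller);
            }
        }
        geneSets.keySet().removeAll(removed);
    }

    private static boolean inRange(Set<String> members, int min, int max) {
        return members.size() >= min && members.size() <= max;
    }
}
```

With A = {g1, g2, g3, g4} and B = {g1, g2, g3}, a threshold of 0.5 and a penalty of 1.0 removes B; raising the penalty to 2.0 under this assumed rule removes A instead.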

collapseGeneSets

public static void collapseGeneSets(GeneAnnotations geneData,
                                    StatusViewer messenger)
Identify classes which are absolutely identical to others. This isn't very fast, because it doesn't know which classes are actually relevant in the data.
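Detecting identical classes amounts to grouping gene sets by their exact membership. The sketch below does this with a hash map keyed on the member set. It is an illustration under assumptions, not the baseCode code: gene sets are a hypothetical Map from ID to members, the alphabetically first ID of each group is kept, and the duplicates are returned as a map from the kept ID to its redundant IDs (a shape chosen to resemble the classesToRedundantMap argument of getRedundancies, whose actual contents are not documented here).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; keeps one ID per identical membership and removes the rest. */
final class CollapseGeneSetsSketch {

    static Map<String, List<String>> collapseGeneSets(Map<String, Set<String>> geneSets) {
        List<String> ids = new ArrayList<>(geneSets.keySet());
        Collections.sort(ids); // the first ID in each group becomes the kept one
        // Set equality and hashCode depend only on contents, so identical sets collide.
        Map<Set<String>, List<String>> byMembers = new HashMap<>();
        for (String id : ids) {
            byMembers.computeIfAbsent(geneSets.get(id), k -> new ArrayList<>()).add(id);
        }
        Map<String, List<String>> redundant = new HashMap<>();
        for (List<String> group : byMembers.values()) {
            if (group.size() < 2) {
                continue; // membership is unique
            }
            List<String> others = new ArrayList<>(group.subList(1, group.size()));
            redundant.put(group.get(0), others);
            for (String other : others) {
                geneSets.remove(other);
            }
        }
        return redundant;
    }
}
```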


getRedundancies

public static java.util.ArrayList getRedundancies(java.lang.String classId,
                                                  java.util.Map classesToRedundantMap)
Parameters:
classId -
classesToRedundantMap -
Returns:

getSimilarities

public static java.util.ArrayList getSimilarities(java.lang.String classId,
                                                  java.util.Map classesToSimilarMap)
Parameters:
classId -
classesToSimilarMap -
Returns:

getRedundanciesString

public java.lang.String getRedundanciesString(java.lang.String classId,
                                              java.util.Map classesToRedundantMap)
Parameters:
classId -
classesToRedundantMap -
Returns:


Copyright © 2003-2005 Columbia University. All Rights Reserved.