
How the code runs

There are 11 main classes in ermineJ:
1. affy_go_parse: Parses a file of the form [chip_id][GO1|GO2|GO3|...] and stores it in the Map called affy_go_map. Also stores the reverse mapping in go_affy_map, in the form [GO1][chip1,chip2...].
2. Pval_parse: Parses a file of the form [chip_id][pval] and determines which target genes will be included in the computation.
3. Ug_Parse: Parses a file of the form [probe][UG_ID], builds a temporary reverse map of the form [UG_ID][probe1,probe2,probe3...], and then stores, for each UG_ID, a hash map of the form probe1->probe2,probe3 probe2->probe3,probe1 probe3->probe1,probe2, where each group represents the items in the same unigene cluster.
4. GoName_parse: Parses a file of the form [GoID][Biological Name]. The values are stored in a Map.
5. ConstantStuff: stores constant variables
6. Stats: set of mathematics functions
7. histogram: Stores information relevant to a histogram
8. exp_class_scores: calculates a background distribution for class scores derived from randomly selected individual gene scores, then stores information to histogram
9. class_pvals: calculates the raw p-values, hypergeometric p-values, and ROC p-values
10. Class_Frame: Front end GUI for class scores
11. Final_Project: Final test file
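
The two-step mapping described for Ug_Parse can be sketched as follows. This is only an illustration, not the actual ermineJ source: the class name UgSiblingSketch, the method names, and the sample probe and cluster IDs are all made up for the example, and the file parsing step is left out (the input is assumed to be an already parsed probe -> UG_ID map).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are not taken from the ermineJ source.
class UgSiblingSketch {

    /** Builds the temporary reverse map: UG_ID -> all probes in that cluster. */
    public static Map<String, List<String>> clusterToProbes(Map<String, String> probeToUg) {
        Map<String, List<String>> reverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : probeToUg.entrySet()) {
            reverse.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return reverse;
    }

    /** Maps every probe to the other probes that share its unigene cluster. */
    public static Map<String, List<String>> probeToSiblings(Map<String, String> probeToUg) {
        Map<String, List<String>> siblings = new LinkedHashMap<>();
        for (List<String> cluster : clusterToProbes(probeToUg).values()) {
            for (String probe : cluster) {
                List<String> others = new ArrayList<>(cluster);
                others.remove(probe); // a probe is not its own sibling
                siblings.put(probe, others);
            }
        }
        return siblings;
    }

    public static void main(String[] args) {
        // Made-up input: three probes in one cluster, one probe alone in another.
        Map<String, String> probeToUg = new LinkedHashMap<>();
        probeToUg.put("probe1", "Hs.1");
        probeToUg.put("probe2", "Hs.1");
        probeToUg.put("probe3", "Hs.1");
        probeToUg.put("probe4", "Hs.2");
        System.out.println(probeToSiblings(probeToUg));
    }
}
```

A probe that is alone in its cluster ends up with an empty sibling list here. Whether Ug_Parse stores such probes at all is not stated above, so treat that detail as an assumption.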

The first 4 classes read in the corresponding data files, parse them, and store all the information in maps or arrays. ConstantStuff and Stats contain the constants and functions needed for the computation, and histogram stores information about the random trials. exp_class_scores calculates the background distribution, and class_pvals uses all of the information above to generate the results we need. Class_Frame provides the GUI, which is triggered by Final_Project, the root of the whole program.

Most of the source code is straightforward, but here I want to explain how replicated genes are handled. In class_pvals, in_pval is the p-value array that is actually used for the randomized trials. Therefore, if weight is on, we use 'ug_pval_arr', which holds the pvals of all the unigenes; if weight is off, we use 'pvals', which holds the pvals of all the genes. More generally, the difference between weight on and off is this: when weight is off, all the computations use gene properties as input; when weight is on, we use unigene properties instead.
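
The choice of input array can be sketched as below. This is a hedged illustration rather than the real class_pvals code: the class and method names are hypothetical, the numbers are made up, and the use of the mean as the class score in a random trial is an assumption made only for the example (the text above does not say how a class score is computed, nor how ug_pval_arr is derived from pvals).

```java
import java.util.Random;

// Hypothetical sketch; not taken from the ermineJ source.
class WeightSwitchSketch {

    /** Picks the array used for random trials: unigene values if weight is on, gene values otherwise. */
    public static double[] selectInputPvals(boolean weightOn, double[] pvals, double[] ugPvalArr) {
        return weightOn ? ugPvalArr : pvals;
    }

    /**
     * One random trial: draws classSize distinct entries from inPval and averages them.
     * Averaging is an assumption for illustration only.
     */
    public static double randomClassScore(double[] inPval, int classSize, Random rng) {
        if (classSize <= 0 || classSize > inPval.length) {
            throw new IllegalArgumentException("classSize must be between 1 and inPval.length");
        }
        double[] pool = inPval.clone(); // leave the caller's array untouched
        double sum = 0.0;
        for (int i = 0; i < classSize; i++) {
            // Partial Fisher-Yates shuffle: swap a random remaining entry into slot i.
            int j = i + rng.nextInt(pool.length - i);
            double tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            sum += pool[i];
        }
        return sum / classSize;
    }

    public static void main(String[] args) {
        double[] pvals = {0.01, 0.02, 0.50, 0.90}; // one value per gene (made up)
        double[] ugPvalArr = {0.015, 0.50, 0.90};  // one value per unigene (made up)
        double[] inPval = selectInputPvals(true, pvals, ugPvalArr);
        System.out.println(randomClassScore(inPval, 2, new Random(1L)));
    }
}
```

With weight on, replicated probes of one unigene contribute a single entry to the pool, so they cannot be drawn more than once in the same random class.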

