org.biojava.bio.symbol
Class SoftMaskedAlphabet

java.lang.Object
  extended by org.biojava.utils.Unchangeable
      extended by org.biojava.bio.symbol.SoftMaskedAlphabet
All Implemented Interfaces:
Alphabet, Annotatable, Changeable, FiniteAlphabet

public final class SoftMaskedAlphabet
extends Unchangeable
implements FiniteAlphabet

Soft masking is usually displayed by rendering the masked regions differently from the unmasked regions. Typically the masked regions are in lower case, but other schemes could be devised. For example, a soft-masked DNA sequence may look like this:


 >DNA_sequence
 ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT

 
Here the lowercase region (a run of ggt repeats) has been masked because it is of low complexity.
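A minimal sketch of how the default convention could be applied to a raw string: the scan below reports each run of lower-case characters as a half-open [start, end) interval. MaskedRegions is a hypothetical helper written only for illustration. It is not part of BioJava, and it hard-codes the lower-case rule instead of consulting a MaskingDetector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not BioJava): finds runs of lower-case characters,
// assuming the default "lower case means masked" convention.
final class MaskedRegions {
    private MaskedRegions() {}

    static List<int[]> find(String seq) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // start of the current masked run, or -1 if not in one
        for (int i = 0; i < seq.length(); i++) {
            boolean masked = Character.isLowerCase(seq.charAt(i));
            if (masked && start < 0) {
                start = i;                          // a masked run begins
            } else if (!masked && start >= 0) {
                regions.add(new int[] {start, i});  // the run ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return regions;
    }

    public static void main(String[] args) {
        String seq = "ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT";
        for (int[] r : find(seq)) {
            System.out.println("masked: [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```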

SoftMaskedAlphabets come with SymbolTokenizations that know how to read and write soft masking. What constitutes a masked region is decided by an implementation of MaskingDetector. The DEFAULT field of the MaskingDetector interface treats lower-case tokens as masked.
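The split between reading and writing tokens on one side, and deciding what counts as masked on the other, can be modelled in plain Java. In this sketch MaskRule, MaskedToken and SoftMaskCodec are hypothetical stand-ins, not BioJava's MaskingDetector or CaseSensitiveTokenization. The three methods on MaskRule are an assumption about what such a detector needs: a masking test plus conversions between the masked and plain spellings. Tokens are assumed to be single characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a masking detector (not BioJava's MaskingDetector).
interface MaskRule {
    boolean isMasked(String token);
    String unmask(String token); // masked spelling -> plain spelling
    String mask(String token);   // plain spelling -> masked spelling

    // Mirrors the documented default: lower-case tokens are masked.
    MaskRule DEFAULT = new MaskRule() {
        public boolean isMasked(String token) {
            return !token.isEmpty() && Character.isLowerCase(token.charAt(0));
        }
        public String unmask(String token) { return token.toUpperCase(Locale.ROOT); }
        public String mask(String token) { return token.toLowerCase(Locale.ROOT); }
    };
}

// One parsed token: its plain spelling plus a separate mask flag.
final class MaskedToken {
    final String base;
    final boolean masked;

    MaskedToken(String base, boolean masked) {
        this.base = base;
        this.masked = masked;
    }
}

final class SoftMaskCodec {
    private SoftMaskCodec() {}

    // Reading: record the mask flag and keep the plain spelling of each token.
    static List<MaskedToken> parse(String seq, MaskRule rule) {
        List<MaskedToken> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); i++) {
            String token = seq.substring(i, i + 1);
            boolean masked = rule.isMasked(token);
            out.add(new MaskedToken(masked ? rule.unmask(token) : token, masked));
        }
        return out;
    }

    // Writing: re-apply the masking convention to masked tokens only.
    static String format(List<MaskedToken> tokens, MaskRule rule) {
        StringBuilder sb = new StringBuilder();
        for (MaskedToken t : tokens) {
            sb.append(t.masked ? rule.mask(t.base) : t.base);
        }
        return sb.toString();
    }
}
```

Because the codec only talks to the rule, a different masking scheme needs a new MaskRule implementation rather than a new parser.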

Copyright (c) 2004 Novartis Institute for Tropical Diseases

Version:
1.0
Author:
Mark Schreiber

Nested Class Summary
 class SoftMaskedAlphabet.CaseSensitiveTokenization
          This SymbolTokenizer works with a delegate to softmask symbol tokenization as appropriate.
static interface SoftMaskedAlphabet.MaskingDetector
          Implementations will define how soft masking looks.
 
Nested classes inherited from class org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
 
Field Summary
 
Fields inherited from interface org.biojava.bio.symbol.Alphabet
EMPTY_ALPHABET, PARSERS, SYMBOLS
 
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
 
Method Summary
 void addSymbol(Symbol s)
          SoftMaskedAlphabets cannot add new Symbols.
 boolean contains(Symbol s)
           Returns whether or not this Alphabet contains the symbol.
 List getAlphabets()
          Gets the components of the Alphabet.
 Symbol getAmbiguity(Set s)
           Get a symbol that represents the set of symbols in s.
 Annotation getAnnotation()
          The SoftMaskedAlphabet has no annotation.
protected  FiniteAlphabet getDelegate()
          The compound alphabet that holds the symbols used by this wrapper.
 Symbol getGapSymbol()
           Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask)
          Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector)
          Creates a compound alphabet that is a hybrid of the alphabet to be soft masked and a binary alphabet that indicates whether each Symbol is soft masked.
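 FiniteAlphabet getMaskedAlphabet()
          Returns the alphabet that is being soft masked.
 SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
          Returns the MaskingDetector that defines what soft masking looks like.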
 String getName()
          The name of the Alphabet.
 Symbol getSymbol(List l)
           Get a symbol from the Alphabet which corresponds to the specified ordered list of symbols.
 SymbolTokenization getTokenization(String type)
           Get a SymbolTokenization by name.
 boolean isMasked(AtomicSymbol s)
          Returns whether the given symbol is soft masked.
 Iterator iterator()
          Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
 void removeSymbol(Symbol s)
          SoftMaskedAlphabets cannot remove Symbols.
 int size()
          The number of symbols in the alphabet.
 void validate(Symbol s)
           Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
 
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
 

Method Detail

getInstance

public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask)
                                      throws IllegalAlphabetException
Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.

Parameters:
alphaToMask - for example the DNA alphabet.
Returns:
a reference to a singleton SoftMaskedAlphabet.
Throws:
IllegalAlphabetException - if it cannot be constructed

getInstance

public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask,
                                             SoftMaskedAlphabet.MaskingDetector maskingDetector)
                                      throws IllegalAlphabetException
Creates a compound alphabet that is a hybrid of the alphabet to be soft masked and a binary alphabet that indicates whether each Symbol is soft masked.

Parameters:
alphaToMask - for example the DNA alphabet.
maskingDetector - defines the masking behaviour
Returns:
a reference to a singleton SoftMaskedAlphabet.
Throws:
IllegalAlphabetException - if it cannot be constructed
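Both overloads are documented as returning a singleton. A common way to get that behaviour is a cache keyed on the factory arguments; the sketch below uses ConcurrentHashMap.computeIfAbsent so that each key is built at most once, even under concurrent calls. MaskedAlphabetModel, MaskedAlphabetCache, DEFAULT_DETECTOR and the choice of key (wrapped alphabet name plus detector) are assumptions made for illustration, not a description of how BioJava implements getInstance. Only the getName format is taken from this page.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a soft-masked alphabet, reduced to what the cache needs.
final class MaskedAlphabetModel {
    private final String wrappedName;

    MaskedAlphabetModel(String wrappedName) {
        this.wrappedName = wrappedName;
    }

    // Same form as the documented getName() return value.
    String getName() {
        return "Softmasked {" + wrappedName + "}";
    }
}

final class MaskedAlphabetCache {
    // Hypothetical marker standing in for MaskingDetector.DEFAULT.
    static final Object DEFAULT_DETECTOR = new Object();

    private static final ConcurrentHashMap<Key, MaskedAlphabetModel> CACHE =
            new ConcurrentHashMap<>();

    private MaskedAlphabetCache() {}

    // Assumed cache key: wrapped alphabet name plus detector.
    private static final class Key {
        final String alphabetName;
        final Object detector;

        Key(String alphabetName, Object detector) {
            this.alphabetName = alphabetName;
            this.detector = detector;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return alphabetName.equals(k.alphabetName) && detector.equals(k.detector);
        }

        @Override
        public int hashCode() {
            return Objects.hash(alphabetName, detector);
        }
    }

    // One-argument form: falls back to the default detector.
    static MaskedAlphabetModel getInstance(String alphabetName) {
        return getInstance(alphabetName, DEFAULT_DETECTOR);
    }

    // computeIfAbsent runs the factory at most once per key, so repeated
    // calls with equal arguments return the same instance.
    static MaskedAlphabetModel getInstance(String alphabetName, Object detector) {
        return CACHE.computeIfAbsent(new Key(alphabetName, detector),
                k -> new MaskedAlphabetModel(k.alphabetName));
    }
}
```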

getMaskedAlphabet

public FiniteAlphabet getMaskedAlphabet()
Returns the alphabet that is being soft masked.

getDelegate

protected FiniteAlphabet getDelegate()
The compound alphabet that holds the symbols used by this wrapper.

Returns:
a FiniteAlphabet

getAnnotation

public Annotation getAnnotation()
The SoftMaskedAlphabet has no annotation.

Specified by:
getAnnotation in interface Annotatable
Returns:
Annotation.EMPTY_ANNOTATION

getName

public String getName()
The name of the Alphabet.

Specified by:
getName in interface Alphabet
Returns:
a String in the form of "Softmasked {"+alphaToMask.getName()+"}"

getAlphabets

public List getAlphabets()
Gets the components of the Alphabet.

Specified by:
getAlphabets in interface Alphabet
Returns:
a List with two members: the first is the wrapped Alphabet and the second is the binary SubIntegerAlphabet.

getSymbol

public Symbol getSymbol(List l)
                 throws IllegalSymbolException
Description copied from interface: Alphabet

Get a symbol from the Alphabet which corresponds to the specified ordered list of symbols.

The symbol at position i in the list must be a member of the i'th alphabet returned by getAlphabets. If all of the symbols in l are atomic, the resulting symbol will also be atomic. If any of them is an ambiguity symbol, the resulting symbol will be the appropriate ambiguity symbol.

Specified by:
getSymbol in interface Alphabet
Parameters:
l - A list of Symbol instances
Throws:
IllegalSymbolException - if the members of l are not Symbols over the alphabets returned from getAlphabets
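getAlphabets and getSymbol together describe a two-component compound alphabet. Under that reading, the atomic symbols are the cross product of the wrapped alphabet with a two-valued masked/unmasked flag, so a four-letter DNA alphabet would yield eight atomic symbols. The sketch models this in plain Java: strings stand in for wrapped Symbols, a Boolean stands in for the binary SubIntegerAlphabet, and IllegalArgumentException stands in for IllegalSymbolException. CompoundModel and MaskedSymbol are hypothetical names, and the factor of two is inferred from the two-member structure rather than stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical compound symbol: a wrapped symbol paired with a mask flag.
final class MaskedSymbol {
    final String base;
    final boolean masked;

    MaskedSymbol(String base, boolean masked) {
        this.base = base;
        this.masked = masked;
    }
}

final class CompoundModel {
    private final List<String> wrapped;
    private final List<MaskedSymbol> atomic = new ArrayList<>();

    CompoundModel(List<String> wrappedSymbols) {
        this.wrapped = new ArrayList<>(wrappedSymbols);
        // Cross product: each wrapped symbol once unmasked and once masked.
        for (String base : wrapped) {
            atomic.add(new MaskedSymbol(base, false));
            atomic.add(new MaskedSymbol(base, true));
        }
    }

    int size() {
        return atomic.size();
    }

    // Analogue of getSymbol(List): element i must belong to component i,
    // here a wrapped symbol followed by a Boolean mask flag.
    MaskedSymbol getSymbol(List<?> parts) {
        if (parts.size() != 2 || !wrapped.contains(parts.get(0))
                || !(parts.get(1) instanceof Boolean)) {
            throw new IllegalArgumentException("expected [wrapped symbol, Boolean], got " + parts);
        }
        boolean flag = (Boolean) parts.get(1);
        for (MaskedSymbol s : atomic) {
            if (s.base.equals(parts.get(0)) && s.masked == flag) {
                return s; // shared instance, so equal inputs give the same object
            }
        }
        throw new IllegalStateException("cross product is incomplete");
    }
}
```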

getAmbiguity

public Symbol getAmbiguity(Set s)
                    throws UnsupportedOperationException
Description copied from interface: Alphabet

Get a symbol that represents the set of symbols in s.

The set s must contain only Symbol instances, each of which is contained within this alphabet. This method is used to retrieve ambiguity symbols.

Specified by:
getAmbiguity in interface Alphabet
Parameters:
s - the Set of Symbols that will be found in getMatches of the returned symbol
Returns:
a Symbol (possibly fly-weighted) for the Set of symbols in s
Throws:
UnsupportedOperationException

getGapSymbol

public Symbol getGapSymbol()
Description copied from interface: Alphabet

Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.

In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() the same length as the getAlphabets list.

Specified by:
getGapSymbol in interface Alphabet
Returns:
the appropriate gap Symbol instance

contains

public boolean contains(Symbol s)
Description copied from interface: Alphabet

Returns whether or not this Alphabet contains the symbol.

An alphabet contains an ambiguity symbol if and only if the ambiguity symbol's getMatches() returns an alphabet that is a subset of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.

Specified by:
contains in interface Alphabet
Parameters:
s - the Symbol to check
Returns:
true if the Alphabet contains the symbol, false otherwise
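Following the reading that every symbol an ambiguity symbol can match must also be a member, containment reduces to a subset test. In this sketch an ambiguity symbol is modelled only by its set of matches, and AmbiguityContainment is a hypothetical helper. A symbol that matches the entire alphabet, such as the DNA ambiguity N, passes the test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AmbiguityContainment {
    private AmbiguityContainment() {}

    // An ambiguity symbol is modelled by the atomic symbols it matches. It is
    // contained iff all of them are members: a (not necessarily proper) subset test.
    static boolean contains(Set<String> alphabet, Set<String> matches) {
        return alphabet.containsAll(matches);
    }

    public static void main(String[] args) {
        Set<String> dna = new HashSet<>(Arrays.asList("A", "C", "G", "T"));
        Set<String> r = new HashSet<>(Arrays.asList("A", "G"));           // purine, R
        Set<String> n = new HashSet<>(Arrays.asList("A", "C", "G", "T")); // any base, N
        Set<String> withU = new HashSet<>(Arrays.asList("A", "U"));       // U is not DNA
        // prints "true true false"
        System.out.println(contains(dna, r) + " " + contains(dna, n) + " " + contains(dna, withU));
    }
}
```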

validate

public void validate(Symbol s)
              throws IllegalSymbolException
Description copied from interface: Alphabet

Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.

This function is used throughout the code to validate symbols as they enter a method, and the code contains many catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.

Specified by:
validate in interface Alphabet
Parameters:
s - the Symbol to validate
Throws:
IllegalSymbolException - if s is not contained in this alphabet
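validate is the throwing counterpart of contains. The sketch shows the pattern with hypothetical types: ToyAlphabet stands in for an alphabet, and NotInAlphabetException stands in for IllegalSymbolException, which this page declares in throws clauses. Building the exception in one place keeps the error report the same wherever a symbol is rejected.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for IllegalSymbolException (a checked exception here).
class NotInAlphabetException extends Exception {
    NotInAlphabetException(String message) {
        super(message);
    }
}

// Hypothetical alphabet, reduced to membership checks.
final class ToyAlphabet {
    private final String name;
    private final Set<String> symbols;

    ToyAlphabet(String name, Set<String> symbols) {
        this.name = name;
        this.symbols = Collections.unmodifiableSet(new HashSet<>(symbols));
    }

    boolean contains(String s) {
        return symbols.contains(s);
    }

    // Precanned failure: the message is built here once, so every method that
    // validates its input reports a rejected symbol the same way.
    void validate(String s) throws NotInAlphabetException {
        if (!contains(s)) {
            throw new NotInAlphabetException("Symbol " + s + " not found in alphabet " + name);
        }
    }

    // Typical caller: validate on entry, then do the real work.
    int count(String seq, String s) throws NotInAlphabetException {
        validate(s);
        int n = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.startsWith(s, i)) {
                n++;
            }
        }
        return n;
    }
}
```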

getMaskingDetector

public SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
Returns the MaskingDetector that defines what soft masking looks like for this alphabet.

getTokenization

public SymbolTokenization getTokenization(String type)
                                   throws BioException
Description copied from interface: Alphabet

Get a SymbolTokenization by name.

The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.

Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Other names may also be defined, but the behaviour of the SymbolTokenization returned for them is not defined here.

A SymbolTokenization under the name 'default' should be defined for all alphabets; it determines the behaviour when printing out a sequence. The standard behaviour is to use the 'token' SymbolTokenization as the default if it exists, and otherwise the 'name' SymbolTokenization, but other choices are possible.

Specified by:
getTokenization in interface Alphabet
Parameters:
type - the name of the parser
Returns:
a parser for that name
Throws:
BioException - if for any reason the tokenization could not be built
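The naming rules for tokenizations can be sketched as a lookup with a fallback for 'default'. TokenizationRegistry, NoSuchTokenizationException (standing in for BioException) and the use of Function&lt;String, List&lt;String&gt;&gt; as a tokenizer are all hypothetical simplifications; a real SymbolTokenization maps between strings and Symbols, not between strings and strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for BioException.
class NoSuchTokenizationException extends Exception {
    NoSuchTokenizationException(String message) {
        super(message);
    }
}

final class TokenizationRegistry {
    private final Map<String, Function<String, List<String>>> byName = new HashMap<>();

    void register(String name, Function<String, List<String>> tokenizer) {
        byName.put(name, tokenizer);
    }

    Function<String, List<String>> get(String type) throws NoSuchTokenizationException {
        Function<String, List<String>> t = byName.get(type);
        if (t == null && "default".equals(type)) {
            // Standard fallback: 'token' if registered, otherwise 'name'.
            t = byName.containsKey("token") ? byName.get("token") : byName.get("name");
        }
        if (t == null) {
            throw new NoSuchTokenizationException("No tokenization named '" + type + "'");
        }
        return t;
    }

    // 'token' style: one character per symbol.
    static List<String> perCharacter(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }

    // 'name' style: whitespace-separated symbol names.
    static List<String> byWhitespace(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }
}
```

An explicitly registered 'default' takes precedence over the fallback, which leaves room for the "other choices" the text allows.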

size

public int size()
Description copied from interface: FiniteAlphabet
The number of symbols in the alphabet.

Specified by:
size in interface FiniteAlphabet
Returns:
the size of the alphabet

iterator

public Iterator iterator()
Description copied from interface: FiniteAlphabet
Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.

Each AtomicSymbol as for which this.contains(as) is true will be returned exactly once by this iterator in no specified order.

Specified by:
iterator in interface FiniteAlphabet
Returns:
an Iterator over the contained AtomicSymbol objects

addSymbol

public void addSymbol(Symbol s)
               throws ChangeVetoException
SoftMaskedAlphabets cannot add new Symbols. A ChangeVetoException will be thrown.

Specified by:
addSymbol in interface FiniteAlphabet
Parameters:
s - the Symbol to add.
Throws:
ChangeVetoException - when called.

removeSymbol

public void removeSymbol(Symbol s)
                  throws ChangeVetoException
SoftMaskedAlphabets cannot remove Symbols. A ChangeVetoException will be thrown.

Specified by:
removeSymbol in interface FiniteAlphabet
Parameters:
s - the Symbol to remove.
Throws:
ChangeVetoException - when called.
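A minimal sketch of an alphabet whose membership is fixed at construction, so both mutators reject every call. FixedAlphabet and VetoException are hypothetical; VetoException stands in for ChangeVetoException, which this page declares in the throws clauses of addSymbol and removeSymbol.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for ChangeVetoException.
class VetoException extends Exception {
    VetoException(String message) {
        super(message);
    }
}

final class FixedAlphabet {
    private final Set<String> symbols;

    FixedAlphabet(String... symbols) {
        this.symbols = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(symbols)));
    }

    int size() {
        return symbols.size();
    }

    // Membership is fixed at construction, so every mutation is vetoed
    // regardless of the argument.
    void addSymbol(String s) throws VetoException {
        throw new VetoException("Cannot add " + s + ": this alphabet is fixed");
    }

    void removeSymbol(String s) throws VetoException {
        throw new VetoException("Cannot remove " + s + ": this alphabet is fixed");
    }
}
```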

isMasked

public boolean isMasked(AtomicSymbol s)
                 throws IllegalSymbolException
Returns true if s is soft masked and false otherwise.
Throws:
IllegalSymbolException