treebuild package¶
Submodules¶
treebuild.tree_build module¶
-
class
treebuild.tree_build.
TreeBuild
(input_file, output_file, id_column, fps, properties)[source]¶ TreeBuild makes the following assumptions about the format of the input file. Make sure the file satisfies them before building a tree:
- potency values (e.g. IC50/Ka/Ki) are given in nM
- the file must have an ID column; set its name with id_column
- the file must have a SMILES column, with ‘Canonical_Smiles’ as column name
- the file must have at least one potency column (IC50/Ka/Ki).
- To build the tree
- the identity column needs to be specified with id_column
- a list of fingerprints and a list of properties need to be specified with rdkit
- the paths of the input and output files need to be specified
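The column assumptions listed above can be checked before a tree is built. The following is a minimal sketch and not part of treebuild: the helper name check_input_columns, the comma delimiter, and the exact potency column names are assumptions made for illustration.

```python
import csv
import io

# Hypothetical helper, not part of treebuild.
# Assumed potency column names; the real file may use others.
POTENCY_COLUMNS = ("IC50", "Ka", "Ki")


def check_input_columns(handle, id_column, delimiter=","):
    """Return the potency columns found; raise ValueError if a required column is missing."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    fields = reader.fieldnames or []
    if id_column not in fields:
        raise ValueError("missing id column: %s" % id_column)
    if "Canonical_Smiles" not in fields:
        raise ValueError("missing SMILES column: Canonical_Smiles")
    found = [name for name in POTENCY_COLUMNS if name in fields]
    if not found:
        raise ValueError("need at least one potency column (IC50/Ka/Ki)")
    return found


# A tiny in-memory file with one ligand and an IC50 column (values in nM).
sample = io.StringIO("ligand_id,Canonical_Smiles,IC50\nL1,CCO,12.5\n")
print(check_input_columns(sample, "ligand_id"))
```

Failing early with a clear message is usually easier to debug than a missing-key error raised later during tree construction.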
-
static
gen_dist_file
(liganddict, fp_func)[source]¶ Generate the distance file that serves as input to the rapidnj program.
Parameters: - liganddict – ligand information
- fp_func – fingerprint function
Returns: filename for distance file
-
static
gen_properties
(ligand_dict, activities, properties, ext_cols)[source]¶ Generate properties for each molecule.
Parameters: - ligand_dict – ligand dictionary which keeps all ligand information
- activities – a list of PropertyType objects
- properties – a list of PropertyType objects
- ext_cols – the column name for external links
Returns:
-
static
make_structures_for_smiles
(ligand_dict)[source]¶ Make structure figures from SMILES strings. All image files are written to IMG_DIR.
Parameters: ligand_dict – ligand dictionary which keeps all ligand information Returns:
-
static
parse_lig_file
(in_file, identifier)[source]¶ Parse the ligand file and return a dictionary keyed by identifier.
Parameters: - in_file – path of the input file
- identifier – name of the identifier column
Returns: a dictionary with ligand information
-
run_rapidnj
(distance_file)[source]¶ Run the rapidnj program on distance_file.
Parameters: distance_file – path of the distance file Returns: Newick string
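A minimal sketch of how such a call could look with subprocess. It is not the treebuild implementation: it assumes the executable is on PATH under the name rapidnj, that the distance file is a PHYLIP distance matrix (selected with -i pd), and that the Newick tree is written to standard output. The function names are hypothetical.

```python
import shutil
import subprocess


def rapidnj_command(distance_file, executable="rapidnj"):
    """Build the argument list; '-i pd' marks the input as a PHYLIP distance matrix."""
    return [executable, distance_file, "-i", "pd"]


def run_rapidnj_sketch(distance_file, executable="rapidnj"):
    """Run RapidNJ and return the Newick string read from standard output."""
    if shutil.which(executable) is None:
        raise FileNotFoundError("executable not found on PATH: %s" % executable)
    result = subprocess.run(
        rapidnj_command(distance_file, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems when the file path contains spaces.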
treebuild.types module¶
-
class
treebuild.types.
FingerPrintType
(name, fp_func, metadata)[source]¶ Represents a fingerprint type.
-
class
treebuild.types.
PropertyType
(name, metadata, transfunc=None, colname=None)[source]¶ Represents a biological or chemical property.
-
gen_property
(mol_dict=None)[source]¶ generate value for this property type
Parameters: - prop_name – the name of the property
- mol_dict – other information about the molecule
Returns: a generated value for this property
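PropertyType accepts an optional transfunc. Because potency values are assumed to be in nM, one plausible transform is the conversion from nM to pIC50. The sketch below is illustrative only: the name nm_to_pic50 is hypothetical, and whether treebuild calls transfunc with a single numeric value is an assumption.

```python
import math


def nm_to_pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50, i.e. -log10 of the IC50 in mol/L."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


print(nm_to_pic50(1000.0))  # 1000 nM = 1 uM
```

A log-scale value like this is often easier to map to colors or sizes than the raw nM value, which can span several orders of magnitude.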
-
treebuild.util module¶
Provide utility functions
-
treebuild.util.
AddNewChild
(contents, a_node, new_node_name, edge_length, children, currentlist)[source]¶ Add a new child to a node.
Parameters: - contents – a string, a line from DOT
- a_node – a node object
- new_node_name – new node name
- edge_length – the length of edge
- children – existing children
- currentlist – current list of node name
Returns: void
-
treebuild.util.
CleanAttribute
(attr)[source]¶ Clean attribute, remove ‘,’.
Parameters: attr – old attribute string Returns: new string
-
treebuild.util.
ConvertToFloat
(line, colnam_list)[source]¶ Convert the columns listed in colnam_list to float, rounded to 3 decimal places.
Parameters: - line – a dictionary from DictReader.
- colnam_list – float columns
Returns: a new dictionary
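The conversion described above can be sketched as follows. This is an illustration of the documented behavior, not the library source; how missing or non-numeric values are handled in treebuild is not documented, so this version simply lets float() raise.

```python
def convert_to_float(line, colnam_list):
    """Return a copy of `line` with the named columns converted to float, rounded to 3 decimals."""
    converted = dict(line)  # leave the caller's dictionary untouched
    for col in colnam_list:
        converted[col] = round(float(converted[col]), 3)
    return converted


row = {"id": "L1", "IC50": "12.34567", "MW": "180.1"}
print(convert_to_float(row, ["IC50", "MW"]))
```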
-
treebuild.util.
Dot2Dict
(dotfile, moldict)[source]¶ Read a DOT file to generate a tree and save it to a dictionary.
Parameters: - dotfile – DOT file name
- moldict – a dictionary with ligand information
Returns: a dictionary with the tree
-
treebuild.util.
GetAttributeValue
(attrname, attr)[source]¶ Get node attribute.
Parameters: - attrname – name of the attribute
- attr – the attribute string
Returns: the value for the attribute
-
treebuild.util.
GetNodeProperty
(line)[source]¶ Get node property from a string.
Parameters: line – a string Returns: name, size, and position of the node
-
treebuild.util.
GetRoot
(dotfile, rootname)[source]¶ Return the root node named rootname.
Parameters: - dotfile – DOT file
- rootname – the name of the root
Returns: the object of the root
-
treebuild.util.
GuessByFirstLine
(firstline)[source]¶ Guess the number of columns with floats by the first line of the file
Parameters: firstline – Returns:
-
treebuild.util.
IsEdge
(line)[source]¶ Whether this line in DOT file is an edge.
Parameters: line – a string line in DOT file Returns: True or False
-
treebuild.util.
NameAndAttribute
(line)[source]¶ Split name and attribute.
Parameters: line – DOT file name Returns: name string and attribute string
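To illustrate the kind of line-level DOT parsing that IsEdge and NameAndAttribute describe, here is a minimal sketch. It is not the treebuild code: it assumes one statement per line with an optional attribute list in square brackets, and relies on the DOT convention that edges use '--' (undirected) or '->' (directed).

```python
def is_edge(line):
    """True if a DOT statement line describes an edge."""
    statement = line.split("[", 1)[0]  # ignore anything inside the attribute list
    return "--" in statement or "->" in statement


def name_and_attribute(line):
    """Split 'name [attrs];' into (name, attrs); attrs is '' when no list is present."""
    text = line.strip().rstrip(";").strip()
    if "[" in text and text.endswith("]"):
        name, attrs = text.split("[", 1)
        return name.strip(), attrs[:-1].strip()
    return text, ""


print(is_edge("n1 -- n2 [len=1.2];"))
print(name_and_attribute('n1 [label="a", width=0.5];'))
```

Checking only the part before the attribute list keeps a label such as "a->b" from being mistaken for an edge.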
-
class
treebuild.util.
Node
(name, **attr)[source]¶ Bases:
dict
Class for a tree node; each node can have only one parent.
-
get_dist
(a_node)[source]¶ get the node as a dictionary.
Parameters: a_node – Node object Returns: a dictionary
-
-
treebuild.util.
NodeByName
(name, contents)[source]¶ Create node with name name.
Parameters: - name – a string with node name
- contents – a list of string from DOT file
Returns: node object
-
treebuild.util.
NodeNameExist
(line)[source]¶ Functions for parsing DOT file.
Parameters: line – a line from DOT file Returns: whether there is a node name in this line
-
treebuild.util.
ParseLigandFile
(infile, identifier)[source]¶ Parse the ligand file into a dictionary whose keys are ligand IDs and whose values are dictionaries mapping property names to property values. The type of each column is guessed from the first row, assuming there are only two types of data: number and string.
Parameters: - infile – input filename
- identifier – the identifier column name
Returns: a dictionary
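The first-row type guessing described above can be sketched as follows. This is an illustration under assumptions, not the library source: the comma delimiter and the function names are hypothetical, and the sketch reads from an open file handle rather than a filename.

```python
import csv
import io


def _is_number(text):
    """True if `text` can be parsed as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def parse_ligand_file(handle, identifier, delimiter=","):
    """Return {ligand id: {column: value}}, with numeric columns converted to float."""
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    if not rows:
        return {}
    # Guess column types from the first row only: number or string.
    numeric_cols = [col for col, value in rows[0].items()
                    if col != identifier and _is_number(value)]
    ligands = {}
    for row in rows:
        record = dict(row)
        for col in numeric_cols:
            record[col] = float(record[col])
        ligands[row[identifier]] = record
    return ligands


data = io.StringIO("id,Canonical_Smiles,IC50\nL1,CCO,12.5\nL2,CCN,3\n")
print(parse_ligand_file(data, "id"))
```

Because the guess uses only the first row, a later row with a non-numeric entry in a column guessed as numeric makes float() raise ValueError in this sketch.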
-
treebuild.util.
ProcessName
(name, isedge)[source]¶ Process the name of the node.
Parameters: - name – name of the node
- isedge – whether this is an edge
Returns: new name
-
treebuild.util.
RecursiveNode2Dict
(node, info_dict)[source]¶ Recursively populate information to the tree object with info_dict.
Parameters: - node – tree object with all info
- info_dict – information for each ligand.
Returns: a tree dictionary
-
treebuild.util.
RemoveBackSlash
(dotfile)[source]¶ Rewrite the DOT file with its backslashes removed.
Parameters: dotfile – DOT file name Returns: void
-
treebuild.util.
SelectColumn
(lig_dict, colname)[source]¶ Prune the dictionary so that only the attributes in colname are kept.
Parameters: - lig_dict – a tree like dictionary
- colname – what attribute you want to keep.
Returns: a new dictionary
-
treebuild.util.
SizeScale
(size)[source]¶ Rescale the size (currently only convert to float).
Parameters: size – a string Returns: a float
-
treebuild.util.
ToFPObj
(alist, fp_func)[source]¶ Convert a list of (id, smiles) items into a list of (id, fp_obj) items.
Parameters: - alist – a list of two-element items; the first element is the ligand name, the second is the SMILES string
- fp_func – the fingerprint function
Returns: a list of two-element items; the first element is the ligand name, the second is a fingerprint object.
-
treebuild.util.
WriteAsPHYLIPFormat
(smile_list, fp_func)[source]¶ Prepare the input for RapidNJ.
Parameters: - smile_list – a list of SMILES strings
- fp_func – the fingerprint function
Returns: the filename with PHYLIP format (input for rapidnj)
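For orientation, a PHYLIP distance matrix starts with the number of entries, followed by one row per entry holding its name and its distances to every entry. The sketch below builds such text from fingerprints represented as plain Python sets of on-bits. It is not the treebuild code: the use of Tanimoto distance, the set representation, the 10-character name padding, and the function names are all assumptions made for illustration.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def phylip_distance_matrix(named_fps):
    """Format [(name, fp_set), ...] as square PHYLIP distance-matrix text."""
    lines = [str(len(named_fps))]
    for name, fp in named_fps:
        dists = ["%.4f" % tanimoto_distance(fp, other) for _, other in named_fps]
        lines.append("%-10s %s" % (name, " ".join(dists)))
    return "\n".join(lines) + "\n"


ligands = [("L1", {1, 2, 3}), ("L2", {2, 3, 4})]
print(phylip_distance_matrix(ligands))
```

In a real run the sets would be replaced by the objects returned by the chosen fingerprint function, with a matching similarity measure.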
-
treebuild.util.
WriteDotFile
(newick)[source]¶ Write a Newick string to a DOT file.
Parameters: newick – a string with newick tree structure Returns: DOT file name
-
treebuild.util.
WriteJSON
(dict_obj, outfile, write_type)[source]¶ Dump json object to a file.
Parameters: - dict_obj – dictionary object
- outfile – output file name
- write_type – append or rewrite (‘a’ or ‘w’)
Returns: void
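A minimal sketch of the documented behavior using the standard json module. It is not the library source; the function name write_json and the validation of write_type are assumptions.

```python
import json
import os
import tempfile


def write_json(dict_obj, outfile, write_type="w"):
    """Dump `dict_obj` as JSON to `outfile`; 'w' rewrites the file, 'a' appends to it."""
    if write_type not in ("a", "w"):
        raise ValueError("write_type must be 'a' or 'w'")
    with open(outfile, write_type) as handle:
        json.dump(dict_obj, handle)


# Write a small tree dictionary to a temporary file and read it back.
path = os.path.join(tempfile.mkdtemp(), "tree.json")
write_json({"name": "root", "children": []}, path, "w")
with open(path) as handle:
    loaded = json.load(handle)
print(loaded)
```

Note that appending a second object with 'a' yields a file that is no longer a single valid JSON document, so a reader has to split the objects itself.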