treebuild package

Submodules

treebuild.tree_build module

class treebuild.tree_build.TreeBuild(input_file, output_file, id_column, fps, properties)[source]

The input file must follow these assumptions:

  1. potency values (e.g. IC50/Ka/Ki) are in nM
  2. the file must have an ID column; set its name with id_column
  3. the file must have a SMILES column named ‘Canonical_Smiles’
  4. the file must have at least one potency column (IC50/Ka/Ki)

To build the tree:

  1. the ID column needs to be specified with id_column
  2. a list of fingerprints (fps) and a list of properties (properties) need to be specified using RDKit
  3. the paths of the input and output files need to be specified
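
The input-file assumptions above can be checked before building a tree. The following is a minimal sketch, not part of treebuild: the tab delimiter, the "ID" column name, the exact potency column names, and the helper check_columns are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical example input; the tab delimiter and the "ID" and "IC50"
# column names are assumptions made for this sketch.
EXAMPLE_INPUT = (
    "ID\tCanonical_Smiles\tIC50\n"
    "lig1\tCCO\t12.5\n"
    "lig2\tc1ccccc1\t340\n"
)

# Assumed potency column names, taken from the IC50/Ka/Ki wording above.
POTENCY_COLUMNS = ("IC50", "Ka", "Ki")


def check_columns(handle, id_column):
    """Return a list of problems found in the header of a delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = []
    if id_column not in header:
        problems.append("missing id column: " + id_column)
    if "Canonical_Smiles" not in header:
        problems.append("missing SMILES column: Canonical_Smiles")
    if not any(name in header for name in POTENCY_COLUMNS):
        problems.append("missing potency column (IC50/Ka/Ki)")
    return problems


problems = check_columns(io.StringIO(EXAMPLE_INPUT), id_column="ID")
print(problems)
```

An empty list means the header satisfies the three column requirements; the nM unit cannot be verified from the file itself.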
static dot2dict(dot_outfile)[source]
static gen_dist_file(liganddict, fp_func)[source]

Generate the distance file that serves as input to the rapidnj program.

Parameters:
  • liganddict – ligand information
  • fp_func – fingerprint function
Returns:

filename for distance file

static gen_properties(ligand_dict, activities, properties, ext_cols)[source]

Generate properties for each molecule.

Parameters:
  • ligand_dict – ligand dictionary that keeps all ligand information
  • activities – a list of PropertyType objects
  • properties – a list of PropertyType objects
  • ext_cols – the column name for external links
Returns:

static make_structures_for_smiles(ligand_dict)[source]

Make structure figures from SMILES strings. All image files are written to IMG_DIR.

Parameters:ligand_dict – ligand dictionary that keeps all ligand information
Returns:
static parse_lig_file(in_file, identifier)[source]

Parse the ligand file and return a dictionary keyed by identifier.

Parameters:
  • in_file – path of the input file
  • identifier – name of the identifier column
Returns:

a dictionary with ligand information

run_rapidnj(distance_file)[source]

Run the rapidnj program on distance_file.

Parameters:distance_file – path of the distance file
Returns:newick string
sfdp_dot(dot_infile, size)[source]

Run sfdp on the DOT file.

Parameters:
  • dot_infile – path of the DOT file
  • size – size parameter passed to sfdp
Returns:

new filename

static write_dotfile(newick)[source]

write newick string as dot file

Parameters:newick – newick string
Returns:dot file

treebuild.types module

class treebuild.types.FingerPrintType(name, fp_func, metadata)[source]

representing fingerprint types

to_dict()[source]

Show the information for this fingerprint

Returns:dictionary with basic info
class treebuild.types.PropertyType(name, metadata, transfunc=None, colname=None)[source]

representing biological or chemical properties

gen_property(mol_dict=None)[source]

generate value for this property type

Parameters:
  • prop_name – the name of the property
  • mol_dict – other information about the molecule
Returns:

a generated value for this property
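
Because potency values are assumed to be in nM, a common transformation is the conversion to pIC50. The sketch below shows such a function; passing it as transfunc is an assumption about how PropertyType uses that argument, and nm_to_pic50 is a hypothetical name.

```python
import math


def nm_to_pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 (-log10 of the molar concentration)."""
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


# Hypothetical molecule record keyed by column name.
mol = {"IC50": 100.0}
pic50 = nm_to_pic50(mol["IC50"])
print(pic50)
```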

set_col_name(col_name)[source]

Set the property name from the input file

Parameters:col_name – original column name in the input file
Returns:
to_dict()[source]

Show the information

Returns:dictionary with basic info

treebuild.util module

Provide utility functions

treebuild.util.AddNewChild(contents, a_node, new_node_name, edge_length, children, currentlist)[source]

Add a new child to a node.

Parameters:
  • contents – a string, a line from DOT
  • a_node – a node object
  • new_node_name – new node name
  • edge_length – the length of edge
  • children – existing children
  • currentlist – current list of node name
Returns:

void

treebuild.util.CleanAttribute(attr)[source]

Clean attribute, remove ‘,’.

Parameters:attr – old attribute string
Returns:new string
treebuild.util.ConvertToFloat(line, colnam_list)[source]

Convert the columns listed in colnam_list to float, rounded to 3 decimal places.

Parameters:
  • line – a dictionary from DictReader.
  • colnam_list – float columns
Returns:

a new dictionary
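
The conversion described above might look like the following sketch. It is an illustration of the documented behavior, not the module's code; convert_to_float is a hypothetical name.

```python
def convert_to_float(row, float_columns):
    """Return a copy of row with the named columns as floats rounded to 3 places."""
    converted = dict(row)  # copy, so the caller's dictionary is left unchanged
    for name in float_columns:
        converted[name] = round(float(row[name]), 3)
    return converted


# A row as csv.DictReader would produce it: every value is a string.
row = {"ID": "lig1", "IC50": "12.34567", "MW": "46.07"}
result = convert_to_float(row, ["IC50", "MW"])
print(result)
```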

treebuild.util.Dot2Dict(dotfile, moldict)[source]

Read a DOT file to generate a tree and save it to a dictionary.

Parameters:
  • dotfile – DOT file name
  • moldict – a dictionary with ligand information
Returns:

a dictionary with the tree

treebuild.util.GetAttributeValue(attrname, attr)[source]

Get node attribute.

Parameters:
  • attrname – name of the attribute
  • attr – the attribute string
Returns:

the value for the attribute
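
A DOT attribute string is a comma-separated list of name=value pairs, where values may be quoted. The sketch below extracts one value with a regular expression. It is an assumption about what such a helper could do, not the module's implementation, and get_attribute_value is a hypothetical name.

```python
import re


def get_attribute_value(attrname, attr):
    """Return the value of attrname in a DOT attribute string, or None."""
    # Match either a double-quoted value or a bare token up to a delimiter.
    pattern = r'\b' + re.escape(attrname) + r'\s*=\s*("([^"]*)"|[^,\s\]]+)'
    match = re.search(pattern, attr)
    if match is None:
        return None
    quoted = match.group(2)
    return quoted if quoted is not None else match.group(1)


attr = 'width="0.25", height=0.25, pos="10.5,20.1"'
pos = get_attribute_value("pos", attr)
height = get_attribute_value("height", attr)
print(pos, height)
```

Quoted values are returned without their quotes, so a pos value keeps its internal comma.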

treebuild.util.GetNodeProperty(line)[source]

Get node property from a string.

Parameters:line – a string
Returns:name, size, and position of the node
treebuild.util.GetRoot(dotfile, rootname)[source]

Return root name with rootname.

Parameters:
  • dotfile – DOT file
  • rootname – the name of the root
Returns:

the object of the root

treebuild.util.GetSize(width)[source]

Get the size.

Parameters:width
Returns:
treebuild.util.GuessByFirstLine(firstline)[source]

Guess the number of columns with floats by the first line of the file

Parameters:firstline
Returns:
treebuild.util.IsEdge(line)[source]

Whether this line in DOT file is an edge.

Parameters:line – a string line in DOT file
Returns:True or False
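
In the DOT language, edge statements use "--" in undirected graphs and "->" in directed graphs. A simple check along those lines is sketched here; the helper names are hypothetical, and the sketch assumes the operators do not occur inside quoted labels.

```python
def is_edge(line):
    """Return True if a DOT line contains an edge operator."""
    return "--" in line or "->" in line


def edge_endpoints(line):
    """Split an edge statement into its two node names."""
    # Drop the attribute list and the trailing semicolon, if present.
    statement = line.split("[", 1)[0].strip().rstrip(";")
    operator = "--" if "--" in statement else "->"
    left, right = statement.split(operator, 1)
    return left.strip(), right.strip()


edge_line = "  n1 -- n2 [len=0.5];"
node_line = "  n1 [width=0.3];"
print(is_edge(edge_line), is_edge(node_line), edge_endpoints(edge_line))
```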
treebuild.util.NameAndAttribute(line)[source]

Split name and attribute.

Parameters:line – a line from the DOT file
Returns:name string and attribute string
class treebuild.util.Node(name, **attr)[source]

Bases: dict

class for node of tree, each node can only have one parent

add_child(a_node)[source]

Add child to the node.

Parameters:a_node – Node object
Returns:void
get_dist(a_node)[source]

get the node as a dictionary.

Parameters:a_node – Node object
Returns:a dictionary
set_dist(dist)[source]

set the dictionary attribute for the Node object.

Parameters:dist
Returns:
set_parent(a_node)[source]

Set the parent for a node.

Parameters:a_node – Node object
Returns:void
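
A dictionary-backed node with a single parent can be sketched as follows. The class TreeNode, its "children" key, and the error raised on a second parent are assumptions for illustration; the actual Node class may store its data differently.

```python
class TreeNode(dict):
    """Dictionary-backed tree node that allows at most one parent."""

    def __init__(self, name, **attr):
        super().__init__(name=name, children=[], **attr)
        self.parent = None

    def set_parent(self, a_node):
        # Refuse a second, different parent to keep the structure a tree.
        if self.parent is not None and self.parent is not a_node:
            raise ValueError("node %r already has a parent" % self["name"])
        self.parent = a_node

    def add_child(self, a_node):
        a_node.set_parent(self)
        self["children"].append(a_node)


root = TreeNode("root")
leaf = TreeNode("lig1", size=0.3)
root.add_child(leaf)
print(root)
```

Keeping the parent as a plain attribute rather than a dictionary key avoids a reference cycle inside the dictionary, so the node can still be serialized to JSON.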
treebuild.util.NodeByName(name, contents)[source]

Create a node with the given name.

Parameters:
  • name – a string with node name
  • contents – a list of string from DOT file
Returns:

node object

treebuild.util.NodeNameExist(line)[source]

Functions for parsing DOT file.

Parameters:line – a line from DOT file
Returns:whether there is a node name in this line
treebuild.util.ParseLigandFile(infile, identifier)[source]

Parse the ligand file into a dictionary whose keys are ligand IDs and whose values are dictionaries mapping property names to property values. The type of each column is guessed from the first row, assuming only two types of data: number and string.

Parameters:
  • infile – input filename
  • identifier – the identifier column name
Returns:

a dictionary
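
The first-row type guessing described above can be sketched like this. The tab delimiter and the function names are assumptions, and the sketch leaves the identifier column as a string even when it looks numeric.

```python
import csv
import io


def guess_float_columns(first_row):
    """Return the names of columns whose first value parses as a float."""
    names = []
    for name, value in first_row.items():
        try:
            float(value)
        except (TypeError, ValueError):
            continue
        names.append(name)
    return names


def parse_ligands(handle, identifier, delimiter="\t"):
    """Read a delimited ligand table into a dictionary keyed by identifier."""
    ligands = {}
    float_columns = None
    for row in csv.DictReader(handle, delimiter=delimiter):
        if float_columns is None:
            # Column types are decided once, from the first data row.
            float_columns = [n for n in guess_float_columns(row) if n != identifier]
        for name in float_columns:
            row[name] = float(row[name])
        ligands[row[identifier]] = row
    return ligands


text = "ID\tCanonical_Smiles\tIC50\nlig1\tCCO\t12.5\nlig2\tc1ccccc1\t340\n"
ligands = parse_ligands(io.StringIO(text), "ID")
print(ligands["lig2"]["IC50"])
```

Because the guess relies on the first row only, a later row with a non-numeric value in a numeric column raises ValueError in this sketch.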

treebuild.util.ProcessName(name, isedge)[source]

Process the name of the node.

Parameters:
  • name – name of the node
  • isedge – whether this is an edge
Returns:

new name

treebuild.util.RecursiveNode2Dict(node, info_dict)[source]

Recursively populate information to the tree object with info_dict.

Parameters:
  • node – tree object with all info
  • info_dict – information for each ligand.
Returns:

a tree dictionary
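
Recursively merging per-ligand information into a tree can be sketched with plain dictionaries. The "name" and "children" keys and the function node_to_dict are assumptions for illustration, not the module's data layout.

```python
def node_to_dict(node, info_dict):
    """Return a nested dictionary for node, enriched with ligand information."""
    result = {"name": node["name"]}
    # Internal nodes usually have no entry in info_dict, hence the default.
    result.update(info_dict.get(node["name"], {}))
    children = node.get("children", [])
    if children:
        result["children"] = [node_to_dict(child, info_dict) for child in children]
    return result


tree = {"name": "n0", "children": [{"name": "lig1"}, {"name": "lig2"}]}
info = {"lig1": {"IC50": 12.5}, "lig2": {"IC50": 340.0}}
nested = node_to_dict(tree, info)
print(nested)
```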

treebuild.util.RemoveBackSlash(dotfile)[source]

Rewrite the DOT file with its backslashes removed.

Parameters:dotfile – DOT file name
Returns:void
treebuild.util.SelectColumn(lig_dict, colname)[source]

Prune the dictionary so that only the attributes in colname are left.

Parameters:
  • lig_dict – a tree like dictionary
  • colname – what attribute you want to keep.
Returns:

a new dictionary
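
Pruning a tree-like dictionary to a few attributes can be sketched as below. That "name" and "children" are structural keys that must survive pruning is an assumption, as is the function name select_columns.

```python
def select_columns(tree, keep):
    """Return a copy of tree that keeps only the attributes listed in keep."""
    kept = {key: value for key, value in tree.items()
            if key in keep or key == "name"}
    if "children" in tree:
        kept["children"] = [select_columns(child, keep) for child in tree["children"]]
    return kept


full = {
    "name": "n0",
    "children": [{"name": "lig1", "IC50": 12.5, "MW": 46.07}],
}
pruned = select_columns(full, ["IC50"])
print(pruned)
```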

treebuild.util.SizeScale(size)[source]

Rescale the size (currently only convert to float).

Parameters:size – a string
Returns:a float
treebuild.util.ToFPObj(alist, fp_func)[source]

Convert a list of (id, smiles) items to a list of (id, fp_obj) items.

Parameters:
  • alist – list of two-element items; the first element is the ligand name, the second is the SMILES string
  • fp_func – the fingerprint function
Returns:

list of two-element items; the first element is the ligand name, the second is a fingerprint object.

treebuild.util.WriteAsPHYLIPFormat(smile_list, fp_func)[source]

Prepare the input for RapidNJ.

Parameters:
  • smile_list – a list of SMILES strings
  • fp_func – the fingerprint function
Returns:

the filename with PHYLIP format (input for rapidnj)
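
A PHYLIP distance matrix starts with the number of entries, followed by one row per entry holding its name and its distances to every entry. The sketch below writes such a file from a precomputed matrix. The function name, the ten-character name padding, and the four-decimal formatting are assumptions, not confirmed details of WriteAsPHYLIPFormat.

```python
import os
import tempfile


def write_phylip(names, matrix, path):
    """Write a square distance matrix in PHYLIP layout and return the path."""
    with open(path, "w") as handle:
        handle.write("%d\n" % len(names))
        for name, row in zip(names, matrix):
            cells = " ".join("%.4f" % value for value in row)
            # Names are padded to ten characters, as in the classic layout.
            handle.write("%s %s\n" % (name.ljust(10), cells))
    return path


names = ["lig1", "lig2", "lig3"]
matrix = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
path = write_phylip(names, matrix, os.path.join(tempfile.mkdtemp(), "ligands.phylip"))
with open(path) as handle:
    lines = handle.read().splitlines()
print(lines)
```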

treebuild.util.WriteDotFile(newick)[source]

Write newick string to a DOT file

Parameters:newick – a string with newick tree structure
Returns:DOT file name
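
Converting a Newick string to DOT means walking the nested parentheses and emitting one edge per parent-child pair. The sketch below does this for simple, well-formed input. The generated internal node names, the use of the len edge attribute, and both function names are assumptions; the sketch has no error handling for malformed strings.

```python
import itertools


def newick_to_edges(newick):
    """Return (parent, child, length) tuples for a Newick string."""
    text = newick.strip().rstrip(";")
    counter = itertools.count()
    edges = []
    pos = 0

    def parse_subtree():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_subtree())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # skip ")"
                break
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        # Unnamed internal nodes get a generated name.
        name = text[start:pos].strip().strip("'") or "internal%d" % next(counter)
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        for child_name, child_length in children:
            edges.append((name, child_name, child_length))
        return name, length

    parse_subtree()
    return edges


def edges_to_dot(edges):
    """Render the edges as an undirected DOT graph."""
    lines = ["graph tree {"]
    for parent, child, length in edges:
        attr = "" if length is None else " [len=%g]" % length
        lines.append('  "%s" -- "%s"%s;' % (parent, child, attr))
    lines.append("}")
    return "\n".join(lines)


edges = newick_to_edges("(lig1:0.1,(lig2:0.2,lig3:0.3):0.4);")
dot = edges_to_dot(edges)
print(dot)
```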
treebuild.util.WriteJSON(dict_obj, outfile, write_type)[source]

Dump json object to a file.

Parameters:
  • dict_obj – dictionary object
  • outfile – output file name
  • write_type – append or rewrite (‘a’ or ‘w’)
Returns:

void
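
Writing with ‘w’ replaces the file, while ‘a’ adds to it. The sketch below writes one JSON document per line so that appended output stays parseable; that line-per-document layout and the name write_json are assumptions, not documented behavior of WriteJSON.

```python
import json
import os
import tempfile


def write_json(dict_obj, outfile, write_type="w"):
    """Dump dict_obj to outfile, either rewriting ('w') or appending ('a')."""
    if write_type not in ("a", "w"):
        raise ValueError("write_type must be 'a' or 'w'")
    with open(outfile, write_type) as handle:
        json.dump(dict_obj, handle)
        handle.write("\n")  # one document per line keeps appended files readable


path = os.path.join(tempfile.mkdtemp(), "tree.json")
write_json({"name": "root"}, path, "w")
write_json({"name": "second"}, path, "a")
with open(path) as handle:
    records = [json.loads(line) for line in handle]
print(records)
```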

treebuild.util.extendChildren(a_node, contents, cur_list)[source]

Find all children of a node in a tree.

Parameters:
  • a_node – a node in a tree
  • contents – contents from DOT file
  • cur_list – current children
Returns:

a list of node objects (children)

treebuild.util.getSimilarity(fp1, fp2)[source]

Generate the similarity score for two fingerprints.

Parameters:
  • fp1 – fingerprint object (rdkit)
  • fp2 – fingerprint object (rdkit)
Returns:

Tanimoto similarity
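
The Tanimoto similarity of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. For RDKit bit vectors, DataStructs.TanimotoSimilarity computes this value. The sketch below uses Python sets of on-bit indices instead so that it runs without RDKit. The function names and the value returned for two empty fingerprints are choices made for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(bits_a & bits_b) / union


def tanimoto_distance(bits_a, bits_b):
    """Distance suitable for a distance matrix: 1 minus the similarity."""
    return 1.0 - tanimoto(bits_a, bits_b)


fp1 = {1, 2, 3, 5}
fp2 = {2, 3, 4}
similarity = tanimoto(fp1, fp2)  # 2 shared bits out of 5 distinct bits
print(similarity, tanimoto_distance(fp1, fp2))
```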