API · GCIdentifier.jl

Index

GCIdentifier.GCPair
GCIdentifier.find_missing_groups_from_smiles
GCIdentifier.get_grouplist
GCIdentifier.get_groups_from_name
GCIdentifier.get_groups_from_smiles
GCIdentifier.group_replace
GCIdentifier.@gcstring_str

types and methods

GCIdentifier.GCPair — Type

GCPair(smarts,name;group_order = 1)

Struct used to hold the description of a group. It contains the SMARTS string needed to match the group within a SMILES query, and the assigned name. The group_order parameter is used for groups that follow a Constantinou-Gani approach: the list of GCPair with group_order = 1 is matched with strict coverage (matching fails if any atoms are left uncovered), while groups of second order and above are not strictly checked for total coverage. Groups of each order are matched independently.
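As a minimal sketch of the constructor described above, the snippet below builds a small custom group list. The first-order SMARTS patterns follow the style of the generated groups shown elsewhere on this page; the second-order "CH2OH" pattern and its name are illustrative assumptions, not groups shipped with the package. The type is referred to by its qualified name in case it is not exported.

```julia
using GCIdentifier

# First-order groups (default group_order = 1): matched with strict coverage.
first_order = [
    GCIdentifier.GCPair("[CX4;H3;!R]", "CH3"),
    GCIdentifier.GCPair("[CX4;H2;!R]", "CH2"),
    GCIdentifier.GCPair("[OX2;H1;!R]", "OH"),
]

# A hypothetical second-order group: not strictly checked for total coverage.
second_order = GCIdentifier.GCPair("[CX4;H2;!R][OX2;H1;!R]", "CH2OH", group_order = 2)

# Both orders live in the same list; each order is matched independently.
groups = vcat(first_order, second_order)
```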


GCIdentifier.get_groups_from_smiles — Function

get_groups_from_smiles(smiles::String,groups;connectivity = false,check = true)

Given a SMILES string and a group list (groups::Vector{GCPair}), returns a list of groups and their corresponding counts.

If connectivity is true, it additionally returns a vector containing the number of bonds between each pair of groups.

Examples

julia> get_groups_from_smiles("CCO",UNIFACGroups)
("CCO", ["CH3" => 1, "CH2" => 1, "OH(P)" => 1])

julia> get_groups_from_smiles("CCO",JobackGroups,connectivity = true)
("CCO", ["-CH3" => 1, "-CH2-" => 1, "-OH (alcohol)" => 1], [("-CH3", "-CH2-") => 1, ("-CH2-", "-OH (alcohol)") => 1])
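Because the result is a tuple, it can be destructured directly. The following sketch assumes the return shape shown in the examples above (the SMILES string, a vector of group => count pairs and, with connectivity = true, a vector of bond pairs); the variable names are only illustrative.

```julia
using GCIdentifier

# Unpack the three-element tuple returned when connectivity = true.
smiles, groups, bonds = get_groups_from_smiles("CCO", JobackGroups, connectivity = true)

# Vectors of pairs convert directly to dictionaries for lookup by name.
counts = Dict(groups)
bond_counts = Dict(bonds)

println(counts["-CH3"])                      # number of -CH3 groups
println(bond_counts[("-CH3", "-CH2-")])      # bonds between -CH3 and -CH2-
```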


GCIdentifier.get_groups_from_name — Function

get_groups_from_name(name::String,groups;connectivity = false)

Given a molecule name and a group list (groups::Vector{GCPair}), returns a list of groups and their corresponding counts.

If connectivity is true, it additionally returns a vector containing the number of bonds between each pair of groups.

Note: Can only be used if the ChemicalIdentifiers package is also installed and loaded (using ChemicalIdentifiers).

Examples

julia> get_groups_from_name("ethanol",UNIFACGroups)
("ethanol", ["CH3" => 1, "CH2" => 1, "OH(P)" => 1])

julia> get_groups_from_name("ethanol",JobackGroups,connectivity = true)
("ethanol", ["-CH3" => 1, "-CH2-" => 1, "-OH (alcohol)" => 1], [("-CH3", "-CH2-") => 1, ("-CH2-", "-OH (alcohol)") => 1])


GCIdentifier.find_missing_groups_from_smiles — Function

find_missing_groups_from_smiles(smiles::String, groups;max_group_size = nothing, environment=false, reduced=false)

Given a SMILES string and a group list (groups::Vector{GCPair}), returns a list of potential groups (new_groups::Vector{GCPair}) which could cover those atoms not covered within groups. If no groups vector is provided, it will simply generate all possible groups for the molecule.

A set of heuristics is built into the code for combining heavy atoms into larger groups:

Two bonded carbon atoms are not combined into a group, unless exactly one of them is in a ring.
All other combinations of atoms are allowed.

The first heuristic reflects the fact that neighbouring atoms with similar electronegativities have little impact on each other's properties, so they are not combined into a group. In the future, this approach could be extended to use HNMR data to determine which atoms can be combined into the same group.

Optional arguments:

max_group_size::Int: The maximum number of atoms within a group to be generated. If nothing, the maximum size is however many atoms a central atom is bonded to.
environment::Bool: If true, each group's SMARTS will include information about the environment the group is in. For example, in pentane, if environment is false, there will only be one CH2 group, whereas, if environment is true, there will be two CH2 groups: one bonded to CH3 and one bonded to another CH2.
reduced::Bool: If true, only the minimum number of groups required to represent the molecule, given max_group_size, is generated. If false, all possible groups are generated.

Example

julia> find_missing_groups_from_smiles("CC(=O)O")
7-element Vector{GCIdentifier.GCPair}:
 GCIdentifier.GCPair("[CX4;H3;!R]", "CH3")
 GCIdentifier.GCPair("[CX3;H0;!R]", "C=")
 GCIdentifier.GCPair("[OX1;H0;!R]", "O=")
 GCIdentifier.GCPair("[OX2;H1;!R]", "OH")
 GCIdentifier.GCPair("[CX3;H0;!R](=[OX1;H0;!R])", "C=O=")
 GCIdentifier.GCPair("[CX3;H0;!R]([OX2;H1;!R])", "C=OH")
 GCIdentifier.GCPair("[CX3;H0;!R](=[OX1;H0;!R])([OX2;H1;!R])", "C=O=OH")
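The effect of the environment option can be explored with the pentane case described above. This is only a sketch: it assumes the keyword can be passed without a groups vector, as in the previous example, and it does not state the exact groups returned.

```julia
using GCIdentifier

# Pentane: CH3-CH2-CH2-CH2-CH3
plain = find_missing_groups_from_smiles("CCCCC")
with_env = find_missing_groups_from_smiles("CCCCC", environment = true)

# Compare how many candidate groups each setting proposes.
println(length(plain), " candidate groups without environment information")
println(length(with_env), " candidate groups with environment information")
```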


GCIdentifier.get_grouplist — Function

get_grouplist(x)

Should return a Vector{GCPair} containing the available groups for SMILES matching.


GCIdentifier.@gcstring_str — Macro

@gcstring_str(str)

Given a string of the form "Group1:n1;Group2:n2", returns ["Group1" => n1,"Group2" => n2].
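A short usage sketch, assuming the format described above. The macro is invoked through its qualified name in case it is not exported; the group names and counts (a UNIFAC-style description of hexane) are only illustrative.

```julia
using GCIdentifier

# Explicit macro-call form with the module-qualified name.
groups = GCIdentifier.@gcstring_str("CH3:2;CH2:4")

# Each entry is a name => count pair.
for (name, n) in groups
    println(name, " appears ", n, " time(s)")
end
```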


GCIdentifier.group_replace — Function

group_replace(grouplist,keys...)

Given a group list generated by get_groups_from_smiles, replaces certain groups in grouplist with the values specified in keys.

Examples

groups1 = get_groups_from_smiles("CCO", UNIFACGroups) #["CH3" => 1, "CH2" => 1, "OH(P)" => 1]
# we replace each "OH(P)" group with 1 "OH" group
# and each "CH3" group with 1 "C" group and 3 "H" groups
groups2 = group_replace(groups1[2],"OH(P)" => ("OH" => 1), "CH3" => [("C" => 1),("H" => 3)])
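To make the replacement semantics concrete, here is a self-contained sketch in plain Julia. The function replace_groups_sketch is hypothetical and is not the library's implementation; it assumes that "each" in the comments above means the replacement counts are multiplied by the original count, and it does not merge duplicate names.

```julia
# Hypothetical re-implementation for illustration only (not GCIdentifier's code).
function replace_groups_sketch(grouplist, rules...)
    # Normalise every rule to a vector of name => count pairs.
    lookup = Dict{String,Vector{Pair{String,Int}}}()
    for (old, new) in rules
        lookup[old] = new isa Pair ? [new] : collect(new)
    end
    result = Pair{String,Int}[]
    for (name, n) in grouplist
        if haskey(lookup, name)
            # Assumed semantics: scale each replacement by the original count.
            for (newname, k) in lookup[name]
                push!(result, newname => n * k)
            end
        else
            push!(result, name => n)
        end
    end
    return result
end

original = ["CH3" => 1, "CH2" => 1, "OH(P)" => 1]
replaced = replace_groups_sketch(original,
    "OH(P)" => ("OH" => 1),
    "CH3" => [("C" => 1), ("H" => 3)])
println(replaced)  # ["C" => 1, "H" => 3, "CH2" => 1, "OH" => 1]
```

The actual group_replace may order or merge the resulting pairs differently, so the output of the real function should be checked rather than inferred from this sketch.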

