Contents
Index
GCIdentifier.GCPair
GCIdentifier.find_missing_groups_from_smiles
GCIdentifier.get_grouplist
GCIdentifier.get_groups_from_name
GCIdentifier.get_groups_from_smiles
GCIdentifier.group_replace
GCIdentifier.@gcstring_str
types and methods
GCIdentifier.GCPair
— Type
GCPair(smarts,name;group_order = 1)
Struct used to hold a description of a group. Contains the SMARTS string necessary to match the group within a SMILES query, and the assigned name. The group_order parameter is used for groups that follow a Constantinou-Gani approach: the list of GCPair with group_order = 1 will be matched with strict coverage (failing if any atoms are left uncovered), while second-order groups and above will not be strictly checked for total coverage. Each group order is matched independently.
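As a rough illustration of how a mixed-order group list might be assembled, the sketch below builds three first-order groups and one second-order group and matches them against ethanol. The CH3 and OH SMARTS strings follow the patterns shown in the find_missing_groups_from_smiles example on this page; the CH2 and CH2OH entries are illustrative assumptions, not groups shipped with the package.

```julia
using GCIdentifier

# First-order groups (group_order defaults to 1): together they must cover every atom.
first_order = [
    GCIdentifier.GCPair("[CX4;H3;!R]", "CH3"),
    GCIdentifier.GCPair("[CX4;H2;!R]", "CH2"),   # assumed pattern, by analogy with CH3
    GCIdentifier.GCPair("[OX2;H1;!R]", "OH"),
]

# Hypothetical second-order group: matched independently, without a full-coverage check.
second_order = [
    GCIdentifier.GCPair("[CX4;H2;!R][OX2;H1;!R]", "CH2OH"; group_order = 2),
]

my_groups = vcat(first_order, second_order)

# Ethanol should be fully covered by the first-order groups alone.
result = get_groups_from_smiles("CCO", my_groups)
```

The qualified name GCIdentifier.GCPair is used so that the sketch does not depend on whether GCPair is exported.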
GCIdentifier.get_groups_from_smiles
— Function
get_groups_from_smiles(smiles::String,groups;connectivity = false,check = true)
Given a SMILES string and a group list (groups::Vector{GCPair}), returns the SMILES string together with a list of the matched groups and how many times each occurs.
If connectivity is true, it additionally returns a vector containing the number of bonds between each pair of groups.
Examples
julia> get_groups_from_smiles("CCO",UNIFACGroups)
("CCO", ["CH3" => 1, "CH2" => 1, "OH(P)" => 1])
julia> get_groups_from_smiles("CCO",JobackGroups,connectivity = true)
("CCO", ["-CH3" => 1, "-CH2-" => 1, "-OH (alcohol)" => 1], [("-CH3", "-CH2-") => 1, ("-CH2-", "-OH (alcohol)") => 1])
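The returned tuple can be unpacked directly. The following sketch, which assumes only the return shapes shown in the examples above, turns the group list into a Dict for lookup by name and iterates over the connectivity vector.

```julia
using GCIdentifier

# Without connectivity: a 2-tuple of (smiles, group counts).
smiles, groups = get_groups_from_smiles("CCO", UNIFACGroups)
counts = Dict(groups)            # look up a group count by name
n_ch3 = get(counts, "CH3", 0)    # falls back to 0 if the group is absent

# With connectivity: a third element lists the bonds between group pairs.
_, jgroups, bonds = get_groups_from_smiles("CCO", JobackGroups; connectivity = true)
for ((g1, g2), n) in bonds
    println(g1, " <-> ", g2, ": ", n)
end
```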
GCIdentifier.get_groups_from_name
— Function
get_groups_from_name(name::String,groups;connectivity = false)
Given a molecule name and a group list (groups::Vector{GCPair}), returns the name together with a list of the matched groups and how many times each occurs.
If connectivity is true, it additionally returns a vector containing the number of bonds between each pair of groups.
Note: This function can only be used if the ChemicalIdentifiers package is also installed and loaded (using ChemicalIdentifiers).
Examples
julia> get_groups_from_name("ethanol",UNIFACGroups)
("ethanol", ["CH3" => 1, "CH2" => 1, "OH(P)" => 1])
julia> get_groups_from_name("ethanol",JobackGroups,connectivity = true)
("ethanol", ["-CH3" => 1, "-CH2-" => 1, "-OH (alcohol)" => 1], [("-CH3", "-CH2-") => 1, ("-CH2-", "-OH (alcohol)") => 1])
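A short sketch of the loading step mentioned in the note, followed by a consistency check against the SMILES-based route. It assumes that the name lookup for ethanol succeeds in the local ChemicalIdentifiers database, which may need to be available on first use.

```julia
using GCIdentifier
using ChemicalIdentifiers  # required for the name-based lookup

_, by_name = get_groups_from_name("ethanol", UNIFACGroups)
_, by_smiles = get_groups_from_smiles("CCO", UNIFACGroups)

# Both routes describe the same molecule, so the group lists should agree.
same = by_name == by_smiles
```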
GCIdentifier.find_missing_groups_from_smiles
— Function
find_missing_groups_from_smiles(smiles::String, groups;max_group_size = nothing, environment=false, reduced=false)
Given a SMILES string and a group list (groups::Vector{GCPair}), returns a list of potential groups (new_groups::Vector{GCPair}) that could cover the atoms not covered by groups. If no groups vector is provided, it simply generates all possible groups for the molecule.
A set of heuristics is built into the code for combining heavy atoms into larger groups:
- Two bonded carbon atoms are not combined into a group unless exactly one of them is on a ring.
- All other combinations of atoms are allowed.
The first heuristic is motivated by the fact that neighbouring atoms with similar electronegativities have little impact on each other's properties, so they are not combined into a group. In the future, this approach could be extended to use HNMR data to determine which atoms can be combined into the same group.
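The two heuristics can be expressed as a small predicate. The sketch below is standalone Julia and is not the package's implementation; the AtomInfo struct and the can_combine function are hypothetical names introduced only for illustration.

```julia
# Hypothetical minimal atom description: element symbol and ring membership.
struct AtomInfo
    symbol::String
    in_ring::Bool
end

# Decide whether two bonded heavy atoms may be merged into one group,
# following the two heuristics stated above.
function can_combine(a::AtomInfo, b::AtomInfo)
    if a.symbol == "C" && b.symbol == "C"
        # Carbon-carbon pairs are merged only when exactly one carbon is on a ring.
        return xor(a.in_ring, b.in_ring)
    end
    return true  # all other combinations are allowed
end

can_combine(AtomInfo("C", false), AtomInfo("C", false))  # false: both carbons are acyclic
can_combine(AtomInfo("C", true),  AtomInfo("C", false))  # true: only one carbon is on a ring
can_combine(AtomInfo("C", false), AtomInfo("O", false))  # true: carbon-oxygen pair
```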
Optional arguments:
- max_group_size::Int: The maximum number of atoms within a group to be generated. If nothing, the maximum size is however many atoms a central atom is bonded to.
- environment::Bool: If true, the groups' SMARTS will include information about the environment the group is in. For example, in pentane, if environment is false, there will only be one CH2 group, whereas, if environment is true, there will be two CH2 groups, one bonded to CH3 and one bonded to another CH2.
- reduced::Bool: If true, only the minimum number of groups required to represent the molecule, based on max_group_size, will be generated. If false, all possible groups will be generated.
Example
julia> find_missing_groups_from_smiles("CC(=O)O")
7-element Vector{GCIdentifier.GCPair}:
GCIdentifier.GCPair("[CX4;H3;!R]", "CH3")
GCIdentifier.GCPair("[CX3;H0;!R]", "C=")
GCIdentifier.GCPair("[OX1;H0;!R]", "O=")
GCIdentifier.GCPair("[OX2;H1;!R]", "OH")
GCIdentifier.GCPair("[CX3;H0;!R](=[OX1;H0;!R])", "C=O=")
GCIdentifier.GCPair("[CX3;H0;!R]([OX2;H1;!R])", "C=OH")
GCIdentifier.GCPair("[CX3;H0;!R](=[OX1;H0;!R])([OX2;H1;!R])", "C=O=OH")
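The optional arguments can be explored with calls such as the following. This is a usage sketch based only on the signature and argument descriptions above; the number and naming of the generated groups are not shown because they depend on the package version.

```julia
using GCIdentifier

# Pentane without and with environment information in the generated SMARTS.
plain = find_missing_groups_from_smiles("CCCCC")
with_env = find_missing_groups_from_smiles("CCCCC"; environment = true)

# Acetic acid, limited to groups of at most two atoms and to a minimal set.
compact = find_missing_groups_from_smiles("CC(=O)O"; max_group_size = 2, reduced = true)

println(length(plain), " groups without environment, ", length(with_env), " with")
```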
GCIdentifier.get_grouplist
— Function
get_grouplist(x)
Should return a Vector{GCPair} containing the available groups for SMILES matching.
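One way to read this description is as an interface function that custom containers can extend. The sketch below defines a hypothetical MyGroupSet type and adds a get_grouplist method for it; whether the matching functions accept such a container directly is an assumption that is not tested here.

```julia
using GCIdentifier

# Hypothetical container type that carries its own group definitions.
struct MyGroupSet
    pairs::Vector{GCIdentifier.GCPair}
end

# Extend the interface function so a MyGroupSet exposes its groups.
GCIdentifier.get_grouplist(s::MyGroupSet) = s.pairs

my_set = MyGroupSet([
    GCIdentifier.GCPair("[CX4;H3;!R]", "CH3"),
    GCIdentifier.GCPair("[OX2;H1;!R]", "OH"),
])
available = GCIdentifier.get_grouplist(my_set)
```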
GCIdentifier.@gcstring_str
— Macro
@gcstring_str(str)
Given a string of the form "Group1:n1;Group2:n2", returns ["Group1" => n1,"Group2" => n2].
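With GCIdentifier loaded, a macro named @gcstring_str is invoked through Julia's string-macro syntax, for example gcstring"CH3:2;CH2:4". To make the input format concrete, the standalone sketch below parses the same kind of string with plain Julia. It is not the package's implementation; parse_group_string is a hypothetical helper, and it assumes integer counts and group names that contain neither ':' nor ';'.

```julia
# Standalone sketch of the "Name:count;Name:count" format.
function parse_group_string(str::AbstractString)
    pairs = Pair{String,Int}[]
    for entry in split(str, ';'; keepempty = false)
        name, n = split(entry, ':')   # assumes exactly one ':' per entry
        push!(pairs, String(strip(name)) => parse(Int, strip(n)))
    end
    return pairs
end

parse_group_string("CH3:2;CH2:4")  # ["CH3" => 2, "CH2" => 4]
```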
GCIdentifier.group_replace
— Function
group_replace(grouplist,keys...)
Given a group list generated by get_groups_from_smiles, replaces certain groups in grouplist with the values specified in keys.
Examples
groups1 = get_groups_from_smiles("CCO", UNIFACGroups) # groups1[2] is ["CH3" => 1, "CH2" => 1, "OH(P)" => 1]
# Replace each "OH(P)" group with 1 "OH" group,
# and each "CH3" group with 3 "H" groups and 1 "C" group.
groups2 = group_replace(groups1[2],"OH(P)" => ("OH" => 1), "CH3" => [("C" => 1),("H" => 3)])
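The replacement rule described in the comments can be sketched in plain Julia as follows. This is not the package's implementation: replace_groups_sketch is a hypothetical helper, and it assumes that each replacement count is multiplied by the count of the group being replaced, that repeated names are summed, and that output order follows first appearance. The actual ordering returned by group_replace may differ.

```julia
# Standalone sketch of the replacement rule.
function replace_groups_sketch(grouplist, rules...)
    rule_map = Dict{String,Any}(rules...)
    totals = Dict{String,Int}()
    order = String[]                                 # remembers first-appearance order
    for (name, n) in grouplist
        repl = get(rule_map, name, name => 1)        # unchanged groups map to themselves
        repl_list = repl isa Pair ? [repl] : repl    # accept one pair or a list of pairs
        for (new_name, m) in repl_list
            haskey(totals, new_name) || push!(order, new_name)
            totals[new_name] = get(totals, new_name, 0) + n * m
        end
    end
    return [k => totals[k] for k in order]
end

groups1 = ["CH3" => 1, "CH2" => 1, "OH(P)" => 1]
replace_groups_sketch(groups1, "OH(P)" => ("OH" => 1), "CH3" => [("C" => 1), ("H" => 3)])
# ["C" => 1, "H" => 3, "CH2" => 1, "OH" => 1]
```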