Once I had the reactant atom maps, breaking reactions down into edges became a simple matter of generating all pairs of reactants-products and then filtering them based on whether at least one atom of the reactant was mapped to the product:
def _splitReaction(rxn: IReaction, rctMap: Map[IAtom, IAtom]): List[IReaction] = {
val rctPrdPairs: Iterable[(IAtomContainer, IAtomContainer)] = for (
rct <- rxn.getReactants.atomContainers.asScala;
prd <- rxn.getProducts.atomContainers.asScala
) yield (rct, prd)
val mappedPairs: Iterable[(IAtomContainer, IAtomContainer)] = rctPrdPairs.filter({ case (rct, prd) => {
rctMap.isEmpty ||
rct.atoms.asScala.exists(r => rctMap.get(r) match {
case None => false
case Some(null) => false
case Some(p) => p.getContainer == prd
})
} })
mappedPairs.map({ case (rct, prd) => {
val newRxn: IReaction = new Reaction()
newRxn.addReactant(AtomContainerManipulator.copyAndSuppressedHydrogens(rct))
newRxn.addProduct(AtomContainerManipulator.copyAndSuppressedHydrogens(prd))
newRxn.setProperty(CDKConstants.SMILES, sg.create(newRxn))
newRxn
}}).toList
I remember being astonished when I worked this out. The first time I did this in Java it was a LOT harder and took a LOT more lines of code. Part of that was working in Java, but this time I also made much better use of the CDK interface to create a new reaction and clone the reactant and product before adding them to it. I guess I have grown as a programmer in the intervening years! I added the SMILES of the edge to the emitted edge to aid in visualization and debugging later. Later I generating the InChIKey of the reactant and product as well to group identical edges into single records in Spark.
I added the escape valve of an empty reactant map. To preserve information I include all combinations of reactants and products.
I also have to do a check on the image of the reactant atom in the reactant map. The way I constructed the map, the reactant atom could be mapped to None or to Some(null), which caused exceptions when I ran this over the whole NextMove reaction set.
CDK's
IsomorphismTester made checking the correctness of the splitting much easier than going atom-by-atom:
test("it maps all reactants to all products if no map is present") {
val rxn: IReaction = sp.parseReactionSmiles(unmappedEsterSmiles)
Splitter.filterInorganicsFromReaction(rxn)
val rctMap = Splitter.getMaps(rxn)
val edges: List[IReaction] = Splitter._splitReaction(rxn, rctMap)
assert(edges.size == 2)
assert(edges.forall(e => e.getReactantCount == 1 && e.getProductCount == 1))
assert(new IsomorphismTester(edges(0).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(0)))
assert(new IsomorphismTester(edges(1).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(1)))
assert(new IsomorphismTester(edges(0).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
assert(new IsomorphismTester(edges(1).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
}
test("it uses maps to correctly split reactions") {
val rxn: IReaction = sp.parseReactionSmiles(chiralResolution)
Splitter.filterInorganicsFromReaction(rxn)
val rctMap = Splitter.getMaps(rxn)
val edges: List[IReaction] = Splitter._splitReaction(rxn, rctMap)
assert(edges.size == 4)
assert(edges.forall(e => e.getReactantCount == 1 && e.getProductCount == 1))
// DL-alanine to L-alanine
assert(new IsomorphismTester(edges(0).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(0)))
assert(new IsomorphismTester(edges(0).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
// DL-alanine to n-acetyl-D-alanine
assert(new IsomorphismTester(edges(1).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(1)))
assert(new IsomorphismTester(edges(1).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(1)))
// acetic anhydride to n-acetyl-D-alanine
assert(new IsomorphismTester(edges(2).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(2)))
assert(new IsomorphismTester(edges(2).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(1)))
// acetic anhydride to acetic acid
assert(new IsomorphismTester(edges(3).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(2)))
assert(new IsomorphismTester(edges(3).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(2)))
}
No comments:
Post a Comment