Sunday, October 20, 2019

Reactions to edges 2: Breaking reactions down into edges

Once I had the reactant atom maps, breaking reactions down into edges became a simple matter of generating all pairs of reactants-products and then filtering them based on whether at least one atom of the reactant was mapped to the product:
def _splitReaction(rxn: IReaction, rctMap: Map[IAtom, IAtom]): List[IReaction] = {
  val rctPrdPairs: Iterable[(IAtomContainer, IAtomContainer)] = for (
    rct <- rxn.getReactants.atomContainers.asScala;
    prd <- rxn.getProducts.atomContainers.asScala
  ) yield (rct, prd)
  val mappedPairs: Iterable[(IAtomContainer, IAtomContainer)] = rctPrdPairs.filter({ case (rct, prd) => {
    rctMap.isEmpty ||
    rct.atoms.asScala.exists(r => rctMap.get(r) match {
      case None => false
      case Some(null) => false
      case Some(p) => p.getContainer == prd
    })
  } })
  mappedPairs.map({ case (rct, prd) => {
    val newRxn: IReaction = new Reaction()
    newRxn.addReactant(AtomContainerManipulator.copyAndSuppressedHydrogens(rct))
    newRxn.addProduct(AtomContainerManipulator.copyAndSuppressedHydrogens(prd))
    newRxn.setProperty(CDKConstants.SMILES, sg.create(newRxn))
    newRxn
  }}).toList
I remember being astonished when I worked this out. The first time I did this in Java it was a LOT harder and took a LOT more lines of code. Part of that was working in Java, but this time I also made much better use of the CDK interface to create a new reaction and clone the reactant and product before adding them to it. I guess I have grown as a programmer in the intervening years! I added the SMILES of the edge to the emitted edge to aid in visualization and debugging later. Later I generating the InChIKey of the reactant and product as well to group identical edges into single records in Spark.

I added the escape valve of an empty reactant map. To preserve information I include all combinations of reactants and products.

I also have to do a check on the image of the reactant atom in the reactant map. The way I constructed the map, the reactant atom could be mapped to None or to Some(null), which caused exceptions when I ran this over the whole NextMove reaction set.

CDK's IsomorphismTester made checking the correctness of the splitting much easier than going atom-by-atom:

test("it maps all reactants to all products if no map is present") {
  val rxn: IReaction = sp.parseReactionSmiles(unmappedEsterSmiles)
  Splitter.filterInorganicsFromReaction(rxn)
  val rctMap = Splitter.getMaps(rxn)
  val edges: List[IReaction] = Splitter._splitReaction(rxn, rctMap)
  assert(edges.size == 2)
  assert(edges.forall(e => e.getReactantCount == 1 && e.getProductCount == 1))
  assert(new IsomorphismTester(edges(0).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(0)))
  assert(new IsomorphismTester(edges(1).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(1)))
  assert(new IsomorphismTester(edges(0).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
  assert(new IsomorphismTester(edges(1).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
}

test("it uses maps to correctly split reactions") {
  val rxn: IReaction = sp.parseReactionSmiles(chiralResolution)
  Splitter.filterInorganicsFromReaction(rxn)
  val rctMap = Splitter.getMaps(rxn)
  val edges: List[IReaction] = Splitter._splitReaction(rxn, rctMap)
  assert(edges.size == 4)
  assert(edges.forall(e => e.getReactantCount == 1 && e.getProductCount == 1))
  // DL-alanine to L-alanine
  assert(new IsomorphismTester(edges(0).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(0)))
  assert(new IsomorphismTester(edges(0).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(0)))
  // DL-alanine to n-acetyl-D-alanine
  assert(new IsomorphismTester(edges(1).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(1)))
  assert(new IsomorphismTester(edges(1).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(1)))
  // acetic anhydride to n-acetyl-D-alanine
  assert(new IsomorphismTester(edges(2).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(2)))
  assert(new IsomorphismTester(edges(2).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(1)))
  // acetic anhydride to acetic acid
  assert(new IsomorphismTester(edges(3).getReactants.getAtomContainer(0)).isIsomorphic(rxn.getReactants.getAtomContainer(2)))
  assert(new IsomorphismTester(edges(3).getProducts.getAtomContainer(0)).isIsomorphic(rxn.getProducts.getAtomContainer(2)))
}

No comments:

Post a Comment