Sunday, September 29, 2019

My current project: A reaction graph - purpose and start

Chemical reactions have always been what fascinated me most about chemistry: stoichiometry in high school, organic reaction mechanisms in college, and transformations later on. When I was employed at a large chemical information company, I learned about graph theory and about treating a set of organic reactions as a graph. I loved the idea of processing a set of chemical reactions so that you could immediately know whether it was possible to get from any reactant to any product. While I was there, I worked out an algorithm to do it in MapReduce. Calculating the full transitive closure was impossible even on the cluster I had access to, but I always wondered whether implementing it in Spark and Scala would make it feasible. A year or so ago I heard about the NextMove patent reaction database, so I decided to see if I could make it work using what I had learned in the meantime.
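
To make the reachability idea concrete, here is a minimal Python sketch. It is not the MapReduce or Spark algorithm mentioned above, just an illustration. It assumes each reaction has already been reduced to a list of reactant identifiers and a list of product identifiers; the compound names and the functions `build_graph` and `reachable_from` are hypothetical. It also makes a simplifying assumption: a product counts as reachable from any single one of its reactants. It answers the question for one starting compound with a breadth-first search, which avoids materializing the full transitive closure.

```python
from collections import defaultdict, deque

def build_graph(reactions):
    """Map each reactant to the set of products it leads to directly."""
    graph = defaultdict(set)
    for reactants, products in reactions:
        for r in reactants:
            for p in products:
                graph[r].add(p)
    return graph

def reachable_from(graph, start):
    """Return every compound reachable from `start` via one or more reactions."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries in the defaultdict
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical toy data: one (reactants, products) pair per reaction.
reactions = [
    (["ethanol"], ["acetaldehyde"]),
    (["acetaldehyde"], ["acetic acid"]),
    (["acetic acid", "ethanol"], ["ethyl acetate"]),
]
graph = build_graph(reactions)
print(sorted(reachable_from(graph, "ethanol")))
```

Running a search like this from every compound would produce the full transitive closure, which is exactly the part that gets expensive on a large reaction set.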

I downloaded the reaction set and took a look. The reactions are recorded in SMILES with reaction mapping. Fantastic! That's just what my algorithm needed.
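
For readers who have not seen mapped reaction SMILES: the general form is `reactants>agents>products`, with components separated by `.` and atom-map numbers written after a colon inside bracket atoms, as in `[CH3:1]`. The Python sketch below only illustrates that layout. The example reaction and the helper names `split_reaction_smiles` and `atom_map_numbers` are my own assumptions, and it assumes a plain reaction SMILES with no trailing annotations. Real parsing should be left to a toolkit such as the CDK.

```python
import re

def split_reaction_smiles(rxn_smiles):
    """Split 'reactants>agents>products' into three lists of component SMILES."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return tuple([s for s in part.split(".") if s] for part in parts)

def atom_map_numbers(smiles):
    """Collect atom-map numbers: the digits after ':' inside a bracket atom."""
    return {int(n) for n in re.findall(r"\[[^\]]*:(\d+)\]", smiles)}

# Hypothetical mapped esterification (acetic acid + methanol -> methyl acetate).
example = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
    ">>"
    "[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)
reactants, agents, products = split_reaction_smiles(example)
reactant_maps = set().union(*(atom_map_numbers(s) for s in reactants))
product_maps = set().union(*(atom_map_numbers(s) for s in products))
# Map numbers present on both sides tie reactant atoms to product atoms.
print(sorted(reactant_maps & product_maps))
```

The shared map numbers are what make it possible to say which reactant actually contributes atoms to which product, and that is the information a reaction graph needs.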

I use the Chemistry Development Kit to process chemical information in my code. Spark is natively implemented in Scala, with alternate interfaces in Java and Python. I thought Scala would handle the abstract concepts I needed and do the processing best out of the three.

The first step was to parse out a reaction object from the SMILES string. One of the nice things about Scala is that it runs on the JVM, so you can import and use Java objects easily. So this one was simple:

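import org.openscience.cdk.interfaces.IReaction
import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.smiles.SmilesParser

val sp: SmilesParser = new SmilesParser(SilentChemObjectBuilder.getInstance())
def parseSmiles(smiles: String): IReaction = sp.parseReactionSmiles(smiles)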

Next, I am unashamed to be completely biased toward organic substances, so I wanted to filter out any inorganic compounds, even if they happened to be mapped in the NextMove reaction. This was my first approach:

    // Walk the reactants and drop every molecule that contains no carbon
    val rctIterator = rxn.getReactants.atomContainers.iterator
    while (rctIterator.hasNext) {
      val rct: IAtomContainer = rctIterator.next()
      val formula: IMolecularFormula = MolecularFormulaManipulator.getMolecularFormula(rct)
      if (!MolecularFormulaManipulator.containsElement(formula, Elements.CARBON)) {
        rctIterator.remove()
      }
    }

and something similar for the products. I then refactored that into a method that I call twice, once for the reactants and again for the products.

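  import org.openscience.cdk.config.Elements
  import org.openscience.cdk.interfaces.{IAtomContainer, IMolecularFormula, IReaction}
  import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator

  // Remove carbon-free molecules from both sides of the reaction, in place
  def filterInorganicsFromReaction(rxn: IReaction): Unit = {
    filterInorganicsFromMolListIterator(rxn.getReactants.atomContainers.iterator)
    filterInorganicsFromMolListIterator(rxn.getProducts.atomContainers.iterator)
  }

  // Relies on the iterator supporting remove()
  def filterInorganicsFromMolListIterator(subIt: java.util.Iterator[IAtomContainer]): Unit = {
    while (subIt.hasNext) {
      val sub: IAtomContainer = subIt.next()
      val formula: IMolecularFormula = MolecularFormulaManipulator.getMolecularFormula(sub)
      if (!MolecularFormulaManipulator.containsElement(formula, Elements.CARBON)) {
        subIt.remove()
      }
    }
  }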

Sunday, September 22, 2019

Installing OpenBabel on my Mac

I have been interested in chemistry since high school, and in programming since grad school. I dream sometimes about getting a job in cheminformatics or chemical information. To do that, I probably need practice, so I decided to install OpenBabel. This is a C++ library for doing interesting chemistry-related things, like converting between various chemistry file formats, clustering similar compounds, calculating descriptors, etc. It has a set of Python bindings, which makes it a lot easier to work with. Sounds like fun, doesn't it?
  • Documentation: http://openbabel.org/docs/current/index.html
  • Compilation and installation instructions: http://openbabel.org/docs/current/Installation/install.html

Preliminaries

  • Downloaded and extracted a tar.gz file from http://sourceforge.net/projects/openbabel/files/openbabel/2.3.1/openbabel-2.3.1.tar.gz/download
  • Tried an out-of-source build, and CMake complained about missing prerequisites

Prerequisites

  • Eigen
    • OpenBabel says it needs version 2, but brew only knows about version 3.x. I installed version 3 anyway.
    • This created an entry, /usr/local/include/eigen3, pointing to ../Cellar/eigen/3.3.7/include/eigen3, which contains the Eigen and unsupported folders.
    • Eigen is a header-only project, so to get version 2 I cloned it from its GitHub mirror and checked out branch 2.0.17.
    • Then I created a symlink from /usr/local/include/eigen2 to ../Cellar/eigen/2.0.17/include/eigen2 and copied the Eigen and unsupported folders there.
    • Might confuse me if I ever try to brew uninstall Eigen 2, but c'est la vie.
  • wxWidgets
    • The first two Google hits didn't do it. Finally I searched `wxwidgets mac homebrew`, and sure enough, there's a formula:
    • brew install wxmac
  • Cairo
    • Did this out of order. The error stack mentioned Cairo, so I first tried installing it with Homebrew:
    • brew install cairo
    • After all this, CMake still gave an error, but it seemed related to pkg-config. Another quick Google search fixed that:
    • brew install pkg-config
      Which, if I recall, is a good thing to have for general C development.
    • CMake was still complaining that Cairo wasn't found, despite the presence of `cairo` in `/usr/local/include`. Looking more closely at the output I saw "Package 'libffi', required by 'gobject-2.0', not found"
    • Another Google search on that turned up setting the path to libffi as:
    • PKG_CONFIG_PATH="/usr/local/opt/libffi/lib/pkgconfig" cmake -S . -B build

After that, the CMake run completed without error and found all the packages I thought I would need.

Compilation and installation

Since I want to use this in Python, I tried to enable the Python bindings. CMake wasn't finding the Python 3 libraries, but I finally found a web page that said I could point OpenBabel to Python 3 instead of 2 by giving the path to the executable. For the heck of it I also built the GUI. Being a good TDD person, I also wanted to run the tests. My final CMake command, followed by the build and test commands, was:
PKG_CONFIG_PATH="/usr/local/opt/libffi/lib/pkgconfig" cmake -S . -B build -Wno-dev -DBUILD_GUI=ON -DPYTHON_BINDINGS=ON -DPYTHON_EXECUTABLE=/usr/local/bin/python3 -DENABLE_TESTS=ON
make -C build
make test -C build
That caused a problem with a missing header. Searching for a solution to that, I found that there's a Homebrew formula for OpenBabel, which would let me bypass all this manual dependency installation:
brew install open-babel
That went without a hitch. But the same source also suggested cloning and building from GitHub, so just to be thorough I tried that too:
git clone git@github.com:openbabel/openbabel.git
I then reran the CMake and make commands, and this time there was no problem with the missing header. But there were no Python bindings. Looking at the build output, I found I needed to add -DRUN_SWIG=ON, and that this failed without SWIG installed. So I ran brew install swig, and then the Python bindings were generated.
$ python3
>>> import openbabel
>>>
QED

Introduction and Purpose

I'm trying blogging again! I do some hobby programming at home, and unlike at work, I don't have a convenient place to keep my notes. So that's what this will be: a place for me to keep notes and write explanations to other programmers about problems I encounter and how I overcome them. The other programmers may be just me, but hey, that's okay. My company uses pair programming extensively, but as an introvert, there are times when I just like to try things on my own. So that's where the title of the blog comes from: just a solitary programmer, trying to do things on his own and writing about it. I hope you (and I) learn something. Happy coding!