Tuesday, May 12, 2015

New post, new library

Cheers! In the last post I briefly presented my Params library, which I guess left a feeling of "this is decent, but why should I use it?". Today I will describe the first library that builds on it: CLIApp, which contains common command-line application functionality built on the Params lib. Let me demonstrate. I recently wrote a CLI application called dino-mapper. This app needs some inputs, which are described in the following Params implementation:



package se.lth.immun

import se.jt.Params
import java.io.File

class DinoMapperParams(val name:String, val version:String) extends Params {

 import Params._
 
 // USER EXPOSED PARAMS
 val verbose   = false  ## "increase details in output"
 val matchPPM  = 10.0   ## "match threshhold in PPM"
 val matchPreRT  = 60.0  ## "maximun allowed pre-emption of Ms2 compared to feature (sec)"
 val matchPostRT  = 60.0  ## "maximun allowed delay of Ms2 compared to feature (sec)"
 val outDir  = ""  ## "output directory (by default same as input mzML)"
 val outName  = ""  ## "basename for output files (by default same as input mzML)"
 
 val dinoFeatures = ReqString("Csv-Merged dinosaur feature file")
 val searchResults = ReqString("search results from TPP (interact.pep.xls)")
}

Now I want to expose my parameters as command-line options, and this is where CLIApp comes in:


package se.lth.immun

import se.jt.CLIApp
import java.util.Properties

object DinoMapper extends CLIApp {

 
 def main(args:Array[String]):Unit = {
  
  // read application name and version from the bundled pom.properties resource
  val properties = new Properties
  properties.load(this.getClass.getResourceAsStream("/pom.properties"))
  val name = properties.getProperty("pom.artifactId")
  val version = properties.getProperty("pom.version")
  
  val params = new DinoMapperParams(name, version)
     
  failOnError(parseArgs(name, version, args, params, List("dinoFeatures", "searchResults"), None))
  
  println(name + " "+version)
  println("   dino feature file: " + params.dinoFeatures.value)
  println("  search result file: " + params.searchResults.value)
  println()
 }
}

What did I do here? DinoMapper extends CLIApp, which gives access to the two methods parseArgs and failOnError. parseArgs reads through the input arguments and checks for entries of the form --KEY=VALUE or --FLAG. When such entries are found, they are matched against the provided Params object, and the KEY/FLAG value is updated accordingly. In addition, parseArgs accepts a list of ordered required arguments (here dinoFeatures and searchResults), and an optional place to store the remaining arguments once the required ones have been filled (None here, since DinoMapper doesn't need any).
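To make that behaviour concrete, here is a minimal, self-contained sketch of this kind of parsing. It is not CLIApp's actual code: ArgSketch, Parsed and the known/flags sets are hypothetical stand-ins for what parseArgs looks up in the Params object, and the error texts are assumptions modelled on the output in the session below.

```scala
// Hypothetical sketch of --KEY=VALUE / --FLAG parsing; NOT CLIApp's real code.
object ArgSketch {

  // Outcome of parsing: option values, filled required args, leftovers, errors.
  final case class Parsed(
      opts: Map[String, String],
      required: Map[String, String],
      rest: List[String],
      errors: List[String]
  )

  // Turn one "--KEY=VALUE" or "--FLAG" token into a key/value pair or an error.
  private def parseOption(
      token: String,
      known: Set[String],
      flags: Set[String]
  ): Either[String, (String, String)] = {
    val body = token.drop(2)
    body.indexOf('=') match {
      case -1 if flags(body) => Right(body -> "true") // a bare flag switches it on
      case -1 if known(body) => Left(s"Error parsing '$body'. Option needs a value.")
      case -1                => Left(s"Error parsing '$body'. Option does not exist.")
      case i =>
        val key = body.take(i)
        if (known(key) || flags(key)) Right(key -> body.drop(i + 1))
        else Left(s"Error parsing '$key'. Option does not exist.")
    }
  }

  // known: options taking a value; flags: boolean options usable as bare --FLAG.
  def parse(
      args: List[String],
      known: Set[String],
      flags: Set[String],
      requiredNames: List[String]
  ): Parsed = {
    val (dashed, positional) = args.partition(_.startsWith("--"))
    val results = dashed.map(parseOption(_, known, flags))
    val optErrors = results.collect { case Left(e) => e }
    val opts = results.collect { case Right(kv) => kv }.toMap
    // Required arguments are filled in order; whatever remains is "rest".
    val required = requiredNames.zip(positional).toMap
    val missing =
      if (positional.size < requiredNames.size) List("Not enough arguments!") else Nil
    Parsed(opts, required, positional.drop(requiredNames.size), optErrors ++ missing)
  }
}
```

Returning errors as data instead of throwing is what lets a caller collect every problem in one pass and report them together.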

parseArgs is typically wrapped in a failOnError call. This is because parseArgs returns a list of the errors encountered during argument parsing, and failOnError simply takes a list of errors (Strings, really): it does nothing on an empty list, but fails with some nice usage output on a non-empty one.
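The error-list pattern itself is simple. The sketch below assumes a failOnError that prints a usage text and the errors to stderr and exits with status 1; FailSketch and the injectable exit function are hypothetical, not part of CLIApp, and exist only so the behaviour can be tested without killing the JVM.

```scala
// Hypothetical sketch of the failOnError pattern; NOT CLIApp's real code.
object FailSketch {

  // Does nothing for an empty error list; otherwise prints the usage text and
  // the errors to stderr and calls `exit` (sys.exit by default) with status 1.
  def failOnError(
      errors: Seq[String],
      usageText: => String,
      exit: Int => Unit = code => sys.exit(code)
  ): Unit =
    if (errors.nonEmpty) {
      System.err.println(usageText)
      System.err.println()
      errors.foreach(e => System.err.println(e))
      exit(1)
    }

  def main(args: Array[String]): Unit = {
    val errors = if (args.length < 2) List("Not enough arguments!") else Nil
    failOnError(errors, "usage:\n> app [OPTIONS] dinoFeatures searchResults")
    println("arguments OK: " + args.mkString(" "))
  }
}
```

Because usageText is a by-name parameter, the usage table is only built when there is actually something to report.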

Let's see some live interaction:


$johant> java -jar target/DinoMapper-0.9.0-jar-with-dependencies.jar 
usage:
> java -jar DinoMapper-0.9.0.jar [OPTIONS] dinoFeatures searchResults 
OPTIONS:
        PARAMETER DEFAULT          DESCRIPTION
     dinoFeatures -                Csv-Merged dinosaur feature file
         matchPPM 10.0             match threshhold in PPM
      matchPostRT 60.0             maximun allowed delay of Ms2 compared to feature (sec)
       matchPreRT 60.0             maximun allowed pre-emption of Ms2 compared to feature (sec)
           outDir                  output directory (by default same as input mzML)
          outName                  basename for output files (by default same as input mzML)
    searchResults -                search results from TPP (interact.pep.xls)
          verbose false            increase details in output

Not enough arguments!
$johant> java -jar DinoMapper.jar --matchPPM=8.0 --verbose dino.features.csv search-results.pep.xml
DinoMapper 1.0.0
   dino feature file: dino.features.csv
  search result file: search-results.pep.xml
$johant>
$johant> java -jar DinoMapper.jar --matchPPMNSDA=1.0 dino.features.csv search-results.pep.xml
usage:
> java -jar DinoMapper-0.9.0.jar [OPTIONS] dinoFeatures searchResults 
OPTIONS:
        PARAMETER DEFAULT                DESCRIPTION
     dinoFeatures dino.features.csv      Csv-Merged dinosaur feature file
         matchPPM 10.0                   match threshhold in PPM
      matchPostRT 60.0                   maximun allowed delay of Ms2 compared to feature (sec)
       matchPreRT 60.0                   maximun allowed pre-emption of Ms2 compared to feature (sec)
           outDir                        output directory (by default same as input mzML)
          outName                        basename for output files (by default same as input mzML)
    searchResults search-results.pep.xml search results from TPP (interact.pep.xls)
          verbose false                  increase details in output

Error parsing 'matchPPMNSDA'. Option does not exist.

In summary, I'm finding this library very useful for a number of reasons. Gathering parameters in one place helps tidy things up, and having parameter explanations readable both directly in the source and from the command line is very handy. Furthermore, parameters are easy to hide or expose by simply adding or removing the ## description in the Params class. And all my tools get unified command-line argument handling without any extra work on my side. Finally, the CLIApp library has some other functions as well: the parseParams function lets you read a Params object from a file, and the CLIBar object lets you produce command-line style progress bars. FYI.
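I won't go into CLIBar here, but the basic trick behind any command-line progress bar is to redraw the same terminal line using a carriage return. A minimal sketch that assumes nothing about CLIBar's real API (ProgressBarSketch and render are made-up names):

```scala
// Hypothetical progress-bar sketch; NOT the CLIBar API.
object ProgressBarSketch {

  // Build one line of a text progress bar, e.g. "[#####     ]  50%".
  def render(done: Long, total: Long, width: Int = 40): String = {
    val fraction = if (total <= 0) 1.0 else math.min(1.0, done.toDouble / total)
    val filled = (fraction * width).toInt
    val bar = "#" * filled + " " * (width - filled)
    f"[$bar] ${(fraction * 100).toInt}%3d%%"
  }

  def main(args: Array[String]): Unit = {
    val total = 20
    for (i <- 0 to total) {
      // '\r' moves the cursor back to the line start so the bar redraws in place
      print("\r" + render(i, total))
      Thread.sleep(50)
    }
    println()
  }
}
```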

Saturday, May 09, 2015

Scala Params library

This post will present a small Scala library that I've written for keeping track of parameters in software. The library is called Params, and consists of one object and one trait, both called Params. With Params, you can create code like this:


import se.jt.Params

class AppParams extends Params {

  import Params._
  val name =     "hi"     ## "this is a string param"
  val flag =     false     ## "set this flag to allow separate mode of operation"
  val num =     100     ## "this number could be important"
  val inPath = ReqString("without a path to handle we can't continue!")
}


val p = new AppParams
val opts = p.opts
opts("name").update("bruce")
opts("flag").update("true")
opts("num").update("42")
opts("inPath").update("my-file.txt")

Params uses reflection to update any field of a Params object by its string name, as long as the field has one of the special types defined in the Params object: Plong, Ping, Pouble, Pring, Poolean or Plist.
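To illustrate the idea, here is a toy version of name-based updating via Java reflection. It is a hypothetical simplification, not the Params source: the wrapper class P and the pInt/pBool/pString helpers stand in for the special Params types, and the ## syntax is replaced by plain helper calls to keep the example short.

```scala
// Toy illustration of updating fields by name via Java reflection.
// Hypothetical: this is NOT the se.jt.Params implementation.
object ReflectSketch {

  // A mutable parameter: current value, description, and a String parser.
  final class P[T](var value: T, val desc: String, parse: String => T) {
    def update(s: String): Unit = value = parse(s)
  }

  // Hypothetical helpers standing in for the special Params types.
  def pInt(v: Int, d: String): P[Int] = new P[Int](v, d, _.toInt)
  def pBool(v: Boolean, d: String): P[Boolean] = new P[Boolean](v, d, _.toBoolean)
  def pString(v: String, d: String): P[String] = new P[String](v, d, s => s)

  trait MiniParams {
    // Collect every declared field whose type is P, keyed by the field name.
    def opts: Map[String, P[_]] =
      getClass.getDeclaredFields.toList
        .filter(f => classOf[P[_]].isAssignableFrom(f.getType))
        .map { f =>
          f.setAccessible(true) // Scala vals compile to private JVM fields
          f.getName -> f.get(this).asInstanceOf[P[_]]
        }
        .toMap
  }

  class AppParams extends MiniParams {
    val name = pString("hi", "this is a string param")
    val flag = pBool(false, "set this flag to allow separate mode of operation")
    val num = pInt(100, "this number could be important")
  }
}
```

Since every P knows how to parse a String into its own type, the caller only ever deals in strings, which is exactly what a command line provides.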

So why is this any good? First of all, having a Params object to gather all algorithm parameters is very useful, especially if you pass it around as an implicit parameter to your functions and classes so that it's always available when needed. In the AppParams source the programmer also gets a quick summary of the parameters used, along with their descriptions.
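As an example of the implicit-parameter style, here is a small sketch. AlgoParams is a hypothetical case class standing in for a Params subclass, and the ppm matching is only illustrative.

```scala
// Hypothetical example of passing a parameter object around implicitly.
object ImplicitSketch {

  // Stand-in for a Params subclass; fields and defaults are made up.
  final case class AlgoParams(matchPPM: Double = 10.0, verbose: Boolean = false)

  // Is `observed` within matchPPM parts-per-million of `expected`?
  def withinTolerance(observed: Double, expected: Double)(implicit p: AlgoParams): Boolean =
    math.abs(observed - expected) / expected * 1e6 <= p.matchPPM

  // The implicit p is forwarded to withinTolerance without being mentioned.
  def countMatches(pairs: Seq[(Double, Double)])(implicit p: AlgoParams): Int = {
    if (p.verbose) println(s"matching ${pairs.size} pairs at ${p.matchPPM} ppm")
    pairs.count { case (obs, exp) => withinTolerance(obs, exp) }
  }

  def main(args: Array[String]): Unit = {
    implicit val params: AlgoParams = AlgoParams(matchPPM = 8.0, verbose = true)
    println(countMatches(Seq((500.002, 500.0), (500.01, 500.0))))
  }
}
```

The parameters can still be passed explicitly when needed, e.g. countMatches(pairs)(AlgoParams(matchPPM = 5.0)).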

The real gains are not seen until we start exposing the parameters to the user with the CLIApp library, which I describe in my next post.