opcode81 edited this page May 12, 2013 · 4 revisions

The ProbCog toolbox supports several scripting libraries for Jython and Python. ProbCog comes with a Jython interpreter, which you can invoke as either jython or pcjython (use the latter to invoke ProbCog's interpreter when another version of Jython is already installed on your system).

For an optimized workflow, it may be desirable to script learning and inference processes such that they can, for example, easily be repeated as we obtain new data or as our models change. This page lists a few example scripts to give you an idea of how simple it really is.


JyProbCog: A Unified Inference Interface for Jython

The ''jyprobcog'' library allows you to conveniently script inference tasks, making use of the Java implementations of BLNs and MLNs (J-MLNs).

Working with Model Pools

Although model pools are primarily designed for use in client-server applications, we can also use them in scripts.

Model pools are defined in XML files. A very simple model pool file could look like this:

<pool>
	<model name="meals_bln" type="BLN" path="meals">
		<file type="decls" name="meals_any_for.blnd" />
		<file type="network" name="meals_any_for_functional.xml" />
		<file type="logic" name="meals_any_for_functional.blnl" />
	</model>
	<model name="alarm_mln" type="MLN" path="alarm">
		<file type="network" name="alarm-noisyor.mln" />
	</model>
	<model name="alarm_bln" type="BLN" path="alarm">
		<file type="decls" name="alarm.blnd" />
		<file type="network" name="alarm.pmml" />
		<file type="logic" name="alarm.blnl" />
		<param name="inferenceMethod" value="EnumerationAsk" />
	</model>
	<model name="smokers" type="MLN" path="smokers">
		<file type="network" name="wts.pybpll.smoking-train-smoking.mln" />
	</model>
</pool>

Source: examples/examples.pool.xml

For any model, we can specify an arbitrary number of default parameters that are always applied to that particular model (e.g. the inferenceMethod parameter for the model alarm_bln). We can, of course, override these defaults in our scripts.
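Since the pool format is plain XML, pool files can be inspected or generated with standard tools. The following stand-alone sketch (using only the Python standard library; the embedded XML is a trimmed copy of the example above) lists each model's name, type, files, and default parameters:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the pool definition shown above
pool_xml = """
<pool>
    <model name="alarm_mln" type="MLN" path="alarm">
        <file type="network" name="alarm-noisyor.mln" />
    </model>
    <model name="alarm_bln" type="BLN" path="alarm">
        <file type="decls" name="alarm.blnd" />
        <file type="network" name="alarm.pmml" />
        <file type="logic" name="alarm.blnl" />
        <param name="inferenceMethod" value="EnumerationAsk" />
    </model>
</pool>
"""

root = ET.fromstring(pool_xml)
for model in root.findall("model"):
    # collect the file names and any default parameters of this model
    files = [f.get("name") for f in model.findall("file")]
    params = dict((p.get("name"), p.get("value")) for p in model.findall("param"))
    print("%s (%s): files=%s defaults=%s" % (model.get("name"), model.get("type"), files, params))
```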

Here's a script that makes use of the models referenced in the pool above:

from jyprobcog import *

# load pool of models
pool = ModelPool("examples.pool.xml")

# query alarm BLN and MLN models
evidence = [
	"livesIn(James, Yorkshire)",
	"livesIn(Stefan, Freiburg)",
	"burglary(James)",
	"tornado(Freiburg)",
	"neighborhood(James, Average)",
	"neighborhood(Stefan, Bad)"]    
for modelName in ("alarm_mln", "alarm_bln"):
	print "\n%s:" % modelName
	for result in pool.query(modelName, ["alarm(x)", "burglary(Stefan)"], evidence, verbose=False):
		print result

Source: examples/inference_script.py

Note that we are querying two models (one MLN and one BLN) using exactly the same interface.

Run the script using jython inference_script.py in the examples directory. The output:

alarm_mln:
0.930400  alarm(James)
0.948200  alarm(Stefan)
0.515800  burglary(Stefan)

alarm_bln:
0.972000  alarm(James)
0.962000  alarm(Stefan)
0.567000  burglary(Stefan)

We can easily parameterize an inference procedure by specifying an arbitrary number of keyword arguments. (Above, we only had one keyword argument, verbose=False, which turned off any output generated by the inference method itself). Here's an example that queries the meals model and uses a number of additional parameters (specified in a dictionary):

# query meals BLN using time-limited inference
queries = ["mealT", "usesAnyIn(x,Bowl,M)"]
evidence = ["takesPartIn(P, M)", "takesPartIn(P2, M)", "consumesAnyIn(P, Cereals, M)"]
params = {
	"verbose": True,
	"inferenceMethod": "BackwardSampling",
	"timeLimit": 5.0,
	"infoTime": 1.0,
}
pool.query("meals_bln", queries, evidence, **params)    

Here, we are not suppressing the output generated by the inference method (verbose=True) and we are using the time-limited version of backward simulation, the timeLimit being set to 5.0 seconds. The infoTime parameter causes intermediate results to be printed every 1.0 seconds.
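Conceptually, timeLimit and infoTime turn inference into an anytime procedure: keep drawing samples until the time budget is exhausted, reporting intermediate estimates along the way. The following stand-alone sketch mimics this behaviour with a toy coin-flip sampler; the function and parameter names are illustrative only, not ProbCog internals:

```python
import random
import time

def anytime_estimate(sample, time_limit, info_time):
    """Draw samples until time_limit seconds have elapsed, printing
    the running estimate every info_time seconds (cf. infoTime)."""
    start = last_report = time.time()
    hits = total = 0
    while time.time() - start < time_limit:
        hits += sample()  # sample() returns True/False; True counts as 1
        total += 1
        if time.time() - last_report >= info_time:
            print("interim: %f after %d samples" % (hits / float(total), total))
            last_report = time.time()
    return hits / float(total), total

# Stand-in "sampler": a biased coin with P(True) = 0.7
estimate, n = anytime_estimate(lambda: random.random() < 0.7,
                               time_limit=0.2, info_time=0.05)
print("final estimate: %f from %d samples" % (estimate, n))
```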

Directly loading MLNs and BLNs

It is not required for a model to be part of a pool for it to be queried (though pools are convenient). We can directly construct either a BLNModel or an MLNModel and make use of exactly the same query interface as above. Here's a snippet that loads the smokers MLN and queries it:

from jyprobcog import *

# query smokers MLN (not in pool)
print "\nsmokers:"
mln = MLNModel("smokers/smoking.mln")
for result in mln.query("Smokes(Anna)", ["Smokes(Bob)", "Friends(Anna,Bob)"], verbose=False):
	print result

We could construct and query a BLN model in a similar fashion: BLNModel(declsFilename, networkFilename, logicFilename)

The query method works the same way as for model pools, except that we need not specify a model name.

MLN Inference Scripting with Python

The JyProbCog library currently supports ProbCog's Java libraries for MLN and BLN inference. If you are primarily interested in MLN inference, you have two further options, as briefly described below.

The PyMLNs Library

To use the Python-based MLN engine instead of the Java-based engine that is used by JyProbCog, we can directly apply the Python API of PyMLNs.

Here's a short Python script that computes a query using the smokers MLN:

from MLN import *

mln = MLN("wts.pybpll.smoking-train-smoking.mln")
mrf = mln.groundMRF("smoking-test-smaller.db")
queries = ["Smokes(Ann)", "Smokes(Bob)", "Smokes(Ann) ^ Smokes(Bob)"]
results = mrf.inferMCSAT(queries, verbose=False)
for query, prob in zip(queries, results):
    print "  %f  %s" % (prob, query)

Source: examples/smokers/pymlns_example.py

The output:

  0.452800  Smokes(Ann)
  0.142000  Smokes(Ann) ^ Smokes(Bob)
  0.238800  Smokes(Bob)

Because we specified verbose=False, the inference process does not produce any additional output.
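Since the probabilities come back paired with the queries (as the zip in the script above shows), it is easy to collect them into a dictionary for further processing. A stand-alone sketch, reusing the probabilities from the output above:

```python
queries = ["Smokes(Ann)", "Smokes(Bob)", "Smokes(Ann) ^ Smokes(Bob)"]
results = [0.4528, 0.2388, 0.1420]  # probabilities taken from the run above

# pair each query with its probability, then print by decreasing probability
probs = dict(zip(queries, results))
for query in sorted(probs, key=probs.get, reverse=True):
    print("  %f  %s" % (probs[query], query))
```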

The QueryTool Wrapper

You can also use the MLN Query Tool's wrapper to issue inference calls to any of the engines that the tool supports. Here's an example that passes the same query to all three supported MLN inference engines:

from mlnQueryTool import MLNInfer

inf = MLNInfer()
mlnFiles = ["wts.pybpll.smoking-train-smoking.mln"]
db = "smoking-test-smaller.db"
queries = "Smokes"
output_filename = "results.txt"
allResults = {}
tasks = (("MC-SAT", "PyMLNs"), ("MC-SAT", "J-MLNs"), ("MC-SAT", "Alchemy - August 2010 (AMD64)"))
for method, engine in tasks:
	allResults[(method,engine)] = inf.run(mlnFiles, db, method, queries, engine, output_filename,
                                              saveResults=True, maxSteps=5000)

for (method, engine), results in allResults.iteritems():
	print "Results obtained using %s and %s" % (engine, method)
	for atom, p in results.iteritems():
		print  "  %.6f  %s" % (p, atom)

Source: examples/smokers/querytool_example.py

The output:

Results obtained using PyMLNs and MC-SAT
  0.469200  Smokes(Ann)
  0.327400  Smokes(Bob)
Results obtained using Alchemy - August 2010 (AMD64) and MC-SAT
  0.472203  Smokes(Ann)
  0.343616  Smokes(Bob)
Results obtained using J-MLNs and MC-SAT
  0.474200  Smokes(Ann)
  0.345400  Smokes(Bob)
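When comparing engines like this, it can be helpful to quantify how far their estimates diverge. A stand-alone sketch using the marginals from the output above:

```python
# MC-SAT marginals per engine, copied from the output above
results = {
    "PyMLNs":  {"Smokes(Ann)": 0.4692,   "Smokes(Bob)": 0.3274},
    "Alchemy": {"Smokes(Ann)": 0.472203, "Smokes(Bob)": 0.343616},
    "J-MLNs":  {"Smokes(Ann)": 0.4742,   "Smokes(Bob)": 0.3454},
}

for atom in ("Smokes(Ann)", "Smokes(Bob)"):
    # spread = difference between the highest and lowest estimate
    values = [results[engine][atom] for engine in results]
    spread = max(values) - min(values)
    print("%s: spread %f across engines" % (atom, spread))
```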

BLN Inference Scripting with Jython

Example 1

Here's a Jython script that queries the ''meals'' model.

  from jyblns import infer

  network = "meals_any_for_functional.xml"
  decls = "meals_any_for.blnd"
  logic = "meals_any_for_functional.blnl"
  inferenceMethod = "LikelihoodWeighting"
  evidenceDB = "query2.blogdb"
  queries = "name,usesAnyIn(x,Plate,M)"
  inf = infer(network, decls, logic, inferenceMethod, evidenceDB, queries, args=["--confidenceLevel=0.95"])

  for result in inf.getResults():
      print result

Source: examples/meals/inference_script.py

In args, we can specify any of the parameters that are supported by the BLNinfer application (issue the command on the console for help).

The output:

  name(P1) ~ Frank: 0.15395434374819145, Charly: 0.11770215872367056, Emily: 0.17835276188745905, 
      Anna: 0.13949866695914753, Dorothy: 0.22996125909666892, Bert: 0.18053080958485512
  name(P3) ~ Frank: 0.1301522895355616, Charly: 0.13878919351464314, Emily: 0.19012310254477874, 
      Anna: 0.16327138727971174, Dorothy: 0.20366246595672868, Bert: 0.17400156116856863
  name(P2) ~ Frank: 0.13822631521409542, Charly: 0.13177546807435006, Emily: 0.18139164983898534, 
      Anna: 0.14539097470979825, Dorothy: 0.22480551699023912, Bert: 0.17841007517252438
  usesAnyIn(P1,Plate,M) ~ True: 0.6730501979998668, False: 0.32694980200012724
  usesAnyIn(P3,Plate,M) ~ True: 0.68600117734425, False: 0.31399882265574375
  usesAnyIn(P2,Plate,M) ~ True: 0.6640724469729609, False: 0.3359275530270331

Example 2

In this excerpt from a variation of the first example, we print all the results ourselves by directly accessing the inf object returned by the inference call:

  for r in inf.getResults():
      print "%s" % r.varName
      for i in range(r.getDomainSize()):
          print "  %f  %s" % (r.probabilities[i], r.domainElements[i]),
          if r.additionalInfo is not None:
              interval = r.additionalInfo[i]
              print " [%f;%f]" % (interval.lowerEnd, interval.upperEnd),
          print
  print "time taken: %fs" % inf.getSamplingTime()
  print "steps taken: %d" % inf.getNumSteps()

Source: examples/meals/inference_script2.py

The output:

  name(P1)
    0.144996  Frank  [0.124553;0.168196]
    0.129045  Charly  [0.109700;0.151269]
    0.191428  Emily  [0.168259;0.216992]
    0.156962  Anna  [0.135754;0.180834]
    0.204797  Dorothy  [0.180952;0.230933]
    0.172772  Bert  [0.150622;0.197462]
  name(P3)
    0.154836  Frank  [0.133760;0.178592]
    0.141516  Charly  [0.121304;0.164510]
    0.171245  Emily  [0.149183;0.195859]
    0.154705  Anna  [0.133637;0.178453]
    0.211857  Dorothy  [0.187672;0.238278]
    0.165842  Bert  [0.144096;0.190183]
  name(P2)
    0.165775  Frank  [0.144032;0.190112]
    0.123136  Charly  [0.104224;0.144974]
    0.198498  Emily  [0.174966;0.224370]
    0.159914  Anna  [0.138524;0.183943]
    0.192391  Dorothy  [0.169171;0.217997]
    0.160287  Bert  [0.138874;0.184336]
  usesAnyIn(P1,Plate,M)
    0.688682  True  [0.659296;0.716607]
    0.311318  False  [0.283393;0.340704]
  usesAnyIn(P3,Plate,M)
    0.668035  True  [0.638242;0.696526]
    0.331965  False  [0.303474;0.361758]
  usesAnyIn(P2,Plate,M)
    0.698704  True  [0.669537;0.726330]
    0.301296  False  [0.273670;0.330463]
  time taken: 0.485000s
  steps taken: 1000

The intervals at the end of each line are estimates of the intervals within which the true probability lies with a confidence level of 0.95. The additional parameter --confidenceLevel triggered the computation of these intervals.
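Such intervals behave like the standard normal-approximation confidence interval for an estimated probability. As an illustration (not necessarily ProbCog's exact formula), here is that interval for the first usesAnyIn result, using the 1000 samples reported above:

```python
import math

p = 0.688682  # estimated probability of usesAnyIn(P1,Plate,M) = True (from above)
n = 1000      # number of samples, as reported above
z = 1.959964  # standard normal quantile for a 0.95 confidence level

# half-width of the normal-approximation interval for a proportion
half_width = z * math.sqrt(p * (1.0 - p) / n)
print("[%f; %f]" % (p - half_width, p + half_width))
```

The result comes out close to the [0.659296; 0.716607] interval printed above, which illustrates where the interval width comes from: it shrinks with the square root of the number of samples.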
