Over the last few months, I’ve started to use Weka more and more. Weka is a machine learning toolkit, written in Java, which I use to build models for classifying data sets.
It offers a wide variety of machine learning algorithms (although I’ve mostly used logistic regression and Bayesian networks), which can be trained on data and then used to make classifications (or ‘predictions’) for sets of instances.
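The instances Weka trains and tests on are usually stored in its ARFF text format, which is what the train.arff and test.arff files in the example further down are. If your data already lives in Python, a minimal sketch like this one can render it as ARFF. Note that to_arff is a hypothetical helper, not part of Weka or WekaPy, and it assumes attribute names and nominal values contain no spaces, commas or quotes (ARFF would otherwise require quoting them).

```python
def to_arff(relation, attributes, rows):
    """Render instances as an ARFF document string (hypothetical helper).

    attributes: list of (name, kind) pairs, where kind is "NUMERIC" or a
    list of nominal values such as ["yes", "no"].
    rows: sequences of values in attribute order; None marks a missing value.
    Assumes no names or values need quoting (no spaces, commas or quotes).
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        # Nominal attributes are written as {value1,value2,...}
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        # ARFF uses "?" for missing values
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"
```

The returned string can then be written out with an ordinary open("train.arff", "w") call and used as training data.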
Weka is available both as a GUI application and as a library of classes that can be used from the command line or within Java applications. I needed to build some large models and several smaller ones, and doing that in the GUI makes training a model, testing it with data and parsing the resulting classifications rather clunky. I wanted to automate the process more.
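Automating Weka from the command line mostly comes down to assembling the right arguments. The sketch below uses Weka's standard evaluation options as I understand them: -t for the training file, -T for the test file, -d to save the trained model, and -p 0 to print per-instance predictions without extra attributes. build_weka_command and its parameters are hypothetical, and the location of weka.jar is an assumption you would adjust for your installation.

```python
def build_weka_command(classifier, training_file, test_file=None,
                       weka_jar="weka.jar", model_file=None, extra_options=()):
    """Build an argument list for running a Weka classifier (hypothetical helper).

    classifier is a short name such as "bayes.BayesNet"; this helper
    prefixes it with "weka.classifiers." to form the full Java class name.
    """
    cmd = ["java", "-cp", weka_jar, "weka.classifiers." + classifier,
           "-t", training_file]              # train on this ARFF file
    if model_file:
        cmd += ["-d", model_file]            # save the trained model
    if test_file:
        cmd += ["-T", test_file, "-p", "0"]  # test and print per-instance predictions
    cmd += list(extra_options)               # any classifier-specific options
    return cmd
```

Returning a list rather than a single string means the arguments can be passed straight to a subprocess without any shell quoting.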
Nearly all of the development work for my PhD has been in Python, and it’d be nice to just plug some machine learning into my existing code. Whilst there are some Python wrappers for Weka (this project, PyWeka, etc.), most of them feel unfinished, are under-documented or are essentially just instructions on how to use Jython.
So, I started work on WekaPy, a simple wrapper that allows efficient and Python-friendly integration with Weka. Under the hood it simply runs Weka from the command line in subprocesses, but it also includes several features aimed at giving the user a more seamless and simple experience.
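As a rough illustration of the subprocess approach (not WekaPy's actual code), a helper like the hypothetical run_command below could execute an argument list such as ["java", "-cp", "weka.jar", "weka.classifiers.bayes.BayesNet", "-t", "train.arff"] and hand Weka's text output back to Python, surfacing Weka's error messages if the run fails.

```python
import subprocess


def run_command(cmd, timeout=None):
    """Run a command-line program (e.g. Weka via java) and return its stdout.

    Hypothetical helper: raises RuntimeError containing the program's
    stderr if it exits with a non-zero status.
    """
    # capture_output/text collect stdout and stderr as strings (Python 3.7+)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Passing a list instead of a shell string avoids quoting problems with file paths that contain spaces.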
I haven’t got round to writing proper documentation yet, but most of the current functionality is explained and demo’d through examples here. Below is an example demonstrating its ease of use.
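```python
from wekapy import Model  # assumes WekaPy is installed and importable as wekapy
model = Model(classifier_type="bayes.BayesNet")  # choose the Weka classifier
model.train(training_file="train.arff")  # train on the ARFF training data
model.test(test_file="test.arff")  # classify the test instances
```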
All that is needed is to instantiate the model with your desired classifier, train it with some training data and then test it against your test data. The predictions can then be easily extracted from the model as shown in the documentation.
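For anyone curious about what extracting predictions from Weka's raw output involves, here is a minimal sketch, not WekaPy's own code, that parses the text printed by Weka's -p 0 option. It assumes the nominal-class layout (instance number, actual value, predicted value, an optional "+" error flag, and the prediction probability), with labels printed as "index:value"; other Weka versions or numeric classes may format this differently. parse_predictions is a hypothetical name.

```python
def parse_predictions(output):
    """Parse per-instance predictions from Weka's -p 0 text output.

    Assumes the nominal-class layout: inst#, actual, predicted, an optional
    '+' error flag, and the prediction probability. Returns a list of dicts.
    """
    predictions = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with the instance number; headers and blanks do not.
        if not tokens or not tokens[0].isdigit():
            continue
        error = "+" in tokens  # Weka marks misclassified instances with '+'
        fields = [t for t in tokens if t != "+"]
        if len(fields) != 4:
            continue  # skip rows that don't match the assumed layout
        inst, actual, predicted, prob = fields
        predictions.append({
            "instance": int(inst),
            # Labels are printed as "index:value"; keep just the value.
            "actual": actual.split(":", 1)[-1],
            "predicted": predicted.split(":", 1)[-1],
            "error": error,
            "probability": float(prob),
        })
    return predictions
```

Returning plain dictionaries keeps the results easy to feed into the rest of a Python pipeline.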
I hope to continue updating the library and improving the documentation when I get a chance! Please let me know if you have any ideas for functionality.