Expert Systems and JavaEE on ARM: a simple benchmark

This post reports my findings from a simple, overall benchmark of a JavaEE use case on ARM platforms. I currently have on my desk a Raspberry Pi (model B) and an Odroid-U2; given my interest in Expert Systems, I thought this could be a great way to test them out!

Premise: I’m not a guru on Expert Systems; in fact I consider myself just a happy power user, so it is not my intention to delve into the debate on how an Expert System should be benchmarked, which is outside the scope of this post. Likewise, it is not in the scope of this post to report a fully comprehensive benchmark comparison of running Java/JavaEE on these platforms.

In fact, the goal is much simpler:

GOAL: Given the use case of a JavaEE application which provides a reasoning service, benchmark the overall performance on the different platforms.

The Use Case

For the reasoning service, I use my all-time favorite, JBoss Drools. On their GitHub repository, they provide several examples and benchmarks, based on published papers related to the Rete algorithm. Again, I’m aware of the big discussion on whether these benchmarks are still relevant nowadays, given the progress in Expert System algorithms, but that debate does not affect this use case, because here the benchmark is only used for a relative comparison.

I have a very simple webservice:

@Stateless
@WebService
public class WaltzWs {
	@EJB
	WaltzKb waltzKb;

	@WebMethod
	public String waltz(@WebParam(name="WaltzDTO")WaltzDTO dto) {

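		// One session per webservice call; the Knowledge base itself is shared.
		StatefulKnowledgeSession session = waltzKb.getKbase().newStatefulKnowledgeSession();
		try {
			for (Line l : dto.getLine()) {
				session.insert(l);
			}
			session.insert(dto.getStage());
			long start = System.currentTimeMillis();
			session.setGlobal( "time", start );

			// Only the rule evaluation is timed, not the SOAP handling.
			session.fireAllRules();
			long time = System.currentTimeMillis() - start;
			System.err.println( time );

			return "time: "+time;
		} finally {
			// Release the session even if inserting or firing the rules fails.
			session.dispose();
		}
	}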
}

This exposes the reasoning functionality through a webservice call. When the webservice is consumed, a new Knowledge session is created, the content of the SOAP message is inserted into the Working memory, and then all the rules are evaluated. This webservice corresponds to the second half of the Waltz benchmark in the Drools GitHub repo mentioned above.

The actual Knowledge base is created by a Singleton EJB:

@Singleton
@Startup
public class WaltzKb {
	private static final Logger logger = LoggerFactory.getLogger(WaltzKb.class);
	private KnowledgeBase kbase;

	@PostConstruct
	public void init() {
		KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
		kbuilder.add( ResourceFactory.newClassPathResource("waltz.drl", WaltzKb.class), ResourceType.DRL );
		if (kbuilder.hasErrors()) {
			for (KnowledgeBuilderError error : kbuilder.getErrors()) {
				logger.error("DRL Error "+error);
			}
		}
		Collection<KnowledgePackage> pkgs = kbuilder.getKnowledgePackages();
		kbase = KnowledgeBaseFactory.newKnowledgeBase();
		kbase.addKnowledgePackages( pkgs );
	}

	public KnowledgeBase getKbase() {
		return kbase;
	}
}

With the Webservice + Singleton EJB approach, I can have several webservice calls happening at the same time, each with its own Knowledge session, while the Knowledge base is efficiently shared among them.
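To make the sharing pattern concrete without pulling in Drools, here is a minimal plain-Java sketch. The `Kb` and `Session` classes are hypothetical stand-ins I made up for illustration (they are not the Drools API): one read-only object is built once, and every concurrent caller gets its own mutable session on top of it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins for illustration only, NOT the Drools API.
class SharedKbSketch {

    /** Built once and read-only afterwards: plays the role of the Knowledge base. */
    static final class Kb {
        private final List<String> rules;

        Kb(List<String> rules) {
            this.rules = Collections.unmodifiableList(new ArrayList<String>(rules));
        }

        Session newSession() {
            return new Session(this);
        }
    }

    /** Mutable, per-call state: plays the role of the Knowledge session. */
    static final class Session {
        private final Kb kb;
        private final List<Object> facts = new ArrayList<Object>();

        Session(Kb kb) {
            this.kb = kb;
        }

        void insert(Object fact) {
            facts.add(fact);
        }

        /** Dummy "reasoning": just counts rule/fact combinations. */
        int fireAllRules() {
            return kb.rules.size() * facts.size();
        }
    }

    /** Caller i inserts i facts into its own session; all callers share the same Kb. */
    static List<Integer> serveConcurrently(final Kb kb, int callers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int i = 1; i <= callers; i++) {
                final int factCount = i;
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        Session session = kb.newSession();
                        for (int f = 0; f < factCount; f++) {
                            session.insert("fact-" + f);
                        }
                        return session.fireAllRules();
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the shared object is never modified after construction, the callers need no locking between them; only the per-call session holds mutable state.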

All the code, and the benchmark project file with results, are available on GitHub.

The Benchmark

In order to load test this JavaEE application, i.e.: the webservice, I use SoapUI:

[Screenshot: SoapUI load test]

I created two webservice request templates, reflecting the “12” and “50” data files of the original JBoss Drools “waltz” benchmark respectively. Then, before actually running the load test, I consume the webservice a couple of times, just to “warm up” the JavaEE container (in this case, JBoss AS).

I performed a load test session of 60s with 1 thread first, i.e. all webservice calls are sequential: each call waits for the previous one to return before starting. This was followed by further load test sessions of 60s, this time with 2, 3 and 4 threads, i.e. concurrent webservice calls, similar to a stress test of the system being used by multiple “users”.
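For readers who want to reproduce this kind of session without SoapUI, the idea can be sketched in plain Java. This is only my approximation of what a load test tool does, not SoapUI's implementation; `SimpleLoadTest` and `averageMs` are names I made up, and the `request` Runnable stands for one webservice call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of SoapUI or of the benchmark project.
class SimpleLoadTest {

    /** Per-thread totals: number of calls made and accumulated response time. */
    static final class Totals {
        long calls;
        long nanos;
    }

    /**
     * Runs `request` back to back on `threads` threads for about `durationMs`,
     * and returns the average response time of a single call, in milliseconds.
     */
    static double averageMs(final Runnable request, int threads, final long durationMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Totals>> futures = new ArrayList<Future<Totals>>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(new Callable<Totals>() {
                    public Totals call() {
                        Totals totals = new Totals();
                        long deadline = System.nanoTime() + durationMs * 1000000L;
                        // Within one thread, each call waits for the previous one to return.
                        while (System.nanoTime() < deadline) {
                            long start = System.nanoTime();
                            request.run();
                            totals.nanos += System.nanoTime() - start;
                            totals.calls++;
                        }
                        return totals;
                    }
                }));
            }
            long calls = 0;
            long nanos = 0;
            for (Future<Totals> f : futures) {
                Totals totals = f.get();
                calls += totals.calls;
                nanos += totals.nanos;
            }
            return calls == 0 ? 0.0 : (nanos / 1e6) / calls;
        } finally {
            pool.shutdown();
        }
    }
}
```

With `threads = 1` and `durationMs = 60000` this mimics the sequential session; 2, 3 and 4 threads mimic the concurrent ones. The last call of each thread may finish slightly after the deadline, so the session length is approximate.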

Some limitations apply here, which is why I put all the premises above to warn that this cannot be considered a comprehensive benchmark, but rather a simple one to get overall figures:

  • The Raspberry Pi is single core, so this platform is at a disadvantage when the load test session is performed with 2+ threads.
  • For the performance baseline I’m using a MacBook Air (mid-2011, 1.8 GHz Intel Core i7) with JDK 6 as the JVM, while on both ARM platforms I’ve got JDK 8ea, build 1.8.0-ea-b99. So yes, the JVM and architecture of the baseline are quite a different beast, but again, this is just to get an overall performance indicator.
  • On both the MacBook Air and the Odroid I can keep the flags -server -Xmx512m when starting the JavaEE container JBoss AS. This is not possible on the Raspberry Pi, where I have to change them to -client -Xmx400m, because of memory constraints and because the Server VM is currently implemented only for ARMv7 and later, while the Raspberry Pi is an ARMv6. Please bear in mind that on ARM this is an Early Access version of the JVM.
  • The baseline test is performed on localhost, so the overhead of the LAN is not included in its figures.
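Since the JVM flags differ between platforms, it can be handy to check which VM and heap limit a given machine actually ended up with. A small sketch using only standard JDK calls follows; the `JvmSettings` class name is mine, and the exact strings printed depend on the JVM build.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; run it with the same flags as the container.
class JvmSettings {

    static String describe() {
        // On HotSpot the VM name usually tells the Client VM and the Server VM apart.
        String vmName = System.getProperty("java.vm.name");
        String arch = System.getProperty("os.arch");
        // Upper bound of the heap, which reflects -Xmx when it is set.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        // The flags the JVM was actually started with, e.g. -server -Xmx512m.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return "vm=" + vmName + ", arch=" + arch
                + ", maxHeapMb=" + maxHeapMb + ", flags=" + flags;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as, for example, `java -client -Xmx400m JvmSettings` on each board shows whether the requested VM type and heap size were honored.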

The Results

I have to say I’m quite impressed with the results. Although this is only an overall performance indicator, it suggests there is plenty of potential in using JavaEE on an ARM embedded platform, and I’m specifically referring to the Odroid. The Raspberry Pi suffers a lot in this case; it is possibly an unfair comparison, given how computationally intensive this use case is.

Below are the results of the load test. Columns are the type of test (waltz12, waltz50) and the number of threads used for the load session; rows are the platforms (localhost is the baseline MacBook Air). Figures are the average response time of the webservice in ms, within the load test session.

[Screenshot: results table, average response time in ms per platform and per test/thread count]

Below are the same results, this time with figures expressed as a percentage with reference to localhost (MacBook Air) as the baseline.

[Screenshot: results table, as a percentage of the localhost baseline]

My perspective on these results, considering the Raspberry Pi and the Odroid: the Odroid is an ARM embedded platform like the RPi, but with 4x the cores, 4x the RAM, and priced at $89 vs $35 (about 2.5x), which is still very cheap. I think what matters most is the fact that it is multi-core. With these specs, the performance of the above use case with reference to the Intel i7 baseline improves from ~130x slower on the Raspberry Pi to ~4x slower on the Odroid. I mean, IMHO, this is A LOT.
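To put those numbers side by side, here is the back-of-the-envelope arithmetic as a tiny Java sketch. It only uses the approximate figures quoted above (~130x, ~4x, $89, $35), so the outputs are rough indications, not measurements; the class and method names are mine.

```java
// Rough arithmetic on the approximate figures quoted in this post.
class OdroidVsPi {

    /** How many times more the Odroid costs than the Raspberry Pi. */
    static double priceRatio() {
        return 89.0 / 35.0;
    }

    /** How many times faster the Odroid is than the Raspberry Pi on this use case. */
    static double speedup() {
        // Both slowdowns are relative to the same Intel i7 baseline, so their ratio is the speedup.
        return 130.0 / 4.0;
    }

    /** Speedup obtained per unit of extra price. */
    static double speedupPerPriceRatio() {
        return speedup() / priceRatio();
    }

    public static void main(String[] args) {
        System.out.printf("price ratio:       %.2fx%n", priceRatio());
        System.out.printf("speedup:           %.1fx%n", speedup());
        System.out.printf("speedup per price: %.1fx%n", speedupPerPriceRatio());
    }
}
```

In other words, for roughly 2.5x the price the Odroid is about 32x faster on this use case, i.e. nearly 13x more performance per dollar, always keeping in mind the limitations listed earlier.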

Why do I blog this

I do believe this is a good experiment to show the potential of JavaEE on ARM embedded platforms; I’m really curious to perform these tests again once the JDK is fully released! Given the small size of these platforms and their low power requirements, I think this is a great way to have Pervasive and Mobile Expert Systems!

(Bladerunner mode ON:) I do also believe we might see in the future a platform shift in the data centers as we know them nowadays: from the current platforms to smaller and less power-hungry ones, like the two ARM platforms I’ve presented in this post. Potentially this also makes a case for shifting from air cooling to liquid cooling, by submerging these tiny computers in mineral oil?


4 thoughts on “Expert Systems and JavaEE on ARM: a simple benchmark”

  1. Hi Mateo,
    Nice post. I’m also doing some experiments with Drools 6 and the Raspberry Pi (+ Arduino). My tests are more on the practical side, not benchmarking, but I found the idea of comparing how these little devices perform interesting. I will check out your code to see if I can use some of it in my experiments. Are you planning to expand your work to a more practical application?

    By the way, which operating systems are you using on both platforms?
    I’ve noticed that you used SoapUI to perform the benchmarks, and that adds the network latency plus the deserialization and marshalling of the data on top of the Rule Engine performance. It would be nice to see a log inside the web service that only measures the rules execution.


    1. Ciao Mauricio, thanks!

      In fact the more practical application I’m planning is all about Pervasive (i.e. “embedded”) Expert Systems. So far, I’ve connected Drools to my toothbrush, as per an earlier blog post on this site :D I know it sounds like a silly thing to do, but the background idea is that I want to see how far Pervasive and Mobile Expert Systems can get, to monitor my habits, or to help me extract/consolidate data from my daily life. I’m planning to plug Drools into other scenarios along these lines, for example my workout routines, etc.

      The code for this post is basically just a conversion of the Drools waltz benchmark into a JavaEE 6 webservice + EJB. I totally hear you about the “overhead” of the SOAP based webservice, but I wanted to focus on a more complete use case, rather than just a custom JavaSE application. I can tell you I’ve tested the waltz benchmark as-is from the drools-examples bundle, and the results are proportionally very similar: from memory, waltz50 takes about 40s on the RPi, 3.8s on the Odroid, and <1s on my MacBook. Please notice these figures are actually higher than the ones presented in this post: the reason is, I believe, that the drools-examples waltz.drl contains a lot of System.out calls, which make for expensive I/O operations on the ARM platforms (even if I pipe stdout to /dev/null).

      On the RPi I have Raspbian, which is hard-float, so I am forced to use JDK8ea. On the Odroid I have the Linaro Ubuntu server, which is again hard-float, so JDK8ea as well. Please notice that if you try to run JBoss AS, the current builds make it hang on startup, so I'm sticking to build 99 for the moment. Possibly, if you have just JavaSE or another container, this does not affect you, but be aware that some regressions(?) sometimes appear with JDK8ea.

      Last, as I believe you have concurrent aspects in the Drools 6 core, I would strongly suggest you get your hands on a multi-core ARM platform, not necessarily the Odroid, but you should totally see improvements as soon as you have a multi-core. :)

      Hope this helps, and please let's keep in touch: the use of Expert Systems in Mobile / Pervasive applications is totally on my interest list :D
      Ciao,
      Matteo

