
Expert Systems and JavaEE on ARM: a simple benchmark

This post reports my findings from experimenting with, and running a simple, overall benchmark of, a JavaEE use case on ARM platforms. I currently have on my desk a Raspberry Pi (model B) and an Odroid-U2: given my interest in Expert Systems, I thought this would be a great way to test them out!

Premise: I'm not a guru on Expert Systems; in fact I consider myself just a happy power user, so it is not my intention to delve into the debate on how an Expert System should be benchmarked, which is outside the scope of this post. Likewise, it is not in the scope of this post to report a fully comprehensive benchmark comparison of running Java/JavaEE on these platforms.

In fact, much simpler:

GOAL: Given the use case of a JavaEE application which provides a reasoning service, benchmark the overall performance on the different platforms.

The Use Case

For the reasoning service, I use my all-time favorite, JBoss Drools. On their GitHub repository, they provide several examples and benchmarks, based on published papers related to the Rete algorithm. Again, while I'm aware of the ongoing discussion about whether these benchmarks are still relevant nowadays, given the progress in Expert System algorithms, that debate does not impact this use case, because here the benchmark is used only for a relative comparison between platforms.

I have a very simple webservice:

@Stateless
@WebService
public class WaltzWs {
	@EJB
	WaltzKb waltzKb;

	@WebMethod
	public String waltz(@WebParam(name="WaltzDTO") WaltzDTO dto) {
		// a new, dedicated Knowledge session for each webservice call
		StatefulKnowledgeSession session = waltzKb.getKbase().newStatefulKnowledgeSession();

		// insert the content of the SOAP message into the Working memory
		for (Line l : dto.getLine()) {
			session.insert(l);
		}
		session.insert(dto.getStage());

		long start = System.currentTimeMillis();
		session.setGlobal( "time", start );

		// evaluate all the rules, measuring the time taken
		session.fireAllRules();
		long time = System.currentTimeMillis() - start;
		System.err.println( time );

		session.dispose();
		return "time: " + time;
	}
}

which exposes the reasoning functionality via a webservice call. When the webservice is consumed, a new Knowledge session is created, the content of the SOAP message is inserted into the Working memory, and then all the rules are evaluated. This webservice corresponds to the second half of the Waltz benchmark from the Drools GitHub repo mentioned above.

The actual Knowledge base is created by a Singleton EJB:

@Singleton
@Startup
public class WaltzKb {
	private static final transient Logger logger = LoggerFactory.getLogger(WaltzKb.class);
	private KnowledgeBase kbase;

	@PostConstruct
	public void init() {
		// compile the DRL and build the Knowledge base once, at application startup
		KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
		kbuilder.add( ResourceFactory.newClassPathResource("waltz.drl", WaltzKb.class), ResourceType.DRL );
		if (kbuilder.hasErrors()) {
			for (KnowledgeBuilderError error : kbuilder.getErrors()) {
				logger.error("DRL Error " + error);
			}
		}
		Collection<KnowledgePackage> pkgs = kbuilder.getKnowledgePackages();
		kbase = KnowledgeBaseFactory.newKnowledgeBase();
		kbase.addKnowledgePackages( pkgs );
	}

	public KnowledgeBase getKbase() {
		return kbase;
	}
}

With the Webservice + Singleton EJB approach, several webservice calls can happen at the same time, each with its own Knowledge session, while the Knowledge base is efficiently shared among them.
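As a side note, the same pattern can be sketched outside the JavaEE container with plain Java and the same Drools 5 knowledge-api used above: the Knowledge base is built only once and shared, while each simulated "call" gets its own short-lived session. This is just an illustrative sketch under those assumptions, not part of the benchmark project; the class name, the thread pool, and the fact/global handling (left as a comment) are mine.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class SharedKbaseSketch {

	public static void main(String[] args) {
		// build the (expensive) Knowledge base only once, as the Singleton EJB does at startup
		// (error checking omitted, see WaltzKb above)
		KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
		kbuilder.add( ResourceFactory.newClassPathResource("waltz.drl", SharedKbaseSketch.class), ResourceType.DRL );
		final KnowledgeBase sharedKbase = KnowledgeBaseFactory.newKnowledgeBase();
		sharedKbase.addKnowledgePackages( kbuilder.getKnowledgePackages() );

		// simulate 4 concurrent "webservice calls": each gets its own, disposable session
		ExecutorService pool = Executors.newFixedThreadPool(4);
		for (int i = 0; i < 4; i++) {
			pool.submit(new Runnable() {
				public void run() {
					StatefulKnowledgeSession session = sharedKbase.newStatefulKnowledgeSession();
					try {
						// set globals and insert facts here, as in WaltzWs.waltz() above
						session.fireAllRules();
					} finally {
						session.dispose();
					}
				}
			});
		}
		pool.shutdown();
	}
}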

All the code, plus the benchmark project file with the results, is available on GitHub.

The Benchmark

In order to load test this JavaEE application, i.e.: the webservice, I use SoapUI:

[Screenshot: the SoapUI load test setup]

I created two webservice request templates, reflecting the "12" and "50" data files of the original JBoss Drools "waltz" benchmark. Then, before actually running the load test, I consume the webservice a couple of times, just to "warm up" the JavaEE container, in this case JBoss AS.

I performed a load test session of 60s with 1 thread first, i.e.: all webservice calls are sequential, each waiting for the previous call to return before starting a new one. This is followed by further load test sessions of 60s, this time with 2, 3 and 4 threads, i.e.: concurrent webservice calls, similar to a stress test of the system being used by multiple "users".
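For reference, the single-threaded 60s session is conceptually equivalent to the following Java sketch; callWaltzWebService() is a hypothetical placeholder for whatever client (a generated JAX-WS stub, raw HTTP, or SoapUI itself) actually posts the waltz12 or waltz50 request:

// Conceptual sketch of the 60s, 1-thread load session; callWaltzWebService() is hypothetical.
public class SequentialLoadSketch {

	public static void main(String[] args) {
		long end = System.currentTimeMillis() + 60 * 1000L;   // 60s test window
		long calls = 0;
		long totalMs = 0;

		while (System.currentTimeMillis() < end) {
			long start = System.currentTimeMillis();
			callWaltzWebService();                             // sequential: wait for the response
			totalMs += System.currentTimeMillis() - start;
			calls++;
		}
		System.out.println("avg response (ms): " + (totalMs / (double) calls));
	}

	private static void callWaltzWebService() {
		// placeholder: invoke the WaltzWs webservice with the waltz12 or waltz50 request here
	}
}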

Some limitations apply here; that's why I put all the premises above, to warn that this cannot be considered a comprehensive benchmark, but rather a simple one to get overall figures:

  • The Raspberry Pi is single core, so this platform is at a disadvantage when the load test session is performed with 2+ threads.
  • For the performance baseline I'm using a MacBook Air (mid-2011, 1.8 GHz Intel Core i7) running JDK 6, while on both ARM platforms I've got JDK 8ea, build 1.8.0-ea-b99. So yeah, the JVM and architecture of the baseline are quite a different beast, but again, this is just to get an overall performance indicator.
  • While on both the MacBook Air and the Odroid I can keep the flags -server -Xmx512m when starting the JavaEE container JBoss AS, this is not possible on the Raspberry Pi, where I have to change them to -client -Xmx400m, given the memory constraints and the fact that the Server VM is currently implemented only from ARMv7 onwards, while the Raspberry Pi is ARMv6. Please bear in mind that on ARM this is an Early Access version of the JVM (a quick way to double-check which VM is actually in use is sketched right after this list).
  • The performance baseline test is performed on localhost, so the overhead of the LAN is not included in its figures.
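To double-check which VM and architecture each platform actually runs (Client vs Server VM, ARM vs x86), a few standard system properties are enough; this is a minimal, standalone sketch, not part of the benchmark project:

// Prints the JVM and platform details used to interpret the benchmark figures.
public class JvmInfo {
	public static void main(String[] args) {
		System.out.println("java.version  = " + System.getProperty("java.version"));
		System.out.println("java.vm.name  = " + System.getProperty("java.vm.name")); // Client VM vs Server VM
		System.out.println("os.arch       = " + System.getProperty("os.arch"));      // e.g. arm vs x86_64
		System.out.println("max heap (MB) = " + Runtime.getRuntime().maxMemory() / (1024 * 1024)); // reflects -Xmx
	}
}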

The Results

I have to say I'm quite impressed with the results. Although this is only an overall performance indicator, it gives a good insight that there is plenty of potential in using JavaEE on an ARM embedded platform, and I'm specifically referring to the Odroid. The Raspberry Pi suffers a lot in this case, possibly an unfair comparison given how computationally intensive this use case is.

Below are the results of the load test; columns are the type of test (waltz12, waltz50) and the number of threads used for the load session, rows are the platforms (localhost is the baseline MacBook Air), and figures are the average response time of the webservice in ms, within the load test session.

[Table: average webservice response time in ms, per platform and test/thread combination]

Below are the same results, this time expressed as a percentage relative to localhost (the MacBook Air) as the baseline.

[Table: the same results as a percentage of the localhost baseline]

My perspective on these results, considering the Raspberry Pi and the Odroid: the Odroid is also an ARM embedded platform like the RPi, but with 4x the cores, 4x the RAM, and priced at $89 vs $35 (i.e. about 2.5x), which is still very cheap. I think what makes most of the difference is the fact that it is multi-core. With these specs, the performance of the above use case scenario, relative to the Intel i7 baseline, improves from ~130x slower on the Raspberry Pi to ~4x slower on the Odroid. I mean, IMHO, this is A LOT.

Why do I blog this

I do believe this is a good experiment to show the potential of JavaEE on ARM embedded platforms; I'm really curious to run these tests again once the JDK is fully released! Given the small size of these platforms and their small power requirements, I think this is a great way to have Pervasive and Mobile Expert Systems!

(Bladerunner mode ON:) I also believe we might see in the future a platform shift in the data centers as we know them nowadays: from the current platforms to smaller and less power-hungry ones, like the two ARM platforms I've presented in this post. Potentially this also makes a case for shifting from air cooling to liquid cooling, by submerging these tiny computers in mineral oil?
