Java integration

Now you can embed and execute Java code directly in Stata. Previously, you could create and use Java plugins in Stata, but that required you to compile your code and bundle it in a JAR file. You can still do this, but you can now also write your Java code directly in do-files, ado-files, or even interactively. The Java code you write compiles on the fly—an external compiler is not necessary! You may even be able to write parallelized code to take advantage of multiple cores.

The included Stata Function Interface (sfi) Java package provides a bidirectional connection between Stata and Java.

Stata bundles the Java Development Kit (JDK) with its installation, so there is no additional setup involved. This version of Stata includes Java 11, which is the current long-term support (LTS) version.

Let’s see it work

Example 1: Invoke Java interactively

If you are familiar with JShell, the following will look similar to you.

In this example, we asked Java to print “Hello, Java!” We can call Java interactively from Stata’s Command window.

Example 2: Embed Java code in a do-file

We can embed Java in a do-file, just as we do with Mata and Python code. Just place the Java code between java: and end.

We have an ArrayList containing four different coffee drinks (“Latte”, “Mocha”, “Espresso”, “Cappuccino”).

We sorted the ArrayList using Collections.sort() and then printed the sorted list, one on each line.


Example 3: Embed Java code in an ado-file

In contrast with Python, Java libraries tend to have a lower-level implementation, meaning you might have to write a bit more code to do what you want. This, however, often gives greater flexibility and performance. One of Java’s strengths is in its extensive APIs, which are packaged with the Java virtual machine. See the Java Development Kit for details. There are also many useful third-party libraries available.

Let’s assume that Stata’s egen command did not already have a rowmean function. We could use Java integration to write a Stata command with this functionality.

We put our Java code right in our ado-file. It compiles on the fly as the ado-file is loaded and executed.

Let’s set up some sample data.

Let’s run the command.

. rowmean v*, gen(rmean)

That ran in 3.2 seconds. That’s not bad. Can we make it faster? You bet we can! We can take advantage of our multiple processors. The machine I am testing on is an I7-5820K and has 6 CPU cores. Another optimization we can make is to not ask Stata repeatedly for each variables index. Instead, we can get that information once and cache it.

Example 4: Make it faster

Here is our improved command:

Did that make it faster? It sure did! Our original timing was 3.2 seconds. This time, using the same dataset, the command finished in .79 seconds. Most of the improvement was from making our code run in parallel. If you don’t look closely, you might not notice the change we made. Because we are using “LongStream” to loop over the observations, we can ask Java to run that in parallel by invoking the parallel() method before foreach() in the lambda expression. Keep in mind that writing parallel code has many pitfalls and may not be easy for more complicated problems. And for other problems, parallelization may not be beneficial.