The Shared Scientific Toolbox in Java
The Shared Scientific Toolbox in Java (SST) is a collection of foundational scientific libraries. Its primary purpose is to serve as a bridge between the highly specific demands of involved scientific calculations and the more traditional aspects of the Java programming language. True to the Java way, the SST strives for idioms and primitives that are powerful and yet general enough to enable the user to write concise, correct, and fast code for most scientific tasks. The SST is best suited for deployment code where integration and portability are priorities -- in other words, prototype in MATLAB, deploy in the SST.
The SST includes many member packages with the goal of simplifying scientific programming. These include, but are not limited to:
The next five sections offer a cross section of the features described above.
The org.shared.array package allows the user to interact with multidimensional numerical data in a structured, concise manner. The following picture demonstrates how one goes about multiplying two RealArrays (which happen to be two-dimensional and interpreted as matrices).
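To make the matrix interpretation concrete, here is a minimal sketch of a two-dimensional product in plain Java. The class Matrix2D and its methods are hypothetical stand-ins written for this illustration; they are not the RealArray API, whose method names and storage conventions should be taken from the SST documentation.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a two-dimensional real array; NOT the SST's RealArray. */
final class Matrix2D {

    final int rows;
    final int cols;
    final double[] values; // row-major storage (an assumption of this sketch)

    Matrix2D(int rows, int cols, double[] values) {
        if (values.length != rows * cols) {
            throw new IllegalArgumentException("expected " + (rows * cols) + " values");
        }
        this.rows = rows;
        this.cols = cols;
        this.values = values.clone();
    }

    double get(int r, int c) {
        return values[r * cols + c];
    }

    /** Standard matrix product: (rows x cols) times (cols x other.cols). */
    Matrix2D multiply(Matrix2D other) {
        if (cols != other.rows) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[] out = new double[rows * other.cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                double a = get(i, k);
                for (int j = 0; j < other.cols; j++) {
                    out[i * other.cols + j] += a * other.get(k, j);
                }
            }
        }
        return new Matrix2D(rows, other.cols, out);
    }

    public static void main(String[] args) {
        Matrix2D a = new Matrix2D(2, 3, new double[] {1, 2, 3, 4, 5, 6});
        Matrix2D b = new Matrix2D(3, 2, new double[] {7, 8, 9, 10, 11, 12});
        // prints [58.0, 64.0, 139.0, 154.0]
        System.out.println(Arrays.toString(a.multiply(b).values));
    }
}
```

A library array type would typically hide the index arithmetic shown here behind a single call, which is the conciseness the package aims for.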
An asynchronous model for sockets, aside from the performance benefits associated with being built on top of select, can reduce programmer error through a concise API and well-defined set of usage conventions. The SST implements a complete messaging layer upon which high level protocols may be built. Users may override any level of the networking stack, with the choice dependent on the level of specific functionality desired. The end goal is to empower users to build scalable, thread-safe distributed programs. The illustration below demonstrates a fairly typical, network-aware message passing architecture in the style of SEDA, with the asynchronous sockets layer marked in red. In it, packets come in from asynchronous sockets; packets are transformed into events; events are then dispatched to processing units; processing units serialize response events; and finally serialized responses are written back to asynchronous sockets.
The subsystem described above offers significant advantages over the traditional one-thread-per-socket model of network programming:
Switching from traditional I/O constructs to an NIO framework requires careful initial consideration; consequently, we invite you to peruse the networking chapter of the user manual to see if the SST's networking layer is appropriate for your problem. If not, do check out the excellent Apache MINA and Netty projects as alternatives.
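For readers weighing that switch, the select-based model can be sketched with nothing but the standard java.nio classes. The example below is an illustration under stated assumptions, not the SST's networking API: the class SelectEchoSketch and its roundTrip method are hypothetical, the syntax assumes Java 8 or later, and a blocking client on a helper thread stands in for a remote peer. One thread multiplexes accept and read readiness for all connections and echoes back whatever it receives.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo class; not part of the SST. */
final class SelectEchoSketch {

    /** Sends one message through a selector-driven echo server and returns the reply. */
    static String roundTrip(final String message) throws Exception {
        ExecutorService clientPool = Executors.newSingleThreadExecutor();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            final int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Blocking client: write the message, then read the same number of bytes back.
            Future<String> reply = clientPool.submit(() -> {
                try (SocketChannel client =
                         SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                    byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                    client.write(ByteBuffer.wrap(bytes));
                    ByteBuffer in = ByteBuffer.allocate(bytes.length);
                    while (in.hasRemaining() && client.read(in) >= 0) {
                        // keep reading until the echo is complete or the peer closes
                    }
                    return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
                }
            });

            // Event loop: a single thread services every registered channel.
            while (!reply.isDone()) {
                selector.select(100);
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel peer = server.accept();
                        if (peer != null) {
                            peer.configureBlocking(false);
                            peer.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel peer = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (peer.read(buf) < 0) {
                            peer.close();
                            continue;
                        }
                        buf.flip();
                        // Simplification: spin until written; real code would register OP_WRITE.
                        while (buf.hasRemaining()) {
                            peer.write(buf);
                        }
                    }
                }
            }
            // Close any accepted connections still registered with the selector.
            for (SelectionKey key : new ArrayList<>(selector.keys())) {
                key.channel().close();
            }
            return reply.get();
        } finally {
            clientPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping")); // prints ping
    }
}
```

Even this small loop has to manage key bookkeeping, partial writes, and channel cleanup by hand, which is the kind of detail a messaging layer is meant to absorb.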
A parallel dataflow engine separates the specification of a parallel computation from the actual execution of it. All that is required from the user is a description of the atomic units of work involved and their interdependencies. Upon receiving an input, the engine will carry out the computation in parallel to the fullest extent that its dependencies allow. The illustration below depicts a very simple engine whose nodes perform arithmetic operations on their inputs. Conceptually, one may view engine execution as pushing, and in the process transforming, an initial input from a source node through a series of calculation nodes to an eventual sink node.
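The source-to-sink picture can be approximated with the standard library alone. The sketch below is not the SST engine: it uses java.util.concurrent.CompletableFuture (Java 8 or later), and the class DataflowSketch and its node names are hypothetical. The user states only what each node computes and what it depends on; the runtime decides when each node runs.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical demo class; uses CompletableFuture, not the SST dataflow engine. */
final class DataflowSketch {

    /** Pushes x from a source node through two calculation nodes into a summing sink. */
    static int run(final int x) {
        CompletableFuture<Integer> source = CompletableFuture.supplyAsync(() -> x);

        // Both calculation nodes depend only on the source, so they may run in parallel.
        CompletableFuture<Integer> plusOne = source.thenApplyAsync(v -> v + 1);
        CompletableFuture<Integer> timesTwo = source.thenApplyAsync(v -> v * 2);

        // The sink fires only once both of its inputs are available.
        CompletableFuture<Integer> sink = plusOne.thenCombine(timesTwo, Integer::sum);
        return sink.join();
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // (5 + 1) + (5 * 2) = 16
    }
}
```

Note that no thread, lock, or queue appears in the specification itself; the dependency structure is the whole program.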
The above approach has multiple computational and organizational advantages:
Whereas MATLAB is a sheltered execution environment with all statistical tools in one place, the SST, being compatible with vanilla Java code, takes a mix-and-match approach.
In providing a statistical package, we pay particular attention to design and abstraction, since these are what the base language encourages.
Time constraints, however, dictate the implementation of only a handful of concepts -- we welcome the user to check out what's currently available (e.g., classes for combinatorics and Gaussian mixture modeling).
One noteworthy subpackage is org.shared.stat.plot.
Here, one will find abstractions modeled for basic plotting functionality.
Currently, we provide a Gnuplot-backed implementation of the interfaces.
Consequently, we were able to generate the surface plot shown below with 14 lines of non-Gnuplot-specific code.
In addition to calculation-driven libraries, the SST contains declarative, annotation-driven APIs that attempt to reduce the tedium of writing repetitive, boilerplate code. With Java 1.5 annotations, programmers can describe how their program is supposed to work without having to write control flow logic.
One such annotation-driven API marshals resources and loads them on program start. Normally, programs depend on third party libraries and, by association, their Jar files. In general, it's not good practice to unpack Jar files and dump their contents onto the main class path. Declaring an overly long class path for the system class loader isn't a viable solution either, since doing so exposes the underlying needs of the program to a shell script or the like and breaks portability. The SST program loader addresses these issues by inserting a level of indirection between resource management and program execution. The illustration below demonstrates how the loader interrogates a designated main class for resource targets (delimited in red along with annotations). Quite simply, the annotations say that the paths found in "embedded.jar", which itself is a resource on the bootstrap class path, should be visible to a newly created RegistryClassLoader. Moreover, as an added service, said class loader should also load the native JNI library "stuff/libnative.so".
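The interrogation step can be sketched with standard Java reflection. Everything named below is hypothetical and chosen for this illustration: the annotation DemoResources, its elements jars and nativeLibraries, and the class LoaderSketch are not the SST's actual annotations or loader. The sketch only shows how a loader might discover declared resource targets on a main class before deciding what to hand to a class loader.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; the SST's real annotations and loader differ. */
final class LoaderSketch {

    /** Hypothetical annotation declaring resources a program needs at startup. */
    @Retention(RetentionPolicy.RUNTIME) // must be retained so reflection can see it
    @Target(ElementType.TYPE)
    @interface DemoResources {
        String[] jars() default {};
        String[] nativeLibraries() default {};
    }

    /** A main class that declares its needs instead of relying on a long class path. */
    @DemoResources(jars = "embedded.jar", nativeLibraries = "stuff/libnative.so")
    static final class Main {
    }

    /** Reads the declared targets; a real loader would go on to build a class loader. */
    static List<String> targets(Class<?> mainClass) {
        List<String> out = new ArrayList<>();
        DemoResources declared = mainClass.getAnnotation(DemoResources.class);
        if (declared != null) {
            for (String jar : declared.jars()) {
                out.add("jar:" + jar);
            }
            for (String lib : declared.nativeLibraries()) {
                out.add("native:" + lib);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [jar:embedded.jar, native:stuff/libnative.so]
        System.out.println(targets(Main.class));
    }
}
```

Because the requirements live on the class itself, the launch command stays the same no matter how many libraries the program pulls in.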
By controlling class loading behavior with annotations, users can achieve the following effects:
To send you on your way, here's how to obtain the SST and/or learn more about it:
The fastest way to test drive the SST is to download the pure Java distribution, given as a Jar, and type
    java -jar sst.jar
which will kick off a suite of JUnit tests.
No functionality is lost in using pure Java backend bindings; however, we highly recommend compiling the native layer for speed.
Thus, once you are satisfied that the SST will meet your needs, download the full, Tar'd distribution and build the Java sources as well as the C++ sources constituting the native layer.
Windows users may benefit from a special source distribution that contains precompiled DLLs and a special executable, buildandtest.exe, for compiling Java classes and unit testing the whole package.
Finally, Eclipse users can directly import the source distribution or version control working image, both of which contain .project and .classpath files.
Note that the IvyDE plugin is required to properly set up the class path.
Make sure you have Java 1.6 or later handy. Contact the administrator if you have any lingering doubts or questions.