Translation-Based Integration

Rocky road made smooth

Now let's cook up a C++ class that does the same thing. That should be easy, shouldn't it? We could probably write a simple parser and translator and come up with the following automatically generated C++ type.


class Person
{
public:
    std::string name;

    Person() : name() {}
    // translated from the Java Person( Person ) constructor
    Person( const Person & p ) : name( p.name ) {}
    ~Person() {}

    std::string getName() { return name; }
    void setName( std::string n ) { name = n; }

    static void delete( Person & p )
    {
        PersonUtil::delete( p.name );
    }

    static Person create( std::string n )
    {
        return Person( n );
    }
};

That's a straightforward look-alike of the original Java type in C++. We'll have at least one immediately obvious problem here: delete is a reserved word in C++, so the generated code won't compile. In the case of delete this is an easily solved problem; we simply use _delete or some other mangled form of the reserved word. The ease of solving this problem hides a deeper one, though: C++ has a preprocessor, and compilers and platform headers typically define many macros. Each of those macros really has to be treated like a reserved word, and that is going to be complicated without a sophisticated tool.
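The renaming step can be sketched in a few lines. This is only an illustration, not the output of any real tool: the function name mangleIdentifier, the partial keyword list, and the idea of passing known macro names in as a second set are all assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns a name that is safe to emit in generated C++.
// The keyword list is deliberately partial; it holds C++ keywords that are
// legal identifiers in Java. extraReserved stands in for macro names that a
// tool would have to collect from the target compiler and its headers.
std::string mangleIdentifier(
    const std::string& javaName,
    const std::unordered_set<std::string>& extraReserved =
        std::unordered_set<std::string>())
{
    static const std::unordered_set<std::string> cppKeywords = {
        "delete", "operator", "template", "typename", "union", "friend",
        "namespace", "virtual", "register", "signed", "unsigned", "struct",
        "typedef", "inline", "explicit", "mutable", "auto"
    };

    if (cppKeywords.count(javaName) != 0 || extraReserved.count(javaName) != 0)
    {
        // Same scheme as the _delete example in the text. Names that start
        // with an uppercase letter would need a different scheme, because an
        // underscore followed by an uppercase letter is reserved in C++.
        return "_" + javaName;
    }
    return javaName;
}

// Small self-check showing the intended behavior.
void mangleDemo()
{
    assert(mangleIdentifier("delete") == "_delete");
    assert(mangleIdentifier("getName") == "getName");

    // "min" is a macro on some platforms (for example, windows.h defines it
    // unless NOMINMAX is set), so it only collides when the tool knows about it.
    std::unordered_set<std::string> macros;
    macros.insert("min");
    assert(mangleIdentifier("min", macros) == "_min");
}
```

The keyword set is fixed by the language, so it is the easy half. The macro set depends on the compiler, the platform, and the headers each user includes, which is why this half calls for a more sophisticated tool.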

Then there is a host of less obvious problems with this naïve translation. Let's go through them one by one.

1.  We translated the Java Person( Person ) constructor into the C++ copy constructor. This is totally appropriate based on its call signature, but it might not be appropriate semantically. The C++ copy constructor is invoked automatically by the compiler in many scenarios, whereas the original Java constructor is only supposed to be invoked explicitly by the programmer. Will our generated code have the correct semantics? It depends on the type, which is usually not a satisfactory answer.
2.  In Java, we had a String field; in C++ we have a corresponding std::string field. There's another semantic difference hidden in this seemingly obvious translation: Java String instances are immutable, whereas std::string instances can be modified. Yes, we could try working with const or mutable modifiers, but those modifiers might collide with other intended semantic usages.
3.  The create method in the Java class returns a new Person instance, so we face the question of what to return from the corresponding C++ method. If we return the Person instance by value, as in the sample code, it is not created on the heap and we don't leak memory if the caller ignores the return value (something that's usually safe to do in garbage-collected Java). The downside of this approach is that the result is always exactly an instance of class Person: returning a subtype by value would slice it down to its Person part. We can only return a subtype of Person from this method if we return the result through a pointer or a reference to Person. But if we return a pointer, we put the burden of freeing the instance on the caller, which is a built-in memory leak because the return value is so often ignored on the Java side. This demonstrates how a perfectly correct and proper translation can yield code that is either semantically incorrect or semantically hard to use. It's a true Catch-22.
4.  In the delete method, we're calling into the PersonUtil type. This means we have to translate this type as well, or make it available in some other fashion, for the integration project to be a success. In a typical integration project, you quickly face a type explosion due to these dependencies. For example, if you analyze the Java type Object, you find that it references (directly or indirectly) between 250 and 350 types. A good translation-based integration tool will allow you to prune the type set down to just the types that you're interested in without compromising usability.
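Points 1 and 3 are easy to observe in a small experiment. The sketch below uses simplified stand-ins for the generated types: the copies counter, the Employee subtype, and the describe method are not part of the original listing and exist only to make the behavior visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the generated type, with a counter added so that
// copy-constructor calls can be observed.
struct Person
{
    static int copies;
    std::string name;

    explicit Person(const std::string& n) : name(n) {}
    Person(const Person& p) : name(p.name) { ++copies; }
    virtual ~Person() {}

    virtual std::string describe() const { return "Person " + name; }
};
int Person::copies = 0;

// Hypothetical subtype, used to show what return-by-value does to it.
struct Employee : Person
{
    explicit Employee(const std::string& n) : Person(n) {}
    std::string describe() const override { return "Employee " + name; }
};

// Point 1: passing by value makes the compiler call the copy constructor,
// although the programmer never wrote a call to it.
std::string describeByValue(Person p) { return p.describe(); }

// Point 3: the return type is Person, so only the Person part of the
// Employee is copied into the result (slicing).
Person createByValue(const std::string& n)
{
    Employee e(n);
    return e;
}

void semanticsDemo()
{
    Person alice("Alice");
    int before = Person::copies;
    describeByValue(alice);
    assert(Person::copies == before + 1);   // implicit copy happened

    // The caller asked for an Employee to be created but gets a plain Person.
    assert(createByValue("Bob").describe() == "Person Bob");
}
```

In Java, the equivalent of describeByValue would pass a reference and no constructor would run, and the equivalent of createByValue would hand back the Employee intact.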

Other big semantic issues in a Java/C++ translation involve:

  • Inheritance semantics (is inheritance allowed or not)
  • Exception semantics (C++ exception declarations have very different semantics from Java exception declarations)
  • Interface semantics
  • Life cycle management and copy semantics
  • Thread semantics

Just translating a Java statement to a corresponding C++ statement rarely does the job. The problem is mostly not the syntax differences between the two languages but rather the semantic differences between the languages. Also think about the work that might be required to replace the platform infrastructure like the String class. In Java, String has a lot of functionality; how much depends on which version of String you're looking at. If we simply translate String to std::string, what's going to happen to the original Java functionality of String?
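To get a feeling for the String question, consider what even a tiny replacement would involve. The wrapper below is a hypothetical sketch: the name JString and the choice of methods are assumptions, and it covers only two of the many methods on Java's String, with ASCII-only behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical immutable string wrapper. No member function modifies the
// stored text or hands out a mutable reference to it; operations that would
// "change" the string return a new instance instead.
class JString
{
public:
    explicit JString(const std::string& s) : value_(s) {}

    const std::string& str() const { return value_; }
    std::size_t length() const { return value_.size(); }

    // Modeled on Java's String.toUpperCase(), but ASCII-only. The Java
    // method is Unicode- and locale-aware, which this sketch is not.
    JString toUpperCase() const
    {
        std::string result(value_);
        std::transform(result.begin(), result.end(), result.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::toupper(c));
                       });
        return JString(result);
    }

    // Modeled on Java's String.startsWith(String).
    bool startsWith(const std::string& prefix) const
    {
        return value_.compare(0, prefix.size(), prefix) == 0;
    }

private:
    std::string value_;
};

void stringDemo()
{
    JString s("hello");
    JString u = s.toUpperCase();
    assert(u.str() == "HELLO");
    assert(s.str() == "hello");   // the original is untouched
    assert(s.startsWith("he"));
}
```

Two methods already raise questions about character encoding and locale handling. Multiply that by the rest of the String API, and then by every other platform type the translated code depends on.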

The point I'm trying to make is that naïve translations very quickly run into lots of problems unless the translated API is very small. Where does this leave us? It leaves us with two basic options.

1. Put the semantic integration burden on the user
This is the path that most EAI strategies like Web services or CORBA take. The user creates what amounts to an integration model (IDL, WSDL) and implements it in terms of his or her favorite technology.

Some tools can help by generating integration models based on existing code, but, as I already pointed out, this is a tricky proposition because some semantic usability (extensibility, life-cycle, etc.) cannot be expressed in your integration model of choice. Also as already mentioned, this approach works best for component models with relatively few and relatively simple entry points. Simple means that you don't have many complex, user-defined or platform types, but rather primitive types or pseudo-primitive types, like String or Date.

2. Put the semantic integration burden on the translation tool
This is the path that is taken by some integration solutions that are exclusively targeting language-integration problems. These tools create very sophisticated type models that mirror (to the extent possible) the underlying type model.

The developer can use arbitrarily complex user- or platform-defined types as if they were written in his or her language.
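As a rough illustration of what mirroring the underlying type model could mean, the sketch below gives the C++ developer a handle type with Java-like reference semantics. It is an assumption-laden toy, not a description of how any particular product works: PersonRef, PersonImpl, and EmployeeImpl are invented names, and the implementation objects are plain C++ objects, whereas a real integration would have the handle refer to an object living in the other language's runtime.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical implementation objects standing in for the "real" instances.
struct PersonImpl
{
    std::string name;
    explicit PersonImpl(const std::string& n) : name(n) {}
    virtual ~PersonImpl() {}
    virtual std::string kind() const { return "Person"; }
};

struct EmployeeImpl : PersonImpl
{
    explicit EmployeeImpl(const std::string& n) : PersonImpl(n) {}
    std::string kind() const override { return "Employee"; }
};

// Hypothetical handle type. Copying a PersonRef copies a reference, not the
// object, and the object is released when the last handle goes away.
class PersonRef
{
public:
    // The factory can hand out a subtype without slicing, and ignoring the
    // returned handle does not leak.
    static PersonRef create(const std::string& n, bool employee = false)
    {
        if (employee)
            return PersonRef(std::make_shared<EmployeeImpl>(n));
        return PersonRef(std::make_shared<PersonImpl>(n));
    }

    std::string getName() const { return impl_->name; }
    void setName(const std::string& n) { impl_->name = n; }
    std::string kind() const { return impl_->kind(); }

private:
    explicit PersonRef(std::shared_ptr<PersonImpl> p) : impl_(std::move(p)) {}
    std::shared_ptr<PersonImpl> impl_;
};

void handleDemo()
{
    PersonRef a = PersonRef::create("Ann", true);
    PersonRef b = a;                  // both handles refer to one object
    b.setName("Anna");
    assert(a.getName() == "Anna");
    assert(a.kind() == "Employee");   // subtype survives the factory call

    PersonRef::create("ignored");     // return value dropped, nothing leaks
}
```

With a handle like this, the return-value dilemma of the naïve translation does not arise, and the developer's Java intuition about references carries over to the C++ side.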

In my opinion, the latter alternative is the better one because it avoids the pitfalls of having to constantly maintain an integration model and because the "unexpected" API differences are small. The more a developer has to learn when using a type that is written in a different language, the less successful the integration technology is going to be. Any special knowledge that is required creates "friction," and friction causes bugs, and bugs cost money and cause the tool to be unpopular.

I believe that the automated translation of code (source or object) into another language, be it an implementation translation or a calling interface translation, is the way to go for language integration.

More Stories By Alexander R. Krapf

Krapf has over 15 years of experience in software engineering, product development, and project management in the United States and Europe. In addition to founding and managing CodeMesh, Krapf has worked for IBM, Thomson Financial Services, Hitachi, Veeder-Root, and Document Directions, Inc. He has been extensively involved in a variety of complex product development efforts using his in-depth understanding of .NET, C++ and Java.

Krapf has been published in technology journals and been a speaker at a variety of industry conferences. He received a Bachelor of Science degree in Electrical Engineering from the University of Stuttgart, Germany, and can be reached at alex@codemesh.com.
