Many of our competitors promise to deliver a code translation that is 100% compatible and done within seconds, thanks to a fully automated translation process.

This sounds great, and when demonstrated (e.g. as Heirloom Computing does here, or Ispirer does here) it looks impressive to the untrained eye. But why, then, do all these competitors shy away from showing the generated code? At best they quickly scroll over some uncritical parts like translations of DISPLAY statements or the instantiation of variables, but never the translation of GOTOs, REDEFINEs, out-of-line PERFORM-THRUs, etc.

In all areas of computing you face tradeoffs (speed vs. memory usage, speed vs. security, simplicity vs. feature richness, etc.). In software modernization, there is a tradeoff between compatibility and maintainability. 100% compatibility is actually quite simple to achieve fully automatically. But achieving 100% compatibility automatically always involves some kind of emulation of the COBOL program (I will give some more technical details on that below).

This is where all these solutions fall flat: when the resulting code emulates COBOL, the code doesn't make sense to someone who doesn't know COBOL. Great. Why were you migrating, again? Who will maintain that resulting code once your COBOL developers are gone?
In my opinion, these competitors "solve" your problem in the wrong direction. They don't turn your COBOL code into Java code; they turn the Java compiler into a COBOL compiler. Instead of pulling you out of the dead-end street, they just repaint the houses on the dead-end street and tell you that you are now on a different street.

Of course, said competitors claim that the code they produce is "maintainable", but when you dig into this claim you see what they mean: the code is maintainable because the COBOL developers are able to recognize their original code in the resulting code. Which is precisely my point.

Some examples of how to achieve 100% compatibility easily:

  • The easiest way would be to take some mainframe emulator, feed the COBOL source code to a COBOL compiler on it, and execute the resulting binary on the mainframe emulator. Many of our competitors do exactly that. They call it "Platform Migration" or "Rehosting".
    Use a cloud-based mainframe emulator and you have "Cloud Migration" (including all of COBOL's inherent security flaws, now exposed to the public!).
  • Another way would be to write a COBOL parser in the target language (say, Java), feed it the original COBOL source code at runtime and then execute that on an internal mainframe emulator. The resulting code might look something like this:
    CobolParser p = new CobolParser();
    p.feed("DISPLAY 'Hello World'");
    CobolProgram prog = p.parse();
    MainframeEmulator mf = new MainframeEmulator();
    prog.execute(mf);
  • Is that too far-fetched? How about you strip out the parser and let the Java compiler build the CobolProgram object directly? The resulting code might look something like this:
    CobolProgram prog = new CobolProgram();
    prog.addStatement(new DisplayStatement("Hello World"));
    // or maybe even just
    prog.display("Hello World");
    MainframeEmulator mf = new MainframeEmulator();
    prog.execute(mf);
  • Maybe create the MainframeEmulator ahead of time and give it to the CobolProgram object directly, so it can execute the instructions immediately:
    CobolProgram prog = new CobolProgram(new MainframeEmulator());
    prog.display("Hello World");
    // or maybe hide the MainframeEmulator in CobolProgram's constructor, so this becomes
    CobolProgram prog = new CobolProgram();
    prog.display("Hello World");

This starts to look like the code you see in these competitors' demonstrations, although at this point there wouldn't even be a reason to have the CobolProgram and MainframeEmulator objects at all. Just translate it to System.out.println("Hello World"); and we're done! Why, then, are there still artifacts of such a design observable in the code you see in the competitors' demonstrations?

Well, what about variables? COBOL's memory layout is very different from the memory layouts of modern languages. Some things are uncritical: an S9(6) variable "foo" could, for example, be translated to an int foo; but what if you read foo(2) or, worse, write to it?
An X(60) variable cannot simply be translated to a String, because if you set the variable to a string of (say) 40 characters, accessing character 42 will throw an exception and crash your program, which wouldn't happen in COBOL, where the field is padded with spaces up to its declared length. Further, writing a special character into such a string wouldn't result in an error, although it would in COBOL. Also, how do you encode the string? Anything other than EBCDIC will result in possibly differently sorted tables, breaking your "100% compatibility".
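
To make the first of those points concrete, here is a minimal sketch (the field name and the 60-character length are made-up examples, not from any real migration) of what happens when a PIC X(60) field is naively mapped to a plain Java String:

    public class NaiveStringTranslation {
        public static void main(String[] args) {
            // Naive translation: the PIC X(60) field becomes a plain Java String.
            String customerName = "JOHN DOE"; // only 8 characters were moved in
            // In COBOL, CUSTOMER-NAME(42:1) yields a space, because the field
            // is padded with spaces up to its declared length of 60.
            // In Java, the equivalent access throws StringIndexOutOfBoundsException:
            char c = customerName.charAt(41);
            System.out.println(c);
        }
    }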

The problems so far can be solved by using ugly, slow custom types based on byte arrays. But due to Rice's Theorem, it is impossible to automatically decide whether you must use these compatible, ugly, slow custom types, or whether you can replace them with maintainable, readable, native types. A "fully automatic, 100% compatible" translation is therefore forced to use the ugly, slow custom types everywhere. I would already argue that it would be better to do some manual preprocessing and postprocessing to decide case by case whether to use these custom types or native types.
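
To illustrate what such a custom type looks like, here is a heavily simplified sketch; the class name and methods are my own invention, not any vendor's actual runtime, and it assumes the JRE ships an EBCDIC charset such as IBM1047:

    import java.nio.charset.Charset;
    import java.util.Arrays;

    // Sketch of a "compatible but ugly" field type: every access goes through
    // a fixed-length byte buffer that mimics COBOL's PIC X semantics.
    public class PicXField {
        private static final Charset EBCDIC = Charset.forName("IBM1047");
        private static final byte SPACE = " ".getBytes(EBCDIC)[0];
        private final byte[] storage;

        public PicXField(int length) {
            storage = new byte[length];
            Arrays.fill(storage, SPACE); // the field starts out space-filled
        }

        // MOVE semantics: truncate or pad with spaces, never throw.
        public void move(String value) {
            byte[] src = value.getBytes(EBCDIC);
            for (int i = 0; i < storage.length; i++) {
                storage[i] = i < src.length ? src[i] : SPACE;
            }
        }

        // Reference modification, e.g. FIELD(42:1) becomes field.refMod(42, 1).
        public String refMod(int start, int length) {
            return new String(storage, start - 1, length, EBCDIC);
        }
    }

Every read and write now pays for a charset conversion, and every call site reads like COBOL transliterated into Java, which is exactly the maintainability problem.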

But here comes the dealbreaker: REDEFINEd variables.
Say you have an S9(6) inside a grouped variable, say at offset 10. Also, you REDEFINE that grouped variable, for example as an X(60) variable. What do you do with your integer when someone writes to offset 12 of the X(60)?
Unless you can guarantee that nobody ever writes to the offsets where your S9(6) lives (offsets 10 to 15 of the X(60)), you have to emulate COBOL's memory layout here. Can you guarantee that nobody ever writes to offsets 10 to 15 of the X(60)? No. No, you can't, because deciding that automatically would, again, contradict Rice's Theorem.
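
Here is a minimal sketch of why this forces emulation, with made-up names and a simplified one-byte-per-digit DISPLAY representation: once both views share one backing byte array (as a compatible runtime must arrange), a write through the X(60) view silently changes the redefined number, an effect a native int simply cannot reproduce.

    // Hypothetical sketch: a group item with an S9(6) at offsets 10..15,
    // REDEFINEd by an X(60) view over the same storage.
    public class RedefinesSketch {
        static final byte[] groupStorage = new byte[60]; // one shared buffer for both views

        // Write one character into the X(60) view at a given (0-based) offset.
        static void setXChar(int offset, char c) {
            groupStorage[offset] = (byte) c; // simplification: one byte per character
        }

        // Read the S9(6) that occupies offsets 10..15 of the same storage.
        static int readS9() {
            int value = 0;
            for (int i = 10; i < 16; i++) {
                value = value * 10 + (groupStorage[i] - '0');
            }
            return value;
        }

        public static void main(String[] args) {
            for (int i = 10; i < 16; i++) setXChar(i, '1');
            System.out.println(readS9()); // 111111
            setXChar(12, '9');            // write through the X(60) view...
            System.out.println(readS9()); // ...and the "integer" is now 119111
        }
    }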

This is why fully automatic, 100% compatible translations always emulate COBOL, at least COBOL's memory layout. This is why you see these artifacts in their code. And this is why fully automatic, 100% compatible translations are never truly maintainable.