Is Java „the next COBOL“?

vengelmann, 26 October 2018, COBOL

I have seen a few COBOL advocates claim that it is futile to migrate to other languages, because those languages might one day be obsolete too, and then you would have to migrate again. „They are just moving to the next COBOL“ was a catchy sentence I read.

First of all: this is an admission that COBOL is obsolete! And would you keep driving your worn-out car just because a new car will break down some day, too?

You have to move with the times or the times move you!

Sure – some day, Java (or whichever language you migrate to) could be obsolete, too. But migrating from COBOL to a modern language is far more difficult than migrating from one modern language to another.

Modern languages all have very similar, relatively small core languages: classes, methods, exception handling, loops, conditionals, arithmetic and syscalls. Everything else is built from combinations of these features and typically shipped as some sort of standard library. This makes it much easier to write parsers, and in turn transpilers, for these languages. A context-free grammar with a few dozen production rules suffices to parse large portions of these languages.
One of the main problems in parsing COBOL is the disaster of its „natural language syntax“, which requires a context-free grammar with hundreds, if not thousands, of production rules. The mere fact that a period character terminates all open blocks in COBOL doubles the number of production rules for statements. Also, COBOL has no standard library: everything you can do with COBOL is part of the core language! Every use case needs its own separate production rule in the grammar!
At the time of writing, our COBOL parser has 300 production rules and more than 7,000 lines of code. Our lexer has about 300 regular expressions. We estimate that this covers about 40% of the core language.
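To make the period rule concrete, here is a minimal sketch in Java. It is not the parser described above: it only tracks IF and END-IF on an already tokenized sentence, and the class and method names are made up for illustration. It shows what the period implies, namely that one token silently closes every block that is still open.

```java
import java.util.ArrayList;
import java.util.List;

final class PeriodScopeSketch {
    // Rewrites a token stream so that every IF that a period closes implicitly
    // gets an explicit END-IF. Assumption: only IF/END-IF form blocks here;
    // real COBOL has many more block-forming statements.
    static List<String> makeScopesExplicit(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int openIfs = 0; // number of IF blocks not yet closed
        for (String token : tokens) {
            switch (token) {
                case "IF":
                    openIfs++;
                    out.add(token);
                    break;
                case "END-IF":
                    if (openIfs > 0) {
                        openIfs--;
                    }
                    out.add(token);
                    break;
                case ".":
                    // The period closes every block that is still open.
                    for (; openIfs > 0; openIfs--) {
                        out.add("END-IF");
                    }
                    out.add(token);
                    break;
                default:
                    out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("IF", "A", "IF", "B", "DISPLAY", "X", ".");
        System.out.println(makeScopesExplicit(sentence));
    }
}
```

For the sample sentence, both nested IF blocks are closed by the single period, so two END-IF tokens are inserted in front of it. A grammar has to describe both the explicit and the implicit way of closing a statement, which is where the extra production rules come from.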
Modern languages all require your code to have a certain structure and to abide by it. For example: when you are in a function, you stay in that function (or a function it calls) until the function terminates. A transpiler just carries that structure over without even trying; it can’t help it.
COBOL, on the other hand, doesn’t require such a structure. COBOL allows and encourages you to break through any boundaries that you might reasonably see in your code. You can, for example, temporarily jump from one function into the middle of a different function (no, this causes no problems with variable scopes, because all variables are global anyway!). Therefore, a transpiler needs to force the code into appropriate structures. This is the other excruciatingly hard part of translating COBOL to other languages. You need very deep control-flow analysis techniques (fixed-point iteration) just to determine whether you can translate your code in a „sane“ way or whether you have to translate it to its Kleene Normal Form (which causes lots of additional, hard-to-understand boilerplate code).
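For readers who have not seen it, the fallback form looks roughly like the sketch below: one big loop that dispatches on a program counter, with each paragraph turned into a case. This is a hand-written illustration under my own assumptions, not output of any real transpiler; the three paragraphs and the name DispatchLoopSketch are invented.

```java
final class DispatchLoopSketch {
    // Hypothetical COBOL-like program with three paragraphs:
    //   P0: add 1 to COUNTER; if COUNTER < 3 go to P0, otherwise continue with P1
    //   P1: multiply COUNTER by 10; go to P2
    //   P2: stop
    static int run() {
        int counter = 0; // stands in for a global COBOL data item
        int pc = 0;      // "program counter": which paragraph runs next
        while (pc >= 0) {
            switch (pc) {
                case 0:
                    counter += 1;
                    pc = (counter < 3) ? 0 : 1;
                    break;
                case 1:
                    counter *= 10;
                    pc = 2;
                    break;
                case 2:
                    pc = -1; // leave the loop, i.e. stop the program
                    break;
                default:
                    throw new IllegalStateException("unknown paragraph " + pc);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Any jump, however wild, can be expressed by assigning to pc, which is why this form always works. It is also why nobody wants to maintain the result: the original control flow is no longer visible in the code.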
AND you can’t even use the fixed-point iteration right away, because the abominable out-of-line PERFORM statements cause gaps in the control-flow graph! We first needed a model checker for Computation Tree Logic to close these gaps. That adds up to 1,800 lines of extremely sophisticated code before you can even start to translate COBOL without risking ugly or wrong translations!
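As a reminder of what fixed-point iteration means in this setting, here is a minimal sketch in Java. It computes which nodes of a control-flow graph are reachable from an entry node by repeating one step until nothing changes. It assumes the graph is already complete, so it sidesteps exactly the PERFORM gaps described above, and it is far simpler than the analysis a transpiler needs. The class name and the graph encoding are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FixedPointSketch {
    // successors maps a node (for example a paragraph name) to the nodes
    // that control can flow to next. Nodes without an entry have no successors.
    static Set<String> reachable(Map<String, List<String>> successors, String entry) {
        Set<String> reached = new TreeSet<>();
        reached.add(entry);
        boolean changed = true;
        while (changed) { // repeat until a full pass adds nothing: the fixed point
            changed = false;
            for (String node : new ArrayList<>(reached)) {
                for (String next : successors.getOrDefault(node, List.of())) {
                    if (reached.add(next)) {
                        changed = true;
                    }
                }
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "MAIN", List.of("INIT", "LOOP"),
                "LOOP", List.of("LOOP", "DONE"),
                "DEAD", List.of("MAIN"));
        System.out.println(reachable(graph, "MAIN"));
    }
}
```

The loop terminates because the set can only grow and the graph is finite. Real analyses track richer facts per node than plain reachability, but they iterate to a stable state in the same way.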
Speaking of global variables: even if you were able to translate COBOL to Java perfectly, do you really want all your variables to be global and static? This would be a mess! You would have to go through all your variables manually (imagine how much work that would be!) and make them local where possible. Otherwise you wouldn’t, for example, be able to run the same method on multiple threads in parallel, because the threads would corrupt each other’s data. And remember that in COBOL, paragraphs typically pass data back and forth through shared global variables, so you can’t even run method A on thread 1 while method B runs on thread 2 if A and B have some variable in common.
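The difference is easy to see in a small Java sketch. The first method is what a literal translation would look like, with the COBOL data item as a static field. The second is the same logic after the variable has been made local. All names are invented for this example.

```java
final class SharedStateSketch {
    // Literal translation: the COBOL data item becomes one static field.
    static long total;

    // Not safe to run on two threads at once: both calls write the same field.
    static long sumWithGlobal(int n) {
        total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // After making the variable local, every call has its own accumulator.
    static long sumWithLocal(int n) {
        long localTotal = 0;
        for (int i = 1; i <= n; i++) {
            localTotal += i;
        }
        return localTotal;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] results = new long[2];
        // Two threads can safely run the local version in parallel.
        Thread first = new Thread(() -> results[0] = sumWithLocal(100_000));
        Thread second = new Thread(() -> results[1] = sumWithLocal(200_000));
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(results[0] + " " + results[1]);
    }
}
```

Called from a single thread, both methods return the same result. Swapping sumWithGlobal into the two threads in main would let them overwrite each other’s running total, and the results would no longer be reliable.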
We use our control-flow analysis to determine which variables are only used locally. Such variables are then turned into local variables. This takes away at least a large part of your manual post-processing.
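A deliberately crude version of this idea can be sketched in a few lines of Java. The criterion used here is my own simplification, not the control-flow analysis mentioned above: a variable counts as a candidate if it is accessed in exactly one paragraph and is written there before it is ever read, so no value can flow in from outside. It also assumes straight-line paragraphs without branches or loops.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LocalizationSketch {
    // One access to a data item; accesses are listed in program order.
    static final class Access {
        final String variable;
        final boolean isWrite;

        Access(String variable, boolean isWrite) {
            this.variable = variable;
            this.isWrite = isWrite;
        }
    }

    static Access read(String variable) {
        return new Access(variable, false);
    }

    static Access write(String variable) {
        return new Access(variable, true);
    }

    // paragraphs maps a paragraph name to its accesses in program order.
    static Set<String> localCandidates(Map<String, List<Access>> paragraphs) {
        Map<String, Set<String>> usedIn = new HashMap<>(); // variable -> paragraphs using it
        Set<String> readBeforeWrite = new HashSet<>();     // value may come from outside
        for (Map.Entry<String, List<Access>> paragraph : paragraphs.entrySet()) {
            Set<String> written = new HashSet<>();
            for (Access access : paragraph.getValue()) {
                usedIn.computeIfAbsent(access.variable, k -> new HashSet<>())
                      .add(paragraph.getKey());
                if (access.isWrite) {
                    written.add(access.variable);
                } else if (!written.contains(access.variable)) {
                    readBeforeWrite.add(access.variable);
                }
            }
        }
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : usedIn.entrySet()) {
            if (entry.getValue().size() == 1 && !readBeforeWrite.contains(entry.getKey())) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }
}
```

If a paragraph CALC writes TMP, reads TMP, reads INPUT and writes RESULT, and a paragraph PRINT reads RESULT, only TMP qualifies. INPUT is read before it is written, and RESULT is shared between two paragraphs.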
Modern languages are generally designed so that their grammars are LR(1), or close to it. Being LR(1) or LL(1), or even just LR(k) or LL(k), is what makes it possible to translate a grammar to a deterministic pushdown automaton… or… you know… a computer program!
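To show how directly such a grammar turns into a program, here is a minimal recursive-descent sketch in Java for a toy LL(1) expression grammar. The grammar is made up for this example and has nothing to do with COBOL or with any real language front-end. Every decision in the parser is made by looking at a single character of lookahead.

```java
final class OneTokenLookaheadSketch {
    private final String input;
    private int pos = 0;

    private OneTokenLookaheadSketch(String input) {
        this.input = input.replace(" ", "");
    }

    // Parses and evaluates an expression over non-negative integers.
    static int evaluate(String text) {
        OneTokenLookaheadSketch parser = new OneTokenLookaheadSketch(text);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at " + parser.pos);
        }
        return value;
    }

    // The single character of lookahead; '\0' marks the end of the input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // expr -> term (('+' | '-') term)*
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term -> factor ('*' factor)*
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor -> NUMBER | '(' expr ')'
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }
}
```

Each grammar rule becomes one method, and the parser never has to guess or backtrack. That property is exactly what gets lost once the meaning of a construct depends on something the parser cannot see in the next token.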
In COBOL, for example, the precedence of AND depends on the type of the variables it combines. If you write IF A > B AND C, then the AND binds more strongly than the > if C is a numeric variable (the expression is then equivalent to IF A > B AND A > C). But the AND binds more weakly than the > if C is a boolean variable.
This illustrates that COBOL is not only outside of LR(k) and LL(k) for any constant k (which would be a disaster by itself already). No, it is even worse: every grammar for COBOL must be ambiguous! This may sound very theoretical, but for someone who knows compiler construction, this is a nuclear meltdown. To deal with this problem, you really need to reach down to the very bottom of the bag of tricks. You need to allow the ambiguities by switching to GLR parsing, and then you need to write your own custom merge procedures to resolve the ambiguities manually. Otherwise, you wouldn’t even be able to use compiler generators to parse COBOL! And the parsers you get this way aren’t even particularly fast or memory-efficient. Damn. This. Language!
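The core of such a merge procedure can be sketched in Java for the IF A > B AND C example. This is a strong simplification under my own assumptions: a real GLR merge receives competing parse trees, whereas this sketch works on plain strings, and the symbol table is just a map from variable names to a hypothetical Type enum.

```java
import java.util.Map;

final class MergeSketch {
    enum Type { NUMERIC, BOOLEAN }

    // Picks one of the two readings of "IF a > b AND c" based on the type of c.
    static String resolve(String a, String b, String c, Map<String, Type> symbols) {
        Type typeOfC = symbols.get(c);
        if (typeOfC == null) {
            throw new IllegalArgumentException("undeclared variable: " + c);
        }
        // Reading 1: c is a second operand of the comparison.
        String abbreviated = "(" + a + " > " + b + ") AND (" + a + " > " + c + ")";
        // Reading 2: c is a condition of its own.
        String plain = "(" + a + " > " + b + ") AND " + c;
        return typeOfC == Type.NUMERIC ? abbreviated : plain;
    }

    public static void main(String[] args) {
        System.out.println(resolve("A", "B", "C", Map.of("C", Type.NUMERIC)));
        System.out.println(resolve("A", "B", "C", Map.of("C", Type.BOOLEAN)));
    }
}
```

The point is that the decision needs the declarations, not just the token stream. The parser has to carry both readings along until something with access to the symbol table picks one.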

In our transpiler-framework, all front-ends and back-ends share a single, common middle-end – except for the COBOL front-end. The COBOL front-end has its own, separate middle-end, from which we translate to the common middle-end!

When I heard that scientists had complained about COBOL being designed without any scientific input, I thought „so what? Are you miffed because they didn’t want to play with you?“ But after learning about the problems above, I realized that, yes, COBOL’s designers really should have talked to someone who (unlike them!) understands what they are doing.

Maybe – just Maybe – COBOL wouldn’t be such a pile of shit, then!
