What is Legacy Code? – Legacy Coder Podcast #5


This podcast is called “Legacy Coder”, but what exactly is legacy code? I talk about my definition of the term in the fifth episode of the Legacy Coder Podcast.

Legacy Code?

  • What is Legacy Code?
    • Definition by Michael Feathers: “Code without tests.”
    • Code of a certain age.
      • Brown field instead of green field.
    • “Old” languages or platforms.
      • Natural, COBOL, ABAP, Mainframes.
      • But also J2EE.
    • Code that’s hard to change or maintain.
      • You can write “new” legacy code.
      • You can also write legacy code in modern languages like Java or C#.
    • Big Balls of Mud, Monoliths.
      • Duplicated code.
      • Hard to separate into individual pieces of functionality for reuse.
      • Different concerns are bundled together (see title image).
    • Code that lacks certain quality characteristics.
      • Not readable, not modularized, not consistent, hard to understand, deeply nested, similar things are done differently, no patterns.
  • How can you get rid of legacy code?
    • Why would you want to get rid of the code in the first place?
      • “Don’t forget – having legacy software is often a sign of success. Your business was successful to last long enough for your software to become legacy.” [Sam Newman]
      • High maintenance costs, aging/retiring workforce, unable to implement new requirements.
    • A big rewrite is almost never the answer, but sometimes it is.
    • Gradually improve the quality of your codebase.
      • Introduce tests, e.g. compare log files before/after.
    • Integrate the legacy code base into your modern architecture, e.g. with webMethods and EntireX for Adabas/Natural applications.
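
The "compare log files before/after" idea from the notes above can be sketched as a small characterization test. This is only an illustration in Python, not something from the episode: the function names and the timestamp format at the start of each log line are assumptions made up for the example.

```python
import difflib
import re

# Assumed timestamp format at the start of each log line, e.g. "2015-06-02 12:14:40 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def normalize(lines):
    """Strip volatile parts (here: leading timestamps) so only behavior is compared."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_logs(before, after):
    """Return the unified diff of two logs; an empty list means identical behavior."""
    return list(difflib.unified_diff(normalize(before), normalize(after),
                                     "before", "after", lineterm=""))


if __name__ == "__main__":
    # Hypothetical log lines written before and after a refactoring.
    before = ["2015-06-02 12:14:40 balance=100", "2015-06-02 12:14:41 done"]
    after = ["2015-06-03 09:00:00 balance=100", "2015-06-03 09:00:01 done"]
    print(diff_logs(before, after) or "no behavioral difference")
```

Run the legacy module once to record a reference log, change the code, run it again, and treat any non-empty diff as a failed test.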

A short piece of Legacy Code in (pseudo) Natural

Here’s what many of the old Natural modules I encounter in my day job look like:

DEFINE DATA
LOCAL USING DDMVIEW
END-DEFINE

READ IMPORTANT-DDM BY SUPERDESCRIPTOR

    IF IMPORTANT-DDM.FIELD EQ 1
        ADD 100 TO IMPORTANT-DDM.FIELD
        UPDATE
        END TRANSACTION
    ELSE
        ESCAPE TOP
    END-IF

    INPUT USING MAP 'OUTPUT'

END-READ

END

Database access, business logic, and the presentation of the results to the user (UI) are all bundled together into a single module. This becomes a maintenance nightmare quickly and is very hard to test because the individual concerns can’t be separated for testing.

This module could be split up into five modules that each do only one thing, can therefore be reused in different scenarios, and can easily be (unit) tested:

  • Reading the database (e.g. subroutine READ-DATA)
  • Processing the data, a.k.a. your “business logic” (e.g. subroutine PROCESS-DATA)
  • Saving data to the database (e.g. subroutine SAVE-DATA)
  • Showing the results to the user (e.g. subroutine DISPLAY-DATA)
  • Orchestrating the individual steps to implement the whole use case (the main program)

Here’s what the refactored main program would look like:

DEFINE DATA
LOCAL USING ARRDATA
END-DEFINE

PERFORM READ-DATA ARRDATA
PERFORM PROCESS-DATA ARRDATA
PERFORM SAVE-DATA ARRDATA
PERFORM DISPLAY-DATA ARRDATA

END
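
The same split can be illustrated outside of Natural. The following is a minimal Python sketch, not a translation of the module above: an in-memory list stands in for the database, and all function names are assumptions chosen to mirror the subroutine names. The point is that the business logic becomes a pure function that can be unit tested without a database or a screen.

```python
def read_data(database):
    """Read all rows from the (here: in-memory) database."""
    return [dict(row) for row in database]


def process_data(rows):
    """Business logic only: add 100 to every row whose field is 1, skip the rest."""
    return [{**row, "field": row["field"] + 100} for row in rows if row["field"] == 1]


def save_data(database, rows):
    """Write the processed rows back, matching them by id."""
    by_id = {row["id"]: row for row in rows}
    for index, row in enumerate(database):
        if row["id"] in by_id:
            database[index] = by_id[row["id"]]


def display_data(rows):
    """Presentation only: turn rows into printable lines."""
    return [f"{row['id']}: {row['field']}" for row in rows]


def main(database):
    """Orchestrate the individual steps of the use case."""
    rows = process_data(read_data(database))
    save_data(database, rows)
    return display_data(rows)


if __name__ == "__main__":
    db = [{"id": 1, "field": 1}, {"id": 2, "field": 7}]
    print("\n".join(main(db)))
```

A unit test now only needs to call process_data with a handful of rows and check the returned list.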

Recommended reading (and hearing)

In his book Working Effectively with Legacy Code*, Michael Feathers shows different ways of introducing automated tests into a legacy code base. His examples use languages such as C++ and Java, but the underlying ideas can be applied to any other programming language, too.

Michael Feathers - Working Effectively with Legacy Code (Robert C. Martin Series) (Affiliate)*

Robert C. Martin wrote my all-time favourite book for software developers: Clean Code*. If you haven’t read it already, grab a copy now and read it from front to back! No matter what programming language you’re using, you will definitely find lots of ways to improve your existing code in here.

Robert C. Martin - Clean Code: A Handbook of Agile Software Craftsmanship (Affiliate)*

In the very first episode of this podcast I talked about how to unit test your Natural application. In my opinion, that’s a very important step in modernizing a legacy code base.

Unit Testing Natural Applications - Legacy Coder Podcast #1


Return code 82 when running ftouch for a Natural FUser

Today we had a problem with one of our Natural FUsers. When trying to add new sources with ftouch, we got the following error message:

user@server ~ $ ftouch fuser=22,173 lib=ACC sm -b -d


        FTOUCH UTILITY V 6.3.13 PL 0   Software AG 2012

Error  : Mass update could not be started.
          Return code 82 received.

As the return code didn’t help with finding a solution, I kicked off strace and followed the output until the error message was shown:

strace -f -v -s 2014 -o /tmp/stracelog.txt ftouch fuser=22,173 lib=ACC sm -b -d

  • -f: Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
  • -v: Print unabbreviated versions of environment, stat, termios, etc. calls.
  • -s strsize: Specify the maximum string size to print (the default is 32).
  • -o filename: Write the trace output to the file filename rather than to stderr.

Here comes the interesting part:

stat("/home/macke/fuser", {st_dev=makedev(253, 2), st_ino=2007056, st_mode=S_IFDIR|S_ISGID|0775, st_nlink=4, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2015/06/02-12:14:40, st_mtime=2015/06/02-12:14:33, st_ctime=2015/06/02-12:14:39}) = 0
open("/tmp/NCFD00b30016.LCK", O_RDONLY) = 3
read(3, "B24B\0\0\0\0\1\0\0\0FD00b30016\0\0006\200\34\0\0\0\0\0", 32) = 32
close(3)                          = 0
semctl(1867830, 0, GETVAL, 0)     = 0
semctl(1867830, 1, GETVAL, 0)     = 9999
unlink("/home/macke/fuser/ACC/FILEDIR.SAG") = -1 ENOENT (No such file or directory)
semop(1867830, 0x7ffdbcb66ab0, 1) = -1 EACCES (Permission denied)
write(1, "Error  : Mass update could not be started.\n", 43) = 43
write(1, "          Return code 82 received.\n", 35) = 35

Apparently, right after reading a lock file under /tmp (NCFD00b30016.LCK), the semop system call on the corresponding semaphore failed with EACCES (Permission denied).
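
In a long trace, the failing calls are easier to spot when they are filtered out automatically. Here is a minimal Python sketch of that idea; the regular expression is an assumption based on the line format shown above (plus the PID prefix that strace adds when -f is combined with -o), not a complete parser for strace output.

```python
import re

# Matches lines that end in a failed call, e.g. "... = -1 EACCES (Permission denied)".
# The leading PID written by "strace -f -o" is optional.
FAILED_CALL = re.compile(r"^(?:\d+\s+)?(\w+)\(.*\)\s*= -1 (E[A-Z0-9]+) \((.+)\)$")


def failed_calls(lines):
    """Yield (syscall, errno_name, message) for every failed system call."""
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            yield match.groups()


if __name__ == "__main__":
    # Three lines taken from the trace above.
    sample = [
        'open("/tmp/NCFD00b30016.LCK", O_RDONLY) = 3',
        'unlink("/home/macke/fuser/ACC/FILEDIR.SAG") = -1 ENOENT (No such file or directory)',
        "semop(1867830, 0x7ffdbcb66ab0, 1) = -1 EACCES (Permission denied)",
    ]
    for syscall, errno_name, message in failed_calls(sample):
        print(f"{syscall}: {errno_name} ({message})")
```

To scan the real log, pass open("/tmp/stracelog.txt") instead of the sample list.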

Without searching for the cause any further, I simply deleted all the temporary files matching /tmp/NCFD* (who cares about temporary files, anyway?), and ftouch immediately ran successfully:

user@server ~ $ ftouch fuser=22,173 lib=ACC sm -b -d


        FTOUCH UTILITY V 6.3.13 PL 0   Software AG 2012

Ftouch request executed with success.

How to determine a Natural module’s caller

I wanted to find out which module a given Natural module was called from. My goal was to make sure that the module can only be called from a certain other module and raises an error if a “disallowed” module calls it. I don’t want to get into the details here of why this is a bad idea in the first place 😉

In Ruby, this is a one-liner (see Any way to determine which object called a method?):

caller.first

As it turns out, in Natural it’s not that simple. However, it’s not that hard, either. Thanks to a forum post (see Previous Program System Variable) I was able to quickly implement a short subroutine that does the job. It uses User Exit USR0600N (Get program level information) and looks like this:

DEFINE DATA
*
PARAMETER
01 P-CALLER (A8)
*
LOCAL
01 #NAMES (A8/1:32)
01 #LEVEL (P3/1:32)
*
01 #I (I4)
01 #STACK-SIZE (I4)
01 #INDEX-CALLER (I4)
*
END-DEFINE
*
DEFINE SUBROUTINE GET-CALLER
*
RESET P-CALLER
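* Fetch the names and levels of all modules currently on the stack
CALLNAT 'USR0600N' #NAMES(*) #LEVEL(*)
* Determine the stack size: the index of the last non-blank entry
FOR #I 1 *OCC(#NAMES)
  IF #NAMES(#I) NE ' '
    #STACK-SIZE := #I
  END-IF
END-FOR
* Index 1 is this module, index 2 the module that performed GET-CALLER,
* index 3 is that module's caller
#INDEX-CALLER := 3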
IF #STACK-SIZE GE 3
  P-CALLER := #NAMES(#INDEX-CALLER)
END-IF
*
END-SUBROUTINE
*
END

It can be called like this:

PERFORM GET-CALLER #CALLER

USR0600N returns the Natural modules currently on the stack in descending order (as you would expect from a stack). So if STACK calls STACK2 and STACK2 calls STACK3 and STACK3 calls GET-CALLER, USR0600N returns:

GET-CALLER (index 1; in fact, this would be the module's name, e.g. "GETCALL")
STACK3 (index 2)
STACK2 (index 3)
STACK (index 4)

This should explain the logic in GET-CALLER above. For the call chain above, WRITE *PROGRAM 'was called by <' #CALLER '>' results in:

STACK3 was called by <STACK2  >
STACK2 was called by <STACK   >
STACK  was called by <        >
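
For readers who don't know Natural, the index logic of GET-CALLER can be modelled in a few lines of Python. This is only a sketch: the list stands in for the #NAMES array that USR0600N fills, and the function name and its index_caller parameter are made up for the illustration.

```python
def get_caller(names, index_caller=3):
    """Mimic GET-CALLER: `names` is the stack, top first, padded with blanks.

    Index 1 is the module containing GET-CALLER, index 2 the module that
    performed it, and index 3 that module's caller (1-based, as in Natural).
    """
    stack_size = 0
    for i, name in enumerate(names, start=1):
        if name.strip():
            stack_size = i  # remember the last non-blank entry
    if stack_size >= index_caller:
        return names[index_caller - 1]
    return ""  # the module was not called by another module


if __name__ == "__main__":
    # The call chain from above, padded to 32 entries like #NAMES.
    stack = ["GETCALL", "STACK3", "STACK2", "STACK"] + [""] * 28
    print(get_caller(stack))
    # A stack with only two entries has no caller at index 3.
    print(repr(get_caller(["GETCALL", "STACK"])))
```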

Performance of array redimensioning in Natural

As I found out today, the performance of redimensioning an array in Natural largely depends on the statement you use. I compared RESIZE and EXPAND and found that RESIZE is more than twice as slow as EXPAND. With bigger arrays, RESIZE can even be more than 20 times slower than EXPAND!

Unfortunately, the documentation for the two statements is almost identical (see RESIZE and EXPAND), so there is no hint as to why the performance differs so drastically.

Example program:

DEFINE DATA
*
LOCAL
01 #I (N8)
01 #ARR (A8/1:*)
01 #START (T)
01 #END (T)
01 #TIME (T)
01 #N (N8)
END-DEFINE
*
#N := 100000
*
#START := *TIMN
*
REDUCE ARRAY #ARR TO 0
FOR #I 1 #N
  RESIZE ARRAY #ARR TO (1:#I)
END-FOR
*
#END := *TIMN
#TIME := #END - #START
WRITE 'RESIZE' #TIME
*
#START := *TIMN
*
REDUCE ARRAY #ARR TO 0
FOR #I 1 #N
  EXPAND ARRAY #ARR TO (1:#I)
END-FOR
*
#END := *TIMN
#TIME := #END - #START
WRITE 'EXPAND' #TIME
*
END

Result:

RESIZE 00:00:11
EXPAND 00:00:04

If I use a more realistic array (one that resembles a real database row), the difference is even more pronounced:

01 #ARR (1:*)
  02 #A1 (A8)
  02 #A2 (N8)
  02 #A3 (A) DYNAMIC
  02 #A4 (L)
  02 #A5 (N12,7)
  02 #A6 (A100)
  02 #A7 (A1000)

Result (after only 10,000 iterations):

RESIZE 00:01:04
EXPAND 00:00:18

And another result (after 20,000 iterations):

RESIZE 01:25:02
EXPAND 00:03:23
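
To put the three measurements side by side, the times can be converted to seconds and divided. The short Python sketch below does that; it assumes the values are printed as hours:minutes:seconds, and the labels and helper names are made up for the example.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown(resize, expand):
    """How many times slower RESIZE was than EXPAND, rounded to one decimal."""
    return round(to_seconds(resize) / to_seconds(expand), 1)


if __name__ == "__main__":
    # (label, RESIZE time, EXPAND time) as reported above.
    measurements = [
        ("simple array, 100,000 iterations", "00:00:11", "00:00:04"),
        ("realistic array, 10,000 iterations", "00:01:04", "00:00:18"),
        ("realistic array, 20,000 iterations", "01:25:02", "00:03:23"),
    ]
    for label, resize, expand in measurements:
        print(f"{label}: RESIZE is {slowdown(resize, expand)}x slower")
```

Under that assumption, the factor grows from just under 3 for the simple array to about 25 for the realistic array with 20,000 iterations.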