Evolution of a Computer Application

Author(s):

John J. Wavrik

It is possible to criticize and improve upon almost everything done so far. While it is good to reevaluate and improve code, it has proved to be bad to do this prematurely. The first goal should be to produce something that works and to experiment with options. A working system is the most important tool for developing improved versions. The modularity of Forth-produced systems often makes it possible to revise segments of a system while retaining code that comes before and after.

It is often not possible, or even desirable, to pre-conceive a system and design it on paper before writing any code. Forth does not require this (an advantage for this type of work). It usually becomes apparent only when a system is used what parts of it need speeding up, what parts could use a more congenial nomenclature, what functions are really needed, how they should work, and what form is best for the information they produce. This approach to system design can be called "middle out" coding: A rudimentary version of a system is produced, and it is refined and expanded as it is being used.

Changing the Language to Fit the Program
Choosing Better Names for Words
Inconsistent Methods of Action
Improving Speed
Forth Style
Eliminating Magic Numbers
Back to the Isomorphism Problem

a. Changing the language to fit the program

I preferred, for this article, to transport the earlier code to Win32Forth, which is used to run the current Groups32. The earlier Forth system used 16-bit integers and the current system uses 32-bit integers. In the earlier version, the group data were stored on disk using a virtual memory mechanism. In the modern version, the data are stored in memory as part of the dictionary. This change in the way data are stored did not actually require altering the code for the rest of the system - it was handled by an approach that may be unique to Forth: Instead of changing the code, change the language. We define a word BLOCK that reads data stored in a segment of memory. It appears to the rest of the code to act just as if the older virtual memory scheme was being used:

\ +++++++++++++++++++++++++++++++++++++++
\ +     pseudo block structure          +
\ +++++++++++++++++++++++++++++++++++++++

CREATE  PSEUDO  12 1024 * ALLOT
  : BLOCK  ( n -- addr )
      1024 * PSEUDO +  ;

When the program is loaded, group data are read from the original block file into this segment of memory. The rest of the program does not know that the definition of BLOCK has been given a new meaning.

b. Choosing better names for words

Now let me talk about some of the things I didn't like and that were changed. For one thing, I didn't like the names that I gave to some things in the original version. The word GLIMITS, for example, is used to put the loop limits on the stack for a loop over all elements of the current group. The etymology is obvious -- but it is a name for how the word does something rather than what it does. In use I found myself always having to look up what name I had used -- there is something not at all memorable about the name "GLIMITS". In Groups32, this has been replaced by the word "For_All_Elements". For similar reasons, the word EORDER has been replaced by EleOrder. (Note 7)

When I have used this code with others new to Forth, I found that it was helpful to distinguish words that come as part of the Forth language and those that have been added as part of the Groups package. Words that are part of Forth are typed in caps, words added for the system are in mixed case. Contrast:

: ORDERS ( grp# -- )   >GROUP  
     GLIMITS DO I EORDER . LOOP ;

: Orders ( grp# -- )   >Group  
     For-All-Elements 
     DO I EleOrder . LOOP ;

As I have used Forth, the importance of a good choice of names has become clear. With proper choices, one can often read a section of code without having to refresh one's memory by looking up a description of the word in a glossary or in the source code. A properly chosen name for a word can often suggest what it does.

c. Inconsistent methods of action

I also do not like inconsistencies among actions of words. To enter elements as parameters, we have several possibilities. In the code above we have words that require the ASCII code of group letters in advance -- and we introduced special words such as /A, /B, /C to allow this to be done conveniently. The word EORDER uses this approach: We say /B EORDER to get the order of B. The word POWERS, however, is based on the handler GETE. We say POWERS B to get the powers of B. Both are words that take an element of the current group as a parameter. Both approaches are valid.

I have found it confusing to have some words that take the parameter in advance, and some that take it afterwards. This practice requires that you remember this matter of placement for each word. It is better to have similar words act consistently, if possible. Words that act on elements should all either take their argument in advance or take it afterward. It is more consistent with the style of Forth to have parameters for the word occur in advance - so, eventually, I eliminated the approach based on GETE. It is easy to change the definition of POWERS to take its argument from the stack:

Syntax: POWERS B	Syntax: /B POWERS
: POWERS GETE ID GLIMITS DO DUP .ELE OVER G* DUP ID = IF 2DROP LEAVE THEN LOOP ;	: POWERS ID GLIMITS DO DUP .ELE OVER G* DUP ID = IF 2DROP LEAVE THEN LOOP ;

When faced with this kind of choice, it is usually not wise to spend a lot of time in meditation before writing the code. The best approach is to implement both possibilities and use them for a while before deciding the winner. (In keeping with the evolutionary theme of this article, one adopts a "survival of the fittest" approach to modifications.)

d. Improving speed

Notice that the group tables store letters A, B, C, ... for the table entries. Since finding the product of two elements involves calculating a location in memory, every use of G* involves first subtracting ID (the ASCII code for the identity) from the two letters. It should be possible to use integers 0, 1, 2, ... for the internal representations of the group elements, which would eliminate some extra calculations for G*. It does require, however, that group elements be converted to and from ASCII upon input and output. Speeding up G* (used many times within a computation) is far more important than speeding up printout (which occurs only at the end). Input and output operations occur relatively infrequently and tend to be sufficiently fast.

I should also be mention at this point that one can define selected Forth directly in assembly language. The word G* is executed very frequently, since most words in the system use it either directly or indirectly. In the following chart, I compare execution times for three different definitions for G*: the original Group16 definition (group elements stored as letters), a Groups32 version using a "high level" Forth definition, and the current Groups32 version using an assembly language definition.

The timings were obtained using the same version of Forth (Win32Forth) running on the same computer - so differences are due entirely to the way the data are represented and the way G* is coded.

Timing for 10,000,000 executions of G*
Original Groups16 // table stores group letters A, B, C, …	98420 ms
Current Groups32 ver 7.0 // table stores 0, 1, 2, …	35750 ms
Same as above with G* written in assembly language	7300 ms

Of course, the value of trying to improve the speed of a single word depends on how much gain can be achieved, how frequently the word is used, and whether or not the additional performance is needed. If a word is used mainly interactively (i.e. a command usually issued from the keyboard) and it returns an answer in 1/10 second, it is hardly worthwhile to try to get it to execute in 1/100 second. A word like G*, however, is executed quite frequently. (Note 8)

One should check code to see if the same computation is made repeatedly and should be replaced by code that computes once and stores the result. A number can be retrieved from a table far more quickly than it can be found by looping through a list. An example of this is the word XG' (badly named) that finds the inverse of an element under a map. In order to compute the product XG*, the inverse must be found twice. In order to display the translated table (RTABLE), the product XG* must be computed for each pair of elements. It is possible to rewrite >XG (the word that stores the map) so that it computes and stores the inverses. I made improvements like this in the eventual ISOMORPHISM command in Groups32.

e. Forth Style

Groups32 is a software system that is designed to be modified and expanded as it is used. We will gradually produce a collection of "top level" commands to be used interactively. These are supported by a collection of foundational and auxiliary words.

The fact that Forth programming results in a large collection of words enhances the ability of a user to extend and modify the system. Forth style typically emphasizes the production of words with fairly short definitions. Deeply nested loops and control structures can be avoided by introducing auxiliary words. By keeping definitions short and uncomplicated, it is easier to spot errors. Equally important for system development is the fact that words can be executed interactively - one can test individual words in piecewise fashion as they are being written.

This process of breaking definitions into smaller parts is called "factoring". The most complicated word we have written so far is ORDERS:

: ORDERS  ( grp# --  )  CR >GROUP
      ." Group number " GNUM .
      ." of Order " GORD .  CR  OCLEAR
      GLIMITS  DO  I EORDER DUP +OCNT I ORD!  LOOP
      GORD 1+ 1 DO  GORD I MOD 0=
         IF  3 SPACES  I 'OCNT @ 2 .R
             ."  elements of order "  I 2 .R ." :   "
             GLIMITS  DO I 'ORD @ J =
                          IF  I .ELE THEN
                      LOOP CR
         THEN LOOP ;

Notice that some factoring has taken place: Clearing the tables (OCLEAR), computing the order of an element (EORDER), incrementing the count (+OCNT), storing the order (ORD!), and printing an element (.ELE) have all been delegated to separate words. When writing ORDERS, we need to be able to perform these tasks, but we can forget the details of how they were performed.

It is certainly possible to factor ORDERS still further. Some Forth writers like to make the top-level word look like this:

: ORDERS ( grp# -- )
   Install-Group
   Initialize-Tables
   Compute-Counts
   Display-Info ;

There is another reason for factoring: to separate sections of code that may be independently useful. Here is a hypothetical word that, for some reason, needs to determine if the current group is abelian. The highlighted code accomplishes this by testing whether each pair of elements commutes.

: BigWord
   <some stuff>

   TRUE
   GLIMITS DO
     GLIMITS DO
       J I G* I J G* <>
   IF DROP FALSE THEN
   LOOP LOOP

      <more stuff> ;

Making this a separate word will clarify the definition of BigWord, and it will also produce a word that is potentially useful elsewhere:

: Abelian? ( -- t/f ) 
  \ is current group abelian?
    TRUE
    GLIMITS DO  GLIMITS DO
        J I G* I J G*  <>
        IF DROP FALSE THEN
    LOOP LOOP ;

Then we replace the code segment in BigWord by the word Abelian?

: BigWord
    <some stuff>
    Abelian?
    <more stuff>

f. Eliminating Magic Numbers

A feature of good style in any programming language is the elimination of "magic numbers," that is, numbers that have special significance to the system or program and that appear in the code as literal numbers. Examples of magic numbers are 16 (which appears as the maximum order of a group in Groups16), or 49 (which is the number of tables in the original Almquist posting), or 20 (which is used to allot storage for arrays -- probably because it was bigger than 16 and seemed like a nice number).

It is preferable to define magic numbers as constants and use the names rather than the numbers. This has the effect of making code more readable. The reader (most often the author) does not have to wonder why a certain number appears in the code. MaxOrder more clearly indicates that (in the current system) the maximum order of a group is being used. This is preferable to using 16, which could be the maximum order of a group -- but, then again, something else. Or that may be the maximum order now, but not when the system is extended. The elimination of magic numbers is also helpful if the code is used in a different setting. In the present case, the system was eventually changed to handle groups of order up to 32 and it was transported from a Forth system with 16-bit stack elements to one with 32.

We have already discussed the use of the word CELLS. This concept (with a different name) was introduced to me by the Kitt Peak VAX-Forth. At one time Forth was used very heavily by astronomers, and Kitt Peak produced a version of Forth allowing code to be written to run, without change, on 16 and 32-bit systems. I avoided a lot of grief in converting Groups16 to Groups32 by the fact that magic numbers appeared mainly in a few low-level words.

It is wisest to write code assuming that you will want to understand it yourself in 10 years, and perhaps modify it and use it for a different purpose. Mathematicians tend to treat computer code like theorems: They prefer not to re-prove something they already have proved.

g. Back to the Isomorphism Problem

We can distinguish groups of order up to 15 by the orders of the elements. The only groups in Almquist’s list with the same set of element orders are groups 20 and 21 (of order 12), and we have found an explicit isomorphism between them.

Groups of order 16 provide the first counterexample to the hypothetical theorem. Groups 43 and 45 have the same numbers of elements of each order, but group 43 is abelian, while group 45 is not.

Further isomorphisms occur in Almquist’s tables among groups of order 16. There are also groups of order 16 that are not isomorphic but that agree in element orders and in commutativity (abelian or not). Thus we still do not have enough properties to classify groups of order 16 up to isomorphism.

Here are some additional, easily computed, properties that may be useful - I wrote them as experiments to find useful invariants. They do not appear in either Groups16 or Groups32, since they did not seem to be of lasting interest. They were, in fact, used to distinguish the original set of group tables.

Number of Commuting Pairs

Eventually (in Groups32) we compute the center of a group and can easily compute the size of the center. At this point we just run through all pairs of elements and count those pairs that commute:

: #Commuting  ( grp -- n )  >GROUP
      0
      GLIMITS DO   GLIMITS DO
      J I G*  I J G* =
      IF 1+ THEN
      LOOP LOOP ;

Number of Squares

We can define #SQUARES to count how many elements in a group are squares:

CREATE SQ  20 ALLOT
: #SQUARES  ( grp# -- n )  >GROUP
    SQ 20 ERASE
    GLIMITS DO
         I I G* ID - SQ + 1 SWAP C!  
            LOOP
    0 
    20 0 DO SQ I + C@  + LOOP ;

If one actually wishes to see which elements are squares, the following word will display the squares:

: SQUARES  ( grp# --  )  >GROUP
   SQ 20 ERASE
   GLIMITS DO
        I I G* ID - SQ + 1 SWAP C!  
           LOOP
   20 0 DO SQ I + C@ 
           IF I ID + .ELE THEN
        LOOP ;

Text Box: The task that motivated this work was accomplished at this point.

It turns out that, at this point, we have implemented enough properties to classify groups up to order 16. Coupled with the mechanism for exhibiting isomorphisms, we can detect which of Almquist's tables are for isomorphic groups. (There are actually only 42 groups of orders 1-16 up to isomorphism.)

The following tables were found to be isomorphic: (20,21), (33,40), (36,38), (32,39,42), (34,41). In the process, I produced a method that will quickly identify any group of order 1-16 up to isomorphism.

John J. Wavrik, "Evolution of a Computer Application - Comments on Improvements," Convergence (December 2004)

JOMA